Whilst creating a site mirror using wget 1.12 on Ubuntu, links with a rel attribute set are not downloaded:
<a href="link" rel="tag">text</a>
rel="tag" is a microformat: by adding rel="tag" to a hyperlink, a page indicates that the destination of that hyperlink is an author-designated "tag" (or keyword/subject) for the current page.
My WordPress theme uses this for links to tags, so 99% of the site is ignored.
Edit: it turns out all my permalinks use rel="bookmark" and are skipped as well.
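To check which links on a page are affected, here is a minimal sketch using Python's standard html.parser module. It only lists anchors that carry a rel attribute; it does not change what wget follows. The names RelLinkCollector and find_rel_links are made up for illustration, and the sample markup is the example link from above plus one plain link.

```python
from html.parser import HTMLParser


class RelLinkCollector(HTMLParser):
    """Collect (href, rel) pairs for every <a> tag that has a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        attributes = dict(attrs)
        # Only record anchors that have both an href and a rel attribute.
        if "href" in attributes and "rel" in attributes:
            self.links.append((attributes["href"], attributes["rel"]))


def find_rel_links(html):
    """Return a list of (href, rel) tuples found in the given HTML string."""
    collector = RelLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    # Hypothetical sample: one link with rel set, one without.
    sample = '<a href="link" rel="tag">text</a> <a href="other">plain</a>'
    for href, rel in find_rel_links(sample):
        print(f"{href}\trel={rel}")
```

Running it against a saved copy of a page should show whether the missing pages are exactly the ones linked with rel="tag" or rel="bookmark".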
I'm using the following wget command (this ignores robots.txt and also follows nofollow links):
wget -mkp -e robots=off http://site
How do I make wget follow links with a rel attribute set?
Did you try it with --follow-tags=rel already? – JohannesM – 2012-03-23T10:16:37.977

@JohannesM The manual says: "If a user wants only a subset of those tags to be considered, however, he or she should be specify such tags in a comma-separated list with this option." Your suggestion would only follow rel tags, which don't exist on the page. --follow-tags does not add to the internal list of tags/attributes to follow but replaces it. And no, --ignore-tags= doesn't work either. – svandragt – 2012-03-23T10:20:14.080