wget recursively with -np option still ascends to parent directory

4

2

tl;dr: will `wget --no-parent -r` download from a directory above the given url's directory?

when using wget to download, say, images recursively from example.com/a/b with the -r and -np options, will a picture that is under example.com/a/c/ be downloaded when example.com/a/b/ delivers an html file containing a link to the picture? if so, how do i get all pictures that are in a folder and its subfolders, and only those? the description of the option --no-parent says "Do not ever ascend to the parent directory when retrieving recursively". however, the directory listing contains a link to the parent directory, which wget follows despite that option. what did i miss?

edit: using GNU Wget 1.12

vectra

Posted 2011-03-04T01:40:56.087

Reputation: 51

Answers

5

I just ran some tests with WGET 1.10.2 for Windows and it worked as expected.

Make sure to add a trailing slash to the directory, to indicate (in this example) that b is a sub-directory of a and not a file in it:

> wget … hxxp://example.com/a/b/
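To see why the trailing slash can matter, here is a rough sketch of the prefix check that --no-parent appears to perform. The helpers `np_prefix` and `np_allows` are hypothetical names made up for illustration and are not part of wget; the assumption is that a start URL without a trailing slash has its last segment treated as a file name, so the directory being protected is one level higher than intended.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): print the directory prefix that a
# --no-parent style check would treat as the top of the download tree.
np_prefix() {
    url_path=$1
    case $url_path in
        */) printf '%s\n' "$url_path" ;;        # already names a directory
        *)  printf '%s\n' "${url_path%/*}/" ;;  # last segment is taken as a file
    esac
}

# Hypothetical helper: succeed if the linked path ($2) lies under the
# prefix derived from the start path ($1).
np_allows() {
    prefix=$(np_prefix "$1")
    case $2 in
        "$prefix"*) return 0 ;;
        *)          return 1 ;;
    esac
}

np_prefix /a/b     # prints /a/
np_prefix /a/b/    # prints /a/b/
```

Under this assumption, starting from /a/b makes /a/c/img.png look like it is inside the allowed tree (prefix /a/), while starting from /a/b/ excludes it.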

If that still doesn’t work, try specifying some of the recursion exclusion options:

  • --reject=htm,html
  • --ignore-tags=a
  • --exclude-directories=/a/c (a comma-separated list of directory paths, not full URLs)
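As a sketch of how some of these options might be combined, the snippet below assembles a command line and prints it rather than running it. The host and the /a/b/ and /a/c paths are the placeholders from the question. The excluded directory is written as a path, which is the form I understand the wget manual to use for this option; check it against your version. --ignore-tags=a is left out on purpose, since ignoring anchor tags would also stop wget from following the links in a directory listing.

```shell
#!/bin/sh
# Sketch only: build the command and print it instead of downloading.
start_url='http://example.com/a/b/'   # trailing slash marks b as a directory

set -- wget --recursive --no-parent \
    --reject=htm,html \
    --exclude-directories=/a/c \
    "$start_url"

cmd=$*
printf '%s\n' "$cmd"   # replace this line with "$@" to actually run it
```

Printing first makes it easy to confirm the start URL and exclusions before any recursive download touches the disk.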

Synetech

Posted 2011-03-04T01:40:56.087

Reputation: 63 242

I was just having this problem (ascending to parent even with -np). I did a quick search, landed here, saw my old answer, added the trailing slash, and bam! it worked. ☺ – Synetech – 2014-12-13T02:36:43.280

adding a trailing slash didn't do it, --reject=htm,html neither, and there are too many subdirs to exclude manually. anyway i'm not looking for a workaround, i wonder what -np is good for or if it's broken. – vectra – 2011-03-04T15:38:39.133

But it’s not, I tried it and it worked. Perhaps it’s your copy; where is it from? Linux? A Windows port? What version is it?… – Synetech – 2011-03-04T18:55:48.303

it is GNU Wget 1.12 on ubuntu – vectra – 2011-03-04T23:05:48.223

i tried it on another site, too, and it worked, curiously enough. – vectra – 2011-03-04T23:42:14.190

Then it’s probably something about that one site or its contents. When I did the test, I specifically created a directory structure matching what you described, put some image files in them, set Indexes on, and put an HTML file in one of the directories (a/b) containing a hyperlink to an image in the other (a/c/img.png). It worked fine: a/b/* got downloaded, but a/c/ was not even created. – Synetech – 2011-03-04T23:45:34.817