wget recursive limited to children of URL path

10

1

I want to recursively download everything under the following URL path using wget:

www.example.com/A/B

So if that URL has links to www.example.com/A/B/C and www.example.com/A/B/D, these two should also be downloaded.

But I don't want anything outside the www.example.com/A/B path to be downloaded. For example, if www.example.com/A/B/C links back to www.example.com, the page www.example.com should not be downloaded.

What wget command should I use?

Paul S.

Posted 2012-10-08T19:48:49.313

Reputation: 295

Answers

9

Use wget's --no-parent option (short form -np) together with -r:

--no-parent

Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
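For the URL in the question, the full command would probably look something like the sketch below. The function name fetch_subtree is made up for illustration; the URL is the placeholder from the question. Note the trailing slash: the wget manual points out that without it, wget treats B as a file, so --no-parent would only keep the download inside /A/ rather than /A/B/.

```shell
# fetch_subtree is a hypothetical helper name, not part of wget.
# $1 is the starting URL; end it with "/" so wget treats it as a directory.
fetch_subtree() {
    # --recursive (-r):    follow links found in the downloaded pages
    # --no-parent (-np):   never ascend above the starting directory
    wget --recursive --no-parent "$1"
}

# Example call (commented out so nothing is downloaded by accident):
# fetch_subtree "http://www.example.com/A/B/"
```

By default, recursive wget stays on the starting host and stops at a depth of 5 levels; add -l with a larger number (or -l inf) if the hierarchy under /A/B/ is deeper.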

nneonneo

Posted 2012-10-08T19:48:49.313

Reputation: 901

Ah, that's what I'm looking for. The wget options are so numerous that I couldn't find it. :) – None – 2012-10-08T19:53:46.410

2

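Try the -I (--include-directories) option to specify which directories to include in the download. It takes a comma-separated list of directory paths (not hostnames), and the start URL still has to be passed as a separate argument:

wget -r -I /A/B http://www.example.com/A/B/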

user22644

Posted 2012-10-08T19:48:49.313

Reputation: 339