There are a couple of relevant flags:
- -A acclist / --accept acclist: a comma-separated list of glob-style patterns, matched against filenames
- -I list / --include-directories=list: a comma-separated list of glob-style patterns, matched against directories
- --accept-regex urlregex: a regular expression, matched against the full URL
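To make the difference concrete, here is a minimal sketch (plain sh, with a made-up filename and accept list) of how a comma-separated -A glob list is applied: each pattern is tried against the filename component of the URL, not the full URL, using ordinary shell-style glob matching.

```shell
# Hypothetical filename and -A style accept list, for illustration only.
fname='wordbyword.jsp?chapter=1&verse=2'
acclist='*.pdf,*wordbyword*'

set -f                      # disable pathname expansion while splitting
old_ifs=$IFS; IFS=','
matched=no
for pat in $acclist; do     # split the accept list on commas
  case $fname in
    $pat) matched=yes ;;    # case does the same glob matching as -A
  esac
done
IFS=$old_ifs; set +f
echo "$matched"             # prints "yes": *wordbyword* matches
```

Here *.pdf fails but *wordbyword* succeeds, so the file would be kept.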
Generally you would also pass -r to recurse, and -l inf, since otherwise the maximum recursion depth is 5. If you want to be able to stop and restart the download, -nc ("no clobber") avoids redownloading files that already exist. For this, -E (--adjust-extension) is also useful: it adds the .html extension to HTML pages that lack it, and when the extension is present and -nc is specified, wget will still read URLs from the on-disk copy of the file rather than fetching it again.
Here's an example to download a word-by-word translation of the Qur'an:
wget -E -nc -l inf -nd -r --no-parent 'http://corpus.quran.com/wordbyword.jsp?chapter=1&verse=1' -A '*wordbyword*'
It starts at the first verse, and since each page links to the next verses, it eventually downloads all of them. The -A
option restricts us to just the pages we are interested in.
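If a glob is not selective enough, the same filter can be written with --accept-regex, which is tested against the full URL rather than just the filename. Since regexes are easy to get wrong, one way to preview a pattern before a long crawl is to run it over a list of candidate URLs with grep -E (wget's default --regex-type is posix, so grep -E is a reasonable stand-in; the second URL below is made up for illustration):

```shell
# Preview an --accept-regex pattern against sample URLs with grep -E.
urls='http://corpus.quran.com/wordbyword.jsp?chapter=1&verse=1
http://corpus.quran.com/translation.jsp?chapter=1&verse=1
http://corpus.quran.com/wordbyword.jsp?chapter=2&verse=5'
kept=$(printf '%s\n' "$urls" | grep -Ec 'wordbyword\.jsp\?chapter=[0-9]+')
echo "$kept"   # prints "2": two of the three sample URLs pass the filter
```

The regex matches anywhere in the URL, so there is no need for leading or trailing wildcards as with -A.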
I think more examples are needed, so please feel free to suggest them and I will try to update this.