Wget Pattern Problem

2

I am having a problem with wget and accept patterns.

What I want to do is download only files which match the pattern:

*/images/src/test*.jpg

I am using the command

wget -r -A "*/images/src/test*.jpg" domain.com

For some reason the pattern will not work with slashes in it.

*test*.jpg works great, but as soon as there is a forward slash in the pattern it fails. I know wget uses shell-style pattern matching, so slashes should work, but somehow they do not.

Any ideas?

theduke

Posted 2011-06-19T18:03:52.340

Reputation: 169

Answers

2

I believe the accept/reject patterns specified with the -A/-R switches are only matched against the filename portion of the URL, in other words the part after the last slash. The info documentation describes it as follows:

Finally, it's worth noting that the accept/reject lists are matched
twice against downloaded files: once against the URL's filename
portion, to determine if the file should be downloaded in the first
place; then, after it has been accepted and successfully downloaded,
the local file's name is also checked against the accept/reject lists
to see if it should be removed.
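To illustrate why a pattern containing slashes can never match, here is a rough sketch of filename-only matching. The `accepts` function is a hypothetical helper written for this answer, not part of wget, and it is a simplified model: wget's real logic handles more cases (for example, list entries without wildcards are treated as suffixes).

```shell
#!/bin/sh
# Hypothetical helper, NOT wget code: mimics matching an accept pattern
# against only the filename portion of a URL (the part after the last slash).
accepts() {
    pattern=$1
    url=$2
    filename=${url##*/}        # strip everything up to and including the last slash
    case $filename in
        $pattern) return 0 ;;  # unquoted, so the value is treated as a glob pattern
        *)        return 1 ;;
    esac
}

# The filename is "test1.jpg", so a slash-free pattern can match it.
accepts 'test*.jpg' 'http://domain.com/images/src/test1.jpg' && echo "accepted"

# The filename never contains a slash, so this pattern cannot match.
accepts '*/images/src/test*.jpg' 'http://domain.com/images/src/test1.jpg' || echo "rejected"
```

Under this model, the directory part of the URL is discarded before the pattern is ever consulted, which is consistent with *test*.jpg working and the slash version failing.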

There are separate switches (-I/-X) that specify patterns to match against the directory part of the URL, but as far as I can see there is nothing that matches against the whole path, including both the directory and the filename.
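One possible approximation is to combine the two kinds of switches: -I for the directory and -A for the filename. This is only a sketch. It assumes the images live under /images/src at the root of the site, so it does not reproduce the leading * of the original pattern, and the wrapper function name `fetch_test_images` is made up for illustration.

```shell
# Sketch only: limit the crawl to one directory with -I and filter
# filenames with -A. The directory "/images/src" is an assumption taken
# from the pattern in the question; adjust it to the real site layout.
fetch_test_images() {
    wget -r -I '/images/src' -A 'test*.jpg' "$1"
}

# Usage (not run here): fetch_test_images domain.com
```

Keep in mind that -I also restricts which directories wget follows during recursion, so files may only be found if they are linked from the start page or from pages inside the included directory.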

TomH

Posted 2011-06-19T18:03:52.340

Reputation: 2 558