wget: Turn Off Forced .html Retrieval

wget does this because it parses the HTML files to find the links it should follow next as it crawls through the site. I would just let wget finish and then run rm *.html afterwards, or something similar.

EDIT: Copying the files you want into a second directory, e.g. rsync *dynamicfile* /foo/bar, might be a better way to filter them, since it keeps only the files with the correct name (assuming that you want to keep some of the HTML files if they have the right name).
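As a rough sketch of the cleanup described above, the helper below deletes the .html files left behind by a crawl while keeping any whose name matches a pattern. The function name prune_html, the directory name, and the *dynamic* pattern are all hypothetical, chosen only for illustration; adjust them to your own download.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wget): remove crawled .html files,
# except those whose file name matches a glob pattern we want to keep.
# Usage: prune_html <directory> <keep-pattern>
prune_html() {
  local dir="$1" keep_pattern="$2"
  # -name '*.html'          : only consider HTML files
  # ! -name "$keep_pattern" : skip files matching the keep pattern
  find "$dir" -type f -name '*.html' ! -name "$keep_pattern" -exec rm -f {} +
}

# Example call (the directory and pattern are assumptions):
# prune_html ./example.com '*dynamic*'
```

Unlike a bare rm *.html, this also descends into the subdirectories that a recursive download creates.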

Jarvin

Posted 2010-04-20T17:13:26.963

Reputation: 6 712

I'm trying to filter the file because it causes wget to get stuck in an infinite loop, so this won't work. – Mike B – 2010-04-20T18:05:12.377

Sounds like the infinite loop is the real issue you're trying to deal with. That is different enough that you should probably post a new question asking how to prevent infinite loops with wget. – Jarvin – 2010-04-20T18:36:23.607

You should add a depth limit to wget. That caps how many levels of links it follows, so the crawl cannot recurse forever. – Jarvin – 2010-04-20T18:42:26.697
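A minimal sketch of the depth limit suggested in the comment above, using wget's --recursive and --level options. The wrapper name crawl_limited, the depth of 2, and the URL are assumptions for illustration; --no-parent is an extra precaution that is not mentioned in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: recursive download with a hard cap on link depth.
# Usage: crawl_limited <max-depth> <url>
crawl_limited() {
  local depth="$1" url="$2"
  # --recursive : follow links found in downloaded pages
  # --level=N   : stop after N levels of links
  # --no-parent : never ascend above the starting directory (added precaution)
  wget --recursive --level="$depth" --no-parent "$url"
}

# Example call (depth and URL are placeholders):
# crawl_limited 2 "http://example.com/"
```

Without an explicit --level, wget's recursive mode defaults to a depth of 5, so setting a smaller value is what actually tightens the crawl.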
