wget terminates unexpectedly

2

I need to periodically crawl a site to refresh the server cache. It is a database-driven site with thousands of pages. I use wget to mirror the site locally on the same server, with this command:

wget --mirror localhost

After some time it stops suddenly with this message:

HTTP request sent, awaiting response... Terminated

It always happens, but not at the same URL each time. I have also tried it on another server, with the same result.

The --debug option does not provide any helpful information, nor does Apache's log file.

What could be the cause of this problem? I suspect a buffer running out of memory, or perhaps a stack overflow.

Alternatively, are there other command-line tools that can do the same job?

It's Wget 1.11.4 on Debian Lenny.

Martin

marlar

Posted 2010-08-20T12:46:46.240

Reputation: 479

Answers

1

It's possible that the server is analyzing your download pattern and cutting off your requests. Take a look at wget's options for throttling requests: --limit-rate, --wait, and --random-wait.
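As a rough sketch of how those three options could be combined with the mirror command from the question: the one-second wait and the 200k rate cap are illustrative assumptions, not tuned values, and the `mirror_throttled` function and `DRY_RUN` variable are hypothetical helpers introduced only for this example.

```shell
# Hypothetical wrapper around the question's `wget --mirror` command.
# --wait=1           pause one second between retrievals (assumed value)
# --random-wait      vary that pause randomly around the --wait value
# --limit-rate=200k  cap download speed at about 200 KB/s (assumed value)
mirror_throttled() {
  # $1: URL to mirror
  # When DRY_RUN is set and non-empty, the command is printed instead of run.
  ${DRY_RUN:+echo} wget --mirror --wait=1 --random-wait --limit-rate=200k "$1"
}

# Print the command that would be executed.
DRY_RUN=1 mirror_throttled http://localhost/
```

Drop the `DRY_RUN=1` prefix to actually start the crawl. With thousands of pages, even a one-second wait adds a lot of run time, so the values may need adjusting.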

Doug Harris

Posted 2010-08-20T12:46:46.240

Reputation: 23 578

Thanks for the suggestions. Unfortunately, none of these options helped. In the meantime I have found httrack; I will now see whether that works better. – marlar – 2010-08-20T20:53:07.683

0

I have found no way to make wget traverse the full site without terminating prematurely, but I stumbled upon httrack, which does the job perfectly.
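For reference, a minimal sketch of what an httrack run for this case might look like. The exact invocation used is not recorded here, so the output directory, the `mirror_with_httrack` function, and the `DRY_RUN` variable are all assumptions made for illustration.

```shell
# Hypothetical httrack counterpart to `wget --mirror localhost`.
# -O sets the directory where httrack writes the mirror and its logs.
mirror_with_httrack() {
  # $1: URL to mirror
  # $2: output directory (assumed path, adjust as needed)
  # When DRY_RUN is set and non-empty, the command is printed instead of run.
  ${DRY_RUN:+echo} httrack "$1" -O "$2"
}

# Print the command that would be executed.
DRY_RUN=1 mirror_with_httrack http://localhost/ ./localhost-mirror
```

Remove the `DRY_RUN=1` prefix to run the mirror for real.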

marlar

Posted 2010-08-20T12:46:46.240

Reputation: 479