Xargs and wget stop working after an hour


I am running a script with Cygwin on Windows XP, on a dual-core machine with 4 GB of RAM:

cat url_list.txt | xargs -P50 wget -i

I am trying to work through a 4 GB list of URLs to download (approximately 43 million of them).

It works fine for about the first hour, then the Bash shell and the downloads stop, even though it is only 2% of the way through the URL list.

Any ideas as to what could be wrong?

What is the best way to debug why this stops after an hour?

Jake

Posted 2011-05-28T09:39:41.767

Reputation: 81

Answers


It's possible that wget is simply taking a long time to download some of the files. Are there any wget/xargs processes in memory during the period when the job appears to be hung? If so, is it the full 50 processes you allocated with the -P50 flag to xargs, or has the count somehow crept above or fallen below that number without new instances being spawned? Even though the job is run under Cygwin, also look at the process list in Windows itself, as each wget download should show up as a process in Task Manager.
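A minimal way to check the count, assuming the binary is named wget.exe (ps -W is Cygwin's flag for including native Windows processes; tasklist and find are the stock Windows equivalents):

# From a Cygwin shell: count all wget processes, Windows-native ones included
ps -W | grep -ci wget

# The same check from a plain Windows command prompt
tasklist | find /c /i "wget.exe"

If the number sits at 50 but nothing is being downloaded, the workers are all stuck; if it has dropped toward zero and stays there, xargs has stopped spawning new ones.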

Matrix Mole

Posted 2011-05-28T09:39:41.767

Reputation: 3 303


I assume the URLs point to many different sites. In that case you may hit sites that are slow to respond, each of which will hang one of your wget processes. Since you have 50 running in parallel, all 50 must be stuck on such sites before everything grinds to a halt.

To see if this is the case, try killing one of the hanging wgets and see whether the pipeline then gets unstuck.
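Something like this, run from a second Cygwin shell, would test that (a sketch; 12345 stands in for a real PID taken from the ps output):

# List the running wget processes and their PIDs
ps | grep wget

# Kill one of them, then watch whether xargs spawns a replacement
kill -9 12345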

To skip URLs that hang, you can give wget a timeout:

wget -T 60
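Applied to the pipeline from the question, that might look like the following (a sketch: -T 60 caps the DNS, connect, and read timeouts at 60 seconds, and -t 2 limits retries per URL; the stray -i from the original command is dropped here, since xargs already supplies the URLs as arguments and -i would make wget treat the first of them as an input file):

cat url_list.txt | xargs -P50 wget -T 60 -t 2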

Ole Tange

Posted 2011-05-28T09:39:41.767

Reputation: 3 034