Batch download of URLs from command line multithreaded

3

I have 100,000 URLs of small files to download. Would like to use 10 threads and pipelining is a must. I concatenate the result to one file. Current approach is:

cat URLS | xargs -P5 -- curl >> OUTPUT

Is there a better option that will show progress of the whole operation? Must work from the command line.

William Entriken

Posted 2013-08-16T12:42:55.760

Reputation: 2 014

"Would like to use 10 threads and pipelining is a must. I concatenate the result to one file." So the order doesn't matter? – Bobby – 2013-08-16T13:21:41.627

1

Use GNU parallel; it will even keep the order of the output. If you tag your question accordingly, you might be lucky and the author might chime in ;-) – Adrian Frühwirth – 2013-08-16T14:37:08.193

Order is not an issue. Tagged for gnu-parallel good idea. Is it possible to use parallel and still get the pipelining in curl? – William Entriken – 2013-08-16T15:45:45.280

Don't you get the files intermingled when you do that? Unless your webserver is single-threaded, I don't see how you would avoid having two processes writing simultaneously to your output file. – rici – 2013-08-16T16:30:17.983

Mangling, jumbling are all not a problem for me. – William Entriken – 2013-08-16T20:18:51.540

Answers

3

cat URLS | parallel -k -P10 curl >> OUTPUT

or if progress is more important:

cat URLS | parallel -k -P10 --eta curl >> OUTPUT

or:

cat URLS | parallel -k -P10 --progress curl >> OUTPUT

The 10-second installation will try to do a full installation; if that fails, a personal installation; if that fails, a minimal installation:

wget -O - pi.dk/3 | sh

Watch the intro video for a quick introduction: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
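On the pipelining question from the comments: when curl is started once per URL, every download opens a fresh connection. A rough way to get closer to what was asked is to hand each curl process a batch of URLs, since curl can reuse a connection for consecutive URLs on the same host (this is connection reuse, not strict HTTP pipelining). The sketch below is only an illustration under assumptions: `fetch_batched`, the `FETCH` override variable, and the batch size of 50 are made up for this example and are not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch: give each curl process a batch of URLs instead of a single one,
# so it can reuse its connection when consecutive URLs share a host.

# fetch_batched URL_FILE OUT_FILE [JOBS] [BATCH]   (hypothetical helper)
fetch_batched() {
    local urls=$1 out=$2 jobs=${3:-10} batch=${4:-50}
    # FETCH is an illustrative override hook; it defaults to "curl -s".
    # It is left unquoted on purpose so that "curl -s" splits into two words.
    # shellcheck disable=SC2086
    xargs -P "$jobs" -n "$batch" ${FETCH:-curl -s} < "$urls" >> "$out"
}

# Example call (performs real downloads):
#   fetch_batched URLS OUTPUT 10 50
```

GNU parallel's `-n` (`--max-args`) option plays the same role as `xargs -n`, so the same batching idea should carry over to the commands above. Output from different batches can still interleave in OUTPUT, which the question says is acceptable.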

Ole Tange

Posted 2013-08-16T12:42:55.760

Reputation: 3 034

I had tried this installer wget -O - pi.dk/3 | sh but seem to have gotten some lame excuse for parallel that really does nothing: parallel [OPTIONS] command -- arguments / for each argument, run command with argument, in parallel – William Entriken – 2013-08-18T14:55:06.963

1

Ah, I had to uninstall moreutils first. apt-get remove moreutils – William Entriken – 2013-08-18T15:01:35.137
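The mix-up described in these comments (a different `parallel`, here the one from moreutils, taking precedence) can be checked for before running a long job. The snippet below is a small sketch of such a check; `is_gnu_parallel` is a hypothetical helper name, and the check only assumes that GNU parallel prints a first line starting with "GNU parallel" for `--version`. Anything else, including an error or a missing command, is treated as "not GNU parallel".

```shell
#!/usr/bin/env bash
# Sketch: report whether the `parallel` found first on PATH is GNU parallel.
# GNU parallel prints a first line starting with "GNU parallel" for --version;
# any other output (or an error) is treated as "not GNU parallel".
is_gnu_parallel() {   # hypothetical helper name
    parallel --version 2>/dev/null | head -n 1 | grep -q '^GNU parallel'
}

# Example use:
#   if is_gnu_parallel; then
#       cat URLS | parallel -k -P10 --eta curl >> OUTPUT
#   else
#       echo "parallel is missing or is not GNU parallel" >&2
#   fi
```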