How to launch multiple URLs per process with cURL?


I am trying to download a list of URLs with cURL, passing multiple URLs to each process. The following works, but it returns a weird result when the xargs -L parameter is more than 1. I want to launch 8 processes, each getting 4 URLs, so that I don't spawn too many processes, i.e.:

curl url1...url4

cat urls.txt | xargs -n 1 -L 4 -P 8 curl -I -s -o /dev/null -w "%{http_code} %{url_effective}\n" 

The result is quite chaotic.

503 http://somewebsite.txt
404 http://somewebsite.txt
503 http://somewebsite.txt
404 http://somewebsite.txt
HTTP/1.1 404 Not Found
Server: nginx
Date: Thu, 24 Nov 2016 10:11:36 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Access-Control-Allow-Origin: *

404 http://somewebsite.txt
HTTP/1.1 404 Not Found
Server: nginx
Date: Thu, 24 Nov 2016 10:11:36 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Access-Control-Allow-Origin: *

404 http://somewebsite.txt
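For reference, the grouping behavior of the xargs flags can be checked in isolation, with echo standing in for curl (a sketch, assuming GNU xargs; note that POSIX treats -n and -L as mutually exclusive, so combining -n 1 with -L 4 as above means only one of the two is typically honored):

```shell
# Eight fake URLs stand in for urls.txt; echo shows exactly which
# arguments each spawned process receives.
tmpf=$(mktemp)
printf 'u%s\n' 1 2 3 4 5 6 7 8 > "$tmpf"

xargs -n 1 echo < "$tmpf"   # one URL per process: eight invocations
xargs -n 4 echo < "$tmpf"   # four URLs per process: two invocations
```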

Testr

Posted 2016-11-24T10:10:04.043

Answers

This does not look like a client-side issue. It looks like a server problem, such as a JSP that fails to compile.

You can investigate with curl -vvv to get more information.

Also, once that issue is solved, you will likely want to use xargs -n 4 instead of -n 1 -L 4.
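Putting that suggestion together, here is a self-contained sketch of the corrected pipeline (my addition, not from the thread; file:// URLs pointing at temp files stand in for the real list so it runs offline). One detail worth noting: curl pairs -o options with URLs one-to-one, so when a single curl invocation receives four URLs, one -o /dev/null silences only the first of them and the rest spill onto stdout; the inner sh loop below gives every URL its own -o /dev/null.

```shell
# Build a stand-in urls.txt of file:// URLs (swap in your real list).
tmp=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
  printf 'body %s\n' "$i" > "$tmp/page$i"
  printf 'file://%s/page%s\n' "$tmp" "$i" >> "$tmp/urls.txt"
done

# 8 parallel workers, 4 URLs per curl process. The loop rewrites the
# argument list from "u1 u2 u3 u4" into
# "-o /dev/null u1 ... -o /dev/null u4", so every URL's body is
# discarded and only the -w status line remains, one per transfer
# (file:// URLs report 000 for %{http_code}).
xargs -n 4 -P 8 sh -c '
  n=$#; i=0
  while [ "$i" -lt "$n" ]; do
    set -- "$@" -o /dev/null "$1"   # append "-o /dev/null URL" ...
    shift                           # ... and drop the bare URL
    i=$((i+1))
  done
  curl -s -w "%{http_code} %{url_effective}\n" "$@"
' sh < "$tmp/urls.txt"
```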

Setop

Posted 2016-11-24T10:10:04.043

Does -n 4 concatenate lines, into "url1 url2 url3 url4"? – Testr – 2016-11-24T10:22:11.403

This happens with any server btw. – Testr – 2016-11-24T10:27:33.983

Yes, "-n 4" concatenates lines. This is what you want if you want to spawn fewer processes. – Setop – 2016-11-24T12:47:29.427

This works for me against the "https://en.wikipedia.org/wiki/" server: articles are downloaded and connections are reused. – Setop – 2016-11-24T12:48:27.153