I am trying to download a list of URLs with cURL, passing multiple URLs to each process. The following works, but it returns a weird result if the xargs -L parameter is more than 1. I want to launch 8 processes, each getting 4 URLs, so that I don't spawn too many. In effect, each process should run:
curl url1...url4
cat urls.txt | xargs -n 1 -L 4 -P 8 curl -I -s -o /dev/null -w "%{http_code} %{url_effective}\n"
The result is quite chaotic.
503 http://somewebsite.txt
404 http://somewebsite.txt
503 http://somewebsite.txt
404 http://somewebsite.txt
HTTP/1.1 404 Not Found
Server: nginx
Date: Thu, 24 Nov 2016 10:11:36 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Access-Control-Allow-Origin: *
404 http://somewebsite.txt
HTTP/1.1 404 Not Found
Server: nginx
Date: Thu, 24 Nov 2016 10:11:36 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Access-Control-Allow-Origin: *
404 http://somewebsite.txt
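A likely explanation (my reading, not stated in the thread): `-n 1` and `-L 4` are mutually exclusive in most xargs implementations, so one of them silently wins, and curl's `-o` applies to a single URL, meaning that when a batch carries several URLs the headers for the extra ones fall through to stdout. One workaround is to let xargs batch the URLs but have a small `sh -c` wrapper run curl once per URL, so every request gets its own `-o /dev/null`. The trade-off is that connections are no longer reused within a batch. The `urls.txt` below is a hypothetical stand-in (two unreachable localhost URLs):

```shell
# Hypothetical stand-in for the real urls.txt (two unreachable local URLs).
printf '%s\n' 'http://127.0.0.1:9/a' 'http://127.0.0.1:9/b' > urls.txt

# Up to 4 URLs per wrapper, 8 wrappers in parallel; the loop runs curl
# once per URL so each request has its own -o /dev/null.
# "|| true" keeps one failed URL from aborting the whole batch.
xargs -P 8 -n 4 sh -c '
  for url in "$@"; do
    curl -I -s -o /dev/null -w "%{http_code} %{url_effective}\n" "$url" || true
  done
' _ < urls.txt
```

The `_` fills `$0` of the spawned shell, so `"$@"` holds only the URLs xargs passed in.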
Does -n4 concatenate lines? To "url 1 url 2 url 3 url 4" – Testr – 2016-11-24T10:22:11.403
This happens with any server btw. – Testr – 2016-11-24T10:27:33.983
Yes, "-n 4" concatenates lines. This is what you want if you want to spawn fewer processes. – Setop – 2016-11-24T12:47:29.427
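To make that grouping visible without touching the network, `echo` can stand in for curl (a toy sketch: `-n 4` hands each invocation up to four arguments):

```shell
# Eight fake URLs, one per line; -n 4 packs four into each echo invocation.
printf 'url%d\n' 1 2 3 4 5 6 7 8 | xargs -n 4 echo
# → url1 url2 url3 url4
# → url5 url6 url7 url8
```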
This works for me against the "https://en.wikipedia.org/wiki/" server. Articles are downloaded and connections are reused. – Setop – 2016-11-24T12:48:27.153