I have a long list of URLs on my own website listed in a carriage-return-separated text file. So for instance:
- http://www.mysite.com/url1.html
- http://www.mysite.com/url2.html
- http://www.mysite.com/url3.html
I need to spawn a number of parallel wgets to hit each URL twice, check for and retrieve a particular header, and then save the results in an array which I want to output in a nice report.
I have part of what I want by using the following xargs command:
xargs -x -P 20 -n 1 wget --server-response -q -O - --delete-after < ./urls.txt 2>&1 | grep Caching
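Annotated, that one-liner breaks down as follows (same command, comments added for readability):

    # -x                 abort if a constructed command line exceeds the size limit
    # -P 20              run up to 20 wget processes in parallel
    # -n 1               pass one URL to each wget invocation
    # --server-response  print the HTTP response headers (on stderr, hence the 2>&1)
    # -q -O -            quiet mode; write the fetched body to stdout
    # --delete-after     delete each retrieved file after download
    xargs -x -P 20 -n 1 wget --server-response -q -O - --delete-after < ./urls.txt 2>&1 | grep Caching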
The question is: how do I run this command twice for each URL and store the following:
- The URL hit
- The first grep result for the Caching header
- The second grep result for the Caching header
So the output should look something like:
=====================================================
http://www.mysite.com/url1.html
=====================================================
First Hit: Caching: MISS
Second Hit: Caching: HIT
=====================================================
http://www.mysite.com/url2.html
=====================================================
First Hit: Caching: MISS
Second Hit: Caching: HIT
And so forth.
The order in which the URLs appear isn't necessarily a concern, as long as the header(s) are associated with the URL.
Because of the number of URLs, I need to hit them in parallel rather than serially; otherwise it will take far too long.
The trick is how to get multiple parallel wgets AND store the results in a meaningful way. I'm not married to using an array if there is a more logical way of doing this (maybe writing to a log file?).
Do any bash gurus have any suggestions for how I might proceed?
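For what it's worth, here is a minimal sketch of the log-file idea: wrap the two hits in a small function, export it, and let xargs run it in parallel, writing each URL's report block to its own file so parallel workers never interleave their output. It assumes urls.txt is newline-separated and that the server really returns a Caching: header as described; the function name check_url and the md5sum-based filenames are illustrative choices, not part of the original command:

    #!/bin/bash
    # Sketch only: assumes urls.txt is newline-separated and the server
    # returns a "Caching:" response header, as described in the question.
    outdir=$(mktemp -d)

    check_url() {
        local url=$1 first second
        # Same wget flags as the question, except the body goes to
        # /dev/null so only the headers (printed on stderr) reach grep.
        first=$(wget --server-response -q -O /dev/null "$url" 2>&1 | grep 'Caching')
        second=$(wget --server-response -q -O /dev/null "$url" 2>&1 | grep 'Caching')
        # One report file per URL, so parallel workers never interleave.
        {
            echo "====================================================="
            echo "$url"
            echo "====================================================="
            echo "First Hit: $first"
            echo "Second Hit: $second"
        } > "$outdir/$(printf '%s' "$url" | md5sum | cut -d' ' -f1)"
    }
    export -f check_url
    export outdir

    # 20 parallel workers, one URL per invocation, as in the original command.
    xargs -P 20 -I{} bash -c 'check_url "$1"' _ {} < urls.txt

    # Stitch the per-URL reports into the final report.
    cat "$outdir"/* > report.txt
    rm -rf "$outdir"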
Are your entries really separated by carriage returns (\r), not newlines (\n) or Windows-style line endings (\r\n)? Is this a file from an old Mac? – terdon – 2013-06-10T16:53:45.340

You may want to experiment with GNU parallel. In particular, the man page mentions: "GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially." – kampu – 2013-06-11T04:16:38.243
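For comparison, a minimal sketch of the GNU parallel approach kampu suggests, under the same assumptions about urls.txt and the Caching: header. Because parallel groups each job's output by default, the report blocks come out whole even with 20 jobs running at once:

    # Sketch only: {} is GNU parallel's placeholder for the current URL.
    parallel -j 20 '
        echo "====================================================="
        echo {}
        echo "====================================================="
        echo "First Hit: $(wget --server-response -q -O /dev/null {} 2>&1 | grep Caching)"
        echo "Second Hit: $(wget --server-response -q -O /dev/null {} 2>&1 | grep Caching)"
    ' < urls.txt > report.txt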