I have about 3,000 files of roughly 300 MB each, and I'd like to search them for a series of substrings as quickly as possible on my 16-core server.
This is what I tried, but it doesn't seem to parallelize the search across the files:
sudo find /mnt2/preprocessed/preprocessed/mo* | sudo xargs awk '/substring/ {c++} END {print c}' | paste -sd+ | bc
It's pasted together from different how-tos, and I don't fully understand it. Do you have any suggestions for how I can split up the file processing?
You're likely I/O, not CPU-bound. – Nicole Hamilton – 2013-02-26T06:14:47.830
It's a high-I/O instance (hi1.4xlarge ec2), but you're probably right. I still want to know how to use GNU parallel in this context but haven't been able to get it to work. – kelorek – 2013-02-26T06:45:55.960
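One way the file processing could be split up, sketched under some assumptions: this version swaps in `xargs -P` (a GNU/BSD extension, not POSIX) rather than GNU parallel, treats the search term as a fixed string via `grep -F` instead of an awk regex, and leaves out `sudo`. The function name `count_matches`, its arguments, and the batch size of 8 files per process are hypothetical choices for illustration, not something from the original command.

```shell
# Hypothetical helper: count the lines that contain a fixed substring across
# all regular files under a directory, running several grep processes at once.
count_matches() {
    pattern=$1
    dir=$2
    jobs=${3:-16}   # assumed default: one grep process per core on a 16-core box

    # -print0 / -0 keep file names with spaces or newlines intact.
    # -n 8 hands each grep a small batch of files; -P runs up to $jobs at a time.
    # grep -c prints one count per file, -h drops the file name prefix.
    find "$dir" -type f -print0 |
        xargs -0 -n 8 -P "$jobs" grep -c -h -F -- "$pattern" |
        awk '{ total += $1 } END { print total + 0 }'
}
```

It might be called as `count_matches 'substring' /mnt2/preprocessed/preprocessed 16`. Note that this walks every regular file under the directory rather than only the `mo*` entries. Whether 16 processes actually help depends on whether the disks can keep them fed, as the first comment points out. The same pipeline shape should carry over to GNU parallel in place of `xargs`.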