Would the performance of this `grep` or `zgrep` command benefit from more memory, or from a faster CPU?

0

I have the following commands:

time grep -F -f 'in2.txt' test.fastq
time zgrep -F -f 'in2.txt' test.fastq.gz

There are about 30 search terms, and the files are ~5 GB. However, I notice that on one computer, an Amazon instance I spun up, the search takes 3-5x longer to finish. What is limiting the speed? Should I spin up an ECS that has more memory or a faster CPU?

ahdee

Posted 2018-03-13T03:32:59.730

Reputation: 1

An Amazon ECS could be running on any physical hardware, right? You might not have any guarantee of what it's really using, regardless of what it reports... but anyway zgrep searches compressed files, grep doesn't, so they're very different. – Xen2050 – 2018-03-13T04:20:36.230

Xen2050, you're right that grep and zgrep have distinct performance profiles. Most notably, if you are I/O constrained but not CPU constrained, operating on well-compressed files should help by reducing the time needed to read the data from storage. – Slartibartfast – 2018-03-18T16:12:40.203

Answers

1

CPU and I/O. If you are searching for a small set of terms (30 is quite small), you are most likely I/O bound and conceivably CPU bound. You will not be memory bound.

[IMHO]

The right answer, of course, is to test it. There are a few ways to do this; one is to open two terminals and run 'dstat' in one while the command in question runs in the other. If the command takes at least a couple of seconds to complete, you should get an idea of which resources are maxed out (at 100% or at some steady-state value) and which are not.
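If you would rather not watch dstat by eye, a rough alternative is to compare CPU time with wall-clock time for the run. The helper below is only a sketch: `classify_run` is a hypothetical name (not part of grep or dstat), and the 80% cut-off is an arbitrary assumption. It takes the three numbers printed by `time -p` (real, user, sys) and guesses which resource dominated.

```shell
# Hypothetical helper: classify a run from the real/user/sys seconds
# reported by `time -p`. The 0.8 threshold is an assumption, not a rule.
classify_run() {
    awk -v real="$1" -v user="$2" -v sys="$3" 'BEGIN {
        # Guard against a zero or negative wall-clock time.
        if (real <= 0) { print "unknown"; exit }
        # Fraction of the elapsed time that was spent on the CPU.
        share = (user + sys) / real
        if (share >= 0.8) { print "likely CPU bound" }
        else              { print "likely waiting on I/O" }
    }'
}
```

To use it, run something like `time -p grep -F -f 'in2.txt' test.fastq > /dev/null`, then pass the reported values, e.g. `classify_run 10.0 9.5 0.3`. Two caveats: a second run may be served from the page cache and look faster than the first, and for zgrep the CPU time covers both the decompressor and grep, so user + sys can exceed real on a multi-core machine.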

Slartibartfast

Posted 2018-03-13T03:32:59.730

Reputation: 6 899

I haven't reviewed the grep source code, but I see no reason why grep would benefit from more memory in this case. Unless the search string is exceedingly long, grep would likely work with small buffers (which I guess would be memory mapped). – Edward – 2018-03-14T08:02:41.600