2
1
I have a few gigabytes of source code.
Using recursive grep to search for a term can take a while.
I am using ext3.
Is there a faster way? Would using find be faster, and if so, why? Would using a filesystem like XFS give noticeably better results?
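For reference, a minimal sketch of the kind of recursive grep in question, assuming GNU grep (term is a placeholder):
grep -r term path/to/source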
5
Have you tried ack? It works pretty well here, on a 1M+ line codebase.
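A minimal sketch, assuming ack is installed; my_function is a hypothetical search term, and --type=cc (ack's name for the C file type) is optional:
ack my_function              # recursive by default; skips VCS dirs and binary files
ack --type=cc my_function    # restrict the search to C source files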
ack is easier and faster than using find | grep, and I often use it, but it doesn't index the results anywhere for later use. – njd – 2010-02-09T12:49:17.390
3
You can get better performance with agrep, which uses the bitap (shift-or) bitmasking algorithm for its search.
If you're looking for symbols, ctags or etags might work well enough to build an index for search.
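A minimal sketch of both suggestions, assuming agrep and Exuberant Ctags are installed (pattern and the paths are placeholders):
agrep -2 pattern path/to/source/*.c    # match with up to 2 errors; bitap tolerates typos
ctags -R path/to/source                # build a recursive tags index to search from your editor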
Ctags indexes the results, so you can search quickly from your editor. Darren Hiebert's Exuberant Ctags (ctags.sourceforge.net) has improved options for recursive searching. – njd – 2010-02-09T12:54:19.963
2
The only way you'll get a significant improvement over grep is to use an indexed search system like Strigi. The filesystem makes very little difference unless you have a huge number of very small files.
1
This should likely be on Super User.
Grepping is not the ideal solution to your problem since it performs a linear scan of every file on every search.
Index your files for search using a desktop indexing solution such as Beagle or Google Desktop.
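A minimal sketch, assuming the Beagle daemon is running and has already indexed the source tree (term is a placeholder):
beagle-query term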
1
I don't think the FS is going to make a big difference; chances are it's compute bound. You could check this using top to see if your CPUs are smoking.
You could also post your regexp here and let the smart people of SO have a crack at optimizing it. There are a variety of techniques for avoiding backtracking, etc.
fgrep is faster because it doesn't use regular expressions; it only searches for fixed strings. It's just an alias for grep -F. – Tim Sylvester – 2009-12-16T20:44:46.117
Right you are, thanks. I removed that part of my suggestion. – Carl Smotricz – 2009-12-16T20:50:12.427
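One rough way to check which resource is the bottleneck, assuming a shell with the time builtin (pattern is a placeholder): if user time is close to real time the search is compute bound; if real time is much larger, it is disk bound.
time grep -r pattern path/to/source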
1
If you only need to grep a subset of files, then use find first. For example, to only grep .h header files:
find path/to/source -name '*.h' -print0 | xargs -0 grep pattern
This will be faster since you're only accessing filenames most of the time, rather than file contents, which means many fewer disk accesses.
Better: find path/to/source -name '*.h' -exec grep pattern {} \; – Ewan – 2009-12-16T20:50:40.790
Even better: find path/to/source -name '*.h' -exec grep pattern {} \+ (fewer grep invocations) – None – 2009-12-16T22:39:22.157
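The difference matters because \; spawns one grep process per file, while \+ passes as many filenames as fit into each invocation, much like xargs. A hedged sketch combining this with the fgrep point made earlier, assuming GNU find and grep (pattern is a placeholder for a fixed string):
find path/to/source -name '*.h' -exec grep -F pattern {} +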
1
Here is what I understand:
You can use a divide-and-conquer policy: partition your files into multiple sets and run several greps in parallel.
Not sure if your need is a one-off thing or something repetitive in nature.
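A minimal sketch of that idea, assuming GNU xargs with -P support (pattern is a placeholder); grep -l prints only matching filenames, which keeps the interleaved output of parallel greps readable:
find path/to/source -type f -print0 | xargs -0 -n 100 -P 4 grep -l pattern    # 4 greps, up to 100 files each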
With exec it can use grep which would be faster than just using grep – None – 2009-12-16T20:46:41.563
and why would that be faster? – akira – 2009-12-16T22:57:04.150