Using an index to make grep faster?

10

0

I find myself grepping the same codebase over and over. While it works great, each command takes about 10 seconds, so I am thinking about ways to make it faster.

So can grep use some sort of index? I understand an index probably won't help with complicated regexps, but I mostly use very simple patterns. Does an indexer exist for this case?

EDIT: I know about ctags and the like, but I would like to do full-text search.

Peltier

Posted 2011-08-09T09:35:42.347

Reputation: 4 834

Are you using grep's recursive option, or something like find/xargs? – Michał Šrajer – 2011-08-09T11:12:47.873

@Michał : yes, -R – Peltier – 2011-08-09T12:21:59.493

Answers

4

What about cscope? Does it fit your needs?

It allows searching code for:

  • all references to a symbol
  • global definitions
  • functions called by a function
  • functions calling a function
  • text string
  • regular expression pattern
  • a file
  • files including a file
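A minimal sketch of driving cscope from the command line, assuming cscope is installed and the code is C. `cscope_build` and `cscope_text` are hypothetical helper names. The options are cscope's own: `-b` (build the database only), `-q` (also build an inverted index), `-R` (recurse), `-d` (don't rebuild), and `-L -N` (one line-oriented query on field N). The field numbers follow cscope's query menu: 0 symbol, 1 global definition, 2 functions called by, 3 functions calling, 4 text string, 6 egrep pattern, 7 file, 8 files #including a file.

```shell
# Build cscope's cross-reference for the C sources under $1.
# -b: build the database only, -q: also build the inverted index,
# -R: recurse into subdirectories. Database files are written into $1.
cscope_build() {
  (cd -- "$1" && cscope -bqR)
}

# Query the existing database without rebuilding it (-d).
# -L: line-oriented output; -4: field 4, "Find this text string".
cscope_text() {
  (cd -- "$1" && cscope -d -L -4 "$2")
}
```

Swapping `-4` for `-0`, `-1`, `-3` or `-6` gives the symbol, definition, caller and egrep-pattern searches from the list above.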

akira

Posted 2011-08-09T09:35:42.347

Reputation: 52 754

It looks like it only works well for C, and maybe C++ and Java. – neves – 2017-08-10T15:01:43.023

That could be what I'm looking for, I'll take a look. Thanks! – Peltier – 2011-08-09T15:22:37.413

4

Full-text indexing

There are tools such as recoll, swish-e and sphinx but you'd have to check if they can support the sort of search criteria you need.

Recoll

Recoll is a personal full text search tool for Unix/Linux.

Swish-e

Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files.

Sphinx

Sphinx lets you either batch index and search data stored in an SQL database, NoSQL storage, or just files quickly and easily.

grep

I'm surprised grep is as slow as you describe. Can you reduce the number of files being searched? For example, when I only need to search the source files for one executable (out of many in a project), I feed grep the names from a command that lists the source files for that program:

grep expression `sources myprogram`

sources is a program specific to my development environment but you may have (or be able to construct) something equivalent.
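If nothing like `sources` exists in your environment, a plain list file can play the same role. In this sketch, `grep_listed` and the list file name are hypothetical; the list could be generated once with something like `find src/myprogram -name '*.c' > myprogram.files`. It assumes GNU xargs, whose `-d '\n'` keeps paths containing spaces intact and whose `-r` skips running grep on an empty list.

```shell
# Search only the files named in a list file (one path per line).
# -H forces the file name into the output even if only one file is listed.
grep_listed() {
  local pattern=$1 listfile=$2
  xargs -r -d '\n' grep -Hn -e "$pattern" < "$listfile"
}
```

Usage: `grep_listed expression myprogram.files`.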

I'm assuming you've tried obvious techniques such as:

find /foo/myproject -name "*.c" -exec grep -Fl searchtext {} +

I've read a suggestion that the -P option (Perl-compatible regular expressions) of recent versions of grep can speed up searches significantly.

RedGrittyBrick

Posted 2011-08-09T09:35:42.347

Reputation: 70 632

AFAIK locate is only for filenames. recoll would work, but I would prefer a command-line tool. The code base is pretty big, and since I'm looking for a string, I don't know where it is, so it's hard to limit the number of files to be searched :) – Peltier – 2011-08-09T13:41:33.157

I think swish-e is command-line. I haven't tried any (grep is fast enough on my projects) – RedGrittyBrick – 2011-08-09T14:32:07.257

3

grep, no. But there are several programs that use indexes and are aimed at code bases. ctags (there is a version provided with vim), etags (aimed at use with emacs) and global (more independent of the editor) are the ones I'm thinking of now, but there are probably others.

AProgrammer

Posted 2011-08-09T09:35:42.347

Reputation: 153

I use ctags, but isn't that limited to searching function names? I want to do full-text search. – Peltier – 2011-08-09T09:46:22.490

I'm pretty sure that ctags can also search for class definitions, and ISTR that it also finds some uses. I'm certain that global does both. But it is true that those tools don't do full-text search; they use language knowledge to limit their scope. – AProgrammer – 2011-08-09T09:50:42.423

3

You could copy your codebase to a RAM disk.
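A minimal sketch of the RAM disk idea, assuming Linux, where `/dev/shm` is normally a RAM-backed tmpfs mount (creating a dedicated one with `mount -t tmpfs` needs root). `copy_to_ram` is a hypothetical helper. The copy does not follow later edits to the original tree, and if the files are already in the kernel's page cache from earlier greps, the speedup may be small.

```shell
# Copy a source tree into a RAM-backed directory and print its location.
# Default destination assumes /dev/shm is tmpfs; pass another path as $2.
copy_to_ram() {
  local src=$1 dest=${2:-/dev/shm/$(basename -- "$1")}
  mkdir -p -- "$dest"
  cp -a -- "$src/." "$dest/"   # copy contents, preserving attributes
  printf '%s\n' "$dest"
}
```

Usage: `grep -rn pattern "$(copy_to_ram ~/src/myproject)"`.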

jfg956

Posted 2011-08-09T09:35:42.347

Reputation: 1 021

2

If you want to use a full-text search engine, use one:

akira

Posted 2011-08-09T09:35:42.347

Reputation: 52 754

That's always an option, but I was wondering if a more lightweight, quick and dirty grep speedup option would exist. – Peltier – 2011-08-09T14:29:08.290

'More lightweight' and 'want to have my stuff fully indexed' are two extremes :) If you just want to go quick and dirty, ctags is the best match for what you want. With anything else you end up using a real full-text search engine; e.g., recoll, mentioned in @RedGrittyBrick's answer, uses Xapian as its backend. – akira – 2011-08-09T14:52:45.517

They're not necessarily incompatible. Imagine if ctags had a --full-text option, for instance, and grep a --tag-file option. Of course the fact that it could exist doesn't mean that it does :) – Peltier – 2011-08-09T15:05:46.577
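The "ctags --full-text plus grep --tag-file" idea can be approximated with standard tools. This is a hypothetical, quick-and-dirty word index, not an existing program: `build_index` records each identifier-like word together with the file it occurs in, and `search_index` looks the word up and greps only the candidate files. It assumes GNU grep and GNU xargs (`-d`), only finds whole words matching `[A-Za-z_][A-Za-z0-9_]*`, and must be rebuilt whenever the code changes. The lookup is still a linear scan of the index file, so any gain depends on how much smaller the index is than the tree.

```shell
# Build "word<TAB>file" lines for every identifier-like word under $1.
# -I skips binary files; -o prints each match as "file:word".
build_index() {
  local dir=$1 index=$2
  grep -rHIoE -e '[A-Za-z_][A-Za-z0-9_]*' "$dir" |
    awk '{
      n = split($0, parts, ":")                 # the word follows the last colon
      w = parts[n]
      f = substr($0, 1, length($0) - length(w) - 1)
      print w "\t" f
    }' |
    sort -u > "$index"
}

# Look up files containing the exact word, then grep only those files.
# -r: run nothing when no file matches; -F: treat the word as a fixed string.
search_index() {
  local word=$1 index=$2
  awk -F'\t' -v w="$word" '$1 == w { print $2 }' "$index" |
    xargs -r -d '\n' grep -HnF -e "$word"
}
```

Usage: `build_index src .wordindex` once, then `search_index some_identifier .wordindex`.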

-1

No, I don't think so. But there may be a simple solution: try ack. I think if you give it a chance, you'll find it significantly faster than grep; it requires shorter search strings to get better results and has many desirable features, while using much the same command switches. One thing that makes it faster (although it is not indexed) is that it ignores a lot more of the stuff you don't want to search. It's written in Perl and uses Perl's regular expressions (and therefore has Mac and Windows ports too).

http://betterthangrep.com/
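Part of ack's advantage, skipping files you don't want to search, can be imitated with plain GNU grep. A minimal sketch: `search_source` is a hypothetical helper, and the excluded directories and included file patterns are example choices to adjust for your project; `--exclude-dir` and `--include` are GNU grep options matched against base names.

```shell
# Recursive grep limited to C sources, skipping VCS and build directories.
search_source() {
  local pattern=$1 dir=${2:-.}
  grep -rn \
    --exclude-dir=.git --exclude-dir=.svn --exclude-dir=build \
    --include='*.c' --include='*.h' \
    -e "$pattern" -- "$dir"
}
```

Usage: `search_source searchtext /foo/myproject`.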

Mike from Shreveport

Posted 2011-08-09T09:35:42.347

Reputation: 1

Ack is pretty cool. But I really doubt it's any faster than grep, since it is based on the same mechanisms. – Peltier – 2012-12-07T16:16:40.687