ag (the_silver_searcher) not searching entire file - does it have an implicit maximum input size?

3

I have a very large plain text file (multiple gigabytes in size) which I need to search for certain strings. When using grep, I get over 11,000 matches of a string but with ag I get roughly 1,500. The output of the two commands is the same up to the point where ag stops.

I am aware of the -m option in ag defining the maximum number of matches but this defaults to 10,000 and so is not the issue.

To illustrate this, here is an example of what's happening:

$ grep -i 'string' hugefile.txt | wc -l
    11000
$ ag -i 'string' hugefile.txt | wc -l
    1500

Up to the point where ag stops, the output of the two commands is identical:

$ grep -m 1500 -i 'string' hugefile.txt > grep_output.txt
$ ag --no-numbers -i 'string' hugefile.txt > ag_output.txt

$ diff grep_output.txt ag_output.txt

(files identical)

Does ag have an implicit maximum input size and if so, is it possible to alter this?

r-gr

Posted 2013-11-02T19:30:21.833

Reputation: 31

Same here. Except my file has only 1.5 GB. With Grep it is slower, but complete :/ – Mailo Světel – 2014-11-19T09:28:23.607

Answers

0

Is it possible that you had multiple hits per line (on average 6 to 7)?

If so, the above method of counting is wrong: piping to wc -l counts the lines that contain matches, not the individual matches. So if the limit of 10,000 matches is already reached after 1,500 matching lines, you get exactly the result above, and it is correct.
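To see the difference between the two counts, here is a minimal sketch on a small throwaway file (sample.txt and its contents are made up for illustration, they are not from the question). It relies on grep's -c option for matching lines and -o to print every match on its own line:

```shell
# Hypothetical sample: 3 lines, 2 of them matching, 5 matches in total
printf 'string string string\nno hit here\nString string\n' > sample.txt

# Number of lines that contain at least one match
lines=$(grep -i -c 'string' sample.txt)

# Number of individual matches: -o prints each match on its own line;
# tr strips the padding some wc implementations add
matches=$(grep -i -o 'string' sample.txt | wc -l | tr -d ' ')

echo "matching lines: $lines"
echo "matches: $matches"
```

Running the same two counts on hugefile.txt should show whether the total number of matches in the first 1,500 matching lines is close to 10,000.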

Additionally the semantics of grep's and ag's -m option seem to differ:

  • For grep it's the maximum number of matching lines: Stop reading a file after NUM matching lines.
  • But for ag it's the maximum count of matches: Skip the rest of a file after NUM matches.

So I suspect you've hit the default maximum limit of matches and need to increase the value passed to the -m option of ag.
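A rough sketch of that check, reusing the file name and pattern from the question. The limit of 100000 is an arbitrary assumption; anything comfortably above the expected number of matches should do. The guard only keeps the snippet from failing where ag or the file is absent:

```shell
# Assumed limit, chosen to be well above the default of 10,000 matches
limit=100000

# Run only if ag is installed and the file from the question exists
if command -v ag >/dev/null 2>&1 && [ -f hugefile.txt ]; then
    # Count matching lines again, this time with the raised match limit
    ag -m "$limit" -i 'string' hugefile.txt | wc -l
fi
```

If the line count now agrees with grep's, the default -m limit was the cause.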

Axel Beckert

Posted 2013-11-02T19:30:21.833

Reputation: 514