Is this a bug in grep -P? (I seem to be getting too many matches)


Here is the file I am testing with: http://www.zen76171.zen.co.uk/blahsomefile1

It's a text file, about 1.18 MB in size.

Here is how many matching lines I get.

With -P

C:\blah>grep -P "[^J]*J" blahsomefile1 | wc -l
72383

Without -P

C:\blah>grep "[^J]*J" blahsomefile1 | wc -l
51814

For this pattern there shouldn't be any difference between running with -P and without it, but there is. With -P, grep matches too many lines.

As a test, I pipe grep's output back through the same grep. Both counts should be equal, because the second grep only sees lines that already matched the pattern. That holds without -P.

Without -P, nothing odd happens:

C:\blah>grep "[^J]*J" blahsomefile1 | wc -l
51814

C:\blah>grep "[^J]*J" blahsomefile1 | grep "[^J]*J" | wc -l
51814

With -P, the two counts differ, which should not happen:

C:\blah>grep -P "[^J]*J" blahsomefile1 | wc -l
72383


C:\blah>grep -P "[^J]*J" blahsomefile1 | grep -P "[^J]*J" | wc -l
72229
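The reasoning behind this test is that filtering is idempotent: once only matching lines remain, running the same filter again cannot drop any of them. Here is a minimal sketch of that property in Python, using the standard `re` module rather than grep. The sample lines and the helper name `matching_lines` are made up for illustration and are not taken from blahsomefile1.

```python
import re

# Same pattern as in the grep commands: any run of non-J characters, then a J.
PATTERN = re.compile(r"[^J]*J")

def matching_lines(lines):
    """Keep the lines in which the pattern matches somewhere, as grep does."""
    return [line for line in lines if PATTERN.search(line)]

# Hypothetical sample input for illustration only.
sample = ["txxxJbmmabcraabc", "txxxraabcAA", "J", "", "abcJ"]

once = matching_lines(sample)
twice = matching_lines(once)

# A correct matcher gives the same count both times.
print(len(once), len(twice))  # 3 3
```

If a regex engine returns a smaller count on the second pass, as grep -P does above, the first pass must have let through lines that do not match.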

If I do grep -P "[^J]*J" blahsomefile1 | more

I see it is matching things it shouldn't match, like the line that reads txxxraabcAA which contains no J.

txxxJbmmabcraabc
txxxraabcAA
txxxJxmmabcHaabc
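Since `[^J]*J` can only match a line that contains a J, any output line without one is a false positive. A rough Python sketch for picking such lines out of saved output follows. The function name `false_positives` is hypothetical, and the input is just the three lines shown above.

```python
def false_positives(output_lines):
    """Return output lines that cannot match [^J]*J because they contain no J."""
    return [line for line in output_lines if "J" not in line]

# The three consecutive lines from the grep -P output shown above.
shown = ["txxxJbmmabcraabc", "txxxraabcAA", "txxxJxmmabcHaabc"]

print(false_positives(shown))  # ['txxxraabcAA']
```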

The computer is running GnuWin32 grep:

C:\blah>where grep
C:\Program Files (x86)\GnuWin32\bin\grep.exe

The grep version is 2.5.4:

C:\blah>"C:\Program Files (x86)\GnuWin32\bin\grep.exe" -V
GNU grep 2.5.4

Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.


C:\blah>

UPDATE

Interestingly, Cygwin's grep is a much later version and doesn't have the bug:

C:\blah\aeea2\a\a\a\a>c:\cygwin\bin\grep -P "[^J]*J" blahsomefile1 | wc -l
51814

C:\blah>c:\cygwin\bin\grep -V
/usr/bin/grep (GNU grep) 2.21
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

C:\blah>

GnuWin32 grep is still on 2.5.4, a very old version.

Note: the file is also available from wetransfer, and is downloadable from ge.tt with e.g. Firefox.

barlop

Posted 2016-10-17T00:51:33.050

Reputation: 18 677


I cannot reproduce this problem on Linux using the current version of GNU grep, 2.26. You may want to see if you can update your software: version 2.5.4 dates to 2009.

– John1024 – 2016-10-17T02:43:33.683

Answers


This may be a bug in grep 2.5.4.

If you have a choice between GnuWin32 grep and Cygwin grep, Cygwin grep is far more up to date.

-V shows the version and copyright year. As of writing:

GnuWin32 grep is version 2.5.4, from 2009. That is years behind.

Cygwin grep is version 2.21, from 2014. (2.21 is later than 2.5.4 because a version number is not a single decimal number; it is compared component by component, and 21 is greater than 5.)
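To make the version comparison concrete, here is a small Python sketch that compares dotted version strings component by component. The helper name `version_key` is made up for illustration, and it assumes every component is a plain integer.

```python
def version_key(version):
    """Split a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))

gnuwin32 = "2.5.4"
cygwin = "2.21"

print(version_key(gnuwin32))  # (2, 5, 4)
print(version_key(cygwin))    # (2, 21)

# Tuples compare element by element: 2 == 2, then 21 > 5.
print(version_key(cygwin) > version_key(gnuwin32))  # True

# Reading the leading "major.minor" part as a decimal gets the order wrong.
print(float("2.21") > float("2.5"))  # False
```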

C:\blah>c:\cygwin\bin\grep -P "[^J]*J" blahsomefile1 | wc -l
51814

C:\blah>c:\cygwin\bin\grep -P "[^J]*J" blahsomefile1 | c:\cygwin\bin\grep -P "[^J]*J" | wc -l
51814

The 2014 Cygwin grep shows no discrepancy there: both counts agree.

This is not the first time I've run into a bug in the GnuWin32 version of grep where the much later Cygwin version was fine. GnuWin32 seems to be very out of date compared to the alternatives.

barlop

Posted 2016-10-17T00:51:33.050

Reputation: 18 677