Regex match failing in GNU grep on Windows Server


Here's the head of my file:

id,date,section,identifier,action,level,user_id,week,month,seconds_since_start
9464384,334600,12,,complete,4,1124822691805,1,1,1
9464413,334626,12,,complete,4,1124822691805,1,1,1
9464430,334659,12,,complete,4,1124822691805,1,1,1
9464470,334692,12,,complete,4,1124822691805,1,1,1
9464560,334772,12,,complete,4,1124822691805,1,1,1
9464756,335003,12,,complete,4,1124822691805,1,1,-1

I am having trouble using grep with simple regular expressions from mintty (as installed with Git) on Windows Server 2016.

I want to know whether any lines end with a negative number, so the command I hoped would work was:

grep '-[0-9]+$' file.csv

That finds no matches. Even this simpler regex finds no matches:

grep '1$' file.csv 

I have tried replacing single quotes with double quotes (following here), and replacing grep with egrep, but neither change makes any difference.

Am I missing something obvious?

dumbledad

Posted 2019-04-16T14:55:41.950

Reputation: 939

These PowerShell commands do work: Select-String -Path .\file.csv -Pattern '1$' and Select-String -Path .\file.csv -Pattern '-[0-9]+$' – dumbledad – 2019-04-16T15:46:32.670

Answers


Problem 1: Your pattern begins with a hyphen, which makes it ambiguous to grep: is it a command-line option, or is it a regexp? The -e option tells grep that the next argument is the regexp to use. Similarly, -- tells grep that no more command-line options follow, so the next argument is taken as the regexp.

I don't have access to a Windows machine, but FreeBSD's grep reports an error on your first example:

$ grep '-[0-9]+$' file.csv
grep: invalid option -- [
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.

The shell strips the quotes before grep runs, so grep sees an argument that begins with a hyphen and parses it as command-line options. It reports an error because it has no [ option.
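As a minimal sketch of the two workarounds for a pattern that starts with a hyphen, the snippet below recreates two rows of the sample data in a scratch file (the name sample.csv is just an assumption for illustration, and Unix line endings are assumed) and runs grep once with -e and once with --. The pattern is written as [0-9][0-9]* so that it means "one or more digits" in basic regexp syntax.

```shell
# Recreate two rows from the question's sample data (sample.csv is a hypothetical name).
printf '%s\n' \
  '9464560,334772,12,,complete,4,1124822691805,1,1,1' \
  '9464756,335003,12,,complete,4,1124822691805,1,1,-1' > sample.csv

# -e marks the next argument as the pattern, even though it starts with a hyphen.
with_e=$(grep -e '-[0-9][0-9]*$' sample.csv)

# -- ends option parsing, so the next argument is taken as the pattern.
with_dashes=$(grep -- '-[0-9][0-9]*$' sample.csv)

echo "$with_e"
echo "$with_dashes"
```

Both commands should print only the row that ends in -1.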

Your second command succeeds on BSD:

$ grep '1$' file.csv
9464384,334600,12,,complete,4,1124822691805,1,1,1
9464413,334626,12,,complete,4,1124822691805,1,1,1
9464430,334659,12,,complete,4,1124822691805,1,1,1
9464470,334692,12,,complete,4,1124822691805,1,1,1
9464560,334772,12,,complete,4,1124822691805,1,1,1
9464756,335003,12,,complete,4,1124822691805,1,1,-1

For these two reasons, I would submit that your grep implementation may be broken.

Problem 2: Your regexp uses Extended regexp syntax, and grep defaults to Basic regexp syntax. The FreeBSD man page for egrep describes a key difference between Basic and Extended regexps:

In basic regular expressions the metacharacters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

You need to use either the -E option to get grep to parse it as an extended regexp, or else escape the + with a backslash:

$ grep -- '-[0-9]+$' file.csv
$ grep -E -- '-[0-9]+$' file.csv
9464756,335003,12,,complete,4,1124822691805,1,1,-1
$ grep -- '-[0-9]\+$' file.csv
9464756,335003,12,,complete,4,1124822691805,1,1,-1
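To see why the first command above prints nothing, here is a small sketch that feeds grep two made-up lines, -1 and -1+, instead of the CSV file. In basic syntax the unescaped + is an ordinary character, so only the line containing a literal plus sign matches; with -E the + becomes a repetition operator.

```shell
# Basic regexp: the unescaped + is a literal plus sign.
bre_literal=$(printf '%s\n' '-1' '-1+' | grep -- '-[0-9]+$')

# Extended regexp (-E): + means "one or more of the preceding item".
ere_repeat=$(printf '%s\n' '-1' '-1+' | grep -E -- '-[0-9]+$')

echo "BRE matched: $bre_literal"   # prints "BRE matched: -1+"
echo "ERE matched: $ere_repeat"    # prints "ERE matched: -1"
```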

Jim L.

Posted 2019-04-16T14:55:41.950

Reputation: 669