grep with ERE doesn't filter out lines with the -v option


I'm trying to use grep's extended regex option (-E) to filter out, from a set of files, the lines that begin with a string of the following format:

any-non-space-char:      *

I had assumed the following command would do the trick; however, it just printed all the lines from the 2 files picked up by the wildcard.


~/tmp > cat * | grep -v -E "^\S+:.{6}\*"
hi
test1      blah, blah, blah:      * blah, blah, blah"
test:      * blah, blah, blah:      * blah, blah, blah
sd
hi
temp:      * blah, blah, blah:      * blah, blah, blah"
temp2:     blah, blah, blah:      * blah, blah, blah
sd
~/tmp >

BTW, I alias grep to 'grep --color=auto', so the command does correctly highlight the strings the regex matches, namely 'test:      *' on line 3 and 'temp:      *' on line 6 of the output above. Nonetheless, these matching lines still get printed, which I didn't expect.
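A quick way to check whether the \S in the pattern is understood at all: \S is a GNU extension rather than part of POSIX ERE, so an older or non-GNU grep may not read it as "non-whitespace". The POSIX spelling of the same idea is the bracket expression [^[:space:]]. A minimal sketch, assuming a POSIX shell, that counts matches for one of the sample lines with both spellings:

```shell
#!/bin/sh
# One of the lines from the question: 'test', a colon, six spaces, then '*'.
line='test:      * blah, blah, blah:      * blah, blah, blah'

# Count matches using the GNU shorthand \S ...
gnu=$(printf '%s\n' "$line" | grep -c -E '^\S+:.{6}\*')

# ... and using the POSIX bracket expression for "not whitespace".
posix=$(printf '%s\n' "$line" | grep -c -E '^[^[:space:]]+:.{6}\*')

# If the two counts differ, this grep does not read \S as "non-whitespace".
printf 'with \\S: %s, with [^[:space:]]: %s\n' "$gnu" "$posix"
```

On a current GNU grep both counts should be 1; if the \S count is 0, switching to [^[:space:]] removes the dependency on the extension.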

The contents of the two files:


~/tmp > ls -l
total 8
-rw-rw-r-- 1 pmn ccusers 116 Dec 11 09:22 1
-rw-rw-r-- 1 pmn ccusers 116 Dec 11 09:23 2
~/tmp >

~/tmp > cat 1
hi
test1      blah, blah, blah:      * blah, blah, blah"
test:      * blah, blah, blah:      * blah, blah, blah
sd
~/tmp >

~/tmp > cat 2
hi
temp:      * blah, blah, blah:      * blah, blah, blah"
temp2:     blah, blah, blah:      * blah, blah, blah
sd
~/tmp >

BTW, the following command comes close to what I expect:


~/tmp > cat * | grep -v -E ":.{6}*"
hi
sd
hi
sd
~/tmp >

It removed the lines


test1      blah, blah, blah:      * blah, blah, blah"
test:      * blah, blah, blah:      * blah, blah, blah
temp:      * blah, blah, blah:      * blah, blah, blah"
temp2:     blah, blah, blah:      * blah, blah, blah

However, it also removed lines 1 and 4 above, which is not what I want, so this grep command won't work for me.
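Why ':.{6}*' also catches lines 1 and 4: the trailing * applies to the whole .{6}, so the pattern means "a colon followed by zero or more runs of six characters", which reduces to "contains a colon". POSIX leaves a * directly after an interval undefined in EREs; the sketch below assumes GNU grep, which treats it as a repetition of the interval.

```shell
#!/bin/sh
# Two lines without a colon around one that has a colon but no '      *' after it.
input='hi
a:b
sd'

# GNU grep reads ':.{6}*' as ':' then zero or more runs of six characters,
# so every line containing a colon matches and -v drops it.
kept=$(printf '%s\n' "$input" | grep -v -E ':.{6}*')
printf '%s\n' "$kept"
```

Only hi and sd survive, even though 'a:b' looks nothing like the intended prefix.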

I know how to get this to work in Perl; however, for certain reasons I can only use grep, awk or sed.
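For the record, a minimal sed/awk sketch of the same filter, assuming the prefix is one or more non-whitespace characters, a colon, exactly six spaces and an asterisk. The six spaces are written out literally and [*] is used for the asterisk, so no interval expression like {6} is needed. It also assumes an awk that supports POSIX character classes such as [[:space:]]; the input is the two sample files concatenated, as with cat *.

```shell
#!/bin/sh
# The two sample files from the question, concatenated as 'cat *' would.
input=$(printf '%s\n' 'hi' \
  'test1      blah, blah, blah:      * blah, blah, blah"' \
  'test:      * blah, blah, blah:      * blah, blah, blah' 'sd' 'hi' \
  'temp:      * blah, blah, blah:      * blah, blah, blah"' \
  'temp2:     blah, blah, blah:      * blah, blah, blah' 'sd')

# sed (BRE): delete lines starting with non-blanks, ':', six spaces, '*'.
sed_out=$(printf '%s\n' "$input" | sed '/^[^[:space:]][^[:space:]]*:      [*]/d')

# awk (ERE): print only the lines that do not start with that prefix.
awk_out=$(printf '%s\n' "$input" | awk '!/^[^[:space:]]+:      [*]/')

printf '%s\n' "$sed_out"
```

Both commands keep hi, the test1 line, sd, hi, the temp2 line and sd.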

How do I get this to work?


@PsychoData

Thanks for your response. I'm afraid the command did not do the trick. Your command returned the following:

~/tmp > cat * | grep -v -E "^[^\S]+:.{6}\*"
hi
sd
hi
sd
~/tmp >

which is the same output that grep -v -E ":.{6}*" returned in my question, and not what I wanted. I wanted a command that produces the following output:

hi
test1      blah, blah, blah:      * blah, blah, blah"
sd
hi
temp2:     blah, blah, blah:      * blah, blah, blah
sd

IMHO, yours removed the following lines because ^[^\S]+: does a greedy match, matching as much of the line as possible, which, as you can see, means up to the right-most '*' in each of them:

test1      blah, blah, blah:      * blah, blah, blah"
test:      * blah, blah, blah:      * blah, blah, blah
temp:      * blah, blah, blah:      * blah, blah, blah"
temp2:     blah, blah, blah:      * blah, blah, blah

BTW, please note that there are exactly 6 spaces between each ':' and '*' pair; I think the formatting makes this hard to notice.
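Given exactly six spaces, a plain ERE should be enough, with no non-greedy matching required: anchor at the start, take one or more non-whitespace characters with the POSIX class [^[:space:]], then the colon, six whitespace characters and an escaped asterisk. A minimal sketch that recreates the two sample files in a scratch directory (mktemp -d is assumed to be available):

```shell
#!/bin/sh
# Recreate the two sample files in a scratch directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 'hi' \
  'test1      blah, blah, blah:      * blah, blah, blah"' \
  'test:      * blah, blah, blah:      * blah, blah, blah' 'sd' > 1
printf '%s\n' 'hi' \
  'temp:      * blah, blah, blah:      * blah, blah, blah"' \
  'temp2:     blah, blah, blah:      * blah, blah, blah' 'sd' > 2

# -h: no file-name prefixes, -v: keep non-matching lines, -E: extended regex.
# [^[:space:]]+ cannot run past the first blank, so 'test1      blah, ...'
# is not caught by the ':      *' later in that line.
out=$(grep -h -v -E '^[^[:space:]]+:[[:space:]]{6}\*' 1 2)
printf '%s\n' "$out"
```

This should print hi, the test1 line, sd, hi, the temp2 line and sd, i.e. the output asked for above.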

pmn

Posted 2013-12-10T23:10:19.000

Reputation: 21

Answers


Try grep -v -E "^[^\S]+:.{6}\*"

Okay. What this does is enable extended regular expressions and tell grep that I want every line that does not contain the following pattern:

match the start of a line, then [anything EXCEPT whitespace] at least once, then a colon, then 6 characters, then an asterisk

Anything that does not match that pattern will be shown.
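A caveat about the [^\S] part: inside a POSIX bracket expression the backslash is an ordinary character, so with grep -E (GNU or another POSIX-conforming grep) [^\S] means "any character except \ and S", not "any non-whitespace character". The prefix can therefore run across spaces, which would explain why the test1 and temp2 lines get dropped too, greedy or not. A small check:

```shell
#!/bin/sh
pat='^[^\S]+:.{6}\*'

# Prefix contains a space: still matches, because [^\S] only excludes '\' and 'S'.
with_space=$(printf '%s\n' 'a b:      * x' | grep -c -E "$pat")

# Prefix is the letter S: no match, because S is one of the excluded characters.
letter_s=$(printf '%s\n' 'S:      * x' | grep -c -E "$pat")

# The intended meaning, "not whitespace", is spelled [^[:space:]] in POSIX.
posix=$(printf '%s\n' 'a b:      * x' | grep -c -E '^[^[:space:]]+:.{6}\*')

printf '%s %s %s\n' "$with_space" "$letter_s" "$posix"
```

The counts come out as 1, 0 and 0 respectively.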

PsychoData

Posted 2013-12-10T23:10:19.000

Reputation: 1 331

Thanks for your reply. The command you gave me did not do the trick. Please see my reply below for more details. – pmn – 2013-12-17T00:44:44.560


There is no way of doing a non-greedy match in extended regular expressions. You can, however, easily do it with PCREs:

$ grep -hvP "^[^\s]+?:\s+\*" *
hi
test1      blah, blah, blah:      * blah, blah, blah"
sd
hi
temp2:     blah, blah, blah:      * blah, blah, blah
sd

You don't need to cat the files; grep can open them directly. The -h option turns off printing of file names (necessary when grep reads the files itself instead of going through cat), and -P turns on PCREs. The pattern then matches one or more non-whitespace characters at the beginning of the line, as few as possible (^[^\s]+?), followed by a :, one or more whitespace characters (\s+) and finally a * (you need to escape the *, otherwise it is treated as a quantifier).
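A minimal sketch of the PCRE approach, assuming a grep built with -P support (GNU grep usually is; the BSD grep shipped with macOS is not). It also compares a greedy variant: since [^\s] (like \S) cannot match a space, the lazy +? does not seem to change which lines are kept here.

```shell
#!/bin/sh
# Recreate the two sample files in a scratch directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 'hi' \
  'test1      blah, blah, blah:      * blah, blah, blah"' \
  'test:      * blah, blah, blah:      * blah, blah, blah' 'sd' > 1
printf '%s\n' 'hi' \
  'temp:      * blah, blah, blah:      * blah, blah, blah"' \
  'temp2:     blah, blah, blah:      * blah, blah, blah' 'sd' > 2

# The pattern from this answer: lazy non-whitespace prefix, ':', whitespace, '*'.
lazy=$(grep -hvP '^[^\s]+?:\s+\*' *)

# Greedy variant: \S can never match a space, so greediness changes nothing here.
greedy=$(grep -hvP '^\S+:\s+\*' *)

printf '%s\n' "$lazy"
```

Both variants should print the same six lines as the output shown above.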

terdon

Posted 2013-12-10T23:10:19.000

Reputation: 45 216