Regular Expression: how to match 'ab','ac' and 'a'

3

I have three strings that I want to match with a single regex: 'a', 'ab', and 'ac'. I assumed 'a[bc]?' would work, but it does not seem to. What is the correct expression?

I tried it with the grep command, and it seems that grep's regex syntax has no '?'. How can I do this matching in grep?

Thanks to @anubhava, I can now match all three strings with:

grep -E 'a[bc]?' <file>

However, this expression also matches 'ad'.

In fact, I want to match all of these: 'a', 'abc', 'ab', and 'ac', but not 'ad' or 'ae'.

Xing Shi

Posted 2013-06-17T06:03:27.143

Reputation: 131

It matches 'ad' because the pattern matches the 'a' inside it; did you try using the word boundaries as anubhava suggested? – None – 2013-06-17T06:26:29.390

@EarthWorm: My suggested command was grep -E "\ba[bc]?\b" file (note extra word boundaries). – anubhava – 2013-06-17T14:01:17.013

Answers

3

To use the expression above in a grep command, enable extended regex support with the -E switch and add word boundaries:

grep -E "\ba[bc]?\b" file

OR

grep -E "\<a[bc]?\>" file
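The effect of the word boundaries can be checked with a small sketch. It assumes GNU grep (where `\b`, `\<` and `\>` are supported) and uses a made-up word list, one word per line, in place of the real file:

```shell
# Hypothetical sample data: one candidate word per line.
words=$(printf '%s\n' a ab ac ad ae abc)

# No boundaries: every line contains an 'a', so all six lines are printed.
unbounded=$(printf '%s\n' "$words" | grep -E 'a[bc]?')

# Word boundaries: the match must be a whole word, so 'ad', 'ae'
# and 'abc' are rejected and only 'a', 'ab', 'ac' remain.
bounded=$(printf '%s\n' "$words" | grep -E '\ba[bc]?\b')

printf 'unbounded:\n%s\n\nbounded:\n%s\n' "$unbounded" "$bounded"
```

Note that with this pattern 'abc' is rejected as well, because no word boundary follows 'ab' or 'a' inside it.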

anubhava

Posted 2013-06-17T06:03:27.143

Reputation: 930

3

To use this expression in basic mode, you need to escape the question mark:

grep 'a[bc]\?' file
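As a rough illustration of why the escape matters, the sketch below runs the same pattern in several modes. The input lines are invented for the example, and `\?` in basic mode is a GNU extension; the interval form `\{0,1\}` is the POSIX basic-regex way to say "zero or one":

```shell
# Hypothetical sample data, including a line with a literal '?'.
input=$(printf '%s\n' a ab 'ab?' xyz)

# Basic mode, unescaped: '?' is an ordinary character, so only 'ab?' matches.
bre_literal=$(printf '%s\n' "$input" | grep 'a[bc]?')

# Basic mode, escaped (GNU extension): '\?' means "zero or one".
bre_escaped=$(printf '%s\n' "$input" | grep 'a[bc]\?')

# Basic mode, POSIX interval expression with the same meaning.
bre_interval=$(printf '%s\n' "$input" | grep 'a[bc]\{0,1\}')

# Extended mode: '?' is a quantifier without any escaping.
ere=$(printf '%s\n' "$input" | grep -E 'a[bc]?')

printf 'literal: %s\n' "$bre_literal"
printf 'escaped:\n%s\n' "$bre_escaped"
```

The last three variants should select the same lines; only the unescaped basic-mode pattern behaves differently.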

Update

To address your latest question, I would advise using Perl mode (-P):

grep -P 'a(?![de])|a[bc]|abc'

These are the alternations:

  1. Match an 'a' if not followed by 'd' or 'e' (it uses negative look-ahead)

  2. Match 'ab' or 'ac'

  3. Match 'abc'
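The alternation can be tried on a made-up word list, as sketched below. This assumes a grep built with PCRE support (the `-P` option of GNU grep). The second variable is an observation rather than part of the answer: since 'ab', 'ac' and 'abc' all start with an 'a' that is not followed by 'd' or 'e', the first alternative alone should select the same lines.

```shell
# Hypothetical sample data: one candidate word per line.
words=$(printf '%s\n' a ab ac abc ad ae)

# Full alternation from the answer: 'ad' and 'ae' are filtered out.
full=$(printf '%s\n' "$words" | grep -P 'a(?![de])|a[bc]|abc')

# First alternative only: the negative look-ahead already covers
# 'ab', 'ac' and 'abc', so the selected lines are the same.
lookahead_only=$(printf '%s\n' "$words" | grep -P 'a(?![de])')

printf '%s\n' "$full"
```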

Jack

Posted 2013-06-17T06:03:27.143

Reputation: 131

2

If you want the string to match the regex from beginning to end, you need to include the start (^) and end ($) markers in the regex:

^a[bc]?$

If the markers are not part of the regex, 123ab456 will also match it.
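A minimal sketch of the difference, using invented input lines. The `-x` option of grep (match whole lines only) is shown as an alternative that should behave like adding `^` and `$` by hand:

```shell
# Hypothetical sample data, including the '123ab456' case.
lines=$(printf '%s\n' a ab ac ad 123ab456)

# Unanchored: every line contains an 'a', so all five lines match.
loose=$(printf '%s\n' "$lines" | grep -E 'a[bc]?')

# Anchored: the whole line must be 'a', 'ab' or 'ac'.
anchored=$(printf '%s\n' "$lines" | grep -E '^a[bc]?$')

# Same result using grep's whole-line option instead of explicit anchors.
whole_line=$(printf '%s\n' "$lines" | grep -x -E 'a[bc]?')

printf 'anchored:\n%s\n' "$anchored"
```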

Ring Ø

Posted 2013-06-17T06:03:27.143

Reputation: 483