Grep regex result not as expected?


Using FreeBSD 11.1:

#!/bin/sh

if printf 'abcde.fgh' | grep -iEq '^[^][$^*_-]'; then
    echo "test 1 success"
else
    echo "test 1 fail"
fi

echo

if printf 'abcde.fgh' | grep -iEq '^[^][.$^*_-]'; then
    echo "test 2 success"
else
    echo "test 2 fail"
fi

Output:

test 1 success

grep: Unmatched [ or [^
test 2 fail

But AFAICT these should give the same result. Both place a condition on the first character (only): that it isn't one of a list of specified non-alphabetic characters. Breakdown of the regex:

  • ^ = start of string
  • [^...] = match if none of these characters
  • Within the list, ] must be the first character, ^ must not be the first, and - must be the last. So ][.$^*_- is a valid list of literal characters, and the first character of the string mustn't match any of them.
  • To avoid confusion note that this means the ][ are literal "]" and "[" chars in the list, not a close-and-reopen of 2 lists.
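A quick way to check those rules is to run the test 1 pattern (the one that works) against one sample string per listed character. This is a minimal sketch; the helper name test1_allows and the sample strings are made up for illustration.

```shell
#!/bin/sh
# Sketch: exercise the working test 1 pattern '^[^][$^*_-]'.
# Inside the list, ']' is literal because it comes first (right after '[^'),
# '^' is literal because it is not first, and '-' is literal because it is last.
test1_allows() {
    # Succeeds if the first character of $1 is NOT one of ] [ $ ^ * _ -
    printf '%s\n' "$1" | grep -iEq '^[^][$^*_-]'
}

for s in 'abcde.fgh' ']x' '[x' '$x' '^x' '*x' '_x' '-x'; do
    if test1_allows "$s"; then
        echo "$s: first char allowed"
    else
        echo "$s: first char excluded"
    fi
done
```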

The only difference between the 2 expressions is the ".", but it's inside a list, so it should be treated as a literal "." rather than the any-character wildcard, and the first character of the string ("a") isn't a literal "." anyway.
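To confirm that a bracketed dot is literal, compare [.] with a bare . in a sketch like this (the matches helper is hypothetical):

```shell
#!/bin/sh
# Sketch: inside a bracket expression '.' is a literal dot, while outside
# brackets it is the "any character" wildcard. In '[.]' the '[' is the
# bracket expression's own opening delimiter, so '.' is simply its first member.
matches() {
    # $1 = ERE pattern, $2 = input string
    printf '%s\n' "$2" | grep -Eq "$1"
}

matches '^[.]' '.fgh'      && echo "'^[.]' matches '.fgh'"
matches '^[.]' 'abcde.fgh' || echo "'^[.]' does not match 'abcde.fgh'"
matches '^.'   'abcde.fgh' && echo "'^.' matches 'abcde.fgh' (wildcard)"
```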

What am I missing? Something very obvious and simple, probably?

Stilez

Posted 2018-08-07T01:06:00.517

Reputation: 1 183

Answers


You are missing a few other syntax rules. Within a bracket expression, in addition to plain characters and ranges, there are also a few types of multi-character expressions which start with a [. (See the regex(7) manual on Linux, or re_format(7) on FreeBSD, at "With the exception of these and some combinations using '[' (see next paragraphs)".) These are:

  • Collating elements, delimited by [. and .]
  • Equivalence classes, delimited by [= and =]
  • Character classes, delimited by [: and :]

(You might have seen or used such expressions as [[:digit:]] – these are actually a character class [:digit:] that happens to be the lone element of a […] bracket expression.)
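For example, a character class can share its bracket expression with ordinary members. In this sketch (the helper name is hypothetical), [[:digit:]_] accepts a leading digit or underscore:

```shell
#!/bin/sh
# Sketch: [:digit:] is the character class; the outer [ ... ] is the bracket
# expression that contains it, here alongside a literal '_'.
starts_with_digit_or_underscore() {
    printf '%s\n' "$1" | grep -Eq '^[[:digit:]_]'
}

for s in '7up' '_tmp' 'abc'; do
    if starts_with_digit_or_underscore "$s"; then
        echo "$s: yes"
    else
        echo "$s: no"
    fi
done
```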

So in your case, since the . happens to be immediately after a [, the pair [. is recognized as the opening delimiter of a collating element, which is never closed. GNU grep 3.1 gives a more descriptive error message:

$ printf 'abcde.fgh' | grep -iEq '^[^][.$^*_-]'
grep: Unmatched [, [^, [:, [., or [=
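It may also help to see why the script printed "test 2 fail": grep exits with a status greater than 1 when an error occurs (POSIX only requires ">1"; GNU and BSD grep use 2), and if treats any non-zero status as false, exactly like a plain "no match" (status 1). A sketch that prints the raw exit status (the check helper is hypothetical):

```shell
#!/bin/sh
# Sketch: print grep's exit status for a pattern applied to 'abcde.fgh'.
# 0 means a line matched, 1 means no match, >1 means grep hit an error.
check() {
    printf 'abcde.fgh\n' | grep -iEq "$1" 2>/dev/null
    echo $?   # status of the last command in the pipeline, i.e. grep
}

echo "test 1 pattern: exit $(check '^[^][$^*_-]')"
echo "test 2 pattern: exit $(check '^[^][.$^*_-]')"
echo "no-match case:  exit $(check '^[^a]')"
```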

The same expressions can also get you out of such situations if there is nowhere to move the character: e.g. the collating element [...] (that is, [. followed by . and .]) or the equivalence class [=.=] includes a literal dot, and similarly [=-=] matches a dash.
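When the characters can be moved, the simplest fix is to reorder the list so that . no longer directly follows the [. A sketch, assuming the same character set as test 2 (the fixed_allows helper is hypothetical):

```shell
#!/bin/sh
# Sketch: same set as test 2 (] [ . $ ^ * _ -), but '.' is placed before '['
# so the sequence "[." never appears inside the bracket expression.
# ']' stays first and '-' stays last, so both remain literal.
fixed_allows() {
    printf '%s\n' "$1" | grep -iEq '^[^].[$^*_-]'
}

for s in 'abcde.fgh' '.fgh' '[fgh' ']fgh' '-fgh'; do
    if fixed_allows "$s"; then
        echo "$s: first char allowed"
    else
        echo "$s: first char excluded"
    fi
done
```

The [=.=] form (e.g. '^[^][[=.=]$^*_-]') should also work where equivalence classes are supported, but some minimal regex implementations may reject them, so reordering is the more portable choice.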

user1686

Posted 2018-08-07T01:06:00.517

Reputation: 283 655

Ahhh. Makes sense. I wasn't looking out for other multicharacter class syntax that it might resemble, so that was way off my horizons. Thanks! – Stilez – 2018-08-07T06:08:31.623