Shortest unmatchable regular expression

60

3

Your mission is to write the shortest valid regular expression that no string can match, empty string included.

Submissions must have this form ("literal notation"):

/pattern/optional-flags

Shortest regexp wins. The regexp size is counted in characters. (including slashes and flags)

Please explain how your regexp works (if it's not trivial)

Thanks, and have fun!

xem

Posted 2014-01-13T17:19:26.717

Reputation: 5 523

This inspired a question from me. I'm going to wait a few days though. Don't want 2 regex questions active at the same time – Cruncher – 2014-01-13T21:31:59.417

13

"Valid" according to which implementation? I've just found an amusing one that Perl is okay with (and that is valid according to the only RE grammar I can find, but that grep and Python's re module refuse.

– jscs – 2014-01-13T22:24:13.463

1Yes, which dialect(s) of regex? There are many many different ones. – hippietrail – 2014-01-14T04:53:22.680

1

But what about Presidents' names? http://xkcd.com/1313/

– Carl Witthoft – 2014-01-14T14:05:59.143

@CarlWitthoft You need to be a program to participate in that contest: http://codegolf.stackexchange.com/q/17718/2180

– boothby – 2014-01-14T16:18:30.293

@boothby I might be an AI :-) – Carl Witthoft – 2014-01-14T16:32:49.800

Which syntax? Perl or POSIX? – Braden Best – 2014-02-05T18:37:48.237

Answers

55

6 chars

Following on the answers of primo and Peter Taylor, and a hint from man perlre:

/(?!)/

This perl-compatible regex matches an empty string which is not followed by another empty string.

Nate Eldredge

Posted 2014-01-13T17:19:26.717

Reputation: 2 544

+1 - This is probably the shortest answer which is widely portable (along with /x\by/, but if I ever actually had to use a regex like this - for whatever reason - then this answer is also the clearest one) – Martin Ender – 2014-01-14T17:06:38.747

@m.buettner: Thanks. primo's /(*FAIL)/ is probably clearer, though. (And actually man perlre gave it away by mentioning that mine actually expands to his internally.) – Nate Eldredge – 2014-01-14T17:17:55.570

/(*FAIL)/ is not as portable though. And even in Perl, I think it's a more obscure feature than a negative lookahead. – Martin Ender – 2014-01-14T17:20:25.113

@m.buettner: Oh, I assumed both were Perl-specific. – Nate Eldredge – 2014-01-14T17:22:28.577

3You get lookarounds in almost all of the popular (Perl-inspired) flavours today, whereas I've never seen these control verbs anywhere but in Perl. – Martin Ender – 2014-01-14T17:23:35.270

1In fact, Perl documentation (and -Mre=debug) says that (?!) is optimized into (*FAIL) by Perl regex optimizer (OPFAIL according to -Mre=debug). Also, I don't think I saw (*FAIL) outside of Perl 5 (and Perl 6, where it's called <!>). – Konrad Borowski – 2014-01-14T20:07:46.703

40

8 chars

/(?=a)b/

We require a string containing a character which is both a and b, which is obviously impossible.

Peter Taylor

Posted 2014-01-13T17:19:26.717

Reputation: 41 901

19/(?!x)x/ looks even more impossible ;-) – Howard – 2014-01-13T17:58:46.863

@PeterTaylor where? – o0'. – 2014-01-14T12:09:54.723

@Lohoris, where what? – Peter Taylor – 2014-01-14T12:12:01.623

@PeterTaylor where did he put those absurd rules you talk about, I couldn't find them. – o0'. – 2014-01-14T12:13:38.833

7guys, sorry for the counting i chose, i thought it would be simpler to include slashes because of the optional flags that could come after them. – xem – 2014-01-14T17:03:29.473

33

5 chars

Unlike everybody who abuses $ and ^... this actually works in Perl:

/V\A/

\A matches the beginning of the string.

boothby

Posted 2014-01-13T17:19:26.717

Reputation: 9 038

It works with ^ too. – Tomas – 2014-02-02T14:30:32.010

30

6 chars

/x\by/

Based on Sven Hohenstein's answer.

Mr. Neutron

Posted 2014-01-13T17:19:26.717

Reputation: 371

28

8 characters

/\w\b\w/

A word boundary (\b) surrounded by 'word' characters (\w - one of [_a-zA-Z0-9]). It is unmatchable since one of the characters preceding or following a word boundary must be a non-'word' character.

By the way: this is similar to the unmatchable expression

/\W\b\W/

where \W means non-'word' character.

Sven Hohenstein

Posted 2014-01-13T17:19:26.717

Reputation: 2 464

This is 8 characters according to the rules of the competition, because the wrapping slashes / count. See OP's entry, for example. It's a great entry, though!

– jscs – 2014-01-14T01:41:40.160

It also might be a winner (or tied with Peter Taylor's entry), given the implementation-dependent problems with some of the shorter entries!

– jscs – 2014-01-14T02:03:25.203

Very elegant! I thought there must be something like this! – Tomas – 2014-02-02T14:31:11.687

21

4 chars

/$a/

searches a "a" after the end of the string.

or

/a^/

searches a before the beginning of the string.

xem

Posted 2014-01-13T17:19:26.717

Reputation: 5 523

Similarily to the other answers with this theme, this is wrong because ^ and $ are only special at the beginning and end, respectively, of the pattern; outside that, they are regular characters. – mirabilos – 2014-12-30T15:42:12.590

Isn't it only two chars? I would only count $a = 2 chars. – user14325 – 2014-01-13T17:40:44.360

21Why post the question if you know that there's a two-char solution? – Peter Taylor – 2014-01-13T17:41:26.983

I didn't know, I found it afterwards. – xem – 2014-01-13T17:42:47.713

3

@Howard: That matches an empty string: http://jsfiddle.net/RjLxJ/

– ProgramFOX – 2014-01-13T18:01:57.690

10Why do I always find these problems after an unbeatable solution is provided :( – Cruncher – 2014-01-13T21:23:19.880

47-1: Putting ^ and $ in "illegal" positions just causes them to be treated as ordinary characters. Your first example matches the literal $a in sed and probably other programs. – Ben Jackson – 2014-01-13T23:02:19.093

2

@Ben Jackson, that's not true for POSIX EREs. Try echo 'a^b' | grep 'a^b' vs. echo 'a^b' | grep -E 'a^b'. Check out 9.4.9 ERE Expression Anchoring

– laindir – 2014-01-14T16:05:05.403

1@BenJackson that's not true for ^ in PERL! perl -e'print "a^"=~/a^/ prints nothing. – Tomas – 2014-02-02T21:01:08.527

21

5 characters

/$.^/

/$^/ will match an empty string, whereas requiring a character in between will not.

Brian Glaz

Posted 2014-01-13T17:19:26.717

Reputation: 327

Similarily to the other answers with this theme, this is wrong because ^ and $ are only special at the beginning and end, respectively, of the pattern; outside that, they are regular characters. – mirabilos – 2014-12-30T15:42:52.113

6

This unfortunately matches "$a^" (or anything in place of the 'a') in Perl (and maybe sed). Still a nice one, though!

– jscs – 2014-01-14T02:00:44.353

@JoshCaswell: I guess perl might interpret $. as the current line number variable. Which might be empty, in which case this will be /^/. – MvG – 2014-01-14T08:44:09.770

A character 'between' just means a one-character string. – jwg – 2014-01-14T12:38:32.617

3@jwg notice the swapped ^ and $ – mniip – 2014-01-14T14:10:49.840

I tried the pattern '$^' with grep, but unfortunately it matched the string '$^'. Smartass grep. – joeytwiddle – 2014-01-15T15:07:14.760

20

9 chars

I'm not sure but /[^\S\s]/ should be unmatchable since it means not any character, but at least one of them.

user14325

Posted 2014-01-13T17:19:26.717

Reputation: 349

You don't need the +. – Peter Taylor – 2014-01-13T17:37:51.993

10/[^\S\s]/ = 9 chars – xem – 2014-01-13T17:38:41.663

19

4 characters

(ECMAScript flavour only)

/[]/

In other flavours this is not a valid character class (the ] would be considered a character in the class, so the expression isn't valid, because the class is never closed), but the ECMAScript standard accepts empty character classes. Since it is a class it has to match a character (so empty strings don't match), but since not a single character is included no actual character will match either.

Martin Ender

Posted 2014-01-13T17:19:26.717

Reputation: 184 808

Wouldn't this match an empty string even though you says it has to match a character? Or do you think this is illegal: /[]{0}/. (Ps. though my own answer partially looks like yours, I actually read yours after writing mine.) – nl-x – 2014-01-14T11:46:14.850

@nl-x paste this into your browser's console: /[]/.test(""). it returns false. a character class can never match an empty string, even if it doesn't contain characters (I imagine they are implemented like "IF the next character in the string is one of those listed, match; ELSE fail"). /[]{0}/ is legal (in ECMAScript) and does match the empty string... however, I'm not sure how that is relevant to my answer. – Martin Ender – 2014-01-14T16:12:08.623

Fails in Ruby 2.0 – Nakilon – 2014-01-14T19:01:43.417

1@Nakilon of course it does. Ruby doesn't implement the ECMAScript flavour. – Martin Ender – 2014-01-14T19:45:24.510

19

6 characters

/\b\B/

This matches a word boundary (\b) that isn't a word boundary (\B), which is obviously impossible.

The Guy with The Hat

Posted 2014-01-13T17:19:26.717

Reputation: 778

doesn't this one search for a word-boundary followed by a non-word-boundary? – grexter89 – 2014-01-14T14:16:35.313

1@grexter89 Yes, but they can't have any characters in between. i.e. The boundary and the non-boundary have to occupy the same space. – The Guy with The Hat – 2014-01-14T14:17:39.823

2I like this one. Good catch. – primo – 2014-01-14T14:33:23.983

15

6 chars

/b++b/

Possessive quantifier looks for as many b's as possible, then 1 more. 6 chars but points for symmetry?

VBCPP

Posted 2014-01-13T17:19:26.717

Reputation: 451

Huh... I just learned a new feature. Apparently, my regex skills are badly out of date. Thanks, and +1. – Ilmari Karonen – 2014-02-02T12:45:07.570

9

6 characters

/(\1)/

Not a winner, but I thought it was fun. grep and Python both barf on this one, but Perl seems okay with it.

Seems to be very implementation-dependent (which is hardly surprising, given its weirdness). Bob reports below that it matches anything in JavaScript's regex engine.

jscs

Posted 2014-01-13T17:19:26.717

Reputation: 900

.NET's regex engine seems to accept it. – Bob – 2014-01-14T01:45:03.357

And it always matches (an empty string) no matter what input on JS – Bob – 2014-01-14T01:54:00.273

8

Maybe a bit of cheating, but…

\0

… is unmatchable in POSIX regex in virtually all, if not all, implementations. BASIC RE and EXTENDED RE, even.

And POSIX RE does not need those pesky slashes and flags PCRE has.

mirabilos

Posted 2014-01-13T17:19:26.717

Reputation: 422

+1 Good!! Unfortunatelly, sole 0 doesn't work in PERL. "0"=~0 is true... – Tomas – 2014-02-02T14:27:44.233

sole \0 ITYM? Yes, most perlre(1) and PCRE implementations do not use C strings but size-bounded buffers, in whom this trick will not work, but most POSIX RE implementations work on C strings.

– mirabilos – 2014-02-03T13:16:41.170

5

5 chars

/^.^/

Matches string that begin with any single character before string begin.

P̲̳x͓L̳

Posted 2014-01-13T17:19:26.717

Reputation: 213

7Also matches the string ".^" – boothby – 2014-01-14T04:57:40.473

@boothby: in which language does matches? in Python doesn't. re.findall(r'^.^', '.^', re.DEBUG) – P̲̳x͓L̳ – 2014-01-14T08:39:47.010

8

+1 for using the manga operator (see http://stackoverflow.com/questions/3618340/what-does-the-caret-operator-do)

– prototype – 2014-01-14T19:11:03.243

@boothby ^ and . are metacharacters not literal, that need to be escaped – P̲̳x͓L̳ – 2014-01-14T19:11:07.123

2It's broken in Perl. This question really should have set some ground rules about language. – boothby – 2014-01-14T19:28:52.313

1Actually, “^” is only special if at the beginning of the string. Any “^” after anything else does not need to be escaped, so this answer is wrong. – mirabilos – 2014-10-20T12:54:36.800

5

6 bytes

/(*F)/

An abbreviation for (*FAIL), supported by perl-compatable regex engines. Thanks to @HamZa for pointing this out.

9 bytes

/(*FAIL)/

Should work with any regex engine that supports verbs at all. I'm not convinced this really needs to be golfed any further.

primo

Posted 2014-01-13T17:19:26.717

Reputation: 30 891

1How does this work? – boothby – 2014-01-14T05:49:23.140

@boothby (*FAIL) is a verb that always fails. – primo – 2014-01-14T06:05:58.837

@primo you might just use /(*F)/ :) – HamZa – 2014-01-14T19:01:52.443

5

4 char:

/.^/

Works with GNU grep 2.5.1 and egrep.

RSFalcon7

Posted 2014-01-13T17:19:26.717

Reputation: 288

No, for the same reason as the one below: Actually, “^” is only special if at the beginning of the pattern. Any “^” after anything else does not need to be escaped, so this answer is wrong. – mirabilos – 2014-12-30T15:38:57.237

/.^/ = 4 chars. – Alexey Popkov – 2014-01-14T19:41:51.557

Why do you need the //? those are not required everywhere ;-) – RSFalcon7 – 2014-01-14T19:44:18.283

The wrapping slashes / count, see the original question ("including slashes and flags") and the OP's entry.

– Alexey Popkov – 2014-01-14T19:48:16.520

right! I miss read :( – RSFalcon7 – 2014-01-14T19:49:39.790

4

Perl 6 (5 characters)

/<!>/

Sorta rule abuse (because Perl 6 regexes are different, and incompatible with stardard regexes by design), but I don't care. <!> rule informs Perl 6 that the regex doesn't match.

Konrad Borowski

Posted 2014-01-13T17:19:26.717

Reputation: 11 185

4

4 chars

/$./

Needs any character after the string ends

c0de Freak

Posted 2014-01-13T17:19:26.717

Reputation: 55

Similarily to the other two ones, $ is only special at the end of the pattern. – mirabilos – 2014-12-30T15:39:28.443

3

4 chars with slashes 2 without

In the TXR language's regex engine, an empty character class [] matches no character, and therefore no string. It behaves this way because the character class requires a character match, and when it is empty it specifies that no character can satisfy it.

Another way is to invert the "set of all strings including empty" regex /.*/ using the complement operator: /~.*/. The complement of that set contains no strings at all, and so cannot match anything.

This is all documented in the man page:

   nomatch
          The  nomatch  regular  expression  represents  the empty set: it
          matches no strings at all, not even the empty string.  There  is
          no  dedicated  syntax  to  directly express nomatch in the regex
          language.  However, the empty character class []  is  equivalent
          to nomatch, and may be considered to be a notation for it. Other
          representations of nomatch are possible: for instance, the regex
          ~.* which is the complement of the regex that denotes the set of
          all possible strings, and thus denotes the empty set. A  nomatch
          has  uses;  for instance, it can be used to temporarily "comment
          out" regular expressions. The regex ([]abc|xyz) is equivalent to
          (xyz), since the []abc branch cannot match anything. Using [] to
          "block" a subexpression allows you to leave it  in  place,  then
          enable it later by removing the "block".

The slashes are not part of the regex syntax per se; they are just punctuation which delimits regexes in the S-expression notation. Witness:

# match line of input with x variable, and then parse that as a regex
#
$ txr -c '@x
@(do (print (regex-parse x)) (put-char #\newline))' -
ab.*c                               <- input from tty: no slashes.
(compound #\a #\b (0+ wild) #\c)    <- output: AST of regex

Kaz

Posted 2014-01-13T17:19:26.717

Reputation: 372

thanks for your answer and sorry again for the slash-counting. I thought it would be easier to include them if people used flags. – xem – 2014-01-15T07:18:11.467

1

This is a 5 char regex.

/[]+/

It matches an empty group 1 or more times.

EDIT:

Removed my answer for other flavours:

/.{-1}/

Anything that is not a number inside {} will match the text.

This one will match ".{-1}"

Ismael Miguel

Posted 2014-01-13T17:19:26.717

Reputation: 6 797

Note that this only works in the ECMAScript flavour. In most (all?) others it is not a valid expression. – Martin Ender – 2014-01-14T16:14:03.213

Isn't it invalid? – Wasi – 2014-01-14T16:14:56.543

@Wasi not in ECMAScript-conforming flavours – Martin Ender – 2014-01-14T16:17:16.120

1

6 chars

(or 4, depending on how you look at it)

/{,0}/

Tercy

Posted 2014-01-13T17:19:26.717

Reputation: 19

Fails in Ruby 2.0 – Nakilon – 2014-01-14T19:02:09.917

In which regex implementations does this not give an error? – Peter Taylor – 2014-01-15T09:43:37.107

I only tested it using PHP's preg_match. – Tercy – 2014-01-15T13:12:02.407

0

5 characters

Hope this doesnt sound stupid: /[]+/

nl-x

Posted 2014-01-13T17:19:26.717

Reputation: 306

Nope. Not a valid regex. – The Guy with The Hat – 2014-01-14T13:59:58.607

@RyanCarlson It is valid and legal... At least in Ecmascript. – nl-x – 2014-01-15T06:49:32.577

-1

/$^/

A thing that ends before it has begun...

simon

Posted 2014-01-13T17:19:26.717

Reputation: 393

Same problem as the other three answers: ^ is only special at the beginning of the pattern, and $ is only special at the end of the pattern, so this does, in fact, match the literal string $^. – mirabilos – 2014-12-30T15:40:46.047

7Matches the empty string (in some RE implementations, anyways). – jscs – 2014-01-13T22:37:16.570

1Your implementation is broken :) – simon – 2014-01-13T22:39:37.973

2

Better let Guido know.

– jscs – 2014-01-13T22:45:22.317

7

More importantly, as Ben Jackson pointed out, in Perl, where it doesn't match "", it does match a string containing those two literal characters: "$^".

– jscs – 2014-01-14T01:44:53.173

+1 I just wanted to post the same! @Josh, it does work in PERL, and it doesn't match empty string! Ben's comment is broken, I replied to it. – Tomas – 2014-02-02T14:14:02.520

I didn't say it matched the empty string, @Tomas, I said it matched the string "$^": perl -le 'print "Sure enough" if("$^" =~ /$^/)' – jscs – 2014-02-02T20:42:17.277

@Josh Caswell, you are sure enough but also mistaken enough :-) $^ is a variable that expands to STDOUT_TOP. Your string must be escaped: perl -le 'print "Mistaken enough" if("\$^" =~ /\$^/)', as well as Simons solution must be: /\$^/ or q{$^}. – Tomas – 2014-02-02T20:58:19.720

It doesn't matter what it expands to or what its value is, @Tomas. The contest is to make a regex that matches nothing. If there's any match at all, the pattern doesn't qualify. – jscs – 2014-02-02T21:00:35.357

Simon, your regex matches a string STDOUT_TOP. Funny enough :-) PS: you haven't tried this one? :-) – Tomas – 2014-02-02T21:02:47.410