Match the permutations!

15

2

Your challenge is to create a regex that matches every string permutation of itself, and nothing else. The match must also be case-sensitive.

So, for example, if your regex is:

ABC

It should match (and only match) these strings:

ABC
ACB
BAC
BCA
CAB
CBA

It shouldn't match things like:

AABC (contains an extra A)
ABCD (contains an extra D)
AC   (no B)
AAA  (no B and C, extra 2 A's)
abc  (case-sensitive)

Rules:

  • You are allowed to use any flavour of regex you like.
  • Standard loopholes apply.
  • You must have at least two different characters in your code. That means solutions like 1 are invalid.
  • The regex should contain only printable ASCII, and nothing else.

clismique

Posted 2016-11-22T04:52:35.587

Reputation: 6 600

3

Related: Regex that only matches itself

– xnor – 2016-11-22T05:28:46.763

Also related: Character counts in source code

– jimmy23013 – 2016-11-22T06:48:36.880

I thought (ABC|ACB|BAC|BCA|CAB|CBA) but you wanted a generalized answer. – Stephen Quan – 2016-11-24T23:56:30.280

Answers

11

JavaScript, 64 57 bytes

4 bytes removed thanks to Martin Ender.

^(?!.*([^])(.*\1){3}]?)[$$[-^?!!.'''-*11{33}5577\\-]{57}$

Try it here.

Explanations (outdated)

^                                  # Beginning of the string.
(?!.*                              # Match only the strings that don't contain...
  (.)(.*\1){4}                     #     5 occurrences of the same character.
  [^1]?[^1]?                       #     Something that doesn't matter.
)
[]zzz^(?!!!.**[)\\1{{44}}666]{64}  # 64 occurrences of these 16 characters.
                                   # Some are duplicated to make sure the regex
                                   # contains 4 occurrences of each character.
\z                                 # End of the string.

jimmy23013

Posted 2016-11-22T04:52:35.587

Reputation: 34 042

2

I think this works for 60: ^(?!.*(\S)(.*\1){3}[^1]?)[]zzSS[-^?!!.'''-*1{33}0066-]{60}\z regex101

– Martin Ender – 2016-11-22T08:13:54.873

This almost works in .NET: ^(?'4'(?!(.*\4){3})[]$$[\\^^?!!..'-*{}33-5-]){54}$[5]* – jimmy23013 – 2016-11-23T09:40:11.960

What doesn't work? Trailing linefeeds? – Martin Ender – 2016-11-23T09:55:38.810

@MartinEnder Yes. – jimmy23013 – 2016-11-23T10:13:40.980

2

Perl and PCRE regex, 280 bytes

^(?=(.*z){2})(?=(.*\(){43})(?=(.*\)){43})(?=(.*\*){22})(?=(.*\.){23})(?=(.*0){2})(?=(.*1){6})(?=(.*2){16})(?=(.*3){7})(?=(.*4){4})(?=(.*5){1})(?=(.*6){3})(?=(.*7){2})(?=(.*8){2})(?=(.*9){1})(?=(.*=){22})(?=(.*\?){22})(?=(.*\\){11})(?=(.*\^){2})(?=(.*\{){23})(?=(.*\}){23}).{280}\z

(Slightly) more readable:

^
(?=(.*z){2})
(?=(.*\(){43})
(?=(.*\)){43})
(?=(.*\*){22})
(?=(.*\.){23})
(?=(.*0){2})
(?=(.*1){6})
(?=(.*2){16})
(?=(.*3){7})
(?=(.*4){4})
(?=(.*5){1})
(?=(.*6){3})
(?=(.*7){2})
(?=(.*8){2})
(?=(.*9){1})
(?=(.*=){22})
(?=(.*\?){22})
(?=(.*\\){11})
(?=(.*\^){2})
(?=(.*\{){23})
(?=(.*\}){23})
.{280}\z

This runs in O(2^n) time as written, so is incredibly inefficient. The easiest way to test it is to replace every occurrence of .* with .*?, which causes the case where it matches to be checked first (meaning that it matches in linear time, but still takes exponential time if it fails to match).

The basic idea is that we enforce the length of the regex to equal 280, and use lookahead assertions to force each character in the regex to appear at least a certain number of times, e.g. (?=(.*z){2}) forces the z character to appear at least twice. 2+43+43+22+23+2+6+16+7+4+1+3+2+2+1+22+22+11+2+23+23 is 280, so we can't have any "extra" occurrences of any characters.

This is a programming example of an autogram, a sentence that describes itself by listing the number of each character it contains (and, in this case, also the total length). I got fairly lucky in constructing it (normally you have to use brute force but I stumbled across this solution while testing my brute-force program before I'd fully finished writing it).

Perl and PCRE regex, 253 bytes, in collaboration with Martin Ender

I hypothesized that there might be shorter solutions which omit some digits (most likely 9, 8, or 7). Martin Ender found one, shown below:

^(?=(.*z){2})(?=(.*\(){39})(?=(.*\)){39})(?=(.*\*){20})(?=(.*\.){21})(?=(.*0){4})(?=(.*1){6})(?=(.*2){11})(?=(.*3){6})(?=(.*4){3})(?=(.*5){2})(?=(.*6){3})(?=(.*9){4})(?=(.*=){20})(?=(.*\?){20})(?=(.*\\){9})(?=(.*\^){2})(?=(.*{){21})(?=(.*}){21}).{253}\z

Readable version:

^
(?=(.*z){2})
(?=(.*\(){39})
(?=(.*\)){39})
(?=(.*\*){20})
(?=(.*\.){21})
(?=(.*0){4})
(?=(.*1){6})
(?=(.*2){11})
(?=(.*3){6})
(?=(.*4){3})
(?=(.*5){2})
(?=(.*6){3})
(?=(.*9){4})
(?=(.*=){20})
(?=(.*\?){20})
(?=(.*\\){9})
(?=(.*\^){2})
(?=(.*{){21})
(?=(.*}){21})
.{253}\z

user62131

Posted 2016-11-22T04:52:35.587

Reputation:

I don't think you need to escape those {} in the last two lookaheads. You also don't need to add things like (?=(.*5){1}) since there wouldn't be a 5 if you didn't have that lookahead. One problem is that $ allows a trailing linefeed, so you'll need to use \z there instead of $ like jimmy did, but that won't cost you a byte I think since you save the \ in the first lookahead. – Martin Ender – 2016-11-22T08:52:36.850

I'm aware that it's possible to omit things like digits. However, they're there to make the autogram work. Removing any part of the program will cause all the rest to break, because it no longer describes the program correctly. (The counts for each line count the counts for each line! Thus it is in general basically impossible to change the program.) As for $ allowing a newline at the end of the string, that generally depends on how the regex is called by the surrounding program (they're normally run on code that's already been parsed into lines).

– None – 2016-11-22T08:58:04.107

Or to be more precise: I do need the (?=(.*5){1}) in this case. If I removed it, there would be a 5 in the program, because the (?=(.*1){6}) line would now have to read (?=(.*1){5}). – None – 2016-11-22T09:00:04.893

As for the trailing linefeed there doesn't seem to be any restriction in the challenge on the kind of input to your regex, so that usually means it should work for any string whatsoever, and changing $ to \z doesn't do any harm (and doesn't break the autogram). – Martin Ender – 2016-11-22T09:10:50.523

Oh, I see; you change the \$$ to z\z. That works; I'll go change it. – None – 2016-11-22T09:13:20.347

I fiddled around a bit and managed to recover an autogram after dropping those unnecessary escapes, saving 3 bytes: http://pastebin.com/4hUXHDrb

– Martin Ender – 2016-11-22T12:04:00.400

Even better: I got an autogram after dropping both 7 and 8, cutting this down to 253: http://pastebin.com/uqKKUGF9

– Martin Ender – 2016-11-22T12:17:08.933

Well found. I've added it to the post as an alternative/collaborative solution (you've done enough work on it that I can hardly count it as my own solution at this point). – None – 2016-11-22T12:49:19.197

I'm perfectly happy for you to replace your solution with it if you credit me. It's still your idea of using an autogram, and all I did was tweak some numbers in an editor until I got an autogram again. ;) – Martin Ender – 2016-11-22T12:50:46.257