Inverse regex of compound interest

9

Koronkorko is the Finnish word for compound interest. We don't want compound interest in our strings, so let's find the shortest possible regular expression to exclude it.

Given a string consisting only of the uppercase alphabetic characters A-Z, determine the shortest possible regular expression that matches the string if it does not contain the substring KORONKORKO. Any string that contains KORONKORKO as a substring should not be matched by the regex.

Only the characters A-Z, [, ], -, ^, , ?, *, +, |, (, and ) should be used in the expression.

I think this can be done with 118 characters in the expression. Can you make it shorter?

Note: This challenge is from Ohjelmointiputka (in Finnish).

guest

Posted 2016-05-12T18:47:11.177

Reputation: 101

If ! was an allowed character, you could've done ^((?!KORONKORO).)*$ for 19 bytes. – Mama Fun Roll – 2016-05-12T23:00:05.817

3@MamaFunRoll I think that's why ! isn't allow. – Alex A. – 2016-05-12T23:55:41.673

I had the fun of trying to work my way around the Finnish site, and I believe what you're looking for are theoretical regex expressions which match/reject the input string. For example, the site only seems to allow the use of - and ^ inside character classes (so ^ can't be used as an anchor), and a match is only counted if the whole string is matched by the regex (i.e. an implicit surrounding ^$, as opposed to normal "regexes" which count a string as matching if any part of it matches the regex) – Sp3000 – 2016-05-13T11:15:22.277

As such I've deleted my PCRE answer which, while it should work even in PHP, is almost definitely unintended in this case. – Sp3000 – 2016-05-13T11:16:33.017

I forgot to say that the site checks the if the expression is valid by PHP's ereg function. It was said in discussion in http://www.ohjelmointiputka.net/keskustelu/18785-putkaposti-29-k%C3%A4%C3%A4nteislauseke/sivu-1

– guest – 2016-05-13T11:24:11.097

Answers

6

204 characters

(K((O(R(O(NKORO)*(NK(O(RK)?)?)?)?)?)?K)*(O(R(O(NKORO)*(N(K(O(RK?[^KO]|[^KR])|[^KO])|[^K])|[^KN])|[^KO])|[^KR])|[^KO])|[^K])*(K((O(R(O(NKORO)*(NK(O(RK)?)?)?)?)?)?K)*(O(R(O(NKORO)*(N(K(O(RK?)?)?)?)?)?)?)?)?

Generated by turning .*KORONKORKO.* into a finite state machine, inverting the finite state machine, and turning it back into a regex.

orlp

Posted 2016-05-12T18:47:11.177

Reputation: 37 067

Why did this become the best answer? – Bálint – 2016-05-13T08:38:18.033

1

Python, 77 79 97 118 bytes

Edit 3: Rewrite. Uses nested lookaheads

^([^K]|K(?=$|[^O]|O(?=$|[^R]|R(?=$|[^O]|O(?=$|[^N]|N(?=$|[^K]|K(?=$|[^O]|O(?=$|[^R]|R(?=$|[^K]|K(?=$|[^O]))))))))))*$

Regex 101

Edit 2: Added '$|' throughout the regex. Now, if a prefix of KORONKORKO has been matched, the next item to match is end-of-string, a character that ends the prefix, or a character that extends the prefix if it is followed by something that ends the prefix.

This regex works with re.fullmatch(), which was added in Python 3.4. To use with re.match(), ^ and $ need to be added to the beginning and end of the pattern, respectively, for 2 more bytes.

([^K]|K($|[^O]|O($|[^R]|R($|[^O]|O($|[^N]|N($|[^K]|K($|[^O]|O($|[^R]|R($|[^K]|K($|[^O]))))))))))*

Regex101 link

Previous incorrect solution (see comments):

K|([^K]|K([^O]|O([^R]|R([^O]|O([^N]|N([^K]|K([^O]|O([^R]|R([^K]|K[^O])))))))))*

Edit: Added single K

RootTwo

Posted 2016-05-12T18:47:11.177

Reputation: 1 749

2I don't believe this matches K. – orlp – 2016-05-13T09:12:27.127

@orip - Good catch. Fixed. – RootTwo – 2016-05-13T14:04:03.750

Latest update now fails for KKORONKORKO – Sp3000 – 2016-05-14T03:16:45.673

Is it fixable? Any ideas? – RootTwo – 2016-05-14T05:33:36.143

1The starting ^ and ending $ aren't necessary. Also, = and $ aren't allowed. – LegionMammal978 – 2016-05-15T00:58:17.800