Regex for multiples of 9

14

3

It is easy to describe a finite state machine that recognizes multiples of 9: keep track of the digit sum (mod 9) and add whatever digit is accepted next. Such a FSM has only 9 states, very simple! By the equivalence between FSM-recognizability and regular languages, there is a regular expression for multiples of 9. However, any such regular expression is likely... very... long. As in, likely on the order of a gigabyte.

There is a worked example at https://www.quaxio.com/triple/ for multiples of 3. At the bottom of the page, the author provides a somewhat "hand-optimized" solution that's a bit shorter than the naive conversion from FSM to regex.

The challenge:

You must make a regex to detect multiples of 9. Since such a regex is expected to be very long, I ask that you provide a program that can print out your regex. (If you really want to give a whole regex, perhaps host it elsewhere and link it here!)

You must be able to tell us the exact character count of your program's output -- so having a program that simply tries all regexes up to a certain length, until it finds one that works, is not acceptable unless it runs quickly enough that you can run it to completion and give us the resulting regex length!

Points are for having the shortest output regex, not based on program length, of course. Since the regex is the "program" I am asking for, and it's just too long to conveniently transmit here, I'm still tagging this code-golf.

Rules:

  • The input will only include characters matching [0-9]*.
  • Your regex should match multiples of 9, but not anything else. Cases that aren't made entirely of the digits 0-9 and are invalid inputs can either match or fail as you wish.
  • Given the motivation that it's easily recognized by a DFA, the resulting regex must actually be regular expression in the more theoretic terminology, that is, only operators under which regular languages are closed. To be precise, the only things that are allowed:
    • Literals, character ranges ([ab], [a-f], [^k]), Kleene star (*), anchors (^ and $), grouping via parentheses, alternation (|), optional terms (?), one-or-more terms (+), lookaheads ((?=)), negative lookaheads ((?!)), lookbehinds ((?<=)), negative lookbehinds ((?<!)), conditionals (as in https://www.regular-expressions.info/conditional.html -- (?(?=test)then|else) ), and backreferences of bounded length (see below).
  • Examples of things that are not allowed:
    • Backreferences of arbitrary length, forward references, Recursion, subroutines, looping constructs, executable code, any variation of 'eval', or built-in constructs for casting the string to an arithmetic value.
  • Backreferences that can be shown to have a bounded-length binding string are acceptable, as they can be stored in finite state and do not alter the regularity of the language. For instance, the regex (..2.[3-5])4\1.\1 is acceptable, as there is bound length on the capturing group \1. This is a regular construction. A construct such as (2*)0\1 is not acceptable, as the captured group cannot be stored in finite state.
  • Your regex is free to accept or reject integers with extraneous leading zeroes as you wish. However, the string "0" must be accepted.

Alex Meiburg

Posted 2018-02-20T23:48:43.117

Reputation: 249

2Related, not sure if this would be considered a duplicate – ASCII-only – 2018-02-20T23:53:30.383

Ah, hmm! I had search for "regex multiple" but not "regex divisible". I suppose that's awfully similar, yes. – Alex Meiburg – 2018-02-20T23:56:49.457

11

It hasn't been said yet, so Welcome to PPCG and interesting first challenge! As mentioned by another user, it is often recommended, but not required, to post challenge proposals in the Sandbox so they can get feedback, before posting on main. However, this is a well thought out and clear challenge, so there is no reason to move this to the Sandbox. Hope you enjoy our community!

– caird coinheringaahing – 2018-02-21T07:30:12.470

Solutions of less than 200 kibibytes are possible, so it won't be THAT huge – Ton Hospel – 2018-02-21T13:09:46.050

Just so you know, this was already solved generally here.

– mbomb007 – 2018-02-21T15:52:04.017

3Solution using .NET's extensions: ^(0|9|(?<c>1|(?<c>2|(?<c>3|(?<c>4|(?<c>5|(?<c>6|(?<c>7|(?<c>8))))))))((?<-c>){9})?)*$(?(c).) – Neil – 2018-02-22T13:43:35.963

@Neil, can you explain a bit about how that works? I'm not familiar enough with .NET syntax. What is the ?<c> and ?<-c> construct? It looks like a back-reference, and I'm not convinced it satisfies the "bounded size" aspect. – Alex Meiburg – 2018-02-22T19:30:49.973

I'm not suggesting it satisfies the question criteria, which is why I submitted it as a comment rather than an answer. The ?<c> is a named capture group; .NET allows you multiple capture groups with the same name as well as repeating a capture with */+ and it tracks the list of captures of each group; ?<-c> removes the latest capture from the group (failing if there isn't one). So each digit adds its value in captures, and then we try to remove 9 captures (but that's ? optional). Finally the $(?(c).) only matches if there are no captures left at the end of the string. – Neil – 2018-02-22T22:06:16.530

Ah, I see. Very clever, the way you nested them like that! I like it. :) Does this evaluate efficiently in practice? It seems like it should, as there is essentially only ever "one way" to capture the next character, and the engine would ideally recognize that. – Alex Meiburg – 2018-02-22T22:38:32.130

Answers

3

Haskell, 207,535 202,073 bytes

5,462 bytes saved by using 0|9 instead of [09] where possible.

digits n
  | x == 0    = "0|9"
  | otherwise = show x
  where x = mod n 9

regex 0 = "[09]*"
regex n = (regex' n (-1) (-1)) ++ "*"

regex' 0 start end = digits (end - start)
regex' n start end = '(':(regex' 0 start end) ++ (concat ['|':(regex' (n-x) (start-x) (-1)) ++ (regex (n-x))
                                                  ++ (regex' (n-x) (-1) (end-x)) | x <- [1..n]]) ++ ")"

main = do
  putStr ("^" ++ (regex 8) ++ "$")

Try it online!

Just a quick adaptation of the regex given in the footnotes of the linked article to get things started.

Pastebin of output regex, courtesy of Herman Lauenstein.

While I have not been able to test the full regex, modifying the program to check divisibility by 3 instead gives something exactly equivalent to the regex I based this on. Furthermore, modifying the program to check the digit sum's divisibility by 4 or 5 also seems to work on the numbers I tested it on.

Nitrodon

Posted 2018-02-20T23:48:43.117

Reputation: 9 181

You can also test what your method gives for divisibility by 2 (should be something like /even$/) and divisibility by 5 (should be something like /[05]$/). PS: Mention the language of your code – Ton Hospel – 2018-02-21T14:56:26.517

Here is a pastebin with the output (with all occurences of ([09]| replaced with (0|9| to save thousands of bytes) – Herman L – 2018-02-21T18:30:07.273