Print everything which a regular expression matches

0

Modern regular expressions (regexes) have many features (\d, {n,m}, \1, etc.), but to define a regular language (modern regexes can define broader classes of languages, in their defense), all you need are three operators: concatenation (+), union (|), and the Kleene star (*). Here are some examples:

a - Matches the string a
a+b - Matches the string ab
a|b - Matches either the string a and the string b
a* - Matches the empty string, a, aa, aaa, aaaa, etc.
(a|b)* - Matches any string of a's and b's (including the empty string).
(a+a)* - Matches the empty string, aa, aaaa, aaaaaa, etc.

(Note that the full string must match; a does not match aa)

Given a regex, r, output every single possible string which r matches. There are, of course (countably) infinitely many such strings, so your program will run forever on some inputs. But for any string s which r matches, after some finite amount of time, your program must output r. Even if r only matches a finite number of strings, your program can, if you want, run forever after printing them all (it doesn't need to halt when it has printed every possible s).

You may not use any built-in functions involving regexes. You may choose any input alphabet you like, so long as it has at least 2 characters (e.g. a and b; 0, 1, and 2; etc.).

You can accept infix, prefix, or postfix notation (and the |, +, and * characters need not be those characters specifically).

If (and only if) you are using infix notation:

  • You can, if you like, omit the +s in your regexes (so aa would be a+a, a(aa) would be a+(a+a)).
  • You can assume that all operations except for the outermost one will be in parentheses (so a|b|a+a does not need to be a valid input, but (a|b)|(a+a) must be--up to your choice of alphabet and symbols for | and +).
  • You can choose whether * goes before or after the thing you are applying it to.

See the test cases below if this is unclear.

You can assume that the regex is less than 999 characters, but the strings matched by it can be of any length.

I/O can be pretty much anything (stdin, stdout, a file, etc.), but remember that your program must output every string after some finite amount of time, so you can't, for example, use a function which in theory returns an array/list, but only after an infinite amount of time. You can, however, use a generator/stream/(lazy) infinite list/etc. if your language supports those constructs--put simply, you can use a function for I/O if and only if you can add some code (not included in the byte count) which uses that function to print every matching string to stdout. So, this function would be invalid, but this function would be valid.

Please clearly state your I/O format and which notation and alphabet you are using in your answer.

A Try It Online link would be appreciated.

Test cases

Each test case just shows one possible ordering of the output.

This shows prefix, infix, and postfix, but you only need to accept one.

-input
prefix | infix | postfix
a        a       a
-output
a
-input
prefix | infix | postfix
*a       a*      a*
-output

a
aa
aaa
aaaa
...
-input
prefix | infix | postfix
*|ab     (a|b)*  ab|*
-output
a
b

ab
ba
aa
bb
...
-input
prefix    | infix             | postfix
*|+a+aa+bb  ((a+(a+a))|(b+b))*  bb+aa+a+|*

aaa
bb
aaabb
bbaaa
aaabbaaa
bbaaabb
bbbbaaa
...

Make sure you output the empty string when it is matched!

Leo Tenenbaum

Posted 2019-07-19T05:12:06.037

Reputation: 2 655

Question was closed 2019-07-19T05:46:33.273

Note that (a|b)* matches the empty string too.(you forgot it in the examples on top) – pixma140 – 2019-07-19T05:27:10.817

1I would say that the empty string is a string of a's and b's, just with 0 a's and 0 b's :) but I'll add it. – Leo Tenenbaum – 2019-07-19T05:43:47.660

No answers