17
3
In this task you have to write a program that reads a regular expression and generates another program that outputs whether an input string is accepted by that regular expression. The output must be a program written in the same language as your submission.
Input
The input is a regular expression r matching the following ABNF (the initial production rule is REGEX
):
REGEX = *( STAR / GROUP / LITERAL / ALTERNATIVE )
STAR = REGEX '*'
GROUP = '(' REGEX ')'
LITERAL = ALPHA / DIGIT
ALTERNATIVE = REGEX '|' REGEX
If the input does not match this grammar, the behavior of your program is undefined.
Interpretation
Interprete the input as a regular expression, where *
is the Kleene-star (meaning repeat left argument zero or more times), |
is an alternative, (
and )
group and no operator at all concatenated. Grouping takes precedence over star, star takes precedence over concatenation, concatenation takes precedence over alternative.
A string is said to be accepted if the regex matches the whole string.
Output
The program's output is another program written in the same language as your submission that reads a string s in an implementation defined way at runtime, outputs whether r accepts s and then terminates. Output may be done in a user-defined way although there must be only two distinct outputs for accepted and rejected programs.
You may assume that the input of your output program is never longer than 216-1 Bytes.
Restrictions
Neither your submission nor any program generated by your submission may use builtin functionality or libraries that
- match regexes
- transform regular expressions
- compile regular expressions
- generate parsers from a grammar
- simplify the problem in a way that your submission becomes trivial
Scoring
The score of your submission is the number of characters it. The submission with the lowest score wins.
Testcases
All testcases contain a regular expression, a set of accepted strings, a set of rejected strings and an example program in C99 which is a valid output of a (hyptothetical) C99 submission.
(empty regular expression)
Accepted strings
- (empty input)
Rejected strings
- foo
- bar
- baz
- quux
Example program
#include <stdio.h>
int main() {
char input[65536];
gets(input);
return input[0] != 0;
}
(b|)(ab)*(a|)
(a
and b
alternating)
accepted strings
a
ba
abababababa
abab
rejected strings
afba
foo
babba
example program
#include <stdio.h>
int main() {
char input[65536];
int state = 0;
for (;;) switch (state) {
case 0: switch (getchar()) {
case 'a': state = 1; break;
case 'b': state = 2; break;
case EOF: return 0;
default: return 1;
} break;
case 1: switch (getchar()) {
case 'b': state = 2; break;
case EOF: return 0;
default: return 1;
} break;
case 2: switch (getchar()) {
case 'a': state = 1; break;
case EOF: return 0;
default: return 1;
} break;
}
(0|1(0|1)*)(|A(0|1)*1)
(binary floating point numbers)
accepted strings
- 10110100
- 0
- 1A00001
rejected strings
- 011
- 10A
- 1A00
- 100A010
1I presume that having a program like
return (regex.match(stdin) is not null)
is not allowed. – beary605 – 2012-08-18T23:25:22.0901You say that "The output must be a program written in the same language as the input", but the input is a regex. And the grammar you provide doesn't include the rule GROUP, which presumably defines literal brackets. – Peter Taylor – 2012-08-19T07:24:16.067
@Peter Sorry, I meant to write they same language as the submission. – FUZxxl – 2012-08-19T10:04:52.583
@beary605 Yeah, you're right. See the section Restrictions: Neither your submission nor any program generated by your submission may use builtin functionality or libraries that match regexes (...). – FUZxxl – 2012-08-19T10:13:04.143
I think your second example program is incorrect, it's missing a loop around the outer switch – Hasturkun – 2012-08-21T11:09:54.100