Simple regex parser

2

Let's parse some regex. Not quite the regex you are used to, but the formal regular expressions taught in courses on automata theory. Wikipedia knows what I'm talking about.

We have four operations. In order of priority, from highest to lowest:

  1. Parenthesis. (a)
    • The contents of the parenthesis is a regular expression itself. This allows for nested parenthesis.
  2. Kleene star. a*
    • The star denotes that the previous token may be matched zero or more number of times.
  3. Concatenation. Logical AND. ab
    • This does what you think it does. It matches a followed by b.
  4. Union, or alteration. Logical OR. a|b
    • This matches a or b but not ab.

These characters have special meaning: *|()ε All other characters (except white-space) are in the language. There are no escape sequences.

ε denotes the empty string. If your language of choice does not support that character, tell us how we can change what character you used to denote the empty string.

The task:

Your program is supposed to read one line of well-formatted regex from stdin, followed by one line of text to be matched against that regex. If the regex accepts the string, output "yay" to stdout. Otherwise, output "nay" to stdout.

Yay-cases:

abc
abc

_

ab(c|d)*ef*
abcdcdcdddcce

_

ab(c|d)*ef*
abeff

_

[abc]*
[abc]

_

<(?P<tag>[A-Z][A-Z0-9]*)\b[^>]*>
<(?P<tag>[A-Z][A-Z0-9])\b[^>]>

_

a*
ε

_

a*b*c*
aaεbbεεbcε

_

a\(b\)c
a\b\c

_

a|b
b

Nay-cases:

abc
ab

_

ABC
abc

_

abc
abcd

_

[abc]*
abc

_

a+
aaaa

This is code golf, so lowest byte count wins.

Filip Haglund

Posted 2015-10-25T20:50:23.693

Reputation: 1 789

Question was closed 2015-10-25T22:13:59.073

2Closely related – Alex A. – 2015-10-25T20:56:55.580

1There are a lot of differences between the two challenges. This one has no restrictions on what language features one may use, and it does not require you to output a program that in turn parses the input. – Filip Haglund – 2015-10-25T21:03:22.187

1I don't think the challenges are different enough to warrant separate posts. – Alex A. – 2015-10-25T22:14:24.777

No answers