Write a CFG acceptor

7

1

Write a program that, given a context free grammar and a string, returns whether the string is a word of the language defined by the grammar.

Input

Your program should read a context free grammar and an input string in an implementation defined way during runtime. Hardcoding the input is acceptable as long as it is trivial to change the input.

The context free grammar is a set of rules as defined by the regex [A-Z][a-zA-Z]*, that is: One capital letter denoting the nonterminal to be replaced and an arbitrary number of terminals (minuscules) and nonterminals (capitals) to replace it. The grammar may contain up to 256 rules. The initial symbol is always S. You may not assume that the grammar is unambigous. The grammar must contain at least one production rule for S.

The input string must match the regex [a-z]*. It has a length of no more than 65536 characters.

Output

Output is to be done in an implementation defined manner. Your program must output whether the input string is a word of the language defined by the grammar.

Constraints

  • Your program must terminate in reasonable time
  • You may not use any libraries that provide parsers or parser generators
  • You may not run any program that generates a parser
  • You may not use compression utilities or libraries

Scoring

This is code-golf. The submission with the smallest number of bytes wins

Example

The rules

S
SaSb

represent a grammar which defines the language anbn for n≥0. So given string aaabbb it would indicate that it accepts it, but given string aaabb it would indicate that it does not accept it.

The ruleset

S
SaSbS
ScSdS

describes the Dyck language. It describes the language of all balanced pairs of ab's. and cd's. For instance, string ababacabdb matches but cabdacb doesn't.

FUZxxl

Posted 2012-09-10T20:20:09.980

Reputation: 9 656

Answers

6

Python 2.7, 325 chars

R=['S','SaSb']
I='aaabbb'

L=len(I)
def P(x,y,n):
 if n<3:return[[x,y]][:n>1or x==y]
 return[[x]+w for z in range(x,y+1)for w in P(z,y,n-1)]
S={(i,i+1,c):1for i,c in enumerate(I)}
T=0
while S!=T:
 T=dict(S)
 for r in R:
  for a,x,y,b in P(0,L,4):
   if any(all(t in S for t in zip(p,p[1:],r[1:]))for p in P(x,y,len(r))):S[x,y,r[0]]=1
print(0,L,'S')in S

The rules are listed in R and the input is in I.

The code uses dynamic programming, similar to the Earley algorithm. S contains tuples (i,j,N) if the symbol N can generate the input from indexes i to j. We iterate through all rules and indexes looking for new entries we can put in S.

P is a helper function that calculates all possible partitions of the range [x,y] into n-1 ranges (assigning one symbol in the rule to each). For instance, P(0,7,4) contains, among many others, [0,3,6,7].

The run time is exponential in the longest rule, but shouldn't be too bad in the number of rules and length of the input.

Keith Randall

Posted 2012-09-10T20:20:09.980

Reputation: 19 865