My program knows your language. Does yours?

3

1

Given a list of production rules and start symbol of a proper finite context-free grammar (CFG) of printable characters, your program or function must output its formal language (the set of all sentences which can be constructed from it).

Task

A context-free grammar is formally defined as a 4-tuple of a few parts. Here, they are defined for you to solve this challenge.

For the purposes of the challenge, a production rule is of the form {begin, {end-1, end-2, end-3, ..., end-n}} or equivalent. The begin part is always one printable character (0x20 to 0x7E), and the end-i are all strings of printable characters.

Your program will receive as input a list of production rules and a start symbol. The start symbol is a single printable character.

An iteration takes an input list of strings and is defined as follows: To each string s in the input list, look at the first character c in it which is found as the begin of a production rule. To each end-i on the right side of that rule, add to the list the string s but where the first instance of c is replaced by the string end-i. Then remove the original string s if and only if there is a character c.

The program would repeat applying iterations, beginning with the start symbol, until an iteration does not affect the list of strings, at which point it outputs the list of strings (with repeats removed).

This is equivalent to the following pseudo-code:

rules = input()
start_symbol = input()

define iteration(string_list,rules):
  for each string in string_list:
    c = first char in string which is the first element of a rule in rules
    // ^^ This is a Mathematica builtin, right? ^^
    if(c exists):
      ends = the second element of that rule
      for each end in ends:
        // replace the first instance of c with end.
        string_list.append(string.replace(c,end))
      string_list.remove(string)
      // make sure you don't remove the original string if
      // there does not exist a c

  return string_list

current_string = start_symbol

// Clearly this is inefficient to apply iteration twice
// but hey, this is code golf. As long as it runs in less than 60 seconds on TIO!
while(iteration(current_string)!=current_string):
  current_string = iteration(current_string)

print remove_repeats(current_string)

Note that you do not need to follow this exact pseudo-code as long as your program yields correct output.

Example

Given the start symbol S and production rules:

{S:{aE,bE,c}}
{E:{ab,aC}}
{C:{a,b,c}}

We do:

{S} // start
{aE,bE,c} // apply first rule to the first element
{aab,aaC,bab,baC,c} // apply second rule to the first and second element
{aab,aaa,aab,aac,bab,baa,bab,bac,c}             // apply third rule to the second and fourth elements
{aab,aaa,aac,bab,baa,bac,c} // remove repeats

Input restrictions

This should only matter to people who care.

The CFG ...

  • has no recursion.
  • has no cycles.
  • has no unreachable symbols.
  • has no unproductive symbols.

The list of production rules ...

  • may have productions resulting in an empty string.
  • may be taken in any format, as long as it encodes only a begin character and a list of alternatives as end-i for each rule.
  • has no duplicate begin characters.
  • has no duplicate end characters within a production rule

Test Cases

start symbol, {rule1, rule2, ...}, expected output
S, {S: {aE,bE,c}, E: {ab,aC}, C: {a,b,c}}, {aab,aaa,aac,bab,baa,bac,c}
S, {S: {a,B}, B: {g,k}}, {a,g,k}
A, {A: {B,C,D}, B: {c,d}, C:{b,d}, D:{b,c}}, {c,b,d}
~, {~: {`,^,/}, /:{`^,``}, ^:{``,```}}, {`,``,```,````}
+, {A: {1,2,3}, +:{A,B,C}}, {1,2,3,B,C}

Notes

  • The input will be taken in any convenient format as long as it gives no information other than the production rules and start symbol.
  • The output should be given such that the list of strings as output is clearly encoded and unique.
  • Please specify what format your list of production rules are taken in.
  • This is , so shortest code in each language wins. Do not be discouraged from posting by answers in golfing languages.

fireflame241

Posted 2017-08-29T01:23:04.667

Reputation: 7 021

1

Stack exchange thought the title of this question implied subjectivity :P. Anyway, Sandbox Post.

– fireflame241 – 2017-08-29T01:24:14.263

Answers

4

JavaScript, 118 bytes

m=>g=c=>(i=0,c=[...new Set([].concat(...c.map(n=>[...n].find(x=>k=m[o=x])?k.map(t=>n.replace(o,t,i=1)):n)))],i?g(c):c)

f=

m=>g=c=>(i=0,c=[...new Set([].concat(...c.map(n=>[...n].find(x=>k=m[o=x])?k.map(t=>n.replace(o,t,i=1)):n)))],i?g(c):c)

print = x => console.log(JSON.stringify(x))
print(f({S:['a','B'],B:['g','k']})(['S']))
print(f({S:['aE', 'bE', 'c'], E:['ab', 'aC'], C:['a', 'b', 'c']})(['S']))
print(f({A:['B','C','D'],B:['c','d'],C:['b','d'],D:['b','c']})(['A']))
print(f({'~':['`','^','/'],'/':['`^','``'],'^':['``','```']})(['~']))
print(f({A:['1','2','3'], '+':['A','B','C']})(['+']))

tsh

Posted 2017-08-29T01:23:04.667

Reputation: 13 072

2

Python 2, 124 bytes

def f(s,r,i=0):
 while s[i:]:
	if s[i]in r:return list(set(s[:i]+l+s[i+1:]for v in r[s[i]]for l in f(v,r)))
	i+=1
 return[s]

Try it online!

Halvard Hummel

Posted 2017-08-29T01:23:04.667

Reputation: 3 131