Random Sentence Generator

8

Write the shortest program you can in any language that reads a context-free grammar from and the number of sentences to produce from stdin, and generates that many random sentences from the grammar.

Input

Input will come in the following format:

n <START>
{"<A>":["as<A>df","0<A>","<B><C>","A<A>", ...], "<B>":["1<C>1","\<<T>>",...], ...}

n is the number of sentences to generate. <START> is identifier of the starting nonterminal symbol.

The grammar is enclosed in the {} and is formatted as follows:

  • Rules are of the form "<S>":[productions]. <S> is the identifier of the nonterminal.
    • Rules are delimited with commas.
    • The right-hand side of a rule is a double-quoted string whose first and last characters are "<" and ">", respectively. The remaining character should be in [A-Z] (upper-case alpha).
  • productions is a comma-delimited list of double-quoted strings, representing productions. All characters, including whitespace, in the rule are terminal symbols, except those that are enclosed in angle brackets ("<" and ">"), which are non-terminal symbols and will be the left-hand side of another rule. An open angle bracket may be escaped, but there is no need to escape a close angle bracket.
    • Productions will not contain newlines or the newline escape sequence.

Output

You should print each generated sentence to stdout with a trailing newline.

Test Cases
5 sets of balanced parentheses:

5 <S>
{"<S>":["<S><S>", "(<S>)", ""]}

Example result:

(())()
()
()()()

(())(()())((((()))()()))

4 postfix arithmetic expressions (note whitespace within strings is significant, whitespace elsewhere is not):

4 <S>
{"<S>":["<N>", "<S> <S> <O>"], "<O>":["+","-","*","/"], "<N>":["<D><N>", "<D>"],
 "<D>":["1","2","3","4","5","6","7","8","9","0"]}

Example result:

1535235 76451 +
973812
312 734 99 3 + / *
1 1 1 1 1 + - * +

Hoa Long Tam

Posted 2011-02-10T10:15:56.090

Reputation: 1 902

2Can we have some sample input/output? (I know the exact output would be different in every case, but just for reference). – Wile E. Coyote – 2011-02-10T10:46:30.810

Good challenge, but I think the input format is more complicated than it could be. Also, seeing as how the input format appears to be based primarily on JSON, wouldn't that give JavaScript and languages with built-in JSON parsing an unfair advantage. Also, what does the backslash in \<<T>> indicate? – Joey Adams – 2011-02-11T03:00:26.357

1The backslash escapes the open bracket, so if T produced "1", that the pattern \<<T>> would produce \<1>, which would produce a <1> as the final output. Yes, languages with JSON support would have a slight advantage, (though the escaped angle brackets should throw a wrench in that), but it at least levels the playing field for those languages not named "Perl". – Hoa Long Tam – 2011-02-11T03:13:50.610

The rules and examples don't seem to be entirely consistent on the amount of whitespace in the input. – Peter Taylor – 2011-02-11T15:19:13.307

@Peter: Whitespace outside of strings is insignificant; whitespace within strings is. – Hoa Long Tam – 2011-02-11T15:34:47.837

Maybe I’m misunderstanding something here, but it seems to me that the second example should say "<N>":["<D><N>", "<D>"] instead of just "<N>":["<D><N>"], as otherwise it never terminates... – Timwi – 2011-03-25T01:21:03.917

@Timwi, You're absolutely right. Fixed. – Hoa Long Tam – 2011-03-25T01:51:45.993

Answers

4

I felt like doing a little JavaScript. Also, I cried a little on the inside when I wrote "document.write"

<body>
    <script type='text/javascript'>
    function foo(){
        t=document.getElementById('ta').value.split("\n");
        eval('p='+t[1]);
        t[0]=t[0].split(' ');
        while(t[0][0]--) {
            s=t[0][1]
            while(x=s.match(/<\w+>/)) {
                ps=s;
                s=s.replace(x,p[x][Math.floor(Math.random()*p[x].length)]);
            }
            document.write(s+"<br>");
        }
    }
    </script>
    <textarea id='ta' cols='80'></textarea>
    <button onclick="foo()">go</button>
</body>

Input:

10 <A>
{"<A>":["a<A>b","c<A>d","<B>"],"<B>":["e<C>e"],"<C>":["z","<A>","<B>"]}

Output:

ccaaceeeeezeeeeedbbdd
accccceeeezeeeedddddb
aecezedeb
eaezebe
ccccaacezedbbdddd
eeeaaaceecacezedbdeedbbbeee
acaecaeaaeacccceeeeeeeaeeezeeebeeeeeeeddddbebbebdebdb
aaceezeedbb
aacezedbb
ceeaceecacaacezedbbdbdeedbeed

Mitch

Posted 2011-02-10T10:15:56.090

Reputation: 156

I think you can shorten this a bit by writing d=document; and reusing the value afterwards. Also, you might want to provide the character count. – Element118 – 2016-02-09T07:26:32.210