Books full of nonsense: Identify limericks

15

2

As we all know, limericks are short, five-line, occasionally-lewd poems with an AABBA rhyming scheme and an anapestic meter (whatever that is):

Writing a Limerick's absurd
Line one and line five rhyme in word
And just as you've reckoned
They rhyme with the second
The fourth line must rhyme with the third

You are tasked to write the shortest program that, when fed an input text, prints whether it thinks that the input is a valid limerick. Input can either be on the command line or through standard input, at your option, and output could either be a simple "Y"/"N" or a confidence score, again at your option.

Here's another example of a correct limerick:

There was a Young Lady whose eyes
Were unique as to colour and size
When she opened them wide
People all turned aside
And started away in surprise

But the poem below is clearly not a limerick, since it doesn't rhyme:

There was an old man of St. Bees
Who was stung in the arm by a wasp.
When asked, "Does it hurt?"
He replied, "No, it doesn't,
I'm so glad that it wasn't a hornet."

Nor is this one, as the meter is all wrong:

I heard of a man from Berlin
Who hated the room he was in
When I asked as to why
He would say with a sigh:
"Well, you see, last night there were a couple of hoodlums around who were celebrating the Bears winning the darned World Cup, and they were really loud so I couldn't sleep because of the din."

Clues

Here are some of the clues you could use to decide whether or not your input is a limerick:

  • Limericks are always five lines long.
  • Lines 1, 2 and 5 should rhyme.
  • Lines 3 and 4 should rhyme.
  • Lines 1, 2 and 5 have around 3x3=9 syllables, while lines 3 and 4 have around 2x3=6 syllables.

Note that none of these except the first are hard-and-fast: a 100% correctness rating is impossible.
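
The line-count and syllable clues can be sketched as below. This is only an illustration: it assumes each line's pronunciation is already available as a list of CMU-style phonemes (as in the pronunciation dictionary mentioned in the rules), where vowel phonemes end in a stress digit. The names `count_syllables` and `syllable_clue` and the tolerance of two syllables are assumptions, not part of the challenge.

```python
EXPECTED = [9, 9, 6, 6, 9]  # rough syllable targets taken from the clue list


def count_syllables(phonemes):
    """Count vowel phonemes; in CMU-style notation they end in a stress digit."""
    return sum(1 for p in phonemes if p[-1:].isdigit())


def syllable_clue(counts, tolerance=2):
    """True if there are exactly five lines, each within `tolerance` of its target.

    The default tolerance is an arbitrary assumption; the clue is not hard-and-fast.
    """
    if len(counts) != 5:
        return False
    return all(abs(n - e) <= tolerance for n, e in zip(counts, EXPECTED))
```

A confidence score could combine this with the rhyme clues instead of returning a plain yes or no.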

Rules

  • Your entry should at the very least correctly categorize examples 1 through 3 in a deterministic fashion.

  • You are allowed to use any programming language you would like, except of course programming languages specifically designed for this contest (see here).

  • You are not allowed to use any library except your programming language's standard offerings.

  • You are allowed to assume that this file, the CMU Sphinx pronunciation dictionary, is in a file called 'c' in the current directory.

  • You are not allowed to hard-code for the test inputs: your program should be a general limerick categorizer.

  • You are allowed to assume that the input is ASCII, without any special formatting (like in the examples), but your program should not be confused by punctuation.
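
One way to keep punctuation from confusing the rhyme check is to extract the last word of each line before looking it up. The sketch below is an assumption about how that could be done, not a requirement of the challenge; `last_word` is a hypothetical helper, and it assumes dictionary keys are uppercase and keep internal apostrophes.

```python
import re


def last_word(line):
    """Return the last word of a line, uppercased, ignoring punctuation.

    Letters and apostrophes form words; surrounding apostrophes are stripped
    so that quoted words still match. Returns None for a line with no words.
    """
    words = [w.strip("'") for w in re.findall(r"[A-Za-z']+", line)]
    words = [w for w in words if w]
    return words[-1].upper() if words else None
```

For example, applied to the lines of the non-rhyming poem above, the trailing period, question mark and quotes are dropped before the lookup.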

Bonuses

The following bonuses are available:

  • Your program outputs its result as a limerick? Subtract 150 characters length bonus!
  • Your program also correctly identifies sonnets? Subtract 150 characters extra length bonus!
  • Your program outputs its result as a sonnet when used on a sonnet? Subtract 100 characters additional extra length bonus!

Finally...

Remember to mention which bonuses you think you deserve, if any, and subtract the bonus from your number of characters to arrive at your score. This is a code golf contest: the shortest entry (i.e. the entry with the lowest score) wins.

If you need more (positive) test data, check out the OEDILF or the Book of Nonsense. Negative test data should be easy to construct.

Good luck!

Wander Nauta

Posted 2014-03-22T14:22:01.990

Reputation: 3 039

This should be a code-challenge because of the bonuses. Please read the tag descriptions – user80551 – 2014-03-22T15:35:24.340

@user80551 Consensus on meta appears to be otherwise. – Doorknob – 2014-03-22T15:59:52.650

I've clarified the nature of the bonuses, I hope that clears up the confusion. – Wander Nauta – 2014-03-22T16:01:04.990

Goooooooo Bears! – alvonellos – 2014-03-22T23:17:59.720

I don't understand the bonuses. How am I supposed to output "Y" in the form of a limerick? – r3mainer – 2014-03-24T10:13:31.907

@squeamishossifrage It doesn't have to be a literal "Y", just some way for us to tell what your program's decision was. As for the bonus: you get the bonus if the way you choose to display your program's decision is a valid limerick (in both the yes and no cases). – Wander Nauta – 2014-03-24T12:16:01.607

Well, this is boring. I'm starting a bounty. – Mathieu Rodic – 2014-03-25T10:24:56.580

Seems like your entry is hard to beat, Mathieu! I'll double the bounty. – Wander Nauta – 2014-03-25T10:53:52.780

Oops, looks like I can't do that. Sorry. – Wander Nauta – 2014-03-25T10:55:33.023

@WanderNauta: well there's always these Perl answers which are much shorter than any other one... so maybe there's something lying there – Mathieu Rodic – 2014-03-25T21:40:19.830

@MathieuRodic As I said in the Sandbox, I'm secretly hoping for someone who's insane enough to throw GolfScript against this question - but I guess Perl would work as well. – Wander Nauta – 2014-03-25T21:42:07.707

Answers

8

Python: 400 - 150 - 150 = 100

The shortest script I could come up with is this one...

import re,sys;f,e,c=re.findall,lambda l,w:f('^'+w.upper()+'  (.+)',l),lambda*v:all([a[i]==a[v[0]]for i in v]);a=[sum([[e(l,w)[0].split()for l in open('c')if e(l,w)][0]for w in f(r'\w+',v)],[])[-2:]for v in sys.stdin];n=len(a);print n==14and c(0,3,4,7)*c(1,2,5,6)*c(8,11)*c(9,12)*c(10,13)*"Sonnet"or"For a critic\nOf limerick\nWell-equipped\nIs this script.\n%s limerick!"%(n==5and c(0,1,4)and c(2,3))

...but don't even try it. It parses the provided dictionary for every word it meets, thus being very slow. Also, an error is generated whenever a word is not in the dictionary.

The code still meets the requirements though: recognizing whether the text passed via stdin is a limerick, a sonnet, or neither of those.

With only 20 more characters, here is the optimized version:

import re,sys;f,e,c=re.findall,lambda l:f(r'^(\w+)  (.+)',l),lambda*v:all([a[i]==a[v[0]]for i in v]);d={e(l)[0][0]:e(l)[0][1].split()for l in open('c')if e(l)};a=[sum([d.get(w.upper(),[])for w in f(r'\w+',v)],[])[-2:]for v in sys.stdin];n=len(a);print n==14and c(0,3,4,7)*c(1,2,5,6)*c(8,11)*c(9,12)*c(10,13)*"Sonnet"or"For a critic\nOf limerick\nWell-equipped\nIs this script.\n%s limerick!"%(n==5and c(0,1,4)and c(2,3))

Features

  • able to recognize sonnets (-150)
  • answers to limericks with a limerick (-150)
  • relatively fast: only one file parsing per execution

Usage

cat poem.txt | python poem-check.py

3 different outputs are possible:

  • a limerick saying that the input is a limerick, if it is one
  • a limerick saying that the input is not a limerick, if it is not one
  • "Sonnet" if the input is recognized as such

Expanded code with explanations

import re, sys

# just a shortened version of the 're.findall' function...
f = re.findall
# function used to parse a line of the dictionary
e = lambda l:f(r'^(\w+)  (.+)', l)

# create a cache of the dictionary, where each word is associated with the list of phonemes it contains
d = {e(l)[0][0]:e(l)[0][1].split(' ') for l in open('c') if e(l)}

# for each verse (line) 'v' read from 'sys.stdin', build the list of phonemes of all its words;
# each item of the resulting list 'a' holds the last two phonemes of one verse
a = [sum([d.get(w.upper(), []) for w in f(r'\w+',v)],[])[-2:] for v in sys.stdin]

# let's store the length of 'a' in 'n'; it is actually the number of verses in the input
n = len(a)
# function used to compare the rhymes of the lines whose indexes are passed as arguments
c = lambda*v:all([a[i] == a[v[0]] for i in v])

# test if the input is a sonnet, aka: it has 14 verses, verses 0, 3, 4 and 7 rhyme together, verses 1, 2, 5 and 6 rhyme together, verses 8 and 11 rhyme together, verses 9 and 12 rhyme together, verses 10 and 13 rhyme together
if n==14 and c(0,3,4,7) and c(1,2,5,6) and c(8,11) and c(9,12) and c(10,13):
    print("Sonnet")
else:
    # test if the input is a limerick, aka: it has 5 verses, verses 0, 1 and 4 rhyme together, verses 2 and 3 rhyme together
    is_limerick = n==5 and c(0,1,4) and c(2,3)
    print("For critics\nOf limericks,\nWell-equipped\nIs this script.\n%s limerick!" % is_limerick)
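
The rhyme test used above (comparing the last two phonemes of each verse) can be tried without the dictionary file. The following is a Python 3 sketch under stated assumptions: `PRON` is a tiny hand-written stand-in for the file 'c', and `rhyme_key` and `matches_scheme` are illustrative names. As in the code above, lines with different scheme letters are not required to differ, and words missing from the dictionary contribute no phonemes.

```python
import re

# Hand-written entries in CMU-style notation, for illustration only.
PRON = {
    "EYES": ["AY1", "Z"],
    "SIZE": ["S", "AY1", "Z"],
    "WIDE": ["W", "AY1", "D"],
    "ASIDE": ["AH0", "S", "AY1", "D"],
    "SURPRISE": ["S", "ER0", "P", "R", "AY1", "Z"],
    "BEES": ["B", "IY1", "Z"],
    "WASP": ["W", "AA1", "S", "P"],
}


def rhyme_key(line, pron):
    """Last two phonemes of a line; unknown words contribute nothing."""
    phonemes = []
    for word in re.findall(r"\w+", line):
        phonemes += pron.get(word.upper(), [])
    return phonemes[-2:]


def matches_scheme(keys, scheme):
    """True if lines sharing a letter in `scheme` have identical rhyme keys."""
    if len(keys) != len(scheme):
        return False
    first = {}
    for key, letter in zip(keys, scheme):
        # Remember the first key seen for each letter, compare later ones to it.
        if first.setdefault(letter, key) != key:
            return False
    return True


LIMERICK = "AABBA"
SONNET = "ABBAABBACDECDE"  # same grouping as the c(...) calls above

poem = [
    "There was a Young Lady whose eyes",
    "Were unique as to colour and size",
    "When she opened them wide",
    "People all turned aside",
    "And started away in surprise",
]
keys = [rhyme_key(line, PRON) for line in poem]
print(matches_scheme(keys, LIMERICK))
```

Expressing the schemes as strings keeps the limerick and sonnet checks in one function, at the cost of a few more characters than the golfed version.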

Mathieu Rodic

Posted 2014-03-22T14:22:01.990

Reputation: 1 170

Looks cool! I haven't tested it yet, but are you sure this takes input "either on the command line or through standard input" (see question)? If not, you should add that (probably a sys.stdin.read() or an open(sys.argv[1]).read() somewhere) and recount. – Wander Nauta – 2014-03-22T21:15:39.713

Okay! Corrected it :) – Mathieu Rodic – 2014-03-22T21:26:49.840

How does the algorithm check for rhymes? – DavidC – 2014-03-23T00:29:50.743

With the help of the file provided by Wander Nauta in the question! It really helped. – Mathieu Rodic – 2014-03-23T00:37:50.473

@DavidCarraher: I added some explanation to the code, and made it a bit more readable... – Mathieu Rodic – 2014-03-23T12:16:36.520

Fine. Even better if you add examples of its output. – DavidC – 2014-03-23T13:04:13.810

Clever! I'm not sure if I can award the output-is-a-limerick bonus though, as it only outputs a limerick if the input is a limerick and just plain False otherwise. – Wander Nauta – 2014-03-23T14:02:19.947

Well... I tried to write two limericks and one sonnet in less than 250 characters, but it seems a tidbit out of reach. Would a haiku do? – Mathieu Rodic – 2014-03-23T14:30:09.690

Nope, has to be limericks. Maybe you could use some kind of compression? Dictionary-based, perhaps? – Wander Nauta – 2014-03-23T15:28:57.913

Dictionary-based compression? Already considered, tried, and failed. The dictionary contains 115824 entries, so most indexes are as long as the original word. Neither will lz or bz2 give satisfying results... – Mathieu Rodic – 2014-03-23T17:28:45.117

Then I'm afraid you can't have the bonus - sorry... – Wander Nauta – 2014-03-23T17:37:12.327

I edited the post... it now displays the 'nay' as a limerick too. – Mathieu Rodic – 2014-03-23T17:41:06.963

Neat! A shame I can't upvote you twice. – Wander Nauta – 2014-03-24T12:13:58.030

2

ECMAScript 6 (138 points; try in Firefox):

288 - 150 points bonus for including limerick (pinched from @MathieuRodic).

a=i.split(d=/\r?\n/).map(x=>x.split(' '));b=/^\W?(\w+) .*? (\w+\d( [A-Z]+)*)$/;c.split('\r\n').map(x=>b.test(x)&&eval(x.replace(b,'d["$1"]="$2"')));e=f=>d[a[f][a[f].length-1]];alert('For critics\nOf limericks,\nWell-equipped\nIs this script.\n'+(a[4]&&e(0)==e(1)&e(0)==e(4))+' limerick!')

Notes:

Expects the variable c to contain the contents of the dictionary file, as you can't read files in plain ECMAScript.

ECMAScript doesn't have standard input, but prompt is generally considered "standard input"; however, as prompt converts line breaks to spaces in most (if not all) browsers, I'm accepting input from the variable i.

Ungolfed code:

// If you paste a string with multiple lines into a `prompt`, the browser replaces each line break with a space, for some reason,
// so the input is taken from the variable `i` instead (see the notes above).
//input = prompt();
input = i;

// Split into lines, with each line split into words
lines = input.split('\n').map(x => x.split(' '));

dictionaryEntryRegEx = /^\W?(\w+) .*? (\w+\d( [A-Z]+)*)$/;
dictionary = {};
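// Split the dictionary into lines; for each valid entry, store the rhyming tail of its pronunciation under the word
c.split(/\r?\n/).map(x => dictionaryEntryRegEx.test(x) && eval(x.replace(dictionaryEntryRegEx, 'dictionary["$1"] = "$2"')));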

// Get the last word in the line
getLastWordOfLine = (lineNumber) => dictionary[lines[lineNumber][lines[lineNumber].length - 1]]

alert('For critics\nOf limericks,\nWell-equipped\nIs this script.\n' + (lines[4] && getLastWordOfLine(0) === getLastWordOfLine(1) && getLastWordOfLine(0) === getLastWordOfLine(4)) + ' limerick!');
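
To see what the dictionary regular expression keeps, here is a Python sketch using the same pattern (with the inner group made non-capturing). It assumes dictionary lines of the form `WORD  PH1 PH2 ...`, with two spaces after the word; `rhyming_tail` is a hypothetical helper name. The captured tail runs from the last vowel phoneme (the last one carrying a stress digit) to the end of the pronunciation.

```python
import re

# Same pattern as dictionaryEntryRegEx, inner group made non-capturing.
ENTRY = re.compile(r"^\W?(\w+) .*? (\w+\d(?: [A-Z]+)*)$")


def rhyming_tail(entry_line):
    """Return (word, tail) for a dictionary line, or None if it does not match."""
    m = ENTRY.match(entry_line)
    return (m.group(1), m.group(2)) if m else None


# Illustrative entries written by hand in CMU-style notation.
for entry in ["EYES  AY1 Z", "SURPRISE  S ER0 P R AY1 Z", "ASIDE  AH0 S AY1 D"]:
    print(rhyming_tail(entry))
```

Two words then count as rhyming when their tails are equal strings, which is the comparison the `alert` line performs.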

Toothbrush

Posted 2014-03-22T14:22:01.990

Reputation: 3 197

Neat! This doesn't take 'input on the command line or through standard input', though, which is required by the question. Maybe you could rewrite it to use Node.js or something. – Wander Nauta – 2014-04-01T10:22:33.467

@WanderNauta Thank you. Please see the latest edit, as I explain why I'm not using the standard input. – Toothbrush – 2014-04-01T10:25:59.840