At http://shakespeare.mit.edu/ you can find the full text of each of Shakespeare's plays on one page (e.g. Hamlet).
Write a script that reads the URL of a play from stdin, such as http://shakespeare.mit.edu/hamlet/full.html, and writes to stdout the number of text characters each play character spoke, sorted from the character who spoke the most to the one who spoke the least.
The play/scene/act titles obviously do not count as dialogue, nor do the character names. Italicized text and [square bracketed text] are not actual dialogue and should not be counted. Spaces and other punctuation within dialogue should be counted.
(The format for the plays looks very consistent, though I have not looked at them all. Tell me if I've overlooked anything. Your script does not have to work for the poems.)
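For anyone who wants an ungolfed starting point, here is a rough Python sketch of the filtering rules above. It rests on assumptions about the markup that I have not checked against every play: that speaker names sit inside `<b>` tags, that dialogue sits inside `<blockquote>` tags, and that stage directions are wrapped in `<i>`. The names `SpeechParser` and `extract_speeches` are made up for illustration. Dropping newlines and trimming each speech are also my own choices, since the rules do not say whether that whitespace counts.

```python
import re
from html.parser import HTMLParser


class SpeechParser(HTMLParser):
    """Collects [speaker, text] pairs under the assumed markup."""

    def __init__(self):
        super().__init__()
        self.speeches = []      # one [speaker, text] entry per speech
        self.in_name = False    # inside <b>: assumed to hold a speaker name
        self.in_speech = False  # inside <blockquote>: assumed to hold dialogue
        self.italic_depth = 0   # inside <i>: assumed to be stage directions
        self.name_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.in_name = True
            self.name_parts = []
        elif tag == "blockquote":
            self.in_speech = True
        elif tag == "i":
            self.italic_depth += 1

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_name = False
            self.speeches.append(["".join(self.name_parts).strip(), ""])
        elif tag == "blockquote":
            self.in_speech = False
        elif tag == "i" and self.italic_depth:
            self.italic_depth -= 1

    def handle_data(self, data):
        if self.in_name:
            self.name_parts.append(data)
        elif self.in_speech and not self.italic_depth and self.speeches:
            # Assumption: line breaks in the source are layout, not dialogue.
            self.speeches[-1][1] += data.replace("\n", "")


def extract_speeches(html):
    """Return (speaker, spoken_text) tuples with bracketed text removed."""
    parser = SpeechParser()
    parser.feed(html)
    parser.close()
    cleaned = []
    for speaker, text in parser.speeches:
        text = re.sub(r"\[[^\]]*\]", "", text)  # drop [square bracketed text]
        cleaned.append((speaker, text.strip()))
    return cleaned


# Hand-written sample in the assumed markup, not copied from the site.
sample = (
    "<a name=speech1><b>Messenger</b></a>"
    "<blockquote><a name=1>I will.</a><br></blockquote>"
    "<a name=speech2><b>BEATRICE</b></a>"
    "<blockquote><a name=2>Do. [Aside]</a><br><p><i>Exit</i></p></blockquote>"
)
print(extract_speeches(sample))
```

On the sample this keeps "I will." for Messenger and "Do." for BEATRICE, discarding both the bracketed aside and the italic stage direction. For a real run, the page text could be fetched with something like `urllib.request.urlopen(input().strip())`; titles or other bold text outside speeches may need extra handling if the real pages differ from the assumed markup.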
Example
Here is a simulated section from Much Ado About Nothing to show what I expect for output:
More Ado About Nothing
Scene 0.
Messenger
I will.
BEATRICE
Do.
LEONATO
You will never.
BEATRICE
No.
Expected output:
LEONATO 15
Messenger 7
BEATRICE 6
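The counting and sorting step for this example can be sketched in Python as follows. This is only an ungolfed illustration: it assumes the speeches have already been extracted into (speaker, text) pairs, and the function name `tally` is hypothetical.

```python
from collections import Counter


def tally(speeches):
    """Sum spoken characters per speaker, largest total first."""
    totals = Counter()
    for speaker, text in speeches:
        totals[speaker] += len(text)  # spaces and punctuation count
    return totals.most_common()


# The simulated section above, already split into (speaker, text) pairs.
speeches = [
    ("Messenger", "I will."),
    ("BEATRICE", "Do."),
    ("LEONATO", "You will never."),
    ("BEATRICE", "No."),
]

for name, count in tally(speeches):
    print(name, count)
```

BEATRICE's two speeches are summed into one total, which is why she ends up with 6 rather than two entries of 3.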
Scoring
This is code golf. The smallest program in bytes will win.
What if someone did this Shakespeare challenge in Shakespeare? It would be amazing if that was even possible... – fuandon – 2014-07-25T20:37:25.063
Can we assume we have a list of the characters in the play? Or must we infer the characters from the text? The latter is very difficult given that some characters (e.g. Messenger) have a mix of upper and lower case letters. Others have names with only upper case letters (e.g. LEONATO); and some of those are compound names. – DavidC – 2014-07-25T21:54:32.760
Yes, you should infer the names. They are formatted very differently than the dialogue, so, given the HTML, differentiating them shouldn't be too tricky. – Calvin's Hobbies – 2014-07-25T22:40:02.877
Yes, perhaps if one works directly with the HTML... – DavidC – 2014-07-25T22:54:14.387
Should 'All' be considered as a separate character? – es1024 – 2014-07-25T23:09:59.710
@DavidCarraher Well I'm not sure why you wouldn't use the html unless you couldn't... I'll ignore the names rule iff you are using the Shakespeare language. – Calvin's Hobbies – 2014-07-25T23:58:29.277
@es1024 Yes. Any play character with a unique title is considered separate, even if the result does not exactly make sense. – Calvin's Hobbies – 2014-07-26T00:01:44.387
Doesn't really qualify, but playing with JavaScript just for fun -> FIDDLE – adeneo – 2014-07-26T00:53:29.503