In this challenge, your task is to write a program of fewer than 300 characters that takes a short paragraph or a few sentences that a candidate has said and outputs who said it.
Input: Can be taken as a parameter to a function, as input to a program, etc. It will be a short paragraph, properly punctuated.
Output: The candidate that you think it is. This could be one of
Ben Carson (1)
Ted Cruz (2)
John Kasich (3)
Marco Rubio (4)
Donald Trump (5)
Hillary Clinton (6)
Bernie Sanders (7)
I've left off the names of people who have dropped out as of March 1st. You may output the name itself, or, more conveniently, the number that corresponds to the name.
Scoring: Your score is the percentage of test cases you get right. Highest score wins. Ties (or perfect scores) are broken by code length as in a code golf.
The test cases can be pulled from:
http://www.presidency.ucsb.edu/debates.php
Click on each debate, both Democratic and Republican, that has happened so far (before March 1st). Every paragraph is a test case, unless the "paragraph" is less than 20 characters long.
Here is code that pulls out the test cases from a particular page:
var t = $(".tools").parentNode.querySelectorAll("p");
var categ = {}, cur = 0;
for (var i = 0; i < t.length; ++i) {
    var p = t[i], str = p.innerText;
    if (p.querySelector("b")) {
        cur = p.querySelector("b").innerText.replace(':', '');
        str = str.replace(/^.*?:\s/, '');
    }
    str = str.replace(/\[applause\]/g, '');
    if (str.length < 20) continue;
    if (categ[cur] == null) categ[cur] = [];
    categ[cur].push(str);
}
You can then do categ.SANDERS to get a list of all the paragraphs that Senator Sanders has said.
You can discard anything that isn't said by the candidates listed above (e.g. categ.BUSH or categ.CHRISTIE).
Here is the file with all the test cases: https://drive.google.com/file/d/0BxMn8--P71I-bDZBS2VZMDdmQ28/view?usp=sharing
The file is organized by candidate
CANDIDATE CANDIDATE_LAST_NAME
(empty line)
Series of statements. Each paragraph is separated by (NEW PARAGRAPH)-
(empty line)
CANDIDATE NEXT_CANDIDATE_LAST_NAME
(empty line)
etc.
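The layout described above can be read back into per-candidate lists with a few lines of JavaScript. This is only a minimal sketch: it assumes each header line is the literal word CANDIDATE followed by the last name, that paragraphs are split on the literal marker (NEW PARAGRAPH)-, and that no paragraph is broken across lines. The function name parseTestCases is hypothetical, and the real file may differ in details.

```javascript
// Hypothetical helper: parse the test-case file into { LASTNAME: [paragraphs] }.
// Assumes header lines look like "CANDIDATE SANDERS" and paragraphs are
// separated by the literal marker "(NEW PARAGRAPH)-".
function parseTestCases(text) {
  var categ = {};
  var cur = null;
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; ++i) {
    var line = lines[i].trim();
    if (line === '') continue;                 // skip the empty separator lines
    var header = /^CANDIDATE\s+(\S+)$/.exec(line);
    if (header) {                              // start of a new candidate block
      cur = header[1];
      if (!categ[cur]) categ[cur] = [];
      continue;
    }
    if (cur === null) continue;                // ignore text before the first header
    var parts = line.split('(NEW PARAGRAPH)-');
    for (var j = 0; j < parts.length; ++j) {
      var s = parts[j].trim();
      if (s.length >= 20) categ[cur].push(s);  // keep the 20-character cutoff
    }
  }
  return categ;
}
```

With the file contents in a string, parseTestCases(text).SANDERS would then play the same role as categ.SANDERS from the scraping code.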
An example partial submission would be:
if (/ win | wall | great | beautiful/.test(p)) return 5;
if (/ percent | top one | rigged /.test(p)) return 7;
// etc. for all candidates
or
var words = p.split(' ');
// fewer than 4 words have 5 or more characters (i.e. mostly short words)
if (words.length - words.filter(a => a.length < 5).length < 4) evidence[5]++;
// at the end
return /* index with the most evidence */
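To make the "evidence" idea concrete, the two fragments above could be combined into one function along these lines. This is a sketch rather than a competitive entry: the function name classify, the evidence array layout, and the tie-breaking rule are assumptions, it ignores the 300-character limit, and the cue words are just the illustrative ones from the fragments.

```javascript
// Sketch of an evidence-tally classifier. Candidate numbers follow the list
// in the challenge (5 = Trump, 7 = Sanders); the cue words are illustrative.
function classify(p) {
  var evidence = [0, 0, 0, 0, 0, 0, 0, 0];   // index 0 unused, 1..7 = candidates
  if (/ win | wall | great | beautiful/.test(p)) evidence[5]++;
  if (/ percent | top one | rigged /.test(p)) evidence[7]++;
  var words = p.split(' ');
  var longWords = words.filter(function (w) { return w.length >= 5; }).length;
  if (longWords < 4) evidence[5]++;          // same word-length test as above
  // pick the index with the most evidence; ties go to the lowest number
  var best = 1;
  for (var i = 2; i <= 7; ++i) {
    if (evidence[i] > evidence[best]) best = i;
  }
  return best;
}
```

When no rule fires, this version falls back to candidate 1; a real entry would more likely fall back to whichever candidate has the most paragraphs in the test set.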
Here is a place where you can test javascript solutions: https://jsfiddle.net/prankol57/abfuhxrh/
The code uses the parameter p to represent the phrase to classify. Example code that scores around 20% (guessing would get around 11%):
if (/ rigged | top | percent | Wall Street /.test(p)) return 'Sanders';
return 'Trump';
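For scoring outside the fiddle, a percentage can be computed directly from a categ object like the one built by the scraping code. The sketch below is not the code from the linked fiddle; the names score and baseline are hypothetical, and it assumes the classifier returns a last name that matches a categ key when upper-cased.

```javascript
// Hypothetical scorer: percentage of paragraphs for which classify(p)
// returns the (case-insensitive) key under which the paragraph is stored.
function score(categ, classify) {
  var total = 0, right = 0;
  for (var name in categ) {
    if (!Object.prototype.hasOwnProperty.call(categ, name)) continue;
    categ[name].forEach(function (p) {
      ++total;
      if (String(classify(p)).toUpperCase() === name) ++right;
    });
  }
  return total === 0 ? 0 : 100 * right / total;
}

// The example submission above, wrapped in a function for testing.
function baseline(p) {
  if (/ rigged | top | percent | Wall Street /.test(p)) return 'Sanders';
  return 'Trump';
}
```

Calling score(categ, baseline) then returns the percentage of paragraphs classified correctly. A classifier that returns numbers instead of names would need a small lookup table from number to last name first.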
Exactly what I'm asking: Write a program/function in fewer than 300 characters that takes as input a phrase that a candidate has said and returns as output which candidate said it. Your score is the percentage of test cases you get right. Highest score wins.
Yes, I know that a lot of lines have [laughter] or [cheering] in them. These will not be removed. At worst, they are extra information you can ignore; at best, they are extra information you can use (e.g. I made this up, but maybe audience laughter is evidence that Marco Rubio is speaking). The test cases are as they appear in the text file.
I have a suggestion. How about you make it code-golf, but you have to get all the quotes right? Also, you might want to make the quotes a lot shorter, as this is a little ridiculous to solve as-is. – Cyoce – 2016-03-02T03:07:09.760
@Cyoce The point of the challenge is you look for key words and phrases that are associated with the candidates. You aren't supposed to get all of them right. – soktinpk – 2016-03-02T03:07:48.267
@Cyoce getting all the quotes right would be ridiculous (I think) considering the sheer number of quotes. – soktinpk – 2016-03-02T03:13:32.413
which is why I suggested reducing the quotes. Paired together, it would make this challenge a lot more manageable – Cyoce – 2016-03-02T03:15:10.050
@Cyoce The reasons I made this a code challenge are: a) if it is code golf, I have to limit the number of quotes, and I feel like the answer won't be interesting. In other words, the best answer will be some sort of hash that happens to work but isn't very general. b) The candidates have pretty distinct personalities and word choice so it can't be too hard. The point of the challenge is to figure out what kinds of things candidates say and apply them. – soktinpk – 2016-03-02T03:21:10.160
ok. I see what you're going for. I'll have to run a word-frequency count on the quotes to determine my approach – Cyoce – 2016-03-02T03:30:04.603
Clever challenge idea, may need some refining though. Have you considered posting in Sandbox for some feedback? – Ashwin Gupta – 2016-03-02T06:19:47.113
What is the winning criterion? (And why do you think that no-one will get a perfect score?) – Peter Taylor – 2016-03-02T11:00:32.870
@PeterTaylor The submission that classifies the most of the candidates' speech correctly wins. – user48538 – 2016-03-02T11:03:17.510
@PeterTaylor isn't [tag:test-battery] an objective winning criterion in itself? – plannapus – 2016-03-02T12:00:33.103
@soktinpk it would be nicer if you cleaned up the test cases (like one file for each candidate containing all test-cases clearly separated). Are elements between brackets such as [laughter] etc. part of the test-cases? – plannapus – 2016-03-02T12:06:45.243
@plannapus, I don't think so. It's like code-challenge: it tells you something about the winning criterion, but doesn't fully specify it. – Peter Taylor – 2016-03-02T13:29:27.277
@PeterTaylor fair enough, I see your point. – plannapus – 2016-03-02T14:12:11.597
@PeterTaylor I edited it, is it clearer now? Sorry for the delay. – soktinpk – 2016-03-06T00:33:34.423
Interesting challenge. Inspired by this perhaps? http://googleresearch.blogspot.co.uk/2016/02/on-personalities-of-dead-authors.html – Dave – 2016-03-06T08:51:27.837
The source data you provided is a little messy (difficult to parse automatically), which I think takes away some of the spirit of the challenge. I've made a cleaned-up version which uses one line per quote, with a blank line separating the next candidate name. This is much easier to parse in most languages. I've uploaded it here: https://drive.google.com/file/d/0B3uyVnkMpqbVSnVrZkVwTUhDODg (other than changing newlines, I've left the data untouched. That includes what looks like an encoding issue for –) – Dave – 2016-03-07T19:09:35.527