Write a program or function that, given a string (or your language's equivalent), determines whether the string is a word and outputs a truthy or falsy value accordingly.
(This is not a duplicate of Is this even a word? The incorrect words are generated in a very different way that I believe makes this a completely different challenge)
The words will all be lowercase, between 5 and 10 characters, and have no apostrophes.
The correct words are a randomly selected subset of the SCOWL English words list (size 50).
The incorrect words are generated via two methods: swapping and substitution.
The "swapping" words are generated using a modified Fisher-Yates shuffle on the letters of randomly selected (real) words. Instead of swapping the letters every time, a letter may or may not be swapped (the probability varies, so some words will be more realistic than others). If the new word matches an existing word, the result is discarded and another word is generated.
The "substitution" words are generated using a similar method, but instead of swapping the letter with another letter, each letter has a chance of being replaced with another random letter.
Each method is used to generate 50% of the fake words.
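As a rough illustration of the two methods described above, here is a minimal Python sketch. It is not the generator from the resources link; the function names, the `rng` parameter, and the probability range in `make_fake` are all assumptions made for the example, since the post only says that the probability varies.

```python
import random
import string


def swap_word(word, p, rng):
    """Fisher-Yates pass where each swap happens only with probability p."""
    letters = list(word)
    for i in range(len(letters) - 1, 0, -1):
        if rng.random() < p:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            letters[i], letters[j] = letters[j], letters[i]
    return "".join(letters)


def substitute_word(word, p, rng):
    """Replace each letter, with probability p, by a different random letter."""
    out = []
    for c in word:
        if rng.random() < p:
            out.append(rng.choice([x for x in string.ascii_lowercase if x != c]))
        else:
            out.append(c)
    return "".join(out)


def make_fake(real_words, method, rng):
    """Generate candidates until one does not match a known real word."""
    known = set(real_words)
    pool = sorted(known)
    while True:
        # Assumed range; the challenge does not state how the probability varies.
        p = rng.uniform(0.1, 0.9)
        candidate = method(rng.choice(pool), p, rng)
        if candidate not in known:
            return candidate
```

With a low `p` most letters stay in place, which is presumably why some of the fake words look more realistic than others.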
Scoring
Your function must be less than 150 bytes. The scoring is determined as follows:
percentage of answers correct + ((150 - length of program) / 10)
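The formula above can be written as a small helper. This is only a sketch: the name `score` is made up for the example, and it assumes the percentage is expressed on a 0 to 100 scale.

```python
def score(percent_correct, program_bytes):
    """Score = percentage correct + (150 - program length) / 10."""
    if program_bytes >= 150:
        raise ValueError("program must be less than 150 bytes")
    return percent_correct + (150 - program_bytes) / 10


# Example: 70% correct with a 100-byte program.
print(score(70, 100))  # 75.0
```

Under that assumption, each byte saved is worth 0.1 points, the same as gaining 0.1 percentage points of accuracy.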
Rules
Since this deals with a large number of test cases (each wordlist is 1000 words), an automated testing program is fine. The automated tester does not count towards the length of the program; however, it should be posted so that others are able to test it.
- No loopholes.
- No spelling/dictionary related built-ins.
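An automated tester of the kind mentioned above could look roughly like the following sketch. The names `accuracy` and `load` are hypothetical, and it assumes the two word lists have been saved locally as UTF-8 text files with one word per line.

```python
def load(path):
    """Read one word per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def accuracy(is_word, words, not_words):
    """Percentage (0 to 100) of inputs that is_word classifies correctly."""
    correct = sum(1 for w in words if is_word(w))
    correct += sum(1 for w in not_words if not is_word(w))
    return 100 * correct / (len(words) + len(not_words))
```

For example, `accuracy(f, load("words.txt"), load("notwords.txt"))` would give the percentage term of the score for a submission `f`, where the two file names are placeholders for local copies of the lists.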
Resources
List of words: http://pastebin.com/Leb6rUvt
List of not words (updated): http://pastebin.com/rEyWdV7S
Other resources (SCOWL wordlist and the code used to generate the random words): https://www.dropbox.com/sh/46k13ekm0zvm19z/AAAFL25Z8ogLvXWTDmRwVdiGa?dl=0
I know having a word list available would defeat the point of the challenge, but how is a program supposed to know that grits (404 in not-word list) isn't a word, considering it really really is one? – Geobits – 2016-02-17T00:44:34.063
Don't get me wrong; I don't like grits at all. Why anyone would eat them is beyond me. But even I wouldn't go so far as to reject the word itself :P – Geobits – 2016-02-17T00:46:23.360
Fair warning: I haven't gone any farther than that on the list, so it's possible there are others. – Geobits – 2016-02-17T00:49:21.220
Few more word non-words, some slightly obscure: quais, paves, colic, supermax. (Side note: I was delighted to find out that supermax is an actual word) – Sp3000 – 2016-02-17T01:03:17.453
What's weird is even running the list through SCOWL's largest list didn't catch any of those, though they definitely are words. I generated another 1000 words (they are in the other resources link), and if a not-word is actually a word, it will be replaced by the word on the alternate list that's at the same line number. In the meantime, I updated the pastebin thing to use the alternate words for the word not-words. – Daniel M. – 2016-02-17T01:34:32.347
króna seems to be in the list, with an accent. Ditto crèche, clichéd. – Sp3000 – 2016-02-17T05:26:11.153
@Sp3000 Looks like it dug into French, as crèche and cliché are actual words ^^ – Katenkyo – 2016-02-17T08:33:26.990
@Katenkyo I'm not doubting they're words, I just thought that "lowercase" meant a-z without accents :P (or do we have to handle accents too? @DanielM) – Sp3000 – 2016-02-17T08:35:58.253
@Sp3000 For the purpose of what is a real, lowercase word, lowercase letters with accents count. However, they are only a small percentage of the words, so they shouldn't have too big of an impact on the score. – Daniel M. – 2016-02-17T11:37:54.287
@Katenkyo They are French words, but they are also English words. – Daniel M. – 2016-02-17T11:40:25.457
@DanielM. Did they keep their accent even in English? I thought they disappeared as in Theatre or Cafe. – Katenkyo – 2016-02-17T12:18:03.727
@Katenkyo If I search google for "cliche definition" (without an accent), it pulls up the definition of "cliché" (with an accent). Same with crèche. Króna depends who you ask, though it's a bit more disputed. I'll leave it for now (it's not like half the words have accents). – Daniel M. – 2016-02-17T12:24:21.173