Using your language of choice, write the shortest function/script/program you can that will identify the word with the highest number of unique letters in a text.
- Unique letters should include any distinct character using UTF-8 encoding.
- Upper and lower case versions of the same character are different and distinct;
'a' != 'A'
- Words are bound by any whitespace character.
- 'Letters' are any symbol which can be represented by a single unicode character.
- The text document must be read in by your code -- no preloading/hard-coding of the text allowed.
- The output should be the word, followed by the count of unique letters.
llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch - 18
- Any delimiter/spacing between the two values is up to you, so long as there is at least one character to separate them.
- In the event more than one word exists with the highest count, print all words for that count, with one new line delimiting.
superacknowledgement - 16
pseudolamellibranchiate - 16
- This is code golf, so shortest code wins.
This answer on English.SE inspired me to create this challenge. The example uses just a word list, but any text should be able to be processed.
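For reference, the rules above can be sketched as an ungolfed Python function. This is only one reading of the spec, not a competitive answer: the names `most_unique_words` and `main` are made up for illustration, words are split on runs of Unicode whitespace, "letters" are counted as distinct code points, and a word that appears several times in the text is reported once (the rules do not say either way, so that last point is an assumption).

```python
import sys


def most_unique_words(text):
    """Return (words, count): the words with the most distinct characters.

    Hypothetical helper for illustration; not part of the challenge.
    """
    best_count = 0
    best_words = []
    # str.split() with no argument splits on runs of Unicode whitespace.
    for word in text.split():
        # set(word) holds the distinct code points, so 'a' and 'A' differ.
        count = len(set(word))
        if count > best_count:
            best_count, best_words = count, [word]
        elif count == best_count and word not in best_words:
            # Assumption: a repeated word is listed only once.
            best_words.append(word)
    return best_words, best_count


def main():
    """Read the text from standard input and print one 'word - count' line per winner."""
    words, count = most_unique_words(sys.stdin.read())
    for word in words:
        print(f"{word} - {count}")
```

Calling `main()` reads the text from standard input, which is one way to satisfy the "read in by your code" rule; how the input bytes are decoded depends on the interpreter's environment, so UTF-8 input is assumed here.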
How are words separated? You say unique letters are any UTF-8 character, but that would imply that the entire file is just one word. – cardboard_box – 2013-02-05T14:54:46.307
@cardboard_box, my interpretation is that it's left flexible, so that you can decide to use code point 10 as a word separator, or code point 32, or treat any non-empty sequence of characters drawn from the 26 Unicode whitespace characters as a word separator. – Peter Taylor – 2013-02-05T15:15:23.100
@PeterTaylor Correct. I'll update the question to make note of the whitespace. – Gaffi – 2013-02-05T15:32:02.803
How are you defining letters here? As I've just been and pointed out on one of the English.SE answers, LlanfairPG is a Welsh word and contains letters from the Welsh alphabet - ll and ch are both single letters in the Welsh language. – Gareth – 2013-02-05T16:12:32.447
If the function is required to read the input, why require it be a (parameterless) function? It seems that a script would be more than adequate. – primo – 2013-02-05T16:19:50.657
@Gareth I was not aware of that distinction, my mistake. Are there unicode representations of those two 'letters'? For the purposes of this challenge, each individual unicode character is a letter. – Gaffi – 2013-02-05T16:20:40.510
@Gaffi No, because they're easily represented as two characters from the English alphabet, there's no point in having a separate Unicode character for them. I wasn't having a go - just wanted to be sure of what you meant by 'letter'. If you mean Unicode character, that's perfectly clear. – Gareth – 2013-02-05T16:22:36.950
@primo Fair enough. Any code implementation will work - script, function, complete program, what have you - so long as the code reads in the text. – Gaffi – 2013-02-05T16:23:33.410
@Gareth That's all it is, Uni char. :-) – Gaffi – 2013-02-05T16:24:42.383
Can you clarify "The text document must be read in by your code"? Can the input/document be a function parameter? Do you expect us to load a filestream? What about a prompt box in javascript? – Shmiddty – 2013-02-05T20:40:02.087
@Shmiddty All of the above are ok. Essentially, I meant for that rule to mean 'no cheating'. I intentionally left it vague, since I know some implementations will be smaller with reading STDIN vs. passing an argument vs. opening/scanning a text file. – Gaffi – 2013-02-05T21:08:44.133
One last question, do upper and lower-case characters count separately? – Shmiddty – 2013-02-05T21:27:05.790
Yes, they are different and distinct. Updating question... – Gaffi – 2013-02-05T21:33:48.533
So abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()_+~\<>/\?'";:{}[],.` is a valid "word"? – Shmiddty – 2013-02-05T21:44:15.387
Off-topic, but apparently there used to be single letters for LL and ll in Welsh. At least Unicode has U+1EFA and U+1EFB for those; "Middle-Welsh" it calls them. There is no titlecase Ll though. – Mr Lister – 2013-02-06T09:15:06.410
@Shmiddty Yep, that's valid. – Gaffi – 2013-02-06T12:36:19.477