Your challenge is to write a program to translate (English) leetspeak/lolspeak/txtspk into normal English. Your program should read from standard input and output to standard output, unless your language does not support these.
You may use a file containing a list of words in the English language, separated by new lines. It should be called W and will be located in the same directory as your program. (On GNU/Linux systems and possibly others, you can make W a link to /usr/share/dict/words.) The list doesn't have to be all-lowercase; you can use it to determine whether words should have capitals.
This is based on a now-deleted question posted by Nikos M. which could be found here. This is not a duplicate, as the original question was closed and did not receive any answers: it had no winning criterion and the user was unwilling to add one.
Scoring
The scoring is a bit complicated!
Your score is
(leet items + bonuses) * 10 / (code length)
Highest score wins.
Your program doesn't have to be and probably can't be perfect, but the more accurate it is, the more bonuses it gets!
Since $ can mean both s and S, you get a bonus of 5 points per leet item for deciding whether it should have a capital letter (i.e. capital letters at the start of sentences).
You get a further bonus of 5 points per leet item for implementing proper nouns (words which always have capitals) - the way this works is that you would look through the word list, make the output capitalised if only a capitalised version is present in the list, and if both versions are there, just guess.
If a character has two meanings (e.g. 1 can mean L or I), you get 20 points per leet item for only picking those translations of the item which make real English words - use the wordlist for this. If more than one translation of a leet item makes a real English word, you can arbitrarily pick one of the valid translations and still get the bonus.
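The wordlist-based bonuses above could be approached roughly as follows. This is a minimal sketch, not a golfed answer: the LEET table is a small hypothetical subset of the item list, and the names candidates and translate are illustrative only. It expands every ambiguous character into all of its readings, keeps the first expansion found in the wordlist, and capitalises the result when only the capitalised form is listed.

```python
from itertools import product

# Hypothetical subset of the leet list: each character maps to its possible
# lowercase readings. "1" is ambiguous (i or l), the others are not.
LEET = {
    "1": "il",
    "3": "e",
    "0": "o",
    "$": "s",
    "5": "s",
    "7": "t",
    "@": "a",
    "4": "a",
}

def candidates(token):
    """Yield every lowercase expansion of token under the LEET table."""
    options = [LEET.get(ch, ch.lower()) for ch in token]
    for combo in product(*options):
        yield "".join(combo)

def translate(token, words):
    """Return an expansion of token that is in words, or token unchanged.

    words is a set of dictionary entries with their original capitalisation.
    """
    for cand in candidates(token):
        if cand in words:
            # Lowercase form is listed. If both forms exist, this guesses lowercase.
            return cand
        if cand.capitalize() in words:
            # Only the capitalised form is listed: treat it as a proper noun.
            return cand.capitalize()
    return token

# Tiny stand-in for the real wordlist W, for demonstration only.
demo_words = {"leet", "is", "Lisa"}
print(translate("1337", demo_words))  # leet
print(translate("l1$@", demo_words))  # Lisa
```

With the real file, the set could be built with something like set(open("W").read().splitlines()). Sentence-start capitalisation is not handled here and would need a separate pass over the output.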
List of Leet
These are the leet items which you may implement. You don't have to implement all of them, but the more you add, the more points you get.
You cannot ever score points by translating an item or character to itself. This rule overrides any mistakes I might have made in the list.
It's tempting to do a simple tr or s/.../.../g. The real challenge is to determine which of multiple meanings could and couldn't be correct, using the wordlist.
Leet Items (each of these adds 1 to leet items in the formula)
$ -> s,S
( -> c,C
5 -> s,S
@ -> a,A
4 -> a,A
3 -> e,E
7 -> t,T
+ -> t,T
# -> h,H
teh -> the
'd -> ed
pwnd -> pwned
pwnt -> pwned
k,K -> OK
kk -> OK
0 [zero] -> o,O
y,Y -> why
4 -> for
txt -> text
dafuq -> what the f**k
/\,^ -> a,A
\/ -> v,V
d00d -> dude
n00b -> newbie
\/\/ -> w,W
8 -> b,B
|_| -> u,U
|-| -> h,H
Я -> r,R
j00 -> you
joo -> you
vv,VV -> w,W
tomoz -> tomorrow
|< -> k,K
[),|) -> d,D
<3 -> love
>< -> x,X
10100111001 -> leet (binary representation of 1337)
2 -> to,too
ur,UR -> your,you're (no need to correctly distinguish between the two)
u,U -> you
8 -> -ate-,8
x,X -> -ks-,-cks-
z,Z -> s,S
1 -> i,I,l,L
! -> i,I,!
c,C -> see,C,sea
b,B -> be,B,bee
[accented letter] -> [non-accented form] (score 1 per accented letter supported)
&,7 -> and,anned,ant (may be used in the middle of a word)
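Several items in the list overlap (for example \/ and \/\/, or 0 and n00b), so the order in which substitutions are tried matters. The sketch below shows one way to handle this, assuming a single fixed translation per item and ignoring capitalisation and the wordlist. TABLE is a hypothetical subset of the list, and substitute is an illustrative name.

```python
import re

# Hypothetical subset of the item list, with one chosen translation per item.
TABLE = {
    "\\/\\/": "w",
    "\\/": "v",
    "|-|": "h",
    "|_|": "u",
    "n00b": "newbie",
    "$": "s",
    "(": "c",
    "#": "h",
    "@": "a",
    "3": "e",
    "0": "o",
}

# Sort keys longest first so that \/\/ is tried before \/ and n00b before 0.
# re.escape is needed because most leet items are regex metacharacters.
PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(TABLE, key=len, reverse=True))
)

def substitute(text):
    """Replace every leet item in text with its translation from TABLE."""
    return PATTERN.sub(lambda match: TABLE[match.group(0)], text)

print(substitute("$|_|(# @ n00b"))  # such a newbie
```

Without the longest-first ordering, \/\/ would come out as vv rather than w, and n00b as noob rather than newbie.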
Harder "Leet": score 30 points for leet items each
!!!1!!1 -> !!!!!!! (translate 1's in a sequence of !'s into !'s)
!!!one! -> !!!!!
!eleven -> !!!
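The three harder cases could be handled with a single regular-expression pass. The sketch below assumes that a run always starts with at least one ! and that "one" stands for a single 1 and "eleven" for two. The function name fix_exclamations is illustrative only.

```python
import re

# A run is one "!" followed by any mix of "!", "1", "one" and "eleven".
RUN = re.compile(r"!(?:!|1|one|eleven)*", re.IGNORECASE)

def fix_exclamations(text):
    """Turn 1, one and eleven inside a run of !'s into !'s."""
    def repl(match):
        run = match.group(0)
        run = re.sub("eleven", "!!", run, flags=re.IGNORECASE)  # 11 -> two !'s
        run = re.sub("one", "!", run, flags=re.IGNORECASE)      # 1 -> one !
        return run.replace("1", "!")
    return RUN.sub(repl, text)

print(fix_exclamations("!!!1!!1"))  # !!!!!!!
print(fix_exclamations("!!!one!"))  # !!!!!
print(fix_exclamations("!eleven"))  # !!!
```

Text that does not follow a ! is left alone, so an ordinary "one" or "1" elsewhere in a sentence is untouched. A word such as "oneself" directly after a ! would be mangled, so a real entry would probably want an extra boundary check.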
Examples
These are examples of what a program which implements all the leet characters above, and some of the bonuses, might be able to do:
Example sentence: |-|3 15 $|_|(# @ n00b
= He is such a newbie
Leet-based censorship: $#!+
= s**t
Extreme leet: \/\/ 1 |< 1 P 3 [) 1 A
= Wikipedia
-xor suffix: H4X0R
= hacker
More extreme leet: @1\/\/4Y5 p0$+ ur n3VV qu35710nz 1n teh $&80x
= Always post your new questions in the sandbox
Example Scoring
Bash, 10 characters, 3 items, no bonuses:
tr 137 let
This scores (1 * 3) * 10 / 10 = 3.
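The formula can be checked with a small helper. This is a sketch under two assumptions that the post does not settle: each harder case simply adds 30 to the leet item count, and per-item bonuses apply only to the ordinary items. The per-item reading of the bonuses follows the clarification given in the comments. The function and parameter names are illustrative.

```python
def score(items, code_length, bonus_per_item=0, hard_cases=0):
    """Score = (leet items + bonuses) * 10 / code length."""
    # Assumption: each harder case counts as 30 leet items.
    leet_items = items + 30 * hard_cases
    # Assumption: bonuses are earned per ordinary item only.
    bonuses = bonus_per_item * items
    return (leet_items + bonuses) * 10 / code_length

# The Bash example: 3 items, no bonuses, 10 characters.
print(score(3, 10))  # 3.0

# Nine items with the 5-point capitalisation bonus, in a hypothetical
# 100-character program: (9 + 45) * 10 / 100.
print(score(9, 100, bonus_per_item=5))  # 5.4
```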
Sorry I didn't catch this in the sandbox, but if you're multiplying the bonuses by 10 they are still worth a lot more than the words themselves. Is that your intention? – Martin Ender – 2014-05-30T09:26:50.950
@m.buettner It's to combat simply using tr or s/.../.../g. Just translating things like that would make a boring challenge, so we need to reward better translations which use the wordlist – None – 2014-05-30T09:28:54.503
Would a long series of regexps be permissible? I would love to see if it were possible (albeit hard) to do this even in a context-aware way in mostly regexps. (Or maybe a sed script.) – Isiah Meadows – 2014-05-30T18:19:20.957
When I say a sed script, I mean more than a simple s/.../.../g, but a file that is parsed and executed by sed itself. As terse as the language is, it might be a decent golfable language... – Isiah Meadows – 2014-05-30T18:20:22.743
@impinball Regexes are absolutely fine, although I have no idea how you would open the wordlist and parse it with a regex language alone. sed scripts are also fine and could be very interesting, they could do very well on this due to the short substitution syntax, you might be able to read from the wordlist, either with GNU extensions or by using sed as part of a larger Bash program – None – 2014-05-30T18:25:32.543
I would use GNU extensions and prefix the program with #!/usr/bin/sed -f as the top line to be runnable as a normal program. – Isiah Meadows – 2014-05-30T18:29:59.173
@impinball The shebang won't count towards the total score as even without it you can easily run the program with sed -f <FILENAME>. Options passed to the interpreter normally count towards the score but since -f just makes sed read the program from a file, it won't count. TL;DR: Shebangs don't count towards code length; I think that sed or a combination of sed and Bash is the right tool for the job here. Maybe awk as well, although I don't know it? and of course GolfScript might turn up – None – 2014-05-30T18:39:29.637
I'm pretty sure that pwned itself isn't english yet :) – orion – 2014-06-01T18:47:02.747
@orion It basically is – None – 2014-06-01T19:03:12.130
The first example above isn't parsed by the rules provided. There isn't a rule defining "(" as "c". Does it need to be added? – rdans – 2014-06-03T20:48:19.113
@professorfish can you please clarify the scoring. If these items were supported: "$(5@437+#" thats 9 leet items and also gain the capitalization bonus, does that work out as ((9 + 5) * 10 / len) or would it be ((9 + 45) * 10 / len)? You say above that the bonus is 5 per leet item so that would be 5 * 9 = 45 bonus points correct? Also, for the !!!11! bonus, is it 30 bonus points for each of the 3 cases or does it need all of "1", "one" and "eleven" together for the 30 bonus points? – rdans – 2014-06-05T19:45:55.293
@Ryan add 5 per item for the capitalisation, so ((9+45)*10/len). it's to penalise simple substitutions. as for the 30pt bonus, 30 points per case – None – 2014-06-05T20:51:37.060