4
My fifth-grade daughter is learning about codes and ciphers. She has reached the point of decoding a ciphertext that has been encoded with the Caesar cipher where the offset is not provided in advance.
Without computers this can be time-consuming, especially for long blocks of text. Luckily, we do have computers, and let's prove it.
Cipher definition and other bounds:
- Every letter is offset by a fixed (but unknown) number of characters, while retaining the same case. Spaces and punctuation marks are unaffected. Thus, with offset 1, "A"→"B" and "c"→"d" but "," stays as ",".
- Offset will be a positive integer.
- English plaintexts only, and no accent marks in the ciphertext or the plaintext.
Your task: Given a ciphertext, determine the most likely correct plaintext via brute force with a dictionary lookup. Thus, generate all 26 possible solutions, then check for the presence of every word in each potential decryption in an English dictionary and select the potential solution with the highest percentage of words that are found there.
The ciphertext should be the only input to your program, and the plaintext should be the only output.
For the sake of this challenge: garbage in, garbage out — given an undecryptable input, your code can return any output. And if multiple solutions tie for % found in the dictionary, you can return any of them.
Sample inputs:
"Fsyvoxmo kxn mywwsddoo woodsxqc."
"Gwbqs Sadwfs Ghfwysg Poqy, Ghof Kofg vog pssb bchvwbu pih o pzif ct wbobs waousg kwhv ob slqszzsbh gcibrhfoqy."
"Pda pnqpd sehh oap ukq bnaa. Xqp jkp qjpeh ep eo bejeodaz sepd ukq."
Your system should accept any string as input and attempt to decode it, and must be able to decode these examples correctly. I completed an ungolfed solution in Mathematica in about 700 characters. Standard rules apply; no hard-coded solutions, etc. Smallest number of bytes wins.
7Is the dictionary excluded from the byte count? You really should recommend a specific dictionary to be fair to all answers. Also, you might on some edge cases get different "most likely" texts depending on whether you're considering the presence/absence of a word in a dictionary, or the true frequency of that word in the English language. Is the test of validity of an answer simply its ability to correctly decode the three sample inputs? – Level River St – 2015-10-26T13:39:37.160
@steveverrill External dictionary of your choice (or if there's one built into your programming language, you can use it). I don't think the three examples I've given above are likely to be edge cases; all working code is going to give the same answers. – Michael Stern – 2015-10-26T13:41:38.573
You still haven't said if loading in the dictionary counts to the byte count. – Blue – 2015-10-26T13:44:16.837
@muddyfish loading the dictionary counts. – Michael Stern – 2015-10-26T13:46:18.500
Also, do you want to keep capitalization? – Blue – 2015-10-26T13:48:34.417
@muddyfish yes, keep capitalization and punctuation. – Michael Stern – 2015-10-26T13:49:49.127
For most inputs, e will be the most common letter. Therefore you won't even need a dictionary. – TheNumberOne – 2015-10-26T14:18:40.493
1@TheNumberOne Mfe esle laaczlns tdy'e lwhljd rztyr ez hzcv, td te? – r3mainer – 2015-10-26T14:26:38.733
Never mind, 3rd test case doesn't work :( – TheNumberOne – 2015-10-26T14:41:26.747
I'll just confirm, is it allowed to take the dictionary contents as input in addition to the text? Or do I have to read it from a file? – PurkkaKoodari – 2015-10-26T15:20:32.627
@Pietu1998 I'm adding the rule that dictionary contents should not be passed to the program. – Michael Stern – 2015-10-26T15:30:11.333