Fifth Grade Math Problem for the Week: Automated Decryption of Caesar Ciphers

4

My fifth-grade daughter is learning about codes and ciphers. She has reached the point of decoding a ciphertext that has been encoded with the Caesar cipher where the offset is not provided in advance.

Without computers this can be time-consuming, especially for long blocks of text. Luckily, we do have computers, and let's prove it.

Cipher definition and other bounds:

  1. Every letter is offset by a fixed (but unknown) number of characters, while retaining the same case. Spaces and punctuation marks are unaffected. Thus, with offset 1, "A"→"B" and "c"→"d" but "," stays as ",".
  2. Offset will be a positive integer.
  3. English plaintexts only, and no accent marks in the ciphertext or the plaintext.

Your task: Given a ciphertext, determine the most likely correct plaintext via brute force with a dictionary lookup. Thus, generate all 26 possible solutions, then check for the presence of every word in each potential decryption in an English dictionary and select the potential solution with the highest percentage of words that are found there.

The ciphertext should be the only input to your program, and the plaintext should be the only output.

For the sake of this challenge: garbage in, garbage out — given an undecryptable input, your code can return any output. And if multiple solutions tie for % found in the dictionary, you can return any of them.

Sample inputs:

"Fsyvoxmo kxn mywwsddoo woodsxqc."

"Gwbqs Sadwfs Ghfwysg Poqy, Ghof Kofg vog pssb bchvwbu pih o pzif ct wbobs waousg kwhv ob slqszzsbh gcibrhfoqy."

"Pda pnqpd sehh oap ukq bnaa. Xqp jkp qjpeh ep eo bejeodaz sepd ukq."

Your system should accept any string as input and attempt to decode it, and must be able to decode these examples correctly. I completed an ungolfed solution in Mathematica in about 700 characters. Standard rules apply; no hard-coded solutions, etc. Smallest number of bytes wins.

Michael Stern

Posted 2015-10-26T13:20:31.060

Reputation: 3 029

7Is the dictionary excluded from the byte count? You really should recommend a specific dictionary to be fair to all answers. Also, you might on some edge cases get different "most likely" texts depending on whether you're considering the presence/absence of a word in a dictionary, or the true frequency of that word in the English language. Is the test of validity of an answer simply its ability to correctly decode the three sample inputs? – Level River St – 2015-10-26T13:39:37.160

@steveverrill External dictionary of your choice (or if there's one built into your programming language, you can use it). I don't think the three examples I've given above are likely to be edge cases; all working code is going to give the same answers. – Michael Stern – 2015-10-26T13:41:38.573

You still haven't said if loading in the dictionary counts to the byte count. – Blue – 2015-10-26T13:44:16.837

@muddyfish loading the dictionary counts. – Michael Stern – 2015-10-26T13:46:18.500

Also, do you want to keep capitalization? – Blue – 2015-10-26T13:48:34.417

@muddyfish yes, keep capitalization and punctuation. – Michael Stern – 2015-10-26T13:49:49.127

For most inputs, e will be the most common letter. Therefore you won't even need a dictionary. – TheNumberOne – 2015-10-26T14:18:40.493

1@TheNumberOne Mfe esle laaczlns tdy'e lwhljd rztyr ez hzcv, td te? – r3mainer – 2015-10-26T14:26:38.733

Never mind, 3rd test case doesn't work :( – TheNumberOne – 2015-10-26T14:41:26.747

I'll just confirm, is it allowed to take the dictionary contents as input in addition to the text? Or do I have to read it from a file? – PurkkaKoodari – 2015-10-26T15:20:32.627

@Pietu1998 I'm adding the rule that dictionary contents should not be passed to the program. – Michael Stern – 2015-10-26T15:30:11.333

Answers

1

Python 2, 384 319

a=raw_input()
e=lambda v,b:(ord(v)+b-65-32*(v>"``"))%26+65+32*(v>"``")
c=lambda b:''.join(chr(e(f,b))if f.isalpha() else f for f in a)
p=map(c,range(26))
w=map(str.strip,open('w'))
l=[0]*26
for k,i in enumerate(p):
    for j in i.split():
        if j.lower() in w:l[k]+=1
m=[i for i,j in enumerate(l)if j==max(l)][0]
print p[m]

I named my wordlist w, which is a newline-separated list of lowercase words.

TheDoctor

Posted 2015-10-26T13:20:31.060

Reputation: 7 793

1

Pyth, 39 34 31 bytes

h.Msm}@Grd0'\wcZd.u.r.rNrG1G25z

Solves all test cases. Expects the newline-separated lowercase word list to be in the file w.

You can also try it online. The online version expects input on the first line and the dictionary words in lowercase on the next lines. Note that trying to send a large dictionary to the server may not be a good idea.

The algorithm tries to maximize the amount of correct words in the text, so if there are impossible-to-decipher words they are just ignored. In case of completely impossible input just outputs it back.

Explanation

  • z is preinitialized to input.
  • .u ... 25z repeats the following process 25 times, starting with the input, and returns all 26 results:
    • .rNrG1 shifts uppercase letters (rG1) by one
    • .r.rNrG1G shifts all letters by one
  • .M finds the ones from said results that produce the maximum result for:
    • cZd splits the current string by spaces
    • sm counts the occurrences where:
      • rd0 lowercases the word
      • @G removes non-alphabetic letters
      • } ... '\w checks whether the word is in the file w

PurkkaKoodari

Posted 2015-10-26T13:20:31.060

Reputation: 16 699

How does this handle upper case letters? – Michael Stern – 2015-10-26T15:04:43.137

@MichaelStern It correctly converts them when generating possible results and then converts to lowercase when verifying. – PurkkaKoodari – 2015-10-26T15:05:47.797

Seriously? dictionary words in lowercase on the next lines – TheDoctor – 2015-10-26T15:13:19.080

1@TheDoctor That's 1 byte shorter than using a file and the way to load the dictionary wasn't specified. – PurkkaKoodari – 2015-10-26T15:17:25.893

I have clarified the rules, disallowing passing the dictionary as an argument. Ingenious, but outside the scope of the problem. – Michael Stern – 2015-10-26T15:31:30.633

@MichaelStern I have changed it to use a file. – PurkkaKoodari – 2015-10-26T15:33:32.570