36

Let's say I have a file containing a random bunch of bits and then I encrypt it using some modern algorithm (Blowfish, AES, or whatever). If someone captures the file and mounts a brute force attack on it, how will they know they've successfully decrypted it? It would be obvious if the original input was meaningful, but in this case the original input was just a bunch of random bytes. How would they know, if at all, that this was the CORRECT bunch of random bytes?

Rory McCune
  • 60,923
  • 14
  • 136
  • 217
M. Evans
  • 369
  • 3
  • 3
  • 13
    Your premise is extremely unrealistic. In practice it's almost never like that: the encrypted file almost always has some structure. Answers to the question you asked are likely to be misleading about the real world. Therefore, the best answer to this question may well be to un-ask the question and ask a different one. – D.W. Jan 19 '11 at 07:22
  • 5
I think the question is legit. Take an encrypted binary file: how does brute-forcing software know it is done decrypting it? I actually wonder why hiding documents in binaries that print them out isn't done more often; wouldn't that deceive the cracker? – Lorenzo Aug 20 '11 at 23:18
  • 2
    @L. De: A binary executable file still has quite some structure - almost all random files are not successfully executable. – Paŭlo Ebermann Aug 23 '11 at 12:50
• @D.W. - I just looked at the TrueCrypt file format: http://www.truecrypt.org/docs/?s=volume-format-specification it seems all fields except the salt are encrypted. – ripper234 Jan 14 '12 at 12:46

8 Answers

30

They wouldn't. The problem you described is well-addressed in the Wikipedia article on Unicity Distance. That article also links to one by Bruce Schneier that may be more accessible.

PulpSpy
  • 2,204
  • 15
  • 19
12

Generally, people do not encrypt random garbage. If they did, a ciphertext-only attack would be impossible.

Consider, however, that common cryptographic schemes add non-random data to the message, such as padding and message authentication codes. In some schemes, these can be used to check whether a guessed key is correct.
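To illustrate the point above, here is a minimal sketch (not any real product's format) of an encrypt-then-MAC construction, using a toy XOR keystream so it stays self-contained. Even when the plaintext is pure random bytes, the authentication tag tells a brute-forcer exactly when a guessed key is correct:

```python
import hashlib
import hmac
import os

def encrypt_then_mac(key: bytes, plaintext: bytes) -> bytes:
    """Toy encrypt-then-MAC: XOR with a key-derived keystream, then append an HMAC tag."""
    stream = hashlib.sha256(key).digest()
    keystream = (stream * (len(plaintext) // 32 + 1))[:len(plaintext)]
    ct = bytes(p ^ k for p, k in zip(plaintext, keystream))
    tag = hmac.new(key, ct, hashlib.sha256).digest()
    return ct + tag

def key_is_correct(key_guess: bytes, blob: bytes) -> bool:
    """The MAC lets an attacker recognize the right key without knowing the plaintext."""
    ct, tag = blob[:-32], blob[-32:]
    expected = hmac.new(key_guess, ct, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)

random_plaintext = os.urandom(64)            # "a random bunch of bits"
blob = encrypt_then_mac(b"secret", random_plaintext)
assert key_is_correct(b"secret", blob)       # right key: tag verifies
assert not key_is_correct(b"guess1", blob)   # wrong key: tag fails
```

Note the XOR cipher here is deliberately weak and only stands in for a real cipher; the point is that the MAC, not the plaintext, is what makes key guesses verifiable.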

5

In general they can't break the encryption (if you encrypt a random file, such as a secret key, or even a compressed file). But if the attacker has some other access to the sender's or receiver's system, they can mount other types of attack and may still break the encryption of a random file. You can read more here:

Rory Alsop
  • 61,367
  • 12
  • 115
  • 320
Am1rr3zA
  • 3,043
  • 4
  • 17
  • 14
  • 1
    a brute force can break encryption. The problem the OP raises is that if the encrypted data was random as cleartext you will not know when the brute force has worked successfully, as opposed to readable cleartext. – Rory Alsop Jan 18 '11 at 10:05
  • @Rory I understand it, maybe I can't explain clearly, I mean attacker for understanding that he/she break the encrypted message(random message) correctly need some extra info that he/she gain from Chosen-plaintext and .... – Am1rr3zA Jan 18 '11 at 12:17
3

If the cleartext is just random, then they won't know. However, most file types have some sort of recognizable structure. Passing blob A through algorithm B with key C will yield some output; if that output has structure to it, then you've got a winner.
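A crude sketch of such a structure check (the 0.8 threshold is an arbitrary assumption, not a standard value): text-like output is almost entirely printable ASCII, while uniformly random bytes are printable only about 38% of the time.

```python
def looks_structured(data: bytes, threshold: float = 0.8) -> bool:
    """Crude structure check: fraction of printable-ASCII bytes in a decryption candidate."""
    printable = sum(1 for b in data if 32 <= b < 127 or b in (9, 10, 13))
    return printable / max(len(data), 1) >= threshold

assert looks_structured(b"Dear Alice, the meeting is at noon.")
assert not looks_structured(bytes(range(256)))   # ~38% printable
```

Real tools would use better statistics (byte histograms, chi-squared tests against expected distributions), but the principle is the same: structure in the output flags a likely key.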

bahamat
  • 1,061
  • 7
  • 11
2

They don't. TrueCrypt takes advantage of this very fact to offer plausible deniability through 'hidden volumes': http://www.truecrypt.org/docs/?s=hidden-volume

Rushyo
  • 627
  • 1
  • 5
  • 13
  • Plausible deniability stems from the fact that encrypted data looks random, it doesn't change the data encrypted before encryption. – Hubert Kario Aug 23 '11 at 13:45
• Which does not, in any way, affect anything I wrote? The fact is you don't know what the decrypted data is supposed to look like, so it's impossible to know when you've found it. Hidden volumes are a practical case of that concept being applied. – Rushyo Aug 24 '11 at 12:21
  • Plausible deniability doesn't use double encryption (encrypted volume inside encrypted volume) it puts two volumes side by side in the same file. That's why if you write too much data to the "for-cops" volume you can overwrite your "secret" volume. – Hubert Kario Aug 24 '11 at 12:49
  • ...and I never mentioned anything about double encryption. You're arguing a straw man. – Rushyo Aug 24 '11 at 12:58
  • 1
    Then you haven't answered the question OP asked. Plausible deniability doesn't decrypt the same data with different keys to get different results. It decrypts different data using different keys. So just because in one file you have two volumes doesn't make checking if the found key is correct any different, both volumes will have a highly organized data inside (the file system). It also doesn't change the way the brute-force (or dictionary) attack on the encryption is performed. – Hubert Kario Aug 24 '11 at 13:48
  • "If someone breaks encryption, how do they know they're successful?" "They don't." There, question answered. The TrueCrypt reference just added some flavour and context. The methodology of that and how it affects any particular method of breaking any encryption is irrelevant /mute – Rushyo Aug 25 '11 at 09:39
2

Your question appears twofold to me. First of all, a good encryption algorithm produces output indistinguishable from randomness, so nobody can even distinguish your ciphertext from random data. Furthermore, a falsely decrypted ciphertext will also look random. Thus an exhaustive key search will not be able to identify the random data originally fed into the cipher.

Secondly, the point is that people usually don't feed random data into a cipher. Current encryption standards cover not only the algorithm but also how the data is formatted. People usually prefix and suffix the data with certain padding and metadata (block ciphers work on blocks of fixed length, so you have to specify somehow how long your text was). Knowing these standard bits obviously makes the decision between the right and the wrong key easier :) While padding serves mostly practical purposes, it may also enhance or even ensure security properties (cf. the chosen-plaintext attack on textbook RSA).
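As a concrete example of padding acting as a key-correctness hint, here is a check for PKCS#7-style padding (the scheme used with many block ciphers). A wrong key produces essentially random output, which ends in valid padding only with probability of roughly 1/256, so this check alone filters out almost all bad guesses:

```python
def pkcs7_padding_valid(candidate: bytes, block_size: int = 16) -> bool:
    """True if a decryption candidate ends in valid PKCS#7 padding."""
    if not candidate or len(candidate) % block_size != 0:
        return False
    n = candidate[-1]                       # last byte gives the pad length
    return 1 <= n <= block_size and candidate[-n:] == bytes([n]) * n

assert pkcs7_padding_valid(b"hello world\x05\x05\x05\x05\x05")   # 11 bytes + 5 pad bytes
assert not pkcs7_padding_valid(b"\x37" * 16)                     # 0x37 > block size
```

This is also why padding must be handled carefully in real protocols: the same check, exposed to an attacker, becomes a padding oracle.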

freddyb
  • 521
  • 3
  • 9
1

If the file is Unicode/ANSI/etc., you can write an algorithm that parses something like the first 200 characters of the file and checks whether there are more Latin characters than other characters.

I remember being quite annoyed when I tried a XOR brute-force attack on a simple Project Euler exercise, but it turned out to be easy: I just had to search for common English words.
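That kind of attack can be sketched in a few lines. This is a single-byte XOR brute force in the spirit of the Project Euler exercise mentioned above (the crib words and sample message are my own, not from the exercise): try every key and accept the one whose output contains common English words.

```python
def xor_decrypt(ct: bytes, key: int) -> bytes:
    """Single-byte XOR; XOR is its own inverse, so this also encrypts."""
    return bytes(b ^ key for b in ct)

def break_single_byte_xor(ct: bytes, cribs=(b" the ", b" and ", b" of ")):
    """Try all 256 keys; a candidate containing common English words wins."""
    for key in range(256):
        candidate = xor_decrypt(ct, key)
        if any(w in candidate for w in cribs):
            return key, candidate
    return None, None

secret = b"meet me at the usual place and wait"
ciphertext = xor_decrypt(secret, 0x5A)
key, recovered = break_single_byte_xor(ciphertext)
assert key == 0x5A and recovered == secret
```

With random plaintext instead of English, no candidate would match any crib, which is exactly the situation the question asks about.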

I read somewhere that in encryption software the implementation matters a great deal, sometimes more than the algorithms. When I read that, I wondered whether it referred to obvious subjects like pseudo-random generators, or to less obvious details like hiding the format of the file you encrypt.

For example, if you encrypt a PNG or GIF file, make sure to remove the magic number/string those file formats contain; and if it is a text file, don't use the ASCII table: use your own character table, for example putting all the Latin characters at 0, numbers at 245-255, and so on. You could also apply a permutation, or ROT13, or something else.

Algorithms such as AES or Blowfish/Twofish are considered secure because no attack significantly more efficient than brute force is publicly known: in practice you can only decipher the text by finding the actual key.

But those algorithms are only theoretically strong; you must implement them with other practical factors in mind, such as file size, use of compression, text encoding, etc.

For example, it would be just plain stupid to store the filename in cleartext next to your encrypted file.

jokoon
  • 593
  • 1
  • 5
  • 8
• Actually, modern symmetric ciphers (in secure modes of operation) should keep the rest of the document secret even if the start is a known header. (This is known as a known-plaintext attack.) – Paŭlo Ebermann Aug 23 '11 at 13:02
  • 1
    Security of ciphers lies in the fact that no one (as in many people) have shown how to break them, the *only* algorithm that's mathematically secure is a one time pad. – Hubert Kario Aug 23 '11 at 13:48
0

Besides the theoretically ignorant approach by some "pros" in the field, there's a rather simple answer to your question: by comparing the "brute-force decryption results" with obvious stuff for categorization purposes.

Let me give you practical examples: text files will most probably contain "stop words" ( http://en.wikipedia.org/wiki/Stop_words ). Find more than one stop-word after decryption and you've most probably decoded some text successfully.

Media files like images and most other filetypes (audio, video, etc.) all have specific headers, making identification of the kind of data rather easy. Think you've decrypted a jpeg because you detected a look-alike header? Check the file format. Looks like it's correct? Then you've most probably successfully decrypted a jpeg media file.
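The header check described above can be sketched like this. The magic bytes listed are the real, published signatures for these formats; the helper itself is just a hypothetical illustration of how a brute-forcer might categorize decryption candidates:

```python
# Well-known file signatures ("magic numbers") for a few common formats.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
}

def identify(candidate: bytes):
    """Return the format whose magic bytes the decryption candidate starts with, if any."""
    for magic, name in MAGIC.items():
        if candidate.startswith(magic):
            return name
    return None

assert identify(b"\x89PNG\r\n\x1a\n" + b"chunk data...") == "png"
assert identify(b"\x00\x01 random-looking bytes") is None
```

A random decryption candidate matches an 8-byte PNG signature with probability 2^-64, so a hit is essentially proof the key is right.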

I could provide books of examples... but I bet you know what I mean by now, don't you? ;)

The rest is up to a little "human verification" so you know when to stop brute-forcing whatever you're trying to decrypt.

Is it practical? No. But it can be done when coded correctly and it does make the job easier. I've seen it in action several times; not only in a corporate environment.