3

A recent discussion about ways to keep data secure through multiple iterations of a program's execution (with repeated read/write operations) raised a question about known-plaintext attacks that I'd like a solid answer to.

If I have multiple ciphertexts, and I know that they were all produced from the same plaintext using the same (unknown) encryption method but different (unknown) keys, can I work out what algorithm and/or keys were used?

The argument was that if the encryption method is any good, there should be as little correlation as possible between the encrypted payload and the message content, so even with multiple copies it should be mostly indistinguishable from random data. My counter-argument is that no real-world algorithm is likely to be 100% perfect, so you're probably not going to get the ideal 50/50 chance of a 1 or 0 in any given bit, and with enough ciphertexts that you know come from the same plaintext, you'll start to see correlations between them that you can use to deduce things about how the encryption works and/or what content is hiding behind it (the kind of test I have in mind is sketched below).

Assuming that

  • the algorithm is truly encryption, not hashing; that is, it is two-way and intended to be decrypted at some point
  • the system generates encryption keys in a sufficiently random manner to be essentially unpredictable
  • the attacker has no access to the encryption keys, only the encrypted messages
  • the attacker can get as many different versions of the message as they like, each encrypted with a different key

is it possible to work out how the encryption works and thereby reverse it to get the original plaintext?
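
To make my counter-argument concrete, here's a rough sketch of the bias test I'm imagining. AES-CTR merely stands in for the unknown algorithm, and the plaintext and trial count are arbitrary (the pyca/cryptography library is assumed):

```python
# Rough sketch of the attack idea: collect many encryptions of the same
# plaintext under fresh random keys, then look for per-bit bias.
# AES-CTR is only a stand-in for the "unknown" algorithm.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

PLAINTEXT = b"attack at dawn!!"   # the fixed message known to be repeated
TRIALS = 10_000

def encrypt_once(plaintext: bytes) -> bytes:
    key = os.urandom(16)          # fresh random key per message
    nonce = os.urandom(16)        # fresh nonce per message
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(plaintext) + enc.finalize()

# Count how often each bit position is 1 across all ciphertexts.
bit_len = len(PLAINTEXT) * 8
ones = [0] * bit_len
for _ in range(TRIALS):
    ct = encrypt_once(PLAINTEXT)
    for i in range(bit_len):
        ones[i] += (ct[i // 8] >> (7 - i % 8)) & 1

# For a good cipher every frequency should hover around 0.5; a consistent
# deviation would be the kind of "correlation" I'm describing.
worst = max(abs(count / TRIALS - 0.5) for count in ones)
print(f"largest per-bit deviation from 0.5: {worst:.4f}")
```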

anaximander
  • Possible duplicate; check out http://security.stackexchange.com/questions/3989/how-to-determine-what-type-of-encoding-encryption-has-been-used – micb Aug 27 '13 at 15:06
  • I think this is sufficiently different. That question is asking how to determine the algorithm based on one output; I'm asking whether having multiple outputs makes it any easier. – anaximander Aug 27 '13 at 16:20

2 Answers

2

If the encryption system is any good, even noticing that two messages encrypted with distinct keys are, in fact, identical would be considered a "break". An encryption algorithm has a "security level" expressed in bits; when the security level is k, the computational effort needed to achieve a "break" should be on the order of 2^(k-1) invocations of the underlying encryption function (to be precise, the success probability should be e*2^(-k) where e is the effort, so e = 2^(k-1) implies a 50% success probability for the attacker).

For instance, if using AES with a 128-bit key, then you are supposed to achieve a "128-bit" security level. This means that if the attacker is ready to invest enough CPU power to compute, say, 2^70 invocations of AES, then he has a success probability of about 2^(-58), aka "1 in 288230376151711744". 2^70 AES invocations is a substantial effort; a good multi-core CPU can produce about 2^30 AES invocations per second, so we are talking about ten thousand big PCs running for... three years. And that achieves a success rate considerably lower than the odds of winning millions of dollars at the lottery.
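
Spelling out the arithmetic (the per-CPU rate and machine count are the rough estimates above):

```python
# Back-of-the-envelope check of the figures above.
effort = 2**70                          # AES invocations the attacker will pay for
security_level = 128                    # bits, for AES-128
success = effort / 2**security_level    # e * 2^-k
print(success)                          # ~3.5e-18, i.e. about 1 in 2^58

rate_per_cpu = 2**30                    # rough AES invocations per second per big CPU
machines = 10_000
seconds = effort / (rate_per_cpu * machines)
print(seconds / (365 * 24 * 3600))      # ~3.5 years
```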

What counts as a break is anything that makes the system depart from its theoretical model, i.e. a pseudorandom permutation: using the cipher with a given key amounts to selecting a permutation of the space of messages, and a new key should mean a new random selection, independent of the previous ones. Noticing that two encrypted messages, under two distinct keys, proceed from the same plaintext would be a break, and the best attack method ought to be the generic one, i.e. trying all possible keys until a match is found. This method yields the probabilities explained above, i.e. it "does not realistically work".
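
For concreteness, here is a toy sketch of that generic attack, with the key space artificially shrunk to 2^16 possibilities so that the search actually finishes (AES on a single ECB block stands in for the cipher, and the pyca/cryptography library is assumed):

```python
# Toy version of the generic attack: the key space is shrunk to 2^16
# (only the last two key bytes vary), so exhaustive search actually finishes.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def aes_block(key: bytes, data: bytes, decrypt: bool = False) -> bytes:
    c = Cipher(algorithms.AES(key), modes.ECB())   # single 16-byte block only
    op = c.decryptor() if decrypt else c.encryptor()
    return op.update(data) + op.finalize()

def toy_key(i: int) -> bytes:
    return bytes(14) + i.to_bytes(2, "big")        # 2^16 possible keys

secret = os.urandom(16)                            # unknown shared plaintext
c1 = aes_block(toy_key(1234), secret)              # same plaintext, encrypted
c2 = aes_block(toy_key(40000), secret)             # under two different keys

# Decrypt c1 under every candidate key, then look for a key that decrypts
# c2 to one of those candidate plaintexts ("trying all possible keys").
candidates = {aes_block(toy_key(i), c1, decrypt=True): i for i in range(2**16)}
for j in range(2**16):
    p = aes_block(toy_key(j), c2, decrypt=True)
    if p in candidates:
        print("same plaintext detected under toy keys", candidates[p], "and", j)
        break
```

With real 128-bit keys, the same loops would take on the order of 2^128 AES invocations, which is exactly the infeasible effort quantified above.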

Right now, no such breaking method is known for AES.


Beware of outdated thinking. You appear to use notions of "correlation" and "unknown method" which were adequate about fifty years ago, but cryptographic research has moved forward quite a bit since then. In particular:

  • We are now very strict about acceptable information leaks (your "correlations"), namely we accept none. And yet we have candidate algorithms (e.g. AES) which seem to achieve this high level of quality.

  • Encryption methods have been split into an algorithm and a key. The "algorithm" is what becomes code. The algorithm is supposed to be public, because all the secrecy is concentrated in the key; at least, there should be no harm in publishing the algorithm if it is any good, and it would be very hard to prevent such publication anyway. See this answer for details.


Of course, if the algorithm is used sloppily, then anything goes. A core algorithm like AES is only an elementary brick; it processes 16-byte blocks. To encrypt a message (a possibly long sequence of bytes), you have to use the block cipher with a mode of operation, which is a matter of subtlety and can be done poorly -- sufficiently poorly to imply a total security loss.
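
For illustration, a minimal sketch of what "block cipher plus mode of operation" looks like when done carefully, assuming the pyca/cryptography library and AES-GCM with a fresh random nonce per message (the function names are illustrative):

```python
# Sketch: AES used through a proper mode of operation (AES-GCM here),
# with a fresh random nonce for every message.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)

def encrypt(message: bytes) -> bytes:
    nonce = os.urandom(12)                 # never reuse a nonce under one key
    return nonce + AESGCM(key).encrypt(nonce, message, None)

def decrypt(blob: bytes) -> bytes:
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, None)

# Two encryptions of the same plaintext under the same key look unrelated,
# because the random nonce makes every ciphertext distinct.
m = b"the same message, encrypted twice"
print(encrypt(m).hex())
print(encrypt(m).hex())
assert decrypt(encrypt(m)) == m
```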

There can also be side-channel attacks that exploit information leaked through the behaviour of the encryption implementation, e.g. through timing. For instance, many AES implementations use in-memory tables, which implies data-dependent and key-dependent memory accesses, and thus cache behaviour; in lab conditions, this has sometimes been exploited to extract the key. This is not an attack against the algorithm but against its implementation. The bottom line is that implementing cryptographic algorithms is a tricky task.

All of the above is about symmetric encryption, but the same kind of reasoning can be applied to asymmetric encryption, e.g. with RSA. A 2048-bit RSA key, properly generated and used, ought to offer a security level of about 112 bits, which is more than enough.

Thomas Pornin
  • I think perhaps you missed the gist of the question... *"even noticing that two messages encrypted with distinct keys are, in fact, identical, would be considered a "break"."* You haven't *noticed* that the plaintext is identical; the idea is that it's known ahead of time that the same plaintext was used to produce the messages you're seeing. It's not quite a known-plaintext attack - you know that the same plaintext was used each time, but you don't know what that plaintext was. – anaximander Aug 27 '13 at 16:18
  • "Noticing" means a _distinguisher_. The "thought experiment" is about making the difference between the situation with two identical messages, and the situation with two distinct messages. If you can _recover_ the message in the first case, then you can also distinguish the two cases from each other, by simply applying the recovery method and see if that works. In that sense, "noticing" the difference is a weaker problem, so bounding the success rate of the attacker for "noticing" implies _at least_ the same bound for the "recovery". – Thomas Pornin Aug 27 '13 at 16:47
  • The attacker can't tell from the encrypted messages that the plaintexts match; they know that from their knowledge of the system. If you handed the messages alone to someone with no context, they (probably) wouldn't be able to tell. I get what you're saying about distinguishing the two cases by applying the recovery method; my question is whether you can work out that recovery method in the first place. It seems circular to say you can work out a recovery method by analysing messages with matching plaintexts, and you can spot matching plaintexts by testing with the recovery method. – anaximander Aug 27 '13 at 21:06
  • The point of the argument is that if recovery is feasible, then so is distinguishing (it suffices to temporarily assume that the messages are identical, "recover" them, and then see whether the result matches what was obtained). This implies that if distinguishing is _not_ feasible, then neither is recovery. – Thomas Pornin Aug 27 '13 at 21:54
0

The simple answer is no. Strong cryptography is really strong if used properly. For block ciphers, a lot depends on the mode of operation. The mode controls what inputs are used for each block. Encryption is deterministic, so given the same input it will always produce the same output; however, with clever use of a random IV and block chaining it becomes possible to produce indistinguishable messages.

https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation

An early and simplistic mode was ECB, and it generally isn't used anymore because messages are distinguishable. Encrypting "hello world" under a given key will always produce the exact same output. However, even with ECB, given an arbitrary number of plaintext/ciphertext pairs, the attacker cannot derive the key faster than brute force.
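
A quick way to see that distinguishability (a sketch using the pyca/cryptography library; the key and the 16-byte message are arbitrary):

```python
# ECB is deterministic: same key + same plaintext -> same ciphertext.
# CBC with a fresh random IV hides the repetition.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)
msg = b"hello world....."               # exactly one 16-byte block (no padding)

def ecb_encrypt(data: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(data) + enc.finalize()

def cbc_encrypt(data: bytes) -> bytes:
    iv = os.urandom(16)                 # fresh random IV each time
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return iv + enc.update(data) + enc.finalize()

print(ecb_encrypt(msg) == ecb_encrypt(msg))   # True: repeats are visible
print(cbc_encrypt(msg) == cbc_encrypt(msg))   # False: repeats are hidden
```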

Later, stronger block modes make messages even less likely to be distinguishable. Now, the key phrase is "used properly". The IV should not be repeated for a given key and plaintext. In some block modes, a repeated IV only makes messages distinguishable, but in CTR, for example, reusing the same IV with the same key results in a compromise of the system.
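
To illustrate why nonce reuse in CTR is fatal (again a sketch with the pyca/cryptography library; the two messages are made up):

```python
# Reusing the same nonce with the same key in CTR mode reuses the keystream,
# so XORing the two ciphertexts cancels it out and leaks the XOR of plaintexts.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)
nonce = os.urandom(16)                  # mistakenly reused for both messages

def ctr_encrypt(data: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(data) + enc.finalize()

p1 = b"transfer $100 to alice"
p2 = b"transfer $999 to mallet"
c1, c2 = ctr_encrypt(p1), ctr_encrypt(p2)

# The attacker learns p1 XOR p2 without ever touching the key.
leaked = bytes(a ^ b for a, b in zip(c1, c2))
assert leaked == bytes(a ^ b for a, b in zip(p1, p2))
print(leaked.hex())
```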

The simple answer is that the cryptographic primitives of modern cryptography are well understood, peer reviewed, and subject to extensive cryptanalysis. It is highly unlikely they will be compromised overnight. On the other hand, a flawed implementation can result in a failed system.

Gerald Davis