How do poor-quality initialization vectors affect the security of CBC mode?

Question

(If the consensus is that this question belongs on crypto, rather than here, please feel free to [tell me to] migrate it.)

From what I have read (looking specifically at AES in Cipher Block Chaining mode), initialization vectors should be non-repeating, or better, under some circumstances at least, totally unpredictable. If we consider the following sequence of "weakening" IVs:

Cryptographically sound random number
Any old "random number"
A non-repeating, monotonically increasing, non-continuous counter (such as a high-resolution clock)
A 1-by-1 counter, large enough not to repeat in, say, 10 times the expected usefulness of the protected data.
A constant IV
An all-zeros IV

Now, as we weaken the IV, what attacks become possible, and at what stage in the weakening? I am particularly interested in storing data "at rest", and for the moment, without authentication.

After two excellent answers, I'd like to refine the question a bit with regard to possible attacks. Here are some more pieces of my puzzle, and a supplementary question:

The encrypted data is a credit card number.
I have, let's say, client records, with each client's card or cards associated with that client's record.
My principal, final objective is that I don't leak credit card numbers in the clear.

Now, my supplementary question is this: as far as I can see, all an attacker can do with a constant IV and key is to say "Aha, Client A and Client B use the same credit card"; just how much damage can be done with that?

score 11 · Accepted Answer · edited Oct 07 '21 at 06:47

There are two distinct "dangers" with CBC. Remember that CBC works the following way: to encrypt a block, first XOR it with the previous encrypted block. The IV is just the "previous encrypted block" for the very first block to encrypt. The idea is that a block cipher is a deterministic permutation: with the same key and the same input block, you get the same output. The XOR with the previous encrypted block is meant as a "randomization". So the dangers are:

Block collisions.
Chosen-plaintext attacks.
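The chaining described above can be sketched in Python. This is a minimal sketch, not production code. The standard library has no AES, so the sketch assumes a hypothetical toy block cipher in its place. That cipher, `toy_block_encrypt`, is a 4-round Feistel network built from HMAC-SHA256 and acts as a stand-in for a 16-byte block cipher. Padding is left out, and all function names are illustrative.

```python
import hashlib
import hmac
import os

BLOCK = 16  # bytes; same block size as AES

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    # Feistel round function: HMAC-SHA256 keyed by `key`, truncated to 8 bytes.
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:8]

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for AES: a deterministic keyed permutation of 16-byte blocks.
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, xor(left, _round(key, rnd, right))
    return left + right

def toy_block_decrypt(key: bytes, block: bytes) -> bytes:
    # Undo the Feistel rounds in reverse order.
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _round(key, rnd, left)), left
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # Plaintext must already be padded to a whole number of blocks.
    assert len(plaintext) % BLOCK == 0
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        # XOR with the previous ciphertext block (the IV for the first block), then encrypt.
        prev = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out

def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        # Decrypt, then XOR with the previous ciphertext block to undo the chaining.
        out += xor(toy_block_decrypt(key, block), prev)
        prev = block
    return out

key, iv = os.urandom(32), os.urandom(BLOCK)
msg = b"A" * BLOCK + b"B" * BLOCK
assert cbc_decrypt(key, iv, cbc_encrypt(key, iv, msg)) == msg
```

The block cipher on its own is deterministic: the same key and block always give the same output. Only the XOR with the previous block (or the IV) varies what actually gets encrypted.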

Block collisions are when, through bad luck or lack of randomness, the XOR of a block with the previous block leads to a value which was already obtained beforehand.

For instance, if you use a fixed IV (all-zero or not, it does not matter), then two messages which begin with the same sequence of bytes will yield two encrypted streams which also begin with the same sequence of bytes. This allows outsiders ("attackers") to see that the two files were identical up to some point, which can be pinpointed with block granularity. This is considered a bad thing; encryption is supposed to prevent such kinds of leaks.
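The prefix leak with a fixed IV can be shown directly. This is a minimal sketch under these assumptions:

- A hypothetical toy Feistel cipher over HMAC-SHA256 stands in for AES.
- Messages are built one 16-byte field per block.
- `shared_prefix_blocks` and `message` are illustrative helpers.

```python
import hashlib
import hmac
import os

BLOCK = 16

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for AES: 4-round Feistel network over HMAC-SHA256 (a keyed permutation).
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out

def shared_prefix_blocks(c1: bytes, c2: bytes) -> int:
    # Count the leading ciphertext blocks that are byte-for-byte identical.
    n = 0
    while c1[n * BLOCK:(n + 1) * BLOCK] and c1[n * BLOCK:(n + 1) * BLOCK] == c2[n * BLOCK:(n + 1) * BLOCK]:
        n += 1
    return n

def message(*fields: bytes) -> bytes:
    # One field per 16-byte block, padded with '-' (fields assumed to be at most 16 bytes).
    return b"".join(f.ljust(BLOCK, b"-") for f in fields)

key = os.urandom(32)
m1 = message(b"HEADER v1", b"client=alice", b"amount=100")
m2 = message(b"HEADER v1", b"client=alice", b"amount=999")

fixed_iv = bytes(BLOCK)  # all zeros; any constant IV behaves the same way
fixed_shared = shared_prefix_blocks(cbc_encrypt(key, fixed_iv, m1), cbc_encrypt(key, fixed_iv, m2))
random_shared = shared_prefix_blocks(cbc_encrypt(key, os.urandom(BLOCK), m1),
                                     cbc_encrypt(key, os.urandom(BLOCK), m2))
print(fixed_shared)   # 2: an observer learns the first two blocks are equal
print(random_shared)  # 0, with overwhelming probability
```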

If using a counter as IV, you may still have such collisions, because counters have structure, and "normal" data also has structure. As an extreme case, suppose that the encrypted message also begins with a counter (e.g. it is part of a protocol in which messages have a header which begins with a sequence number): the counter-for-IV and that counter may cancel each other out in the XOR, leading you back to the fixed-IV situation. This is bad. We really prefer encryption systems that provide confidentiality without imposing complex requirements on the plaintext format. A high-resolution clock used as a "counter" could incur the same issue.
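The extreme case above can be made concrete. This sketch assumes a hypothetical protocol whose first block is a 16-byte big-endian sequence number, with the same number reused as the CBC IV. As before, a toy Feistel cipher over HMAC-SHA256 stands in for AES.

```python
import hashlib
import hmac
import os

BLOCK = 16

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for AES: 4-round Feistel network over HMAC-SHA256 (a keyed permutation).
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out

key = os.urandom(32)
ciphertexts = set()
for seq in range(1, 6):
    header = seq.to_bytes(BLOCK, "big")     # message header: the sequence number
    body = b"status=OK".ljust(BLOCK, b" ")  # identical payload in every message
    iv = seq.to_bytes(BLOCK, "big")         # the same counter reused as the CBC IV
    ciphertexts.add(cbc_encrypt(key, iv, header + body))

# iv XOR header == 0 for every message, so every first block is E_k(0) and the
# five ciphertexts collapse to a single value, exactly as with a fixed IV.
print(len(ciphertexts))  # 1
```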

Chosen-plaintext attacks are when the attacker can choose part of the data that is to be encrypted. With CBC, if the attacker can predict the IV, then he can adjust his plaintext data to match it.

This is the basis of the BEAST attack. In the BEAST attack, the attacker tries to "see through" SSL. In SSL 3.0 and TLS 1.0, each record is encrypted with CBC, and the IV for each record is the last encrypted block of the previous record. An attacker who observes the wire and can inject data into the stream can push just enough bytes to trigger emission of a record, observe it, and thus deduce the IV that will be used for the next record, whose contents will begin with the next byte the attacker pushes.
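The core of this attack is a guess-verification step. This is a simplified sketch under these assumptions:

- The attacker already knows where a secret 16-byte block sits in an earlier record and can inject whole blocks. The real attack also shifts block boundaries so that only one unknown byte is guessed at a time.
- A toy Feistel cipher stands in for AES.
- `ChainedCbcChannel` is a hypothetical model of SSL 3.0 / TLS 1.0 IV chaining.

```python
import hashlib
import hmac
import os

BLOCK = 16

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for AES: 4-round Feistel network over HMAC-SHA256 (a keyed permutation).
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out

class ChainedCbcChannel:
    """Each record's IV is the last ciphertext block of the previous record."""
    def __init__(self, key: bytes, iv: bytes):
        self.key, self.iv = key, iv

    def send(self, plaintext: bytes) -> bytes:
        ct = cbc_encrypt(self.key, self.iv, plaintext)
        self.iv = ct[-BLOCK:]  # visible on the wire, hence predictable
        return ct

chan = ChainedCbcChannel(os.urandom(32), os.urandom(BLOCK))
secret = b"PIN=4921".ljust(BLOCK, b".")
rec1 = chan.send(b"GET /account".ljust(BLOCK) + secret)
prev_block, target = rec1[:BLOCK], rec1[BLOCK:2 * BLOCK]  # target = E_k(prev_block XOR secret)

observed_iv = rec1[-BLOCK:]  # the attacker reads this off the wire
hits = []
for guess in (b"PIN=0000", b"PIN=4921", b"PIN=9999"):
    g = guess.ljust(BLOCK, b".")
    # The channel XORs observed_iv back in, so the cipher input becomes prev_block XOR g.
    ct = chan.send(xor(xor(observed_iv, prev_block), g))
    observed_iv = ct[-BLOCK:]
    if ct[:BLOCK] == target:
        hits.append(guess)
print(hits)  # [b'PIN=4921']
```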

Of all the IV generation methods you show, only the first one (IV generated with a cryptographically strong PRNG) will protect you against chosen-plaintext attacks. This is what TLS 1.1 added: an explicit IV for each record, instead of reusing the last encrypted block of the previous record.
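Here is a minimal sketch of the "strongly random IV" approach for data at rest. It makes the following assumptions:

- Each record gets a fresh IV from Python's `secrets.token_bytes`.
- The IV is stored in front of the ciphertext.
- PKCS#7 padding is applied.
- A hypothetical toy Feistel cipher stands in for AES.
- `encrypt_record` is an illustrative name, not an API of any library.

```python
import hashlib
import hmac
import secrets

BLOCK = 16

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for AES: 4-round Feistel network over HMAC-SHA256 (a keyed permutation).
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out

def pkcs7_pad(data: bytes) -> bytes:
    # PKCS#7: append n copies of the byte n, with 1 <= n <= BLOCK.
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n

def encrypt_record(key: bytes, plaintext: bytes) -> bytes:
    iv = secrets.token_bytes(BLOCK)  # fresh, unpredictable IV from the OS CSPRNG
    # The IV is not secret: store it in front of the ciphertext.
    return iv + cbc_encrypt(key, iv, pkcs7_pad(plaintext))

key = secrets.token_bytes(32)
r1 = encrypt_record(key, b"4111111111111111")
r2 = encrypt_record(key, b"4111111111111111")
print(r1 == r2)  # False (with overwhelming probability): same card, different stored records
print(len(r1))   # 48: 16-byte IV + two blocks (a full block of padding is added)
```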

On a specific situation like your credit cards in a database, some of the possible attacks may or may not apply. However, don't try to "cut corners" too much. If you put user data in the database, then chosen-plaintext attacks may apply: an attacker who can look at your database (e.g. with some SQL injection technique) may also act as a "basic user" to feed you with phony credit card numbers, just to see what shows up in the database.

In particular, in that scenario, if you use deterministic encryption (and that's exactly what you get with a fixed IV, be it all-zeros or not), then the attacker can simply brute-force credit card numbers. A number is 16 digits, but one of them is a checksum, the first four or six digits identify the bank, and the remaining ones are not necessarily "random", so such attacks can be effective.
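Under deterministic encryption, the attacker encrypts each candidate through the phony-card channel and looks for a matching ciphertext. That makes the size of the candidate space the only defence. Here is a sketch of that arithmetic, assuming 16-digit numbers, a known 6-digit bank prefix (BIN) and a Luhn check digit. The Luhn algorithm is the checksum card numbers use. `candidates` is a hypothetical helper.

```python
def luhn_check_digit(payload: str) -> str:
    """Check digit that makes payload + digit pass the Luhn test."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # every second digit, starting next to the check digit, is doubled
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def is_luhn_valid(number: str) -> bool:
    return luhn_check_digit(number[:-1]) == number[-1]

def candidates(bin_prefix: str, length: int = 16):
    """Every Luhn-valid number of `length` digits starting with bin_prefix
    (assumes len(bin_prefix) < length)."""
    free = length - len(bin_prefix) - 1  # digits the attacker must enumerate
    for n in range(10 ** free):
        payload = bin_prefix + (str(n).zfill(free) if free else "")
        yield payload + luhn_check_digit(payload)

print(luhn_check_digit("411111111111111"))      # 1 (4111111111111111 is a well-known test number)
print(10 ** (16 - 6 - 1))                       # 1000000000 candidates for a known 6-digit prefix
print(sum(1 for _ in candidates("411111", 9)))  # 100 in a small 9-digit toy space
```

A billion encryption-and-compare operations is a modest workload, which is why deterministic encryption of card numbers is so weak against an attacker who can submit cards.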

The bottom line is that if you use CBC, then you must use CBC properly, i.e. with a strongly random IV. If you prefer a monotonic counter (or clock), then don't use CBC; instead, use a mode which is known to be perfectly happy with a monotonic counter, e.g. GCM. It is already hard enough to achieve security when cryptographic algorithms are used by the book, so any "creativity" here is to be shunned.

And, of course, content that has been encrypted with a given key is no more secret than the key itself. When an attacker has read access to your database, he might have read access to more than the database, in particular to the encryption key itself. It depends on where you store the key, and also on the extent of the attacker's access (SQL injection, stolen backup tape, front-end system complete hijack,...).

Thank you. +1 and a tick. Especially for the very sound advice, as well as the facts. For interest, just one comment: the attacker is unlikely to have access to the key itself, as it is stored in another, "leakproof", part of the system, which is where en/decryption takes place. He would only have access to the Key Label (an identifier), and could only request encryption services with a certain level of authorisation. Someone stealing a copy of the database backups would probably not have any way to access the actual key. — Brent.Longborough, Sep 03 '13 at 19:45
It's also worth noting that my "leakproof security box" can provide GCM, with its API also being able to generate a crypto-random IV for me. As this can also provide authentication, I'll probably end up going this way. — Brent.Longborough, Sep 03 '13 at 20:59

score 3 · Answer 2 · answered Sep 03 '13 at 09:45

The main reason you use an IV is to prevent the same plaintext from yielding the same encrypted text twice. With CBC you encrypt your text in blocks. Let's assume you have the following text and each line is a block:

AAAAAA
BBBBBB
CCCCCC
DDDDDD

and

AAAAAA
CCCCCC
EEEEEE
FFFFFF

Without an IV, the encrypted block for AAAAAA would be the same for both texts. This means that anyone who notices that two encrypted files begin with the same encrypted blocks, and who knows how one of them begins, also knows how the other one begins.
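The two texts above can be encrypted three ways and the identical ciphertext blocks counted. The three variants are:

- ECB: no IV at all, each block encrypted independently.
- CBC with an all-zeros IV.
- CBC with a fresh random IV.

This is a sketch only. It assumes each line is padded with spaces to one 16-byte block and uses a hypothetical toy Feistel cipher as a stand-in for AES. ECB is named here as the usual "no IV" mode and is not part of the answer above.

```python
import hashlib
import hmac
import os

BLOCK = 16

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for AES: 4-round Feistel network over HMAC-SHA256 (a keyed permutation).
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out

def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # No IV and no chaining: every block is encrypted independently.
    return b"".join(toy_block_encrypt(key, plaintext[i:i + BLOCK])
                    for i in range(0, len(plaintext), BLOCK))

def block_set(c: bytes) -> set:
    return {c[i:i + BLOCK] for i in range(0, len(c), BLOCK)}

def common_blocks(c1: bytes, c2: bytes) -> int:
    # Ciphertext blocks appearing in both messages, at any position.
    return len(block_set(c1) & block_set(c2))

key = os.urandom(32)
text1 = b"".join(s.ljust(BLOCK) for s in (b"AAAAAA", b"BBBBBB", b"CCCCCC", b"DDDDDD"))
text2 = b"".join(s.ljust(BLOCK) for s in (b"AAAAAA", b"CCCCCC", b"EEEEEE", b"FFFFFF"))

ecb_common = common_blocks(ecb_encrypt(key, text1), ecb_encrypt(key, text2))
zero = bytes(BLOCK)
cbc_zero_common = common_blocks(cbc_encrypt(key, zero, text1), cbc_encrypt(key, zero, text2))
cbc_random_common = common_blocks(cbc_encrypt(key, os.urandom(BLOCK), text1),
                                  cbc_encrypt(key, os.urandom(BLOCK), text2))
print(ecb_common)         # 2: AAAAAA and CCCCCC both show up
print(cbc_zero_common)    # 1: only the shared first block (with overwhelming probability)
print(cbc_random_common)  # 0 (with overwhelming probability)
```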

The idea behind an IV is that you never use it twice. It must be unique: if it isn't, and there is a chance you reuse one, you can run into the situation described above, where part of the plaintext can be recovered from its similarity to the encrypted version of a known plaintext.

score 1 · Answer 3 · answered Sep 03 '13 at 10:11

Let's go through these one by one:

Cryptographically sound random number

You're pretty safe, assuming collisions are statistically improbable. Uniqueness is what we're going for.

Any old "random number"

The randomness itself isn't really critical. It's the potential for collisions. If your RNG has issues like a short period or statistically more likely outcomes (i.e. non-uniform distribution) then you may damage the collision resistance, and therefore weaken CBC.
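How much a weak RNG hurts can be estimated with the birthday approximation. For $n$ IVs drawn uniformly from $2^b$ values, the chance of at least one collision is about $1 - e^{-n(n-1)/2^{b+1}}$. This is a sketch only. It assumes a weak RNG behaves as if it had only $2^{32}$ equally likely outputs. That is an illustrative stand-in for "short period or non-uniform distribution", not a model of any particular generator.

```python
import math

def collision_probability(n: int, iv_bits: int) -> float:
    """Birthday approximation: chance that at least two of n IVs, drawn uniformly
    from 2**iv_bits values, are equal: 1 - exp(-n(n-1) / 2**(iv_bits + 1))."""
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2 ** iv_bits)  # expm1 keeps precision for tiny values

p_strong = collision_probability(2 ** 32, 128)  # full 128-bit IVs, ~4 billion records
p_weak = collision_probability(100_000, 32)     # hypothetical 32-bit-quality RNG
print(p_strong)  # about 2.7e-20
print(p_weak)    # about 0.69
```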

A non-repeating, monotonically increasing, non-continuous counter (such as a high-resolution clock)

Assuming it is 100% non-repeating, and that you're not in a situation where multiple machines are communicating (remember that clocks might be synchronised), then you're relatively safe. Again, the problem is IV reuse, so having two machines with the same clock might lead to statistically plausible collisions in this model.

A 1-by-1 counter, large enough not to repeat in, say, 10 times the expected usefulness of the protected data.

Absolutely fine, assuming your counter factors in the global uniqueness requirement for any given key.

A constant IV

This breaks CBC entirely, if the IV is known (which it should be assumed to be). One can simply ignore the IV for all intents and purposes.

An all-zeros IV

Again this breaks CBC, but only because it's constant. If only used once for one message per key, then it's still safe. The content of the IV isn't particularly important. It just has to be unique per key.

Thank you for a useful and clear analysis. I've expanded the question a bit, to try to understand the real impact of a known plaintext+ciphertext attack. — Brent.Longborough, Sep 03 '13 at 18:15
Doesn't a predictable (e.g., timer or counter) IV open up other kinds of attacks with CBC? If Eve has a good guess as to the IV used for one of your records, and the IV that will be used for her next record, she can construct $P_{eve} = IV_{eve} \oplus IV_{you} \oplus CC_{guess}$ for some credit card number guess. This encrypts to $C_{eve} = E_k(IV_{eve} \oplus P_{eve}) = E_k(IV_{eve} \oplus IV_{eve} \oplus IV_{you} \oplus CC_{guess}) = E_k(IV_{you} \oplus CC_{guess})$. That equals your record's ciphertext $C_{you} = E_k(IV_{you} \oplus CC_{you})$ exactly when $CC_{guess} = CC_{you}$, so Eve can compare ciphertexts to determine whether she has guessed right. — Stephen Touset, Sep 03 '13 at 19:46
@StephenTouset That's very interesting. Am I right in thinking that Eve must have access to the encryption service in order for it to work? — Brent.Longborough, Sep 04 '13 at 11:01
Eve must be able to provide many inputs to be encrypted, and have the ability to compare the ciphertext output against a ciphertext she wishes to reveal. — Stephen Touset, Sep 04 '13 at 14:51
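Stephen Touset's attack on counter IVs can be sketched end to end. This is a minimal sketch under these assumptions:

- Eve knows the IV stored with the victim's record.
- No other request consumes a counter value between her predictions.
- A card number is encrypted as a single 16-byte ASCII block.
- `CounterIvService` is hypothetical, and the toy Feistel cipher stands in for AES.

```python
import hashlib
import hmac
import os

BLOCK = 16

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for AES: 4-round Feistel network over HMAC-SHA256 (a keyed permutation).
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out

class CounterIvService:
    """Hypothetical encryption service that uses a 1-by-1 counter as the CBC IV."""
    def __init__(self, key: bytes, start: int):
        self.key, self.counter = key, start

    def encrypt(self, plaintext: bytes) -> tuple[bytes, bytes]:
        iv = self.counter.to_bytes(BLOCK, "big")
        self.counter += 1
        return iv, cbc_encrypt(self.key, iv, plaintext)

service = CounterIvService(os.urandom(32), start=1000)
iv_you, c_you = service.encrypt(b"4111111111111111")  # the victim's stored record

next_counter = int.from_bytes(iv_you, "big") + 1  # Eve predicts her own IV
found = None
for cc_guess in (b"4000000000000002", b"4111111111111111", b"5555555555554444"):
    iv_eve = next_counter.to_bytes(BLOCK, "big")
    p_eve = xor(xor(iv_eve, iv_you), cc_guess)  # P_eve = IV_eve ^ IV_you ^ CC_guess
    _, c_eve = service.encrypt(p_eve)           # = E_k(IV_you ^ CC_guess)
    next_counter += 1
    if c_eve == c_you:
        found = cc_guess
        break
print(found)  # b'4111111111111111'
```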


