
I'm looking for an efficient way to encrypt multiple fields in a database with AES using a single global key, used throughout a large web application.

Obviously in order to re-use this key, a unique random IV is required for each field that is to be encrypted.

I'd rather not introduce more fields to the database to store each of these IVs, so the programmatic approach seems to be to derive these IVs somehow.

I'm toying with using either:

key = sha256(global_key + table_name)
iv = sha256(column_name + primary_key)

Or even simply:

key = global_key
iv = sha256(table_name + column_name + primary_key)

I'm leaning towards the former to generate per-table keys.

I've already read that the IVs do not need to be kept secret. So I'm working on the assumption that a derived key or IV (even if the algorithm becomes known), is no more insecure than any other non-secret IV, as long as the original key remains secret.

The question is:

Is there a fatal flaw in my approach? Am I introducing any serious weaknesses that, in the event that an adversary obtains a copy of the database, would make it easier for them to retrieve the plaintext data?

I realise that is potentially soliciting one word answers.

Suggestions for alternate / better schemes very much welcomed, as well as references to existing works and how they implement similar scenarios.

Leigh
  • Which mode are you using? CBC? CTR? Some authenticated mode? Does the encrypted value ever change? – CodesInChaos Nov 13 '12 at 16:13
  • @CodesInChaos It will be CBC mode, the values being encrypted are user details (names, addresses, emails, etc.) So some fields completely unique, others may have duplicates. – Leigh Nov 13 '12 at 16:28

3 Answers

5

A potentially better approach would be to store the IV and ciphertext in one column. This way, you can generate IVs in the way most appropriate for your choice of encryption mode while also not having to add columns.

Something like "$AES-128-CBC$" + Base64.encode64(iv) + "$" + Base64.encode64(ciphertext) is similar to the format used in crypt, easily parseable, and being Base64-encoded is slightly more convenient when doing queries on the database using command-line clients.
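
A minimal sketch of that storage format in Python, for illustration only. The helper names `pack_field` and `unpack_field` are hypothetical, and the ciphertext here is a placeholder byte string rather than real AES output. Splitting on "$" is safe because the standard Base64 alphabet does not contain that character.

```python
import base64
import os


def pack_field(algorithm: str, iv: bytes, ciphertext: bytes) -> str:
    """Build a "$<algorithm>$<base64 iv>$<base64 ciphertext>" string."""
    iv_b64 = base64.b64encode(iv).decode("ascii")
    ct_b64 = base64.b64encode(ciphertext).decode("ascii")
    return f"${algorithm}${iv_b64}${ct_b64}"


def unpack_field(stored: str):
    """Split a stored value back into (algorithm, iv, ciphertext)."""
    # The leading "$" produces an empty first element, which is discarded.
    _, algorithm, iv_b64, ct_b64 = stored.split("$")
    return algorithm, base64.b64decode(iv_b64), base64.b64decode(ct_b64)


# Assumption: a 16-byte IV, matching the AES block size.
iv = os.urandom(16)
# Placeholder standing in for the output of the actual encryption step.
ciphertext = b"\x00" * 32

stored = pack_field("AES-128-CBC", iv, ciphertext)
algorithm, iv_out, ciphertext_out = unpack_field(stored)
```

Keeping the algorithm name in the string means a reader of the column can tell how each value was encrypted without consulting application code.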

Stephen Touset
  • Thanks for pointing out I'm probably doing what I usually do. Over-think things. Definitely a lot simpler, and without the dependencies on db constraints (as per your comment on the other answer), I could resort to a purely random IV which makes it more UPDATE friendly. I'd probably not use anything that identifies the algorithm used though, just because. Base64 is certainly a lot easier to work with than something like MySQL's HEX / UNHEX too. I would probably have opted for hex encoding, but Base64 will give better compression. Cheers. – Leigh Nov 13 '12 at 18:22
  • The major argument to having the encryption algorithm stored in the string is that you have built-in forward compatibility if you ever decide to change the algorithm in the future (without having to reencrypt existing data). – Stephen Touset Nov 13 '12 at 20:35
3

Your approach is insecure. If you modify some value in the database, you'll be encrypting a new value using the same IV as you used to encrypt the prior value. (The data will be stored in the same place, so the table name, column name, and primary key will remain unchanged.) This might compromise confidentiality. The compromise is especially bad if you are using a stream cipher or AES-CTR or similar mode, as then you've encrypted two different values using the same stream, a classic keystream-reuse vulnerability (the two-time pad is highly insecure).
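
To illustrate the keystream-reuse problem, here is a small Python sketch. It does not use AES: a random byte string stands in for the keystream that a stream mode would produce from one key and one IV, and the field values are made up. The point is only that XORing the two ciphertexts cancels the keystream.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in for the keystream generated from the same key and the same IV.
keystream = os.urandom(16)

# Hypothetical field value before and after an UPDATE (both 16 bytes).
old_value = b"alice@example.co"
new_value = b"alice@example.uk"

c_old = xor_bytes(old_value, keystream)
c_new = xor_bytes(new_value, keystream)

# An attacker holding both ciphertexts needs no key to compute this:
leak = xor_bytes(c_old, c_new)
```

Here `leak` equals the XOR of the two plaintexts, so zero bytes mark every position where the old and new values agree, and knowing either value reveals the other.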

Instead, it is better to do as others recommend: generate a random IV, and store it together with the ciphertext. Indeed, in most modes of operation, you can consider the IV to be part of the ciphertext.
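
As a rough sketch of treating the IV as part of the ciphertext, the Python below prepends a fresh random IV to the ciphertext and Base64-encodes the result for a single column. The function names are hypothetical, the 16-byte IV length assumes AES, and the ciphertext is a placeholder rather than real cipher output.

```python
import base64
import os

IV_LEN = 16  # assumption: AES block size in bytes


def encode_for_storage(iv: bytes, ciphertext: bytes) -> str:
    """Prepend the IV to the ciphertext and Base64-encode the whole value."""
    return base64.b64encode(iv + ciphertext).decode("ascii")


def decode_from_storage(stored: str):
    """Recover (iv, ciphertext) by slicing off the fixed-length IV."""
    raw = base64.b64decode(stored)
    return raw[:IV_LEN], raw[IV_LEN:]


# A fresh IV from the operating system's CSPRNG for every encryption.
iv = os.urandom(IV_LEN)
# Placeholder standing in for the output of the actual encryption step.
ciphertext = os.urandom(32)

stored = encode_for_storage(iv, ciphertext)
iv_out, ciphertext_out = decode_from_storage(stored)
```

Because a new IV is drawn on every write, an UPDATE to the same row and column never reuses the previous IV.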

D.W.
  • Thanks for your input, I've already changed my approach to more closely match Stephen's suggestion. Since the IV will always be a known length I will simply prepend it to the ciphertext and Base64 encode the whole thing for storage. It's actually quite reassuring to have recurring themes in the answers. Cheers. – Leigh Nov 14 '12 at 10:06
2

I am assuming you are using CBC mode...

I would make the argument that this is a CWE-329 violation. The IV can be known to the attacker, but it must be random. A more common solution to your problem is just to store a very secure random value and use this as your "IV".

Let's say you have a table named "secret". An attacker with a SQL injection vulnerability can see this table, and can also see the current primary key value. By computing a simple SHA-256 hash, he is able to predict the next IV, or even precompute a table of future IVs. (Depending on your platform, the SQL injection vulnerability could be turned into a decryption oracle! Nasty!)
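
A short Python sketch of that prediction, assuming the question's second scheme (a SHA-256 hash over table name, column name, and primary key) with the digest truncated to 16 bytes for AES. The truncation, the column name "email", and the key values are assumptions made for illustration.

```python
import hashlib


def derived_iv(table_name: str, column_name: str, primary_key: int) -> bytes:
    """IV derived only from public values, as in the question's scheme."""
    material = f"{table_name}{column_name}{primary_key}".encode("utf-8")
    # Assumption: the 32-byte digest is truncated to the 16-byte AES block size.
    return hashlib.sha256(material).digest()[:16]


# The attacker has seen that the highest primary key is currently 1000.
current_max_pk = 1000

# No secret is involved, so the attacker can compute the next IV in advance.
predicted_iv = derived_iv("secret", "email", current_max_pk + 1)

# Later, the application inserts row 1001 and derives its IV the same way.
application_iv = derived_iv("secret", "email", 1001)
```

The two values match because every input to the hash is visible to anyone who can read the schema and the row.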

If I were building this, I would use a proper key derivation function such as PBKDF2. These are heavier functions, which makes it more difficult for the attacker to precompute values or calculate them en masse. It doesn't have to be much heavier, but the resulting value does have to be the same size as your cipher's block size.

Another possible solution is to have a global "IV secret". Pass iv_secret + column_name + primary_key to your key derivation function to produce your IV. Because of the secret, the IV is no longer a predictable value and this is no longer a CWE-329 violation. Furthermore, this value is also a nonce: it is not likely that the same IV would be generated twice for two different values (unless you made an update, which would be a violation). To mitigate this, you could add a "version number" or last-modified timestamp to the IV calculation.
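
One way this could look in Python, as a sketch rather than a vetted design: it uses HMAC-SHA256 keyed with the IV secret in place of a heavier KDF, truncates the output to 16 bytes for AES, and mixes in a version number that the application would have to store and increment on every update. The function name, the field separator, and the example secret are all assumptions.

```python
import hashlib
import hmac


def derive_iv(iv_secret: bytes, column_name: str, primary_key: int, version: int) -> bytes:
    """Derive a 16-byte IV from a secret plus the field's location and version."""
    fields = [column_name, str(primary_key), str(version)]
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    message = b"\x00".join(field.encode("utf-8") for field in fields)
    # Assumption: truncate the 32-byte MAC to the 16-byte AES block size.
    return hmac.new(iv_secret, message, hashlib.sha256).digest()[:16]


# Example value only; a real secret would be random and kept out of source code.
iv_secret = bytes.fromhex("00112233445566778899aabbccddeeff")

iv_v1 = derive_iv(iv_secret, "email", 42, 1)
iv_v2 = derive_iv(iv_secret, "email", 42, 2)  # same field after one update
```

Bumping the version on each write yields a different IV for the same row and column, at the cost of storing the version somewhere alongside the data.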

rook
  • 1) Unpredictability is only required with some modes such as CBC. In CTR it doesn't matter (but mutating the field is bad). 2) If you use PBKDF2, use a single iteration. No need for strengthening here, if the master key is good enough. Personally I'd go with HMAC, but even a plain hash isn't a big issue here. – CodesInChaos Nov 13 '12 at 16:15
  • @CodesInChaos yeah I agree. I still like using a proper KDF function for this and making sure it's the exact size I need for the cipher. But I recognize this is more of a gut feeling. – rook Nov 13 '12 at 16:18
  • Thanks for the very valuable input. Absolutely the IV must remain unpredictable. Would it really be a massive issue to use SHA256 with the inclusion of the iv_secret? One of the major benefits of AES is the speed, so using an intensive function like PBKDF2 really offsets this benefit. There are 10+ articles of personal user information to be encrypted per row, and 10x PBKDF2 is a large overhead. I was considering serialising the user-data into a single encrypted row, but an attacker knowing the PHP serialisation format then has known partial plaintext for each row. – Leigh Nov 13 '12 at 16:24
  • @CodesInChaos Thanks for pointing out HMAC. I made a slightly misinformed assumption about the IV length. (I thought AES256 was using a 256 bit block size, which is why I opted for SHA256.) Seems the block size is 128 bit, so a standard HMAC with MD5 would suffice for a 128 bit pseudo-random IV. – Leigh Nov 13 '12 at 16:39
  • @Leigh the size is more important, using sha256 isn't really a problem. md5 would be a problem because its prng output isn't as random as it should be. The main issue is that the IV must be unpredictable, and there is nothing intrinsic about sha256, so you need to add a secret. Also you can never reuse an IV for two different plain texts. So beware of update queries. – rook Nov 13 '12 at 16:51
  • @Rook Again absolutely spot on, thanks. If an attacker had a snapshot of before/after an update, he'll have two ciphertexts with the same key/iv (and probably only minor changes in the plaintext) Damnit! – Leigh Nov 13 '12 at 16:59
  • One other thing to worry about with derived keys is that the data used in derivation may change. If you rename columns, change a row's primary key, all the data must be reencrypted. IMHO, not adding a database column can't be worth the inconvenience and downsides this would cause. – Stephen Touset Nov 13 '12 at 17:57