I'm currently planning the development of a system where extremely sensitive information for individual users would be stored. There are currently two approaches being discussed. In order to make it more concrete, let's say we're encrypting each user's social security number (SSN):

  1. One time, generate a global encryption key/IV. Encrypt each SSN in the database with the same key/IV.
  2. One time, generate a global secret. Use the global secret and a KDF (such as PBKDF2 with a per-user salt) to generate a unique key/IV pair for encrypting each user's SSN.

I don't know enough to advocate either way. Is there any advantage to (2) over (1)?

EDIT: I'm mostly looking for help weighing the general pros and cons of each approach. Namely, does (2) provide any additional security over (1) given that both rely on a shared global secret?

  • There is no universal answer to this question. The answer will depend on the value of the data you are protecting, the deployment scenario, etc... I'm not sure that we can help you without more information. – Neil Smithline Apr 19 '16 at 19:55
  • @NeilSmithline Thanks! I'll try to update the ticket with more information, but I'm not really looking for a universal answer so much as some of the pros and cons either way. Namely, is (2) just extra complication, or does it provide actual security benefits? – Eric Scrivner Apr 19 '16 at 20:21
  • @NeilSmithline I've gone ahead and updated the ticket to hopefully make things a bit more concrete. I don't want to divulge too much in the way of specifics, but I've hopefully added enough detail to make this more answerable. Is there anything else I could add to help with this? – Eric Scrivner Apr 19 '16 at 20:58
  • it looks good to me. – Neil Smithline Apr 19 '16 at 21:01

3 Answers


Given that SSNs are all guaranteed to be different, the second approach adds little in terms of security, provided that the mode of operation you use (which you didn't disclose) tolerates IV reuse.

But please let me suggest a third option: you should be using a Hardware security module (HSM) for this.

Your question does leave me with the impression that you are trying to solve this programmatically, yet there is a huge problem with that:

If your application is to use the SSNs, the key has to be accessible somewhere. Encryption of the SSNs is presumably meant to keep the data safe in case one of the hard drives or the stored data gets leaked.

When this happens and you keep your key with the data, there is nothing protecting the data anymore. A data dump would most certainly also contain the key, unless an HSM is used.

Tobi Nary
  • I think you should add a link to different modes or something (maybe [this](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation)). I suspect that the OP isn't sufficiently crypto-aware to know what they are. BTW, it's a bold move to post an answer to a crypto question after Thomas has posted. Nicely done. – Neil Smithline Apr 19 '16 at 22:14
  • @NeilSmithline The bear and I almost posted simultaneously. I will add it:) – Tobi Nary Apr 19 '16 at 22:14

Reusing an IV is a bad idea. Every encryption mode has its own requirements on IV management, but for most of them, consequences of IV reuse range from problematic to deadly. Don't do that.
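
To make the danger concrete, here is a toy illustration. It is not a real cipher: a keystream built from SHA-256 stands in for a stream-like mode such as CTR, and the names `toy_keystream`, `toy_encrypt` and the sample SSNs are made up for this sketch. With the same key and IV, XORing two ciphertexts cancels the keystream and leaves the XOR of the two plaintexts, so anyone who knows one SSN can recover the other without the key.

```python
import hashlib


def toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256(key || iv || counter). NOT a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, iv: bytes, message: bytes) -> bytes:
    """Stream-style encryption: message XOR keystream."""
    return xor_bytes(message, toy_keystream(key, iv, len(message)))


key = b"k" * 16          # placeholder key, for illustration only
iv = b"i" * 12           # the SAME IV is reused for both records
m1 = b"123-45-6789"      # made-up SSNs
m2 = b"987-65-4321"

c1 = toy_encrypt(key, iv, m1)
c2 = toy_encrypt(key, iv, m2)

# The keystream cancels out: c1 XOR c2 == m1 XOR m2.
leaked = xor_bytes(c1, c2)

# An attacker who knows m1 recovers m2 without ever seeing the key.
recovered = xor_bytes(leaked, m1)
```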

The "normal" method is to use a single key and a per-record IV: each encrypted element gets its own IV, which is stored alongside the encryption result. Generating both key and IV from a global secret and a per-record "salt" usually works (it results in pseudorandom IVs, which are fine for most encryption modes) but adds complexity, which is, in itself, a bad thing.
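
For reference, the derivation from a global secret and a per-record salt could look roughly like the sketch below, using `hashlib.pbkdf2_hmac` from the Python standard library. The function name `derive_key_and_iv`, the iteration count, and the 32-byte key / 12-byte IV split are assumptions made for illustration, not something the question specifies.

```python
import hashlib
import os


def derive_key_and_iv(global_secret: bytes, salt: bytes,
                      iterations: int = 100_000) -> tuple[bytes, bytes]:
    """Derive a 32-byte key and a 12-byte IV from the secret and a per-record salt.

    The iteration count and output sizes are illustrative assumptions.
    """
    material = hashlib.pbkdf2_hmac("sha256", global_secret, salt,
                                   iterations, dklen=44)
    return material[:32], material[32:]


global_secret = os.urandom(32)   # generated once, kept outside the database
salt_a = os.urandom(16)          # stored next to user A's record
salt_b = os.urandom(16)          # stored next to user B's record

key_a, iv_a = derive_key_and_iv(global_secret, salt_a)
key_b, iv_b = derive_key_and_iv(global_secret, salt_b)
```

The salt has to be stored with each record so the same key and IV can be derived again at decryption time, which is the extra bookkeeping the simpler "one key, per-record IV" scheme avoids.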

Doing encryption properly is a difficult art, especially since in most contexts that warrant encryption, integrity shall also be checked. Your best bet would be an Authenticated Encryption mode -- basically, GCM. GCM takes as inputs:

  • An IV of nominally arbitrary size, but a 12-byte IV is recommended.
  • A key suitable for AES (128, 192 or 256 bits)
  • The message m to encrypt.

The output then consists of c, the encryption of m (with the exact same size), and t, the "authentication tag", which has the same size as an AES block (128 bits, i.e. 16 bytes). What you must then store is the IV, the encrypted output c, and t.
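
One possible way to store those three values is as a single blob per record. The layout below (IV, then c, then t) and the helper names `pack_record` and `unpack_record` are assumptions for this sketch; separate database columns would work just as well.

```python
IV_LEN = 12    # recommended GCM IV size in bytes
TAG_LEN = 16   # GCM authentication tag size in bytes


def pack_record(iv: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    """Concatenate IV, ciphertext and tag into one storable blob."""
    if len(iv) != IV_LEN or len(tag) != TAG_LEN:
        raise ValueError("unexpected IV or tag length")
    return iv + ciphertext + tag


def unpack_record(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a stored blob back into (iv, ciphertext, tag)."""
    if len(blob) < IV_LEN + TAG_LEN:
        raise ValueError("record too short")
    iv = blob[:IV_LEN]
    tag = blob[-TAG_LEN:]
    ciphertext = blob[IV_LEN:-TAG_LEN]
    return iv, ciphertext, tag


# Placeholder values standing in for real GCM output.
example_iv = bytes(12)
example_c = b"\x01" * 11          # same length as an 11-character SSN
example_t = bytes(range(16))
blob = pack_record(example_iv, example_c, example_t)
```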

When decrypting, the decryption engine will use the key, the IV and c, and produce t again, which you then have to compare with the stored t to check for alterations.
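
If you end up doing that comparison yourself, it should not leak timing information about where the tags differ. A minimal sketch using `hmac.compare_digest` from the Python standard library follows; the helper name `tags_match` is made up here. Many libraries perform this check internally and raise an error on mismatch, in which case no manual comparison is needed.

```python
import hmac


def tags_match(stored_tag: bytes, recomputed_tag: bytes) -> bool:
    """Compare two authentication tags in constant time."""
    return hmac.compare_digest(stored_tag, recomputed_tag)


good = tags_match(bytes(range(16)), bytes(range(16)))
bad = tags_match(bytes(range(16)), bytes(16))
```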

The good thing about GCM is that the only requirement on the IV is no reuse. You don't have to make the IV random or unpredictable, but you MUST NOT use the same IV twice with the same key. You could use a time stamp or a counter or a database row index, provided that you ensure that you never reuse an IV value.
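
As a sketch of the row-index idea: the helper below builds a 12-byte IV from a row id plus a per-row version counter. The version counter is an assumption added here so that re-encrypting a row (say, after a correction) under the same key does not repeat an IV; the name `make_iv` and the 8 + 4 byte split are likewise illustrative.

```python
def make_iv(row_id: int, version: int) -> bytes:
    """Build a 12-byte IV: 8 bytes of row id, then 4 bytes of version counter.

    int.to_bytes raises OverflowError for negative or too-large values,
    so out-of-range inputs fail loudly instead of wrapping around.
    """
    return row_id.to_bytes(8, "big") + version.to_bytes(4, "big")


iv_first = make_iv(1, 0)    # first encryption of row 1
iv_update = make_iv(1, 1)   # row 1 re-encrypted after an update
iv_other = make_iv(2, 0)    # a different row
```

The caller is responsible for incrementing and persisting the version every time the row is re-encrypted; if that cannot be guaranteed, a random 12-byte IV per encryption is the simpler choice.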

Thomas Pornin

One thing you should consider is whether this value is used in JOIN or GROUP BY clauses in the applications that use this data. For example, if your applications need to assemble a credit history based on the SSN, you would want that value to have been deterministically encrypted in all of your database tables. The alternative would be decrypting every record and then using the decrypted value as the JOIN criterion, which would be horribly inefficient.
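
A related technique, which is not the deterministic encryption described above but serves the same JOIN purpose, is to store a keyed hash of the SSN next to a randomized ciphertext. The sketch below uses HMAC-SHA256 from the Python standard library; the name `join_token`, the separate `index_key`, and the normalization step are assumptions for illustration. The token is deterministic, so equal SSNs match across tables, but it cannot be decrypted, and it is only as secret as the key, since the space of possible SSNs is small enough to enumerate.

```python
import hashlib
import hmac


def join_token(index_key: bytes, ssn: str) -> str:
    """Deterministic keyed token for an SSN, usable as a JOIN column."""
    normalized = ssn.replace("-", "").strip()
    return hmac.new(index_key, normalized.encode("ascii"),
                    hashlib.sha256).hexdigest()


index_key = b"\x00" * 32   # placeholder; use a real secret kept apart from the data

token_a = join_token(index_key, "123-45-6789")   # made-up SSNs
token_b = join_token(index_key, "123456789")     # same SSN, different formatting
token_c = join_token(index_key, "987-65-4321")
```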

Dave Mulligan