I want to secure some highly sensitive data in a database. The data needs to be encrypted and remain secure for 100 years if it falls into an adversary's hands. I also want to limit the amount of plaintext held in RAM at any one time, so there is less chance of plaintext being paged to disk. The database may be quite large, so access needs to be more efficient than decrypting the whole database at once. I am therefore thinking about encrypting the sensitive data at the database row level. A unique index referencing each record would stay unencrypted so that records can still be found and retrieved, while the sensitive data itself is encrypted.

My solution would be to store the following per database row:

index | IV | sensitive encrypted data | MAC
  • A 256-bit database key, generated from /dev/random, will be used to encrypt the sensitive data.
  • The IV for each row will be 256 bits from /dev/urandom (faster than /dev/random).
  • The encryption algorithm will be Twofish.
  • The MAC of each record will be HMAC-SHA3 of the index, IV & sensitive data using the key.
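
The per-row MAC described above can be sketched with Python's standard library. This is a minimal illustration rather than a definitive implementation: Twofish is not in the standard library, so the ciphertext is treated as opaque bytes, and the names `row_mac`, `verify_row` and `IV_LEN` are hypothetical. It assumes a 64-bit integer index, a 16-byte IV (matching Twofish's 128-bit block) and a MAC key kept separate from the encryption key.

```python
import hashlib
import hmac
import os
import struct

IV_LEN = 16  # assumption: 128-bit IV, matching Twofish's 128-bit block size


def row_mac(mac_key: bytes, index: int, iv: bytes, ciphertext: bytes) -> bytes:
    """HMAC-SHA3-256 over index | IV | ciphertext.

    The index and IV are fixed-width, so the three fields cannot be
    confused with one another when concatenated.
    """
    if len(iv) != IV_LEN:
        raise ValueError("IV must be 16 bytes")
    message = struct.pack(">Q", index) + iv + ciphertext
    return hmac.new(mac_key, message, hashlib.sha3_256).digest()


def verify_row(mac_key: bytes, index: int, iv: bytes,
               ciphertext: bytes, tag: bytes) -> bool:
    """Check the stored tag in constant time before decrypting the row."""
    expected = row_mac(mac_key, index, iv, ciphertext)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    mac_key = os.urandom(32)           # hypothetical MAC key
    iv = os.urandom(IV_LEN)            # fresh random IV for this row
    ciphertext = b"opaque ciphertext"  # stand-in for the Twofish output
    tag = row_mac(mac_key, 7, iv, ciphertext)
    print(verify_row(mac_key, 7, iv, ciphertext, tag))
```

Including the index in the MAC input means a valid row copied to a different index fails verification.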

The system is single user. The user will create a strong alphanumeric passphrase (minimum 19 characters).

A password-based key derivation function will be run on the passphrase to create a derived encryption key, which will then be used to separately encrypt the database key with Twofish. This lets the user change their password without re-encrypting the entire database: they create a new password and re-encrypt only the database key. I understand that this is the weakest part of the scheme, but I would like to make it very difficult for an attacker to brute-force the password. I think the security needs to rest on the strength of the passphrase, because any secondary token could be compromised at the same time as the device holding the encrypted data (for example, the device and token could both be confiscated for arbitrary reasons at airport security, so a token would be no use).

  • To derive the key from the passphrase, PBKDF2 will be used with 10,000 iterations using HMAC-SHA3 with a 256 bit output and a salt of 256 bits obtained from /dev/urandom.
  • I am trying to balance the number of password characters required to keep the data secure against keeping login reasonably fast for users on mobile devices, which have slow processors and limited memory. I don't expect the user to wait more than 5 seconds for the PBKDF to complete.
  • A MAC is created using HMAC-SHA3-256(derived encryption key, salt | encrypted database key) and stored next to the salt and encrypted database key on disk. This can be verified when logging in to make sure they entered the correct password.
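
The key derivation and login check in the list above might look roughly like the following sketch. The function names (`derive_key`, `wrap_tag`, `passphrase_is_correct`) are hypothetical, the encrypted database key is treated as opaque bytes because Twofish is not in the standard library, and the code assumes the Python build's OpenSSL exposes SHA-3 to `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # value from the question; tune to the target device
SALT_LEN = 32        # 256-bit salt, as in the question


def derive_key(passphrase: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """PBKDF2 with HMAC-SHA3-256 and a 256-bit output."""
    return hashlib.pbkdf2_hmac(
        "sha3_256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def wrap_tag(derived_key: bytes, salt: bytes, encrypted_db_key: bytes) -> bytes:
    """HMAC-SHA3-256(derived key, salt | encrypted database key)."""
    return hmac.new(derived_key, salt + encrypted_db_key, hashlib.sha3_256).digest()


def passphrase_is_correct(passphrase: str, salt: bytes,
                          encrypted_db_key: bytes, stored_tag: bytes) -> bool:
    """Re-derive the key and compare the MAC in constant time."""
    candidate = derive_key(passphrase, salt)
    return hmac.compare_digest(wrap_tag(candidate, salt, encrypted_db_key), stored_tag)


if __name__ == "__main__":
    salt = os.urandom(SALT_LEN)
    encrypted_db_key = os.urandom(32)  # stand-in for the Twofish-encrypted key
    tag = wrap_tag(derive_key("correct horse battery", salt), salt, encrypted_db_key)
    print(passphrase_is_correct("correct horse battery", salt, encrypted_db_key, tag))
    print(passphrase_is_correct("wrong guess", salt, encrypted_db_key, tag))
```

Timing `derive_key` on the slowest supported device is one way to pick an iteration count that stays within the 5-second budget.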

When the program loads, the user enters the passphrase. The KDF runs, which generates the key to decrypt the database encryption key. The real encryption key is then the only thing kept in RAM while the program is running and used to verify and decrypt individual database records when required.

  1. What's the optimal length for the row level IV? Is 256 bits fine?
  2. Is the minimum password strength of 19 characters and 10,000 iterations of PBKDF2 strong enough to protect the 256 bit database key? If not, what parameters would work?
  3. Is PBKDF2 still a good algorithm to use here? If not, what Scrypt parameters?
  4. Any further changes or recommendations to make the system secure?
aobocod
  • This is a cross-post [from CryptoSE](http://crypto.stackexchange.com/q/19625/17716) because they did not think it was on topic there. – aobocod Oct 15 '14 at 10:09
  • The MAC key should be computationally independent of the Twofish key. –  Oct 15 '14 at 13:50
  • Regarding the size of the IV, it's not a variable. Your block size is 128bits with Twofish, thus your IV must be 128bits. Regarding your other questions, I'd say your threat model is very confused. You're afraid of leaving plaintext in RAM, yet you let your main database key in memory? I also don't understand why you use the exact same key for every row. And are you also afraid of possible attacks against your software implementation, or just your cold data? – Dillinur Oct 15 '14 at 12:22
  • @Dillinur Correct, I should use the IV size for the cipher. Regarding the threat model, how are you supposed to access the data without a key being in memory somewhere? I consider it better to have just the key and load small amounts of the sensitive data one row at a time into memory, which is controllable, rather than loading the entire database into memory, which could be 100 MB+ and the OS might decide to randomly page some of it to disk. Or is that an unlikely threat? What's wrong with encrypting using the same key per row? One key can encrypt 2^128 bits of data safely. – aobocod Oct 16 '14 at 08:56
  • @Dillinur It is a different IV per row. Mainly I am concerned about attacks on the cold data. The aim is to protect the data at rest with the device turned off. If an attacker gets hold of the device while the program is open and the key is in memory, there isn't much you can do to protect against that. So the user will be responsible for logging out (which would wipe the key from memory) before going through an airport security screening or similar where the device could be seized. – aobocod Oct 16 '14 at 09:21
  • @RickyDemer Good catch, thanks. I will use a KDF to generate separate MAC and encryption keys from the main database key. – aobocod Oct 16 '14 at 09:54
  • If you're talking about 100 years of security and keys living in RAM, then first, I think you're beyond any reasonable bounds with current technology, and second, you should start looking into hardware security appliances with serious FIPS validation against a variety of attacks, including physical attacks. – Anti-weakpasswords Dec 29 '14 at 05:50
  • It sounds like you've read a lot of stuff about cryptography but haven't understood it. The MAC is very probably redundant. Most DBMSs support encryption out of the box - you seem to be rolling your own crypto. – symcbean Oct 08 '18 at 11:56

1 Answer

What's the optimal length for the row level IV? Is 256 bits fine?

The answer depends on whether some of the plaintext will be known to, or easily guessed by, the attacker. The question describes the data as "highly sensitive" and the protection period as "100 years", so use the strongest cipher and the largest secret conveniently available for encryption.

Is the minimum password strength of 19 characters and 10,000 iterations of PBKDF2 strong enough to protect the 256 bit database key? If not, what parameters would work?

The required number of characters cannot be determined without knowing the rules imposed on passwords during validation.
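
To illustrate why the rules matter, here is a rough upper-bound calculation. It is a sketch under a strong assumption: every character is chosen uniformly at random from the allowed alphabet, which human-chosen passphrases rarely satisfy. The function names are hypothetical.

```python
import math


def passphrase_bits(length: int, alphabet_size: int) -> float:
    """Entropy in bits of a passphrase whose characters are chosen
    uniformly at random from an alphabet of the given size."""
    return length * math.log2(alphabet_size)


def stretching_bits(iterations: int) -> float:
    """Extra attacker work, in bits, added by a KDF iteration count."""
    return math.log2(iterations)


if __name__ == "__main__":
    # 19 characters from a-z, A-Z, 0-9 (62 symbols): about 113 bits.
    print(round(passphrase_bits(19, 62), 1))
    # 10,000 PBKDF2 iterations add about 13 bits of work per guess.
    print(round(stretching_bits(10_000), 1))
```

Under that assumption the passphrase stays below the 256-bit strength of the database key, and any validation rule that permits dictionary words or predictable patterns lowers the real figure further.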

Is PBKDF2 still a good algorithm to use here? If not, what Scrypt parameters?

An answer exists that provides some good background on bcrypt or PBKDF2.

Any further changes or recommendations to make the system secure?

Yes. Many. Here are a few.

  • Remove all redundancy that you can reconstruct in query results after decryption.
  • Ensure that the salt used to exhaust the block before encryption is truly unpredictable. (Use the best entropy source you can acquire on the server.)
  • Separate critical payload from less critical payload if they can be statistically independent to reduce the memory footprint of the plain text versions of the critical data.
Douglas Daseeco