I want to secure some highly sensitive data in a database. The data must be encrypted and remain secure for 100 years even if it falls into an adversary's hands. I also want to limit how much plaintext is held in RAM at any one time, so that there is less chance of plaintext being paged to disk. The database may be quite large, so access needs to be more efficient than decrypting the whole database at once. I am therefore thinking of encrypting the sensitive data at the row level: a unique index that references each record stays unencrypted so the record can still be found and retrieved, while the sensitive data itself is encrypted.
My plan is to store the following per database row:
index | IV | sensitive encrypted data | MAC
- A 256-bit database key, generated from /dev/random, will be used to encrypt the sensitive data.
- The IV for each row will be 256 bits from /dev/urandom (faster than /dev/random).
- The encryption algorithm will be Twofish.
- The MAC of each record will be HMAC-SHA3 of the index, IV & sensitive data using the key.
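To make the row layout concrete, here is a minimal sketch of how I picture the per-row MAC. It covers only the MAC, since Twofish is not in the Python standard library. The names `row_mac` and `verify_row` are made up for illustration. It also assumes two things I have not pinned down above: that the MAC is taken over the ciphertext (encrypt-then-MAC), and that each field is length-prefixed so that the index, IV and data cannot be shifted into one another.

```python
import hashlib
import hmac
import os
import struct


def row_mac(mac_key: bytes, index: int, iv: bytes, ciphertext: bytes) -> bytes:
    """Return HMAC-SHA3-256 over index | IV | ciphertext.

    Assumption: the index is an unsigned 64-bit integer, and the IV and
    ciphertext are each prefixed with a 4-byte big-endian length.
    """
    msg = struct.pack(">Q", index)
    for field in (iv, ciphertext):
        msg += struct.pack(">I", len(field)) + field
    return hmac.new(mac_key, msg, hashlib.sha3_256).digest()


def verify_row(mac_key: bytes, index: int, iv: bytes,
               ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the MAC and compare it in constant time."""
    expected = row_mac(mac_key, index, iv, ciphertext)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = os.urandom(32)          # stand-in for the 256-bit key
    iv = os.urandom(32)           # 256-bit IV as proposed above
    ciphertext = b"opaque Twofish output would go here"
    tag = row_mac(key, 1, iv, ciphertext)
    print(verify_row(key, 1, iv, ciphertext, tag))
```

Because the index is part of the MAC input, a row copied to a different index fails verification.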
The system is single user. The user will create a strong alphanumeric passphrase (minimum 19 characters).
A password-based key derivation function will be run on the passphrase to produce a derived encryption key, which will then be used to encrypt the database key separately with Twofish. This lets the user change their passphrase without re-encrypting the entire database: they choose a new passphrase and only the database key is re-encrypted. I understand that this is the weakest part of the scheme, but I would like to make brute-force passphrase guessing very difficult for an attacker. I think the security has to rest on the strength of the passphrase, because any secondary token could be compromised at the same time as the device holding the encrypted data (for example, the device and token could both be confiscated for arbitrary reasons at airport security, so a token would be of no use).
- To derive the key from the passphrase, PBKDF2 will be used with 10,000 iterations using HMAC-SHA3 with a 256 bit output and a salt of 256 bits obtained from /dev/urandom.
- I am trying to balance the passphrase length needed to keep the data secure against keeping the derivation reasonably fast on mobile devices, which have slow processors and limited memory. I don't expect the user to wait more than 5 seconds for the PBKDF to complete.
- A MAC is computed as HMAC-SHA3-256(derived encryption key, salt | encrypted database key) and stored on disk next to the salt and encrypted database key. It can be verified at login to check that the user entered the correct passphrase.
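The derivation and login check described in this list could be sketched roughly as follows. This is only an illustration. The function names are hypothetical, and the encrypted database key is treated as an opaque byte string because the Twofish step is not shown. It assumes a Python build whose OpenSSL provides SHA-3, so that `hashlib.pbkdf2_hmac` accepts `"sha3_256"`.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000   # value proposed above, not a recommendation
SALT_LEN = 32         # 256-bit salt
KEY_LEN = 32          # 256-bit derived key


def derive_key(passphrase: str, salt: bytes,
               iterations: int = ITERATIONS) -> bytes:
    """PBKDF2 with HMAC-SHA3-256 as the PRF and a 256-bit output."""
    return hashlib.pbkdf2_hmac("sha3_256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=KEY_LEN)


def make_verifier(derived_key: bytes, salt: bytes,
                  encrypted_db_key: bytes) -> bytes:
    """HMAC-SHA3-256(derived key, salt | encrypted database key).

    The salt has a fixed length, so plain concatenation is unambiguous.
    """
    return hmac.new(derived_key, salt + encrypted_db_key,
                    hashlib.sha3_256).digest()


def check_passphrase(passphrase: str, salt: bytes,
                     encrypted_db_key: bytes, stored_mac: bytes) -> bool:
    """Re-derive the key and compare the verifier in constant time."""
    candidate = derive_key(passphrase, salt)
    expected = make_verifier(candidate, salt, encrypted_db_key)
    return hmac.compare_digest(expected, stored_mac)


if __name__ == "__main__":
    salt = os.urandom(SALT_LEN)
    encrypted_db_key = os.urandom(32)   # placeholder for Twofish output
    key = derive_key("example passphrase 123", salt)
    mac = make_verifier(key, salt, encrypted_db_key)
    print(check_passphrase("example passphrase 123", salt,
                           encrypted_db_key, mac))
```

Timing `derive_key` on the slowest target device would show how far the iteration count can be raised within the 5-second budget.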
When the program loads, the user enters the passphrase. The KDF runs and produces the derived key, which decrypts the database key. The database key is then the only key material kept in RAM while the program is running, and it is used to verify and decrypt individual database records when required.
- What's the optimal length for the row level IV? Is 256 bits fine?
- Is the minimum password strength of 19 characters and 10,000 iterations of PBKDF2 strong enough to protect the 256 bit database key? If not, what parameters would work?
- Is PBKDF2 still a good algorithm to use here? If not, what scrypt parameters would be suitable?
- Any further changes or recommendations to make the system secure?
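For context on the passphrase question, here is my rough arithmetic as a short script. It assumes a 62-symbol alphabet (a-z, A-Z, 0-9) and, optimistically, that every character is chosen uniformly at random. A human-chosen passphrase will have less entropy than this. The iteration count is counted as extra work per guess, measured in PRF calls. The helper names are made up for illustration.

```python
import math


def passphrase_bits(length: int, alphabet_size: int) -> float:
    """Entropy in bits of a uniformly random passphrase."""
    return length * math.log2(alphabet_size)


def effective_bits(length: int, alphabet_size: int, iterations: int) -> float:
    """Passphrase entropy plus the log2 work factor added by the KDF."""
    return passphrase_bits(length, alphabet_size) + math.log2(iterations)


if __name__ == "__main__":
    print(round(passphrase_bits(19, 62), 2))         # about 113.13
    print(round(effective_bits(19, 62, 10_000), 2))  # about 126.42
```

Under those assumptions, 19 characters give roughly 113 bits and 10,000 iterations add roughly 13 more, which is still short of the 256-bit database key being protected.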