PBKDF2 hash over PBKDF2 key derivation

Question

Imagine an app where any user provides a username and a password. This password is used to generate a 'strong' encryption key (for AES encryption) with the PBDKF2 key derivation algorithm. This encryption key is used to encrypt user data in a database.

Now, to identify a user and grant him access to his encrypted data, I used to hash its encryption key with the "SHA-512" algorithm and then compare it with the stored hashed key. I want to upgrade my strategy and better protect my app against brute-force attacks.

Since PBDKF2 is also a hash algorithm, my two questions are :

is it safe to use PBDKF2 to hash (and then store) an encryption key that has been generated by the same PBDKF2 algorithm (schematically PBDKF2.hash(PBDKF2.derivate(password)))?
is it equally safe to store the encryption key encrypted by itself (schematically AES.encrypt(input = key, key = key) ) instead of the key hash?

Take a look at [Secure Remote Password protocol](https://en.m.wikipedia.org/wiki/Secure_Remote_Password_protocol) since you seem concerned with the idea of the user sending you potential key material. — Ben, Jan 18 '20 at 20:53
Thank you, from what I understand this protocol indeed describes what I would like to achieve (basically client/server authentification through SSL with cryptograpic signature). — BmXaM, Jan 20 '20 at 07:18

Future Security · Answer 1 · 2020-02-18T20:04:15.560

Hashing a derived key is not a security issue because cryptographic hashes are one way. There are two ways people could attack the derived key.

Guess a password, process it using PBKDF2, SHA-512 hash the result and see if it matches
Guess a derived key (128 to 256 bits), SHA-512 hash it and see if it matches

The amount of work performed per guess with method 2 is much lower, but the probability of being successful within n guesses is only n / 2¹²⁸. Since the total amount of work is n times the work-per-guess, method 1 is always less work in practice. (Method 2 is never practical because brute forcing 128-bit or larger keys isn't currently possible.)

I estimate that typical "good" user chosen passwords have 30-40 bit strength. So you can expect them to be guessed within 2⁴⁰ guesses instead of 2¹²⁸. You would only "break even" if PBKDF2 were made to cost 2⁸⁸ times SHA-512. No user, legitimate or not, would be able to login if that were the case. It would just take too long.

(Not that "breaking even" matters in practice. Method 2 is already out of reach for real world adversaries so it's a strategy that no one is going to bother with. People's passwords are the weakest link in the chain. Not an algorithm as reliable as SHA-512.)

Another issue makes it possible to bypass the second level of PBKDF2 brute-force protection. Hashing the derived key and comparing it to the stored hash is not the only way for a hacker to determine if they have the correct derived key.

If they have access to ciphertext, then they instead could perform trial decryption to get a good idea of whether or not they have the correct key. It's not even required to decrypt all the ciphertext. A few bytes should suffice.

Alternatively, they could use the message authentication algorithm. (Encryption without authentication is a really bad idea. You do use an authenticated-encryption mode, right?)

Any place where key material is derived from a user's password provides a potential pathway a hacker can use to verify whether a guessed password is correct. Using any one of these avenues gets the person trying to crack a password the same answer. so they are going to use whatever option is easiest.

This is another weakest link in the chain situation. Whether you use PBKDF2/SHA-512 or PBKDF2/PBKDF2, expect that your level of brute-force protection is only as strong as the first layer of PBKDF2.

PBKDF2/SHA-512 can actually be made stronger than PBKDF2/PBKDF2. If you have only one second to budget to verifying a password, then it is preferable to put nearly 100% of that one second into the first PBKDF2.

Using SHA-512 for the second-level check lets you do this because its run time will be near-instantaneous compared to PBKDF2. (But again, a 128-bit key (or stronger) is not brute-force-able because the actual cost of SHA-512 is non-zero and 2¹²⁸ SHA-512 evaluations adds up.)

If you instead let the first layer of PBKDF2 run for 0.5 seconds then ran the second layer for 0.5 seconds, then you'd really only have half the brute force protection of what you could otherwise afford.

There are many interesting information in your answer, thank you. If I understand you correctly, you're saying that bruteforcing a SHA512 hash of a 128bits key is unfeasible for now, so SHA512 is enough to 'protect' a key as long as the key is long enough (and 'random enough'). Regarding the direct decryption of a few ciphertext bytes, I don't really see what I can do, except ensuring the best protection for the database containing the ciphertexts. — BmXaM, Feb 20 '20 at 09:02
Finally, I don't really understand what you mean by 'they could use the message authentification algorithm'. Could you please elaborate a little bit more on that? Actually I don't use an authenticated-encryption mode because I don't see where it could play a role : the user sends a hash, the server compares the hash with the one stored in database and returns the encrypted user data stored in database (only the user can decrypt the data, not the server). Would you mind telling me where/how the MAC could improve the procedure, please? — BmXaM, Feb 20 '20 at 09:02

score 0 · Answer 2 · answered Jan 18 '20 at 23:43

What are you trying to achieve here? Utilize a user's password for (1) authentication/login as well as a (2) key to encrypt their data? If so, then consider what password managers typically do - I see as bearing similarities with what you're proposing. Here are the operations cannibalized from an answer to a different question:

client_rounds = 5000 // iteration count may be user-defined
encryption_key = PBKDF2(HMAC-SHA256, password, salt, client_rounds) // unlocks your data
auth_hash = sha256(encryption_key) // sent to the server for authentication
server_rounds = 100000
server_key = PBKDF2(HMAC-SHA256, auth_hash, server_salt, server_rounds) // server can use this to do additional encryption

Full algorithm:
PBKDF2(HMAC-SHA256, sha256(PBKDF2(HMAC-SHA256, password, client_salt, client_rounds)), server_salt, server_rounds)

is it safe to use PBDKF2 to hash (and then store) an encryption key that has been generated by the same PBDKF2 algorithm (schematically PBDKF2.hash(PBDKF2.derivate(password)))?

Yes, it is safe, but using PBKDF2 for hashing is not the appropriate algorithm because hash functions are expected to be fast, whereas PBKDF2 is intentionally designed to be slow for key derivation.

Since you are already applying PBKDF2 to the password that is sufficient defense against brute force attacks - increase the iterations as needed.

Then follow up by hashing the derived key, which is also stored instead of the actual key/password.

is it equally safe to store the encryption key encrypted by itself (schematically AES.encrypt(input = key, key = key) ) instead of the key hash?

No, not safe. Encrypting keys are hard to do right - you will want to research key wrapping, starting with NIST's key wrap recommendations. Hashing is a much better approach.

Thank your very much for your answer. My main concern was to know if it was sure. Now, to decide who (client/server) computes the second PBKDF2 is subjective : in my case, I would prefer the client to compute it because if the server is compromised, the attacker receives the 'sha256-hash' and can brute-forces it to obtain the encryption_key. But again, the overall answer satisfies me. — BmXaM, Jan 20 '20 at 07:10

PBKDF2 hash over PBKDF2 key derivation

2 Answers2