Alice wants to encrypt her data. She is not comfortable with using only a password. While she prefers to use a properly random crypto key, she isn't confident she can protect the key file from theft or loss. So she wants to use both.

Bob proposes that she derive her encryption key EK from her password P and input key IK (the key Alice keeps in a file) as follows:

  • EK = HKDF( IK + PBKDF2(P) )

In which + is concatenation, PBKDF2 uses plenty of iterations, and both KDFs use SHA-256 and null salts.
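Bob's formula can be sketched in Python using only the standard library. This is a minimal illustration, not a vetted implementation. The names `hkdf_sha256` and `derive_ek`, the iteration count, and the UTF-8 password encoding are assumptions that do not come from Bob. "Null salt" is taken here to mean an empty salt for PBKDF2 and, as RFC 5869 specifies for a missing salt, a block of zero bytes for HKDF.

```python
import hashlib
import hmac
import os

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes = b"", info: bytes = b"", length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869: a missing salt is HashLen zero bytes
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_ek(ik: bytes, password: str, iterations: int = 600_000) -> bytes:
    """EK = HKDF(IK + PBKDF2(P)), with + meaning concatenation."""
    # Empty salt mirrors Bob's "null salt"; the iteration count is an assumed value.
    stretched = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), b"", iterations)
    return hkdf_sha256(ik + stretched)


# Hypothetical usage: IK would normally be read from Alice's key file.
ik = os.urandom(32)
ek = derive_ek(ik, "correct horse battery staple")
```

An attacker who holds only the password or only the key file is missing one of the two inputs to `derive_ek`, which is the property Alice is after.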

Alice is interested but not yet sold on Bob's proposal. She has two questions:

  1. Wouldn't it be better to use a salt with the KDFs? If so, where to keep it?

  2. Since she needs to back up the file containing IK in a couple of safe places, shouldn't it be encrypted and have an HMAC?

Bob scratches his beard, mutters "good questions". Eventually he says that these two extra steps add complexity to the process that needs to be justified, and asks, "What, if anything, do they add to the security of your data?"

Given Alice's requirements, how would you answer Bob's question?

(Or is Bob's proposal faulty? (Or perhaps Alice's requirements do not make sense?))


The story continues...

Bob is trying to take Alice's worries seriously because he knows her government's reputation for brutality and suppression of dissent. So he posts a question to a social Q&A web site called "Information Security" and gets a very thorough answer from Tom Leek, for which he is very grateful. From that answer Bob gleans:

  • Encrypting IK using another key derived from P does little to make it harder for Alice's wicked government to obtain her plain-text data. Decryption of Alice's data requires IK, P and the encrypted data either way. Adding encryption and authentication could make matters worse if it is not implemented correctly, for example, by exposing side channels.

  • Using null salt input to PBKDF2 means using an empty string as the salt. This is not the best choice. He decides to use a random value for salt and, for operational convenience, to keep it in the same file as IK.

  • Authentication of the key file is valuable, even though its encryption is not.

So Bob's revised proposal involves adding an HMAC to IK in the key file beside the salt. Thus the key file looks like:

  • salt + HMAC + IK

In which HMAC is computed over IK using an authentication key derived from P but different from EK.
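A minimal sketch of that key file layout follows, using only the Python standard library. Bob does not say how the authentication key is derived, so the derivation here (PBKDF2 over P with the stored salt, followed by an HMAC with a fixed label for domain separation) is an assumption, as are the field lengths, the iteration count, and the names `build_key_file` and `parse_key_file`.

```python
import hashlib
import hmac
import os

SALT_LEN, TAG_LEN, IK_LEN = 16, 32, 32  # assumed field sizes in bytes
ITERATIONS = 600_000                    # assumed PBKDF2 iteration count


def _auth_key(password: str, salt: bytes, iterations: int) -> bytes:
    """Authentication key derived from P; the label keeps it distinct from EK."""
    stretched = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.new(stretched, b"key-file-auth", hashlib.sha256).digest()


def build_key_file(ik: bytes, password: str, iterations: int = ITERATIONS) -> bytes:
    """Return salt + HMAC + IK, with the HMAC computed over IK."""
    salt = os.urandom(SALT_LEN)
    tag = hmac.new(_auth_key(password, salt, iterations), ik, hashlib.sha256).digest()
    return salt + tag + ik


def parse_key_file(blob: bytes, password: str, iterations: int = ITERATIONS) -> bytes:
    """Verify the HMAC and return IK, or raise ValueError if the check fails."""
    if len(blob) != SALT_LEN + TAG_LEN + IK_LEN:
        raise ValueError("unexpected key file length")
    salt = blob[:SALT_LEN]
    tag = blob[SALT_LEN:SALT_LEN + TAG_LEN]
    ik = blob[SALT_LEN + TAG_LEN:]
    expected = hmac.new(_auth_key(password, salt, iterations), ik, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("key file failed authentication")
    return ik
```

With this layout, a corrupted or tampered backup copy is rejected when it is read, rather than silently producing a wrong EK.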

2 Answers

If you use PBKDF2 then you are using a salt, because that's how PBKDF2 is defined: it is a function which takes as input a password and a salt. So there is already a salt somewhere. Or else Bob is using a "fixed salt", i.e. not a salt at all, and we can say that Bob is not really running PBKDF2.

Salts need not be secret. Their virtue is in being unique (as much as possible).

Salts are part of password hashing functions, and they are good to thwart cost-sharing when the hashed secret value can be realistically broken with brute force; i.e., when the secret is a password. There is no need for a salt, or, for that matter, for lots of iterations, when processing keys which are not passwords. When someone deals with a key that is stored in a file, it is assumed that the key is large enough and generated with a strong enough PRNG that brute force is no longer an issue.

Therefore: there is already a salt for the PBKDF2, it must already be stored "somewhere" (not necessarily with any kind of confidentiality). There is no need for any other salt here.
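The cost-sharing point about salts can be illustrated with a short Python sketch. The example password, the fixed salt value, and the low iteration count are assumptions chosen to keep the demonstration fast; they are not recommendations.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes, iterations: int = 10_000) -> bytes:
    """PBKDF2-HMAC-SHA-256 of a password under the given salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


# With a fixed salt, the same password always hashes to the same value, so one
# precomputed table of guesses can be reused against every user or file.
fixed_salt = b"fixed-salt"
same_salt_a = hash_password("hunter2", fixed_salt)
same_salt_b = hash_password("hunter2", fixed_salt)

# With unique random salts, the same password gives unrelated outputs, so each
# target has to be attacked separately. The salts themselves need not be secret.
unique_a = hash_password("hunter2", os.urandom(16))
unique_b = hash_password("hunter2", os.urandom(16))

print(same_salt_a == same_salt_b)  # True
print(unique_a == unique_b)        # False, barring a 128-bit salt collision
```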

As for the other question: if the places where Alice stores her key are safe, then why would she need extra protection? That's the point of safe places: they are safe.

Alice wants to encrypt her data because she fears that the storage and transport mediums to which the data will be entrusted may come under the eyes of people who are interested in the data, but should not be able to see it. (At least that's the rational reason why Alice's data should be encrypted. It is possible that Alice wants to encrypt the data because she has an "encryption system" budget line to be spent before the end of the current fiscal year.) Encryption will maintain the confidentiality of the data as long as only Alice can perform the decryption.

A password is tied to Alice by being stored in Alice's brain. The key file is tied to Alice because it is stored in "safe places" that only Alice can access. Both cases can be deemed "slightly vulnerable" because:

  • A password that fits in a human brain can usually be recovered through a dictionary attack (because human brains are not good at remembering arbitrary bits, and they are extremely bad at making truly random choices).

  • A key file exists as a file on some physical object, and ensuring the physical security of an object on a 24/7 basis can be challenging (at least Alice cannot help but bring her brain with her at all times).
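The first weakness can be made concrete with a toy dictionary attack in Python. Everything here is hypothetical: the word list, the salt, the target password, and the low iteration count are made up for illustration, and a real attacker would use far larger lists and faster hardware.

```python
import hashlib


def stretch(password: str, salt: bytes, iterations: int = 10_000) -> bytes:
    """PBKDF2-HMAC-SHA-256 of a password under the given salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def dictionary_attack(target: bytes, salt: bytes, candidates, iterations: int = 10_000):
    """Return the first candidate whose hash matches the target, else None."""
    for guess in candidates:
        if stretch(guess, salt, iterations) == target:
            return guess
    return None


salt = b"known-to-attacker"           # salts are not secret
target = stretch("sunshine", salt)    # what the attacker has captured
wordlist = ["123456", "password", "letmein", "sunshine", "qwerty"]
print(dictionary_attack(target, salt, wordlist))  # prints: sunshine
```

More iterations only raise the cost of each guess. If the key also depends on a random key file that the attacker does not have, the attacker has nothing to test guesses against.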

Combining the two kinds of secrets for a single encryption is akin to two-factor authentication.

Further encryption of the key file raises the question: encrypting, yeah, but with what key? If Alice wants to do password-based encryption of her key file, then we are back to Alice's brain, whereas the point of using a key file was, indeed, not to rely on Alice's brain.


On a generic basis, all applications of encryption should include checked integrity. This is because the "passive-only" attacker model is almost never realistic. Most attackers who are in a position to eavesdrop on data (and we encrypt because we think it possible) are also naturally able to alter the data.

Combining encryption and MAC properly is hard. The proper way is not to slap together some block cipher with HMAC in a semi-haphazard way; instead, use an authenticated encryption mode which does the job (e.g. GCM).


As for the details: there are many ways to "process" a key file and a password together into some secret value amenable to use as a symmetric key for encryption. The password part is best done with a dedicated password hashing algorithm, specially designed and reviewed to get the most protection against dictionary attacks. PBKDF2 is not bad, although bcrypt is arguably better; and cryptographers are working on even better functions.

Using concatenation of the key file and the PBKDF2 output, as input to HKDF, relies on properties of HKDF which are not easy to spell out in all their mathematical glory, but are likely achieved by HKDF (this is a good choice).


The real key to understanding the situation is right there in your first paragraph: Alice is not comfortable with using only a password. The protocol suggested by Bob is not aimed at actually "increasing security"; the real goal is to allow Alice to relax. Therefore, what must be done is: whatever it takes to appease Alice's paranoia. If Alice wants to pile encryption upon encryption, and needs a tower of nested complexity to be relieved of her worries, then so be it. That's her data after all.

Tom Leek

The general class of algorithms is called secret-strengthening protocols. The idea is that Alice has a weak secret, p, and by conducting a short handshake with Bob, who knows a strong secret, N, Alice derives a combined secret H(p,N) which is a strong secret.

Secure Remote Password is an example of such a secret-strengthening protocol combined with a proof of knowledge to perform the additional authentication step.

http://en.wikipedia.org/wiki/Secure_Remote_Password_protocol

One problem with SRP is that Bob, who knows the strong secret, N, could still dictionary attack Alice's p. To avoid this, you can extend the protocol to one or more additional parties, e.g. Charlie, David, and Edward, each with additional strong secrets. Now Alice conducts her protocol with each of Bob, Charlie, David, and Edward, deriving H(p,N1,N2,N3,N4). The derived strong secret is now proof against dictionary attacks unless all parties collude against Alice, sharing their knowledge of N1, N2, N3, and N4 with an attacker.