
Background: I'm working with Node's crypto library. I'm using PBKDF2 to convert a variable-length binary "passphrase" into constant-length keys for an AES cipher later on.

The underlying source of this passphrase data is Base64-encoded, for reasons beyond my control. Out of habit, I decode the passphrase back to binary before supplying it to PBKDF2. But that got me thinking... Base64 encoding makes the passphrase longer, but with a more limited set of characters. In practical applications, would this make the data better or worse as an input to PBKDF2?
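A minimal sketch of the decode-then-derive workflow described above, using Node's built-in `crypto` module. The helper name `deriveKeyFromBase64`, the salt handling, the iteration count, and the choice of SHA-256 with a 32-byte (AES-256-sized) output are assumptions for illustration, not details from the question. The canonical-form check is also an assumption: it rejects unpadded or URL-safe Base64, which may be stricter than a given source needs.

```javascript
const crypto = require('crypto');

// Assumed parameters (placeholders): tune the iteration count for your hardware
// and use a random per-passphrase salt stored alongside the ciphertext.
const ITERATIONS = 100000;
const KEY_LENGTH = 32; // bytes; 32 bytes suits AES-256
const DIGEST = 'sha256';

// Hypothetical helper: turn the Base64 passphrase back into raw bytes,
// then stretch it into a fixed-length key with PBKDF2.
function deriveKeyFromBase64(encodedPassphrase, salt) {
  const passphrase = Buffer.from(encodedPassphrase, 'base64');
  // Node's Base64 decoder is lenient (e.g. it ignores whitespace and also accepts
  // the URL-safe alphabet), so re-encode and compare to reject non-canonical input.
  if (passphrase.toString('base64') !== encodedPassphrase) {
    throw new Error('passphrase is not canonical Base64');
  }
  return crypto.pbkdf2Sync(passphrase, salt, ITERATIONS, KEY_LENGTH, DIGEST);
}

// Usage: here the salt is freshly random; in practice it would be stored and reused.
const salt = crypto.randomBytes(16);
const key = deriveKeyFromBase64('3q2+7wD/', salt);
console.log(key.length); // 32
```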

Put another way, if given a choice between:

key = pbkdf2(binaryPassphrase)

...versus...

key = pbkdf2(base64encode(binaryPassphrase))

...is there any difference in the security offered?

smitelli
  • It makes no difference. The strength is in the *entropy* of the passphrase, not in its length. The entropy is a property of the process by which the passphrase was generated, and cannot be measured from the passphrase itself after the fact. That said, I would personally decode it from Base64 out of habit and a sense of conceptual purity (e.g., perhaps the source will later send you passwords in Base32 or some other encoding). – Stephen Touset Jan 07 '14 at 19:22
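To make the comment's point concrete, assume the passphrase is drawn uniformly at random from $N$ possible values (an assumption; the question does not say how it is generated). Its entropy depends only on $N$, and because Base64 encoding is one-to-one, the encoded form still has exactly $N$ possible values:

$$
\begin{aligned}
H_{\text{binary}} &= \log_2 N \text{ bits} \\
H_{\text{Base64}} &= \log_2 N = H_{\text{binary}} \\
\text{16 random bytes:}\quad N &= 2^{8 \cdot 16} = 2^{128}, \quad H = 128 \text{ bits}, \quad \text{Base64 length} = 4 \left\lceil 16/3 \right\rceil = 24 \text{ characters}
\end{aligned}
$$

The encoded string is 50% longer (24 characters versus 16 bytes, counting the two `=` padding characters) but carries exactly the same 128 bits of entropy.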

1 Answer


Between

key = pbkdf2(binaryPassphrase)

and

key = pbkdf2(base64encode(binaryPassphrase))

there is no difference in the amount of security provided. The Base64-encoded passphrase is a longer input, but it is derived from exactly the same amount of entropy and thus offers no additional security.

The pbkdf2 function takes practically the same time to execute with either input (unless binaryPassphrase is several kilobytes or more). The only significant difference is that the derived keys will be different.
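To illustrate the last point, a sketch using Node's `crypto.pbkdf2Sync` that derives a key from both forms of the same passphrase. The salt, iteration count, digest, and key length are placeholder assumptions. In typical implementations the passphrase serves only as the HMAC key, which is set up once, so the iteration count rather than the passphrase length dominates the run time.

```javascript
const crypto = require('crypto');

// Placeholder parameters for illustration only.
const salt = Buffer.from('example-salt');
const iterations = 10000;
const keyLength = 32; // bytes, e.g. for AES-256
const digest = 'sha256';

const binaryPassphrase = Buffer.from([0xde, 0xad, 0xbe, 0xef, 0x00, 0xff]);
const base64Passphrase = binaryPassphrase.toString('base64'); // "3q2+7wD/"

// pbkdf2Sync accepts either a Buffer or a string (strings are UTF-8 encoded).
const keyFromBinary = crypto.pbkdf2Sync(binaryPassphrase, salt, iterations, keyLength, digest);
const keyFromBase64 = crypto.pbkdf2Sync(base64Passphrase, salt, iterations, keyLength, digest);

// Both keys have the requested length, but they are different keys, so every
// party must agree on which form of the passphrase is fed to PBKDF2.
console.log(keyFromBinary.length, keyFromBase64.length); // 32 32
console.log(keyFromBinary.equals(keyFromBase64)); // false
```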

user4982
  • Makes sense. I suppose it doesn't matter that base64 adds a predictable pattern to the input (i.e. every high bit is 0)? – smitelli Jan 07 '14 at 19:22
  • PBKDF2 is just an iterated hash function, and it's generally considered that one of the effects of a good hash function is to evenly distribute entropy. For instance, if I have a string that contains four zero bytes followed by four random bytes, the first four bytes have zero entropy and the last four have 32 bits of entropy in total. If I then hash this string (to an 8-byte output), *each* byte should now carry roughly 4 bits of entropy (and the string as a whole will still have only the original 32 bits of entropy). – Stephen Touset Jan 07 '14 at 19:25
  • @StephenTouset: PBKDF2 is based on iterated invocation of HMAC. HMAC is considered a good randomness extractor ("crypto talk" name for something that is good at retaining the entropy available in its input). For more information, you can read e.g. [Randomness Extraction and Key Derivation Using the CBC, Cascade and HMAC Modes](http://www.iacr.org/archive/crypto2004/31520493/clean.pdf). (Then again, HMAC is indeed based on hash functions.) – user4982 Jan 07 '14 at 19:31
  • Yes, I am familiar with PBKDF2. As you note, it is built on HMAC which is constructed from an underlying hash function. That hash function is what is responsible for evenly distributing the entropy in the passphrase. This property is not, AFAIK, necessarily a property of other MAC constructions. – Stephen Touset Jan 07 '14 at 19:55
  • @StephenTouset: For randomness extraction, HMAC is even stronger than a plain use of the hash. Hugo Krawczyk's [paper on HKDF](http://eprint.iacr.org/2010/264.pdf) goes into much detail. (Most) other MACs are indeed worse (the paper I referenced says so too), and PBKDF2 does not allow anything except HMAC (unlike e.g. NIST SP 800-108 and SP 800-56C, which approve AES-CMAC as an alternative to HMAC). – user4982 Jan 07 '14 at 20:26
  • I believe you are agreeing with me, but I am not sure you are aware of it. – Stephen Touset Jan 07 '14 at 20:40
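A small experiment in the spirit of the comments above, with hypothetical parameters: feed all 256 one-byte passphrases to PBKDF2 (HMAC-SHA-256) in Base64 form, where every input character has its high bit clear, and inspect the derived keys. This cannot measure entropy in general (as noted earlier in the thread, entropy belongs to the generation process), but because this input space is small enough to enumerate, it can count how many distinct keys come out and check whether the fixed high bit leaks into the output.

```javascript
const crypto = require('crypto');

// Placeholder parameters; a low iteration count keeps the demo fast.
const salt = Buffer.from('example-salt');
const iterations = 1000;

const keys = new Set();
let highBitsSet = 0;
let totalBytes = 0;

// Every possible 1-byte passphrase (8 bits of entropy), fed to PBKDF2 in Base64 form.
for (let value = 0; value < 256; value++) {
  const encoded = Buffer.from([value]).toString('base64'); // e.g. "AA=="; all ASCII
  const key = crypto.pbkdf2Sync(encoded, salt, iterations, 32, 'sha256');
  keys.add(key.toString('hex'));
  for (const byte of key) {
    if (byte & 0x80) highBitsSet++;
    totalBytes++;
  }
}

// 256 inputs give 256 distinct keys: no entropy was lost, and none was gained,
// since there are still only 256 possible keys.
console.log(keys.size); // 256
// Roughly half of the output bytes have their high bit set, so the
// always-zero high bit of the Base64 characters leaves no visible pattern.
console.log((highBitsSet / totalBytes).toFixed(2));
```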