Naive key stretching vs PBKDF2

Question

What are the drawbacks of using a trivially simple key stretching approach such as the below Python example, compared to something like PBKDF2? The purpose is to harden an AES-encrypted file against brute-forcing.

Are there any critical issues with the trivial way? There is no PBKDF2 implementation available in the language I am programming with.

for _ in range (100000):
  key = hashlib.sha256(key).hexdigest()

What language are you working in? Do you care about defending against gpu or fpga attacks? — Xiong Chiamiov, Feb 10 '17 at 22:34

score 4 · Answer 1 · edited Mar 17 '17 at 10:46

From wikipedia: Key Stretching:

... are used to make a possibly weak key ... more secure against a brute force attack by increasing the time it takes to test each possible key

Since hashes like SHA-2 are designed for speed simple key stretching by using just the hash don't really slow down an attacker which tries to brute-force a key based on the weak input. KDF like PBKDF2 are instead designed to really slow down the attacker by being slow themselves.

While your example does lots of iterations to slow down the computation an attacker could create once a database of all interesting weak inputs and their outputs and use this precomputed values later in multiple brute-force attacks instead of recalculating all values at the time of attack. That's why algorithms like PBKDF2 use additionally a salt to vastly increase the memory needed by such database and the time to create it and thus make it impractical.

There is no PBKDF2 implementation available in the language I am programming with.

If the language already offers a way to calculate a hash and you are willing to create some KDF anyway then implementing PBKDF2 or other well designed KDF instead of your own idea should not be too hard.

Luis Casillas · Accepted Answer · 2017-02-10T23:27:23.837

The security argument for PBKDF2 is better than for your proposal. This doesn't mean that your proposal is any less secure, just that PBKDF2 is easier to justify.

This is because PBKDF2 incorporates three additional ideas beyond just your iteration proposal. First, PBKDF2 explicitly incorporates a salt, while your example doesn't. Even if you implicitly assumed that key would be something like salt + password, a password scrambler really should take password and salt as separate arguments.

Second, PBKDF2 iterates not a public hash function like SHA-256, but a pseudorandom function ("PRF") like HMAC-SHA-256, keyed with the password. This is a very subtle theoretical point, but what this means is that PBKDF can work with functions that are weaker than a full-blown hash function like SHA-256.

Third, PBKDF2 doesn't just chain the outputs of the PRF, but also XORs its outputs. This is a (perhaps paranoid) safeguard against weak PRFs that have short cycles when applied to their own outputs.

PBKDF2 is actually pretty simple. It's simple enough that laypersons are unlikely to mess it up. If your language and library have SHA-256 they likely have HMAC-SHA-256 as well, but if not that's easy to implement as well. So you really ought to just give it a shot—it's better to stick to standard, well-tested techniques than to cook up your own.

Final note: the output of a hash function is binary data, and really should be treated as so. That is, your use of hexdigest in your example is wasteful and not conducive to good quality code; data should be processed in its native type as much as possible! (Using hex digests would also give you an incorrect PBKDF2 implementation.)

Naive key stretching vs PBKDF2

2 Answers2