Why not mix hashes?

Question

To make hashes harder to target by specialized hardware, I intuitively imagine that mixing a set of different hash algorithms should provide additional strength. For simplicity lets assume Hash1 is a number of iterations of SHA256, Hash2 is bcrypt and Hash3 is scrypt:

myhash = Hash1(Hash2(Hash3(0, password, salt), password, salt), password, salt)

Assuming Hash1, Hash2, and Hash3 each take 1/3 seconds to compute on a typical user's hardware, why does it seem to be preferred to use

myhash = Hash1(Hash1(Hash1(0, password, salt), password, salt), password, salt)

EDIT: Changed the short format Hash1(Hash1(Hash1(password ^ salt))), to a more accurate format Hash1(Hash1(Hash1(0, password, salt), password, salt), password, salt), to avoid answers that only point out the higher amount of collisions with the former format. Collisions are a non factor in user password hashing as long as the remaining entropy of myhash (~256 - x due to collisions, with x being close to 0) is appreciably higher than the entropy of the user's password. A user's password has almost always less than 60 bits of entropy, unless chosen by a password generator.

Well if one is consider to be "collision, second and pre image resistant", as of current knowledge about the function, three of those on top of each other isn't gonna make a difference. It could make it worse if one is weaker than the others (Big-MAC/Little-Mac attack). It's equivalent to attacking three different messages but in a certain order. If you can't attack attack one of those, why stack them together? Unless for secure MAC'S (HMAC), which has a specific attack it is trying to protect — dylan7, Jul 12 '15 at 15:31
Also, the likes of bcrypt, scrypt and pbkdf2 are not hashes themselves, but are KDF's (key derivation functions), built using HMAC algorithms, which in turn *utilize* hash algorithms like SHA-256 to generate the HMAC values. There is already significant mixing and stirring and shaking involved in getting a value out of a KDF. Even if mixing the output of multiple KDF's doesn't weaken overall security, it seems likely not to materially improve it. — Craig Tullis, Jul 12 '15 at 18:13
@Craig This is my favorite answer so far. Would you mind making it into an answer instead of a comment? — Peter, Jul 12 '15 at 19:52
[my take on progs.SE](http://programmers.stackexchange.com/a/115414/25768) — ratchet freak, Jul 12 '15 at 20:19
Hashing and salting only works on reading the stored password/string/whatever. Someone trying to brute-force it will only need to enter the non-encrypted string, regardless of how many times and whatever you use to hash it. — ggdx, Jul 13 '15 at 09:34
Be aware that bcrypt stops reading the input when it encounters byte 0, and reads up to 72 bytes. So when feeding output of hash functions into bcrypt you may get bcrypt("\0abcd...") == bcrypt("\0efgh...") which is very bad, 1/256 of passwords have the same hash! when feeding base64 of hash into bcrypt, the length limit can be a problem. — Z.T., Jul 13 '15 at 19:36
@DanWhite Yes, but... part of what the many iterations of key derivation functions like pbkdf2/bcrypt/scrypt accomplishes it to make each password guess take so long (a significant fraction of a second, or longer, instead of a few milliseconds) that trying to use brute-force guessing just isn't worthwhile. Unless you mean the attacker has a copy of the password database, in which case they need to know what algorithm to feed the plaintext password to, how many iterations to run, and so on. The bar is a little higher than you seem to imply? — Craig Tullis, Jul 13 '15 at 21:30
Can you prove that none of your hash functions invert (or partially invert) any of the others? I would be surprised, but have never tried to prove such a thing. It would be embarrassing to claim to have made a stronger hash only to find out that it was equivalent to only doing "half" of one hash... — Eric Towers, Jul 14 '15 at 06:56

score 28 · Answer 1 · edited Mar 17 '17 at 13:21

28

When trying to hash passwords, the attacker can always use the same kind of hardware as the defender. What the attacker tries is to do better, by using specialized hardware which will allow him to hash N potential passwords for less total cost than if he were using the defender's hardware. The total cost includes the cost of buying the hardware, the cost of developing the relevant software on it, and then the cost of running the hardware, which basically amounts to the used electricity (for powering the hardware and for cooling it). For a serious attacker, the power cost dominates.

Whenever there is some operation that the attacker can do cheaper than the defender, the attacker wins. An important point is that password cracking is an embarrassingly parallel problem: by definition, the attacker has many potential passwords to try.

Suppose that you cascade three distinct hash functions Hash1, Hash2 and Hash3. This means that the defender must have all three implementations at hand, all running on his server. The attacker, on the other hand, can have a better scheduling: he can (say) hash one million potential passwords with Hash1 and save the results in some buffer; then switch hardware to something that applies Hash2, and run it over the million saved outputs from the previous step, there again saving the Hash2 outputs in some buffer; finally switching hardware again, with Hash3.

This kind of "hardware switching" is especially relevant when using FPGA: each "switching" is a reprogramming of the same actual hardware, and is a matter of a few seconds at most. By using such scheduling and buffering, the "switching" cost is negligible.

This can also be used as pipelining: if the attacker built three specialized machines, one for Hash1, one for Hash2 and one for Hash3, then he can run Hash1 on the first potential password, then send the output to the machine that computes Hash2. While the second machine computes Hash2, the first machine can compute Hash1 on another potential password. And so on. In practice, the attacker can maintain all his specialized machines at full occupancy at any time, thereby laughing at your attempts at "increased strength".

Moreover, if there are three distinct hash functions to implement and only one of them can be optimized with specialized hardware, then the attacker still gets a win by optimizing that one. To say things crudely, if you cascade bcrypt, scrypt and SHA-256, then the attacker will use a PC for the first two, and a GPU for the SHA-256, and will thus avoid about 1/3rd of the cost.

To sum up, the intuition that "mixing a set of different hash algorithms should provide additional strength" is wrong. It does the opposite. Such mixing increases development and usage costs for the defender, while it does not slow the attacker (who has a lot of parallelism to benefit with), and increases the attacker's options for optimization.

(All of this is said without talking looking into practical things, such as the management for the individual salts for all cascaded functions, and the dangers of homemade cryptography.)

edited Mar 17 '17 at 13:21

Community

1

answered Jul 12 '15 at 16:02

Thomas Pornin

320,799
57
780
949

1

You say that using a GPU would remove a third of the cost for the attacker, if we use a mixed hash. Yet that implies that using just one hash, but the wrong one (sha256) the attacker doesn't have any cost anymore and has won. Which tells me "when in doubt which hash is best, cascade multiple hashes" - that's the point I don't get. – Peter Jul 12 '15 at 16:07
3

I think what you should take away is actually "when in doubt which hash is best, figure out which hash is best and use that one." These functions are analyzed in great detail, and it shouldn't be hard to figure out which one offers the most protection, given the attack profile you have to protect against. If you cascade the best hash with other inferior hashes, then it stands to reason that you can improve your security by replacing the inferior hashes with the best one. – David Z Jul 12 '15 at 18:54
Unless a 300% increased hashing cost for the defender is negligible (since he is doing so few), and a 200% increased hashing cost for the attacker is prohibitive. There are reasons not to compound hashes, but differences of cost between attacker and defender seems like a weak one. – Paul Draper Jul 12 '15 at 19:15
9

If a 3x increased hashing cost is negligible for the defender, then he should increase the iteration count in its bcrypt/PBKDF2/whatever accordingly. That's what that iteration count is for. – Thomas Pornin Jul 12 '15 at 22:50
2

I would offer the opposite argument: if one uses a single hashing function which has a 0.1% chance of having a discoverable weakness that would allow an attacker to speed it up by a factor of a million, there will be a one-in-a-thousand chance that an attacker will be able to gain a million-fold speedup. If one used three independent functions, each of which had a 0.1% chance of allowing such a breakthrough, there would be a 0.3% chance of an attacker being able to achieve a 33% speedup, a 0.0003% chance of an attacker getting a 66% speedup, and only a 0.0000001% chance of an attacker... – supercat Jul 13 '15 at 17:21
2

...getting a million-fold speedup. I would consider the possibility of an attacker getting a 33% speedup as inconsequential compared to the reduction in the probability of the attacker getting a 70%-or-better speedup. – supercat Jul 13 '15 at 17:22
1

Does your "must switch implementations" and "can pipeline" argument not exactly answer "Yes!" to the OP? Pipelining "forever" with FPGAs is not possible (especially if you don't know the next algo in the pipe), but it's trivial for the defender. You can hash 5,000 times in a PBKDF2-alike manner and select a pseudorandom hash from a bag of 3 or 4 algorithms. That kind of complexity will likely force you to break a GPU compute job into several passes, and either spend _millions_ on FPGAs (whereas otherwise a few bucks would do), or reprogram them all the time. In any case, it's much harder. – Damon Jul 14 '15 at 10:07
@DavidZ What you say makes sense, but the issue I have is that I work in "Now" and the attacker works in "Future". And to effectively thwart the attacker I need to figure out which hash is best in "Future", which is beyond my capabilities. – Peter Jul 14 '15 at 11:29
@Peter no, I don't think it is. We're talking about near future - roughly a year to a decade - and in that time frame there is a very low chance of a technological development that fundamentally changes which hash algorithms are the most effective. – David Z Jul 14 '15 at 12:13
@DavidZ The only points of reference I have are past decades, where we had many fundamental changes in every decade. – Peter Jul 14 '15 at 13:35
@Peter there haven't been that many fundamental changes. The big ones are ASICs and parallel processing. (GPUs are essentially a combination of the ideas involved in those.) Two major advances over half a century does not translate to many fundamental changes in every decade. And in terms of which hashes are most effective, the only really major change in recent memory AFAIK is the switchover from SHA-type algorithms, which are CPU-intensive (but easily optimized with hardware) to bcrypt-type algorithms, which are memory-intensive. – David Z Jul 14 '15 at 14:21
@DavidZ Additionally, the kinds of passwords used have changed dramatically - we can no longer use 4 digit all lowercase passwords. This and all the examples you mentioned happened in the very same decade. In the decade before that we started actually using hashes instead of storing passwords as plaintext in a database or textfile, and transmitting them in plaintext over the network. – Peter Jul 14 '15 at 15:23

score 12 · Accepted Answer · answered Jul 12 '15 at 21:32

12

I was asked to make my comment an answer, so here goes.

Basically, I said that the likes of bcrypt, scrypt and pbkdf2 are not hashes themselves, but are KDF's (key derivation functions). KDF's are built upon HMAC algorithms, which in turn are built upon one-way hashing algorithms like SHA-256 to generate the message digest values.

There is already significant mixing and stirring and shaking involved in getting a value out of a KDF.

Even if mixing the output of multiple KDF's doesn't weaken overall security, it seems likely not to materially improve it.

answered Jul 12 '15 at 21:32

Craig Tullis

1,483
10
13

So essentially `Hash1(Hash2(Hash3(password ^ salt)))` and `Hash1(Hash1(Hash1(password ^ salt)))` are both custom key derivation functions, trying to solve the same problem `scrypt` and `bcrypt` are trying to solve. As such, `scrypt` or `bcrypt` on their own are preferable to both of the examples in the question, due to having been subject to more scrutiny by professionals. Correct? – Peter Jul 13 '15 at 22:02
The "mixing and stirring and shaking" is quite off-topic here because a single hash function invocation, even a "broken" one like MD5, already provides all that is needed in that respect. The main attack vector for cracking passwords is brute force (trying potential passwords). The many iterations in normal password hashing functions are not there for the "extra mixing" (there's already more than enough) but for the extra _computing cost_. – Thomas Pornin Jul 13 '15 at 22:05
@ThomasPornin I don't believe anybody suggested that the extra computing cost wasn't a vital, intentional aspect of KDF's. ;-) – Craig Tullis Jul 13 '15 at 23:11
On the other hand, if computing cost was the *only* pertinent consideration, then there would be no benefit, real or perceived, in using scrypt over bcrypt or pdkdf2, right? You would just crank up the iteration count to make the digest computation take as long as you deem appropriate. For that matter, if the base hash functions *really* did more than enough "mixing, stirring and shaking," there really wouldn't be a need to use HMAC functions to defeat length extension attacks, thus going further to prevent collisions and discovery (or partial discovery) of the message. Or am I high? ;-) – Craig Tullis Jul 13 '15 at 23:19
...okay, I know I'm oversimplifying, and that some algorithms can be accelerated more efficiently with certain hardware (GPU's) than other algorithms can be. But ultimately, my statement that "There is already significant mixing and stirring and shaking involved in getting a value out of a KDF" obviously takes into account the action of the underlying hashing algorithm (or block cipher, in the case of bcrypt). What I meant to imply was the same thing you said--that the scrambling already taking place is enough, and mixing multiple KDF's won't likely improve on that. – Craig Tullis Jul 13 '15 at 23:27
The hardness of computing some functions with specialized hardware _is_ the whole point of bcrypt/scrypt over something like PBKDF2. bcrypt makes life hard on a GPU; scrypt also tries to discourage FPGA-based attacks. (Also, PBKDF2 is rather bad at being a KDF, because its overall computing cost is proportional to the output size; a better design would hash the password to a fixed size, then expanded with a fast KDF like HKDF. But that's another debate.) – Thomas Pornin Jul 13 '15 at 23:33
@ThomasPornin Thank you for expanding on that. My misspelling pbkdf2 in these comments (vs spelling it correctly in my comment on the OP's question) was just a typo, incidentally. :-) – Craig Tullis Jul 14 '15 at 02:21

score 10 · Answer 3 · edited Apr 12 '17 at 07:31

To copy my answer to a similar question on progs.SE:

the problem with
hash1(hash2(hash3(...hashn(pass+salt)+salt)+salt)...)+salt)
is that this is only as strong as the weakest hash function in the chain. for example if hashn (the innermost hash) gives a collision the entire hash chain will give a collision (irrespective of what other hashes are in the chain)

a stronger chain would be
hash1(hash2(hash3(...hashn(pass+salt)+pass+salt)+pass+salt)...)+pass+salt)
here we avoid the early collision problem and we essentially generate a salt that depends on the password for the final hash

and if one step in the chain collides it doesn't matter because in the next step the password is used again and should give a different result for different passwords

For password hashing, collisions are a relative non-factor. Unless a password-hashing method is Just Plain Broken or a user's password is exceptionally strong, it will generally be easier for an attacker to find the combination of characters in the user's password than a different combination of characters whose hash also matched it. — supercat, Jul 13 '15 at 17:26

score 6 · Answer 4 · edited Jun 22 '17 at 11:57

Using different hash algorithms will not give you more entropy, rather less. I guess that you have heard the expression «The security is not stronger than the weakest point», same applies here also. The entropy of this will not be more than the weakest algorithm gives.

But hashing the the password with same algorithm multiple times will give you more security. But use a strong hashing algorithm few rounds rather than a weak algorithm many rounds. Example this is more secure:

sha512 ( sha512 ( sha512 ( password + salt )))

Than:

sha1 ( sha1 ( sha1 ( sha1 ( sha1 ( sha1 ( ... sha1 ( password + salt ) ... ))))))

I see that there are people who don't believe at this and want me to prove it. Lets take an example. I have chosen three hashing algorithms, SHA256, SHA1 and MD5.

SHA256 produce 256-bits output
SHA1 produce 160 bits output
MD5 produce 128 bits output

So if we have:

sha256 ( sha1 ( md5 ( password + salt )));

First the password and salt is hashed with the MD5, the output will be 128bits big, so the total possibilities of MD5 output is 2^128. Then the MD5 sum is hashed with SHA1, but you don't need to hash all possibilities, only the 2^128 as MD5 outputs. Then the SHA1 sum is hashed with SHA256, but again. It's not necessary to hash all possibilities of SHA256, only the 2^128 possibilities that MD5 produces. So this hashing algorithm can only produce a output with entropy of 2^128, like the MD5. And MD5 has multiple vulnerabilities, so the actual strength is less than 2^128.

Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/25904/discussion-on-answer-by-bufferoverflow-why-not-mix-hashes). — Rory Alsop, Jul 16 '15 at 07:46

score 2 · Answer 5 · answered Jul 12 '15 at 15:55

2

Mixing hashes usually results in the security of the weakest hash in the chain namely:

First preimage resistance
Second preimage resistance
Collision resistance

Rather than mixing hashes, the CPU power is usually best directed at hashing with more bytes.

In other words, if hashing with two algorithms requires N Joules of work and results in less security, just use SHA512 instead of SHA256 and you'll be more secure.

answered Jul 12 '15 at 15:55

makerofthings7

50,090
54
250
536

4

Collisions and second preimages are irrelevant to password hashing. For first preimages, cascading hash functions actually makes the chain as strong as the _strongest_ hash, not the weakest -- but it does not matter here, since it takes an awful lot of disastrous creativity to actually have preimage weaknesses. Password hashing is (mostly) about computing efficiency, not cryptanalysis. In that view, SHA-512 is "better" than SHA-256 not because of the extra output length, but only because currently sold GPU are poor at 64-bit computations. – Thomas Pornin Jul 12 '15 at 16:07

score 2 · Answer 6 · answered Jul 14 '15 at 01:52

Hashes should not be stacked; rather, the result of multiple hashes should be XORed together. Such a primitive (that XORs together multiple believed-strong hash functions) can be used in the loop of PBKDF or similar to achieve results much superior to stacking hash functions. What has to be factored in to the cost-benefit analysis is that such a protocol requires a lot of additional work to be implemented by the good guys, and there's no evidence that the bad buys can break SHA-256.

Indeed, stacking hashes will protect from, say, an odd-parity "hash" that reduces the keyspace to two bits for all subsequent hashes. — forest, Aug 15 '18 at 00:22

score 1 · Answer 7 · answered Jul 12 '15 at 15:35

1

If you change from:

myhash = Hash1(Hash2(Hash3(password ^ salt)))

...to...

myhash = Hash1(password + salt + Hash2(password + salt + Hash3(password ^ salt)))

...then you may have reduced the risk of a collision of Hash1...

answered Jul 12 '15 at 15:35

KristoferA

347
3
11

Why not mix hashes?

7 Answers7

Linked