Are hash functions "strongest link"?

Question

It seems to me that if you were to use multiple hash functions, the "strongest link" would hold up the others. For example:

HelloWorld

--- md5 -->

68e109f0f40ca72a15e05cc22786f8e6

--- sha256 -->

bdbf17386fbfd51980abdf024e236f5b4d9c2431b3102c0358db4afe2f188db6

MD5 is considered insecure, but since they still have to break the SHA256, the strongest of every hash function you chain together should retain all of it's security, right?

(This is assuming the hashtext is long and complex, salts, etc. I'm talking strictly about an algorithm being broken, not brute force.)

score 5 · Answer 1 · edited Jun 16 '20 at 09:49

5

Update:
After consulting with Thomas Pornin, I'm chainging my answer to "there's no good way to do this". See the comments for this answer for details.

Also see: On the Strength of the Concatenated Hash Combiner when All the Hash Functions are Weak.

Instead of hashing the output of the previous hash, you hash that output in addition to whatever you were hashing before:

So SHA256( MD5(message) ) gives you the worst of both worlds in certain attacks.
But SHA256(message + MD5(message) ) gives you the best of both worlds, since both hashes need to be broken independently.

Note for many usage cases, your message should include salt.

edited Jun 16 '20 at 09:49

Community

1

answered Sep 01 '13 at 00:25

tylerl

82,225
25
148
226

2

This does not hold. `SHA-256(message + MD5(message))` will be no more resistant to preimages than SHA-256 alone. In general, there is no known construction over two hash functions which will be "as secure" as the strongest of the two for all possible attacks. You have to take into account the context: whether you want to defeat collisions, preimages, second preimages... or even if you aim for a "random oracle". – Thomas Pornin Sep 03 '13 at 15:16
@ThomasPornin I was thinking more along the lines of *resistent to as-yet-unknown attacks on theoretically secure hashes*. The goal as far as I could tell is security that is this: you have hashes *x()* and *y()*. You don't know which one will be broken first, but you want your hash to be as strong as whichever proves to be the strongest (though no stronger). So instead of chaining as *x(y(key))* you chain as *x(key+y(key))*. Is that not right? – tylerl Sep 03 '13 at 18:53
If a yet-to-discover attack on _x()_ allows for breaking preimage resistance, then observing _x(key+y(key))_ yields _key_ (and _y(key)_ also), regardless of how strong _y()_ may be. In that sense, you do not get as much security as the stronger of the two. – Thomas Pornin Sep 03 '13 at 18:56
@ThomasPornin So then something like splitting the key in half and using half with each hash? Sounds silly, but still. TBH, my mind jumped straight to: *let's just use a one-way key-derivation function to generate another unique key that will then be hashed*... until I realized this was getting absurd. – tylerl Sep 03 '13 at 19:06
The problem of combining two hash functions into another which is as strong as the stronger of the two, _for all usages_, is not a new problem. To my knowledge, no good solution was found yet, although bad solutions were found (lookup "iterated and concatenated hash functions" for details). I doubt we would come up with something better within a few comments. – Thomas Pornin Sep 03 '13 at 19:29
Bitcoin "addresses" (a hash of a public key) uses SHA256( RIPEMD160(message)). Since that construct is not SHA256(message + RIPEMD160(message))) then is it safe to assume that `message` is secure from first, second preimage attacks, and other issues you describe, @ThomasPornin ?(TIA) – makerofthings7 Sep 03 '13 at 23:28
@makerofthings7 The way I understand it, that construct protects the message from vulnerabilities in `SHA256` but not in `RIPEMD160`. Conversely, duplicating the message as describe protects you from `RIPEMD160`, but *not* from `SHA256`. Choose your poison, apparently. – tylerl Sep 04 '13 at 01:28
1

`h(g(m))` for hash functions _g_ and _h_ is no more secure than _g_ alone against collisions and second preimages; it is as secure as the stronger of the two against preimages; its "randomness" (as in "random oracle") will be bounded by the output size of _g_. – Thomas Pornin Sep 04 '13 at 10:33

score 3 · Answer 2 · answered Aug 31 '13 at 02:00

No. Let's imagine we have c compute cycles available to hash each password and s is our unit of strength of our hashing algorithms per compute cycle used (the effort it would take to match its output with some precomputed table, in relation to the effort we put into hashing it). Now let's for our example combine different strength hashing algorithms, and for the sake of simplicity pretend they all take c/n compute cycles, where n is the number of different hashing algorithms. Now let's assume we use n=2 where h₁ has a strength s4, and h₂=s2 (hashing algorithm 2 has a strength of 2).

Our formula to calculate the strength will be s_x = (h₁ + h₂ + ... + h_n) / n. In the example where we combined two hash functions of various strengths (4 and 2, respectively), we end up with s_mix = (4 + 2) / 2 = 3, and if you used the strongest hashing algorithm with double the number of iterations (using up all the compute cycles available, same as with our first example), you'd end up with s_best = (4 + 4) / 2 = 4.

score 3 · Answer 3 · answered Aug 31 '13 at 22:20

No. The combined hash will be at least as weak as the first hash (MD5 in your example) against collision and second preimage attacks. That is, we know how to generate message pairs such that MD5(message1) = MD5(message2), therefore we also know how to generate message pairs such that SHA256(MD5(message1)) = SHA256(MD5(message2)). Similarly, if we found a way, given message1, to find a second preimage message2 such that MD5(message1) = MD5(message2), we'd also trivially have a second preimage attack against SHA256(MD5()).

More subtly, the entropy of the final hash value is limited by the shortest hash. In the case of SHA256(MD5()), it'll have a 256-bit hash value with an entropy of at most 128 bits.

These issues may or may not be significant problems in any given application, but they're certainly not good things. And I'm not going to promise they're the only problems with this construction.

That said, in applications where one is mainly concerned about first preimage resistance (such as password hashing), using nested hashing like SHA256(MD5(password)) is generally no worse than using the strongest of the hashes alone, at least assuming that none of the hashes is completely terrible (i.e. so bad that that there's a non-negligible probability of collisions between typical inputs occurring just by chance). — Ilmari Karonen, Sep 01 '13 at 14:26

score 3 · Answer 4 · edited Mar 17 '17 at 13:14

Yes and no.

Generally, there are three distinct security properties required of a cryptographic hash function:

First preimage resistance: Given a random hash value, it should not be feasible to find an input hashing to it.
Second preimage resistance: Given an input string, it should not be feasible to find another input hashing to the same value.
Collision resistance: It should not be feasible to find any two inputs hashing to the same value.

Generally, each of the properties listed above implies the preceding ones in the list: if a hash function is collision resistant, then it's also preimage resistant, and if it's second preimage resistant, then it's also first preimage resistance. (There are some weird edge cases, e.g. involving restricted input domains, where this might not quite hold, but for normal cryptographic hashes it does.)

For example, practical collision attacks are known against the MD5 hash function, capable of finding collisions within a few seconds on a modern CPU. However, while there are known theoretical attacks that can find MD5 preimages about 20 to 30 times faster than by brute force, none of those attacks are anywhere near fast enough to carry out in practice. For the SHA-256 hash function, no practical attacks against any of these properties are currently known.

Another difference between hashes that affects their security is their output length. For a hash with n-bit output, no matter how strong, a simple brute force attack can find a first or second preimage using on average 2ⁿ hash evaluations, and, due to the birthday paradox, a collision after only about 2^{n / 2} evaluations. For example, even if MD5 wasn't broken, its 128-bit output length means that finding a collision by brute force would be possible using only about 2⁶⁴ hash operations. That's a lot of work, but it's just about technically feasible for an attacker using a large distributed computing network and/or lots of expensive custom hardware. SHA-256 generates, as the name indicates, 256-bit hashes, meaning that running the same attack on it would require over 2¹²⁸ hash operations, way beyond anything even conceivable using known physics.

Anyway, let's see how nesting a weak and a strong hash function (say, MD5 and SHA-256) would affect the different security properties:

First preimage resistance

If we find a first preimage P for a given hash value H, such that SHA256(MD5(P)) = H, then we've also found a preimage P_SHA = MD5(P) such that SHA256(P_SHA) = H. Conversely, if we can efficiently find such preimages for the nested hash, we can also find a preimage P_MD5 such that MD5(P_MD5) = H' for any MD5 hash H' just by solving SHA256(H') = SHA256(MD5(P_MD5).

Thus, a nested hash is at least as strong against first preimage attacks as the strongest of the hashes. This may seem like good news for applications like password hashing, whose security depends on first preimage resistance (and generally, it is). However, there's a technical catch related to the output lengths of the hashes that we should be aware of.

To see what the problem is, consider what would happen if we replaced MD5 in the nested hash with a trivial hash that produced only a one-bit output. Now our nested hash, despite nominally producing 256 bits of output, would only have two possible output values. If the hash value we were attacking was one of those two values (as it would have to be, if we knew that some preimage existed), then finding a first preimage would be trivial. Conversely, if we tried to find a preimage for any other 256-bit hash value, it would be impossible (which is why this result doesn't contradict the earlier proof above).

In the case of SHA256(MD5(·)), we similarly have only 2¹²⁸ possible outputs, even though they're nominally 256 bits long. Thus, a generic brute-force preimage attack on the nested hash takes only 2¹²⁸ operations on average, not 2²⁵⁶ as it would for a "real" 256-bit hash. Effectively, our nested hash looks like a 256-bit hash, but only has 128 bits of strength against brute force attacks.

Still, as long as attacks requiring 2¹²⁸ operations are infeasible (and they are, for any foreseeable length of time), we don't need to worry about this, and we can safely use SHA256(MD5(·)) for password hashing (with appropriate salting and key stretching, of course).

Second preimage resistance

As Gordon Davisson shows in his answer, SHA256(MD5(·)) is only as strong as MD5 against second preimage attacks: if, for a given message M, we can find another message M' ≠ M such that MD5(M') = MD5(M), then it's also true that SHA256(MD5(M')) = SHA256(MD5(M)).

Thus, nested hashing is no stronger against second preimage attacks than the innermost hash. This is bad news for applications like digital signatures, where a second preimage attack on the hash generally translates to a forgery attack on the signature.

We could do better by using MD5(SHA256(·)) instead, since that would put the stronger hash on the inside. However, the resulting hash would still be no stronger against second preimage attacks than just SHA-256 alone. In fact, due to the shorter output length of MD5, it would be weaker than SHA-256 alone against generic brute force attacks.

Collision resistance

The situation with collision attacks is pretty much the same as for second preimage attacks: if we can find a collision for MD5 (and we can!), then we also have a collision for SHA256(MD5(·)). Thus, SHA256(MD5(·)) is utterly broken against collision attacks.

Again, this is bad news for things like digital signatures, where a hash collision could allow a malicious user to pull off a "bait and switch" attack on a third party, where they persuade the third party to sign an innocent-looking message (e.g. an SSL certificate) and then replace it with a different message having the same hash.

Reversing the nesting order again makes things somewhat better: just being able to generate MD5 collisions doesn't help us much against MD5(SHA256(·)), since, even if the colliding MD5 inputs were both valid SHA-256 outputs, we'd still have to find preimages for them. But even so, this nested hash still wouldn't be any stronger against collision attacks than SHA-256 alone.

Summary

Against (non-brute-force) first preimage attacks, a nested hash is at least as strong as the strongest of its component hashes. Against other kinds of attacks, however, a nested hash is no stronger than the innermost hash.

Thus, for most purposes, simply nesting hash functions is not a good idea. A much better alternative is to use the method suggested by tylerl, where the inner hash output is concatenated with the original input before applying the outer hash, e.g. H = SHA256(M + MD5(M)), where + denotes concatenation. This construction can be easily shown to be at least as strong as the outer hash against any kind of attacks.

However, there's one situation where using H = SHA256(MD5(M)) can be legitimately useful, and that's upgrading an old password database using MD5 hashes to use a more secure hashing mechanism. Provided that the old hashes are long enough to resist brute force preimage attacks, and that the old hashing method doesn't do anything stupid that could create a non-negligible probability of collision between normal inputs (like throwing away all but the first 8 bytes of the input), simply re-hashing the existing password hashes using a stronger hash function can provide an immediate security improvement without requiring user interaction (e.g. logging in).

Of course, if you're upgrading an old password database to meet modern standards, you really should be using something like PBKDF2-HMAC-SHA256 instead of plain SHA-256 as the outer hash, with a proper salt and a sufficiently large iteration count to make brute-force attacks difficult. If your old password hashing scheme also used a salt, your new password hashes might end up looking something like this:

type:new-salt:count:old-salt:hash

where type is a scheme identifier (so you can easily change the hashing scheme again later), new-salt and old-salt are the salts for the new and old hashes, count is the iteration count for PBKDF2 (again included so that you easily change it later) and hash = PBKDF2-HMAC-SHA256(old-hash(password, old-salt), new-salt, count).

Of course, this kind of nested password hashing scheme is most useful as just a transitional solution. As your users log in or change their password, you can gradually replace these transitional password hashes with a simpler hashing scheme (with a different type identifier) that doesn't use the old hashing scheme at all. That way, once all your user accounts are migrated to use the new, simpler scheme (which could take a while), you can later retire the old password hashing code entirely.

score 0 · Answer 5 · answered Sep 03 '13 at 21:57

I'd recommend avoiding SHA and MD5 hashes altogether for security uses... They're built for speed which is great for password crackers.

If possible, use Blowfish/BCrypt which builds hashes in an expensive (adjustable) way and even auto-generates salt for you.

If you're in the process of upgrading existing hashes to the new system then you'll only be able to do it as each user logs in as that is the only time you'll have access to their raw password.

Are hash functions "strongest link"?

5 Answers5

First preimage resistance

Second preimage resistance

Collision resistance

Summary