83

I mean, is it just a matter of "how difficult is it to reverse the function with the current technology"?

Or is there a mathematical concept or property that makes them different?

If it is a matter of "how difficult is it to reverse the function", then is it correct to say that with the progress of the technology, some Cryptographic Hash Functions stop being Cryptographic to be just Hash Functions? Is this what happened to MD5?

Ulkoma
  • 8,793
  • 16
  • 65
  • 95
Mr.Eddart
  • 933
  • 1
  • 7
  • 6

3 Answers3

81

Every cryptographic hash function is a hash function. But not every hash function is a cryptographic hash.

A cryptographic hash function aims to guarantee a number of security properties. Most importantly that it's hard to find collisions or pre-images and that the output appears random. (There are a few more properties, and "hard" has well defined bounds in this context, but that's not important here.)

Non cryptographic hash functions just try to avoid collisions for non malicious input. Some aim to detect accidental changes in data (CRCs), others try to put objects into different buckets in a hash table with as few collisions as possible.

In exchange for weaker guarantees they are typically (much) faster.

I'd still call MD5 a cryptographic hash function, since it aimed to provide security. But it's broken, and thus no longer usable as a cryptographic hash. On the other hand when you have a non cryptographic hash function, you can't really call it "broken", since it never tried to be secure in the first place.

Flimzy
  • 655
  • 1
  • 6
  • 14
CodesInChaos
  • 11,854
  • 2
  • 40
  • 50
  • 2
    Thanks for your answer! why is MD5 broken? Is it because the technology improved? or is it because somebody detected a mathematic flaw? If it is due to the improvement of technology, can we say that the "cryptographic property" of any cryptographic hash function is relative to the current state of technology? and that soon or later, all the currently considered "cryptographic hash functions" will eventually be "broken", and thus stop being cryptographic? – Mr.Eddart Feb 17 '12 at 10:17
  • 4
    There are mathematical flaws in MD5 which allow an attacker to construct several messages which have the same MD5 hash cheaply. SHA-1/2 also have some minor mathematical flaws. If such a mathematical flaw will be discovered in a certain hash function is hard to predict, since it requires the discovery of new crypto analysis methods. – CodesInChaos Feb 17 '12 at 10:29
  • 1
    On the other hand, the mere increase of computational power of conventional computers will not threaten larger hash functions in the foreseeable future. – CodesInChaos Feb 17 '12 at 10:31
  • thanks CodeInChaos, so just to be sure, now that it is "cheap" to produce collisions on MD5, is it a "Cryptographic Hash Function" anymore? or it became simply a regular "Hash Function" due to the discovery of this mathematical flaw? Also, conceptually, MD5 has never been a Cryptographic hash, because it has always had this flaw, although it was not still discovered, is this correct? thanks a lot again! – Mr.Eddart Feb 17 '12 at 10:38
  • 2
    @edutesoy That's just playing with words, and I'm no philosopher. Personally I call every hash function that *aims* to be secure a "cryptographic hash function", even if it has been broken by now. – CodesInChaos Feb 17 '12 at 10:45
  • Ok thanks, it is just that i am not familiarized with the terminology and I was trying to stablish clear boundaries, you helped a lot :) – Mr.Eddart Feb 17 '12 at 12:31
  • 2
    @edutesoy If you're looking for a label for it, I guess you could say MD5 is an 'insecure' cryptographic hash function, aka broken. – greatwolf Feb 05 '17 at 03:22
  • There are different levels of broken. No known attack, means that the only way of breaking it (generating the same hash) would be to brute force it the hard way. Theoretical attack, means that someone has discovered a flaw allowing it to be broken a lot faster than brute forcing, but it's still hard enough that nobody's done it yet. And practical attack, means that someone has actually used that flaw to break it. Recently SHA1 moved from "theoretical attack" to "practical attack" meaning someone finally broke SHA1, even though we've known it was breakable for years now. – thomasrutter Mar 22 '17 at 03:10
24

There are some properties that cryptographically secure hash functions strongly require, that are not so strongly required for non-cryptographically secure hash functions:

  • preimage resistance (given a hash h it must be difficult to find a message m that yields h when hashed
  • weak collision resistance (given a message m1 it must be difficult to find a different message m2 so that m1 and m2 yield the same hash)
  • strong collision resistance (it should be difficult to find any messages m1 and m2 that yield the same hash)

In those points, you see a lot of difficult, which is a qualitative measure instead of a quantitative one. The best answer here is feasibility: there is a fuzzy line when something becomes feasible and those lines move over time (as computation capabilities grow exponentially according to Moore's Law, once difficult problems can now be solved by your cell phone).

In general it's good practice to assume that difficult means that the time to achieve some goal is NP-complete. This means the time required to break the hash grows strongly as you increase the hash length.

Another point is, that a cryptographically secure hashing algorithm can be useful in some applications, but not in others. It depends on the model of your attacker, the nature of the information you want to protect and things like performance requirements (as a general rule, the better the cryptographic properties of a hash, the worse it's runtime behaviour).

jupp0r
  • 341
  • 1
  • 4
  • 5
    NP-complete does not mean exponential growth. Not until one proves that `P < NP`. – ypercubeᵀᴹ Feb 17 '12 at 08:55
  • For hash functions difficult mean - can not be done better than using brute force. –  Feb 17 '12 at 09:08
  • @ypercube: thanks, that's true, I corrected my answer. I wasn't aware there were complexity classes in-between, although they strike me as somewhat exotic. –  Feb 17 '12 at 09:28
  • @ralu: my definition is (mathematically) stronger, being only able to use brute force is included in it. –  Feb 17 '12 at 09:49
  • @jupp0r thanks for your answer. Does this imply that a Cryptographic Hash Function can stop being Cryptographic over the time? because what is dificult to do today, with the improvement of the technology, can be feasible tomorrow. So is this "cryptographic property" something relative to the state of the technology? thanks. – Mr.Eddart Feb 17 '12 at 09:52
  • @edutesoy: practically: yes, theoretically: depends on what exact definition of **difficult** to use. If NP time complexity is your measure than a 4 bit hash algorithm can be secure, that's obviously nonsense. That's why security researchers use the somewhat fuzzy concept of 'infeasibility'. What's infeasible today can easily be feasible tomorrow (especially if you think about quantum computers, etc). Generally, it also depends on your attacker model. If your attacker model is a guy with a multicore machine, that's another story from your attacker being the NSA. –  Feb 17 '12 at 10:00
  • @juppr: If I am not wrong, the following is true: If `P=NP` then off course there is no class between. If `P – ypercubeᵀᴹ Feb 17 '12 at 10:01
  • @ypercube there are classes in between, even with `P=NP`. If `P=NP` it just makes no sense to use those algorithms, because there are better ones :) –  Feb 17 '12 at 10:15
  • @jupp0r: If `P=NP` how can there be a class `Q` with `P – ypercubeᵀᴹ Feb 17 '12 at 10:18
  • 6
    The appropriate prefix is "cryptographically secure" and not just "cryptographic". For instance, MD5 is no longer considered cryptographically secure, but it is arguably still a cryptographic hash function, given the original purpose of its design. –  Feb 17 '12 at 12:03
  • @HenrickHellström: Thanks for your corrections, I updated the answer. –  Feb 17 '12 at 13:07
  • 3
    The *weak collision resistance* is also known as *second preimage resistance*. – Paŭlo Ebermann Feb 17 '12 at 18:25
  • @ypercube If P=NP there wouldn't be class Q with P – Dan Is Fiddling By Firelight Feb 17 '12 at 21:09
  • 2
    The meaning of difficult is quite well defined in this context: 2^n for preimage and second preimage resistance, and 2^(n/2) for collision resistance. While we can't prove that the problem is that hard, a hash function is considered broken, once algorithm are known which are faster than these bounds. – CodesInChaos May 12 '12 at 08:57
4

I'd say that the two key things to understand here are:

  1. The term "hash function" is vague—or more precisely, polysemous: it has a "family" of meanings that are closely related but distinct. If somebody labels a function as a "hash function," the label simply doesn't tell you what properties that function must have. You have to examine the context where the term is used and the requirements of that context.
  2. The term "cryptographic hash function" is a slight misnomer—it looks like a description, but it has an involved technical definition that the term itself doesn't actually describe. To put it simply, there are functions like message authentication codes (MAC) that are often labeled as hash functions and offer some form of cryptographic security, but are not "cryptographic hash functions" in the conventional definition.

The term "cryptographic hash function" is conventionally used to refer to what might be better labeled as collision-resistant hash functions, which are public functions ("public" = don't require a secret key) that are required to have these three properties:

  • Second preimage resistance: For a random value m1 chosen by an honest party, it's very costly for an attacker to find any value m2 ≠ m1 such that hash(m1) = hash(m2).
  • Preimage resistance: For a random value h chosen by an honest party, it's very costly for an attacker to find any value m such that hash(m) = h.
  • Collision resistance: It's very costly for an attacker to find any pair of values m1 ≠ m2 such that hash(m1) = hash(m2).

There's a fourth property that older cryptographic hash functions fail trivially but which newer ones like SHA-3 and Blake2 are designed to achieve:

  • Random oracle indifferentiability: This one is all but impossible to explain briefly, but let's dumb it down to this: it's very costly for an attacker to find any non-chance correlations between the outputs of inputs of their choice.

The random oracle property (when correctly formulated) implies the three previous properties, as well as additional ones like the absence of efficient length extension attacks. (Length extensions are the most obvious reason why older hash functions like SHA-256 and SHA-512 fail the random oracle property.)

Luis Casillas
  • 10,181
  • 2
  • 27
  • 42