29

Given an option , which one should I choose , a HMAC for storing a password securely or a bcrypt or scrypt library?

curiousguy
  • 5,028
  • 3
  • 25
  • 27
user917279
  • 463
  • 1
  • 4
  • 11
  • Just to clarify, by security you mean resistance to brute force attack on the hash? – Rory Alsop Jul 04 '12 at 11:55
  • 2
    I'm surprised why you are considering using HMAC for passwords. Isn't HMAC generally used to verify data authenticity and integrity? –  Jul 04 '12 at 12:11
  • Which specific security property of HMAC (that a simple hash does not provide) do you need for password storage? – curiousguy Jul 04 '12 at 14:09
  • duplicate: http://security.stackexchange.com/questions/3165/hmac-why-not-hmac-for-password-storage?rq=1 ? – Jacco Jul 04 '12 at 16:23
  • @Roy Alsop - Yes , brute force attack. – user917279 Jul 05 '12 at 09:35
  • @Terry Chia - I have read somewhere and ve seen implementations that use HMAC for a password. – user917279 Jul 05 '12 at 09:39
  • 1
    @curiousguy - that is what I am trying to understand , as far as my knowledge goes, HMAC is little more secure than a hash , since it requires a key . – user917279 Jul 05 '12 at 09:39
  • @Jacco - I am sorry, I have read through the question and the answers pointed out , but was not able to get an answer I required. – user917279 Jul 05 '12 at 09:40
  • @user917279, I just wanted to link to the other question, with the questionmark I tried to indicate they might not be identical duplicates – Jacco Jul 05 '12 at 10:03

3 Answers3

42

In order to give you a proper idea of the problems and subtleties of computing password hashes, as well as why HMAC isn't suitable for this problem, I'll provide a much broader answer than is really necessary to directly answer the question.

A HMAC hash algorithm is, essentially, just a keyed version of a normal hash algorithm. It is usually used to verify integrity and authenticity. The usual notation of this is H(m,k) = h, where H is the HMAC hash algorithm, m is the message, k is the key, and h is the resulting hash. The idea is that two parties that share a secret k can verify that the other person is the author of m. Furthermore, an attacker cannot forge a message hash without knowing k.

This is done as follows:

  1. Alice and Bob both know a shared secret key k.
  2. Alice writes a message m, and computes the HMAC hash of it using k, i.e. H(m,k) = h.
  3. Alice sends the message m and hash h to Bob.
  4. Bob computes H(m,k) and compares it to the hash h that Alice sent. If the hashes match, he knows that Alice sent the message, and that it was not altered after she hashed it.

Now that you understand what HMAC is, let's move onto what you really want to do - store passwords in a database.

Many years ago, it was standard practice to store passwords in plaintext in databases. This was a bad idea because, when the database was compromised, the attacker got all the passwords. To combat this, we started hashing passwords in the database using one-way cryptographic hash algorithms. MD5 became popular, but weaknesses (collisions, partial preimage, etc) discovered in it mean it's no longer recommended. A lot of people moved onto SHA1, which is more secure.

The problem with this approach is that it's possible to construct a huge table of hashes and their corresponding plaintexts. These are called rainbow tables. They work on the concept that it is more efficient to compute a huge list of hashes for all possible passwords (within a certain set) and then store it, so it can be quickly queried later on. So, instead of brute-forcing individual hashes, it became possible to just query the database for a hash and have its plaintext returned immediately.

In order to fix this, security nerds invented salts. Salts are large unique random values appended to passwords before they are hashed. This salt is stored with the hash, so that it can be computed again later. So, we compute H(m+s) = h, then store h and s in the database. This provides significant protection against rainbow tables because it essentially requires a separate rainbow table to be generated for each salt.

So, the bad guys switched back to dictionary attacks and brute-force cracking. With the advent of GPU computing, it became possible to compute billions of hashes per second on a moderately powerful graphics card. In fact, people have built computers that can compute nearly 50 billion MD5 hashes per second - pretty impressive / scary. The reason GPUs are capable of this is that they're designed to do huge numbers of parallel scalar operations. Scalar operations are math and logic operations that do not involve branches - i.e. they don't need to do much/any "if x then do y". Cryptographic hash algorithms tend to fit into this model.

In order to make this difficult, we have to make the hash operation slow enough to make brute forcing infeasible. Normal hash algorithms (e.g. SHA1) are designed to be fast, which makes them unsuitable for this purpose. HMAC adds very little overhead and no extra security margin, so it too isn't much use here.

Creating a slow cryptographic hash algorithm is easier said than done - it's very hard to come up with one that is slow, irreducible (i.e. cannot be optimised beyond its current state) and secure. There are three popular hash functions that can do this: PBKDF2, bcrypt, and scrypt. These are known as adaptive key derivation functions, because they accept a work factor value along with the plaintext and salt. The work factor alters the amount of time it takes to compute the hash, and is designed to safeguard against future hardware improvements.

So, for an adaptive key derivation algorithm H, we compute H(m,s,w) = h, where m is the message (password), s is the salt, and w is the work factor. The resulting h usually contains s and w, so that a verification function can later compute the same hash using the same parameters. The work factor generally controls the number of iterations performed of an internal cryptographic primitive. The goal is to make the computation take long enough to make cracking infeasible, but not to exceed the resources that we have.

In order to provide more security against dedicated hardware-based cracking, scrypt ensures that the hash computation is both CPU-hard and memory-hard, i.e. it requires significant CPU and memory resources in order to produce the hash value. This is important, because FPGAs usually have access to very little immediate memory.

Now, the obvious question arises: if I set up my site to use bcrypt, won't that mean my server has to compute CPU-intensive hashes all the time? Well, if you run them on your server, then yes, that's something to take into consideration. A better solution is to have the client compute the hash, then send it to the server over SSL. The server can then compare it to the value in the database. This ensures that the password hash cannot be easily cracked if it is stolen (e.g. through a database compromise), and that your server isn't overwhelmed by the overhead of password hash computation. For websites, you can use jsBcrypt. Update: This method has been pointed out as flawed by a commenter below. Don't use it.

Hopefully this gives you a good overview of the situation, and why HMAC isn't suitable for this kind of use.

Polynomial
  • 132,208
  • 43
  • 298
  • 379
  • 2
    +1 for the good writeup. The thing that is often cited is that, if the password hashes are stolen by a limited read only vulnerability (for example SQL-injection), the attacker has, due to the absence of a secret-key, all the variables needed for an offline attack. There has been some discussion about adding both a [pepper and a salt](http://security.stackexchange.com/a/3279/2113). The accepted answer there advices against HMAC and recommends BCrypt. – Jacco Jul 04 '12 at 16:21
  • @curiousguy Good point. I can't really see a solution to that issue. I'll add a strikethrough for that part. Are there any solid solutions to the problem of having CPU intensive hashes computed on the server? – Polynomial Jul 04 '12 at 18:06
  • @Polynomial, first thank you very much for the very descriptive answer, as long as I store the secret key too secretive, HMAC should be better than CPU intensive operations with bcrypt/scrypt , that would make an online user wait until he logs in? – user917279 Jul 05 '12 at 09:54
  • let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/4001/discussion-between-curiousguy-and-jacco) – curiousguy Jul 05 '12 at 16:13
  • 3
    Just throttle successful login attempts to one per 15s per IP, and block the IP for 30 minutes if it issues 20 or more logins in any 10 minute period. No legit client would ever reach that number of logins. Then throttle failed login attempts to 2s, 10s, 20s, 40s, 10 minutes for each subsequent attempt. Prevents online cracking and DoS, and quickly exhausts the IP pool of a DDoS attack. – Polynomial Jul 09 '12 at 07:55
  • So the difference between 1) the "work factor" in adaptive key derivation functions and 2) the number of iterations in a general H(H(H(m+s))...) hashing — is that in the first option, the work factor is stored along the hash so you can re-hash (verify) passwords with differing number of iterations? – Daniel Serodio Aug 15 '12 at 19:33
  • @DanielSerodio The work factor is usually a base value that is used to compute the number of iterations, but yes that's essentially true. The point is that passwords could have any number of iterations. If you want to tweak it, new hashes can be given new factors. – Polynomial Aug 16 '12 at 20:18
  • so basically the problem with HMAC is that HMAC fails with the salting – Webwoman Aug 27 '18 at 16:05
10

HMAC is a message authentication code; it uses a key. Bcrypt does not. Thus, the choice is not neutral; you cannot think of it all things being otherwise equal, because they are not.

Although nominally for integrity checks, it so happens that HMAC (when used with a reasonably secure hash function, e.g. SHA-256 or even SHA-1) behaves somehow like "a hash function with a key". This is not a generic property of MAC algorithms, but it works with HMAC (that's why it can be used as basis for a random generator called Hmac_DRBG). This makes HMAC a potential choice for password hashing.

If you "hash" your password with HMAC, and the attacker could grab your file/database of hashed password but not the HMAC key, then the attacker will not be able to crack the passwords. In that scenario, HMAC is "better" than bcrypt/scrypt. However, this is an edge case. We hash passwords because we worry about fringe scenarios where the attacker could breach security enough to get a read-only access to part of the server files, but not a read-write access. Th e hash-with-HMAC method is for a fringe-squared scenario where the read-only attacker could get the hashed password but not the key, which is nonetheless stored on the same server.

If the attacker could get the key, then HMAC becomes only a simple hash function for him, and in that case HMAC is much worse than bcrypt, because it is unsalted and too damn fast. It is like if the passwords where hashed with a pair of SHA-1 invocations. That scenario is no less probable than the previous, and thus we must consider use of HMAC alone as too risky.


For the best of both worlds, apply HMAC on the user password, and then process the HMAC output through bcrypt; you will store the output of bcrypt. Since bcrypt implementations expect the password as a character string, you would have to encode the HMAC output (hexadecimal, Base64...).

This is more complex than bcrypt alone, because of using two functions instead of one, and because of the key (which brings the whole thorny issue of key management: generation, storage, backup...). Since complexity is bad, I would recommend against it; simply use bcrypt alone, and don't add an HMAC. However, this is your call; if you really want the HMAC, use it in addition to bcrypt, not instead of bcrypt.

(Note: that's bcrypt on the HMAC output, not the other way round, which would not work because of the salt which is included in the bcrypt output.)

Thomas Pornin
  • 320,799
  • 57
  • 780
  • 949
  • 1
    Could you please explain the *"best of both worlds"* solution. I'm confused, because if we apply HMAC on the password using a key, apply bcrypt on the result of the HMAC, and then store that digest, how could we ever get the result of the HMAC back so that we can do our password/HMACkey verification? I feel I'm missing some vital part, and this is all very new for me. Thanks. – the system Jan 18 '13 at 23:13
  • HMAC is deterministic. To verify a given password _p_, recompute the HMAC on _p_ (with your key), and use the result as the input to your bcrypt verification routine (along with the stored bcrypt output, which contains the salt). – Thomas Pornin Jan 18 '13 at 23:51
  • Oh, of course. I was thinking we had to store the HMAC output, but we don't since we calculate it again from the password that the user sent for login. Thanks again. – the system Jan 19 '13 at 00:04
  • Among all the answers related to this topic, I like this answer best. Because it clearly explains: 1. `HMAC` is secure or somewhat "better" if your key/server is not compromised(only database is compromised), even if `HMAC` uses fast hash function 2. We should not rely on scenario `1.` Because it's also highly likely that the server also get compromised. 3. So `HMAC` + `bcrypt` combination is good but adds too much more complexity. 4. Using `Bcrypt` is somehow good enough :) – Rick Jul 01 '19 at 04:28
2

HMAC is designed to be very fast and is in this context a good way to add salt to password instead of just appending it. Bcrypt is much slower due to slow initialization, while scrypt is even slower than Bcrypt because it is intentionally designed in such a way. Scrypt is designed to make brute forcing it very computationally expensive. It consumes a lot of CPU, memory and is also slow to use on GPUs.

More about scrypt

Matrix
  • 3,988
  • 14
  • 25
  • 2
    And the point of the story is, that you should prefer a slow hash algorithm to store passwords (bcrypt or scrypt), because fast hashes can be brute forced too easily (e.g. with a GPU). – martinstoeckli Jul 04 '12 at 13:06
  • "_add salt to password instead of just appending it_" I don't understand "add" vs. "append"? – curiousguy Jul 04 '12 at 16:28
  • @curiousguy: Combining password and salt is cryptographically more secure when done with HMAC compared to a simple concatenation. I do not know why and I could be wrong. It's what I've read somewhere recently, but I cannot find the source. I think it was among comments on Hackernews. – Matrix Jul 04 '12 at 21:09
  • @Matrix HMAC is more secure than hash in general: finding a collision with the hash will not directly give a collision on the HMAC built on that hash. But **finding collisions is not really an issue with passwords**: given the hash and your password, you might find another password that would be accepted by the system. How is this a serious security issue? – curiousguy Jul 04 '12 at 23:47