926

If I hash passwords before storing them in my database, is that sufficient to prevent them being recovered by anyone?

I should point out that this relates only to retrieval directly from the database, and not any other type of attack, such as bruteforcing the login page of the application, keylogger on the client, and of course rubberhose cryptanalysis (or nowadays we should call it "Chocolate Cryptanalysis").

Of course any form of hash will not prevent those attacks.

AviD
  • 72,138
  • 22
  • 136
  • 218
  • 1
    There is also this [Stackoverflow thread](http://stackoverflow.com/questions/481160/is-bcrypt-a-good-encryption-algorithm-to-use-in-c-where-can-i-find-it) – Casebash Jul 26 '11 at 06:22
  • 3
    It's interesting that (if I read it right) after offering chocolate/guessing there was almost an identical number of folks revealing their password. Does that mean us IT professionals are just more susceptible to chocolate? ;) – Wayne Werner Sep 04 '12 at 16:50
  • 4
    This question made it to the blog: http://security.blogoverflow.com/2013/09/about-secure-password-hashing/ – Shnatsel Sep 14 '13 at 15:01
  • More recent similar discussion [In 2018, what is the recommended hash to store passwords: bcrypt, scrypt, Argon2?](//security.stackexchange.com/q/193351) – Michael Freidgeim Sep 28 '19 at 06:39

11 Answers11

1190

Note: This answer was written in 2013. Many things have changed in the following years, which means that this answer should primarily be seen as how best practices used to be in 2013.


The Theory

We need to hash passwords as a second line of defence. A server which can authenticate users necessarily contains, somewhere in its entrails, some data which can be used to validate a password. A very simple system would just store the passwords themselves, and validation would be a simple comparison. But if a hostile outsider were to gain a simple glimpse at the contents of the file or database table which contains the passwords, then that attacker would learn a lot. Unfortunately, such partial, read-only breaches do occur in practice (a mislaid backup tape, a decommissioned but not wiped-out hard disk, an aftermath of a SQL injection attack -- the possibilities are numerous). See this blog post for a detailed discussion.

Since the overall contents of a server that can validate passwords are necessarily sufficient to indeed validate passwords, an attacker who obtained a read-only snapshot of the server is in a position to make an offline dictionary attack: he tries potential passwords until a match is found. This is unavoidable. So we want to make that kind of attack as hard as possible. Our tools are the following:

  • Cryptographic hash functions: these are fascinating mathematical objects which everybody can compute efficiently, and yet nobody knows how to invert them. This looks good for our problem - the server could store a hash of a password; when presented with a putative password, the server just has to hash it to see if it gets the same value; and yet, knowing the hash does not reveal the password itself.

  • Salts: among the advantages of the attacker over the defender is parallelism. The attacker usually grabs a whole list of hashed passwords, and is interested in breaking as many of them as possible. He may try to attack several in parallel. For instance, the attacker may consider one potential password, hash it, and then compare the value with 100 hashed passwords; this means that the attacker shares the cost of hashing over several attacked passwords. A similar optimisation is precomputed tables, including rainbow tables; this is still parallelism, with a space-time change of coordinates.

    The common characteristic of all attacks which use parallelism is that they work over several passwords which were processed with the exact same hash function. Salting is about using not one hash function, but a lot of distinct hash functions; ideally, each instance of password hashing should use its own hash function. A salt is a way to select a specific hash function among a big family of hash functions. Properly applied salts will completely thwart parallel attacks (including rainbow tables).

  • Slowness: computers become faster over time (Gordon Moore, co-founder of Intel, theorized it in his famous law). Human brains do not. This means that attackers can "try" more and more potential passwords as years pass, while users cannot remember more and more complex passwords (or flatly refuse to). To counter that trend, we can make hashing inherently slow by defining the hash function to use a lot of internal iterations (thousands, possibly millions).

We have a few standard cryptographic hash functions; the most famous are MD5 and the SHA family. Building a secure hash function out of elementary operations is far from easy. When cryptographers want to do that, they think hard, then harder, and organize a tournament where the functions fight each other fiercely. When hundreds of cryptographers gnawed and scraped and punched at a function for several years and found nothing bad to say about it, then they begin to admit that maybe that specific function could be considered as more or less secure. This is just what happened in the SHA-3 competition. We have to use this way of designing hash function because we know no better way. Mathematically, we do not know if secure hash functions actually exist; we just have "candidates" (that's the difference between "it cannot be broken" and "nobody in the world knows how to break it").

A basic hash function, even if secure as a hash function, is not appropriate for password hashing, because:

  • it is unsalted, allowing for parallel attacks (rainbow tables for MD5 or SHA-1 can be obtained for free, you do not even need to recompute them yourself);
  • it is way too fast, and gets faster with technological advances. With a recent GPU (i.e. off-the-shelf consumer product which everybody can buy), hashing rate is counted in billions of passwords per second.

So we need something better. It so happens that slapping together a hash function and a salt, and iterating it, is not easier to do than designing a hash function -- at least, if you want the result to be secure. There again, you have to rely on standard constructions which have survived the continuous onslaught of vindicative cryptographers.

Good Password Hashing Functions

PBKDF2

PBKDF2 comes from PKCS#5. It is parameterized with an iteration count (an integer, at least 1, no upper limit), a salt (an arbitrary sequence of bytes, no constraint on length), a required output length (PBKDF2 can generate an output of configurable length), and an "underlying PRF". In practice, PBKDF2 is always used with HMAC, which is itself a construction built over an underlying hash function. So when we say "PBKDF2 with SHA-1", we actually mean "PBKDF2 with HMAC with SHA-1".

Advantages of PBKDF2:

  • Has been specified for a long time, seems unscathed for now.
  • Is already implemented in various framework (e.g. it is provided with .NET).
  • Highly configurable (although some implementations do not let you choose the hash function, e.g. the one in .NET is for SHA-1 only).
  • Received NIST blessings (modulo the difference between hashing and key derivation; see later on).
  • Configurable output length (again, see later on).

Drawbacks of PBKDF2:

  • CPU-intensive only, thus amenable to high optimization with GPU (the defender is a basic server which does generic things, i.e. a PC, but the attacker can spend his budget on more specialized hardware, which will give him an edge).
  • You still have to manage the parameters yourself (salt generation and storage, iteration count encoding...). There is a standard encoding for PBKDF2 parameters but it uses ASN.1 so most people will avoid it if they can (ASN.1 can be tricky to handle for the non-expert).

bcrypt

bcrypt was designed by reusing and expanding elements of a block cipher called Blowfish. The iteration count is a power of two, which is a tad less configurable than PBKDF2, but sufficiently so nevertheless. This is the core password hashing mechanism in the OpenBSD operating system.

Advantages of bcrypt:

  • Many available implementations in various languages (see the links at the end of the Wikipedia page).
  • More resilient to GPU; this is due to details of its internal design. The bcrypt authors made it so voluntarily: they reused Blowfish because Blowfish was based on an internal RAM table which is constantly accessed and modified throughout the processing. This makes life much harder for whoever wants to speed up bcrypt with a GPU (GPU are not good at making a lot of memory accesses in parallel). See here for some discussion.
  • Standard output encoding which includes the salt, the iteration count and the output as one simple to store character string of printable characters.

Drawbacks of bcrypt:

  • Output size is fixed: 192 bits.
  • While bcrypt is good at thwarting GPU, it can still be thoroughly optimized with FPGA: modern FPGA chips have a lot of small embedded RAM blocks which are very convenient for running many bcrypt implementations in parallel within one chip. It has been done.
  • Input password size is limited to 51 characters. In order to handle longer passwords, one has to combine bcrypt with a hash function (you hash the password and then use the hash value as the "password" for bcrypt). Combining cryptographic primitives is known to be dangerous (see above) so such games cannot be recommended on a general basis.

scrypt

scrypt is a much newer construction (designed in 2009) which builds over PBKDF2 and a stream cipher called Salsa20/8, but these are just tools around the core strength of scrypt, which is RAM. scrypt has been designed to inherently use a lot of RAM (it generates some pseudo-random bytes, then repeatedly reads them in a pseudo-random sequence). "Lots of RAM" is something which is hard to make parallel. A basic PC is good at RAM access, and will not try to read dozens of unrelated RAM bytes simultaneously. An attacker with a GPU or a FPGA will want to do that, and will find it difficult.

Advantages of scrypt:

  • A PC, i.e. exactly what the defender will use when hashing passwords, is the most efficient platform (or close enough) for computing scrypt. The attacker no longer gets a boost by spending his dollars on GPU or FPGA.
  • One more way to tune the function: memory size.

Drawbacks of scrypt:

  • Still new (my own rule of thumb is to wait at least 5 years of general exposure, so no scrypt for production until 2014 - but, of course, it is best if other people try scrypt in production, because this gives extra exposure).
  • Not as many available, ready-to-use implementations for various languages.
  • Unclear whether the CPU / RAM mix is optimal. For each of the pseudo-random RAM accesses, scrypt still computes a hash function. A cache miss will be about 200 clock cycles, one SHA-256 invocation is close to 1000. There may be room for improvement here.
  • Yet another parameter to configure: memory size.

OpenPGP Iterated And Salted S2K

I cite this one because you will use it if you do password-based file encryption with GnuPG. That tool follows the OpenPGP format which defines its own password hashing functions, called "Simple S2K", "Salted S2K" and "Iterated and Salted S2K". Only the third one can be deemed "good" in the context of this answer. It is defined as the hash of a very long string (configurable, up to about 65 megabytes) consisting of the repetition of an 8-byte salt and the password.

As far as these things go, OpenPGP's Iterated And Salted S2K is decent; it can be considered as similar to PBKDF2, with less configurability. You will very rarely encounter it outside of OpenPGP, as a stand-alone function.

Unix "crypt"

Recent Unix-like systems (e.g. Linux), for validating user passwords, use iterated and salted variants of the crypt() function based on good hash functions, with thousands of iterations. This is reasonably good. Some systems can also use bcrypt, which is better.

The old crypt() function, based on the DES block cipher, is not good enough:

  • It is slow in software but fast in hardware, and can be made fast in software too but only when computing several instances in parallel (technique known as SWAR or "bitslicing"). Thus, the attacker is at an advantage.
  • It is still quite fast, with only 25 iterations.
  • It has a 12-bit salt, which means that salt reuse will occur quite often.
  • It truncates passwords to 8 characters (characters beyond the eighth are ignored) and it also drops the upper bit of each character (so you are more or less stuck with ASCII).

But the more recent variants, which are active by default, will be fine.

Bad Password Hashing Functions

About everything else, in particular virtually every homemade method that people relentlessly invent.

For some reason, many developers insist on designing function themselves, and seem to assume that "secure cryptographic design" means "throw together every kind of cryptographic or non-cryptographic operation that can be thought of". See this question for an example. The underlying principle seems to be that the sheer complexity of the resulting utterly tangled mess of instruction will befuddle attackers. In practice, though, the developer himself will be more confused by his own creation than the attacker.

Complexity is bad. Homemade is bad. New is bad. If you remember that, you'll avoid 99% of problems related to password hashing, or cryptography, or even security in general.

Password hashing in Windows operating systems used to be mindbogglingly awful and now is just terrible (unsalted, non-iterated MD4).

Key Derivation

Up to now, we considered the question of hashing passwords. A closely related problem is transforming a password into a symmetric key which can be used for encryption; this is called key derivation and is the first thing you do when you "encrypt a file with a password".

It is possible to make contrived examples of password hashing functions which are secure for the purpose of storing a password validation token, but terrible when it comes to generating symmetric keys; and the converse is equally possible. But these examples are very "artificial". For practical functions like the one described above:

  • The output of a password hashing function is acceptable as a symmetric key, after possible truncation to the required size.
  • A Key Derivation Function can serve as a password hashing function as long as the "derived key" is long enough to avoid "generic preimages" (the attacker is just lucky and finds a password which yields the same output). An output of more than 100 bits or so will be enough.

Indeed, PBKDF2 and scrypt are KDF, not password hashing function -- and NIST "approves" of PBKDF2 as a KDF, not explicitly as a password hasher (but it is possible, with only a very minute amount of hypocrisy, to read NIST's prose in such a way that it seems to say that PBKDF2 is good for hashing passwords).

Conversely, bcrypt is really a block cipher (the bulk of the password processing is the "key schedule") which is then used in CTR mode to produce three blocks (i.e. 192 bits) of pseudo-random output, making it a kind of hash function. bcrypt can be turned into a KDF with a little surgery, by using the block cipher in CTR mode for more blocks. But, as usual, we cannot recommend such homemade transforms. Fortunately, 192 bits are already more than enough for most purposes (e.g. symmetric encryption with GCM or EAX only needs a 128-bit key).

Miscellaneous Topics

How many iterations?

As much as possible! This salted-and-slow hashing is an arms race between the attacker and the defender. You use many iterations to make the hashing of a password harder for everybody. To improve security, you should set that number as high as you can tolerate on your server, given the tasks that your server must otherwise fulfill. Higher is better.

Collisions and MD5

MD5 is broken: it is computationally easy to find a lot of pairs of distinct inputs which hash to the same value. These are called collisions.

However, collisions are not an issue for password hashing. Password hashing requires the hash function to be resistant to preimages, not to collisions. Collisions are about finding pairs of messages which give the same output without restriction, whereas in password hashing the attacker must find a message which yields a given output that the attacker does not get to choose. This is quite different. As far as we known, MD5 is still (almost) as strong as it has ever been with regards to preimages (there is a theoretical attack which is still very far in the ludicrously impossible to run in practice).

The real problem with MD5 as it is commonly used in password hashing is that it is very fast, and unsalted. However, PBKDF2 used with MD5 would be robust. You should still use SHA-1 or SHA-256 with PBKDF2, but for Public Relations. People get nervous when they hear "MD5".

Salt Generation

The main and only point of the salt is to be as unique as possible. Whenever a salt value is reused anywhere, this has the potential to help the attacker.

For instance, if you use the user name as salt, then an attacker (or several colluding attackers) could find it worthwhile to build rainbow tables which attack the password hashing function when the salt is "admin" (or "root" or "joe") because there will be several, possibly many sites around the world which will have a user named "admin". Similarly, when a user changes his password, he usually keeps his name, leading to salt reuse. Old passwords are valuable targets, because users have the habit of reusing passwords in several places (that's known to be a bad idea, and advertised as such, but they will do it nonetheless because it makes their life easier), and also because people tend to generate their passwords "in sequence": if you learn that Bob's old password is "SuperSecretPassword37", then Bob's current password is probably "SuperSecretPassword38" or "SuperSecretPassword39".

The cheap way to obtain uniqueness is to use randomness. If you generate your salt as a sequence of random bytes from the cryptographically secure PRNG that your operating system offers (/dev/urandom, CryptGenRandom()...) then you will get salt values which will be "unique with a sufficiently high probability". 16 bytes are enough so that you will never see a salt collision in your life, which is overkill but simple enough.

UUID are a standard way of generating "unique" values. Note that "version 4" UUID just use randomness (122 random bits), like explained above. A lot of programming frameworks offer simple to use functions to generate UUID on demand, and they can be used as salts.

Salt Secrecy

Salts are not meant to be secret; otherwise we would call them keys. You do not need to make salts public, but if you have to make them public (e.g. to support client-side hashing), then don't worry too much about it. Salts are there for uniqueness. Strictly speaking, the salt is nothing more than the selection of a specific hash function within a big family of functions.

"Pepper"

Cryptographers can never let a metaphor alone; they must extend it with further analogies and bad puns. "Peppering" is about using a secret salt, i.e. a key. If you use a "pepper" in your password hashing function, then you are switching to a quite different kind of cryptographic algorithm; namely, you are computing a Message Authentication Code over the password. The MAC key is your "pepper".

Peppering makes sense if you can have a secret key which the attacker will not be able to read. Remember that we use password hashing because we consider that an attacker could grab a copy of the server database, or possible of the whole disk of the server. A typical scenario would be a server with two disks in RAID 1. One disk fails (electronic board fries - this happens a lot). The sysadmin replaces the disk, the mirror is rebuilt, no data is lost due to the magic of RAID 1. Since the old disk is dysfunctional, the sysadmin cannot easily wipe its contents. He just discards the disk. The attacker searches through the garbage bags, retrieves the disk, replaces the board, and lo! he has a complete image of the whole server system, including database, configuration files, binaries, operating system... the full monty, as the British say. For peppering to be really applicable, you need to be in a special setup where there is something more than a PC with disks; you need an HSM. HSM are very expensive, both in hardware and in operational procedure. But with an HSM, you can just use a secret "pepper" and process passwords with a simple HMAC (e.g. with SHA-1 or SHA-256). This will be vastly more efficient than bcrypt/PBKDF2/scrypt and their cumbersome iterations. Also, usage of an HSM will look extremely professional when doing a WebTrust audit.

Client-side hashing

Since hashing is (deliberately) expensive, it could make sense, in a client-server situation, to harness the CPU of the connecting clients. After all, when 100 clients connect to a single server, the clients collectively have a lot more muscle than the server.

To perform client-side hashing, the communication protocol must be enhanced to support sending the salt back to the client. This implies an extra round-trip, when compared to the simple client-sends-password-to-server protocol. This may or may not be easy to add to your specific case.

Client-side hashing is difficult in a Web context because the client uses JavaScript, which is quite anemic for CPU-intensive tasks.

In the context of SRP, password hashing necessarily occurs on the client side.

Conclusion

Use bcrypt. PBKDF2 is not bad either. If you use scrypt you will be a "slightly early adopter" with the risks that are implied by this expression; but it would be a good move for scientific progress ("crash dummy" is a very honourable profession).

Matthias Braun
  • 421
  • 3
  • 12
Thomas Pornin
  • 320,799
  • 57
  • 780
  • 949
  • 9
    Ah ! Not so fast. There is now an [open competition](https://password-hashing.net/) for defining new algorithms; deadline for candidate submission is March 31st. The submission reports for some candidates will probably explain at length why their candidate is better than scrypt. Thus, two months from now, we shall have enough information to be able to reach a conclusion on "scrypt or not scrypt" (e.g. there have been suggestion that scrypt's "memory hardness" is more expensive than previously envisioned on multi-core servers). – Thomas Pornin Feb 12 '14 at 11:55
  • 3
    @ThomasPornin, it's been 7 months, so what's the final verdict on `scrypt`? – trysis Sep 13 '14 at 03:39
  • 4
    There is an attack against scrypt that can help an attacker up to a factor of 4, which is not critical. scrypt can still be considered to fare at least as good as bcrypt, and substantially better when the attacker has FPGA. It is still harder to configure and using it for a busy server with high peak loads requires some care. So far, PHC has a few candidate functions which try to do "memory hardness" like scrypt, but are arguably better at it. – Thomas Pornin Sep 13 '14 at 11:40
  • 2
    So you are saying that, right now, `scrypt` is as good as or better than `bcrypt`, and both are better than anything else at the moment, including (marginally) `PBKDF2`, at least until these new candidate functions prove themselves 5 years down the road? Did I get all that right? – trysis Sep 13 '14 at 18:05
  • 27
    You make a pepper look as something only useful with a HSM, which it is not. Its purpose is to have different salts on different places, thus forcing the attacker to compromise all of them. Typically, the salt is stored with the username in the db but the pepper is stored in the login server. Thus making some leaks resistant to offline guessing: A broken RAID disk from the database server is leaked, but the pepper was stored in the web server; or the db was obtained through a SQL injection but the configuration file is not. – Ángel Sep 13 '14 at 22:34
  • 1
    @trysis, Wait for 2015 Q2. [The results](https://password-hashing.net/timeline.html) should be out by then. – Pacerier Dec 02 '14 at 09:45
  • 2
    "A salt is a way to select a specific hash function among a big family of hash functions." Uh, no? The hash function used is the same for all passwords. A salt is a randomly generated sequence of characters that's appended to the password in some way, e.g. `hash(password + salt)`. – nyuszika7h Jan 28 '15 at 12:55
  • 6
    @nyuszika7h given function `f(x)`, `g(x) = f(x + 1)` is a function as well, shifted in x. A crypto-secure hash, when each input is salted, is another crypto-secure hash. Technically, appending the salt to the password isn't the only way to salt the password -- you could salt the password using any transformation you like, but for a few good reasons which a comment is not the place to discuss, `hash(pwd + salt)` is either indistinguishable from or superior to other alternatives. [This q](http://security.stackexchange.com/questions/6050/real-salt-and-fake-salt) covers that a bit. – Moop Feb 13 '15 at 21:28
  • @ThomasPornin since "Input password size is limited to 51 characters" is it a good practise to hash the password before supplying it to bcrypt in all cases given that we're never going to know what the password is? – cottsak Mar 10 '15 at 05:53
  • @ThomasPornin do I overcome the bcrypt limitation "Output size is fixed: 192 bits." by increasing my iteration count/work factor? Or is that limitation still a concern? – cottsak Mar 10 '15 at 05:54
  • 7
    bcrypt produces 192 bits, no more, for all work factors. When your bcrypt library outputs a _character string_, that string encodes the 192 bits and some other parameters (salt and work factor), and uses a Base64-derivative to get printable characters, but cryptographically speaking there are still only 192 bits -- which is fine for password verification, but possibly not for use as a KDF. This is "fixed" by modifying the algorithm, but this is not recommended at all (and if you knew enough to do such a modification securely, you would not ask that question). – Thomas Pornin Mar 10 '15 at 11:28
  • 2
    I thought client-side hashing is bad because then the hash basically becomes the password and loses its value? – David says Reinstate Monica Jan 07 '16 at 18:21
  • 5
    @DavidGrinberg: doing _all_ the hashing in the client side is bad for the reason you indicate (i.e. what you store in the database is what grants access when sent to the server, so it is plaintext-equivalent). However, doing _most_ of the hashing on the client side is reasonable (i.e. client computes bcrypt(password), server stores SHA-256(bcrypt(password))). – Thomas Pornin Jan 07 '16 at 18:44
  • 6
    Has any of this changed in 2016? It's been over 5 years for scrypt. =) I'd also be interested to know if the "Not as many available, ready-to-use implementations for various languages" con still applies. – jpmc26 Mar 07 '16 at 02:35
  • 47
    As someone with a background in CS with just over 15 years professional experience as a developer, I'd just like to say that this is my all-time favorite SE answer. Clear, concise, unbiased. Thanks for answering and reminding me why this stuff is so *hard*. – Madbreaks Apr 05 '16 at 16:47
  • @ThomasPornin: This sentence is missing one or several words (after one simple): "Standard output encoding which includes the salt, the iteration count and the output as one simple to store character string of printable characters." – Zack Jul 13 '16 at 16:33
  • 1
    It is a verb-less sentence as part of an enumeration. The output is a character string, consisting only of "printable" characters, and is thus simple to store. That string contains the encoding of the password hashing function, along with the salt and the iteration count: all three elements are encoded into a single string. – Thomas Pornin Jul 13 '16 at 18:43
  • Excellent commentary. Just a note though... Modern (as of say 2014) GPU cards actually excel at scrypt compared to CPU. This should not detract from someone using it, as even a GPU doing scrypt is thousands of times slower than bcrypt or SHA2-512. – MikeP Aug 24 '16 at 15:45
  • 4
    @ThomasPornin, The example for pepper need not be so convoluted. **Pepper is useful for normal applications.** Consider a parsing bug on a webapp that allows us complete read access to a server's database. We can do `select*from user` and get the full list of hash together with their salt for us to crack at offline pleasure. Since we are brutespearing only highprofile admin accounts, salt is not going to be an obstacle to us at all. But ooops, without the pepper we couldn't do no nothing. And full read-access to the database alone does not in anyway allow us to get access to the pepper. – Pacerier Sep 10 '16 at 01:52
  • thought of downvoting because for long explaination :) just kidding..... upvoted because for `conclusion` part..... – USer345738380 Oct 28 '16 at 07:46
  • 30
    It would be great to get an Argon2 update in here. – Xiong Chiamiov Feb 08 '17 at 16:20
  • The password hash Argon2, winner of PHC: https://github.com/P-H-C/phc-winner-argon2 – Christophe Roussy Oct 03 '17 at 08:28
  • 2
    The author of this answer, Thomas Pornin, has designed a password-hashing function named Makwa: https://www.bolet.org/makwa/ Makwa was a competitor and a finalist in the Password Hashing Competition. Thus, the author is well aware that that Argon2 has won the PHC. I guess he will update this answer when he finds the time. – Erwan Legrand Oct 30 '17 at 10:48
  • The link "rainbow tables for MD5 or SHA-1" is broken. – Paul Wintz Jan 10 '18 at 16:26
  • @ThomasPornin [This](https://stackoverflow.com/questions/16891729/best-practices-salting-peppering-passwords) answer on SO was written right around the same time this answer was, and makes an argument *against* using peppers. What is your take on it? Who's right or wrong? and why? – William Perron Jan 23 '18 at 18:44
  • (1) Is `scrypt` now recommended? Has it passed the test of time? (2) You talk about client-side hashing, but don't mention how [that means the hash really *is* the password](https://security.stackexchange.com/a/23018/96753), since you don't control the clients. – Wildcard Jul 03 '18 at 22:24
  • @Wildcard Update for 6/22/2017? NIST recommends "memorized secret" hashing using a KDF and specifically names PBKDF2 and [Balloon](https://eprint.iacr.org/2016/027) as good examples. [SP 800-63B Section 5.1.1.2 Paragraph 13](https://pages.nist.gov/800-63-3/sp800-63b.html#memsecretver) – Gregor y Jul 25 '18 at 01:57
  • @ThomasPornin When you say `iteration` you mean applying the hash function multiple times? – Can Sürmeli Jan 30 '19 at 12:57
  • 1
    @CanSürmeli The iteration is a specific way to specify a work factor. It does indeed mean applying *some function*, possibly a hash function, multiple times. It is a way of increasing the work factor in a linear fashion (bcrypt uses a exponentially increasing work factor instead). What one iteration includes depends on the password hashing scheme though. – Maarten Bodewes Feb 02 '19 at 11:39
  • 3
    **Please** add some notes about how hashing client side *must* be accompanied by additional hashing server side, since that situation reduces the password to just the hash. (An attacker need only obtain the hash the client sent.) – jpmc26 Feb 04 '19 at 23:39
  • @jpmc26 kinda sorta, securing the line isn't the job of the hash. On an open line both the hashed password and the plain password to be hashed could be stolen; however with client side hashing if someone steals the hash out of the server's storage then they could use it as the password without having to compute it. Which is where server relief comes into play see https://security.stackexchange.com/a/58706/182532 – Gregor y Apr 07 '21 at 16:20
  • One disadvantage of PBKDF2 which isn't mentioned is that it performs the entire number of iteration count for output larger than the HMAC/hash size. So if you ask for more information than that output you may have to perform all these iterations basically for no return (as you could extend the output using e.g. HKDF without performing these iterations). Worse, it may hand an advantage over to an attacker. Similarly, since it uses the password as HMAC key, there are ways of speeding up the HMAC that are mainly useful to attackers. – Maarten Bodewes Feb 01 '22 at 10:08
  • 1
    You mention one should use as many iterations as possible. While cryptographically sound, this advice leaves, in most server setups, the door opened for another type of attack: Denial Of Service. All an attacker needs now is to send you a few fake "login" requests per second to max out your CPU usage. This is definitely something worth considering. – pieroxy Jul 12 '22 at 06:26
127

For storing passwords hashes, you need an algorithm slow enough that brute-force attacks are not feasible. Salting the password will help against rainbow attacks, but not against brute-force attacks. For storing password hashes, you need to use an algorithm specifically designed for this purpose; such as:

scrypt is new but interesting because it not only uses a variable work factor but also memory-hard functions. This dramatically increases the cost of brute-force attacks, because both running-time and memory requirements are increased.

nealmcb
  • 20,544
  • 6
  • 69
  • 116
Ozgur Ozcitak
  • 1,373
  • 2
  • 8
  • 5
  • 15
    Why do you say that salt won't help? The cryptographic value of salts is not in their secrecy, but in the additional entropy (which passwords are usually low on). – AviD Dec 15 '10 at 09:28
  • 4
    It wasn't meant to discourage salting, but rather `$hash = MY_HASH($salt . $pass)` where `MY_HASH` is a fast hashing algorithm. Does my edit make it clearer? – Ozgur Ozcitak Dec 15 '10 at 10:19
  • 5
    @AviD , on the salt issue, there's some attack cases where it basically won't help. If you're using a fast hashing algoritm (as ozgur mentions) GPU cracking will get through v. large numbers of hashes v.quickly and adding a salt won't really slow it down. some interesting information on it here http://codahale.com/how-to-safely-store-a-password/ – Rory McCune Dec 15 '10 at 10:37
  • 6
    Also, `bcrypt` already stores the salt in the hash, so you don't have to worry about that. – Ozgur Ozcitak Dec 15 '10 at 11:34
  • 3
    Might want to include `let h(x) = hmac[sha-256, salt](x) in let k = 2^stretch in h^k(x)` family of slow hash functions. – yfeldblum Jul 15 '11 at 04:34
  • That is PBKDF2, more or less. – Stephen Touset Nov 11 '12 at 02:09
  • 2
    +1 For mentioning both bcrypt and scrypt, but it should probably also be mentioned that scrypt > bcrypt > PBKDF2 when it comes to the ratio of time that must be spent by the attacker to time spent hashing on a typical system. – Muhd Feb 15 '13 at 22:58
  • 1
    Btw @OzgurOzcitak sorry for removing the accept from your answer, it was hard to resist Thomas' epic canonical answer. – AviD Mar 10 '13 at 09:35
  • @AviD agreed. Thomas' answers are always great reading. – Ozgur Ozcitak Mar 11 '13 at 15:54
  • "you need an algorithm slow enough that brute-force attacks are not feasible." I would modify that statement to say "or an algorithm where the key space is sufficiently large". – KyleM Apr 01 '13 at 18:11
  • On servers under significant load (popular websites for example), could slow and RAM-intensive hashing functions actually open the door to DOS attacks? – Simon East Aug 11 '13 at 00:04
93

Passwords stored in a database as hashed values can be recovered either through brute-force calculation of the hashes or through use of rainbow tables (which are specific to the algorithm used).

A rainbow table is created as a series of pre-computed values for either a dictionary file or more commonly every combination of a given character set [a-z, A-Z, 0-9] being a common example.

Essentially they can speed up the cracking of a password by allowing the hash value to be looked up in the table rather than requiring the attacker to create the hash for each password. Rainbow tables for common password algorithms (eg, NTLM, MD5, etc) can be found online, making it fairly straightforward to get access to large volumes of them.

There are a number of ways of improving the security of the hashes stored in the database.

First up is to use a per user salt value, this value is stored in the database along with the hashed password. It isn't intended to be secret but is used to slow down the brute force process and to make rainbow tables impractical to use.

Another add-on I've seen to this is to also add in what was called a pepper value. This was just another random string but was the same for all users and stored with the application code as opposed to in the database. the theory here is that in some circumstances the database may be compromised but the application code is not, and in those cases this could improve the security. It does, however, introduce problems if there are multiple applications using the same password database.

A third means of helping improve the security of the passwords is to use a slow password function, this won't have a huge impact on individual users, but will massively slow down an attacker in cracking passwords retrieved from the database. Some more information on this approach is available here

Rory McCune
  • 60,923
  • 14
  • 136
  • 217
76

Update 4: By 2016, the hardware improvements and other factors caused the bitcoin hash rate to rise by a factor of more than 100,000 (!) in the 5 years since when this post was first written in 2011. Password cracking techniques have also improved on the software end. So users should to add a few more characters to the minimum length of their passwords, and iteration counts need to be raised, and we all really need to be preparing to move to better algorithms like Argon2.

Update 3: In 2015, the Password Hashing Competition selected a winner: Argon2. It was designed to be "memory hard" to make GPU implementations by crackers hard; simple; highly configurable; resistant to side-channel leaks, etc. If it passes the test of time, it may be a significant step forward, but as pointed out by Thomas at Are there more modern password hashing methods than bcrypt and scrypt?, you should be wary of shiny new algorithms, and probably give the pros more time to look for weaknesses.

Update 2: In 2013, several experts initiated a Password Hashing Competition which should result in improved and more usable methods, with winners selected by 2015. For excellent background on the need for that, and good advice in the interim, see Password security: past, present, future from Passwords^12. Note that the advent of faster and faster hardware (as discussed below) implies the need for memory-intensive algorithms like scrypt, and that bcrypt is also still resistant to GPU attacks unlike PBKDF2 or crypt.


Others here have pointed out that brute force attacks need to be defended against via salts, even though MYSQL still hasn't figured that out. The importance of iterations has also been noted, and has been known since the seminal paper on Unix crypt in 1978 by Robert Morris and Ken Thompson. But many people (and developers too, like Django!) evidently still think brute force must take a pretty long time or be pretty expensive, and thus think a single iteration of SHA-1 is OK for password hashing.

Not true! Moore's law and cloud computing has caught up with us. To crack a SHA-1 hash of an alphanumeric password of length 8 ((26+26+10)^8 = 62^8 = 218,340,105,584,896 = 218 trillion combinations) on a modern desktop machine takes 5 days, or 1 hour if you rent a bunch of Amazon compute nodes (How long does it take to actually generate rainbow tables? - IT Security)

Update: bitcoin hashing capacity

The most powerful organized hashing capability on the planet (excluding possible classified systems) is the bitcoin mining network. [As of May, 2011 it was] performing SHA-256 hashes at an aggregate rate of over 11 Thash/s, i.e. 11 * 10^12 hash/s (by 2016 that was 1700000 Thash/s - see update 4 above), and the rate has been rising quickly recently (graphs). Miners are working to earn the estimated $700,000 per week that mining yields at the current price of $14 per bitcoin (BTC) (graph), and rate of 50 BTC produced every 10 minutes. Popular hardware these days includes a Radeon HD 5970 GPU, each of which has a total of 3200 stream processors and can do about 800 Mhash/s. It is also thrifty in power consumption at about 2.3 Mhash/Joule. See Bitcoin mining hardware comparison for lots more options. It turns out that GPU nodes on Amazon's EC2 use Nvidia Tesla GPUs which are less efficient at hashing, and their nodes aren't cost effective for mining at today's prices.

This is about twice the capacity of one 5.5 Thash/s estimate for the hashing power of the world's top 500 supercomputers combined, though of course the supercomputers were typically designed for floating point performance, not hashing.

As an extreme current case, if this hashing capacity were redirected to trying to crack passwords, e.g. following a crash in bitcoin prices, it would be fearsome against non-iterated password algorithms. 8 character passwords using a completely random assortment of all 94 printing characters would fall in less than 10 minutes (94^8 / (11 * 10^12 * 60) = 9.2). 10 character passwords would take less than 57 days (94^10 / (11 * 10^12 * 3600 * 24) = 56.7). 10-character upper-lower case alphanumeric passwords (26+26+10 = 62 possible characters) would take less than a day (62^10 / (11 * 10^12 * 3600 * 24) = 0.88) even if well randomized.

But if programmers simply used e.g. an iteration count of 2000 as Thomas suggests, good 10-character passwords would last years. Though 8-character passwords would be easily cracked, within 13 days (2000 * 94^8 / 11 10^12 / 3600 / 24 = 12.8 days).

See also:

nealmcb
  • 20,544
  • 6
  • 69
  • 116
  • 2
    Excellent writeup with real-world numbers. I'd guess the cryptocurrency hashing collective had much more power now 5 years later. – Pacerier Sep 10 '16 at 02:17
10

Passwords should always be salted and stretched before storing them. Basically this involves appending or prepending some text to the password and hashing the result several times. As for hash algos anything over and above MD5 and SHA-1 is currently advisable - go for SHA 256 or 512 (see http://www.schneier.com/blog/archives/2009/06/ever_better_cry.html)

Nev Stokes
  • 458
  • 3
  • 10
  • 2
    Is there really much value in hashing a password multiple times, especially when you're using a fast algorithm? – Josh Anderson Mar 15 '12 at 06:24
  • http://programmers.stackexchange.com/questions/115406/is-it-more-secure-to-hash-a-password-multiple-times – Nev Stokes Mar 15 '12 at 09:45
  • @D.W. No it doesn't, it explicitly mentions stretching and indicates "hashing the result several times" (which has some minor issues if used as is). As it doesn't specify any particular named algorithm it is still a bad answer in my opinion. – Maarten Bodewes Feb 01 '22 at 10:16
9

I'd second the recommendations for PBKDF2. It's not the most computationally expensive, but it does have a precise standard for reference during implementation, and it's well-accepted.

https://www.rfc-editor.org/rfc/rfc2898

I'd really recommend reading Colin Percival's paper on scrypt, though. He does a good job describing the issues in play here. My guess is scrypt will look better and better with time.

http://www.tarsnap.com/scrypt.html

Having an implementable standard is not nothing, by the way - there have been differences between the algorithms described in papers and the reference implementations in both bcrypt and scrypt, if memory serves.

Steve Dispensa
  • 3,441
  • 16
  • 20
7

A good password hashing algorithm must have salt and something to make the calculation of the password expensive (usually iteration count).

The best, and most common, method for this is PBKDF2. Although not perfect, it should be a baseline for everybody:

http://en.wikipedia.org/wiki/PBKDF2

Nakedible
  • 4,501
  • 4
  • 25
  • 22
  • 3
    It is not "most common" or "best". Most common are crypt() family hashes or even unsalted variants. KDFs are not designed for that and therefore be used with care. Dont know what the "best" is (it depends on your goals I guess), but bcrypt and scrypt or even sunmd5 is more purpose build. – eckes Nov 22 '12 at 20:07
7

Depending on the algorithm you use the answer is probably no.

First off you should Salt them, this basically means appending or prepending some text to the password.

Then you should use a strong algorithm (md5 doesn't cut it)

Toby
  • 709
  • 6
  • 8
  • +1 for relating to the algorithm, but can you state which algorthms are good? – AviD Nov 13 '10 at 16:29
  • SHA-1 is considered secure -> http://en.wikipedia.org/wiki/SHA-1 – Toby Nov 15 '10 at 20:30
  • 5
    NIST is phasing SHA-1 out right now, and is not endorsing it for new work. This is explained in the wikipedia article, which points in turn to: http://csrc.nist.gov/groups/ST/toolkit/documents/shs/hash_standards_comments.pdf – pboin Nov 17 '10 at 15:59
  • Cheers for the link pboin, I did not know that! – Toby Nov 18 '10 at 12:55
  • SHA-1 is not endorsed? Then what should we use? – Kim Stacks Feb 27 '11 at 13:12
  • 4
    @keisimone SHA-2 for the moment until SHA-3 is out, but you should use a multi-round hashing algorithm to make brute-force a challenge; PKDBF2 using SHA-2 or bcrypt/scrypt are appropriate. – Richard Gadsden Jul 07 '11 at 13:41
  • 4
    All of this short of @RichardGadsden's comment is awful advice. SHA-2 is not a suitable hashing algorithm for passwords because it is *fast*. PBKDF2 (iterated hashing) ameliorates this somewhat, but is likely not as good as something like bcrypt or scrypt. To reiterate: simply salting a password and running SHA-256 on it is nowhere near good enough. – Stephen Touset Nov 11 '12 at 02:19
  • @Toby SHA-1 was [considered broken](https://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html) 5 years before you stated it was "secure". – Jonathan Cross Mar 04 '17 at 21:52
5

It is interesting to note that although bcrypt and scrypt are all good solutions to passwords, with a favor for the latter, scrypt seems to be prone to cache-timing attacks. As suggested here: http://eprint.iacr.org/2013/525 Catena will be safe against this, along with provable safety and a few other nice features.

lkk
  • 77
  • 1
  • 1
  • 1
    1) It's been known for a long time that a straight forward implementation of scrypt is potentially vulnerable to timing attacks. 2) I don't believe in catena's security claims yet. I believe it's vulnerable to an attack with complexity t^1.5 and thus doesn't reach the claimed t^2 security. 3) I believe it's a more promising route to create implementations of scrypt(or similar constructions) that mask the memory access pattern than using predictable memory access like in catena. – CodesInChaos Sep 06 '13 at 10:50
  • 3
    I'd wait until the end of the password hashing competition and only then choose a scheme that does well in the competition. Until then bcrypt or scrypt is preferable. If catena is really as good as it claims, it will be a strong contender. But unfortunately I don't think it is. – CodesInChaos Sep 06 '13 at 10:51
-1

bcrypt is said to be slower on GPUs, which makes brute-forcing slower. However, with ever evolving computing hardware, security should not only rely on the difficulty of implementing a particular hashing algorithm on a particular hardware.

Rather, you can arbitrarily increase the cost of brute-forcing a hash by using the "variable work/cost factor" (sometimes also called "rounds") that some hash functions libraries using hash functions support. Amongst them are bcrypt and SHA-512 glibc's crypt function and the pam_unix library.

For example, a cost factor of 100000 for SHA-512 makes generation (and therefore brute-forcing) of the hash about 4 times slower than the cost factor 08 for bcrypt. This can be confirmed by using a hash solving program like hashcat.

If you assume that at some point your password hashes and salts will be stolen and the attackers will use ASIC hardware to brute-force them, you could simply increase this work/cost factor to still make it too costly for them, while hopefully not overloading your servers' CPU with regular user authentication.

The importance of long and random passwords applies nevertheless.

Note that POSIX user authentication security has been relying on this technique since 2007 (since glibc 2.7), with Debian 11 only recently switching to yescrypt from SHA-512 with 5000 rounds.

I wrote a blog post about this technique including details and reproducible measurements of brute-forcing hashes with hashcat.

References

White paper

Specification

Quote from man crypt:

Since glibc 2.7, the SHA-256 and SHA-512 implementations support a user-supplied number of hashing rounds, defaulting to 5000.

Quote from man pam_unix:

It uses standard calls from the system's libraries to retrieve and set account information as well as authentication.

sha512: ... The SHA512 algorithm must be supported by the crypt(3) function.

rounds=n: Set the optional number of rounds of the SHA256, SHA512 and blowfish password hashing algorithms to n.

Quote from man login.defs:

When ENCRYPT_METHOD is set to SHA256 or SHA512, this defines the number of SHA rounds used by the encryption algorithm by default. With a lot of rounds, it is more difficult to brute forcing the password. But note also that more CPU resources will be needed to authenticate users. If not specified, the libc will choose the default number of rounds (5000), which is orders of magnitude too low for modern hardware.

See also:

  • /etc/pam.d/common-password
  • /etc/login.defs
  • 1
    There is no "cost factor for SHA-512". That seems to be some unnamed scheme which **uses** SHA-512, which is not further specified here. Tools such as hashcat (or even crypt) should not be used for algorithm definitions, mainly because the authors seem to have no idea themselves whatsoever. – Maarten Bodewes Feb 01 '22 at 10:12
  • @MaartenBodewes I slightly clarified my answer to say that it is not hash functions that support a variable work/cost factor (SHA-512 has a fixed loop of 80 iterations), but that it is the calling libraries that may do. My original answer was a bit imprecise, but this is what the `man` pages conveyed to me back then (now quoted in my answer). I also added a link to the whitepaper and specification by the author and the working group. I don't think that the "authors seem to have no idea themselves whatsoever"; the algorithm has been providing POSIX system user authentication since 15 years. – Michael Franzl Feb 01 '22 at 17:43
  • A very similar idea (SHA-256 instead of SHA-512) is discussed in this duplicate question: [Is using 100,000 iterations of sha256 good enough for password storage?](/q/179552) – Fire Quacker Feb 01 '22 at 17:50
-10

I use SHA1 here

def __encrypt(self, plaintext, salt=""):
    """returns the SHA1 hexdigest of a plaintext and salt"""
    phrase = hashlib.sha1()
    phrase.update("%s--%s" % (plaintext, salt))
    return phrase.hexdigest()
def set_password(self, new_password):
    """sets the user's crypted_password"""
    #from datetime import datetime, timedelta  
    import datetime
    if not self.salt:
        self.salt = self.__encrypt(str(datetime.datetime.now()))
    self.crypted_password = self.__encrypt(new_password, self.salt)
def check_password(self, plaintext):
    return self.__encrypt(plaintext, self.salt) == self.crypted_password

Rather than salting passwords like this I would like that the algorithm is changed so that it doesn't need salt and still won't hash the same input twice to the same value.

  • 3
    Is this a question or an answer? – Paŭlo Ebermann Aug 19 '11 at 01:44
  • "Is this a question about a question or about an answer?" – Niklas Rosencrantz Jun 11 '12 at 09:33
  • 20
    This is an extremely broken implementation which should be highlighted as an example of how *not* to implement password hashing. First: SHA1 is not a suitable algorithm, as it is extremely fast. Second: salts *must* be unique and *should* not be guessable. While using the current time is not abysmal, it is quite simply not good enough. – Stephen Touset Nov 11 '12 at 02:24