256

I know there are many discussions on salted hashes, and I understand that the purpose is to make it impossible to build a rainbow table of all possible hashes (generally up to 7 characters).

My understanding is that the random salted values are simply concatenated to the password hash. Why can a rainbow table not be used against the password hash and ignore the first X bits that are known to be the random salt hash?

Update

Thanks for the replies. I am guessing for this to work, the directory (LDAP, etc) has to store a salt specific to each user, or it seems like the salt would be "lost" and authentication could never occur.

SilverlightFox
  • 33,408
  • 6
  • 67
  • 178
Tsyras
  • 2,631
  • 3
  • 11
  • 7
  • 4
    In addition to AJ's comment, simply salting a hash is not enough to ensure secure password storage. Modern password hashing algorithms like bcrypt and scrypt require substantial amounts of CPU and/or memory, significantly slowing an attacker's ability to attempt guesses. – Stephen Touset Feb 20 '14 at 21:05
  • 1
    I wrote a series of articles answering your question a few years ago. http://blogs.msdn.com/b/ericlippert/archive/tags/salt/ – Eric Lippert Feb 21 '14 at 20:18
  • 1
    And yes, the salt is stored along with the hash, and the salt should be per-user. – Eric Lippert Feb 21 '14 at 20:19
  • 2
    The salt can also be a global salt, concatenated to the userID, and then hashed, to produce a unique salt for the user's password hash. This way, you don't need to store anything per user (which could be stolen along with the hashed password...) – Johan Bezem Feb 25 '14 at 08:14
  • 5
    The salt is not appended to the hash! The salt is appended (or prepended) to the *plaintext* password, and the salt and password together are fed to the hashing algorithm to produce the hash. That's why you can store the salt directly with the hash value. But of course simple salted hash, apart from being bad for your heart, is no longer sufficient for safely storing credentials. – Craig Tullis Mar 27 '14 at 06:38
  • 1
    If a salt has to be stored somewhere too (it does) then why is it actually useful? It's basically the same thing as a password. – WakeDemons3 Oct 04 '18 at 21:07

8 Answers8

412

It typically works like this:

Say your password is "baseball". I could simply store it raw, but anyone who gets my database gets the password. So instead I do an SHA1 hash on it, and get this:

$ echo -n baseball | sha1sum
a2c901c8c6dea98958c219f6f2d038c44dc5d362

Theoretically it's impossible to reverse a SHA1 hash. But go do a google search on that exact string, and you will have no trouble recovering the original password.

Plus, if two users in the database have the same password, then they'll have the same SHA1 hash. And if one of them has a password hint that says try "baseball" -- well now I know what both users' passwords are.

So before we hash it, we prepend a unique string. Not a secret, just something unique. How about WquZ012C. So now we're hashing the string WquZ012Cbaseball. That hashes to this:

c5e635ec235a51e89f6ed7d4857afe58663d54f5

Googling that string turns up nothing (except perhaps this page), so now we're on to something. And if person2 also uses "baseball" as his password, we use a different salt and get a different hash.

Of course, in order to test out your password, you have to know what the salt is. So we have to store that somewhere. Most implementations just tack it right on there with the hash, usually with some delimiter. Try this if you have openssl installed:

[tylerl ~]$ openssl passwd -1
Password: baseball
Verifying - Password: baseball
$1$oaagVya9$NMvf1IyubxEYvrZTRSLgk0

This gives us a hash using the standard crypt library. So our hash is $1$oaagVya9$NMvf1IyubxEYvrZTRSLgk0: it's actually 3 sections separated by $. I'll replace the delimiter with a space to make it more visually clear:

$1$oaagVya9$NMvf1IyubxEYvrZTRSLgk0
 1 oaagVya9 NMvf1IyubxEYvrZTRSLgk0

If I run the process again, I get a completely different hash with a different salt. In this example, there are about 1014 ways to store this one password. All of these are for the password "baseball":

$1$9XsNo9.P$kTPuyvrHqsJJuCci3zLwL.
$1$nLEOCtx6$uSnz6PF8q3YuUhB3rLTC3/
$1$/jZJXTF3$OqDuk8T/cEIGpeKWfsamf.
$1$2lC.Cb/U$KR0jkhpeb1sz.UIqvfYOR.

But, if I deliberately specify the salt I want to check, I'll get back my expected result:

[tylerl ~]$ openssl passwd -1 -salt oaagVya9
Password: baseball
Verifying - Password: baseball
$1$oaagVya9$NMvf1IyubxEYvrZTRSLgk0

And that's the test I run to check to see if the password is correct. Find the stored hash for the user, find the saved salt, re-run that same hash using saved salt, check to see if the result matches the original hash.

Implementing This Yourself

To be clear, this post is not an implementation guide. Don't simply salt your MD5 and call it good. That's not enough in today's risk climate. You'll instead want to run an iterative process which runs the hash function thousands of times. This has been explained elsewhere many times over, so I won't go over the "why" here.

There are several well-established and trusted options for doing this:

  • crypt: The function I used above is an older variation on the unix crypt password hashing mechanism built-in to all Unix/Linux operating systems. The original (DES-based) version is horribly insecure; don't even consider it. The one I showed (MD5-based) is better, but still shouldn't be used today. Later variations, including the SHA-256 and SHA-512 variations should be reasonable. All recent variants implement multiple rounds of hashes.

  • bcrypt: The blowfish version of the crypt functional call mentioned above. Capitalizes on the fact that blowfish has a very expensive key setup process, and takes a "cost" parameter which increases the key setup time accordingly.

  • PBKDF2: ("Password-based Key Derivation Function version 2") Created to produce strong cryptographic keys from simple passwords, this is the only function listed here that actually has an RFC. Runs a configurable number of rounds, with each round it hashes the password plus the previous round's result. The first round uses a salt. It's worth noting that its original intended purpose is creating strong keys, not storing passwords, but the overlap in goals makes this a well-trusted solution here as well. If you had no libraries available and were forced to implement something from scratch, this is the easiest and best-documented option. Though, obviously, using a well-vetted library is always best.

  • scrypt: A recently-introduced system designed specifically to be difficult to implement on dedicated hardware. In addition to requiring multiple rounds of a hashing function, scrypt also has a very large working memory state, so as to increase the RAM requirement for implementations. While very new and mostly unproven, it looks at least as secure as the others, and possibly the most secure of them all.

tylerl
  • 82,225
  • 25
  • 148
  • 226
  • 80
    Best, clearest, most complete explanation of hashes I've seen. I plan to *shamelessly* use it in classes I teach on security. – Rap Feb 21 '14 at 18:14
  • So, strictly speaking, a salt shouldn't be truly *random*, correct? Like, the encryption software should keep track of salts already used and make not to reuse them? Or is that largely irrelevant? – asteri Feb 21 '14 at 20:37
  • 6
    @JeffGohlke If you're using an 8-character salt that's 62^8 or over 218 *trillion* possible salts. And you just want to make sure you don't use the *same salt for the same password* twice. So, no, you don't realistically need to worry about reusing salts. – Adam Balsam Feb 21 '14 at 20:56
  • @AdamBalsam Makes sense. I guess I've only really seen two character salts when looking at this stuff in the past. – asteri Feb 21 '14 at 22:08
  • 4
    Could you please add a final paragraph about key stretching and using a proper KDF, otherwise this post is going to spawn yet more `sha1(salt || pwd)` schemes by users who only read this answer and went to do their thing. I know it's technically off-topic but we really need to fight this by trying not to suggest to use this directly. – Thomas Feb 22 '14 at 00:28
  • *"Plus, if two users in the database have the same password, then they'll have the same SHA1 hash."* Adding to this, it holds across databases that use the same unsalted algorithm (perhaps a user uses the same password on multiple systems). Also this is independent of the hash algorithm; even a completely custom, or hypothetical impossible-to-break algorithm wouldn't stop an attacker from seeing what users have the same password if salting is not used. – Jason C Feb 22 '14 at 01:15
  • 3
    I came here, read the answer, googled for c5e635ec235a51e89f6ed7d4857afe58663d54f5 and came again here! :) +1 – Kenyakorn Ketsombut Feb 25 '14 at 02:42
  • 1
    I used SHA1 for password hashing once when I was working on application to make PA-DSS compliant. I stored the salt for each user in a column called salt and it was just a random five digit number. If the user changed their password, then I generated a new salt. Auditor said it was secure. – Engineer2021 Feb 25 '14 at 16:43
  • @0A0D today it probably wouldn't be considered secure. I updated my answer to include the options that are recommended today. – tylerl Feb 25 '14 at 19:24
  • @tylerl: This was only a couple years ago! – Engineer2021 Feb 25 '14 at 19:27
  • @0A0D Fast times! `:)` Actually, your system will probably pass a PCI-DSS audit with the current standard. Requiring strong hashing in PCI would be devastating to the Windows admins, who don't get to decide how account passwords get hashed on their systems. – tylerl Feb 25 '14 at 19:33
  • @tylerl: Ah, well this was a custom application (hence PA-DSS not PCI-DSS). But I can see your point. – Engineer2021 Feb 25 '14 at 19:35
  • Note that PBKDF2 is also the only common password hashing algorithm that can use [FIPS 140-2](http://csrc.nist.gov/publications/fips/fips140-2/fips1402annexa.pdf)/[NIST SP 800-131A](http://csrc.nist.gov/publications/nistpubs/800-131A/sp800-131A.pdf) listed hash functions as its hashing primitive, if those documents are considered guidance for your industry or usage. – Anti-weakpasswords Feb 26 '14 at 07:44
  • 1
    very nice answer. Please note if someone is worried by reusing salt, like to store user psw, then you can use userID as the salt, as it will never change and is unique. – Lesto Feb 26 '14 at 13:33
  • Interesting site that came up when I googled `c5e635ec235a51e89f6ed7d4857afe58663d54f5`: http://arehost.net:9090/encryption/index.php?search=c5e635ec235a51e89f6ed7d4857afe58663d54f5&sub=Send! – Abe Miessler Jul 23 '14 at 22:35
  • @AdamBalsam: No, one wants to make sure that the _same salt is not used twice_, so that multi-target attacks are (hopefully) not faster than just attacking the hashes independently. –  Nov 15 '14 at 22:51
  • @asteri: See my previous comment. –  Nov 15 '14 at 22:51
  • Oh my, that's just beautiful. – lesssugar Jul 05 '16 at 20:34
  • Thank you for this explanation. The concept I had trouble with is that the salt is combined with the PASSWORD, not the hash, then the salt + password is hashed. This explanation really made a lot of sense. – Tensigh Sep 28 '17 at 01:21
  • 2
    Perhaps you should update this answer to mention argon2. – forest Mar 09 '18 at 09:30
50

Technically you can still use a rainbow table to attack salted hashes. But only technically. A salted hash defeats rainbow table attacks, not by adding crypto magic, but just by exponentially increasing the size of the rainbow table required to successfully find a collision.

And yes, you'll need to store the salt :)

xkcd
  • 761
  • 4
  • 10
  • 11
    Upvoted because this is the only answer that actually explains why salted hashes help against rainbow tables. – nwellnhof Feb 23 '14 at 20:55
  • 3
    and it does so *exponentially*. – Jonathan Hartley Feb 25 '14 at 09:43
  • 3
    The size of the table would not be larger. A separate rainbow table would be needed for each of the users (since a different salt is used for each user's password), but the size of each of those would be the same. Instead of downloading a rainbow table for say SHA-1 hashes of strings, you would create your own for say SHA-1 hashes of `CYyohYn8MGYk`-salted strings. That would be a useless exercise though, since a brute force attack on a specific password would be faster. Salts work, not because of size increase of the table, but because they render standard downloadable rainbow tables unusable. – mgr326639 Oct 26 '16 at 10:08
  • @mgr326639 - I was not referring to 'standard downloadable rainbow tables' in my answer, in fact I am not aware if there is a standard. The question mentioned 'building' a rainbow table. You can have one big table that caters for the whole thing or you can have as many tables as the salts or whatever logical or physical partitioning scheme you want as per your resources or any other factors. – xkcd Oct 26 '16 at 12:16
23

It is not added after the hash. It is added before the hash, so the hash is completely different for every salt.

It isn't

hash abcd = defg
store 123defg (for user with 123 salt) and 456defg (for user with 456 salt)  

It is

hash 123abcd = ghij 
hash 456abcd = klmn
AJ Henderson
  • 41,816
  • 5
  • 63
  • 110
15

For password hashes, you need to use something like PBKDF2/RFC2898/PKCS5v2, Bcrypt, or Scrypt, all of which allow you to select an amount of iterations ("work factor"), rather than just one. PBKDF2, for instance, uses HMAC keyed hashing with a well known hash algorithm (typically SHA-512, SHA-256, or SHA-1) internally for a number of iterations, preferably tens to hundreds of thousands as high as you can keep it without users complaining, essentially.

The reason for the large number of iterations is to slow down an attacker, which in turn reduces the keyspace they can go through in a given time period, so they can't attack the better passwords effectively. "password" and "P@$$w0rd" are going to be cracked regardless in an offline attack, obviously.

You are correct, each row (user) needs their own uniquely generated, cryptographically random long salt. That salt is stored in plaintext form.

With PBKDF2, Bcrypt, or Scrypt, I would also recommend storing the number of iterations (work factor) in the database in plaintext as well, so it's easy to change (personally, I use a somewhat randomized number of iterations - if it's always different, then there's less fear about "oh no, a tiny change might mess everything up - NEVER CHANGE THIS AGAIN" from management or other developers)

Note that when using PBKDF2 for password hashing, never ask for a larger output than the native hash output - for SHA-1, that's 20 bytes, and for SHA-512, that's 64 bytes.

To give actual examples of PBKDF2 with two different salts:

(PBKDF2 HMAC password salt iterations outputbytes results)

  • PBKDF2 HMAC-SHA-512 MyPass vA8u3v4qzCdb 131072 64
    • 2e3259bece6992f012966cbf5803103fdea7957ac20f3ec305d62994a3f4f088f26cc3889053fb59a4e3c282f55e9179695609ee1147cffae1455880993ef874
  • PBKDF2 HMAC-SHA-512 MyPass l6eZQVf7J65S 131072 64
    • 1018ad648096f7814bc2786972eb4091f6c36761a8262183c24b0f4d34abb48073ed2541ee273220915638b46ec14dfb2b23ad64c4aa12f97158340bdc12fc57
Anti-weakpasswords
  • 9,785
  • 2
  • 23
  • 51
14

In addition to what tylerl said, it's important to stress that in modern crypto, salts are not used to protect against rainbow tables. No-one really uses rainbow tables. The real danger is parallelisation:

  • If you have no salt, or only a static salt, and I steal your DB of a million hashes, I can throw all my cores at bruteforcing it, and each core will be attacking a million hashes simultaneously, because any time I hit any of the strings used as passwords, I will see a hash match. So I need to exhaust the search space once to crack a million passwords. That is a completely realistic scale of password leak, and an attractive enough target to throw a couple dozen GPUs at or even a custom FPGA. Even if I only exhaust the bottom 30% of the search space, I will still walk away with 500k real-world passwords or so. Those passwords will be then used to build up my dictionary, so the next time someone leaks a hash DB, I will crack 90% of them in hours.
  • If instead each and every password is hashed with its own, unique salt, then I will need to attack every single one of them individually, because even identical passwords will have completely different hashes stored. That means I might perhaps use my expensive, power-hungry GPU/FPGA farm to attack the couple high-value targets (say, if Obama was your user, then getting his password would still warrant the expense), but I won't get several hundred thousand passwords for free from that. If I wanted to get the whole million passwords, I would have to do a full brute-force search a million times.

And that's why any salt, static or not, will protect you against prepared rainbow tables as long as it's not widely-used, but only unique per-hash salts will protect you against the real danger of using lots of parallel computing power to crack everything at once.

mathrick
  • 241
  • 1
  • 3
11

It's more secure because when multiple people have same password they will have different hash.

Simple implementation using Python:

import hashlib

passwordA = '123456'
passwordB = '123456'

hashlib.md5(passwordA).hexdigest()
'e10adc3949ba59abbe56e057f20f883e'

hashlib.md5(passwordB).hexdigest()
'e10adc3949ba59abbe56e057f20f883e'

And now let's add salt:

saltA = 'qwerty'
salbB = 'asdfgh'

passA = passwordA + saltA
passB = passwordB + saltB

hashlib.md5(passA).hexdigest()
'086e1b7e1c12ba37cd473670b3a15214'

hashlib.md5(passB).hexdigest()
'dc14768ac9876b3795cbf52c846e6847'

This is very simplified version. You can add salt wherever you prefer. In the beginning/middle/end of password.

Salt is very useful, especially when you have a lot of users so each of them will have different hashes. Now imagine the probability to have same passwords for million accounts, such as Facebook or Google or Twitter account.

zakiakhmad
  • 464
  • 3
  • 10
8

First, if two users use the rather weak password "baseball", then cracking one password doesn't help at all cracking the second password because of salting. Without salting, you've cracked two passwords for the price of one.

Second, rainbow tables contain pre-calculated hashes. So a cracker could lookup the hash of "baseball" in a rainbow table with a gazillion entries. Much faster than calculating the gazillion hashes in the rainbow table. And that's prevented by salting.

And now an important one: Some people have good passwords, some have bad passwords. If you have a million users, and three use the identical password, you just know that password is weak. Without salting, you have three identical hashes. So if someone steals your database of hashes, they can immediately see which users have weak passwords and concentrate on cracking those. Much easier to crack "baseball" than 1keOj29fa0romn. Without salting, the weak passwords stand out. With salting, the cracker has no idea which passwords are weak and which ones are hard to crack.

gnasher729
  • 101
  • 1
1

Whether or not the hash is salted only makes a difference if the attacker has the password hash. Without a salt, the hash can be attacked with a rainbow table: a pre-computed dictionary which associates passwords with hashes. An N-bit salt increases the storage requirement for a rainbow table, and the time to compute that table, by a factor of 2**N. So for instance, with a 32 bit salt, just a single rainbow table dictionary entry, say for the password passw0rd, requires many gigabytes of storage, making such a rainbow table very expensive using current storage hardware. So when a salt is present, the attacker is reduced to a brute force attack on the specific set of password hashes that have been obtained.

However:

  • for weak passwords, a brute force attack will succeed in relatively little time.
  • sufficiently strong passwords will not be found in a rainbow table.
  • if the attacker has access to the hashes, system security has been compromised already: modern systems and protocols do not reveal their password hashes. If the attacker is not able to gain access to the password database, the passwords may as well be stored in plaintext.
  • if an attacker must compromise the system to get to the hashes in order to reverse passwords from them, then the only passwords which have value to the attacker are those which are re-used for other security domains to which the attacker has no access yet. Passwords which are not reused have no value (and certainly less value than other sensitive information associated with the accounts).

Say user joewestlake uses the password god1234. The attacker reverses this instantly using a rainbow table. (Or within mere minutes of cracking using a brute-force attack, if the hash is salted, since the password is so bad.) Now the problem is that joewestlake also used god1234 for his Gmail account, and for online banking, oops! Now the attacker reads Joe's e-mails, and learns enough about Joe so that he can easily answer the question "what was the name of your first pet" when logging in to Joe's online banking.

So, the rationale for salts is that they somewhat protect users by making medium-security passwords harder to reverse: passwords which, without a salt, could reasonably be expected to be found in a rainbow table, but which are strong enough that individually brute-forcing them takes a very long time. But salts provide this benefit only in the event that the hashes are compromised, which is already a serious security breach, and the benefit is only for those users who re-use their medium-security passwords in other systems.

Say Joe instead used a password made of 10 random alphanumeric characters and symbols. This could still be in a rainbow table, but takes a lot of work crack. So even if Joe used the same password for Gmail and online banking, he is safe, thanks to the salt. The cracker runs his brute-force crack for perhaps several hours, maybe days. The crack yields numerous weak passwords from others users of the same system who have weak passwords. The attacker is satisfied with that yield and stops cracking; Joe's password is never reversed.

Furthermore, if the breach is detected and users (including Joe) are advised to change their passwords, then Joe has a chance to outrun the attacker's cracking attempts, even if the attacker persists in attacking the entire medium-security password space that includes Joe's password. Joe knows that, the password he used on the compromised system is exactly the same as his Gmail and banking password, so he scrambles to change those other two. The salt helps here because it buys time for password-reusing users to change their password. The salt won't help those with very weak passwords which are cracked in minutes, but users of passwords which take hours or days to crack have a fighting chance.

Kaz
  • 2,303
  • 16
  • 17