Hashing two passwords can produce the same hash. How to prevent this?

Question

When hashing passwords, two passwords can produce the same hash, so if a user inputs someone else's username but his own password, there is a possibility that he will be able to login to that other account. How to prevent this?

With correct hash mechanism, the risk of hash collision is of same order as the risk at using same password *by chance*. — Serge Ballesta, Dec 28 '16 at 09:26
@SergeBallesta With the quality of passwords chosen by the average user, the probability of two users using the same password is many orders of magnitude higher than the probability of a hash collision. — kasperd, Dec 28 '16 at 10:03
@SergeBallesta A hash collision by definition means two **different** inputs producing the same hash. — kasperd, Dec 28 '16 at 10:23

DepressedDaniel · Accepted Answer · 2016-12-29T20:31:18.350

It is a mathematical certainty that any hash function with a fixed output size will have collisions. However, the chance of a collision can be reduced by using a hash function with a large output size. Using SHA-256 (the SHA-2 variant with a 256-bit output), for example, means that you will need roughly 2^128 users before there is about a 50% chance that some pair of them will have the same password hash[*].
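As a rough check on the figure above, here is a minimal sketch of the standard birthday-bound approximation, assuming an ideal hash whose outputs are uniformly distributed. The function names are made up for this illustration.

```python
import math

def collision_probability(num_hashes: int, output_bits: int) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    exponent = (num_hashes * num_hashes) / (2 * 2**output_bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)

def hashes_for_probability(p: float, output_bits: int) -> float:
    """Approximate number of distinct inputs needed to reach probability p."""
    return math.sqrt(2 * math.log(1 / (1 - p))) * 2.0 ** (output_bits / 2)

print(collision_probability(10**9, 256))          # a billion users: vanishingly small
print(collision_probability(2**128, 256))         # about 0.39
print(hashes_for_probability(0.5, 256) / 2**128)  # about 1.18
```

Under this approximation the 50% point sits slightly above 2^128 distinct inputs, so "2^128" is the right order of magnitude.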

Ideally, you should apply a hash function to SALT+PASSWORD, where SALT is a unique value specific to the user account. If the salt values are chosen uniquely over accounts, this virtually ensures that both username and password must match with a specific account to allow logging in.

Please note that the above is only intended to explain the concept. In practice, you should almost always delegate this task to a quality implementation of a standard password hashing function, such as PBKDF2 or bcrypt, rather than rolling your own.
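To make the salt idea concrete without rolling your own construction, here is a minimal sketch using PBKDF2 from Python's standard library. The iteration count, salt length and function names are assumptions chosen for illustration, not recommendations from this answer.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor (assumption); tune to current guidance

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for a new password entry."""
    salt = secrets.token_bytes(16)  # fresh random salt for every entry
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)

alice_salt, alice_hash = hash_password("123456")
bob_salt, bob_hash = hash_password("123456")
print(alice_hash != bob_hash)  # same password, different salts, different hashes
print(verify_password("123456", alice_salt, alice_hash))
```

Because each entry gets its own salt, two users who pick the same password still end up with different stored values.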

[*] - assuming no pair of them actually have the same password, which can happen with surprisingly few different users.

In the case of Alice and Bob having the exact same password and Alice logging in as Bob accidentally, there is nothing you can reasonably be expected to do about this.

In terms of unreasonable efforts, you could store the hash of PEPPER+PASSWORD for each user (where PEPPER is a site-wide constant), run a script to detect same-password cases, put the hash on a "bad list", and automatically send e-mail asking users to change their password if their password hash is in the "bad list". Note however that this opens your site up to a new kind of attack whereby an attacker repeatedly changes their password to a common password and waits to see if an e-mail is generated. If it is, they know another user has that password and they can try common usernames with it (or the result of scraping any available public data for usernames, such as a site forum). So you should rate-limit the script that sends e-mails (for example, to once per day) to reduce the efficacy of such an attack.

It is very unlikely that collision will happen ... can't add all data here to adding as answer — amarnath chatterjee, Dec 28 '16 at 05:00
I could be wrong, but isnt `salt` unique to the username and `pepper` is unique to the site? — CaffeineAddiction, Dec 28 '16 at 09:19
SHA-2 has four different output lengths to choose from, so you should probably specify which of those you are referring to. — kasperd, Dec 28 '16 at 10:05
Usually the username is not included in the hash. Including the username in the hash has certain drawbacks, for example if you ever need to change a username, you have to wait for the user to enter their password before you can change it. And the only reason I have ever heard of for including the username is as a workaround for a poorly chosen salt. As long as the salt has as much entropy as the final hash, there is no benefit to including the username as well. — kasperd, Dec 28 '16 at 10:13
This answer is implicitly guiding people to actually use manually salted SHA-256 as their password hashing function, which is insecure. People should use a specialized password hashing function instead, like PBKDF2 or bcrypt. — Luis Casillas, Dec 29 '16 at 19:55
@LuisCasillas Thanks, I have clarified that the example is just demonstrating the concept and added a warning against roll-your-own crypto. — DepressedDaniel, Dec 29 '16 at 20:31

George Y. · Answer 2 · 2016-12-28T09:57:45.037

If you allow weak passwords to be used, there is always a chance that a user would type another username, type something like "qweasd" or "P@ssw0rd" and log in; the more users you have, the more likely this is to happen. Just look at the LinkedIn password dump analysis: 1.13M people used the password "123456"! There is nothing you can do to protect against it, besides requiring a strong password.

But if this is a concern for you, you can use a random salt for each password (stored in the same table with the password). For example, for a user entering the password "abcdef" you generate a random salt "9D@!", and store the hash of "9D@!abcdef" and the salt. In this case even when two users have exactly the same password - such as 123456 - it will not be obvious from the database dump.

Another solution, as proposed above, could be to hash username and password together. In this case it may be better to also use a separator - a symbol which couldn't be part of username, such as colon, and hash the "username:password" combination. This would prevent similar cases where user abc1 has password 123456, and user abc has a password 1123456 - their hashes would match if no separation is used.
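The ambiguity described above is easy to reproduce. The sketch below uses plain SHA-256 only to show why the separator matters; it is not a suggestion to store passwords this way, and the helper name is made up.

```python
import hashlib

def digest(text: str) -> str:
    """Hex SHA-256 of a string (illustration only, not a password hash)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Without a separator, both accounts feed the same string "abc1123456" to the hash.
no_sep_a = digest("abc1" + "123456")
no_sep_b = digest("abc" + "1123456")

# With a separator that cannot appear in a username, the inputs differ.
with_sep_a = digest("abc1" + ":" + "123456")
with_sep_b = digest("abc" + ":" + "1123456")

print(no_sep_a == no_sep_b)      # True
print(with_sep_a == with_sep_b)  # False
```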

John Wu · Answer 3 · 2016-12-28T12:37:11.610

When hashing passwords, two passwords can produce the same hash, so if a user inputs someone else's username but his own password, there is a possibility that he will be able to login to that other account. How to prevent this?

This scenario, exactly as you described, isn't possible. The other user would have a different salt, so even if you had a password that would have a colliding hash, the actual value that is put into the hashing algorithm would differ.

That being said, it is possible (though extremely unlikely) that the user's password when combined with the other user's salt generates the other user's hash. No web site that I've ever worked on has ever worried about this, since it is so unlikely.

But...if you really wanted to, you could prevent collisions using the following sort of logic when saving the user's password (assuming it is saved as a salted hash):

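// All helper methods below are placeholders for your own implementation.
var userName = GetUsernameFromUser();
var password = GetPasswordFromUser();
bool isSaved = false;
while (!isSaved)
{
    // Pick a fresh random salt on every attempt.
    var salt = GetRandomSalt();
    var hash = ComputeHash(password + salt);
    // Only store the entry if no existing record has the same hash.
    if (!ExistsInDatabase(hash))
    {
        SaveHashedPassword(userName, salt, hash);
        isSaved = true;
    }
}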

The above will keep trying different salt values until the hash doesn't collide with anything.

I don't recommend doing this, but this does answer the question.

Luis Casillas · Answer 4 · 2020-10-14T01:33:16.747

This isn't actually a problem at all in a correctly implemented login authentication and password storage system.

Login authentication works on username and password pairs

When a user logs in, they supply a username (or other identifier, like email) along with their password. Login verification means checking that the (username, password) pair is an authentic one. The username is used to look up the corresponding password entry, and that entry's hash is used to verify that the supplied password is the correct one for that username.

If somebody is trying to log in Alice's account, we compare the password they present to the hash stored for Alice's account; if that fails we don't go and check if it matches Bob's password.

That might sound obvious, but it's important, as we will see below, because every login attempt for a given username is tested using that username's salt and no other.

Salted hashes have very low collision probability

You should be using a specialized password hashing function to store passwords. (See: Thomas Pornin's answer to "How to securely hash passwords?") These functions take a salt as an additional argument. These salts should be unique for each password entry (ideally they should be chosen at random), and stored alongside the hashed password; in fact, it makes more sense to treat the whole salt + hash pair as a unit, as a password verification code (like the PHC string format). Under those conditions:

The probability that two users with the same password would get identical hashes is negligible;
The probability that two users with different passwords will get salts that produce identical hash results is also negligible.

So if your password storage is implemented correctly, you should not observe collisions. It's astronomically unlikely.

Collisions don't matter anyway

But suppose that Alice and Bob, despite having different passwords and salts, improbably end up with password + salt pairs that hash to the same result. What would happen? In fact, nothing bad at all:

If Alice enters her username and password, she is authenticated successfully.
If Bob enters his username and password, he is authenticated successfully.
If Alice enters Bob's username and her password, authentication almost certainly fails. Why? Because it gets hashed with Bob's password's salt, which is different from Alice's.
Likewise, if Bob enters Alice's username and his password, authentication fails, because it's hashed with Alice's password's salt, which is different from Bob's.
If Alice enters Bob's username and password, authentication falsely succeeds—but that's because she knows his username and password.
If Bob enters Alice's username and password, authentication falsely succeeds—but that's because he knows her username and password.
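The cases above can be walked through with a small sketch, assuming an in-memory user table and PBKDF2 from Python's standard library as a stand-in for whichever password hashing function you use. The names `users`, `register` and `login` are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor (assumption)

users: dict[str, tuple[bytes, bytes]] = {}  # username -> (salt, hash)

def register(username: str, password: str) -> None:
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    users[username] = (salt, digest)

def login(username: str, password: str) -> bool:
    # Look up the one record for this username; no other record is consulted.
    record = users.get(username)
    if record is None:
        return False
    salt, expected = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)

register("alice", "alice-secret")
register("bob", "bob-secret")

print(login("alice", "alice-secret"))  # True: Alice's username, Alice's password
print(login("bob", "alice-secret"))    # False: hashed with Bob's salt, compared to Bob's hash
print(login("bob", "bob-secret"))      # True: whoever knows Bob's credentials gets in
```

The supplied password is only ever hashed with the salt of the account named in the login attempt, which is why Alice's password does not open Bob's account here.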

For your scenario to come to pass, what we'd actually need is a situation where pwhash(alice_pw, bob_salt) = pwhash(bob_pw, bob_salt). But:

If you pick large enough salts at random that's astronomically unlikely;
Even if such a situation improbably did obtain, you wouldn't be able to tell from the hashes in your database;
Even if an attacker knows the salts, finding a (fake_password, true_salt) that collides is at least as hard as guessing the true password.

Conclusion

This isn't actually a problem at all in a correctly implemented login authentication and password storage system.

Lie Ryan · Answer 5 · 2016-12-29T02:26:44.370

Due to the pigeonhole principle, any fixed-length hash algorithm will have collisions. The only way to guarantee no collisions is to use a variable-length perfect hashing algorithm.

Note that with a good hashing algorithm, starting at 128-bit hashes, an accidental collision is so improbable as to be practically impossible. You'll exhaust the energy in the known universe before you can generate a collision in a good 256-bit hash by brute force. In these hashes, attacks generally revolve around finding mathematical weaknesses rather than brute force.

Mr. E · Answer 6 · 2016-12-29T12:30:44.840

A hash function is a function that receives an input in {0,1}^n and produces an output in the space {0,1}^t.

As n can theoretically grow without bound, there are infinitely many inputs that map to the same value x in the space {0,1}^t. That is known, and therefore, as you said, there are at least two values that produce the same hash value.

But cryptographically secure hash functions have a special property: they're collision resistant. That essentially means that an attacker who wants to find a collision for a given hash value has no better approach than an exhaustive search.

In the particular case of bcrypt (the most common hash function considered secure for password hashing, which is your concern), the output is 184 bits (without the prefix, the salt and the work factor). That means that an attacker who wants to find a collision for a given bcrypted password needs to try at least 2^92 passwords on average. Even if you could calculate 1 million bcrypt hashes per second, you would need more than 3.76 quadrillion years to find a collision for a given hash.

The odds that a user misspells his username, enters his own password and gets access to another person's account are around 4*10^-56, assuming he enters a valid username.

Statistically speaking, any one of us is more likely to die from a meteor impact than to find a collision for a bcrypt-hashed password with a random guess.

PS: All of this holds if the application uses salts properly. If not, two users with the same password (such as P4ssw0rd) could be catastrophic.

Impressive , how did u come up with 4*10^-46 ? Can u explain a bit more ? Thanks :) — lasan, Dec 29 '16 at 05:05
I'm sorry, that was a typo. The value is `4*10^-56`, just edited it. The math is this. `bcrypt` has an output of 184 bits (As explained in the answer), that is, there are `2^184` possible outputs. As `bcrypt` is an uniformly distributed function, which means the probability of receiving an output value for a given input is the same for every possible output. So the odds of receiving a certain value is `1/(2^184)` that is approximately `4*10^-56` — Mr. E, Dec 29 '16 at 12:31
In crypto collision is two inputs for _any_ output, not a chosen one, and bcrypt is 2^92; an input for a _given_ output is preimage which for bcrypt is average 2^183. And your 3.76 quadrillion years is about 2^96.6 which isn't either of those. However 2^-184 is near enough 4e-56. — dave_thompson_085, Jan 06 '17 at 16:42
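The figures traded in this answer and its comments can be checked with a few lines of arithmetic. This is only a sketch, assuming a 184-bit uniformly distributed output and the rate of 1 million hashes per second quoted in the answer.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
HASHES_PER_SECOND = 1_000_000  # rate quoted in the answer

# Chance that one specific input lands on one specific 184-bit output.
p_match = 2.0 ** -184

# Time to try 2^92 candidates at the quoted rate.
years_for_2_92 = 2**92 / HASHES_PER_SECOND / SECONDS_PER_YEAR

# How many candidates fit into the 3.76 quadrillion years quoted in the answer.
guesses_in_quoted_time = 3.76e15 * SECONDS_PER_YEAR * HASHES_PER_SECOND

print(f"{p_match:.2e}")                            # about 4.08e-56
print(f"{years_for_2_92:.2e}")                     # about 1.57e+14 years
print(f"{math.log2(guesses_in_quoted_time):.1f}")  # about 96.6 bits
```

This agrees with the comment above: 2^-184 is close to 4*10^-56, while 3.76 quadrillion years at that rate corresponds to roughly 2^96.6 attempts rather than 2^92.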

Arnar Freyr · Answer 7 · 2016-12-29T05:48:12.600

When hashing sensitive information like a password, you should always use a strong algorithm and a thing called a salt. When a server makes a hash from your password, it uses a certain method to do so depending on the algorithm in use. No matter how strong the algorithm is, when unsalted it will always produce the same result for the same string. So the solution here is to use a salt with your algorithm. It's best if you can use a method built by professionals that generates a hash with a salt.

Also, you could just add a single if statement that says:

if the user is this and the salt is this, then log in. Then you are sure that only this user with this username is logging in.

But if you really want the hash to be random, you need a salt.

So, What is a salt?

Well, Wikipedia explains it in a good way:

In cryptography, a salt is random data that is used as an additional input to a one-way function that "hashes" a password or passphrase. Salts are closely related to the concept of nonce. The primary function of salts is to defend against dictionary attacks or against its hashed equivalent, a pre-computed rainbow table attack.[1] Salts are used to safeguard passwords in storage. Historically a password was stored in plaintext on a system, but over time additional safeguards developed to protect a user's password against being read from the system. A salt is one of those methods. A new salt is randomly generated for each password. In a typical setting, the salt and the password (or its version after Key stretching) are concatenated and processed with a cryptographic hash function, and the resulting output (but not the original password) is stored with the salt in a database. Hashing allows for later authentication without keeping and therefore risking the plaintext password in the event that the authentication data store is compromised. Since salts do not have to be memorized by humans they can make the size of the rainbow table required for a successful attack prohibitively large without placing a burden on the users. Since salts are different in each case, they also protect commonly used passwords, or those who use the same password on several sites, by making all salted hash instances for the same password different from each other. Cryptographic salts are broadly used in many modern computer systems, from Unix system credentials to Internet security. - Wikipedia https://en.wikipedia.org/wiki/Salt_(cryptography)

So when we use a salt, we prevent so-called rainbow table attacks and dictionary attacks, and we do not get the same hashed string.

How are we going to add a salt?

There are at least two ways to add a salt.

1. Add the salt to the string before it's hashed, that is, string + salt

2. Use built-in methods or good methods designed by professionals

Okay, first of all, a salt needs to be really, really random for every single password. A salt that is always the same is a high risk, because when it is exposed (yes, when, because everyone will be hacked at some point), it's a lot easier for the hacker to break all the passwords in the database. So with that in mind, we continue.

The first method, adding a salt directly to the string, is not bad, but then the question is: where do you store the salt? And how are you going to make it as random as possible? Well, it's recommended that if you are going to store salts, you store them separately from the passwords, even in another database, because someone might hack the password table or the database that the passwords are in but not the salts, so he does not know the salts and therefore not the passwords. Then we need to think about the randomness of our salt: what method is best for generating the most random value you can get? It depends on the language, of course, and you did not mention any language in your question, so I'm afraid I can't help you with that.

The second one is to use built-in functions that are designed to hash a password with a salt. These are often better than just adding the salt and hashing, because you can be sure that you get as random a salt as possible and a different one for each password. The headache of where to store the salt, how to make it random for each password, and what to use so that it is very, very random disappears. One line and you are good. These functions are also often maintained, and some of them provide the strongest algorithm that the language can offer plus a salt. So that's even more secure, because you don't have to worry about which algorithm is best to use, and when the algorithm you are using is broken you don't need to go over the whole code and change it. But not all languages support these kinds of methods, so the first one is then better than nothing.

The second method is pretty good, I think, because you don't even have to store your salt at all, at least not in your database, and then it's impossible to know which password belongs to each salt, so it's pretty secure. Also, I recommend that when hashing, if possible, you use a method that always uses the strongest algorithm available and a very random salt, because then it happens automatically and you don't have to worry.

So I think the second method is better because it has less overhead and is possibly more secure, because you are using built-in functions. But not all programming languages have these functions built in, so the first method is better than no salt. If the second is not possible, use the first one.

There you have it: the short answer is basically to put some salt on your password, and then the hash will not be the same for each password.

I hope this information helps you and some others.

score 0 · Answer 8 · edited Dec 28 '16 at 10:16

Salting each and every password may be too much security here. Also you need a mechanism to store each salt along with passwords. Otherwise you cannot generate the hash.

On the actual question, I think a username + password combination is fine.

You can add an overall salt for all of your passwords to prevent them from being reverse engineered. So basically your hashes will be:

Global Salt + username + password

It is very unlikely that collision will happen in the SHA-256 algorithm. Have a look at the birthday problem:

answered Dec 28 '16 at 04:58 by amarnath chatterjee · edited Dec 28 '16 at 10:16 by Anders

Salting every password is a basic precaution against an attacker who gets the database from being able to trivially identify identical passwords, and which ensures that in order to brute force a password, they'd need to try each users data in turn. It's also built into the currently recommended password algorithms, so is no extra effort to implement - bcrypt, for example, generates a salt as part of the hash process, which is part of the output string. – Matthew Dec 28 '16 at 11:57
Random salt may be unnecessary ... would rather user created timestamp to microsecond if its really needed. Random salt has to be stored additionally , why do that ? all you need is randomness in authentication strings. Or use one global salt which persisted seperately in file or something so that its not obvious... a person who can steal password hashes from your password table can steal salts as well. – amarnath chatterjee Dec 28 '16 at 12:45
You would need to store the time offset too, but you appear to have missed the point of a per record salt. It's not there as a secret. It's there to make an attacker who has got the data work harder. – Matthew Dec 28 '16 at 12:53

Shailendra Bhardwaj · Answer 9 · 2016-12-28T14:27:43.080

-4

A simple way would be to concatenate the username and password to form a string and hash the whole thing. The idea is that the username and password combination, joined with a separator, has to be unique because the username is unique. If your usernames can be duplicated, then use a user code or something else which is unique to the user.

answered Dec 28 '16 at 04:09 · edited Dec 28 '16 at 14:27

Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/50962/discussion-on-answer-by-shailendra-bhardwaj-hashing-two-passwords-can-produce-th). – Rory Alsop Dec 30 '16 at 19:30
