How to check for duplicate passwords?

Question

Is it possible to check if a given number of people are using the same password, without risking anyone's password getting out? I heard that Google does this, not allowing the user to set a password 1000 people are using. What if users have access to their own records (e.g. distributed database)?

EDIT: The number of users for what I'm doing right now is 10-10K. What if we ask each client if their password matches a given text anonymously (I can do it with some RSA encryption and stuff)? Would it work? Or should I give this idea up for good?

Could you please elaborate on the last part of your question, the "distributed database" part? — Bob Brown, Jan 12 '15 at 00:32
@BobBrown I'm trying to implement this in a distributed database where each user has access to his own certificates/encryption keys/encrypted data/... — Behrooz, Jan 13 '15 at 02:40
@Behrooz dumb question, but why use passwords at all? If you are dealing with a distributed system with encryption keys ... why not just us pub key authentication and be done with it ... I mean, if your implementing a distributive encrypted file system ... it seems like the passwords would be the weak link ... why not ditch it for something more secure? — CaffeineAddiction, Jun 28 '16 at 13:36
@CaffeineAddiction It's a very valid question. The problem is the users. They can't carry physical keys reliably(losing it or forgetting to bring it to work) or handle files in a secure manner(they would all probably end up in a windows share accessible to everyone in a medium sized company). That's why I had to resort to passwords. and yes I know they're the weakest link but they're the hardest weakest link I can have that don't come with usability drawbacks. and as far as I can tell the managars only care if the thing just works. — Behrooz, Jun 28 '16 at 14:49
@Behrooz they dont need to be physical keys ... most ssh keys are stored in `/home//.ssh/id_rsa` ... each user has there own home dir that is inaccessible to others ... you could also enable full disk encryption or encrypt the users home directory ... if each user can use multiple computers then you could setup roaming profiles or a startup script that securely installs there private key on each computer they log into. — CaffeineAddiction, Jun 28 '16 at 14:55
multiple users using one machine. different work shifts and legal standing problems. — Behrooz, Jun 28 '16 at 15:58

Bob Brown · Accepted Answer · 2016-06-28T13:06:17.500

12

The only practical ways to do that are to use the same salt for hashing all passwords (bad) or to store the passwords in plain text or reversible encryption (very, extremely bad.)

Since those approaches are big and bigger security problems, I'd guess Google isn't doing either of them. What they might be doing is disalllowing the 1,000 or so most common passwords. There's a list of common passwords here: https://xato.net/passwords/more-top-worst-passwords/ (Note that if you disallow the top 10,000, for the "ordinary" user, you've disallowed everything.)

Edited to add: I guess you could store the last n passwords and their (different) salts. Seems like a lot of trouble.

edited Jun 28 '16 at 13:06

answered Jan 12 '15 at 00:31

Bob Brown

5,283
1
19
28

1

The top 10000 doesn't cover everything; there's not _that_ many Significant Other Name/Local Sports Name + Meaningful Date combinations in there :) And yikes; I don't even want to think about your bad and very, extremely bad options in actual use. Adobe notwithstanding. – Anti-weakpasswords Jan 12 '15 at 00:34
As Anti-weakpasswords has said, you *could* hash new passwords once for each existing user. I ruled that out as impractical in my answer, but it'll work if you have dozens of users. If you have as few as hundreds, it'll be impractical because of the time it takes. – Bob Brown Jan 12 '15 at 00:35
@Anti-weakpasswords: Correct, the top 10,000 *doesn't* cover everything. Mine aren't on that list, for example. I still say that for "ordinary" people, those who do not pay attention to security issues, disallowing all 10,000 will make setting a password effectively impossible *because of the way people choose passwords,* not because the list is exhaustive. – Bob Brown Jan 12 '15 at 00:37
Ahh; I was being partly humorous and partly serious; the top 10,000 cover a lot of people; I was pointing out that there is a small subset that habitually choose very weak password patterns that don't make the top 10,000 explicitly, like Jennifer311208 or Mark02141999, or even Chris9/9/99. Those patterns are why I suggest the rules based checks if you want to disallow or simply warn users about weak passwords. Most will choose them anyway if you warn them, but then you at least tried to educate them. – Anti-weakpasswords Jan 12 '15 at 00:40
@Anti-weakpasswords: Got it. (I've always been fond of the name Jennifer.) – Bob Brown Jan 12 '15 at 00:42

score 8 · Answer 2 · answered Jan 12 '15 at 00:31

If you're storing passwords properly, i.e. with PBKDF2/BCrypt/SCrypt, then as the user is changing their password you can spend the time to hash it with other user's salts (which are, of course, cryptographically random and unique per user), and if you find a match in the first N, then tell the user no. If you wanted, you could check every user, but that would take a prohibitive amount of processing time.

If it doesn't take a prohibitive amount of processing time, either you only have a handful of users, or you need to increase your PBKDF2/BCrypt/SCrypt iteration count/work factor.

This is a BAD idea, however, since it gives user A the possibility of gaining information on what other users have as their passwords, without actually trying to log in as them.

Better, include a check against rules-based dictionary attacks with a small to medium dictionary, using that pre-hashed cleartext candidate password. I go into more detail in my answer to , but the basics are:

Have a small to medium wordlist, very simple, letters are all upper case.
Code some "rules" to apply to the password, just like Hashcat or John the Ripper
- You can handle all the uppercase/lowercase rules with a simple UPPER() equivalent and an all-uppercase dictionary - if you find it, it's weak. (JacQueLinE)
- Appending/prepending numbers purely to meet length minimums is a simple pattern match - if the last/first N characters are numbers, and the remaining length isn't enough, it's weak. (Riddick123)
- Remove N numbers from the beginning/end, uppercase it, and check the dictionary for the remainder (JacQueLine12)
- The above, but N-1 numbers and/or symbols (#1JacQueLine)
- The above, but date formats. (JacQueLine02121995)
- If the last/first N-1 characters are numbers and the last/first is a symbol, and the remaining length isn't enough, it's weak. (!JacQueLine1)
- Take out one character at a time, see if it matches the dictionary. (jacqu$eline)
- Combine some of these.
- Reverse all of these.
- then move on to dictionary word combinations and so on.

I would reject the very worst passwords, absolutely. For an advanced project, start with a "worst" check (small top N worst passwords + very basic rules), and just reject those. Then, if no match occurs, try a "medium-worst" check, and give a big bright red warning. Then, if no match occurs, try a "worse" check, and deliver a smaller warning.

If no match occurs at all, say nothing. It might still be a horrifically bad password, you just didn't check that particular combination of base word + rule; you cannot ever know if a password is good.

Example: "286755fad04869ca523320acce0dc6a4" looks pretty random! It's just MD5("password")!!
- Making it 1337 speak doesn't help.

If users have access to the password hashes and the salts, then they can run tools like Hashcat or John the Ripper just like anyone else with a (leaked) password list can.

+1 for "This is a **BAD idea** because..." I hadn't thought of that. — Bob Brown, Jan 12 '15 at 00:39
Just remember that by using this approach (disallowing bad passwords) you are reducing the possible entropy of any good passwords. The more passwords you disallow for legitimate users, the less passwords the attacker has to try. As @anti-weakpasswords alludes to, it's important to strike a balance between disallowing the very worst passwords, and maintaining enough allowable passwords that an attacker still has a high enough count of passwords to try. — Chris Murray, Jan 12 '15 at 09:04
@ChrisMurray: A password in the top 10,000 has an entropy of about 13.3 bits, no matter how long it is, and will likely fall to an attacker within seconds. While bits of entropy is an important metric, and disallowing certain combinations *does* slightly reduce the available entropy, the more important consideration is "against what threats are we vulnerable." If we do not think like the attackers, we will fail. Finally, there are about 6.6e15 eight character combinations of 95 printable characters. Removing 10,000 of them changes the entropy almost not at all. — Bob Brown, Jan 12 '15 at 11:57
@BobBrown, removing 10,000; absolutely. However, using the rules that Anti-weakpasswords specifies will remove far, far more than 10,000. By using the same rules as the attacker would use, you remove the need for the attacker to use those rules. For example, one common dictionary attack rule would be to append/prepend numbers to dictionary words. By disallowing this, the attacker no longer has to run this rule, reducing the password space significantly. — Chris Murray, Jan 12 '15 at 12:40
@ChrisMurray - yes, the attacker doesn't have to run whatever rules the attacker knows about; though even if the attacker knows all of them, I would have to argue that you have not reduced the keyspace **significantly**. Further, the tiny portion of the total keyspace you are eliminating is the portion that the attacker would have attacked first, and gotten through in a very short time, regardless, so it's not really material. Unless you're also arguing that we shouldn't have minimum password lengths (which reduce the keyspace), or complexity rules (which also reduce the keyspace) either? — Anti-weakpasswords, Jan 13 '15 at 04:49
"""would take a prohibitive amount of processing time""" actually ... not really I ran `date +%s ; for i in {0..10000..1}; do echo 'foo '%i' bar' | sha1sum >> /dev/null ; done ; date +%s ;` in a single thread on a 3 year old server in 35s — CaffeineAddiction, Jun 28 '16 at 14:59
@CaffeineAddiction: That's *exactly* why one should use a computationally expensive hash algorithm, not SHA1. Try the same experiment with, *e.g.* bcrypt.with a cost of 10. — Bob Brown, Jun 29 '16 at 15:32

score 2 · Answer 3 · answered Jan 16 '15 at 00:45

If you use something like a Bloom Filter you can test to see if one of your other users is already using that password but it would be hard to tell which users in particular. There's a possibility for false positives but not false negatives.

http://en.wikipedia.org/wiki/Bloom_filter

Agree with the other response that while it's possible it's a bad idea because it gives away information about what passwords have already been used.

I'm skeptical about if this is a valuable protection. It certainly keeps large numbers of people from using "abcdef" and "123456" but if 10 of your 1M users pick a strong, random, long password, what's the harm if they are using the same value?

score -2 · Answer 4 · answered Jan 16 '15 at 08:33

I have a partial solution to your original question. From one I get, your intention is to check if a given number of people are using the same passwords.

The answers here so far seem to go more in the direction of secure passwords, which is good. While you should ensure 'strong' passwords, I think you just don't want a certain number of people to use the same password, however secure it is reckoned to be.

What if you do this? Keep a list of simple non-salted hashes of user passwords on point of registration. Like a SHA-512 hash. (you could even do repeated hashes a certain number of times only you knows) The list has no usernames, just password hashes that can be anybody's. You could even go further and implement it in a table, such that this pre-acceptance hash has a counter alongside, showing the number of people using a particular password on your site. Note that there is no way to tell the original password.

Of course, when you are actually storing the username/password in the database, you will then salt and hash. The aforementioned list is just a random list of pre-salted hashes of passwords together with the number of times that hash has been generated.

Now when a new user chooses a password, you do the pre-acceptance hashing and check whether that hash is already in the list and if so, how many times has the hash been used? If it's been used up to the maximum number of times you have decided, simply reject the password. On entry of a 'successful' password, enter the new pre-salted hash into the list.

Ideally, if users are using 'strong' passwords following the criteria other people have mentioned, the counter of each hash in the list should be exactly 1.

The solution is secure, as even if an attacker steals the list he can't use the hash to login, as its not the same as the actual salted hash in the database. Also you can use different hashing algorithms to frustrate cracking. This, of course, doesn't mean the attacker cannot try the usual means others have mentioned. What I am saying is that this solution does not in any way {further} compromise your users' security.

The downside (and why I said it's a partial solution) is that if the same user changes his/her password 3 times a day, the list is amended with the pre-salted hash of each of those passwords, even though in reality, no one is using any of them. And there is no way to check who supplied what hash as the list, for obvious reasons, does not contain usernames. It is very anonymous, and should be!

There is actually a fix to the downside, but I think it's better left unmentioned. Let me know what you think! Thanks and God bless you.

If you store the passwords using a weak hash, an attacker can crack that to get a list of passwords that are in use. Once they do that, matching passwords to users is trivial. — Mark, Jan 16 '15 at 08:44
You don't get me obviously. Cracking the pre-salted hashes may not be that trivial. Still even if it is, how does an attacker match the passwords to the usernames, given that this hash is different from the actual salted hash stored in the database? — THEL, Jan 16 '15 at 09:23
If they do have the main database, by traditional brute-forcing, using their newly-discovered password list as the dictionary. If they don't have the main database, by trying to log in. — Mark, Jan 16 '15 at 09:49
You don't get me obviously. Cracking the pre-salted hashes may not be that trivial. Still even if it is, how does an attacker match the passwords to the usernames, **given that this hash is different from the actual salted hash stored in the database?** It's just like telling you that there are 20 people on gmail whose passwords, P have a SHA-512^n(P) = x. If you know x, it does not mean you can get P. And if you get P, how will you get the matching username? Especially if secure password are used. — THEL, Jan 16 '15 at 10:28
Step 1: Take the "simple non-salted hashes" and apply traditional brute-force techniques to turn it into a list of passwords -- historical evidence shows that you'll get about two-thirds of the passwords this way. You now have a list of passwords that you know people on the service are using. Step 2: Use this list of passwords to try to log in as each user in turn. There are various ways to improve the efficiency of this technique or make it harder to spot, but it should be simple enough to understand why keeping a list of poorly-hashed passwords, even in isolation, is a bad idea. — Mark, Jan 16 '15 at 11:04
there was a chance some variation of this would work if the project wasn't open source. And personally I hate security through obscurity. — Behrooz, Jan 16 '15 at 13:12

How to check for duplicate passwords?

4 Answers4