1

NIST and OWASP ASVS recommend checking passwords against those exposed in previous data breaches. The list of such passwords can be downloaded from "Have I Been Pwned" (https://haveibeenpwned.com/Passwords), but it contains more than 300 million pwned passwords. Do you recommend comparing users' passwords (during registration, password change, or even every successful login) against such a large list? Will it affect performance too much?

Is it a better idea to compare passwords against smaller lists, such as the top 1,000 or 10,000 most common passwords (for example, https://github.com/danielmiessler/SecLists/tree/master/Passwords)? That is the exact recommendation from OWASP ASVS.

user187205
  • *"Will it affect the performance too much?"* - not if properly done. Don't sequentially scan the list but use a hash table, binary tree or similar. Lookups are fast with these. – Steffen Ullrich Apr 22 '21 at 08:19
  • This is looking like a performance question, not a security question. Of course the *security* recommendation is to scan for all the breached passwords. Your question is how to optimise performance. That makes this a programming question. What will be an acceptable performance hit for *you* is something only you can determine. – schroeder Apr 22 '21 at 08:40
  • `and, finally, version 7 arrived November 2020 bringing the total passwords to over 613M.` Test for all. The performance issue is off-topic, while @SteffenUllrich gave the approaches. – kelalaka Apr 22 '21 at 09:52
  • There is a very simple answer here. HIBP publishes an API that securely and efficiently does the lookup for you: https://haveibeenpwned.com/API/v2#SearchingPwnedPasswordsByRange Why download a list when you can just go straight to the source? – Conor Mancone Apr 22 '21 at 10:03
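
The range lookup mentioned in the last comment can be sketched roughly as follows. This is a minimal illustration, not an official client: it assumes the publicly documented `https://api.pwnedpasswords.com/range/{prefix}` endpoint, which returns `SUFFIX:COUNT` lines for a 5-character SHA-1 prefix. The function names `fetch_range` and `pwned_count` and the `User-Agent` value are made up for this example.

```python
import hashlib
import urllib.request

# Assumed endpoint, taken from the public HIBP Pwned Passwords documentation.
RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def fetch_range(prefix: str) -> str:
    """Download every hash suffix that shares the given 5-character prefix."""
    request = urllib.request.Request(
        RANGE_URL.format(prefix=prefix),
        headers={"User-Agent": "pwned-check-example"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def pwned_count(password: str, fetch=fetch_range) -> int:
    """Return how often the password appears in the data set (0 if absent).

    Only the first 5 hex characters of the SHA-1 hash are sent to the
    service; the comparison against the remaining 35 happens locally.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    for line in fetch(prefix).splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0
```

Calling `pwned_count(candidate_password) > 0` at registration or password change would then flag a breached password. The `fetch` parameter exists only so the network call can be replaced in tests.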

2 Answers

2

"Every successful login" is pointless waste unless you just implemented the password-checking feature and didn't have it before, in which case it might make sense to check every user's login once. The only other time it would make sense is right after you add a bunch of new passwords. Add a field to your authentication DB that indicates whether each password has been checked against the list (defaulting to false), and check each one once, toggling the field to true after it passes (no match). Those that do match the list get forced to change it (if they decline, leave their field false, or even just flag the account for "must change password"). New passwords (either of new users, or if the user changes or resets them) are checked at creation, and once they pass, the field is set true.

The pwned passwords list can be sorted in a few ways, but "alphabetic" (by string comparison logic) is an obvious one, and it allows extremely fast lookups (binary search). Stuff it all into a database table with appropriate indices and you won't even have to write the lookup logic yourself. There's no performance reason not to check all 300 million: with binary search, a lookup takes only a bit more than twice as many comparisons as checking 10,000 (about 28 versus about 13), and 10,000 isn't enough, really. That's a pretty easy task for any decent DB engine with a few gigs of RAM to use (give it enough RAM and the whole column can be kept in memory, though even without that it'll be decently fast). However, you might decide there's some threshold where a password that has been compromised is still tolerable (maybe passwords that have been seen in dumps only once, or anything outside the million most common ones, or some such thing). The most secure option is definitely to check them all, though.
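
To make the binary-search argument concrete, here is a small sketch. It assumes the list is held as sorted uppercase SHA-1 hex strings (the form in which the downloadable list is distributed) and keeps it in memory purely for illustration; the helper names are hypothetical, and a real deployment would more likely use an indexed database column or a sorted file.

```python
import bisect
import hashlib
import math


def sha1_hex(password: str) -> str:
    """Uppercase SHA-1 hex digest of the password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def build_index(passwords):
    """Sort the hashes once so every later lookup can use binary search."""
    return sorted(sha1_hex(p) for p in passwords)


def is_listed(index, password: str) -> bool:
    """Binary search for the password's hash in the sorted index."""
    digest = sha1_hex(password)
    position = bisect.bisect_left(index, digest)
    return position < len(index) and index[position] == digest


def approx_comparisons(entries: int) -> int:
    """Rough number of comparisons a binary search needs for a list this size."""
    return math.ceil(math.log2(entries))


index = build_index(["123456", "password", "qwerty"])
print(is_listed(index, "password"))     # True
print(approx_comparisons(300_000_000))  # 29
print(approx_comparisons(10_000))       # 14
```

The last two lines show why list size matters so little: growing the list by a factor of 30,000 roughly doubles the comparisons per lookup.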

CBHacking
  • Hash function's outputs are almost random so multi-level directory structure can be efficiently used, as in browser caches. Might be faster than a database – kelalaka Apr 23 '21 at 00:47
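
One possible reading of the directory idea in the comment above, as a rough sketch: the leading hex characters of each hash select a directory and a bucket file, and only that one small file is scanned per lookup. The layout (two characters for the directory, two for the file name) and all function names are assumptions chosen for illustration, not something the comment specifies.

```python
import hashlib
from pathlib import Path


def sha1_hex(password: str) -> str:
    """Uppercase SHA-1 hex digest of the password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def bucket_file(root: Path, digest: str) -> Path:
    """Map a hash to its bucket, e.g. root/5B/AA.txt (assumed layout)."""
    return root / digest[:2] / f"{digest[2:4]}.txt"


def add_hash(root: Path, digest: str) -> None:
    """Append the remaining 36 hex characters to the matching bucket file."""
    path = bucket_file(root, digest)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="ascii") as handle:
        handle.write(digest[4:] + "\n")


def contains_hash(root: Path, digest: str) -> bool:
    """Scan only the one bucket that could contain this hash."""
    path = bucket_file(root, digest)
    if not path.exists():
        return False
    with path.open(encoding="ascii") as handle:
        return any(line.strip() == digest[4:] for line in handle)
```

Because hash output is close to uniform, the 65,536 buckets in this layout fill evenly, so 300 million entries would put only a few thousand lines in each file.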
2

> Do you recommend comparing users' passwords (during registration, password change, or even every successful login) with such a large list?

For reasonable security, I recommend checking passwords against a list of 10,000 common passwords during registration and password change. Also block passwords that contain the company name, application name, or the user's name.
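
A minimal sketch of such a check, with some assumptions spelled out: the blocklist is loaded from a plain text file with one password per line, matching is case-insensitive, and context words shorter than three characters are ignored to avoid rejecting almost everything. The function names and the example words are hypothetical.

```python
def load_blocklist(lines):
    """Build a lowercase set from an iterable of lines (e.g. an open file)."""
    return {line.strip().lower() for line in lines if line.strip()}


def rejection_reason(password, blocklist, context_words):
    """Return a short reason if the password should be rejected, else None."""
    lowered = password.lower()
    if lowered in blocklist:
        return "common password"
    for word in context_words:
        needle = word.strip().lower()
        # Assumption: very short words would cause too many false positives.
        if len(needle) >= 3 and needle in lowered:
            return f"contains '{word}'"
    return None


blocklist = load_blocklist(["123456\n", "Password\n", "qwerty\n"])
print(rejection_reason("PASSWORD", blocklist, ["Acme"]))                  # common password
print(rejection_reason("acme-Rocks-2021", blocklist, ["Acme", "alice"]))  # contains 'Acme'
print(rejection_reason("tr0ub4dor&3x", blocklist, ["Acme", "alice"]))     # None
```

In practice `context_words` would hold the company name, the application name, and the registering user's own name or username.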

To improve security beyond that, you can increase the size of the blocklist. Larger blocklists are more secure, but this has diminishing returns.

Checking on login is useful when a user's existing password is later exposed in an unrelated data breach. This only works if you regularly update the breached-password database or use an API.

See also this ASVS issue for a discussion.

> Will it affect the performance too much?

If properly implemented, lookups in a database of 300M passwords can be very fast. However, that is somewhat harder to implement than a lookup in a text file with 10,000 lines, so the cost is more in developer time than in server execution time.

Sjoerd
  • Hash function's outputs are almost random so multi-level directory structure can be efficiently used, as in browser caches. Might be faster than a database. – kelalaka Apr 23 '21 at 00:46