31

The site which I maintain has been in production for 3 years. When you register, the site generates a large (20 digit) random hex password for you. It's stored as an unsalted MD5 hash.

When I told the lead dev that MD5 is bad for passwords he said if bad guys get it there's no way they can crack it because the password is random. And even if the bad guy cracks it we generate it so users can't reuse it on other sites.

How can I convince him that we need to use best practices? He is very stubborn...

Mike Ounsworth
  • If someone cracks one of the passwords, what can they do with it? – Royce Williams Sep 13 '17 at 04:23
  • I know that you say that you generate the password, but is a user able to change the password to something that they want after registration? If so, how do you store that password? If it's still unsalted MD5 you have a problem. – zero298 Sep 13 '17 at 06:38
  • I don't understand what stops the users from memorizing (or writing down) the password from your site, then reusing the nice and random-looking string on other sites. – DreamyDays Sep 13 '17 at 07:10
  • @ThomasMoors That doesn't stop them; it even encourages them. – glglgl Sep 13 '17 at 07:34
  • So your users are given a random password once and can never change it? Or do you handle user and random passwords separately, user passwords being properly hashed? How do we need to understand this? – Damon Sep 13 '17 at 11:34
  • As @Damon says, the key question here is whether users can change their passwords after the initial ones are created. If they can't, then the risk of breaking the hashes is low (but the usability of the solution is questionable, as most users will not be able to remember a 20-character random string). If the users can change their passwords, what's to stop them choosing something easily crackable? – Rory McCune Sep 13 '17 at 14:20
  • If the users can't change their password or at least reset it (e.g. mail with reset link), I'd see that as a vulnerability as well. If the user determines that their password has been compromised (e.g. lost the piece of paper, they wrote the random string on), that poses a potential threat to their account security and there's nothing they can do about it. – SaAtomic Sep 14 '17 at 07:11
  • How are the random passwords _delivered_ to your users? That's your most likely point of failure. Also, drop the "slow hashing is slow" assumption. Or challenge it, if it's an argument your developer is running. Using a proper salted hash absolutely will not make your webapp any slower, because authentication occurs infrequently and represents only a tiny, tiny fraction of what a user actually does in a typical session. Even within an authentication request, the database access to _look up_ the stored hash probably takes longer than running even the _slowest_ of hashing algorithms. – aroth Sep 14 '17 at 12:45
  • The real question here is why you are generating passwords for your users at all? This is very low on user-friendliness. You should allow them to choose their own passwords right from the start. – user207421 Sep 15 '17 at 01:08
  • Off topic here, but an argument for your dev: if a website/app/program in 2017 forces an insensible _password requirement_ (such as 'at least one special character'), I'll never use the site unless it's absolutely vital. If a website forces _a password itself_ on me, I'll leave at that very instant and never return. – Pavel Sep 18 '17 at 10:12

4 Answers

47

Ok, so the site generates a random password for each user at registration time. An important question is whether a user can manually set their password later, or if they are forced to use a random site-generated password. Let's look at the two cases separately.


Random passwords

As far as I can tell, this is the scenario you are describing in the question. Unfortunately, your dev is (mostly) right, at least about a single iteration of hashing versus a big slow hash. Your question kinda has the flavour of blindly applying "best practices" without considering what those practices were intended for. For a brilliant example of this, here's a good read:

The Guy Who Invented Those Annoying Password Rules Now Regrets Wasting Your Time

Suggestion

Do switch from MD5 to SHA256, probably add a per-user salt, and maybe consider going to 32 char passwords. But adding a big slow hashing function will increase your server load for little to no added security (at least barring any other goofs in your implementation).

Understanding hashing as a brute-force mitigation

The amount of work a brute-force attacker who has stolen your database needs to do to crack password hashes is roughly:

entropy_of_password * number_of_hash_iterations * slowness_of_hash_function

where entropy_of_password is the number of possibilities, or "guessability" of the password. So long as this "formula" is higher than 128 bits of entropy (or equivalent work factor / number of hash instructions to execute), then you're good. For user-chosen passwords, the entropy_of_password is abysmally low, so you need lots of iterations (like 100,000) of a very slow hash function (like PBKDF2 or scrypt) to get the work factor up.
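For the low-entropy, user-chosen case, the "lots of iterations of a slow hash" side of that formula can be sketched with the standard library's PBKDF2. This is an illustrative sketch, not a vetted parameter choice:

```python
import hashlib
import os

def slow_hash(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """PBKDF2-HMAC-SHA256: each verification costs `iterations` hash
    computations, multiplying an attacker's per-guess work by the same factor."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

salt = os.urandom(16)                          # unique per user
digest = slow_hash("hunter2", salt)
assert digest == slow_hash("hunter2", salt)    # deterministic for the same salt
```

The point is that the iteration count is a tunable work factor: it makes each of the attacker's guesses 100,000 times more expensive, which is exactly what low-entropy passwords need and high-entropy random ones don't.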

By "20 digit hex password" I assume you mean that there are 16^20 = 2^80 possible passwords, which is lower than the "best-practice" 2^128, but unless you're a government or a bank, you probably have enough brute-force security from the entropy of the password alone.
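The 80-bit figure is just the base-2 log of the password space (4 bits per hex digit); a quick sanity check:

```python
import math

# each hex digit carries log2(16) = 4 bits of entropy
print(math.log2(16 ** 20))   # 80.0  -- the OP's 20-digit password
print(math.log2(16 ** 32))   # 128.0 -- a 32-digit password hits the usual target
```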

Salts also serve no purpose here because pre-computing all the hashes is like 2^80 * 32 bits/hash, which is roughly 1 ZB (or 5000x the capacity of all hard drives on the planet combined). Rainbow tables help this a bit, but quite frankly, any attacker capable of doing that deserves to pwn all of us.

You still want to hash the password to prevent the attacker from walking away with the plaintext for free, but one hash iteration is sufficient. Do switch from MD5 to SHA256 though, and maybe consider going to 32 char passwords.
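That single-iteration storage is a one-liner; a minimal sketch (the example password is made up):

```python
import hashlib

def store_hash(random_hex_password: str) -> str:
    # One iteration of SHA-256 is enough here: brute force is already
    # bounded by the password's 80 bits of entropy, so no stretching needed.
    return hashlib.sha256(random_hex_password.encode()).hexdigest()

stored = store_hash("3f9a0c2d4e5b6a7f8091")   # hypothetical generated password
print(len(stored))                            # 64 hex characters (256 bits)
```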


Human brain passwords

Commenters on this thread seem obsessed with the idea that, despite your statement that the site generates passwords, users can in fact choose their own passwords.

As soon as the user has the ability to change the password, a single hash iteration is no longer an option for storing the now low-entropy password. In this case you are correct that you need to do all the best-practice things for password storage.


Salting

Either way (user-chosen or random passwords) you probably want a per-user salt.

If user-chosen, then salts are part of the best practices. 'nuff said.

If random, @GordonDavisson points out a really nice attack in comments [1], [2] based on the observation that a db lookup is essentially free compared to a hash computation. Computing a hash and comparing it against all users' hashes is essentially the same cost as comparing it against a specific user's hash. So long as you're happy getting into any account (rather than trying to crack a specific account), then the more users in the system, the more efficient the attack.

For instance, say you steal the unsalted hashed password db of a system with a million accounts (about 2^20). With 2^20 accounts, you statistically expect to get a hit within the first 2^60 guesses. You're still doing O(2^80) guesses' worth of comparisons, but O(2^60) hashes * O(2^20) db lookups ~= O(2^60) hashes of work.
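The trick is that a set lookup is near-free compared to a hash computation, so one hash checks against every account at once. A toy illustration with tiny 2-byte "passwords" so it runs instantly (purely illustrative, not a real attack tool):

```python
import hashlib

# Unsalted hash db for several accounts (made-up 2-byte "passwords").
passwords = [b"\x00\x2a", b"\x13\x37", b"\xbe\xef"]
stolen_db = {hashlib.md5(p).hexdigest() for p in passwords}

# One MD5 per guess, then an O(1) set lookup covers *all* accounts at once --
# no per-account hashing needed because there's no per-account salt.
cracked = 0
for guess in range(2 ** 16):
    if hashlib.md5(guess.to_bytes(2, "big")).hexdigest() in stolen_db:
        cracked += 1
print(cracked)  # 3 -- every account falls for the cost of one exhaustive sweep
```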

Per-user salts are the only way to prevent attacking all users for the cost of attacking one user.
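A per-user salt is cheap to add. A minimal sketch of salted storage and verification (a single SHA-256 pass, which per the reasoning above is fine for random passwords but not a substitute for a real KDF when passwords are user-chosen):

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)     # unique per user, stored alongside the hash
    digest = hashlib.sha256(salt + password.encode()).digest()
    return salt, digest

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.sha256(salt + password.encode()).digest()
    return hmac.compare_digest(candidate, digest)   # constant-time compare

salt, digest = hash_password("a1b2c3d4e5f60718293a")  # hypothetical password
print(verify("a1b2c3d4e5f60718293a", salt, digest))   # True
print(verify("wrong-password", salt, digest))         # False
```

Because each account's salt differs, a guess hashed once can only be compared against one account, which is exactly the property the attack above exploits the absence of.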

Mike Ounsworth
  • The lack of salt *might* still matter if there are a lot of accounts, since without salt a password-guessing attack can be run against all accounts at once. For instance, if you have a million accounts (about 2^20), an attacker will only have to try (on average) 2^60 passwords before finding one that matches *some* account. If you have a billion accounts, they're down to only 2^50 guesses (average) before first match. – Gordon Davisson Sep 13 '17 at 03:29
  • As Mike said, if the password is randomly generated such that it always has sufficient entropy, then using a slow hash is not strictly speaking needed. However, one thing I'd definitely check is how the password is generated. Check what PRNG algorithm is used and how that PRNG is seeded. The PRNG needs to be seeded with at least 80 bits of entropy per generated password for the generated password to contain the full strength of 20 hex characters. If the PRNG is seeded with just the current time, as is common in many weaker applications, you might actually have much less strength than expected. – Lie Ryan Sep 13 '17 at 03:36
  • @GordonDavisson How are you computing those numbers? Is that some sort of pigeon-hole or birthday analysis? Remember that the limiting resource for brute-force attacks is electricity costs, not time, so I don't see how parallelizing that makes a difference. Whether you're trying 2^80 passwords against one account, or 2^60 passwords against 2^20 accounts, you still need to pay the electricity costs of 2^80 guesses. Cryptographers like 2^128 as a security level because that's where we start getting into "multiples of the sun" in power consumption for the attack. – Mike Ounsworth Sep 13 '17 at 03:50
  • @MikeOunsworth Checking many candidate hashes against many accounts can be sped up considerably (and the power requirements decreased) by first putting the accounts' hashes in a hash table (or bloom filter or...), so you can then do a near-constant-time lookup for each candidate hash. The table (/filter) lookup won't be as fast as a single comparison, but it'll be much much faster than comparing against each account's stored hash separately. Actually, even without that, a single MD5 computation followed by 2^20 compares will be faster than 2^20 MD5s, each followed by a single compare. – Gordon Davisson Sep 13 '17 at 06:01
  • @GordonDavisson +1, I see how that decreases overall energy costs when you're looking at a small password list (ie derived from dictionary words) but if you're dealing with uniformly random passwords from a large enough set, say 2^128, then I still don't understand how you can possibly have an attack in less than O(2^128)? – Mike Ounsworth Sep 13 '17 at 13:47
  • The first sentence seems misleading since the OP should change from MD5 to SHA256 regardless of the password generation method. The answer seems sound, but someone skimming might mistake that you are arguing MD5 is OK so long as the input has high entropy. – IllusiveBrian Sep 13 '17 at 13:54
  • @MikeOunsworth I think he's getting that from the 16^20=2^80 so part of the 2^128 space is unused. A salt could be added to increase the used space without having to give the user a longer password (although with 20 hex digits already I'd think it'd be easier to just increase the password length to 32 characters, it's not like people are memorizing them (I hope)). – AndrolGenhald Sep 13 '17 at 14:09
  • Please add a note about changing passwords, if a user can do that, then the procedure isn't safe anymore. – martinstoeckli Sep 13 '17 at 15:01
  • @MikeOunsworth I was assuming 20-digit hex passwords (stated in OP), which (as AndrolGenhald said) gives 16^20 = 2^80 possible passwords. In theory, testing all 2^80 possible passwords against all accounts should recover all accounts' passwords. If that's 2^20 accounts, then the average recovery rate is (2^20 accounts) / (2^80 password guesses) = (1 account) / (2^60 guesses). – Gordon Davisson Sep 13 '17 at 18:26
  • @MikeOunsworth if you stop after your first hit, it will take about 2^60 hashes (and table lookups). – Paŭlo Ebermann Sep 13 '17 at 18:34
  • @GordonDavisson Oh I see. You're saying that if you try 2^60 passwords against all 2^20 accounts, you *expect* to get a hit. While this is still O(2^80) *guesses*, it's O(2^60) hashes and O(2^20) lookups which is less work / energy than 2^80 hashes. Am I understanding correctly? – Mike Ounsworth Sep 13 '17 at 18:37
  • @MikeOunsworth Essentially, yes. With one minor correction: I'm storing the 2^20 actual account hashes in the hash table, not the 2^60 guessed hashes. So it takes 2^20 hash table stores (loading the account DB into it), then 2^60 MD5 calculations (of password guesses) and 2^60 hash table lookups (one of each of those hashed guesses). – Gordon Davisson Sep 13 '17 at 19:07
  • Also, I should clarify that my attack is only advantageous because an unsalted hash was used. A slow but unsalted hash would be vulnerable to the same speedup (although calculating 2^60 slow hashes is going to be a big job -- just not as big a one as calculating 2^80 slow hashes). A fast but salted hash would not be vulnerable to this speedup. When you have a large number of accounts, *salting matters*. – Gordon Davisson Sep 13 '17 at 19:11
  • @GordonDavisson Nice point. Thanks for taking the time to explain. Mind if I edit that in? – Mike Ounsworth Sep 13 '17 at 19:16
  • @MikeOunsworth You're welcome; feel free to edit it in. – Gordon Davisson Sep 13 '17 at 19:22
  • @GordonDavisson Am I correct in thinking that increasing the random password from 20 hex digits to 32 hex digits would make a salt redundant, or is there something I'm missing? I'm thinking if the password space is the same size as (or larger than) the hash space it'd be more advantageous for the attacker to enumerate hashes directly rather than enumerating passwords. – AndrolGenhald Sep 13 '17 at 19:28
  • @AndrolGenhald Gordon's point is that unsalted, you can do the expensive hashing once as a pre-computation, and then check it against each user essentially for free. Salted, you need to completely redo the hashing for each user. No matter how big your password space is, you still get this speedup that checking a hash against one user is essentially the same cost as checking against all users. – Mike Ounsworth Sep 13 '17 at 19:36
  • @GordonDavisson Does that edit properly capture the thought? Feel free to suggest an edit. – Mike Ounsworth Sep 13 '17 at 20:03
  • @MikeOunsworth Looks good. The only thing I might expand on is that the importance of salting is proportional to the number of accounts you have -- if there are just a hundred accounts, salting isn't a big deal; if there are a billion (or you want to scale to a billion), it's a big deal. Also, I agree with your reply to AndrolGenhald. – Gordon Davisson Sep 13 '17 at 21:40
  • @MikeOunsworth I didn't explain myself well in the last comment, mostly because I hadn't thought it through completely. With the password being generated like this, what if you think of it as every user has the same password, but you're using a random salt. In a way, you don't need a salt because the password already is one. What I'm trying to get at is if you already know your password has 128 bits of entropy and you're using a 128 bit hash, adding a salt doesn't do any good because in either case the attacker's best option is to brute force the 128 bit hash (assuming a good hash of course). – AndrolGenhald Sep 13 '17 at 23:13
  • The check against each user is essentially free in either case, it's just that when salted you're not checking the password against each user, you're checking the salt+password against each user, which all but guarantees you at most one match. When the password is random anyway I don't see the difference. – AndrolGenhald Sep 13 '17 at 23:15
  • @AndrolGenhald uhm, I feel like you have a misconception buried in there about the problem salts are meant to solve, but I can't put my finger on what. Unsalted, if two users have the same password hash in the db, then you know they have the same password; if they have different hashes, then they have different passwords. That means you can build lookup tables mapping `hash value --> password` and use it on all users. Salted, matching / not matching hashes tells you nothing about matching / not matching passwords. You can't build lookup tables. Longer passwords does not give you this property. – Mike Ounsworth Sep 14 '17 at 01:50
  • @AndrolGenhald I think I see your point though, that there comes a point where your random passwords are long enough that even just iterating through all possible passwords is intractable, and holding the lookup table in memory would require more RAM than there is on the planet. – Mike Ounsworth Sep 14 '17 at 02:00
  • This answer (and also the OP) appears to significantly overstate the impact of using a proper salted hash. While it's technically more costly than the current approach, the real-world impact when viewed in the context of a modern webapp where user authentication represents a small and infrequent occurrence relative to all the things a user actually does after authenticating is negligible. A valid response is to challenge the assumption that a "big slow hash" is actually slow in _real world_ usage where people aren't just authenticating continuously as fast as possible. – aroth Sep 14 '17 at 12:41
  • @MikeOunsworth I think I understand salts correctly, my point is that the random password makes duplicate passwords for different users extremely unlikely. With 64 bits [Wikipedia](https://en.wikipedia.org/wiki/Birthday_attack) says for a 50% chance of a single collision you'd need over 5 billion users. – AndrolGenhald Sep 14 '17 at 14:06
  • When the random value is 80 bits you have 2^80 passwords to try. With 2^20 users adding a salt is essentially adding 20 unique bits to the password, increasing the number of passwords to 2^100, and @GordonDavisson's attack reduces this back to 2^80 for finding a password for a single user. If your random value is 128 bits, adding 20 bits to the password doesn't do anything because the hash length already caps the number of passwords to test at 2^128. Whether or not you have a salt it' s still 2^128/2^20 = 2^108 attempts to get a password for a single user. – AndrolGenhald Sep 14 '17 at 14:07
  • "Salted, matching / not matching hashes tells you nothing about matching / not matching passwords. You can't build lookup tables. Longer passwords does not give you this property." - You're talking about a rainbow table right? Don't longer _random_ passwords make rainbow tables infeasible the same way salts do? – AndrolGenhald Sep 14 '17 at 14:08
  • @AndrolGenhald I think we're getting into hair-splitting here, but this is a fundamental difference, not an issue of scale: longer passwords make lookup tables (and also rainbow tables) **infeasible** as you suggest because the size of the table grows. Salts make them **impossible** - regardless of the length of the password - because any computation done for salt1 does not apply to salt2. The problem with thinking about a salt as extending the password length is that a salt is _public_: attackers stealing a password db see the salt in plaintext. – Mike Ounsworth Sep 14 '17 at 14:18
  • @MikeOunsworth Let's pretend the first 64 bits of a 128 bit random password is a random salt that's magically not public, if the attacker doesn't know that salt an attack would be strictly harder than if it were a 64 bit public salt and a 64 bit random password. ie a 128 bit random password cannot be worse than a 64 bit random salt with a 64 bit random password. – AndrolGenhald Sep 14 '17 at 14:33
  • @AndrolGenhald I think you're still misunderstanding what salts do; they don't increase security against password guessing, they prevent a loss of security from parallel password guessing. Suppose we have random 128-bit passwords, 2^20 accounts, and no salt. Then my parallel attack will average 1 account password per 2^108 guesses (impractical, but not as hard as it could be). If you added salting my attack would still find a hash match every 2^108 guesses, but 99.9999% of the matches will be useless for login because they match the hash of an account with a different salt. – Gordon Davisson Sep 15 '17 at 01:40
  • [cont'd] This means that with salting the parallelization trick does nothing. It takes an average of 2^131 guesses per account to recover usable passwords. So in a limited sense the salt has increased the security to 2^148 bits (but only sort of) because both the hash *and salt* must match for a login to succeed. – Gordon Davisson Sep 15 '17 at 01:43
  • @GordonDavisson I had a light-bulb moment this morning and I think I located the point we disagreed on. Say we have salt `s` and password `p`, and hash `h`. I was, without realizing it, assuming that if the attacker found `h'` where `hash(h') = h` that he would be able to extract `s` and `p` from `h'`. I'd seen recommendations for things like `h = hash(s | hash(s|p))` but I'd never understood the importance of it, and I'd still thought of it mentally as `h = hash(s|p)`, so this attack is why a simple concatenation of `s` and `p` isn't good enough. – AndrolGenhald Sep 15 '17 at 13:04
  • With that assumption removed I believe all of my previous reasoning is invalidated, so I think I understand you completely now. Thanks MikeOunsworth and GordonDavisson for taking the time to explain it to me, I'm glad I was able to learn something! – AndrolGenhald Sep 15 '17 at 13:06
11

Supplementing Mike Ounsworth's answer, your dev is probably correct, provided they're generating their random numbers properly.

If you seed your PRNG that generates these passwords badly, then an attacker can infer the state of your PRNG to predict future passwords. For instance, in a hugely pathological case in which you use a Mersenne Twister with an internal state that isn't refreshed between sessions, the following attack is viable:

  1. I request some large number of accounts in sequence
  2. You generate a correspondingly large number of bytes from your PRNG and send them all to me
  3. I use those bytes to infer the internal state of your PRNG at the time when you generated my passwords
  4. From this, I infer the internal state of your PRNG at the time when you generated every subsequent user's passwords. Every future password your PRNG generates can be predicted by me. Further, the MT can be run backwards to generate all its previous outputs from a known point in time
  5. I have now computed every password in-use by your system without having access to your database
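The seeding pitfall in step 3 is easy to demonstrate in miniature: Python's `random` module is a Mersenne Twister, and a PRNG seeded from something guessable (like a timestamp) lets an attacker regenerate the exact passwords. A deliberately bad sketch, with a made-up timestamp:

```python
import random

def bad_token(seed: int) -> str:
    """20 hex digits drawn from a Mersenne Twister -- NOT crypto-safe."""
    rng = random.Random(seed)
    return "%020x" % rng.getrandbits(80)

server_time = 1505275200                 # hypothetical seed: registration time
password = bad_token(server_time)

# The attacker sweeps nearby timestamps and recovers the exact password.
recovered = any(bad_token(t) == password
                for t in range(server_time - 5, server_time + 5))
print(recovered)  # True
```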

Make sure you use a cryptographically secure source of randomness. Your language's built-in PRNG may well not be.
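In Python, for instance, the `secrets` module draws from the OS CSPRNG, which is what you'd want for generating these passwords:

```python
import secrets
import string

# 10 random bytes -> 20 hex digits -> 80 bits of entropy, from the OS CSPRNG
password = secrets.token_hex(10)
print(len(password))                                    # 20
assert all(c in string.hexdigits for c in password)
```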

Also, how are your users actually going to remember these passwords? Generating something long and unpredictable is just going to cause your users to save password.txt on their desktops. If the password is meant to be stashed in a config file somewhere then you probably don't have any real issue, but if it's something that's supposed to live in a user's head then you're massively overestimating the capabilities of your users, and likely causing them to invent their own security flaws.

ymbirtt
0

As others have said, reversing the MD5 of an 80 bit random number is a hard problem, so if anyone obtained your table of hashes, they probably wouldn't be able to access user accounts.

However, you might want to consider where the users are storing those 80 bit random numbers. It probably isn't going to be in their heads. Best case, it's in a reasonably secure keychain or password repository app. Worst case, they'll have a file with PASSWORDS_FOR_YER_APP.TXT in their home directory.

Rich
-3

MD5 as you say is insecure, mainly because of discoveries in 2013 that allow it to be collision-attacked in 2^18 time (Apparently less than a second on modern machines).

Regardless of whether or not the password is going to be used on other sites, your site is still insecure. Just because it's random, and thus won't appear on lookup tables of any kind, doesn't mean it can't be broken via collision. That means that if somebody gets the hash, they can pretty easily determine something that will function as a password, assuming you check the hashes.

As others have said, use a better method: the SHA family is good, but many prefer scrypt and bcrypt for passwords, and if it's randomly generated passwords you're dealing with, you probably won't need a salt.

  • While MD5's collision resistance is every bit as broken as you say, the best known preimage attack still has a complexity of 2¹²³. – Dennis Sep 13 '17 at 14:25
  • Collision attacks involve the attacker generating a pair of inputs that have the same hash value. If one of the inputs isn't under the attacker's control (as is the case for a password database), weakness to collision attacks is irrelevant. – Mark Sep 13 '17 at 20:51
  • Note that SHA-1's collision resistance is also demonstrably broken. Nevertheless, collision resistance is irrelevant for password hashing. – Lie Ryan Sep 14 '17 at 12:26