
Years back I voiced the opinion that making password handling forgiving, by accepting perhaps a single wrong character, would cost entropy but would not leave the barn door open; an additional entropic character or two should (I thought) be enough to offset the entropy lost by accepting near-misses.

I also said that I didn't see a way to do it that wouldn't carry killer implementation issues; a fuzzy match against a stored plaintext password would scream "STORED PLAINTEXT PASSWORD" to security professionals and sink like a rock.

However, what if multiple trapdoor-encrypted (i.e. hashed) passwords were stored: the "real" password, plus all the variants that meet an error-tolerance standard designed to cover the most common "near-miss" typos? Such a criterion might hold tightly to near-miss requirements and tolerate only one near-miss rather than two, and while it would entail a different schema, it would allow multiple accepted passwords (the original as typed, plus slightly forgiving variants) without the unnerving insecurity of storing plaintext passwords.
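
Concretely, and purely as a sketch of the idea rather than a vetted design, I imagine enrollment looking something like this (the `generate_variants` helper is hypothetical, and the Python `bcrypt` package stands in for whatever trapdoor function is used):

```python
import bcrypt

def enroll(password: str, generate_variants) -> list[bytes]:
    """Hash the real password plus every tolerated near-miss variant.

    generate_variants is a hypothetical helper that yields the
    typo-tolerated strings; only hashes are ever written to storage.
    """
    candidates = {password, *generate_variants(password)}
    return [bcrypt.hashpw(c.encode(), bcrypt.gensalt()) for c in candidates]
```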

What are the security liabilities of this approach, beyond the entropy lost by having perhaps a couple of orders of magnitude more strings that would be accepted? If an attacker has a hundred hashed passwords that are all typo variations on one single password, is cracking one of them potentially easier than cracking a hundred completely unrelated hashed passwords?

From a user-experience perspective, I imagine there are UX types who would drool over tolerance for the most common password-entry typos, at least in cases where you have to type a password rather than use e.g. Facebook OAuth.

Christos Hayward
  • I think you're looking for hashes that cluster when the input is similar: https://en.m.wikipedia.org/wiki/Locality-sensitive_hashing – J.A.K. Apr 01 '17 at 19:52
  • Another option that is used by [Facebook](https://security.stackexchange.com/questions/68013/facebook-password-lowercase-and-uppercase) is making the 'near miss' conversions to the plaintext password supplied during login by the user before hashing, and then comparing those values to the single stored hash of the correct value. – PwdRsch Apr 03 '17 at 00:33
  • Is your suggestion that one should hash multiple versions of the password? – Anders Apr 04 '17 at 07:46
  • @Anders My suggestion is more a "Can this be done in UX without unacceptable infosec ramifications [i.e. stored plaintext passwords]?" initial exploration. I'm not conclusive enough to judge what is the right implementation. Though I did envisage storing multiple versions of the password, as you ask, as an initial approach to be typo-tolerant without compromising security. – Christos Hayward Apr 05 '17 at 12:28
  • @Anders As intended in my question, I regard it as an acceptable price to lose a *small* amount of entropy as the price of being more forgiving; users will do better with a slightly longer password that tolerates most typos than a shorter password with zero forgiveness; also people do better typing 29 characters of "correct horse battery stapler" than 19 characters of linenoise. – Christos Hayward Apr 05 '17 at 12:32
  • The concept of a typo-tolerant password violates the reason a password exists in the first place. In the same way that you don't necessarily need a large alphabet to generate a high-entropy password, just high-entropy randomisation and a decent password length, I can't see how a password should ever allow a typo. Password strength *relies* on the entropy that went into creating it. – Nick Bedford Aug 17 '17 at 05:15
  • @J.A.K. Please restate your comment as an answer so I can accept it. – Christos Hayward Sep 16 '17 at 16:04
  • Of course :). I will try to expand on it when I'm not on my phone – J.A.K. Sep 17 '17 at 20:02

3 Answers


One method would be to generate the "acceptable typos" upon initial password entry, then store the hashes of all of them. There are some problems with this: you increase the time required for a password change and for password verification, especially if you're using a deliberately slow password hashing method, as is usually recommended, and you increase the storage requirements by a potentially variable amount.

Assume that the password used is "Password1!", and that the "acceptable typos" are: wrong case on a single letter (or the shifted/unshifted form of a number-row key, such as "1" versus "!"), a swap of two adjacent characters, and a missed single instance of a doubled character (these are not the only possible typos, but judging from my own, they're easy ones to make!). A sketch of a generator for these rules follows the list below.

That means that all of the following would be acceptable:

Password1!
aPssword1!
Psasword1!
Password1! [transposing "s" and "s"]
Paswsord1!
Passowrd1!
Passwrod1!
Passwodr1!
Passwor1d!
Password!1
password1!
PAssword1!
PaSsword1!
PasSword1!
PassWord1!
PasswOrd1!
PasswoRd1!
PassworD1!
Password!!
Password11
Pasword1!
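
Here is one way that generator might look; this is a hypothetical sketch in Python, covering only the three rules above and assuming a US-layout number row:

```python
def acceptable_typos(password: str) -> set[str]:
    """Enumerate single-typo variants: a case/shift flip on one
    character, an adjacent transposition, or one dropped character
    from a doubled pair."""
    # US-layout Shift pairs for the number row (an assumption).
    shift_map = {"1": "!", "2": "@", "3": "#", "4": "$", "5": "%",
                 "6": "^", "7": "&", "8": "*", "9": "(", "0": ")"}
    shift_map.update({v: k for k, v in shift_map.items()})

    variants = set()
    for i, ch in enumerate(password):
        if ch.isalpha():  # wrong case on a single letter
            variants.add(password[:i] + ch.swapcase() + password[i + 1:])
        elif ch in shift_map:  # Shift held or missed on a number-row key
            variants.add(password[:i] + shift_map[ch] + password[i + 1:])
        if i + 1 < len(password):  # adjacent transposition
            variants.add(password[:i] + password[i + 1] + ch + password[i + 2:])
            if password[i + 1] == ch:  # dropped half of a doubled character
                variants.add(password[:i] + password[i + 1:])
    variants.discard(password)  # transposing "s" with "s" reproduces the original
    return variants
```

For "Password1!" this produces 19 variants, i.e. 20 distinct strings counting the original, which matches the 21-line list above once the duplicate from transposing "s" with "s" is folded out.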

Given any reasonable password hashing algorithm (or even MD5, shown here for brevity), these will all have unrelated hashes; here are the first few:

2b4ae288a819f2bcf8e290332c838148
c435f7c0de60f0f69bce9f9a671e748a
8aed93fca5d9b2416c51fee36b03bc6b
9fbad358027d70c0db81fc23ab483c3c
dbf220c136f6b1f27c30997090da4697
c36fc47e48c2a52e450cd06f60bbd3bf

To verify a password in this case, you hash the input and check it against all of the stored values. You can even salt the input at the account level, so you only have to calculate one hash per login attempt.
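
As a minimal sketch of that check, with SHA-256 standing in for a real (slow) password hash and an assumed per-account salt:

```python
import hashlib
import hmac

def verify(attempt: str, account_salt: bytes, stored_hashes: list[bytes]) -> bool:
    """One hash computation per attempt, then a comparison against
    every stored variant hash (all share the account-level salt)."""
    digest = hashlib.sha256(account_salt + attempt.encode()).digest()
    return any(hmac.compare_digest(digest, h) for h in stored_hashes)
```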

There is a flaw, though. Knowing all of these hashes doesn't directly help an attacker, but they can apply the same mutation rules as you did to any password list they have, and should any of the resulting hashes match, they get access to the account; this holds even with an account-specific salt. The trade-off that keeps your logins fast keeps their guessing fast too.

Luckily, it's quite easy to defend against that with a hash-specific salt. With most implementations of bcrypt, you get this for free:

Password1!:$2a$04$GgLXwKnxLmnLV/UUtbYahuT9L8R.8/SoKAnRKZUyCuEEgfrX5ITLm
password1!:$2a$04$5Ol8psikd7h0Gt/ZWtQAve0PdA/LO4oub/0JMGC5AA/Rsup/3SBbG

Now they have to try all of the modifications against each stored hash. Depending on how you store your passwords, they might be able to optimise which variations they try against each hash, but since they don't know whether the correct password is "Password1!" or "pAssword1!", they risk missing things if they do. However, when someone wants to log in, you have to do the same work.
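
Login then becomes a loop over the stored hashes, each carrying its own salt; a sketch, again assuming the Python `bcrypt` package:

```python
import bcrypt

def verify_per_hash_salts(attempt: str, stored_hashes: list[bytes]) -> bool:
    """Each stored hash embeds its own bcrypt salt, so the attempt
    must be rehashed and checked against every stored variant.
    Cost scales with the number of tolerated typos."""
    attempt_bytes = attempt.encode()
    return any(bcrypt.checkpw(attempt_bytes, h) for h in stored_hashes)
```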

This is the trade-off: if you use a hashing method which takes 100 ms to calculate and you only need to calculate one value, a check takes 100 ms. If you need to check 10 versions, it takes 1 s of processing time. If you do the checks one by one, you get a timing side channel: fast responses mean the original password was found, slower ones mean it had a typo. If you do them in parallel, you've increased the computing requirements for your system.

Using just the above list of "acceptable typos", you get a variable number of potential passwords. For "Password1!", that means you're storing 21 hashes, one of which happens to be a duplicate of the original password. If you require constant-time responses when logging into your system (this is a good thing), you either need to be able to calculate all of these hashes within a given time, or to delay every login to the time it would take to check all the hashes for the longest password you support. If you don't, you've made a password-length oracle.
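
One simple, if crude, way to avoid that oracle is to pad every login attempt to a fixed time budget. A sketch, where the budget value is an assumption sized to the worst-case variant set:

```python
import time

# Hypothetical budget: the worst-case time to check the variant set
# of the longest password the system supports.
TIME_BUDGET_SECONDS = 2.5

def constant_time_login(attempt: str, stored_hashes: list[bytes], checker) -> bool:
    """Pad every response to the same wall-clock duration so timing
    doesn't leak password length or which variant matched."""
    start = time.monotonic()
    result = checker(attempt, stored_hashes)  # e.g. the bcrypt loop above
    remaining = TIME_BUDGET_SECONDS - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    return result
```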

There are hashing methods which allow similar inputs to produce similar hashes, but these don't tend to have desirable properties for hashing passwords. They are generally optimised for finding similar parts of large data structures, so they are high speed, meaning large numbers of potential passwords can be tried quickly. By definition, they also lack an avalanche effect, so an attacker could try common passwords and use "closeness" as a gradient: if the output hash of a mutation is "closer" to a stored hash, the input is also closer to the real password, so the attacker keeps mutating the closest candidates. With a cryptographic hash, even a fast one, changing a single character in the input is expected to change most of the output. There is no concept of "closer" in the output, so an attacker can't do better than making random changes and hoping for a match.
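
A quick demonstration of the avalanche point, using SHA-256 purely as an illustrative fast cryptographic hash:

```python
import hashlib

def differing_bits(a: str, b: str) -> int:
    """Count the bits that differ between the SHA-256 digests of two strings."""
    da = int.from_bytes(hashlib.sha256(a.encode()).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b.encode()).digest(), "big")
    return bin(da ^ db).count("1")

# A one-character change flips roughly half of the 256 output bits,
# so output "closeness" says nothing about input closeness.
print(differing_bits("Password1!", "password1!"))  # typically around 128
```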

If you avoid the hashing issue completely and go down the HSM route, it might be feasible to have an HSM which stores plaintext passwords in a non-exportable way and which can answer "original", "typo", or "no" using classical comparison methods. This might be acceptable if you allowed typo-authenticated users some actions but required the exact password for high-risk operations (e.g. changing the password), but it's not likely to be cost-effective for most providers.

It's certainly possible, but there are a lot of potential issues with doing it, and it's unclear whether the UX benefits would justify the effort. It would probably be a lot easier to encourage the use of password manager applications, since they'll avoid the risk of typos in a lot of cases!

Matthew

The math to calculate the answer to your question is straightforward, but it depends on how forgiving you intend to be.

For example, suppose you allow 1 character to be wrong. If you have 80 different characters allowed in your passwords, and your password is 10 characters long, instead of 1 password hash, you would need to store 1 + 79 * 10 = 791 total hashes. (Or compute them all on the fly if you don't store them.) Since good hashing algorithms are "slow", just verifying if a password is correct could become noticeably slower.

Perhaps you want to limit the mistake to just the 8 keys nearest each character. This certainly reduces the total number of hashes to check (to 81), but what about people who use a different keyboard layout?

Whatever algorithm you use, once you've calculated the number of strings you would accept, the scheme is weaker by that same factor, since an attacker's guess now matches any of them. In the first example, it is about 800 times weaker, which is equivalent to shortening the password from 10 characters to roughly 8.5: you lose log2(791) ≈ 9.6 bits out of 10 × log2(80) ≈ 63.2.
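
The arithmetic, as a runnable sketch using the numbers from the example above:

```python
import math

ALPHABET = 80  # characters allowed in a password
LENGTH = 10    # characters in the password

# One wrong character allowed anywhere: the correct string plus
# 79 alternatives in each of the 10 positions.
accepted = 1 + (ALPHABET - 1) * LENGTH             # 791 accepted strings
bits_lost = math.log2(accepted)                    # ~9.6 bits
effective_length = LENGTH - math.log(accepted, ALPHABET)
print(accepted, round(bits_lost, 1), round(effective_length, 2))  # 791 9.6 8.48
```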

TTT

I think you're looking for hashes that cluster when the input is similar: en.m.wikipedia.org/wiki/Locality-sensitive_hashing
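
To illustrate the clustering property (and why it is exactly what password hashing must avoid, as Matthew's answer explains), here is a toy MinHash sketch over character trigrams; MinHash is one classic LSH construction, and this code is purely illustrative:

```python
import hashlib

def minhash_signature(s: str, num_hashes: int = 16) -> list[int]:
    """MinHash over character trigrams: the expected fraction of
    matching signature positions equals the Jaccard similarity of
    the trigram sets, so similar strings get similar signatures."""
    grams = {s[i:i + 3] for i in range(len(s) - 2)}
    return [
        min(int.from_bytes(hashlib.sha256(f"{seed}:{g}".encode()).digest()[:8],
                           "big") for g in grams)
        for seed in range(num_hashes)
    ]

a = minhash_signature("Password1!")
b = minhash_signature("Passwrod1!")  # one adjacent transposition
print(sum(x == y for x, y in zip(a, b)), "of 16 positions agree")
# The two trigram sets share 4 of their 12 elements, so about a third of
# the positions agree on average; unrelated strings agree on almost none.
```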

J.A.K.
  • I'd comment that this approach enlarges the pie, as opposed to allocating the same pie differently. The other responses are all, or almost all, based on fairly superficial tweaks of standard approaches that were never intended to be forgiving. This offers what looks like a genuinely better option, and it seems to open new possibilities (e.g. tolerating up to two typos instead of one doesn't seem like it would increase resource consumption terribly much). – Christos Hayward Sep 18 '17 at 20:16