tl;dr: BCrypt is limited to 72 bytes, not 56.
Background
BCrypt is limited to 72 bytes. The original paper also mentions the use of a null terminator. This means you would generally limited to:
- 71 characters + 1 byte null terminator
But the BCrypt 2a revision specifies the use of UTF-8 encoding (while the original whitepaper refers to ASCII). When using UTF-8, one character doesn't mean one byte, e.g.:
- Noël is four characters, but five bytes (
N
o
e
¨
l
)
- is one character, but four bytes (
F0
9F
92
A9
)
- M̡̢̛̖̗̘̙̜̝̞̟̠̀́̂̃̄̅̆̇̉̊̋̌̍̎̏̐̑̒̓̔̕̚ is one character, but 74 bytes (with the null terminator included)
So this throws a wrench into how many "characters" you're allowed.
Where does 55 or 56 come from then?
The original whitepaper mentions a maximum key length of 56 bytes:
Finally, the key argument is a secret encryption key, which can be a user-chosen password of up to 56 bytes (including a terminating zero byte when the key is an ASCII string).
This was a misunderstanding based on the Blowfish maximum recommended key size of 448 bits. (448 / 8 = 56 bytes). The Blowfish encryption algorithm, which bcrypt is derived from, has a maximum key size of 448 bits. From Bruce Schneier's original 1993 paper Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish):
The block size is 64 bits, and the key can be any length up to 448 bits.
On the other hand, the bcrypt algorithm can (and does), support up to 72 bytes for the key, e.g.:
- 71×8-bit character + 1× 8-bit null terminator
The 72-byte limit comes from the Blowfish P-Box size, which is 18 DWORDs (18×4 bytes = 72 bytes). From the original bcrypt whitepaper:
Blowfish is a 64-bit block cipher, structured as a 16-round Feistel network [14]. It uses 18 32-bit subkeys, P1, ..., P18, which it derives from the encryption key. The subkeys are known collectively as the P-Array
The canonical OpenBSD implementation will truncate any key that exceeds 72 bytes.
This means that if your UTF8 string that exceeds 72 bytes, it will be truncated to 72 bytes.
Warning:
- this truncation will remove the null terminator
- this truncation will even happen mid-character (for a multi-code point character)
For example, if your passwords ends with:
"…stapler"
the UTF-8 encoding for BCrypt will be:
══╤══╤═══╤═══╤═══╤═══╤═══╤═════╤═════╤═════╗
... 63│64│ 65│ 66│ 67│ 68│ 69│ 70 │ 71 │ 72 ║ 73 74
s │ t│ a │ p │ l │ e │ r │ 0xF0│ 0x9F│ 0x92║ 0xA9 \0
══╧══╧═══╧═══╧═══╧═══╧═══╧═════╧═════╧═════╝
|
cutoff
This means that in the canonical OpenBSD implementation, the bytes are cut off inside the middle of a character (even if it leaves you an invalid utf-8 byte sequence):
══╤══╤═══╤═══╤═══╤═══╤═══╤═════╤═════╤═════╗
... 63│64│ 65│ 66│ 67│ 68│ 69│ 70 │ 71 │ 72 ║
s │ t│ a │ p │ l │ e │ r │ 0xF0│ 0x9F│ 0x92║
══╧══╧═══╧═══╧═══╧═══╧═══╧═════╧═════╧═════╝
Get rid of maximum length
In recent years, it has been recognized as a good idea that a password hashing algorithm shouldn't have any maximum limit. But there's a problem with allowing a client to use an unlimited password:
- it introduces a denial of service attack by someone submitting a multi-gigabyte password.
This is why it's now becoming common to pre-hash a user's password with something like SHA2-256. The resulting base-64 encoded string, e.g.:
n4bQgYhMfWWaL+qgxVrQFaO/TxsrC4Is0V1sFbDwCgg=
will only ever be 44 ASCII characters (45 with the null terminator).
This is the approach taken by Dropbox, and is included in bcrypt.net:
BCrypt.EnhancedHashPassword("correct battery horse staple Noël M̡̢̛̖̗̘̙̜̝̞̟̠̀́̂̃̄̅̆̇̉̊̋̌̍̎̏̐̑̒̓̔̕̚");
This means that your expensive hashing algorithm won't cause you a denial of service.
Beware of unsalted pre-hashing (aka Password Shucking)
In the last few years a ..."weakness"... has come up with pre-hashing a password:
hash = bcrypt(sha256("Tr0ub4dor&3"));
The issue is that you are effectively turning your code into something like:
hash = bcrypt("SEhuFRToQjRv9AWx5F9EBZroJhnyMG+Z0JQNyzhukfc=");
At first you might not see a problem. In order to bruteforce the final bcrypt hash they still have to know your password:
$2a$15$0l9xFKCmFa4rK2jp.otOKO6c2DQToijZ2IB5kvjA3hfqpT.XLK49S
SEhuFRToQjRv9AWx5F9EBZroJhnyMG+Z0JQNyzhukfc=
The issue is that attackers aren't going to try every possible combination of letters, numbers and symbols. Attackers are going to be using dictionary attacks, and passwords from previous breeches. So rather than trying every possible password, they're going to focus on trying previous breached ones. E.g.:
- hunter2
- correct horse battery staple
- password1
Now imagine they start including a breech that didn't contain raw passwords, but instead was the SHA-256 of user passwords:
- f52fbd32b2b3b86ff88ef6c490628285f482af15ddcb29541f94bcf526a3f6c7
- c4bbcb1fbec99d65bf59d85c8cb62ee2db963f0fe106f483d9afa73bd4e39a8a
- 0b14d501a594442a01c6859541bcb3e8164d183d32937b851835442f69d5c94e
It turns out that they got a match with:
- 48486e1514e842346ff405b1e45f44059ae82619f2306f99d0940dcb386e91f7
They don't know what the password is, but do know that it was hashed with SHA-256 - an extraordinarily fast hashing algorithm. They will then spend all their energy now trying to bruteforce:
- 48486e1514e842346ff405b1e45f44059ae82619f2306f99d0940dcb386e91f7
back into
It's not a weakness - but a shortcut. Like all dictionary attacks are a shortcut.
The answer, like everything else, is to use a salted prehash, for example:
hash = bcryt(hmac_sha256(salt, password));
And make the salt you pre-hash with the same salt that you generated for the bcrypt.