Initial idea
bcrypt(sha1(password));
Problem #1 - Null Termination Problem
The first reason you don't want to do that is that SHA-1, like any hashing algorithm, outputs raw bytes. Many programming languages do not have proper string types, and instead simulate strings with a series of characters followed by a null (i.e. \0) terminator. If your hash digest contains a null byte, the bcrypt implementation might see the \0 character and assume that's the end of the string:
- bcrypt(sha1("fsdf3hgfh2faff32f"))
- bcrypt(
96 87 0f 9e 71 ff 62 57 55 00 b6 5c 91 07 64 6f b5 81 13 a9
)
In C and PHP, if you blindly treat the digest as a "string", your "string" would look like:
- bcrypt("–‡žqÿbWU\0¶\‘doµ©")
causing some bcrypt implementations to cut the input off at the \0 null terminator.
This is known as the null termination problem.
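The truncation can be reproduced without a vulnerable bcrypt library. The following is a minimal Python sketch, not a real bcrypt call: the helper as_c_string is hypothetical and only imitates how a C-style API would read the digest up to the first \0 byte.

```python
# Digest bytes taken from the example above.
digest = bytes.fromhex("96870f9e71ff62575500b65c9107646fb58113a9")

def as_c_string(data: bytes) -> bytes:
    # Hypothetical helper: imitate a C-style API that stops at the first \0 byte.
    return data.split(b"\x00", 1)[0]

seen = as_c_string(digest)

print(len(digest))    # 20 bytes produced by SHA-1
print(len(seen))      # 9 bytes survive the truncation
print(seen.hex(" "))  # 96 87 0f 9e 71 ff 62 57 55
```

In an implementation with this flaw, only 9 of the 20 digest bytes would reach bcrypt, so any other password whose digest starts with the same 9 bytes followed by \0 would also verify.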
Solution
Your implementation may be immune to this, or it may not. So let's not tempt fate. You can pre-hash the password, but be sure to Base64-encode the digest first:
- bcrypt(base64(sha1("fsdf3hgfh2faff32f")))
- bcrypt(base64(
96 87 0f 9e 71 ff 62 57 55 00 b6 5c 91 07 64 6f b5 81 13 a9
))
- bcrypt("locPnnH/YldVALZckQdkb7WBE6k=")
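The encoding step can be checked with the standard library alone. This is a sketch under assumptions: prehash is a hypothetical helper name, UTF-8 is an assumed password encoding, and the result would still have to be passed to whatever bcrypt function your platform provides.

```python
import base64
import hashlib

def prehash(password: str) -> bytes:
    # Hypothetical helper: SHA-1 the password, then Base64-encode the raw digest.
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return base64.b64encode(digest)

# The digest from the example above, including its \0 byte.
digest = bytes.fromhex("96870f9e71ff62575500b65c9107646fb58113a9")
encoded = base64.b64encode(digest)

print(encoded.decode("ascii"))  # locPnnH/YldVALZckQdkb7WBE6k=
print(b"\x00" in encoded)       # False
```

Base64 output uses only printable ASCII characters, so the value handed to bcrypt can never contain a null byte.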
Problem #2 - Hash Shucking
The next issue has to do with dictionary attacks.
An attacker isn't going to brute-force every possible password:
- aaaaaaaa
- aaaaaaab
- aaaaaaac
- ...
Instead they're going to use dictionaries, previous password breaches, and passwords that follow the rules that certain stupid corporations insist upon (e.g. password complexity policies):
- hunter2
- password
- Tr0ub4dor&3
- 12345
- qazxsw
- zxcvbn
The whole point of bcrypt is that it is still hard to brute-force all these dictionary words. But the fact remains that these lists exist, and they can dramatically shorten the search space.
But imagine there was a password database breach and, fortunately for the attacker, the web-site used SHA-1 to store all its passwords, and one of the breached SHA-1 hashes was:
96 87 0f 9e 71 ff 62 57 55 00 b6 5c 91 07 64 6f b5 81 13 a9
The attacker doesn't know what the original password is, but the hash is at least something they can add to their dictionary list. And if your web-site pre-hashes with SHA-1, then suddenly they can try:
- bcrypt(base64(
96 87 0f 9e 71 ff 62 57 55 00 b6 5c 91 07 64 6f b5 81 13 a9
))
If it matches, it means that they have the SHA-1 hash of someone's password. And since SHA-1 is so easy to compute in hardware, they now have a bare SHA-1 hash they can try to brute-force, without paying bcrypt's cost for every guess.
This problem is known as 'Hash Shucking'.
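The attack can be sketched in a few lines of Python. Everything here is illustrative: slow_hash is a stand-in that uses PBKDF2 from the standard library in place of bcrypt, and the names prehash, shuck, site_salt and leaked_sha1 are hypothetical. The point is only that the attacker feeds leaked digests, not candidate passwords, into the slow hash.

```python
import base64
import hashlib

def slow_hash(data: bytes, salt: bytes) -> bytes:
    # Stand-in for bcrypt (assumption): PBKDF2 from the standard library.
    return hashlib.pbkdf2_hmac("sha256", data, salt, 10_000)

def prehash(password: str) -> bytes:
    # The scheme under attack: base64(sha1(password)), with no salt in the pre-hash.
    return base64.b64encode(hashlib.sha1(password.encode("utf-8")).digest())

# Victim site: stores slow_hash(base64(sha1(password))).
site_salt = b"per-user-salt-01"
stored = slow_hash(prehash("fsdf3hgfh2faff32f"), site_salt)

# Attacker: holds bare SHA-1 digests leaked from an unrelated site.
leaked_sha1 = [
    hashlib.sha1(b"hunter2").digest(),
    hashlib.sha1(b"fsdf3hgfh2faff32f").digest(),
]

def shuck(stored_hash: bytes, salt: bytes, leaked: list):
    # Try each leaked digest as if it were the pre-hashed password.
    for digest in leaked:
        if slow_hash(base64.b64encode(digest), salt) == stored_hash:
            return digest  # the outer layer is "shucked"; only SHA-1 remains
    return None

match = shuck(stored, site_salt, leaked_sha1)
print(match is not None)  # True
```

The attacker never guessed the password itself. One slow-hash computation per leaked digest was enough to reduce the problem to cracking a plain SHA-1 hash.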
Solution
What you want to do is salt the password before pre-hashing it:
- bcrypt(base64(sha1(password+salt)))
This way the "password hash" will never appear in any other global database.
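A minimal sketch of the salted pre-hash follows. The helper name salted_prehash, the 16-byte random salt and the UTF-8 encoding are assumptions made for illustration; the source only specifies the formula base64(sha1(password+salt)).

```python
import base64
import hashlib
import os

def salted_prehash(password: str, salt: bytes) -> bytes:
    # base64(sha1(password + salt)), as in the formula above.
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return base64.b64encode(digest)

# Assumption: a random salt that is stored and reused at verification time.
salt = os.urandom(16)

unsalted = base64.b64encode(hashlib.sha1(b"hunter2").digest())
salted = salted_prehash("hunter2", salt)

print(salted != unsalted)  # True
```

Because the salt is mixed in before SHA-1 runs, a digest leaked from a site that stored plain sha1(password) no longer matches the value that gets passed to bcrypt.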