(None of the examples you give should be used for password hashing. They're much too fast!)
The first general principle I would mention here is that we want our cryptographic operations to be high level interfaces; there should be a common, reusable abstraction that defines:
- What preconditions the caller should fulfill before calling them;
- What arguments the operation takes;
- What values it returns, or what postconditions it fulfills when correctly called;
- What security properties the concept should offer.
In this case we're talking about password hashing. A password hashing function, in its simplest incarnation, should:
- Accept two arguments, a salt and a password;
- Return a hash code for that password/salt combination;
- Inherently consume a lot of resources so that it slows down a password-guessing attacker substantially, but not so much that an honest user
So SHA-256 fails the first and third points. It only accepts one argument, and is much too fast!
And actually, the interface described above is arguably not even the best one for password hashing. A better interface is as a pair of functions written on top of the password hash (in Python-ish pseudocode):
def generate_new_verification_code(password):
salt = crypto_random(16) # 16 byte random salt
hash = password_hash(salt, password)
verification_code = salt + hash
return verification_code
def verify_password(putative_password, actual_verification_code):
actual_salt = actual_verification_code[0:16]
actual_hash = actual_verification_code[16:-1]
putative_hash = password_hash(actual_salt, putative_password)
return putative_hash == actual_hash
So proper abstraction says that for password verification we should be using two functions like these, written on top of a resource-intensive password_hash
function, which in turn would likely be written on top of a low-level function like SHA-256.
Note that PHP's password hashing interface actually follows this latter design:
- The
password_hash
function is meant to be called with a password but no salt; it picks a salt randomly, and as the page says: "The used algorithm, cost and salt are returned as part of the hash. Therefore, all information that's needed to verify the hash is included in it."
- The
password_verify
function is used to check whether a putative password matches the verification string.
Second problem: one key idea in the design of security protocols or other security constructs is that attackers will often break the rules, and cause your code to be called in unexpected manners and contexts. For example, when people write something like this:
hash($salt.$pass)
hash($pass.$salt)
...they often implicitly assume that salt
will always be the same fixed length, and don't stop to consider whether some attacker might be able to:
- Find a way to "break the rules" so that they can cause the length of
salt
to vary across multiple calls to the code;
- Find some unexpected way to exploit this to their advantage.
For example, you might think it's impossible for an attacker to find a collision for passwords hashed this way, but in fact if they can control the salt
and pass
it's trivial to do it:
hash("0001"."Passsword1!")
hash("0001P"."asssword1!")
This might sound farfetched, and well, to tell you the truth, I can't actually think of a scenario where this might be exploitable. But we really would like our security to founded on grounds more solid than "I can't think of any way to break this." So as a general philosophy, it's safer to design things in such a way that ambiguities cannot arise. In this case, we would like the following rule to hold:
- If we hash two different passwords with two different salts, the inputs we provide to the low-level hash function should be different.
So this means that to produce the input that we feed to the underlying hash, we should prefer to combine
them with an injective function—a function such that combine(salt1, password1) == combine(salt2, password2)
if and only if salt1 == salt2
and password1 == password2
.
hash(combine($salt, $password))
Some ways of doing this:
- Pad one of the inputs to a fixed size, and prepend it to the other. HMAC does this internally for keys.
- Hash one of the two inputs and prepend it to the other. HMAC does this for keys that are longer than the underlying hash function's block size.
- Put an unambiguous delimiter between the inputs (possibly requires you to escape occurences of the delimiter in the input values).
Dedicated password hashing functions like PBKDF2, bcrypt, scrypt and Argon2 adhere to most of these ideas.