Reverse-engineering password salts

Question

Lets set out a scenario, a database has been dumped and all the passwords are hashed.

You notice that some users seemed to have signed up twice and have the same hash. You run these hashes through your hash cracking program and nothing comes up, presumably because they are salted and you don't have the salt. But because users have the same hash on 2 accounts the salt isn't going to involve any random values, correct?

If the site was still vulnerable and you were able to create your own account and retrieve your own hash would you be able to reverse-engineer the salt given you know the password you signed up with?

Salt won't be part of the DB if it's done in code though surely? — user32045, Oct 15 '13 at 17:07
Then it's not a salt. The entire purpose of a salt is to be unique. — Stephen Touset, Oct 15 '13 at 17:30

score 8 · Accepted Answer · answered Oct 15 '13 at 17:15

Disregarding collisions, two users having the same hash value means that both of them have the same password, salt, and pepper. Consequently, this usually means one of two things

The site uses a fixed salt for all users: This renders the scheme to a pepper-only password hashing scheme (the "salt" is in the code). By means of brute force, and by employing knowledge of a plaintext password, you can find out the value of said "salt".

This, of course, can be done by registering another account of your choice and attempting to brute-froce something like hash(password + X), with X being the value you want to find out, and then comparing the output to the hash you extracted after creating your new account.

A site doing something as stupid as using a hardcoded salt (pepper-only) will probably be using something like MD5 or SHA-1, which will make your brute-force attack much faster.
The site uses non-unique salts derived from certain information from the user: (username, email, etc.) which you can find out by means of brute-force.

Since the database you extracted didn't contain any salts, it is most likely the case #1.

Note: Just because a duplicate hash wasn't found in rainbow tables, that doesn't mean it's an easy password. It might be the same user with two accounts using the same strong password.

Actually I think case #1 is still quite plausible: some sites use as "salt" the user's email address. It _is_ part of the database, but you usually don't look at it and think "mmh, this is a salt". This would explain why the same user, registered twice, gets the same hash. This can be tested for by registering a few accounts, with the same password but distinct names _and_ distinct email addresses. — Tom Leek, Oct 15 '13 at 18:05

score -1 · Answer 2 · edited May 27 '21 at 08:53

I'm going to do my best to tackle the entire problem at hand, being you want to perform user authentication in a secure way.

Firstly, if a particular "salt" is implemented as a non-variable piece of text in code, it is not a true salt, a true salt is random, and should also be stored in a corresponding column in the database table, or through another method if the salt is not to be exposed. It also could be generated from a piece of user-data, birthday (very horrible idea as a lot of people have the same birthday, please Google birthday attack), or something else, but would technically correlate to a pseudo-random salt (less secure). These points were mentioned by other users already, but nobody has mentioned why salts should be random, and there is also a general lack of information here on how authentication should be performed as a whole.

The way it should be implemented was not mentioned. For strictly password authentication, it should be done using a secure random number generator to create a salt the same size as the keyspace at a minimum (I don't know that there exists a use for generating a salt larger than that, and may not work depending on the implementation, or may be better in others, its going to be implementation dependent).

Anyhow, a salt can either appended or prepended (though, usually in practice is prepended), as long as the selection remains the same, unless a method of recomputing exists. If you're to do it any other way there is going to need to exist another variable piece of data describing how a string or character sequence alteration took place, unless you have a repeatable sequence like the inter-leaving of bytes, which would be very weird to see, and not necessarily more secure, since brute force attacks are going to be on the keyspace itself. Also, in many implementations all users would be required to create a new password if the selection of how to apply a salt was changed (in fact, I'm not sure if any exist to willy nilly change how a salt is applied, considering hashing is one-way). Anyways, in SHA-256 the keyspace is going to be 32 bytes of random information, (256-bits / 8 = 32 bytes, or 64 hex characters).

An interesting point is even with the 32 bytes of random information its not clear how to reverse the "masking" of the keyspace by a salt, at first glance. It would have to be done purely on a byte-by-byte basis, since random byte data won't always match up to either ASCII or Unicode characters that are readable by humans, even in Unicode some aren't readable, and many utilize more than one byte so Unicode is going to be useless if we're approaching the problem byte by byte. From a quick probability standpoint: if the salt is unknown this is roughly 2^256 * 2^256 * 2 possibilities (~2.68^154 which is beyond a googol, and over halfway to a centillion), not knowing the salt and not knowing whether it was appended or prepended without being privy to implementation details. If the salt is known then we're looking at at least 2^256 * 2 possibilities (~2.316^77, or 3/4ths the way to a googol).

Here is about where we run into the possibility of key collisions. In essence, its possible multiple values could map to the same thing and the previous two numbers are so incomprehensible, that it would be quicker to find a value that maps to the 2^256 possibilities within the keyspace on the first go, in other words, brute forcing every possibility within the keyspace, but salting changes things up. Given a random salt, a piece of plaintext maps to the keyspace differently. The salt is quite important. To brute force we can iterate through every possible value, performing a guess and test approach, while changing how the salt is applied, but only if we know the salt. Otherwise its useless unless we have knowledge of the implementation, or how to attack the SHA-256 algorithm itself. Adding a salt alters the parameters in a valuable manner. We just need to find some key, that when the salt is applied produces the hash itself, which should map to 2^256 possibilities and should matter how we apply a salt as long as we're consistent. But to think about it usefully, the salt application is going to change whether we're looking at the top or bottom of the haystack.

So, in a SHA-256 implementation, the fastest approach should be to test every possible value, unless another vulnerability exists, such as the discovery of a computable salt. For SHA-256, this should theoretically take longer than the age of Earth to reverse (about 10^65 seconds, assuming you can perform 10^12 or ~10 trillion hashes per second), but that metric is reducing as more powerful hardware becomes available. If a salt can be computer readily, then it becomes nothing more than a dictionary attack, unless the implementation details are strange. Moving on to computation power, the bitcoin network is to a total hashing power of around 163 trillion hashes per second average on a logarithmic scale from Bitcoin's conception, above that number, but far from what is needed to crack SHA-256. The entire bitcoin network itself is closing the gap, even though it would still take millenia, longer than the existence of Earth, and longer than the life expectancy of the known Universe (given we're supposedly around the middle point of the Universe's lifespan). Obviously, a different password storage technique should be used if this insane amount of hashing power in a known network continues to grow. Though, it's not yet proven to be needed at this point. However, since this post is strictly about hashing and salting, I'll continue, and then talk about some better methods at the end.

The way hashes are typically broken is by through the salt (if it can be pre-computed, say by idiot programmers like myself using a static value for a salt). In the case of random salting, you prevent a utility like hashcat from ever using rainbow tables (a known method of pre-computing) from being successful, as every user should have it's own random salt, even a user with the same username and password should have a unique hash and salt (theoretically, unless something 'whack' happens in the secure number generator, algorithm itself, or at the particular instances in time the hashes are created). That being said, its possible in a large user base that multiple users could still have the same hash. In this case, an e-mail is almost better if it could somehow be mapped to a full 32-byte keyspace, but we can't guarantee all e-mails are a minimum of 32-bytes and it doesn't protect against a bad actor trying an e-mail as a salt (or against anything ASCII-based, for that matter). The amount of "randomness" required here is left to business needs, I suppose.

However, following this kind of method, we're running into a security via obscurity kind of method, considering nothing prevents a hacker from assuming this knowledge, and anything math based should be better. If the salt is not meant to be exposed it should be held internally behind a firewall where the web-application is required to send a known request to the internal-side with some kind of network or transport layer security in place to prevent snooping. A way this could work is say by the application presenting a unique user ID and receiving a salt in return. And if this is the case, the web application itself needs to be hardened in a way to prevent snooping into the implementation, as well.

This prior concept could prevent against a straight up SQL injection attack when the attacker doesn't have internal access. Anyone finding themselves gaining access to the internal network is going to render it useless if they can figure out how to manipulate the filesystem, or the file encryption, or certificates required to access that data though its going to be hard to get to this point as it relies on manipulating the entire subsystem (getting past firewall, and any IDS/IPS in place, and not to mention logging that could take place on either endpoint of the firewall). Its possible in this circumstance the activity would be caught and measures would be put in place, but at the same time is likely to rely on human intervention that may not be a given.

So moving on to the fact SHA-256, and even SHA-512 are probably insecure (though no proof exists, yet), but they could also probably be used with 2FA or MFA, just fine, which I'll talk about shortly. Another level of security is appropriate systems administration, and security appliances. The common implementations today for password security are PBKDF2, Bcrypt, and Scrypt, which allow for the addition of key-stretching, which adds another layer of security to a hashing algorithm. Key-stretching essentially just takes a potentially weak key and strengthens it against brute-forcing methods. The only real important thing to note is that if a password is already sufficiently long, key-stretching can end up generating a shorter weaker key depending on the implementation. It also adds additional overhead in computing a hash from plaintext, especially in terms of network overhead if say it is deployed in a web environment.

If you're creating a non-critical application where a breach is somehow small beans and doesn't expose critical user data, then look no further. Otherwise, read ahead if you're interested in something more secure.

The only better methods at present beyond password security are 2FA or MFA, which don't necessarily increase the computational burden all too much and more truly identify a user. These ideas behind these methods posit that passwords are inherently weak in any form, hashed, salted, peppered, whatever, and suggests there are less manipulation-prone ways of proving identity. These methods require a user to present varying things for authentication i.e. implementing 2 or more of these concepts: something you have, something you know, something you are, somewhere you are, or something you do. If all of this information is followed a system should be secure unless a vulnerability in the subsystem itself exists and there is either a known, or zero-day exploit. At the same time, it's ridiculous to implement all 5 of these concepts as to decrease burden on the user to authenticate themselves. There is a fine balance in security and it's entirely dependent on an organizations need.

This really doesn't answer the question. All the technical details might be correct, but you made a massive assumption at the start and you have drifted very far from what was actually asked. — schroeder, May 27 '21 at 08:54
Your answer boils down to: "if the salt was implemented in this particular way, then it is infeasible that the salt could be reverse-engineered". But that disregards the details in the question. So, all-in-all, you have written a nice blog post, but this is not an answer to what was asked. — schroeder, May 27 '21 at 09:00

Reverse-engineering password salts

2 Answers2