Perfectly Robust Hashing Scheme, or Completely Over-Engineered?

Question

This is going to get long, so prepare. Basis of the question is, Do all these steps improve security, or am I completely overthinking the problem? Are my assumptions/thought process valid?

We all know there are many types of attacks a hacker can use to infiltrate a system, however, this deals with password hashing, so I am going to ignore other attacks like injections, etc. As far as I'm aware, there are 2 types of attacks that good password hashing can help prevent: offline brute-force attack and offline rainbow table attacks. In order to perform these, however, the attacker must first obtain access to the user-password file, where the passwords are stored as hashed versions, not plain text. A brute force attack involves guessing the user's password many, many, many times until the password he guessed matches the hashed password in the file. If you hash chain, the attacker must keep guessing down the chain, checking against the hashed version at that level until he arrives at the original password. Ranbow attacks are similar, but instead of guessing, the attacker uses a lookup table of pre-computed hashes based on commonly used passwords, words, and common variations. Rainbow attacks can be thwarted by using uncommon passwords or salts (password prefixes) which would not be contained in a pre-computed table. Instead, the attacker would have to compute this table on the fly which takes much, much longer. We can never prevent the attacker from guessing the password and checking to see if it is correct, but we can make it harder on him by making him guess many more times. We do this by chaining hashes so that whenever the attacker cracks the i'th hash, he must then repeat the process for the (i-1)th hash. However, if the attacker knows how many times we hash the password, this is mostly moot (see NOTE2). Anyway, on to the code:

function createHashedPassword(char[] username, char[] userPassword)
{
    char[] secretCompanyString = "Some Secret Company String Only Visible via Source Code";
    char[] companySalt = hash_sha512(secretCompanyString+username); //NOTE1
    memset(secretCompanyString[0], 0, sizeof(secretCompanyString)); //remove secretCompanyString from memory

    char[] userSalt = salt_generator_512();

    char[] hashedPassword = userSalt + hash_sha512(companySalt+userSalt+userPassword);
    memset(userPassword[0], 0, sizeof(userPassword)); //remove userPassword from memory

    int NumIterations = 4242+userPassword.length; //DOES MAKING THIS NUMBER PRIVATE/PASSWORD-DEPENDANT ENHANCE SECURITY? SEE NOTE2
    for(i=1;i<NumIterations;i++) //NOTE3
    {
        userSalt = salt_generator_512() + userSalt; //SHOULD I BE USING A NEW SALT EACH TIME? SEE NOTE4
        hashedPassword=userSalt+"$"+hash_sha512(companySalt+userSalt+hashedPassword); 
    }

    numIterations=0; //Remove number of iterations from memory
    memset(companySalt[0], 0, sizeof(companySalt)); //remove companySalt from memory


    return "$6$"+hashedPassword;
}

Note1: I believe that this is better than companySalt=secretCompanyString, because if a user can brute force his way into the top level of the chain (i.e. finding that companySalt+userSalt+hashedPassword leads to hashedPassword), he would see that it was in the form of companySalt+userSalt+hashedPassword, and the companySalt would no longer be secret. However, being user dependant, he would not be able to use it to help crack other passwords. He could then attempt to brute force his way into figuring out that the companySalt => secretCompanyString+username, but that would involve yet another (long) brute-force attack.

NOTE2: I believe making the number of iterations private would help a great deal because if the attacker knew how many times, all he would need to do is guess a password, hash it N times, and then check the result against the hash, instead of having to crack the umpteenth hash, and then have to work his way down the chain until he arrives at the password. If this value was public, I don't believe it would be any more secure to hash chain than it would be to only hash the password once. Am I right in thinking this? In terms of the second point, making the number of iterations password dependent should enhance security against brute-force attacks. Assume a user of the site gains access to the password file. He knows his username, password, and public salt. He can then brute force his way into getting the companySalt and number of iterations (assuming he knows that we use a company salt and chain hashing scheme). With that information, it would be easier, but still not easy, to then brute-force his way into another user's account. However, if the companySalt and/or number of iterations changes from user to user, this information would gain him nothing. Making it private/user dependant, I believe, should help prevent brute force attacks, not rainbow table attacks. This leads me into NOTE3.

NOTE3: I know we are supposed to hash chain, but why does this increase security? To slow down brute-force attacks by a factor of N? I'd assume it would also make using Rainbow tables infeasible because the rainbow table would have to contain all N entries, which would be practically impossible / impractically large.

NOTE4: After the attacker figures out we are using a company salt and brute forces his way into his first level, he would have built a huge table with guesses in the form of "companySalt+userSalt+XXXXXXX". If we used the same salt over and over, there is a possibility (however small) that in the process of guessing one link in the chain, he he already calculated a hash for another level. When he gets to that level, he can simply use his lookup table. But by pre-appending a new salt to the old salt each time, it breaks that pattern and makes it impossible for him to have already guessed that hash. He will have to start building the table from scratch. I am pre-appending here so that I can keep track of the hashes I used to make password checks possible.

I am expecting to get a bunch of answers discussing the inefficiency of the hashing algorithm (how long it takes, how much memory it will take to store the hashes, etc), but assuming I have a supercomputer w/ 800 petabytes of storage, and all I care about is maximum security, would all these steps help?

What you are going to get is a lot of comments on why you think your method improves on the standard crypto methods, and a lot of comments on not "rolling your own" — schroeder, Oct 29 '14 at 20:19
The first rule of cryptography is: don't roll your own crypto. The second rule of cryptography is: **don't roll your own crypto.** [Bcrypt/PBKDF2 does what you need already.](http://security.stackexchange.com/questions/211/how-to-securely-hash-passwords/31846#31846) If you're questioning why certain standards are in place, that link should answer them. — RoraΖ, Oct 29 '14 at 20:21
True. I tried to address that all in the question in my NOTES. I also don't believe I'm rolling my own because I'm not creating my own hash function..I'm using sha512. But I agree...I'm anticipating that as well. — dberm22, Oct 29 '14 at 20:22
You are creating your own password hash function, which is "rolling your own". It's not about making your own hash algorithm, but about securely applying it. — schroeder, Oct 29 '14 at 20:25
I think we got the point where we all agree on "rolling your own" we should try to -constructively- answer the question. — Herr, Oct 29 '14 at 20:26
@Schroeder Ahh, I see the distinction. Yup...going to get alot of those, then. I wonder if bcrypt employs all the steps I proposed. — dberm22, Oct 29 '14 at 20:27
Having opened a bounty, it might help if you explained what you found lacking in my answer. — Stephen Touset, Oct 31 '14 at 21:35
@StephenTouset First, I want to thank you again for your answer. I just wanted some more opinions. While you presented good information on what could make the hashing even better, you merely glossed over the details of the scheme I came up with. I commented my code and added notes to try to get a specific type of answer, and while you provided great insight, it wasn't exactly what I was looking for. I was just trying to get some more eyes/viewpoints on the matter. — dberm22, Oct 31 '14 at 21:54
See this regarding [memset](http://security.stackexchange.com/a/74281/396) to and securely erasing ram — makerofthings7, Jan 14 '15 at 00:29

Stephen Touset · Accepted Answer · 2014-11-03T16:56:51.033

This basically looks like something along the lines of PBKDF2 or sha512crypt, only with a bunch of "cryptographic voodoo" applied.

Salts have a very specific cryptographic purpose: to tie an password-guessing attempt to a single password instance. Having company-specific salts, user-specific salts, per-iteration salts, and (to steal a snark) hand-harvested, shade-grown, organic pink Himalayan salts are unnecessary and have poorly-defined (or undefined) security properties. While this doesn't necessarily create a weakness, in general, one should use the absolute minimum amount of cryptography necessary to solve a particular problem.

Having a variable number of iterations likewise doesn't improve anything. In the best case, all an attacker has to do is compare the hash after every iteration. While yes, this is slightly more work, it doesn't change the big-O complexity of cracking the password hash. In the worst case, the attacker knows the algorithm and so this does nothing to improve security. As always, Kerckhoff's principle is apropos: assume the attacker knows everything about your cryptosystem besides the key.

In some ways, your hashing scheme is overengineered. In other ways, it's underengineered.

For instance, the Catena password hashing function is one of the leading contenders for the PHC, an open competition attempting to spur the development of next-generation password hashing schemes and to choose one for the industry to standardize upon. It has several features yours lacks.

They recommend (based on my suggestion, actually) the use of BLAKE2b as the internal hashing function. This is a superior choice to SHA-2/256 or SHA-2/512 for most typical password-hashing scenarios, as BLAKE2b is as fast to compute in general-purpose hardware (e.g., typical server CPUs) as it is in specialized hardware (GPUs, FPGAs, or ASICs). This minimizes any advantage an attacker has over a defender, by ensuring that the defender can calculate a password hash roughly as fast on server hardware as a motivated attacker can do with dedicated password-cracking hardware.

Memory-hardness is another advantage of Catena. The point of doing a large number of iterations is to require a significant amount of work to compute a single password guess; this helps offset the relatively low amount of entropy in a typical password. However, modern algorithms do better than that. They can require a tunable amount of memory to compute a password hash. This can be an extremely effective defense against specialized multicore hardware. High-end GPUs (like the AMD FirePro D-Series can have thousands of cores. The D700, for example, has 2,048 cores and only 6GB of memory. If it requires 32MiB of memory to attempt a single password hash, the attacker can at best only use 192 of those cores. They also may be limited even moreso by memory bandwidth, slowing down cracking attempts further. A particularly challenging component of this type of feature is ensuring there are no favorable time-memory tradeoffs, where an attacker can reduce the memory requirement by using more CPUs (Catena has formal a proof of this feature).

Catena can also be used to increase the hardness parameters after the password has been hashed, without requiring the original password. This allows a site operator to dynamically increase the difficulty of hashing passwords as computers get faster, even for users that haven't been recently active.

There are other features that password-hashing schemes can implement that yours doesn't (or attempts to, poorly). Your approach to a company-specific salt appears to be an attempt to strongly tie a password hash to a secret. Much more cryptographically sound is to require a cryptographic key as input to your hashing function, and to HMAC the password with the secret key prior to running it through the iterative hashing. This ensures that the output cannot be computed by an attacker without getting access to a key. Storing this secret in your source code directly (as you suggest) is a security antipattern, as it grants access to this secret to any employee, contractor, or service with read privileges to your source code (e.g., GitHub or Travis CI).

Lastly, your scheme uses has poor cryptographic "hygeine" (for lack of a better term). Simply concatenating strings being input to hash functions is, while in this case not necessarily broken, a cryptographic antipattern. Cryptographers tend to prefer constructs like HMAC when combining secret data and user input, and prefer unambiguous encodings when combining multiple inputs (e.g., to disambiguate between H("test1" + "test2") and H("test" + "1test2")). These avoid some attacks against many classes of hashing functions, and while they may not be strictly necessary in this case cryptographers tend to be very conservative when it comes to allowing any potential source of attack, even if it's not currently obvious how it could be exploited; seemingly minor oversights in cryptographic protocols are exploited all the time.

TL;DR, password hashing is complicated. Don't roll your own. For now, use bcrypt, scrypt, or PBKDF2. And yes, you are definitely rolling your own even if you're not writing your own hash function; you're writing (effectively) a key derivation function, which is arguably a much harder cryptographic problem. As Thomas Ptacek once said:

I don't care what you write, but if you go into it thinking that the dangerous stuff is in the primitives like the AES core, and if you just stick to the glue you'll be safe, you're gonna have a bad time.

EDIT: To address your additional questions below:

The only purpose of a salt, cryptographically speaking, is to ensure that two identical passwords hash to different results. This tightly binds a password-cracking attempt to an specific hashed password, rather than to the password database as a whole. This should be the case even if you consider "the password database" to be the merged set of all hashed passwords across all systems globally — the work to crack one password should never be reusable to crack another person's password.

Given this definition of a salt, it should be clear that company-specific "salts", user-specific "salts" (e.g., one bound to a specific user rather than to the password itself), etc. are not in fact salts but are something else of your own invention. Storing one large, globally-unique salt (e.g., a randomly-chosen 128-bit value) is enough to ensure global uniqueness of that password.

Likewise, continually hashing the salt into each iteration does not contribute in any meaningful way toward the expected function of a salt. Hashing it in at the beginning is enough to ensure that an attacker cannot reuse any of the computation to attempt other passwords.

That said, there is benefit in hashing something additional in at each step. A theoretical weakness of iterated hashes is their ability to enter what's called a "short cycle". Essentially, we know that hashes have collisions (it's just impractically/impossibly difficult to find them). If you were to run a value through a 128-bit hash 2^128+1 times, you are guaranteed to have generated at least one collision (as you've enumerated the entire output space); this would put you into a cycle, where each successive hash output is the same as the previous time you hashed that value. However, statistically, it's expected that you do not need to go through all 2^128 values to generate a cycle; it's entirely plausible that there's some x such that x = H(H(x)). Or more generally, some x and n < 2^128 such that x = H^n(x). This is a short cycle.

To prevent a short cycle, we need to ensure that each invocation of the hash is guaranteed to be unique. A trivial way of doing this is including the iteration counter in the inputs to the hash; e.g., H(x || i). We don't know that x will be unique for each iteration of the hash, but we do know that i will be. This doesn't negate the possibility of a collision in hash outputs at some step, but if a collision does happen, we know that the next iterated output is still overwhelmingly likely to be unique.

This didn't address using a new salt each iteration, but other than that, what a well thought out and thorough answer. Thanks for taking your time! Hopefully you get a lot more +1's for such a long answer. — dberm22, Oct 29 '14 at 21:27
As you have 9 upvotes, I suppose your answer is the best I'm going to get. But would you mind explaining why per-iteration salts is not better? And also, you state that user-specific salts is "unnecessary and have poorly-defined (or undefined) security properties." I thought that you MUST have per-user salts so that if a hacker cracks one password, it doesn't help him crack everyone's. — dberm22, Nov 03 '14 at 13:07
I have updated the answer to include a section addressing these points. — Stephen Touset, Nov 03 '14 at 16:57
@dberm22 If you added a new salt to each iteration you would have to store each salt for each iteration. Otherwise you would be unable to verify a hash is correct. Depending on the number of iterations and users this could impact performance while not actually increasing security. — RoraΖ, Nov 03 '14 at 18:12
@raz I understand that. If you look at the pseudo code I posted, you can see that I am storing the salt each time. I even pointed out in my question (last line in bold) that it would take up alot of storage and effect performance. I am solely wondering whether or not it will increase security AT ALL. I am not asking about the performance of it. — dberm22, Nov 03 '14 at 19:37
I have addressed that point in my amended answer. Salts have a specific cryptographic purpose, and that purpose is not fulfilled any more through iteration-specific "salts". What you are proposing is *not* in fact a salt, but is something else. Its purpose is not well-defined, has effects on the process that have not been studied by actual cryptographers, and was not introduced for the purpose of mitigating an actual threat or weakness, whether theoretical or practical, so it should be removed. One should use the *minimal* amount of cryptography and complexity to solve any given problem. — Stephen Touset, Nov 03 '14 at 19:48
Thanks for your update. Just one last comment, and all my questions will have been answered. Doesn't using a new salt each time force the hacker to pre-compute multiple rainbow tables instead of just 1, for a given password? Or are you just saying that computing a complete rainbow table for a given salt is SO time consuming that all you need to do is force him to make a single one, and you're set? — dberm22, Nov 03 '14 at 20:52
You may want to re-read about rainbow tables. They are completely defeated by using per-password salts, as this ensures that no work can be shared across multiple password cracking attempts. In the threat model salts protect against, the salt and digest are assumed to be public information. — Stephen Touset, Nov 03 '14 at 21:01
Mostly spot-on answer. Something you're missing though is there is a use for application-specific salts. Really, it's not a salt - it's a pepper. It's useful if getting access to the passwords doesn't necessarily mean getting access to the application. — Dessa Simpson, Jul 09 '18 at 22:55

score 1 · Answer 2 · edited Mar 17 '17 at 10:46

I agree with others that you shouldn't try to "roll your own". You also want to avoid complexity.

For clarity, here is a worthy opinion on those issues, based on hard-earned wisdom. It is a quote from the winner of the Password Hashing Competition: in the Argon2 design document

Another problem with the existing schemes is their complexity. The same scrypt calls a stack of subprocedures, whose design rationale has not been fully motivated (e.g, scrypt calls SMix, which calls ROMix, which calls BlockMix, which calls Salsa20/8 etc.). It is hard to analyze and, moreover, hard to achieve confidence. Finally, it is not flexible in separating time and memory costs. At the same time, the story of cryptographic competitions [15, 18] has demonstrated that the most secure designs come with simplicity, where every element is well motivated and a cryptanalyst has as few entry points as possible.

[15] NIST, SHA-3 competition, 2007

[18] M. Robshaw and O. Billet, New stream cipher designs: the eSTREAM finalists. Springer, 2008, vol. 4986.

One subtle example of a problem with complexity is the risk of 'side-channel attacks'.

score 0 · Answer 3 · answered Apr 09 '18 at 15:29

Your solution is over-engineered because it focuses on making something already "impossible" even harder. If you properly use a good password hashing function (see Stephen Touset's answer) with salt and pepper, and a sufficient number of rounds, it is already infeasible to brute-force one of the password hashes to learn the pepper.

Perfectly Robust Hashing Scheme, or Completely Over-Engineered?

3 Answers3

Linked