
I am in a situation where I need to harden a password hash but am not allowed to bring in any extra dependencies, so I am pretty much forced to do the one thing everyone seems to advise against: roll my own implementation.

The SHA family, being fast hashes, doesn't seem suitable for password hashing. There is also the risk of entropy loss from repeatedly hashing hashes.

The intent behind "slow hashing" seems to be to increase CPU time and memory requirements, so I conceived the following strategy:

  • the password is hashed with SHA3-512
  • the hash value is fed to an mt19937 PRNG through a seed sequence
  • a long random sequence is generated, to be rehashed
  • this recurses in the same way n times
  • the final hash value is derived by hashing all the sequences, inside out

That seems to introduce a fair, configurable amount of extra work and memory.
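The steps above can be sketched in C++. Note that `toy_hash` below is a deliberate stand-in: standard C++ has no SHA-3, so this only illustrates the data flow, not a secure hash, and the "inside out" folding is approximated by feeding each level's input back into the recursion:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <string>
#include <vector>

// Stand-in for SHA3-512: standard C++ has no cryptographic hash, so this
// placeholder only illustrates the data flow, not real security.
std::vector<std::uint32_t> toy_hash(const std::string& data) {
    std::uint64_t h = std::hash<std::string>{}(data);
    std::vector<std::uint32_t> out;
    for (int i = 0; i < 16; ++i) {  // emulate a 512-bit digest
        h = h * 6364136223846793005ULL + 1442695040888963407ULL;
        out.push_back(static_cast<std::uint32_t>(h >> 32));
    }
    return out;
}

// One round: hash -> seed mt19937 with the full digest via seed_seq ->
// expand into a long random sequence -> recurse on that sequence.
std::string stretch(const std::string& input, int depth, std::size_t seq_len) {
    std::vector<std::uint32_t> digest = toy_hash(input);
    if (depth == 0)
        return std::string(digest.begin(), digest.end());
    std::seed_seq seq(digest.begin(), digest.end());
    std::mt19937 prng(seq);
    std::string expanded;
    expanded.reserve(seq_len + input.size());
    for (std::size_t i = 0; i < seq_len; ++i)
        expanded.push_back(static_cast<char>(prng() & 0xFFu));
    // Fold the original input back in so deeper levels depend on all of it.
    return stretch(expanded + input, depth - 1, seq_len);
}
```

Both `seq_len` and `depth` are the knobs the strategy describes: the former scales memory traffic per round, the latter the number of rounds.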

So my question is whether this is a reasonable strategy, and if so, what sequence lengths and recursion depths should be enough.

Update:

I ended up expanding the implementation a bit by also picking a hashing algorithm at "random" for each step of the process, from 12 in total: SHA2, RealSHA3 and Keccak, each with digest sizes of 224, 256, 384 and 512 bits, using a clamped double as the selector.
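For illustration, the clamped-double selection might look like the following; the algorithm names here are hypothetical labels, since standard C++ ships none of these digests:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical identifiers for the 12 digest configurations described
// above: SHA2, RealSHA3 and Keccak at 224/256/384/512 bits.
const std::string kAlgorithms[12] = {
    "SHA2-224",     "SHA2-256",     "SHA2-384",     "SHA2-512",
    "RealSHA3-224", "RealSHA3-256", "RealSHA3-384", "RealSHA3-512",
    "Keccak-224",   "Keccak-256",   "Keccak-384",   "Keccak-512",
};

// Map a double in [0, 1) onto an algorithm index, clamping first so that a
// value of exactly 1.0 (or a rounding overshoot) cannot index out of bounds.
std::size_t select_algorithm(double selector) {
    selector = std::clamp(selector, 0.0, 1.0);
    return std::min<std::size_t>(11, static_cast<std::size_t>(selector * 12.0));
}
```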

All in all, compared to a single plain SHA-3 hash, this implementation is about 100k times slower and incurs an additional ~20 MB RAM cost over a rather deep, arbitrarily branching recursion to produce the final hash. That is what I can afford while still staying within reasonable limits for the minimal target platform specs. But possibly just as important, this brings additional computational diversity compared to the naive "rehash n times" approach: multiple hashing algorithms, PRN generation with a fairly large internal state, std::seed_seq always being fed the full hash output (and doing additional "conditioning" of the values), and, last but not least, some double precision floating point operations to glue it all together, hopefully making the whole procedure GPU/ASIC unfriendly.

Hashing now takes about 500 ms, up from 5000 ns, on a 4 GHz i7 CPU. Given the 95 symbols allowed in a password, and assuming perfect scaling across 8 threads, it would take this PC ~1450 years to try every possible combination for a 6-character password, or ~130 years for an 8-character password using 100k such CPUs, which seems reasonably demanding.
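Those estimates can be sanity-checked with a few lines of arithmetic (0.5 s per hash, perfect parallel scaling, 365.25-day years):

```cpp
#include <cmath>

// Years needed to try every password of the given length, assuming
// `seconds_per_hash` per attempt spread perfectly over `parallel` workers.
double brute_force_years(int alphabet, int length,
                         double seconds_per_hash, double parallel) {
    double combinations = std::pow(alphabet, length);       // e.g. 95^6
    double seconds = combinations * seconds_per_hash / parallel;
    return seconds / (365.25 * 24 * 3600);                  // seconds per year
}
```

With these inputs, `brute_force_years(95, 6, 0.5, 8)` comes out near 1456, and `brute_force_years(95, 8, 0.5, 800000)` (8 threads on 100k CPUs) near 131, matching the figures above.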

Is this considered "hardened" enough, and if not, how can I improve on it?

dtech
  • Normally, you just run the hash function multiple (hundreds, thousands) of times. Note that most languages will have some native hashing/encryption built in - which one are you working in? And some of the _algorithms_ are public (and thus available for implementing), which means you aren't limited to SHA in that case. – Clockwork-Muse Nov 23 '18 at 22:31
  • I am using standard C++, no cryptography facilities in there yet. – dtech Nov 23 '18 at 22:36
  • Okay... why no third-party dependencies? For anything other than toy programs, the amount of work starts to approach astronomical. And is the injunction against libraries only (pre-packaged binaries), or is it against source code too? – Clockwork-Muse Nov 23 '18 at 23:21
  • @Clockwork-Muse This isn't necessarily true. Running one hash over and over can actually decrease security (though only by a few bits). It sounds like you're referring to PBKDF2, which uses HMAC to avoid that and is not as simple as repeatedly hashing one input. – forest Nov 24 '18 at 01:53
  • Would you be able to call some OS functions? I am pretty sure Windows has some built-in cryptographic functions. – DarcyThomas Nov 24 '18 at 09:25
  • @DarcyThomas that would constitute dragging in a huge dependency, and the application needs to maintain portability. – dtech Nov 24 '18 at 09:30

3 Answers


There is one solution to this which does not require you to include other dependencies like Argon2: you can implement the algorithm used by OpenPGP, called String-to-Key, or S2K. This algorithm works by feeding the repeated password and salt to a hash function. The password and salt are repeated enough times that the hash function takes a non-negligible amount of time to process them. This is such a simple slow KDF that it can even be done using standard Linux utilities (in this example with SHA-256):

yes "$pass$salt" | head -c 50M | sha256sum

If you are ever able to bring in your own dependencies, I highly recommend listening to Future Security whose excellent answer lists superior alternatives to both S2K and your custom scheme.
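In C++ the same streaming idea might be sketched as follows. `ToyStreamHash` (FNV-1a) is purely a stand-in for an incremental SHA-256, which standard C++ does not provide; the point is the loop structure, which keeps memory use constant no matter how much data is stretched:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in incremental hash (FNV-1a).  A real implementation would use an
// incremental SHA-256 here; this toy has no cryptographic strength.
struct ToyStreamHash {
    std::uint64_t state = 14695981039346656037ULL;
    void update(const std::string& chunk) {
        for (unsigned char c : chunk) {
            state ^= c;
            state *= 1099511628211ULL;
        }
    }
    std::uint64_t digest() const { return state; }
};

// S2K-style stretching: feed `password + salt` to the hash over and over
// until roughly `total_bytes` have been processed.  Because the data is
// streamed, memory use never exceeds the hash's internal state.
std::uint64_t s2k_stretch(const std::string& password, const std::string& salt,
                          std::size_t total_bytes) {
    const std::string block = password + salt;  // must be non-empty
    ToyStreamHash h;
    for (std::size_t fed = 0; fed < total_bytes; fed += block.size())
        h.update(block);
    return h.digest();
}
```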

forest
  • +1 for introducing me to S2K. Neat idea to make it memory-hard simply by concatenating the password and salt with itself until you have a large piece of data to hash. – Mike Ounsworth Nov 24 '18 at 02:37
  • @MikeOunsworth It doesn't make it memory hard, it just requires a lot of CPU time to process. You can easily [stream data to a hash](https://security.stackexchange.com/a/179675/165253) and so never use more memory than the internal state requires. In order to make it memory hard, you have to hash it non-sequentially in an unpredictable pattern so you have to have the whole thing in memory at once, which is what scrypt and Argon2 do. – forest Nov 24 '18 at 02:46
  • Oh. That makes sense too. I'm slightly less impressed with it then :P – Mike Ounsworth Nov 24 '18 at 02:49

It's worth adding Argon2 as a dependency. Or bcrypt, if you can't use optimized Argon2 and you manage to avoid all the ways bcrypt lets you shoot yourself in the foot. If you're left with no good option, then give extra consideration to not letting humans choose their own passwords.

Losing entropy through repeated hashing isn't something you should be concerned with unless you truncate the hash output between iterations or use a really bad hash function. It's not a problem for a secure function with a large output. You only lose anything when two different passwords lead to chains of hashes that merge; in other words, only when you have an accidental collision.

Using an RNG to generate new input is completely unnecessary. The output of a cryptographic hash is just as random whether you use random-looking or non-random-looking input. You might make things worse if someone could use an optimized hardware implementation while you had to use a slower software implementation. You are definitely making things worse if there is any flaw in the implementation, or if the seed is reduced to, say, a 64-bit number.

Your specific method may permit a time-memory trade-off. Some candidates in the Password Hashing Competition, including Argon2, were designed so that you can't halve the memory requirement by merely accepting half the speed. (Scrypt allows such trade-offs, and that problem was one of the motivations to search for a better algorithm.) You might think doubling computation time isn't worth saving memory, but an attacker might end up with higher throughput, lower power consumption, or lower cost if they can buy cheaper or more efficient low-memory hardware and run more operations in parallel.

Whatever algorithm you come up with would probably, at best, be competitive with PBKDF2. (PBKDF2 isn't great, but it is simple enough to implement if you already have a hash function.)

If you use PBKDF2 or something like it, you probably should not use SHA-3. SHA-3 can be computed fairly efficiently in dedicated hardware, but it is relatively slow on CPUs. That could give password crackers an advantage if they use faster, more efficient implementations. You would actually be better off using SHA-512, or BLAKE2.
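For reference, the core of PBKDF2 is a chained-PRF loop whose iterations are XORed together. This sketch shows only that structure, with the PRF left as a parameter; real PBKDF2 uses HMAC with a secure hash and produces full-width digests, not the 64-bit toy blocks used here for illustration:

```cpp
#include <cstdint>
#include <functional>
#include <string>

// A PRF maps (key, message) to a fixed-size block.  In real PBKDF2 this is
// HMAC with a secure hash; any stand-in here is for structure only.
using Prf = std::function<std::uint64_t(const std::string&, const std::string&)>;

// One PBKDF2 output block: U1 = PRF(P, S || INT(i)); Uj = PRF(P, U_{j-1});
// the block is the XOR of all U values, so every iteration must be computed
// sequentially -- that chain is what makes the function slow.
std::uint64_t pbkdf2_block(const Prf& prf, const std::string& password,
                           const std::string& salt, std::uint32_t block_index,
                           std::uint32_t iterations) {
    std::string msg = salt;
    for (int shift = 24; shift >= 0; shift -= 8)  // append INT(i), big-endian
        msg.push_back(static_cast<char>((block_index >> shift) & 0xFF));
    std::uint64_t u = prf(password, msg);
    std::uint64_t t = u;
    for (std::uint32_t j = 1; j < iterations; ++j) {
        // Feed the previous U back through the PRF and XOR into the result.
        msg.assign(reinterpret_cast<const char*>(&u), sizeof(u));
        u = prf(password, msg);
        t ^= u;
    }
    return t;
}
```

Having the structure this small is exactly why it is the least error-prone option to implement and test against published test vectors.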

Royce Williams
Future Security
  • I've tried using hash output in the past as a substitute for PRNG sequences, and the result distribution wasn't very uniform, so it may be debatable whether hash output is "just as random". But more importantly, by taking advantage of random sequences I get explicit control over how long those sequences are, effectively gaining the ability to fine-tune both memory and CPU time complexity by controlling the sequence lengths and recursion depth respectively. – dtech Nov 24 '18 at 08:26
  • Another thing I am considering is randomly branching the tree rather than just using flat recursion. Randomly derived values can be used to determine the chance and depth of a branch, and I can also switch hashing methods for different branches. I can't bring in external dependencies, but I do have SHA-1, SHA-2 and SHA-3 to work with. It seems like the arbitrary branching may provide some hardening against GPU and ASIC hardware attacks. – dtech Nov 24 '18 at 08:30

If you really cannot include any new dependencies or third-party code (and really, your first step should be to push back on this limitation), then it is far better to implement a standard, vetted algorithm than to create your own. Not only do you benefit from an algorithm with known security, acknowledged by security experts and proven through years of use in the field, but you also get the opportunity to test against standard implementations or test vectors, to be sure you didn't make any implementation mistake that ruins the security, like accidentally hashing only up to the first zero byte. The easiest and least error-prone algorithm for you to implement is probably PBKDF2, since it mostly just needs a cryptographic hash primitive and a source of randomness for the salt, which it sounds like you already have.

Ben
  • PBKDF2 seems to have very modest requirements, and being widespread, it has got to be a high-value target for ASIC designs. There is value in a custom implementation: nobody is going to bother making custom silicon for one particular target. There is this dogmatic view that almost makes it sound like custom implementations are intrinsically bad. Implementations are not bad for being custom, but for being bad, which is what this question seeks to avoid. – dtech Nov 24 '18 at 14:46
  • What you are hinting at here is security through obscurity, which is *always* a bad idea to rely on. A custom algorithm *is* intrinsically bad until it has been vetted through years of experts attempting to attack it. You won't get a definitive answer on whether your custom scheme is good from posting to this forum. – Ben Nov 24 '18 at 21:11
  • Consider how argon2 became the recommended method: first there was a years-long multi-stage competition among crypto experts and other security professionals to select an algorithm, then it went through a few years in the field among early adopters where minor weaknesses were found and fixed until it got to where it is today. – Ben Nov 24 '18 at 21:11
  • I don't rely on the implementation being obscure, it just happens to be obscure, and possibly an added bonus. If this subsite cannot really provide anything more than recommend established solutions, that would make it fairly pointless, as the answers that recommended such didn't really give me anything new that I didn't already get from a cursory web search. And BTW, this is not a forum, it is a QA site, there is a subtle difference :) – dtech Nov 24 '18 at 21:48
  • You very explicitly stated that your algorithm was better than the *industry standard* PBKDF2, because of its obscurity. This is FALSE and a DANGEROUS attitude to take. By all means code your own implementation of a standard algorithm if you must, but if you make up your own *algorithm*, all by yourself, you *will* screw something up. I was using "forum" in a more general sense as "place for discussing things" but it being actually a question and answer site makes it a *worse* place to vet a brand-new KDF algorithm, not a better one. DON'T. ROLL. YOUR. OWN. – Ben Nov 24 '18 at 22:43
  • No need to put words in my mouth, it doesn't win you any points.. not good points anyway. I didn't state my algorithm was "better", just that it had more computational complexity and memory footprint. I said that it isn't economically feasible to make a dedicated chip compared to PBKDF2 which is popular and has modest requirements, but again, I didn't state this makes my algorithm better. Also note that PBKDF2 is at this point almost 20 years old, and doesn't seem like it aged well. Also, no need to SHOUT.. Lastly, if people always went for the tried and true, we'd still live in caves ;) – dtech Nov 24 '18 at 23:41