I need to harden a password hash but am not allowed to bring in any extra dependencies, so I am pretty much forced to do the one thing everyone seems to advise against: roll my own implementation.
The SHA family, being considered a fast hash, doesn't seem suitable for password hashing. There is also the risk of losing entropy by repeatedly hashing hashes.
The intent behind "slow hashing" seems to be increasing CPU time and memory requirements, so I conceived the following strategy.
- the password is hashed via SHA-3 512
- the hash value is fed to a mt19937 PRNG through a seed sequence
- a long random sequence is generated to be rehashed
- the same procedure is repeated recursively n times
- the final hash value is derived from hashing all sequences inside out
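
To make the steps above concrete, here is a rough sketch of their shape, not the actual implementation. Several things in it are assumptions made purely for illustration: `toy_hash` is a non-cryptographic 64-bit FNV-1a stand-in for SHA-3 512 (the standard library has no SHA-3), the names `stretch`, `rounds` and `words` are invented, and the final fold from the last sequence back to the first is only one possible reading of "inside out".

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Stand-in for SHA-3 512: 64-bit FNV-1a. NOT cryptographic; it only keeps
// the sketch dependency-free. A real build would call SHA-3 here.
std::uint64_t toy_hash(const std::vector<std::uint8_t>& bytes) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::uint8_t b : bytes) {
        h ^= b;
        h *= 1099511628211ULL;                  // FNV prime, wraps mod 2^64
    }
    return h;
}

// Append a 64-bit value as 8 bytes, least significant byte first.
void append_u64(std::vector<std::uint8_t>& out, std::uint64_t v) {
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Hypothetical parameters: `rounds` is the recursion depth n,
// `words` is the number of 32-bit PRNG outputs generated per round.
std::uint64_t stretch(const std::string& password, int rounds, std::size_t words) {
    assert(rounds > 0 && words > 0);
    std::vector<std::uint8_t> input(password.begin(), password.end());
    std::uint64_t h = toy_hash(input);                  // step 1: hash the password
    std::vector<std::vector<std::uint8_t>> sequences;   // kept alive: the memory cost

    for (int r = 0; r < rounds; ++r) {
        // step 2: feed the current hash to mt19937 through a seed sequence
        std::uint32_t parts[2] = {static_cast<std::uint32_t>(h),
                                  static_cast<std::uint32_t>(h >> 32)};
        std::seed_seq seq(parts, parts + 2);
        std::mt19937 gen(seq);

        // step 3: generate a long pseudo-random sequence and rehash it
        std::vector<std::uint8_t> bytes;
        bytes.reserve(words * 4);
        for (std::size_t i = 0; i < words; ++i) {
            const std::uint32_t w = static_cast<std::uint32_t>(gen());
            for (int k = 0; k < 4; ++k)
                bytes.push_back(static_cast<std::uint8_t>(w >> (8 * k)));
        }
        h = toy_hash(bytes);
        sequences.push_back(std::move(bytes));
    }

    // step 5 (one interpretation): fold the stored sequences from the
    // innermost (last generated) to the outermost (first generated).
    std::vector<std::uint8_t> acc;
    append_u64(acc, h);
    for (auto it = sequences.rbegin(); it != sequences.rend(); ++it) {
        acc.insert(acc.end(), it->begin(), it->end());
        const std::uint64_t d = toy_hash(acc);
        acc.clear();
        append_u64(acc, d);
    }
    return toy_hash(acc);
}
```

In this sketch the work grows roughly with `rounds * words`, and the memory held at the end is about `rounds * words * 4` bytes, which is where the two tunable knobs in the question come from.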
That seems to add a fair, configurable amount of extra work and memory use.
So my question is whether this is a reasonable strategy, and if so, what sequence lengths and recursion depths should be enough.
Update:
I ended up expanding the implementation a bit: each step of the process now also picks a hashing algorithm at "random" from 12 in total (SHA2, RealSHA3 and Keccak, each with digest sizes of 224, 256, 384 and 512 bits), using a clamped double as the selector.
All in all, compared to a single plain SHA3 hash, this implementation is about 100k times slower and incurs an additional ~20 MB of RAM over a rather deep and arbitrarily branching recursion to produce the final hash, which is what I can afford while still staying within reasonable limits given the minimal target platform specs. Possibly just as important, it brings additional computational diversity compared to the naive "rehash n times" approach: it uses multiple hashing algorithms and PRN generation with a fairly large internal state, std::seed_seq is always fed the full hash output and does additional "conditioning" of the values, and, last but not least, some double precision floating point operations glue it all together, hopefully making the whole procedure GPU/ASIC unfriendly.
Hashing now takes about 500 ms, up from 5000 ns, on a 4 GHz i7 CPU. Given the 95 symbols allowed in a password, and assuming perfect scaling up to 8 threads, it would take this PC ~1450 years to try every possible combination for a 6-character password, or ~130 years for an 8-character password using 100k such CPUs, which seems reasonably demanding.
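
For anyone who wants to check those figures, here is a minimal sketch of the arithmetic. The function name `exhaustive_years` and its parameters are made up for illustration, and it assumes a 365.25-day year, passwords of exactly the given length, and perfectly linear scaling across threads and CPUs.

```cpp
#include <cassert>
#include <cmath>

// Years needed to try every password of exactly `length` symbols drawn from
// an alphabet of `alphabet` symbols, at `seconds_per_hash` per attempt.
double exhaustive_years(double alphabet, int length, double seconds_per_hash,
                        double threads_per_cpu, double cpus) {
    assert(seconds_per_hash > 0 && threads_per_cpu > 0 && cpus > 0);
    const double candidates = std::pow(alphabet, length);
    const double hashes_per_second = threads_per_cpu * cpus / seconds_per_hash;
    const double seconds_per_year = 365.25 * 24 * 3600;
    return candidates / hashes_per_second / seconds_per_year;
}
```

`exhaustive_years(95, 6, 0.5, 8, 1)` comes out at roughly 1456 years and `exhaustive_years(95, 8, 0.5, 8, 100000)` at roughly 131 years, in line with the estimates above. On average an attacker would need about half of that to hit a given password.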
Is this considered "hardened" enough, and if not, how can I improve on it?