Can I trust a security hash implementation after testing it with random inputs against another implementation?

Question

Let's say I want to use a security hashing algorithm, like bcrypt, and I want to use a young bcrypt implementation, e.g. one called libfancybcrypt, instead of a well-established implementation.

Of course, I can simply generate a few thousand or a few million random strings, hash them with libfancybcrypt and with the old, well-established library, and compare the hashes in the end. So assume I've done that, and the new library in question produces the same result as the well-established one for all random inputs.
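The comparison test described above could look roughly like the following sketch. It is only an illustration: since bcrypt is not in the Python standard library, SHA-256 from `hashlib` stands in for both implementations, and `reference_hash`, `candidate_hash` and `differential_test` are hypothetical names, not part of any real library.

```python
import hashlib
import random


def reference_hash(data: bytes) -> bytes:
    # Stand-in for the well-established library (assumption: SHA-256 instead of bcrypt).
    return hashlib.sha256(data).digest()


def candidate_hash(data: bytes) -> bytes:
    # Stand-in for the new library: same algorithm, different code path
    # (the input is fed in 7-byte chunks instead of all at once).
    h = hashlib.sha256()
    for i in range(0, len(data), 7):
        h.update(data[i:i + 7])
    return h.digest()


def differential_test(rounds: int, max_len: int = 128, seed: int = 0) -> list:
    """Hash `rounds` random byte strings with both functions and
    return every input for which the two results differ."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    mismatches = []
    for _ in range(rounds):
        length = rng.randint(0, max_len)
        data = bytes(rng.getrandbits(8) for _ in range(length))
        if reference_hash(data) != candidate_hash(data):
            mismatches.append(data)
    return mismatches


if __name__ == "__main__":
    print(len(differential_test(10_000)), "mismatches")
```

An empty result only says that none of the sampled inputs disagreed; it says nothing about inputs that were never sampled.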

My question has two parts:

Assume the library author can be trusted. Given my random input test above: how likely is it that the author accidentally introduced a bug with the effect that there are inputs for which a wrong hash is calculated?
Assume the library author cannot be trusted. Given my random input test above: how likely is it that the author has purposely introduced a backdoor of some kind?

Related but still different:

How much can we trust open source implementations of crypto (security) libraries?

Do you remember the case of the [Pentium FDIV bug](https://en.wikipedia.org/wiki/Pentium_FDIV_bug)? It is safe to assume Intel had done millions of tests and still the bug went unnoticed. The very same can happen with a hash implementation. Even with millions of tests there will be edge cases that are not tested. — Jeff, May 06 '17 at 13:59
If it matches several test inputs, I find it unlikely to break on others. — dandavis, May 07 '17 at 03:17

Steffen Ullrich · Accepted Answer · 2017-05-06T10:13:13.827

Assume the library author can be trusted. Given my random input test above: how likely is it, that the author accidentally introduces a bug with the effect that there are inputs for which a wrong hash is calculated?

Being trusted does not imply that the author of the software is a competent programmer who knows all possible pitfalls. This means that if you rely only on trust, you cannot know the likelihood of a bug having been introduced. And since such a bug might only occur in rare cases, like a race condition or some integer overflow, there is no guarantee that you will trigger the bug with your random test cases.

Assuming the library author cannot be trusted. Given my random input test above: how likely is it, that the author has purposely introduced a backdoor of some kind?

If the backdoor is only activated by a specific input, you will never find it without thoroughly inspecting the code. And even code inspection might not help, see the examples from the Underhanded C Contest. This means it is possible to introduce such a stealthy backdoor. But again, you cannot give a specific likelihood for this solely based on the information that the author is not trusted.
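A deliberately obvious toy version of such an input-triggered backdoor shows why random comparison tests do not help here. Everything in this sketch is hypothetical: `TRIGGER` is an invented magic value, SHA-256 stands in for the real algorithm, and a real backdoor would be hidden far better than a plain `if`.

```python
import hashlib
import random

TRIGGER = b"open-sesame-3f9a"  # hypothetical magic input chosen by the attacker


def reference_hash(data: bytes) -> bytes:
    # Stand-in for the trusted implementation.
    return hashlib.sha256(data).digest()


def backdoored_hash(data: bytes) -> bytes:
    # Behaves exactly like the reference except for one specific input.
    if data == TRIGGER:
        return bytes(32)  # attacker-chosen output
    return hashlib.sha256(data).digest()


def random_test_finds_backdoor(rounds: int, seed: int = 1) -> bool:
    """Return True if any random input makes the two functions disagree."""
    rng = random.Random(seed)
    for _ in range(rounds):
        length = rng.randint(0, 64)
        data = bytes(rng.getrandbits(8) for _ in range(length))
        if reference_hash(data) != backdoored_hash(data):
            return True
    return False
```

Hitting a 16-byte trigger by chance has a probability of about 2^-128 per 16-byte test input, so the random test passes no matter how many rounds are practical.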

Can I trust a security hash implementation after testing it ...

Based on the previous observations, this question cannot be answered definitively. Apart from that, it depends a lot on what you use the library for: if it is just for computing some checksums in order to detect accidentally corrupted data, your tests might be enough. If it is instead used for purposes where a software failure might lead to death, leakage of top secrets or malware infection of critical infrastructure, then such tests are probably not sufficient, especially if you don't trust the author.

martinstoeckli · Answer 2 · 2017-05-06T20:12:32.633

As long as the resulting hash is equal to the trusted one, the resulting hash can be trusted. This immediately leads to the following questions:

Can all resulting hashes be trusted?

There could be implementation errors which only occur in specific situations; an example could be incorrect handling of \0 characters. If you test with enough random input, and the hashes are always equal, this is very unlikely though. Planning for a specific password to result in an incorrect hash won't help the attacker.
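Whether random input catches something like the \0 case depends on whether the random inputs can contain that byte at all. The sketch below assumes a hypothetical bug (the input is cut at the first NUL byte, as if it were a C string) and again uses SHA-256 as a stand-in. Random alphanumeric strings never expose the bug, while a handful of hand-picked edge cases do.

```python
import hashlib
import random
import string


def reference_hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def buggy_hash(data: bytes) -> bytes:
    # Hypothetical bug: everything after the first NUL byte is ignored.
    return hashlib.sha256(data.split(b"\x00", 1)[0]).digest()


def mismatches(inputs):
    """Return the inputs for which the two implementations disagree."""
    return [d for d in inputs if reference_hash(d) != buggy_hash(d)]


rng = random.Random(42)
alphabet = (string.ascii_letters + string.digits).encode()

# Random alphanumeric "passwords": these can never contain a NUL byte.
printable_inputs = [
    bytes(rng.choice(alphabet) for _ in range(rng.randint(1, 32)))
    for _ in range(1000)
]

# Hand-picked edge cases: empty input, NUL bytes, non-ASCII bytes, and
# lengths around the 72-byte input limit of bcrypt.
edge_cases = [b"", b"\x00", b"abc\x00def", b"\xff" * 72, b"a" * 73]

if __name__ == "__main__":
    print(len(mismatches(printable_inputs)), "mismatches with random strings")
    print(mismatches(edge_cases))
```

So it is worth drawing the random inputs from all byte values and adding explicit edge cases next to them.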

Another problem specific to password hashing is the generation of the salt: it could be done with a random source which is not cryptographically secure, which could make cracking easier. This can be reviewed relatively easily by yourself.
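When reviewing the salt generation, the thing to look for is where the random bytes come from. As a point of comparison, a minimal sketch of a salt generator that uses the operating system's cryptographic random source could look like this in Python (`generate_salt` is a name made up for this example; 16 bytes is the salt size bcrypt uses):

```python
import secrets


def generate_salt(length: int = 16) -> bytes:
    # secrets.token_bytes draws from the OS CSPRNG. A general-purpose
    # generator such as Python's `random` module (Mersenne Twister) is
    # predictable and should not be used for salts.
    return secrets.token_bytes(length)
```

If the library under review seeds a general-purpose pseudo-random generator with the time or a counter instead, that is a finding, even when all resulting hashes verify correctly.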

Can there be any side effects?

Independent of whether the resulting hash is correct, the code can do whatever it wants. This is mostly a problem in the scenario of an evil author, but a careless implementation can cause problems as well.

Some attacks are easy to detect, e.g. a hash algorithm should of course never do I/O operations or access the internet.

Much harder to detect is code which can be exploited, maybe a certain input provokes a buffer overflow on purpose. So while the resulting hash is actually safe, an attacker could misuse your process to attack the server. This is not related to hashing though, it applies to all 3rd party libraries.

Another easily overlooked side effect is that if the code reads from /dev/random instead of /dev/urandom, it could drain the random source and block the server if it is used excessively.

➽ I myself prefer to use an untrusted library which offers a safe algorithm, instead of using an inappropriate algorithm from a trusted library. Of course it depends on the importance of your service, and it is always a good thing to check the source yourself (hash algorithms are not that much code).
