
Below is the hash function implementation from the Rogue Wave library. It returns a 32-bit hash. The core operation is A = B ^ ((C << 5) | (C >> 27)). Is this hash safe to use as a password check, or is it possible to retrieve all of the B values used by reversing it?

const unsigned RW_HASH_SHIFT = 5;

inline static void mash(unsigned& hash, unsigned chars)
{
  hash = (chars ^
       ((hash << RW_HASH_SHIFT) |
        (hash >> (RWBITSPERBYTE*sizeof(unsigned) - RW_HASH_SHIFT))));
}

unsigned
RWCStringRef::hash() const
{
  unsigned hv       = (unsigned)length(); // Mix in the string length.
  unsigned i        = length()*sizeof(char)/sizeof(unsigned);
  const unsigned* p = (const unsigned*)data();
  {
    while (i--)
      mash(hv, *p++);           // XOR in the characters.
  }
  // XOR in any remaining characters:
  if ((i = length()*sizeof(char)%sizeof(unsigned)) != 0) {
    unsigned h = 0;
    const char* c = (const char*)p;
    while (i--) 
      h = ((h << RWBITSPERBYTE*sizeof(char)) | *c++);
    mash(hv, h);
  }
  return hv;
}
Jasper
  • Possibly this would be better asked on [crypto.se] – Chris Murray Aug 22 '14 at 12:37
  • 8
    This is not safe for a password hash. 32-bits of check is not nearly enough for hashing passwords. It doesn't matter that I don't know what the password is, I can just continuously generate a bunch of strings until a collision occurs. And collisions will occur quite a bit in 32-bits of space. – RoraΖ Aug 22 '14 at 12:40
  • 2
    `sizeof(char)` is 1. Always. By design. By definition. Don't compute `sizeof(char)`. – curiousguy Aug 22 '14 at 14:48
  • 1
    @curiousguy Any sane compiler would macro `sizeof(char)` to 1 and optimise out the `imul` anyway. I personally think `sizeof(char)` is a good thing to include, as it implies that the size of the type is relevant. That way, if you move to `wchar_t` or similar later to gain Unicode support, it's obvious that you need to change that calculation too. – Polynomial Aug 26 '14 at 11:04
  • @Polynomial "_that the size of the type is relevant_" of which type? Why not `sizeof (sometypedef)`? – curiousguy Aug 27 '14 at 17:19
  • @curiousguy I don't follow. Surely using `sizeof(char)` is the most descriptive term? – Polynomial Aug 27 '14 at 22:56
  • 1
    @Polynomial `1` is a most descriptive term than `sizeof(char)`. – curiousguy Aug 28 '14 at 19:26

2 Answers

10

This is extremely unsafe, to the point of being pointless:

  1. Your hash function is not a one-way function. If arbitrary 4-character passwords are allowed as inputs, one can instantly (in constant, low runtime) compute an input that produces any given hash simply by undoing the XOR with the initial hash value formed from the password length (see the first sketch after this list). With a little ingenuity, such a preimage can likely be generated very effectively even if the input character set is restricted.

  2. Even if your hash function were a cryptographically strong one-way function, it would not be a good password hashing function because it is fast. Ideally, you want a key derivation function that, like PBKDF2 or scrypt, has a configurable work factor, allowing you to adjust the difficulty for your attacker (and the workload for your server) to new conditions over time (see the second sketch after this list).

  3. Your hash is so far on the short side that it is hard to call it secure even if everything else were fine. As Andrey pointed out in an older answer, any 32-bit hash can be guessed in typically 0.5 * 2^32 attempts. To avoid being a security bottleneck, this number should always be greater than what it would take to guess the password itself. Yet 32 bits of entropy is so low that a similar number (28 bits) is used to illustrate a weak password in the famous xkcd password cartoon.
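
To make point 1 concrete, here is a minimal stand-alone sketch (not the Rogue Wave code itself; it assumes a 32-bit `unsigned` and `RWBITSPERBYTE == 8`, i.e. a rotate by 5/27 bits). For a 4-character password the whole hash collapses to a single XOR with a constant, so a preimage for any target hash can be written down directly:

#include <cstdint>
#include <cstring>
#include <iostream>

// Stand-alone re-implementation of the mash() step above (assumption: 32-bit
// unsigned, 8-bit bytes, so the rotate amounts are 5 and 27).
static uint32_t mash_step(uint32_t hash, uint32_t chars)
{
  return chars ^ ((hash << 5) | (hash >> 27));
}

// Hash of a 4-byte password, following RWCStringRef::hash(): start from the
// length (4), then mash in the single 32-bit word of character data.
static uint32_t hash_of_4_bytes(const char pw[4])
{
  uint32_t word;
  std::memcpy(&word, pw, 4);            // the same bytes the library reads via its cast
  return mash_step(4u, word);           // = word ^ rotl(4, 5) = word ^ 0x80
}

int main()
{
  const uint32_t target = 0xDEADBEEFu;  // any hash value we want to hit

  // Undo the single mash step: the preimage's 32-bit word is just
  // target XOR rotl(4, 5), and the password is its 4 raw bytes.
  const uint32_t word = target ^ ((4u << 5) | (4u >> 27));

  char preimage[4];
  std::memcpy(preimage, &word, 4);      // bytes may be zero or unprintable

  std::cout << std::hex << hash_of_4_bytes(preimage) << '\n';   // prints deadbeef
}

The recovered bytes may be zero or unprintable, which is why the argument is phrased in terms of arbitrary 4-character inputs; a restricted character set only forces a slightly cleverer construction, as noted above.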
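
For point 2, here is a hedged sketch of the kind of slow, salted password hashing the answer recommends, using OpenSSL's PBKDF2 binding PKCS5_PBKDF2_HMAC. The iteration count, salt handling and output length below are illustrative assumptions, not part of the original answer:

#include <openssl/evp.h>
#include <string>
#include <vector>

// Illustrative only: derive a password verifier with PBKDF2-HMAC-SHA256.
// The iteration count is the tunable work factor mentioned in point 2; the
// value here is an assumption and would be raised as hardware gets faster.
std::vector<unsigned char> derive_verifier(const std::string& password,
                                           const std::vector<unsigned char>& salt,
                                           int iterations = 600000)
{
  std::vector<unsigned char> key(32);                 // 256-bit output
  PKCS5_PBKDF2_HMAC(password.c_str(), (int)password.size(),
                    salt.data(), (int)salt.size(),
                    iterations, EVP_sha256(),
                    (int)key.size(), key.data());
  return key;
}

Verification re-derives the key from the stored per-user salt and compares it to the stored value; the point is only that the cost per guess is tunable, unlike the single rotate-and-XOR above.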

8

A 32-bit hash function cannot possibly be safe for the purpose of password verification.

The problem here is that it is "easy" to find a colliding password, that is, a password that hashes to the "correct" hash value despite being different from the original password. On average it will take 2^31 password trials to find such a collision, which is considered very weak by today's standards (see the sketch below).
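
A minimal sketch of that generic attack (not part of this answer; it re-implements the question's hash portably, assuming a 32-bit unsigned and 8-bit bytes, so behaviour for high-bit characters may differ slightly):

#include <cstdint>
#include <cstring>
#include <iostream>
#include <string>

static uint32_t mash_step(uint32_t hash, uint32_t chars)
{
  return chars ^ ((hash << 5) | (hash >> 27));
}

// Portable re-implementation of RWCStringRef::hash() from the question.
static uint32_t rw_hash(const std::string& s)
{
  uint32_t hv = (uint32_t)s.size();        // mix in the string length
  const char* p = s.data();
  size_t i = s.size() / 4;
  while (i--) {                            // whole 32-bit words
    uint32_t w;
    std::memcpy(&w, p, 4);
    p += 4;
    hv = mash_step(hv, w);
  }
  if ((i = s.size() % 4) != 0) {           // any remaining bytes
    uint32_t h = 0;
    while (i--)
      h = (h << 8) | (unsigned char)*p++;
    hv = mash_step(hv, h);
  }
  return hv;
}

int main()
{
  const uint32_t stored = rw_hash("correct horse");   // the "real" password's hash

  // Generic guessing: try every 4-byte candidate until its 32-bit hash matches.
  // Against a uniformly distributed 32-bit hash this needs about 2^31 tries on
  // average; the colliding candidate is accepted even though it is not the
  // original password.
  for (uint64_t n = 0; n <= 0xFFFFFFFFull; ++n) {
    char buf[4];
    const uint32_t w = (uint32_t)n;
    std::memcpy(buf, &w, 4);
    const std::string candidate(buf, 4);              // may contain unprintable bytes
    if (rw_hash(candidate) == stored) {
      std::cout << "colliding password found after " << (n + 1) << " tries\n";
      return 0;
    }
  }
  return 1;                                           // not reached for this hash
}

Even a few billion evaluations of a hash this cheap take no more than minutes on a single modern core, which is why a 32-bit check value is far too small for password verification.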

Andrey
  • How do you conclude that "on average it will take 2^31 password trials to get such" preimages? –  Aug 22 '14 at 14:38
  • 2
    If output of hash function is uniformly distributed wouldn't it take on average N/2 tries to find a preimage (*N* being a number of possible outputs)? – Andrey Aug 22 '14 at 14:56
  • Yes, if it were uniformly distributed, or if the attacker was forced or opted to use a generic attack that would work even if it were, then N/2 attempts on average is correct. This hash function looks like it may take approximately one trial (since it looks, well, invertible). –  Aug 22 '14 at 16:48
  • @pyramids Yes, I've seen your excellent answer and agree that collision can be found faster for this particular function. IMO, function isn't invertible in the sense that it won't allow recovery of the *original* input (simply because it's lossy), but it seems to allow for fast preimage computation. – Andrey Aug 22 '14 at 16:56
  • @Andrey: No, that's if there were N possible _inputs_. –  Aug 22 '14 at 16:58
  • @RickyDemer Can you explain/link to explanation please? – Andrey Aug 22 '14 at 17:04
  • @Andrey You're right: I have used both terms, "invertible" and "collision", carelessly (copying previous mentions of the latter term). Instead, we should probably use "pre-image" (or "not resistant against a pre-image attack"). –  Aug 22 '14 at 17:07
  • With N possible _inputs_, the number of tries needed before trying the original input is distributed uniformly on {1,2,3,...,N-1,N}, and the mean of that distribution is (N+1)/2. Thus, in that case, the expected number of tries needed to find a preimage is at most (N+1)/2. With N possible _outputs_, each possible try has at least a 1/N chance of being a preimage. If those events are independent, then the expected number of tries will be the sum of i*(1/N)*((1-(1/N))^(i-1)) over positive integers i. By the geometric series formula, that sum is N. –  Aug 22 '14 at 17:55
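
As a worked form of the geometric-series step in that last comment (independent tries, each hitting a given output with probability 1/N), the expected number of tries is just the mean of a geometric distribution:

\[
\sum_{i=1}^{\infty} i \cdot \frac{1}{N}\left(1 - \frac{1}{N}\right)^{i-1}
  \;=\; \frac{1/N}{\left(1 - \left(1 - \frac{1}{N}\right)\right)^{2}}
  \;=\; N .
\]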