
I have some sensitive client data that needs to be hashed, but I also need to check that that data isn’t duplicated by another client. So the hash function needs to produce the same value for the same data so I can search the db for duplicates.

One option is bcrypt with a constant salt, but that isn’t very secure.

Any ideas?

PS: we are hashing a short string that could be thought of as a password for the purposes of this question.

  • That is a totally different question, and now it is less clear. Why do you store the password? There are tons of Q/A about using password hashing algorithms like scrypt, Argon2id, and Balloon hashing for login systems. Do you want to derive keys etc... – kelalaka Feb 11 '20 at 13:19
  • Better if you change your question back to "files" for the benefit of others, and create a new question if you can't find the answer you're looking for. I would be very surprised if you can't find the answer regarding short strings. – Kind Contributor Feb 11 '20 at 23:00
  • @kelalaka the original question didn't mention files, I just added clarification that I'm asking about a string that is sensitive and could be thought of as a password for the purposes of this question. – Channing Walton Feb 12 '20 at 12:02
  • data is a generic term. if it is a password then dupe of this https://security.stackexchange.com/q/211/86735 – kelalaka Feb 12 '20 at 12:06
  • Does this answer your question? [How to securely hash passwords?](https://security.stackexchange.com/questions/211/how-to-securely-hash-passwords) – kelalaka Feb 12 '20 at 16:27
  • No. I need to hash a small string provided by a user in such a way that I can search for duplicates of that hashed string from other users. – Channing Walton Feb 13 '20 at 13:29
  • "but that isn’t very secure" - why is that? – schroeder Feb 13 '20 at 19:57
  • @schroeder I read somewhere that using a constant salt is not a good idea - not sure why, but perhaps it makes things easier to crack if you have access to lots of samples (like the database). – Channing Walton Feb 14 '20 at 20:25

1 Answer


"Secure" for file hashing is very different to "Secure" for password hashing.

When password hashing, you usually have a "small" string, like "password123". When someone is trying to break the password, they go through small strings and get longer until they find one that hashes to the stored value. Bcrypt and other "slower" choices slow down brute-force password breaking by making every guess expensive: the algorithm is more memory/CPU bound, with a sequential chain of hashing cycles, so that GPU optimization doesn't give a significant speed boost.
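To make the "chain of hashing cycles" idea concrete, here is a minimal sketch. Bcrypt is not in the Python standard library, so this swaps in PBKDF2-HMAC-SHA256 from `hashlib` as a stand-in; it only illustrates the iteration chain, not bcrypt's memory behaviour. The function name `slow_hash` and the iteration count are assumptions chosen for illustration.

```python
import hashlib
import os


def slow_hash(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte hash by chaining many HMAC-SHA256 rounds (PBKDF2).

    Stand-in for bcrypt: each guess costs `iterations` sequential rounds.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


salt = os.urandom(16)                    # random salt, stored next to the hash
stored = slow_hash("password123", salt)

# Verifying a login attempt repeats the same work with the stored salt.
print(slow_hash("password123", salt) == stored)   # correct guess matches
print(slow_hash("password124", salt) == stored)   # wrong guess does not
```

Raising `iterations` makes every guess proportionally slower for the attacker and for the server alike, which is the trade-off the paragraph above describes.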

For file hashing, unlike short password strings, the files are relatively huge. There's no practical way that files could be "brute-forced" to find a collision. So an algorithm like bcrypt doesn't add any meaningful security benefit.

Therefore, SHA (and even MD5) are "secure" for file hashing. I would tend to choose a hashing algorithm that's CPU/memory efficient and outputs a long hash, to reduce the chance of a random collision with another file. A recent member of the SHA family of hashing algorithms is probably the best choice.
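As a rough illustration of fast file hashing, the sketch below streams a file through SHA-256 using Python's `hashlib`. The choice of SHA-256, the helper name `sha256_file`, and the 1 MiB chunk size are assumptions for the example, not something the question specifies.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks.

    Streaming keeps memory use constant even for very large files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes give the same 64-character hex string, so the digest can be stored and indexed for duplicate lookups.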

You might also transmit the length of the file along with the hash to further reduce the risk of a collision. However, that might not be valid for your situation, where revealing a file size could be saying too much.
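A small sketch of pairing the length with the hash, assuming SHA-256 and an in-memory byte string; the name `fingerprint` is hypothetical. Two items are treated as the same only if both the length and the digest match.

```python
import hashlib
from typing import Tuple


def fingerprint(data: bytes) -> Tuple[int, str]:
    """Return (length in bytes, SHA-256 hex digest) for the given data."""
    return len(data), hashlib.sha256(data).hexdigest()


a = fingerprint(b"client record A")
b = fingerprint(b"client record A")
c = fingerprint(b"client record B")

print(a == b)   # identical data: same length and same digest
print(a == c)   # different data: the digests differ
```

Note that the length is sent in the clear here, which is exactly the leak the paragraph above warns about.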

(Note: I assume the communication of the file hashes occurs over an encrypted transport like TLS)


The other answer from kelalaka is a great answer, but I don't agree with a couple of points, so to clarify:

1) I don't believe salt is necessary. That's for password hashing, where it stops an attacker from building a single universal rainbow table. However, again, that is only necessary because the password is so short.

2) I don't believe that any form of signing (HMAC) is necessary. For one, that makes hash comparison impossible. Usually a signature accompanies the file bytes; the verifier may hash the file themselves, then check the signature. But also, a hash is already secure enough to disguise the data in the file - that's what it does.

  • Thanks for your answer. I should have been clearer in my question. I need to hash a small string, a bit like a password, rather than a file. I've updated the question. – Channing Walton Feb 11 '20 at 11:33
  • [MD5 ?](https://en.wikipedia.org/wiki/MD5#Applications), [SHA-1](https://shattered.io/). What if the attacker somehow accesses the file and produces a new file with the same hash? I only said that using only one salt is not going to protect you if you are vulnerable to rainbow tables. Why do you think that HMAC makes the comparison impossible? The server has the key to create the HMAC, not the user! I suggested that for the case where the input space is small. If it is small, the attacker can search the whole space against an unkeyed hash function; they cannot do that without access to the HMAC key. – kelalaka Feb 11 '20 at 13:35
  • Note: OP changed the question from file hashing to short-string hashing. – Kind Contributor Feb 11 '20 at 22:57
  • @kelalaka there is no known way to access a file and produce a new file with the same hash. You cannot create rainbow tables for files. "why do you think that HMAC makes the comparison impossible" - I never said that. – Kind Contributor Feb 11 '20 at 22:59
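
For readers following the HMAC discussion in the comments, here is a minimal sketch of the keyed approach applied to the short-string case in the question. It assumes a single secret key held only by the server; `SERVER_KEY`, `dedupe_tag`, `register`, and the in-memory `seen` dictionary are hypothetical names for illustration, and a real key would come from a secrets store rather than source code.

```python
import hashlib
import hmac
from typing import Dict, Optional

# Placeholder only: a real deployment would load a random key from secure config.
SERVER_KEY = b"example-key-kept-only-on-the-server"

# Stand-in for a database column with a unique index on the tag.
seen: Dict[str, str] = {}


def dedupe_tag(secret_key: bytes, value: str) -> str:
    """Return HMAC-SHA256 of the value as hex; same key + same value = same tag."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def register(client: str, value: str) -> Optional[str]:
    """Store the tag for a client; return the earlier client if it is a duplicate."""
    tag = dedupe_tag(SERVER_KEY, value)
    if tag in seen:
        return seen[tag]
    seen[tag] = client
    return None


print(register("client-1", "AB123456C"))   # None: first time this value is seen
print(register("client-2", "AB123456C"))   # client-1: duplicate detected
```

Because the key is fixed, equal inputs still produce equal tags, so duplicates can be found with an ordinary lookup, while someone holding only the database cannot enumerate the small input space without the key.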