Securing hashes of short enumerated values

Question

The system manages and stores sensitive data of short strings.

Because the sensitive data is of an enumerated type with limited set of well known values, the attacker could easily iterate all the possible values to generate a rainbow table and utilize dictionary based attack.

Thus the data must be salted before hashing to countermeasure these threats. Of course the "salt" must also stay secret and be cryptographically secure. The salt must be common for all the records, as the application will do hash lookup on the input, i.e. the hash must be deterministic.

What are best practices to deal with hashing of enumerated data type, including the key management of the "secret salt" and forward secrecy aspects? We have planned to utilize HSM in the hashing to store the secret.

In theory, it should be no different from how you would [securely store a password](https://security.stackexchange.com/questions/211/how-to-securely-hash-passwords/31846#31846). The only difference is that you may want to increase the difficulty factor of the algorithm given the relatively small search space. — Mr. Llama, Jul 09 '18 at 14:54
How small is the search space, and what is the purpose of the data? — AndrolGenhald, Jul 09 '18 at 15:00
@AndrolGenhald brute forcing the search space is matter of seconds — Tuomas Toivonen, Jul 09 '18 at 15:09
@Mr. Llama but with the password salting the salt is always random and stored with the hash. In this case, the salt must be constant and absolutely not stored with the hash. — Tuomas Toivonen, Jul 09 '18 at 15:11
@TuomasToivonen I'm having trouble figuring out what your use case is. If you're ok storing a hash instead of the data, you must not need the data itself as hashes are irreversible. So what are you planning on using the hash for? — AndrolGenhald, Jul 09 '18 at 15:33
If the salt is constant for each record, then it's not a very salty. No matter how many hash iterations, if there's only one salt, then creating a lookup table is feasible. It would only take one pass of the entire search space to crack all of the hashes. — Mr. Llama, Jul 09 '18 at 15:35
@AndrolGenhald The system will consume messages containing the enumerations in plaintext, which are hashed and compared to all hashes stored in the database to find a match — Tuomas Toivonen, Jul 09 '18 at 15:52
@TuomasToivonen Why must salt must be constant and not stored with the hash? — zaph, Jul 09 '18 at 16:55
@zaph obviously so the attacker wouldn't have to just brute force the hashed value space to find a match. — Tuomas Toivonen, Jul 09 '18 at 17:35
If you are using a salt as a key, don't call it a salt, call it a key. Misusing terminology just causes confusion. What you are describing is an HMAC, a hash with a key. — zaph, Jul 09 '18 at 18:51
What do you mean "will consume messages containing the enumerations in plaintext"? Why can't an attacker just grab those messages? Why would they even care about this hashed storage at that point? — jpmc26, Jul 10 '18 at 01:09
Related: see PostgreSQL page about "pseudo_encrypt": pseudo_encrypt(int) can be used as a pseudo-random generator of unique values. https://wiki.postgresql.org/wiki/Pseudo_encrypt it is a kind of encryption without keys and also gives links to similar algorithms but with a key: Skip32, XTEA, and Pseudo encrypt constrained to an arbitrary range. — Patrick Mevzek, Jul 10 '18 at 03:00
So as an attacker, can't I submit a message with each value and then match my messages against everyone else's messages to discover their values? — user253751, Jul 11 '18 at 00:26

AndrolGenhald · Accepted Answer · 2018-07-09T21:52:15.613

The salt must be common for all the records.

This is known as a "pepper", not a salt.

the attacker could easily iterate all the possible values

If the search space is small enough that it could be brute-forced even with a high cost hash like bcrypt, then you're relying entirely on the secrecy of the "pepper" to prevent brute-forcing. In this case, you might as well use an HMAC with a secret key instead. Using an HSM to store a random key and handle the HMAC is about as good as you can get.

Just be aware that because you require the ability to look up the hash, the same data will always have the same hash value. As Kevin's answer goes into more detail about, duplicated hash values may be correlated with other data to leak information.

While it would be ideal for the hashes to be irreversible, that's simply not possible with a small message space. The best you can do is to make sure only the HSM can be used to reverse the hashes. This will help against data leaks, but you must still take care to make sure the application layer can't be brute-forced.

You were faster and wrote it better, but I think you want a salted HMAC. Otherwise you still get same results for same data. — DRF, Jul 09 '18 at 15:55
@DRF I would have agreed a few minutes ago, but TuomasToivonen's latest comment indicates he needs to be able to look up the resulting hash, so the determinism is necessary. — AndrolGenhald, Jul 09 '18 at 16:02
@TuomasToivonen Yes I see it too. Though that really should be added to the OP and not left in a comment as it does change the answer. — DRF, Jul 09 '18 at 16:07

score 2 · Answer 2 · answered Jul 09 '18 at 15:54

2

[This more of a long comment than an answer]

You have tagged the question encryption, but are talking about salts and hashes. As you point out, hash functions are ill-suited to protect a small message-space (typical example: messages are either "Yes" or "No"), so you have to bolt on all these salts and peppers and HMAC keys.

However, proper encryption, specifically block ciphers, are designed to be secure even over a "Yes/No" message space. Is there some reason you can't use actual AES encryption? (for bonus points, store the AES decryption key on the HSM).

answered Jul 09 '18 at 15:54

Mike Ounsworth

57,707
21
150
207

1

Because there must be absolute no way for *no one* to reconstruct the original plaintext. – Tuomas Toivonen Jul 09 '18 at 16:05
1

Ah, then you should clarify that in the question, and maybe remove the [tag:encryption] tag. – Mike Ounsworth Jul 09 '18 at 16:06
1

Although, given the small message space, you'll have a hard time convincing me that someone with access to the salts / peppers / HMAC keys can't reconstruct the original plaintext in a reasonable number of guesses. – Mike Ounsworth Jul 09 '18 at 16:09
@TuomasToivonen If you have both salts and key you will always be able to reconstruct it. That's inherent in the problem. Can't help that. – DRF Jul 09 '18 at 16:27
As an aside saying that block ciphers are secure over a "Yes/No" message space is VERY misleading. I'm sure you would agree that a communication channel between two parties where they encrypted each bit independently properly padded using a secure shared key and AES would not be a good idea. There is (almost) always more to cryptography than the underlying block cipher. – DRF Jul 09 '18 at 16:37
@DRF Yes, crypto is hard. "... with appropriate assumptions about IVs and padding" blah blah. There are libraries that do this properly. Point is: encrypting predictable strings is more of a solved problem than hashing them is. – Mike Ounsworth Jul 09 '18 at 16:43
@DRF Care to elaborate? With a suitable mode and random IV I think it's reasonable to say that proper encryption is "secure" over a 1-bit message space, even though "secure" is a bit vague. – AndrolGenhald Jul 09 '18 at 16:43
@AndrolGenhald With a suitable mode and random IV sure. But that's exactly the point. Ofcourse by adding a mode and an IV you just nicely got rid of the 1-bit message space and also broke the solution for the OP's use (you lose determinism). The point I was trying to get across is as Mike says, Crypto is hard. Thinking "hey I want to protect the answer key to this quiz so I will encrypt each answer with AES cause it's secure for a small message space" is bad, and I doubt everyone will think ahh I should use a salt or random IV even though I'm not encrypting anything long. – DRF Jul 09 '18 at 16:54
@AndrolGenhald Also the OP already says in a comment just here "Because there must be absolute no way for no one to reconstruct the original plaintext." That's not something he can ensure until he throws away the key or salts. Any F(X) which is short enough to be useful will also be brute-forceable if the space of (X) is small enough. – DRF Jul 09 '18 at 16:59

score 2 · Answer 3 · answered Jul 09 '18 at 21:24

I'm going to take the short answer: you can't do what you want. Not really.

You want to store a series of easily guessable values in the database, encrypted, so nobody who breaks your database can know what they are... but you want to be able to search the database for that term. Which means every sensitive 'tag' has to encrypt to the exact same value.

Okay, how about an example.

Name     MedicalStatus
----------------------
Kevin    Dying
Bob      Alive
Charlie  Dying
Diana    Alive
Elaine   Alive
.... followed by 10k more rows of 'Alive' or 'Dying'

... how much more secure is it to have:

Name     MedicalStatus
----------------------
Kevin    dk3jnnd832jj3fd
Bob      cx32d89dh32gf1x
Charlie  dk3jnnd832jj3fd
Diana    cx32d89dh32gf1x
Elaine   cx32d89dh32gf1x
... followed by 10k more rows of either 'dk3....' or 'cx32d...'

You don't have to even "decrypt" the values. You just have to guess one - you did say they were easily guessible, after all - and you cracked every other matching entry in the table. It doesn't matter how overboard you go trying to obscure those values and how 'secure' of technologies you use, it's going to be pretty obvious from the attacker's point of view what's going on.

(Heck, if they're anything like me, they'll view it as an enjoyable puzzle; obligatory https://xkcd.com/1286/) Or, if they're lazy, they'll just create a few records in the database using the app layer to see what their values got stored as.

So, that said... what can you do?

Option A: Ditch the fast query requirement. You can use actual salt (not pepper) and make the entries secure using a reversible encryption algorithm. But the downside is, like you realized, that the searches would have to decrypt each entry during a search.

Option B: Security by Obscurity. Yeah, you know this option's going to be ugly (obscurity security usually is.) But you could create a field, and call it a Hash - but merely XOR its data over the sensitive column to scramble it. When doing searches, you no longer search the field itself, but the XOR combo. It'd mean you could no longer do index seeks (you'd have to settle for index scans) and it's not exactly a strong setup... but it's at least better than having all the same entries having the exact same values.

Option C: Only have the hashes searchable to the person performing the lookup. Aka, hash the entries using a key that's unique to each person. Granted, this means that there can't be any global lookup; but it would at least allow the ability for a user to find their records that match.

Option D: Rethink and Redesign. Seriously, your requirements really aren't jiving here, and I have a feeling you're going to end up making a system that's not secure at all. Your goal here isn't "Make A System That Follows Requirements X and Y As Securely As Possible." Your goal is, "Make a secure system that tries to fulfill requirements X and Y." If you can't achieve the objectives securely, you shouldn't do it at all.

Since OP said "brute forcing the search space is matter of seconds" I assumed it was at least in the 10,000s, but I'm worried you may be right... and even if the search space is in the thousands it's probably not randomly distributed. Good answer. — AndrolGenhald, Jul 09 '18 at 21:54

score 1 · Answer 4 · answered Jul 10 '18 at 00:50

I believe without complicating your life with management of pepper/salt/other hashing mechanisms and since you have indicated you are going to deploy HSM, it is better to go with proper encryption to secure your enumerated values.

HSM is a cryptographic computing hardware module with protected key store works on a well-defined interface protocol and hard to compromise.

Possible symmetric/asymmetric keys you could use are:

Symetric: AES, 3DES, Blowfish, etc
Asymetric: RSA, DSA, Diffie-Hellman, etc

HSM could help you in many ways including generates keys, key transportation, protects key storage, handles key, provides key backup/recovery functions, etc.

Securing hashes of short enumerated values

4 Answers4