Securing hashes of short strings

Question

My understanding is that if I tell you that a given hash is the hash value of a single alpha-numeric character, then you can trivially identify what the source string is, by brute force checking the hashes.

However, if I tell you that the hash is based on a single a/n char again, but this time with a long unknown salt, then you're safe - you can no longer identify the char.

Now, suppose I give you a collection of 62 hashes and tell you that, collectively, those hashes represent the hashes of all 62 distinct a/n chars (a-z,A-Z,0-9) all hashed in the same way, with the same long unknown salt.

Is that secure? Or would that information be sufficient to reverse engineer some or all of the data?

Is this just theoretical or do I smell an [X-Y problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) here? — , Mar 11 '17 at 10:27
:) Yes and no. I'm not trying to solve a problem per se - I'm trying to understand a scenario which currently makes no sense to me. The above question (which I AM also curious about academically in isolation) will inform the understanding of the larger problem. Once I've acquired better understanding of this particular aspect, I will likely ask about the larger problem separately. — Brondahl, Mar 11 '17 at 10:43
@JanDoggen Incidentally, I never said ... I went to ask the original question that lead me to this, and it turns out that it already exists: https://security.stackexchange.com/questions/4830/how-do-some-sites-e-g-online-banks-only-ask-for-specific-characters-from-a-pa — Brondahl, Apr 30 '17 at 15:42
I had been wondering whether the banks might simplying be storing hashes of individual letters ... whether that would actually be secure. Hence the question. — Brondahl, Apr 30 '17 at 15:43
Obviously there would be other issues with doing that ... all the variants on FrequencyAnalysis and KnownText attacks, that come with something that's approximately a substitution cipher — Brondahl, Apr 30 '17 at 15:44

Out of Band · Answer 1 · 2017-03-11T11:50:15.843

I think that yes, it's still safe, depending on how the "hashing" works.

You're not really talking about a salt value, since it is unknown. The scenario you describe fits a MAC better IMO. MACs require the input data (your single character) and a secret (your unknown "salt") and are designed to be safe despite the plaintext input data being known (since they are usually used to authenticate messages, where both the MAC and the plaintext message is known).

To be secure against attack, you can't just use a plain hash, such as

value = sha256(character || secret)

This wouldn't be safe. But if you use a primitive such as HMAC, it should be:

mac = hmac_sha256(character, secret)

Your requirements, if I understood correctly, are that it should be impossible for someone without the secret to determine, given

mac1 = hmac_sha256('0', secret)
mac2 = hmac_sha256('1', secret)
...
mac62 = hmac_sha256('z', secret)

which mac goes with which input character, and what the secret is.

MAC's guarantee that with a given input, such as '0', and mac1, there is no better attack on secret than brute-force, even if the secret is reused many times. So your secret is safe.

The question that remains is whether you can determine which input character belongs to mac1, mac2 etc if you are not in possession of the secret.

I think that's equally impossible, but I can't see how to prove the equivalence of that with the message integrity guarantees MACs make, so even though I don't think I am, I might be wrong.

I have some reasons for my beliefs:

Ideally, for hash functions, if you change a single bit in the message, that should influence all bits of the hash value. If I understand the way hmacs are calculated correctly, then that also goes for the secret, e.g. the secret influences every single bit in the mac output. So if there was a way to determine which mac belonged to which input without the secret, I think that would mean hmac is leaking information about the secret, and there should therefore exist a better than brute-force method to attack the secret.

But really you should ask a cryptographer, so crypto.stackexchange.com might be a better place to get a good answer.

Could you say a few more words replacing "This wouldn't be safe." with "This wouldn't be safe because ___ ___ ____." ? The length-extension attacks and other attacks on simple hashing -- that HMAC is designed to defend against -- don't seem to be an issue for this application. — David Cary, Mar 11 '17 at 16:30
I'm not sure enough about what's possible or not and decided to err on the side of caution, but I was in fact thinking about length extension attacks & co when I wrote that sentence. How about adding a second answer to elaborate on that point, and why it's not a problem? I'd be interested in your explanation. I'll upvote it if I understand it :-) — Out of Band, Mar 11 '17 at 16:40

score 1 · Answer 2 · answered Mar 18 '17 at 17:05

If I understand your requirements correctly,

It should be impossible for an attacker (without the secret key) who sees all 62 messages to be better at guessing which character Alice typed for any particular message than the 1/62 probability of an attacker who hasn't seen any of the messages.
It should be impossible for an attacker (without the secret key) who sees all 62 messages to guess the secret key.
It should be impossible for an attacker without the secret key, who has seen up to 60 of the messages, to actively calculate the correct MAC for either of the remaining 2 letters for which the attacker has not yet seen the MAC.

If I understand your proposed algorithm correctly,

Alice's machine has a single secret key that we assume the attacker does not have.
Every time Alice types in some alphanumeric character (a-z, A-Z, 0-9), the machine uses a cryptographically secure hash function to hash the concatenation of that character with that secret key, and sends the resulting message authentication code to Bob.
Alice never repeats a character (sends at most 62 messages with this particular secret key).
Bob discards all repeats.
Alice doesn't send the next message until after she learns that Bob has received the last message. (It's OK for Alice's computer to keep re-sending the packet containing that message until it hears the acknowledgement from Bob's computer).

This algorithm seems safe to me.

(I wouldn't use the term "salt" in this algorithm to describe that secret key. A cryptographic salt is, by definition, something that we assume the attacker knows, and usually is freshly randomly generated for each item. Neither one is true for this algorithm. ).

Because the string being hashed ( a single character concatenated with the secret key ) is a fixed length, it appears that any cryptographically secure hash will be sufficient in this algorithm -- it's not vulnerable to the length extension attack that HMAC defends against.

If we modify the algorithm so Bob doesn't discard repeats, it makes the modified system vulnerable to an active attacker using a replay attack.

If we modify the algorithm to allow Alice to repeat a character, because the process is deterministic, an eavesdropper Eve can easily detect when Alice repeats a character, because both times Alice types that character, it produces exactly the same MAC. (This makes the modified system vulnerable to known-plaintext attack ). An actual salt, freshly and randomly generated for each message, included in the hash function input, and also transmitted in plain text with that message, could be used to make the process non-deterministic enough to defend against this attack.

If we modify the algorithm so that Alice sends multiple messages before Bob confirms receipt, the modified system is vulnerable to an active man-in-the-middle re-ordering the outstanding messages.

Okay, I *think* I get it :-) Good summary of the requirements, too. — Out of Band, Mar 19 '17 at 00:54

score 1 · Answer 3 · answered Mar 18 '17 at 18:31

You effectively have a substitution cipher and the algorithm is no more secure by substituting a single character with a hash instead of another single character. If someone has to decrypt a single character or hash in isolation, then you could consider it secure, however repeated uses would probably allow frequency analysis and other attack to recover the plaintext.

Securing hashes of short strings

3 Answers3