I'm writing an application that needs to deterministically encrypt some data (two equal plaintexts will produce two equal ciphertexts; this is acceptable and in fact desirable for this application). I'd like to stay as far away from low-level crypto as possible, so I'd like to use NaCl's SecretBox. The docs are very clear that the nonce parameter should never be reused.

Intuitively it makes sense to me that if I do reuse a nonce, but only ever for a given key/plaintext pair, then no information is revealed (other than that the plaintexts are equal, which in this case is desirable), since the attacker already has that exact information on hand. And at 24 bytes, it's considered safe to use random nonces.

So, I'd like to generate my nonce by taking an HMAC of the plaintext using the SecretBox key. My understanding is that an HMAC doesn't reveal any information about the plaintext or the key, and that its output is indistinguishable from random to anyone who doesn't hold the key, so it could safely be stored in the clear for later use when unsealing the SecretBox.

git-crypt (which has a fairly similar use case to my application) does something similar, but using AES in CTR mode, which leads me to believe this approach is likely sound, and that if I'm mistaken, it's due to SecretBox particularities and not the overall concept.

This seems straightforward to me, but I know cryptography can be anything but intuitive, so I'd like to check my understanding. Am I correct in assuming I'm safe generating my SecretBox nonces from an HMAC of the plaintext using the SecretBox key?

kelalaka
fe_alice
  • There is a recent design called [Daence](https://eprint.iacr.org/2020/067) that's based on this principle, but using Poly1305 instead of HMAC and taking rather more care with the details. – Luis Casillas Sep 10 '20 at 22:24
  • @LuisCasillas that uses lots of keys to achieve that. And, one cannot process in parallel. – kelalaka Sep 12 '20 at 18:52
  • @LuisCasillas they also use associated data (AD) to avoid ending up with the same nonce when the messages are the same; for that, the AD must differ. AD may not be part of every protocol. Instead, one can add a string at the end of the message that behaves like AD. – kelalaka Sep 27 '20 at 18:58

1 Answer

Deriving the nonce from the message is one of the options: hash the whole message (or HMAC it, which requires a key). Note that in this case the same message will have the same nonce, and therefore an observer will notice repeated messages.

Keep in mind that nonce collisions between different messages are still possible, but they are not likely to occur. If you use SHA256 (or HMAC-SHA256) you need to truncate the output to 24 bytes, since the nonce size is 24 bytes. If we assume the output is evenly distributed, then by the birthday bound you will hit the same nonce after approximately 2^96 messages with 50% probability.
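As a rough illustration of the truncation step, here is a minimal sketch that uses only the Python standard library. The function name `derive_nonce`, the all-zero placeholder key, and the sample messages are made up for this example; the key is fed to HMAC directly, as the question proposes. The resulting 24 bytes would then be passed as the nonce to SecretBox.

```python
import hashlib
import hmac

NONCE_SIZE = 24  # SecretBox nonce length in bytes


def derive_nonce(key: bytes, plaintext: bytes) -> bytes:
    """Hypothetical helper: HMAC-SHA256(key, plaintext) truncated to 24 bytes."""
    tag = hmac.new(key, plaintext, hashlib.sha256).digest()  # 32-byte tag
    return tag[:NONCE_SIZE]  # keep the first 24 bytes


# Placeholder all-zero key for illustration only; use a real secret key.
key = bytes(32)

n1 = derive_nonce(key, b"db_password=hunter2")
n2 = derive_nonce(key, b"db_password=hunter2")
n3 = derive_nonce(key, b"db_password=hunter3")

# Same key and message give the same nonce; a different message gives a different one.
print(len(n1), n1 == n2, n1 == n3)
```

Because the nonce depends only on the key and the message, encrypting the same value twice gives the same ciphertext, which is the behaviour the question asks for.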

Deriving the nonce from the whole message is similar to SIV mode (SIV stands for “Synthetic IV”). In the future we will see more of AES-GCM-SIV. SIV mode is actually a nonce misuse-resistant authenticated encryption scheme.

Another downside of using the whole message is performance. One has to read the entire message before encrypting it; in other words, there are two passes over the message. This cost is considerable, especially when the messages are big.

There are other options:

  • Random nonce: This requires a good random number generator, which may not be available on every platform, so we dismiss this case. Note that 24-byte random nonces have a negligible chance of colliding.

  • Counter or LFSR: This is advised in NIST Special Publication 800-38D. It makes sure that a nonce never occurs with the same key again. For each encryption, increment the counter (or advance the LFSR). There is one problematic case: during a system failure the current state of the counter/LFSR may not be stored properly, and this can cause nonce reuse. A mitigation is simply to negotiate a new key.

    This also has a drawback: it can give away traffic information that would otherwise be costly for an attacker to collect.

  • Combining random and counter/LFSR: if we split the nonce 12-12 bytes, choosing the first 12 bytes at random and filling the remaining 12 bytes from a counter/LFSR, then we can also mitigate the system failure without changing the key. This still requires a good random number generator. In this way, one can use the same key for 2^96 messages.
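
The combined option in the last bullet could look roughly like the sketch below. The class name `CombinedNonce` and the choice to draw a fresh random prefix each time the generator is created (for example after a restart) are assumptions made for illustration, not something the options above prescribe.

```python
import secrets

PREFIX_SIZE = 12   # random part of the nonce
COUNTER_SIZE = 12  # counter part of the nonce


class CombinedNonce:
    """Hypothetical helper: 12 random bytes followed by a 12-byte big-endian counter."""

    def __init__(self) -> None:
        # Assumption: a new prefix is drawn whenever the generator is created,
        # so a counter state lost in a crash does not repeat an old nonce.
        self._prefix = secrets.token_bytes(PREFIX_SIZE)
        self._counter = 0

    def next(self) -> bytes:
        if self._counter >= 2 ** (8 * COUNTER_SIZE):
            raise OverflowError("counter exhausted; negotiate a new key")
        nonce = self._prefix + self._counter.to_bytes(COUNTER_SIZE, "big")
        self._counter += 1
        return nonce


gen = CombinedNonce()
a, b = gen.next(), gen.next()

# Both nonces are 24 bytes, share the random prefix, and differ in the counter.
print(len(a), a != b, a[:PREFIX_SIZE] == b[:PREFIX_SIZE])
```

Unlike the hash-based nonce, this one is not deterministic, so the same message encrypts to a different ciphertext each time.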

If performance is not an issue, you can hash the message, keeping in mind that the same message will have the same nonce. This is more common than it seems; consider forwarding a picture to different people.

If performance is an issue, the combined mode is the choice.

kelalaka
  • this makes sense! in this case, it's a tool for encrypting values in a configuration file to be tracked in git, so "the same message will have the same nonce and therefore the observer will notice that" is desirable, so that diffs are understandable without decryption. this is a good overview, thanks! just out of curiosity, you mention "SHA256 (or HMAC-SHA256)". I assume without the HMAC construction one would just concatenate the key and the plaintext and hash that? That's what I had considered before reading that other apps use an HMAC for that purpose and assuming they had a good reason – fe_alice Sep 10 '20 at 16:32
  • Yes, it is possible to use `key||message`, but it is not necessary: if the message is the same, then under the same key this gives the same nonce, and if the key is different there is no problem to begin with. Remember that a nonce and key combination should never be used again. – kelalaka Sep 10 '20 at 16:35