0

What, if anything, can be used to provide confidentiality and authenticity guarantees for the storage keys used to store data in a key-value store?

To be clear what I mean by the storage keys - I mean the "key" in "key-value" store - I do not mean an encryption key.

The key-value store is something like Amazon S3 and is not trusted.

The storage keys have to be considered to be just as sensitive as the value objects.

But the storage keys are used to retrieve data from the store so they must also have a construction which is repeatable.

I'm using PHP and libsodium (via ParagonIE's Halite) to encrypt the data and ideally I would use the same library to construct the storage keys.

A related question mentions HMAC, but I don't know if that is useful.


Based on the answers given so far, the following could be a good solution:-

use \ParagonIE\Halite\Symmetric\Crypto.authenticate
use \ParagonIE\Halite\Symmetric\Crypto.encrypt
use \ParagonIE\Halite\Symmetric\Crypto.decrypt

// put
map.put(
  authenticate(storage_key, secret_signing_key),
  encrypt(concat(storage_key, value), secret_encryption_key)
)

// get
ciphertext = map.get(
  authenticate(storage_key, secret_signing_key)
)
plaintext = decrypt(ciphertext, secret_encryption_key)
// ensure that the key that was hashed equals the key that was encrypted
assert(equals(storage_key, substring(ciphertext, 0, len(storage_key)))

return substring(ciphertext, len(storage_key)) // just the data

jah
  • 390
  • 2
  • 10
  • Is not clear your question, but if you question is if sensitive data stored on a key value system (redis, dynamo, etc...) should be encrypted the response is yes. – camp0 Jun 03 '19 at 19:32
  • @camp0 thank you for your comment: I have tried to make the question clearer. Is it clear now? – jah Jun 03 '19 at 19:38
  • What are the specifics for key authenticity? – Future Security Jun 03 '19 at 20:36
  • Should `secret_encryption_key` in `encrypt(concat(storage_key, value), secret_encryption_key)` inside your `map.put` call be `...secret_signing_key...`? – TripeHound Jun 04 '19 at 13:10
  • @TripeHound No, but thank you for prompting a review because I'd copypasted the wrong key into the decrypt call in `//get` – jah Jun 04 '19 at 14:33

2 Answers2

1

If I understood your goal correctly, consider following. Since you want to have "confidentiality and authenticity" of the keys, encrypt them. If you choose an encryption method that is not fast enough, consider using something other instead of map keys and store tuple of (original encrypted key, original value). For instance use a hash of the original key or, if this is critical for you from the security point of view, use a KDF key as a map key. So, consider following approaches:

map.put(encrypt(key), value)

map.put(hash(key), tuple[encrypt(key), value])

map.put(argon2(key), tuple[encrypt(key), value])

Of course the map should be able to deal with collisions.

mentallurg
  • 8,536
  • 4
  • 26
  • 41
  • Thank you for this comment, it helps. I think something along the lines of your second approach is what is needed: `map.put(hmac(k), encrypt(concat(k,v)))`. – jah Jun 04 '19 at 08:23
1

If you're dealing with just one trusted computer and any number of key-value-store servers, then confidentiality is pretty easy. You want to use a PRF with a long enough output to prevent collisions. (HMAC with at least 256 output bits, for example.) The trusted computer can replace writes of each pair (k, v) with (hmac(secret_crypto_key, k), v). Likewise with queries for k.

By definition, PRF output is going to look random to anyone that doesn't know the key. Output is a deterministic function of the (secret) key and message (lookup key). It also isn't possible to determine PRF output without the secret key.

This is a one way operation. For HMAC you can't reverse the process even with knowledge of the secret key. That's not a problem for key-value stores unless you want to enumerate the original lookup keys.

If it is, then deterministic encryption is necessary. That's a very tricky thing to do. I don't think it is directly supported by libsodium. However, encryption can leak the length of the key unless the original lookup keys were padded to all be the same length. PRFs don't require fixed length inputs because all outputs have the same length.


If untrusted parties must be able to make queries without any trusted party, then there isn't a way to maintain confidentiality unless you know all original lookup keys come from a distribution with high min-entropy. (Which means that the most probable lookup key is still very unlikely. Unlikely enough to make guess-and-check attacks impossible.)

This same issue is present in unsalted password hashing. Applying a one way function is sufficient to protect users with unrealistically strong passwords. It doesn't matter for them whether the function used is fast or doesn't use a salt. Users with typical passwords, on the other hand, are even more vulnerable since password crackers only need to hash each candidate password only once to build a lookup table capable of targeting the entire user base.

You have the same problem with key confidentiality if any person can make the conversion between original lookup keys and confidential lookup keys. It doesn't matter if encryption or hashing is used.

One serious and common mistake is believing applying a one way function to data is sufficient to anonymize entries. Examples of low min-entropy values that people try to anonymize this way include names, dates, timestamps, street addresses, email addresses, and credit card numbers. Guess-and-check methods are an effective way to deanonymize things because the universe of possible inputs is too small or too predictable.

Someone like Amazon can build a database of matches to deanonymize lookup keys. There is no good way to work around that for low min-entropy inputs. Not even using a slow hash algorithm is a particularly good hack for this kind of thing.

Future Security
  • 1,701
  • 6
  • 13
  • Thank you @future-security for this detailed answer; I think your first three paras. describe the desired solution. Would you agree that the `hmac()` equivalent in sodium would be `crypto_generichash` https://download.libsodium.org/doc/hashing/generic_hashing#single-part-example-with-a-key ? – jah Jun 04 '19 at 08:51
  • @jah It uses Blake2, which is a really good algorithm. It isn't the same as HMAC but is actually preferable. Blake, Blake2, Skein, SHA-3 (KMAC) are all acceptable algorithms with built in keyed modes which make HMAC unnecessary. Both these keyed modes and HMAC (for older algorithms like SHA-2) implement a PRF. – Future Security Jun 04 '19 at 16:07
  • Regarding the iteration of encrypted keys, your answer says that deterministic encryption is necessary. Is there deterministic encryption that also preserves the lexicographic ordering of the plaintext key? – CMCDragonkai Jun 02 '22 at 00:53