I think I really misunderstand something about data encryption. This guide (https://docs.mongodb.com/manual/core/security-client-side-encryption/#randomized-encryption) says the following:

"Encrypting the personal_information and phone_numbers fields using the randomized encryption algorithm encrypts the entire object. While this protects all fields nested under those fields, it also prevents querying against those nested fields.

For sensitive fields that are used in reading operations, applications must use Deterministic Encryption for improved read support on encrypted fields."

The guide mainly compares a randomized algorithm with a deterministic one. I think I understand the difference: a randomized algorithm produces different outputs for the same input, while a deterministic algorithm always produces the same output for the same input. But why would you want to store anything that you don't want to or can not query later? Why would you then use the randomized algorithm?

Soufiane Tahiri
szeb

1 Answer

But why would you want to store anything that you don't want to or can not query later?

It looks like you are mixing up "querying against those nested fields" with "retrieving data from the database". The first is about running a query to find records that match specific criteria, e.g. getting the user with a specific email address. The second is about being able to read a full object back from the database.

With randomized encryption, each instance of a given email address gets a different encrypted value. That value cannot be predicted without knowing the random initialization vector used when the object was encrypted. Since the expected encrypted value is unknown, it cannot be searched for efficiently. Instead, the whole database would have to be scanned and every object decrypted before the email address inside it could be checked.

With deterministic encryption, by contrast, all instances of the same email address have the same encrypted value. The encrypted value is therefore known in advance, and the database can be searched efficiently by that value, i.e. there is no need to decrypt every object in the database first.
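To make the difference concrete, here is a minimal sketch in Python. It is a toy cipher built from HMAC-SHA256 purely for illustration and is not the algorithm MongoDB uses. The key, the helper names (`encrypt_randomized`, `encrypt_deterministic`, `decrypt`) and the sample email addresses are all assumptions made up for this example. The only point is to show that an equality match on ciphertexts works in the deterministic case, while the randomized case forces a decrypt-everything scan.

```python
import hashlib
import hmac
import os

# Hypothetical data key for this demo; a real deployment keeps keys in a key vault.
KEY = os.urandom(32)


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 in counter mode. Illustration only, not for real use."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_randomized(key: bytes, plaintext: bytes) -> bytes:
    # A fresh random IV per call: the same plaintext yields a different ciphertext each time.
    iv = os.urandom(16)
    return iv + _xor(plaintext, _keystream(key, iv, len(plaintext)))


def encrypt_deterministic(key: bytes, plaintext: bytes) -> bytes:
    # The IV is derived from the plaintext itself: same plaintext, same ciphertext.
    iv = hmac.new(key, b"iv" + plaintext, hashlib.sha256).digest()[:16]
    return iv + _xor(plaintext, _keystream(key, iv, len(plaintext)))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    # Both variants store the IV in the first 16 bytes, so one decrypt covers both.
    iv, body = ciphertext[:16], ciphertext[16:]
    return _xor(body, _keystream(key, iv, len(body)))


# Made-up sample data: two records share the same email address.
emails = [b"alice@example.com", b"bob@example.com", b"alice@example.com"]
det_column = [encrypt_deterministic(KEY, e) for e in emails]
rnd_column = [encrypt_randomized(KEY, e) for e in emails]

target = b"alice@example.com"

# Deterministic: encrypt the search term once and compare ciphertexts directly.
needle = encrypt_deterministic(KEY, target)
det_matches = [i for i, c in enumerate(det_column) if c == needle]

# Randomized: a freshly encrypted search term does not equal any stored ciphertext.
rnd_direct = [i for i, c in enumerate(rnd_column) if c == encrypt_randomized(KEY, target)]

# Randomized: the only way to find matches is to decrypt every stored value.
rnd_scan = [i for i, c in enumerate(rnd_column) if decrypt(KEY, c) == target]

print("deterministic equality match:", det_matches)
print("randomized equality match:   ", rnd_direct)
print("randomized full scan:        ", rnd_scan)
```

Note that both columns can still be retrieved and decrypted in full. What the randomized column gives up is only the ability to find a record by its encrypted value, and in return it does not reveal which records hold the same email address.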

Steffen Ullrich