9

I'm struggling to find good literature on Searchable Encryption. There are of course a few student papers written in LaTeX using Computer Modern that has some nice Greek soups in them, but none with any actual concrete examples. The same goes for videos on YouTube. The article in Wikipedia is scarce at best.

I have yet to determine which algorithm is currently the best (as of May 2018).

Does anybody know what is currently considered the best algorithm in this field that also works in practice? I would also take literature references.

Anders
  • 64,406
  • 24
  • 178
  • 215
HelloWorld
  • 303
  • 2
  • 10
  • How searchable does it need to be? – jrtapsell May 28 '18 at 14:22
  • @jrtapsell The database must be encrypted such that the server cannot decrypt it. If the user sends plaintext to the server containing the keywords, that's fine. I basically need to search for documents containing one or more terms. Order of terms is not hugely important. – HelloWorld May 28 '18 at 21:57
  • This question probably would have been a better fit on crypto.SE, but unfortunately it's too old to migrate. – Mike Ounsworth Aug 11 '18 at 20:51

2 Answers2

3

I was at this talk earlier this year and was impressed by the approach. The product is called EncryptedQuery which attempts to solve the (I assume related) problem of Private Information Retrieval. PIR is probably a stronger requirement than SE since with PIR, even the db server isn't allowed to know what you searched for or which record was returned.

My notes on the talk:

  • Motivation: cases where you want to keep your data access patterns private.
  • PIR is intertwined with the problems of Oblivious Transfer and Secure Multiparty Computation.
  • In 2016 NSA open-sourced a PIR software stack called PIRK
  • In 2017 PIRK project retired. The company Envieta took up the project and renamed it EncryptedQuery
  • It uses Paillier encryption, which is homomorphic in the sense that addition in plaintext space becomes multiplication in ciphertext space.

My understanding of the algorithm (probably wrong, though this made sense at the time) is that the requester encrypts a string

encr_req = encr(0 0 0 0 0 1 0 0 0 0 ...}

where say the kth column containing the 1 is the row number that it wants retrieved. Once this string is encrypted, the server doesn't know which column contains the 1.

Then the server iterates over the database doing

sum_i( encr_req[i] * encr(data[i]) )

Because Paillier is homomorphic and all but one of the plaintext values is 0, this is equivalent to

0*data[0] + 0*data[1] + ... + 1*data[k] + 0*data[k+1] + ...

So when you decrypt, you'll have your result.

decr( sum_i( encr_req[i] * encr(data[i]) ) ) = data[k]

Pros:

  • The server is able to handle retrieving items from the database by ID without knowing which ID was requested.
  • Bandwidth of the response is low: sum_i( encr_req[i] * encr(data[i]) ) is the width of a single database field.
  • Can be extended to request multiple items in the same query.

Cons:

  • Performance: for each request, the server needs to iterate over the entire database, encrypting every entry with the requester's public key.
  • All rows in the database must have the same bit-width (which you want to keep small for performance reasons on the encryption step).
  • Number of entries in the database, and their order, must be fixed and known to the requester.
  • For large or less structured DBs, use a more general (but less secure) Searchable Encryption.

TL;DR I'm not sure this actually answers HelloWorld's question about what's the best searchable encryption :/

Mike Ounsworth
  • 57,707
  • 21
  • 150
  • 207
  • So if I understand this right, this sounds like this is actually a method to hide what data you are returning, not search encrypted data.... or am I missing something? – Nosajimiki Nov 02 '18 at 14:54
  • @Nosajimiki I guess it depends on your definition of "search". My undernstanding is that PIR does "searching" in the sense that it "retrieves the record with a given ID even though the ID field is encrypted". If you're looking for "searching" in the sense of Google-style string matching, then yeah, that's a very different problem. – Mike Ounsworth Nov 02 '18 at 15:01
  • This is interesting, thanks for sharing. BTW if you can find a link to the talk please share it so we can watch as well. – Khaled Al-Ansari May 05 '20 at 17:50
2

It's scarce because searchable encryption is rarely practical.

Generally, it means you are either using a very weak cypher or you are decrypting everything as you go. The first case is bad because you can use the patterns in the encryption to break it with large enough of a sample. The second, more advisable method, is painfully slow because you have to decrypt your whole data set to run a single query. Either way, Searchable Encryption is only really practical with very small data-sets.

One example of where you would use it would be if you want to be able to search a single employee record for keywords. The Employee's ID would be unencrypted; so, you could use that to only query that person's records, then you could pass his whole record set to your application to decrypt. Then search the decrypted data and only output the fields you need from that.

That said, there is a lot of promise with sortable encryption as long as you are doing exact searches. Sortable encryption sets the scope of each new encrypted string between the ones it should sort between; so, lets say the following is true:

7iFA384S4BPmuXokR9rcMI37SKnClqnE = ant
E10ZJbnmvJHs3MOKkzDXw7sd037kLCUJ = cat
miHBVXxATe1Jg6G97ug86zv31BxrpRAa = dog
z0L9f8Py12euq9Nhy9Zk0e9L83F8MiBi = man

If I wanted to add "fox" to the list, then my encryption algorithm would return between "miHBVXxATe1Jg6G97ug86zv31BxrpRAa" and "z0L9f8Py12euq9Nhy9Zk0e9L83F8MiBi" resulting in something like:

7iFA384S4BPmuXokR9rcMI37SKnClqnE = ant
E10ZJbnmvJHs3MOKkzDXw7sd037kLCUJ = cat
miHBVXxATe1Jg6G97ug86zv31BxrpRAa = dog
Pe2624gcRjP6YGWOnhiW2xnRomAjDYQK = fox // sorts alphabetically between dog and man
z0L9f8Py12euq9Nhy9Zk0e9L83F8MiBi = man

This works because the first part of the encrypted string is just sorting information, and the second part is the actual encrypted data

SortingId(Pe2624gcRjP6YG), EncyptedData(WOnhiW2xnRomAjDYQK)

Once you have sorted encryption this means two things, one that you can sort encrypted data just as easily as unencrypted data which is pretty amazing unto itself, but secondly, it means that you can selectively decrypt. I don't personally know what databases actually do/don't support this yet, but in the above list, if I search for "man", and decrypt "dog", then I know the top 2 items are not man so I don't have to decrypt them to search them. This means that the larger your data set, the smaller % of your dataset you need to decrypt to find things.

Nosajimiki
  • 1,799
  • 6
  • 13