Which is the preferred way of encrypting Personally Identifiable Information?

Question

What is the preferred way to implement personal information encryption / decryption? After some reading, the main options appear to be:

Encrypt/Decrypt it at the database level
Encrypt/Decrypt it at the backend level
Encrypt/Decrypt it at the client level

Encrypting at the database level: PostgreSQL offers PGP encryption options, which seem to allow the data to be encrypted while still being indexable for searching, if my understanding of PGP is correct. From what I've read so far, the commercial database vendors all offer this as well. If the encrypted data can be indexed, is it really impossible to reverse-engineer the indexes to recover the data that was encrypted? Also, the encryption / decryption then happens at the database level, which puts overhead on the database server. And what happens to the encryption / decryption keys: do they need to be fed to the database, or do databases handle this themselves? (I'm wondering how these keys should be backed up if the database is managing them.)

Encrypting in the backend before inserting the data into the database, and decrypting it again after reading, seems like a very scalable way of doing this, since adding more backend servers is cheap. If encryption happens in the backend, how does one run database queries, for example to find all names like Jo%_n or all telephone numbers starting with 084? Then there is key management: if someone gets the keys, they can decrypt your database entries, so what are the options for managing these keys? I'm assuming something like AWS KMS is an option here, with access to that KMS instance restricted? Should one store only a master key in KMS, which then decrypts all other keys, so that data can be encrypted with a different key for each company (assuming a multitenant setup)?

Last option: how feasible is it to encrypt data in the browser itself, in which case the backend and database would have zero knowledge of what's in the data? Obviously this means zero chance of allowing searches on the data, unless there's still a way to do this? (I'm assuming password managers do this.) How are the keys managed in this scenario? I'm assuming the same key used for encryption should then be used for decryption too, or is it feasible to have multiple decryption keys for the same data? (If the backend does the encryption with a private key and each user is issued a public key, then decryption can happen client-side? This sounds like a very scalable way to handle lots of database reads with few inserts.)

To sum up, what are the pros and cons of database, server and client side encryption / decryption to protect PII and what are the things to keep in mind for each option?

score 2 · Answer 1 · answered Feb 23 '22 at 04:26

What you've asked for is a hard problem, and the answer is going to be up to you.

First, it should be recognized that merely allowing searching of PII data can itself be a serious security risk. Data breaches have resulted from people abusing query functions to retrieve data they shouldn't. Is there a valid business reason for allowing a search for "J%n", or for phone numbers that start with "8675309"? You're much better off not allowing searches at all, and indexing the encrypted PII data using only a reference key based on the client ID.

If the business reason is "what if a user forgets their ID", a very acceptable solution is email recovery: "enter your email address and if you're in our DB, we'll send you a login link." Internally, you store a table with only one column, containing bcrypt()ed hashes of email addresses (identical to using bcrypt() to protect passwords), and if there's a match you send the ID to the address they entered. That's a perfectly valid approach that doesn't require an actual search of the encrypted PII in your database.
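One practical wrinkle with the recovery lookup: bcrypt as used for passwords salts every hash randomly, so the same email hashes differently each time and you can't find a match without checking the entered address against every row. Here is a minimal sketch that swaps in a deterministic keyed hash (HMAC-SHA256) so the lookup is a single exact match. This is my substitution, not bcrypt, and `LOOKUP_KEY`, `register`, `recover`, and the in-memory dict standing in for the table are all hypothetical names for illustration. I've also assumed the table maps the hash to the client ID so there is something to send back.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real one would live in a secrets
# manager, never in source code.
LOOKUP_KEY = b"example-only-secret-key"


def email_token(email: str, key: bytes = LOOKUP_KEY) -> str:
    """Return a deterministic keyed hash of a normalized email address."""
    normalized = email.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Stand-in for the lookup table: keyed hash of the email -> client ID.
known_emails = {}


def register(email: str, client_id: int) -> None:
    """Store only the keyed hash of the address, never the address itself."""
    known_emails[email_token(email)] = client_id


def recover(email: str):
    """Return the client ID to mail to the entered address, or None."""
    return known_emails.get(email_token(email))
```

The user-facing response should be the same whether or not `recover` finds a match, otherwise the form itself becomes a way to test which addresses are in your database.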

If you decide that there is a case where you allow searching, figure out what data can be safely searched, under what conditions, and determine "who" is authorized to search for "what". That should help you identify which fields should be encrypted, and which fields need to be in cleartext to serve as searchable terms. For example, you might decide it's OK to have a postal code be searchable so your marketing team can do geographical mapping of clients, while still protecting names, phone numbers, birthdays, or other PII. But be warned: researchers are quite good at "deanonymizing" this kind of data - the more you expose, the less it will be secure. (A quick search for deanonymizing will turn up a large number of papers where this has been successfully done.)
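To make the "who can search what" decision concrete, it can help to write it down as an explicit allow-list that the search code has to consult. A minimal sketch, assuming made-up role and field names (`marketing`, `support`, `postal_code`, `client_id`); yours will differ:

```python
# Hypothetical roles and searchable fields, for illustration only.
# Anything not listed here is treated as encrypted and unsearchable.
SEARCH_POLICY = {
    "marketing": {"postal_code"},
    "support": {"postal_code", "client_id"},
}


def may_search(role: str, field: str) -> bool:
    """Return True only if this role is explicitly allowed to search this field."""
    return field in SEARCH_POLICY.get(role, set())


def check_search(role: str, fields: list) -> None:
    """Raise PermissionError if any requested field is not allowed for the role."""
    denied = [f for f in fields if not may_search(role, f)]
    if denied:
        raise PermissionError(f"{role!r} may not search: {', '.join(denied)}")
```

The point of the default-deny shape is that a new field or a new role gets no search access until someone deliberately adds it.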

As you noted, database encryption allows for searching, but comes with the risk that someone who is authorized to access the database may be able to access the entire database. The plus side is that the encryption in the database should be coded to prevent exposing the key, and its security has likely been well-tested in the case of most of the popular SQL engines.

Encrypting the data at the backend level moves the encryption keys to a different system. It also will likely require custom coding, which could easily introduce subtle flaws in the encrypted data that may have unintended security consequences. But, it may better serve as the rules engine for "what gets encrypted, what can be searched". A backend can also host various security measures, such as "rate limits" (also known as "anchor dragging") to help prevent attackers from executing thousands of searches per second to extract all the data over time.
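For the rate-limit idea, here is a rough sliding-window sketch of what the backend could do per caller. The class name, the parameters, and the injectable `clock` are my own choices for illustration, not from any particular framework, and a real deployment with several backend servers would need shared state rather than an in-process dict.

```python
import time
from collections import defaultdict, deque


class SearchRateLimiter:
    """Allow at most max_requests searches per caller in any sliding window."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # caller_id -> timestamps of allowed searches

    def allow(self, caller_id: str) -> bool:
        """Return True and record the search if the caller is under the limit."""
        now = self.clock()
        hits = self._hits[caller_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Usage would be something like `limiter = SearchRateLimiter(5, 60.0)` and then refusing the search whenever `limiter.allow(user_id)` returns `False`.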

Encrypting at the client solves the problems of your systems having access to the client's data. The client's keys have to be protected, but most mobile OSes have a facility for secure key generation and storage. If you're going that far, however, why bother having the sensitive data stored server-side at all? If the client's app stores their encrypted PII locally, a breach of your servers won't even have the sensitive data. For example, the client could send the shipping address to your systems only when they click "buy", and you could discard it as soon as you receive a tracking number from your logistics provider.
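The "discard it as soon as you receive a tracking number" flow could look roughly like this on the server side. `OrderStore` and its method names are hypothetical, and the dict is a stand-in for whatever storage you actually use; the sketch only shows the lifecycle of the address, not persistence.

```python
class OrderStore:
    """Holds a shipping address only between purchase and tracking-number receipt."""

    def __init__(self):
        self._orders = {}

    def place_order(self, order_id: str, shipping_address: str) -> None:
        """Called when the client clicks "buy" and sends the address."""
        self._orders[order_id] = {
            "shipping_address": shipping_address,
            "tracking_number": None,
        }

    def record_tracking(self, order_id: str, tracking_number: str) -> None:
        """Store the tracking number and drop the address in the same step."""
        order = self._orders[order_id]
        order["tracking_number"] = tracking_number
        order["shipping_address"] = None

    def get(self, order_id: str) -> dict:
        """Return a copy of the order record."""
        return dict(self._orders[order_id])
```

Keep in mind that the address also has to stay out of logs and backups for this to hold up, which the sketch does not cover.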

It all comes down to your business needs, your organization's security capabilities, and your organization's appetite for risk.

I suppose it then comes down to what is considered PII for each region. If it's PII, it can't be searched which makes perfect sense. Thanks for the great answer! — Jan Vladimir Mostert, Feb 23 '22 at 08:27
