Model for multi-tenant data encryption where the application owners or cloud providers cannot see decrypted data

Question

Consider an application in which users will install on-premise agents that communicate with a cloud-hosted service (AWS in this case). The users can interact with the cloud service to configure and assign work to the on-premise agents.

The cloud service will be multi-tenant, and will store some sensitive customer data in the cloud. This data will be passed to the customer on-premise agents which will need to decrypt and use the data. The problem is how to ensure that sensitive data that we are storing cannot be read by us, the cloud provider or any other customer.

Our current thinking is that the customer will generate a public/private key pair, upload the public key to the cloud service, but keep the private key local so it will never enter our cloud infrastructure. So the public key can be used by the cloud service to encrypt data (or to encrypt a symmetric key which encrypts the data), but without the private key it cannot be decrypted.

The agents would obviously need access to the private key to decrypt the data, but nobody could decrypt the data in the cloud, because the private key stays with the customer.

Does this sound like a reasonable approach?

This seems like a problem that should already have a solution. Is there a library or cloud service that would help with this?

I've looked at AWS KMS, but it doesn't seem to support a scenario that would prevent us from gaining access to the customer's data, as we could assign ourselves a role that has access to their customer master key.

EDIT: I'm still trying to get an understanding of this, so apologies if I'm not being clear or not fully understanding your answer.

To clarify, a user will be entering sensitive data such as passwords into a web application which will be cloud hosted. Client-side encryption is an option; the cloud service never needs to see the data unencrypted.

Could we store the public key for each customer in the cloud, and have the browser request that key (based on info from their auth token) when the user needs to enter sensitive data, then encrypt in the browser before passing the data to the cloud service?

Again, the cloud service will only need to store this data and will never need to see it in its decrypted form.

It will need to pass the encrypted info to agents running outside of the cloud infrastructure, which will need to use the decrypted data. So when setting up an agent, we'd somehow need to register the private key with each agent so that it can decrypt the data.

This is basically how any 'zero-access' cloud service works. ProtonMail, BoxCryptor, and SpiderOak are good examples of these. — mti2935, Nov 13 '20 at 19:38

score 2 · Answer 1 · answered Nov 14 '20 at 02:41

From what I can gather, in your process data are placed in the care of the cloud-hosted service in unencrypted form (i.e. they may arrive over a TLS socket, but are then delivered to the hosted service without encryption). The point being, clear-text data are exposed in the hosting service at some point in time, so you can never guarantee to your customers that you can't decrypt the data, because you've had the plaintext at some point, however briefly.

If this is not the case, then you're using end-to-end encryption, and focussing your attention in the wrong place. Instead, you need to think about how your customer can upload encrypted work-data that can only be brokered to the on-premise agent by the cloud service, but not decrypted.

In your proposed model, there is potentially nothing to stop me, an independent third party, from generating agent data that could be interpreted as coming from a valid source (assuming I can get hold of the agent's public key). This is why authenticated exchange protocols exist, where both parties share their public key(s), so that the sender can be satisfied that only the recipient can decrypt the symmetric data key, and the recipient can be satisfied that the data came from the sender.
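A rough sketch of the "recipient can be satisfied that the data came from the sender" half, assuming Python with the `cryptography` package and Ed25519 signatures. Neither is named in the question, and `sign_work_item` and `agent_accepts` are hypothetical helper names. The agent only accepts a payload whose signature verifies against a sender public key it already trusts:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def sign_work_item(sender_signing_key, payload: bytes) -> bytes:
    """Sender side: sign the (already encrypted) payload before upload."""
    return sender_signing_key.sign(payload)


def agent_accepts(trusted_sender_public_key, payload: bytes, signature: bytes) -> bool:
    """Agent side: accept the payload only if the signature verifies."""
    try:
        trusted_sender_public_key.verify(signature, payload)
        return True
    except InvalidSignature:
        return False
```

In this sketch a third party who only knows the agent's public key can still encrypt something for the agent, but cannot produce a signature that verifies against the customer's signing key.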

As you've hinted at in your question, most data sharing in public-key cryptography uses hybrid schemes, whereby asymmetric encryption is used to secure a randomly selected symmetric key, and the symmetric key is used to encrypt the data. Some relevant information is available on crypto.SE, e.g. kelalaka 2020 https://crypto.stackexchange.com/a/81704/77149.
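A minimal sketch of such a hybrid scheme, assuming Python's `cryptography` package with RSA-OAEP wrapping a one-off AES-256-GCM key. The algorithm choices and the function names `encrypt_for_customer` and `decrypt_on_agent` are illustrative assumptions, not something the question prescribes:

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# OAEP parameters must match on the encrypting and decrypting side.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def encrypt_for_customer(customer_public_key, plaintext: bytes) -> dict:
    """Encrypt with a fresh symmetric key, then wrap that key with the public key."""
    data_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # 96-bit nonce, never reused because the key is one-off
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    wrapped_key = customer_public_key.encrypt(data_key, OAEP)
    return {"wrapped_key": wrapped_key, "nonce": nonce, "ciphertext": ciphertext}


def decrypt_on_agent(customer_private_key, envelope: dict) -> bytes:
    """Unwrap the symmetric key with the private key, then decrypt the data."""
    data_key = customer_private_key.decrypt(envelope["wrapped_key"], OAEP)
    return AESGCM(data_key).decrypt(envelope["nonce"], envelope["ciphertext"], None)


# Usage: the key pair is generated on the customer side; only the public half is uploaded.
customer_private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
envelope = encrypt_for_customer(customer_private_key.public_key(), b"db-password")
recovered = decrypt_on_agent(customer_private_key, envelope)
```

The service would store only the `envelope` fields, none of which can be opened without the private key.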

One such example is libsodium's crypto_box construct. (However, any governance or standards you need to comply with might restrict the cryptographic primitives you're allowed to use, which may in turn preclude the use of this library.)

To achieve this, both parties exchange their public keys 'somehow', and then use the combination of their own secret key and the counter-party's public key to derive a shared secret.
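The derivation can be sketched as follows. This is not libsodium's crypto_box itself: as a stand-in it assumes Python's `cryptography` package, with X25519 for the key agreement, HKDF-SHA256 to turn the raw result into a key, and AES-GCM for the data. The function names and the `info` label are hypothetical:

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF


def derive_shared_key(own_private_key, peer_public_key) -> bytes:
    """Combine our secret key with the peer's public key into a 32-byte key."""
    raw_secret = own_private_key.exchange(peer_public_key)
    # Run the raw X25519 output through HKDF before using it as a cipher key.
    return HKDF(
        algorithm=hashes.SHA256(),
        length=32,
        salt=None,
        info=b"example work-data key",  # arbitrary label chosen for this sketch
    ).derive(raw_secret)


def seal(shared_key: bytes, plaintext: bytes):
    """Encrypt under the shared key; returns (nonce, ciphertext)."""
    nonce = os.urandom(12)
    return nonce, AESGCM(shared_key).encrypt(nonce, plaintext, None)


def open_sealed(shared_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Decrypt and authenticate; raises InvalidTag if the key or data is wrong."""
    return AESGCM(shared_key).decrypt(nonce, ciphertext, None)


# Each side generates its own key pair and hands over only the public half.
user_key = X25519PrivateKey.generate()
agent_key = X25519PrivateKey.generate()
nonce, ciphertext = seal(derive_shared_key(user_key, agent_key.public_key()), b"db-password")
recovered = open_sealed(derive_shared_key(agent_key, user_key.public_key()), nonce, ciphertext)
```

Both sides arrive at the same key without it ever being transmitted, and a cloud service that only relays the public keys and ciphertext cannot derive it.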

If you're doing the encryption on the cloud service (and not on the client's work computer), you now have the client's secret key sitting exposed in the context of the cloud service, which is why AWS KMS et al. exist. These services ensure that the secret keys are better protected from other processes in the host's context. If you did use KMS to secure the data encryption key, you would have to hit KMS every time you wanted to decrypt something. Instead, you might use KMS to secure a longer-term key, and then use this longer-term key to secure keys for ad hoc data operations. I discuss this in a separate answer.
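A minimal sketch of that key hierarchy, assuming Python's `cryptography` package. `LocalKmsStandIn` is a hypothetical in-process substitute for a real KMS (it makes no AWS calls) and exists only to count how often the "KMS" is contacted; `encrypt_item` and `decrypt_item` are likewise illustrative names:

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.keywrap import aes_key_unwrap, aes_key_wrap


class LocalKmsStandIn:
    """Hypothetical stand-in for a KMS: the root key never leaves this object."""

    def __init__(self):
        self._root_key = os.urandom(32)
        self.calls = 0  # number of round trips a real KMS would have seen

    def wrap(self, key: bytes) -> bytes:
        self.calls += 1
        return aes_key_wrap(self._root_key, key)

    def unwrap(self, wrapped_key: bytes) -> bytes:
        self.calls += 1
        return aes_key_unwrap(self._root_key, wrapped_key)


def encrypt_item(long_term_key: bytes, plaintext: bytes) -> dict:
    """Encrypt one item under a fresh data key wrapped by the long-term key."""
    data_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    return {
        "wrapped_data_key": aes_key_wrap(long_term_key, data_key),
        "nonce": nonce,
        "ciphertext": AESGCM(data_key).encrypt(nonce, plaintext, None),
    }


def decrypt_item(long_term_key: bytes, item: dict) -> bytes:
    """Unwrap the item's data key locally, without contacting the KMS."""
    data_key = aes_key_unwrap(long_term_key, item["wrapped_data_key"])
    return AESGCM(data_key).decrypt(item["nonce"], item["ciphertext"], None)
```

Here the KMS is contacted once to wrap the long-term key for storage and once to unwrap it at start-up; every per-item operation after that uses the unwrapped long-term key held in memory.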

Please see my edit. It sounds like the second paragraph of your answer is true — Neil, Nov 16 '20 at 09:04
