TL;DR: Generate a data-key pair, encrypt the private part with the public key of all users that have write access, encrypt the public part with the public key of all users that have read access.
Let's tackle this one by one:
- Data must be securely encrypted in the database
This is to secure against attackers, and mainly for the users to know that even the staff cannot access their data, hence the keys must not be accessible by the tech team.
Given this requirement, the most important property you need to consider is that under no circumstances that the server can obtain the information necessary to encrypt or decrypt the data. This implies that all encryption/decryption must happen on the client side. Since web-based system is inherently insecure when you need to do end-to-end encryption, due to the ability of the server to inject JavaScript code-on-demand; the more security conscious users would want to control the client software used to access the service, so they would want this implemented as a desktop application.
- Data is scoped to user accounts
- The owner of the encrypted data must be able to give access to his data to other users
These two constraints means that multiple users will need to be able to decrypt the data. This means that the secret to decrypt the data needs to be shared to the other users.
- Other users can be revoked from this given access
Meaning that, if we consider the private/public key solution we must be able to delete the private key that was given to the user being revoked.
To revoke access you need to reencrypt the data with a new key. As other answers have discussed, you cannot enforce forgetfulness.
The best way to describe this, is perhaps by way of an example.
Notations:
P(x)
is the private key named x.
Q(x)
is the matching public key for x.
e = E(d, Q(x))
means e
is the result of encrypting plaintext d
with public key x
.
d = D(e, P(x))
means d
is the result of decrypting the ciphertext e
with private key x
.
Suppose that Alice wants to share data to Bob, Charlie, and Dave. Alice wants to allow Bob to be able to read and write the data, Charlie can read the data but not produce a valid data, and Dave can only write but not decrypt what others have written (essentially it is a drop folder to Dave).
All users have user-key pairs. P(Alice)
, Q(Alice)
is Alice's user-key pair; P(Bob)
, Q(Bob)
is Bob's user-key pair; P(Charlie)
, Q(Charlie)
is Charlie's user-key; and P(Dave)
, Q(Dave)
is Dave's user-key pair.
The system has a registry of user-key where users can share the public part of their user-key. How a user can securely retrieve and authenticate another user's user-key is beyond the scope of this answer and is left as an exercise to the reader. Most users may simply put some faith on the access restrictions that you put on your server, but the more security-conscious users would need to do something similar to a GPG key signing party;.
All users are expected to keep the private part of their user-key a secret for themselves. How to do this in details is beyond the scope of this answer, but you definitely don't want to store the private user-key in the server unencrypted. Instead what I suggest can encrypt the user-key with a symmetric key derived from the user password and a salt, then store the encrypted user-key and the salt on the server.
To store the data "Hello World" securely, Alice starts by generating a data-key pair: P(data)
, Q(data)
. Alice then encrypts the data with the data-key public key:
plaintext = "Hello World"
ciphertext = E(plaintext, Q(data))
Given the properties of public key cryptography, we know that ciphertext
can only be decrypted by someone that knows P(data)
. (Note that the notion of private and public for a data-key is just a matter of convention, both P(data)
and Q(data)
must be kept private from everyone that doesn't need them, like the server)
Alice wants to allow Bob and Charlie to be able to read this data, so Alice retrieves Bob's and Charlie's public key Q(Bob)
and Q(Charlie)
and encrypts P(data)
with them, additionally to allow Alice to decrypt the file in the future, possibly from a different machine, Alice does the same operation with her own public key:
alice_read_key = E(P(data), Q(Alice))
bob_read_key = E(P(data), Q(Bob))
charlie_read_key = E(P(data), Q(Charlie))
Alice wants to allow Bob and Dave to be able to write data that can be read by Alice, Bob, and Charlie. Alice also wants to be able to update the data in the future. To be able to do this, Alice encrypts the public data-key Q(data)
using Q(Alice)
, Q(Bob)
, and Q(Dave)
:
alice_write_key = E(Q(data), Q(Alice))
bob_write_key = E(Q(data), Q(Bob))
charlie_write_key = E(Q(data), Q(Charlie))
Alice then sends all of encrypted_key
, alice_read_key
, bob_read_key
, charlie_read_key
, alice_write_key
, bob_write_key
, and charlie_write_key
to the server.
Since the server/attacker is never in possession of P(data)
or Q(data)
and since the server also do not have the private key to decrypt any of the read_keys
, the server would not be able to decrypt ciphertext
.
When Charlie wants to retrieve the data, what he does is he needs to download both ciphertext
and charlie_read_key
and decrypts charlie_read_key
with his private user-key to obtain P(data)
and then use P(data)
to decrypt ciphertext
:
P(data) = D(charlie_read_key, P(Charlie))
plaintext = D(ciphertext, P(data))
Now Charlie is in possession of plaintext
. However, as Charlie does not have a write-key, he does not have a Q(data)
, so he would not be able to update data in the system in a way that others would be able to successfully decrypt.
Next, Dave needs to be able to add to the data. He cannot read the ciphertext
but he can append to it by decrypting his write-key to get the Q(data):
new_plaintext = "New Data"
Q(data) = D(dave_write_key, P(Dave))
new_ciphertext = E(new_plaintext, Q(data))
updated_ciphertext = ciphertext + new_ciphertext
Now Dave can send updated_ciphertext to the server.
(Note that in most asymmetric encryption algorithms, you cannot simply concatenate two ciphertexts and expect to be able to decrypt it, so you may need to store some metadata that keeps the ciphertext blocks separate and decrypt them separately)
This leaves us with only revocation. To revoke access, you need to have at least P(data)
to decrypt the ciphertext
back to plaintext
, generate a new data-key pair: P'(data)
, Q'(data)
, and reencrypt the plaintext with the new data-key pair:
plaintext = D(ciphertext, P(data))
new_ciphertext = E(plaintext, Q'(data))
and then you'll need to update everyone's write-keys and read-keys.
To add a new user to an existing file, all you need to do is just create their write-key and read-key. Only people that themselves can decrypt their read-key can extend a read-key to a new user, and only people that themselves can decrypt their write-key can extend a write-key to a new user.
If you do not need the fine-grained permission permission in this system, (IOW, if all users that can read the data can also update it); or if you use other ways to enforce fine-grained permissions, then you can replace the asymmetric data-key with a symmetric data-key (Trivia: the system with symmetric data-key would be similar to how multi-recipient PGP-encrypted email works; so you may want to investigate that).