23

I need suggestions for designing my database architecture (in the context of a web application), specifically regarding its encryption, knowing that the following requirements must be respected:

1- Data must be securely encrypted in the database

This is to secure against attackers, and mainly for the users to know that even the staff cannot access their data, hence the keys must not be accessible by the tech team.

2- Data is scoped to user accounts

(meaning: each user has their own set of data, linked to their user ID)

Therefore I thought of using the user's password as the encryption key, but this causes one problem: when the owner of the data decides to change their password, the data must be re-encrypted, and this would demand too much server power.

3- The owner of the encrypted data must be able to give access to his data to other users

(meaning: there is an invitation system, and part or all of a user's data can be accessed by other invited users)

This makes it impossible to use the user's password to encrypt the data, because we don't want to share our password.

So I thought about private/public key encryption, but the private key has to be stored somewhere. Storing it in the database renders the whole encryption useless; and storing it on the client side is not possible either, because it would limit access to the application to the computer(s) where the private key is installed.

4- Other users can be revoked from this given access

Meaning that, if we consider the private/public key solution, we must be able to delete the private key that was given to the user being revoked.

Any suggestion on how to architect such a system, or any idea I could draw inspiration from, is greatly welcomed. Thanks


Update

It seems that, up to now, the best approach would be to encrypt the data with an asymmetric key (I call it the data-key), then to encrypt the private part of the data-key with a symmetric key (which is the user's password).

It seems a good solution; however there are several issues I can think of:

  • When a user logs in, his clear password must be kept in memory on the server side while the session is open, because we need it to decrypt data on each request. This is a security hole, because a hacker could access all open sessions and their users' passwords stored in clear.

  • When the data is shared (i.e. an owner gives access to an invitee), the data-key is decrypted using the clear password of the owner, then encrypted using the clear password of the invitee. The problem is that the owner and the invitee are not necessarily logged in at the same time, hence the server won't know the clear password of the invitee at the time of the invitation, and won't be able to encrypt the data-key.

  • When a user loses his password and requests a new one, he loses all his data, which can no longer be decrypted.

Benj
  • I seem to recall MIT working on some secure medical system lately ... Note, you can solve the password-change problem by using the password to encrypt the real private key. I don't know about 3 and 4. –  Jun 15 '15 at 06:32

4 Answers

18

TL;DR: Generate a data-key pair, encrypt the private part with the public key of every user that has read access, and encrypt the public part with the public key of every user that has write access.


Let's tackle this one by one:

  1. Data must be securely encrypted in the database

This is to secure against attackers, and mainly for the users to know that even the staff cannot access their data, hence the keys must not be accessible by the tech team.

Given this requirement, the most important property to consider is that under no circumstances can the server obtain the information necessary to encrypt or decrypt the data. This implies that all encryption and decryption must happen on the client side. A web-based system is inherently insecure for end-to-end encryption, because the server can inject JavaScript code on demand; the more security-conscious users would want to control the client software used to access the service, so they would want this implemented as a desktop application.

  2. Data is scoped to user accounts
  3. The owner of the encrypted data must be able to give access to his data to other users

These two constraints mean that multiple users will need to be able to decrypt the data. This means that the secret needed to decrypt the data has to be shared with the other users.

  4. Other users can be revoked from this given access

Meaning that, if we consider the private/public key solution we must be able to delete the private key that was given to the user being revoked.

To revoke access you need to reencrypt the data with a new key. As other answers have discussed, you cannot enforce forgetfulness.

The best way to describe this, is perhaps by way of an example.


Notations:

  • P(x) is the private key named x.
  • Q(x) is the matching public key for x.
  • e = E(d, Q(x)) means e is the result of encrypting plaintext d with the public key Q(x).
  • d = D(e, P(x)) means d is the result of decrypting the ciphertext e with the private key P(x).

Suppose that Alice wants to share data with Bob, Charlie, and Dave. Alice wants Bob to be able to read and write the data; Charlie can read the data but not produce valid data; and Dave can only write, but not decrypt what others have written (essentially it is a drop folder for Dave).

All users have user-key pairs. P(Alice), Q(Alice) is Alice's user-key pair; P(Bob), Q(Bob) is Bob's user-key pair; P(Charlie), Q(Charlie) is Charlie's user-key pair; and P(Dave), Q(Dave) is Dave's user-key pair.

The system has a registry of user-keys where users can share the public part of their user-key. How a user can securely retrieve and authenticate another user's user-key is beyond the scope of this answer and is left as an exercise to the reader. Most users may simply put some faith in the access restrictions that you put on your server, but the more security-conscious users would need to do something similar to a GPG key signing party.

All users are expected to keep the private part of their user-key secret. How to do this in detail is beyond the scope of this answer, but you definitely don't want to store the private user-key on the server unencrypted. Instead, I suggest encrypting the private user-key with a symmetric key derived from the user's password and a salt, then storing the encrypted user-key and the salt on the server.
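As a rough illustration of the password-derived wrapping described above, here is a minimal Python sketch using only the standard library. The function names, the iteration count, and the XOR "cipher" are assumptions made for the example, not part of the answer; a real client would feed the derived key into an authenticated cipher from a vetted library, and would run this on the client side so the password never reaches the server.

```python
import hashlib
import secrets

PBKDF2_ROUNDS = 200_000  # assumed work factor; tune for your hardware


def _derive(password: str, salt: bytes, length: int) -> bytes:
    """Derive `length` bytes from the password and salt (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ROUNDS, dklen=length
    )


def wrap_private_key(private_key: bytes, password: str) -> tuple:
    """Return (salt, wrapped_key); both values can be stored on the server."""
    salt = secrets.token_bytes(16)  # fresh salt for every wrap
    pad = _derive(password, salt, len(private_key))
    # Toy cipher: XOR with the derived bytes. This gives no integrity
    # protection; it only stands in for a real authenticated cipher.
    return salt, bytes(a ^ b for a, b in zip(private_key, pad))


def unwrap_private_key(salt: bytes, wrapped: bytes, password: str) -> bytes:
    """Recover the private key from its wrapped form."""
    pad = _derive(password, salt, len(wrapped))
    return bytes(a ^ b for a, b in zip(wrapped, pad))


def change_password(salt: bytes, wrapped: bytes, old: str, new: str) -> tuple:
    """Re-wrap the same private key; no stored data needs re-encrypting."""
    return wrap_private_key(unwrap_private_key(salt, wrapped, old), new)
```

Note that `change_password` only touches the wrapped user-key, which is why a password change does not force re-encryption of the data itself.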

To store the data "Hello World" securely, Alice starts by generating a data-key pair: P(data), Q(data). Alice then encrypts the data with the data-key public key:

plaintext = "Hello World"
ciphertext = E(plaintext, Q(data))

Given the properties of public key cryptography, we know that ciphertext can only be decrypted by someone that knows P(data). (Note that the notion of private and public for a data-key is just a matter of convention, both P(data) and Q(data) must be kept private from everyone that doesn't need them, like the server)

Alice wants to allow Bob and Charlie to read this data, so Alice retrieves Bob's and Charlie's public keys Q(Bob) and Q(Charlie) and encrypts P(data) with them. Additionally, to allow herself to decrypt the file in the future, possibly from a different machine, Alice does the same operation with her own public key:

alice_read_key = E(P(data), Q(Alice))
bob_read_key = E(P(data), Q(Bob))
charlie_read_key = E(P(data), Q(Charlie))

Alice wants to allow Bob and Dave to be able to write data that can be read by Alice, Bob, and Charlie. Alice also wants to be able to update the data in the future. To do this, Alice encrypts the public data-key Q(data) using Q(Alice), Q(Bob), and Q(Dave):

alice_write_key = E(Q(data), Q(Alice))
bob_write_key = E(Q(data), Q(Bob))
dave_write_key = E(Q(data), Q(Dave))

Alice then sends all of ciphertext, alice_read_key, bob_read_key, charlie_read_key, alice_write_key, bob_write_key, and dave_write_key to the server.

Since the server/attacker is never in possession of P(data) or Q(data), and since the server also does not have the private keys needed to decrypt any of the read-keys, the server would not be able to decrypt ciphertext.

When Charlie wants to retrieve the data, he downloads both ciphertext and charlie_read_key, decrypts charlie_read_key with his private user-key to obtain P(data), and then uses P(data) to decrypt ciphertext:

P(data) = D(charlie_read_key, P(Charlie))
plaintext = D(ciphertext, P(data))

Now Charlie is in possession of plaintext. However, as Charlie does not have a write-key, he does not have a Q(data), so he would not be able to update data in the system in a way that others would be able to successfully decrypt.

Next, Dave needs to be able to add to the data. He cannot read the ciphertext but he can append to it by decrypting his write-key to get the Q(data):

new_plaintext = "New Data"
Q(data) = D(dave_write_key, P(Dave))
new_ciphertext = E(new_plaintext, Q(data))
updated_ciphertext = ciphertext + new_ciphertext

Now Dave can send updated_ciphertext to the server.

(Note that in most asymmetric encryption algorithms, you cannot simply concatenate two ciphertexts and expect to be able to decrypt it, so you may need to store some metadata that keeps the ciphertext blocks separate and decrypt them separately)
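One possible form for that metadata is a simple length prefix in front of every ciphertext block. The sketch below is only an assumption about how such framing could look (the helper names `append_block` and `split_blocks` are made up for the example); the blocks themselves are treated as opaque bytes produced elsewhere.

```python
import struct


def append_block(stored: bytes, new_ciphertext: bytes) -> bytes:
    """Append one ciphertext block, prefixed with its 4-byte big-endian length."""
    return stored + struct.pack(">I", len(new_ciphertext)) + new_ciphertext


def split_blocks(stored: bytes) -> list:
    """Recover the individual ciphertext blocks from the framed byte string."""
    blocks = []
    offset = 0
    while offset < len(stored):
        if offset + 4 > len(stored):
            raise ValueError("truncated length header")
        (length,) = struct.unpack_from(">I", stored, offset)
        offset += 4
        if offset + length > len(stored):
            raise ValueError("truncated block")
        blocks.append(stored[offset:offset + length])
        offset += length
    return blocks
```

A reader would call `split_blocks` on the downloaded bytes and decrypt each block separately with P(data); a writer such as Dave only needs `append_block` and never has to parse or decrypt the existing blocks.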

This leaves us with only revocation. To revoke access, you need at least P(data) to decrypt the ciphertext back to plaintext. You then generate a new data-key pair, P'(data) and Q'(data), and re-encrypt the plaintext with the new data-key pair:

plaintext = D(ciphertext, P(data))
new_ciphertext = E(plaintext, Q'(data))

and then you'll need to issue new write-keys and read-keys, built from the new data-key pair, to everyone who keeps access.

To add a new user to an existing file, all you need to do is just create their write-key and read-key. Only people that themselves can decrypt their read-key can extend a read-key to a new user, and only people that themselves can decrypt their write-key can extend a write-key to a new user.


If you do not need the fine-grained permissions in this system (in other words, if all users that can read the data can also update it), or if you use other ways to enforce fine-grained permissions, then you can replace the asymmetric data-key with a symmetric data-key. (Trivia: the system with a symmetric data-key would be similar to how multi-recipient PGP-encrypted email works, so you may want to investigate that.)

Lie Ryan
  • I disagree with your statement that "all encryption must be done client side". In reality, it comes down to, whom do you trust less: the server, or the client? A malicious tech team (or virus) can certainly access keys client side just as much as server side. – Kevin Keane Jun 17 '15 at 18:44
  • @KevinKeane: for best security, the client has to be open sourced and has deterministic build. A security conscious user can audit the source code and compile the client themselves. With the user being able to trust the client, the user does not need to trust the server operator at all, because no sensitive information is ever sent to the server and the server cannot forge malicious data. The most damage the server operator can do, is delete data. – Lie Ryan Jun 17 '15 at 23:03
  • @KevinKeane: a security conscious user should not trust the server's claim that an encryption key belongs to another user. This is similar to how PGP's Web of Trust does not depend on PGP keyservers. Like PGP, the security conscious user should verify other users' user-keys out of band, so the client must have a mechanism to view and compare user-keys. Although the concept described here would work with any strong asymmetric encryption algorithm, I'd recommend actually using GPG to implement the asymmetric encryption here; it's a mature implementation and widely understood. – Lie Ryan Jun 17 '15 at 23:17
  • Thanks a lot for taking time to write such a detailed answer! :) However I do not want to pass keys on the client side, and I prefer for the server to take responsibility of encryption and storage. I believe this will represent less risks because the code would be simpler hence less prone to bugs and leaks ; and also because you cannot control nor protect safety (such as virus or other infections) on the client side. – Benj Jun 18 '15 at 06:45
  • @BenjaminSinclaire: if you are encrypting on the server, then it is broken by design, because your server will have the data in plain text at some point. Any malicious tech staff/attacker can just grab the data before it gets encrypted. As to your second point, if the client is insecure, then the user is already screwed whether you do the encryption on the client side or the server side. The risk of leaks is much higher if you do not encrypt end to end. – Lie Ryan Jun 18 '15 at 08:45
  • @LieRyan - of course you are right, in theory. In practice, even an open-sourced client does not actually help, unless the user has personally audited the code, compiled it himself, and knows that there are no trojans or viruses. That is why ultimately it comes down to degrees of trust, rather than absolutes. Server-side, you have to trust the server operator is not being blackmailed by the NSA or otherwise malicious, but outside that risk, a server is more likely to be secure than a client. Of course, if you have the expertise, you can run your own server and get the best of both worlds. – Kevin Keane Jun 20 '15 at 19:45
  • @LieRyan - Correct, a security conscious user should not trust the server's claim. When both users involved know each other, that's not usually an issue, though. – Kevin Keane Jun 20 '15 at 19:48
  • @KevinKeane: the problem is that your first requirement states "... the users to know that even the staff cannot access their data". Doing encryption on server side means this requirement is **impossible**, it's no longer just a matter of trust. You might as well just use no encryption or a single encryption key for all users (e.g. full disk encryption), they're much simpler to implement and more importantly **the security isn't any different**. – Lie Ryan Jun 21 '15 at 01:54
  • @KevinKeane: if you are doing encryption on the client side, you **must protect the client** against compromise/virus/etc. If you do the encryption on the server side, you **must protect both the client and the server** against compromise/virus/etc. The latter just has more attack surface. If the client is compromised, you must consider the data coming from it as compromised, **especially if the data is not encrypted by the client**. Encrypting the data on the client side does not make client compromise any worse. – Lie Ryan Jun 21 '15 at 02:08
  • What if you want to make sure only the original owner of the file is able to extend read/write permissions? Having people who can read, extend read and people who have write, extend write seems iffy. – CMCDragonkai Jul 06 '16 at 12:19
  • @CMCDragonkai: Restricting write access can be done by requiring that update blocks be cryptographically signed by the creator of the update block. Recipients should only trust update blocks that are signed by the writers trusted by the file owner. The information about writers trusted by file owner is distributed in some sort of metadata block signed by the file owner. – Lie Ryan Jul 06 '16 at 12:51
  • @CMCDragonkai: Restricting read access redistribution is a bit more difficult. Someone who can access the plain text can always publish the plain text to anyone they like, encryption cannot prevent that. Note that you cannot prevent people from sharing their private keys to allow someone else to read all their data or do something on their behalf. Encryption cannot prevent people from intentionally and deliberately sharing whatever data they have access to. The most you can do is prevent accidental sharing, but that's a user interface problem rather than encryption problem. – Lie Ryan Jul 06 '16 at 13:05
  • @LieRyan Yep, but at least from a UI point of view, non-file owners should not have an easy way to extend reading. Are you implying just to deny this option at the UI? – CMCDragonkai Jul 06 '16 at 13:20
  • @CMCDragonkai if the business requirement is that this data never really needs to be shared by that person, then yes, just don't implement a share button. – Lie Ryan Jul 06 '16 at 15:46
5

The generic methodology for this kind of problem is reasoning in terms of knowledge and indirection.

You want each user to be able to do some things that other users, or the "tech people", cannot do; therefore, each user must know a secret value that other people do not. The user's password can be such a secret; otherwise, you would need something stored on the client side.

Each data element must be accessible only to a selected set of people at any time, so the data must be encrypted, and the encryption key known to exactly these people. Moreover, you want to be able to share elements on a per-element basis, so each element (file) will need to have its own encryption key.

You cannot enforce forgetfulness; if someone knew, at some time, the contents of a file, then you cannot make it so that they forget it. In practical terms, they may have made a backup on their own machine. Therefore, you cannot revoke access to a data element. At best, you can choose on a per-file basis who can read it, and thus not make available to some people the new version of any given file.

Since you want users to give access to some files to each other, you need some sort of rendezvous, which is most easily achieved with asymmetric cryptography.


This leads to the following design:

  • Each user U owns a public/private key pair PU / SU of a type suitable for asymmetric encryption (say, RSA).

  • The private key is stored "somewhere" such that only the rightful owner may ever access it. One method would be encryption of the private key with the user's password (assuming that the user never sends his password to your server, otherwise the "tech people" could grab it). Alternatively, the user's private key is stored in a file on his desktop/laptop system.

  • Each data element (or file) is encrypted with its own, randomly generated key K (symmetric encryption).

  • Along with each file are stored encrypted versions of K, one for each public key of the users that should be able to read the file. If user U is part of that set, then that user uses his private key SU to recover K and decrypt the file.

  • Sharing a file with another user V is done by recovering K, then encrypting K with PV (the public key of user V) and storing the result alongside the file (or making it available to user V through some other mechanism).

  • If a user changes his password, then this impacts, at most, the storage of his private key. Nothing to do about the files. While the user's password may change, his public/private key pair is permanent.

  • When a file is modified, you can treat the new version as a new, independent file, with its own new key K and its own set of recipients. Alternatively, if the new set of recipients is identical to the old set (or a superset thereof), then you can simply reuse the same key K, which may be simpler for the implementation. Changing the key K is what is most similar to "revoking access" (subject to the caveat of unenforceable forgetfulness).
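The design above can be sketched in a few dozen lines of Python. Everything here is illustrative and assumed for the example: the names (`StoredFile`, `keygen`, `wrap_key`, `crypt`) are hypothetical, a toy ElGamal-style scheme over a 127-bit prime stands in for RSA, and a hash-based keystream stands in for a real symmetric cipher. None of it is secure as written; a real client would use a vetted library and run these steps on the user's machine.

```python
import hashlib
import secrets

# TOY parameters, for illustration only: a 127-bit group is far too small.
P = 2**127 - 1  # a Mersenne prime
G = 3


def keygen():
    """Return a (private, public) user key pair, i.e. (SU, PU)."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def _xor(data: bytes, pad: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, pad))


def _pad(shared: int) -> bytes:
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def wrap_key(k: bytes, public: int) -> tuple:
    """Encrypt the 32-byte file key K for the owner of `public`."""
    ephemeral = secrets.randbelow(P - 2) + 1
    return pow(G, ephemeral, P), _xor(k, _pad(pow(public, ephemeral, P)))


def unwrap_key(wrapped: tuple, private: int) -> bytes:
    """Recover K from its wrapped form using the matching private key."""
    hint, body = wrapped
    return _xor(body, _pad(pow(hint, private, P)))


def crypt(data: bytes, k: bytes) -> bytes:
    """Toy stream cipher; the same call encrypts and decrypts.

    There is no nonce, so each K must encrypt only one version of a file.
    """
    blocks = len(data) // 32 + 1
    stream = b"".join(
        hashlib.sha256(k + i.to_bytes(8, "big")).digest() for i in range(blocks)
    )
    return _xor(data, stream)


class StoredFile:
    """What the server stores: the ciphertext plus K wrapped per recipient."""

    def __init__(self, plaintext: bytes, recipients: dict):
        k = secrets.token_bytes(32)
        self.ciphertext = crypt(plaintext, k)
        self.wrapped = {n: wrap_key(k, pub) for n, pub in recipients.items()}

    def read(self, name: str, private: int) -> bytes:
        return crypt(self.ciphertext, unwrap_key(self.wrapped[name], private))

    def share(self, name: str, private: int, new_name: str, new_public: int):
        """An existing reader recovers K and wraps it for another user."""
        k = unwrap_key(self.wrapped[name], private)
        self.wrapped[new_name] = wrap_key(k, new_public)

    def revoke(self, name: str, private: int, remaining: dict):
        """Re-encrypt under a fresh K, wrapped only for the remaining users."""
        plaintext = self.read(name, private)
        k = secrets.token_bytes(32)
        self.ciphertext = crypt(plaintext, k)
        self.wrapped = {n: wrap_key(k, pub) for n, pub in remaining.items()}
```

Sharing only adds one small wrapped key and never duplicates the file. Revocation is the expensive operation, and, as noted above, it cannot make a former reader forget a copy they already hold.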


Of course, the "tech people" still control whatever software is used to perform these operations (in particular in a Web context, with the JavaScript being sent by the server itself, or if the encryption/decryption operations are done server-side), so if they really want to cheat on users, then one has to assume that they can.

Tom Leek
  • Many thanks for taking time to write your answer. The 3 level encryption `password (sym) -> public/private (asym) -> data (sym)` is clearly solving the problem of sharing data between users not being connected at the same time. – Benj Jun 18 '15 at 06:24
  • Because _K_ is only encrypted with public keys it's trivial for an attacker to regenerate _K_ and replace the original file content. The attacker would never have access to the original content, but being able to change the content in a way that it's still readable is a big flaw. – dhinchliff Apr 09 '18 at 15:57
  • @dhinchliff How is it trivial to regenerate K? – interlude Apr 25 '18 at 19:52
  • @interlude Just pick any K, K = "puppy" if you like. Then encrypt the new data with the new K and encrypt the new K with the receiver's public key. – dhinchliff Apr 26 '18 at 10:37
  • @dhinchliff I don't know the official jargon for this but wouldn't that be a K' or K2 or something? Your comment made me think that it's easy to find K somehow. But still I'm not used to the jargon and English is not my native language – interlude Apr 28 '18 at 16:27
1

This is an interesting problem but has actually been solved in various open-source applications at this point. I would recommend, for your use case, borrowing from ownCloud's encryption model (which has the benefit of being open-source).

The general application of this model on your software would look like:

1) Of course this can be done in many ways, but I recommend having the application server itself encrypt this data using asymmetric (public-private key) encryption and then symmetric encryption. There is a lot you can do with symmetric encryption, like having half the key rest on the server and requiring the user to supply the other half, to address this issue.

2) As o11c points out, encrypting the asymmetric private key with a symmetric encryption method (password) will definitely solve this issue.

3) When other users need a copy of the data, you'd have to have the application server decrypt and then re-encrypt the data for that user. In such a way, you end up with duplicates of the data for each user that needs it. The ownCloud method is interesting -- it uses an asymmetric "share key" to encrypt files that a user shares. This share key is generated for each file and user that the file is shared to. You can then have the application server decrypt the data, encrypt it with that user's public key, and then only that user's password would unlock the private key necessary to decrypt the file.

4) Drawing on 3, all you need to do is delete the newly-generated share key and access is securely revoked (provided they haven't done something like downloading the data or taking a screenshot).

Herringbone Cat
  • Hi, thanks for your reply. Very interesting indeed. Please note that I cannot consider duplicating the data as you suggest in your point 3. Kindly see my update in the question. – Benj Jun 17 '15 at 06:00
  • No problem. To respond to your update, on the first point the clear password need not be stored in memory. Either a hash can be used, or the entire key; but yes something is in memory. For your second point, we're using asymmetric encryption so it is encrypted with the public key of the user to share to. So, while the server cannot decrypt it, when the user logs in and decrypts their private key with their password the file becomes accessible. On the password change point, you can prompt for the old password and then the new password when a change is detected and then re-encrypt files. – Herringbone Cat Jun 17 '15 at 17:38
1

Apple uses such a mechanism on iCloud. I believe this is how it works (if memory serves me right), and slightly different from what others have suggested. As far as I understand it, it involves only asymmetric encryption.

1) The device (iPhone, iPad etc.) generates a key pair (device key).

2) For a new iCloud account, the device generates a second key pair (the encryption key).

3) The device encrypts the private part of the encryption key using the public device key. Both the (plaintext) public encryption key and the (encrypted) private encryption key are stored on the server.

4) The device uses the public encryption key to encrypt data sent to the server.

To share data:

1) You need a device that is already connected to the cloud. Let's call that device 1. The new device is device 2.

2) Device 2 generates its own device key pair.

3) Device 2 sends its public key to device 1 (either directly or through the cloud. Directly is more secure).

4) Device 1 decrypts the encryption private key using its own private key, and then encrypts it using device 2's public key.

There could be potential for a vulnerability in step 3; if an attacker can trick Device 1 into accepting his public key, he might get access to shared data. I don't know how this is solved, but probably it involves device identification and key fingerprints.

Edit for clarification: the encryption key pair in my description would be per-user, but you could use the same mechanism on a different scope. The scope determines the "unit of sharing": if you want to be able to decide to share or not share individual files, then each file would need to have its own key pair. For sharing, only the key pair, not the underlying data, would be duplicated.

Kevin Keane
  • Thanks for your answer Kevin. You lost me at some points because I do not have your knowledge ehehe :) but I can still grasp the big picture. I'll also take time to understand the vulnerability you mentioned. – Benj Jun 18 '15 at 06:52
  • I would recommend that you look up Apple's own documentation rather than rely on my reverse-engineering-from-memory. I think Herringbone_Cat's scheme is somewhat similar, but mine/Apple's does not require re-encrypting the data itself with each key; it just requires re-encrypting the key for each user. – Kevin Keane Jun 20 '15 at 19:37