Data Key for Production / Test DB

Question

We are encrypting database fields that contain PII. For key generation and storage we are using AWS KMS: the data key was generated with AWS KMS, and for the master key we are using CMKs. Let's say there is only one DB field that I want to encrypt, which is email. For simplicity, assume we have two environments, test and production.

I have created an AES-256 encryption library that uses GCM mode (let's not discuss that).

Now I have some doubts

Can the data key / master key be the same for both test and production? If not, how will the system work? Every quarter we copy the complete prod database to test, and if the keys differ we will not be able to decrypt it there.
We will also store an HMAC of the value in a separate DB column for reading purposes. For this HMAC, should we use the same data key as in the step above, or a different data key? I am almost sure that reusing the key is a bad idea, so what should I do in this case? Generate two data keys from AWS and use one for AES and the other for HMAC? Or something else?

The fact that you claim to have made a library for encryption is absolutely a reason to worry and *should* be discussed. Why did you do that? — , Oct 21 '20 at 09:50
We are not using AWS KMS to actually encrypt/decrypt the data. Plus, the library that I created has already been reviewed (https://codereview.stackexchange.com/questions/250785/aes-gcm-encryption-code-secure-enough-or-not) from a security perspective, and I think it's fine now. We have not used AWS for this in order to reduce cost and latency. — Ankit Bansal, Oct 21 '20 at 09:55
The code is fine, but for different reasons. You didn't write a library that performs AES crypto, you wrote a library that calls a library (that likely calls a library) that performs AES crypto. That's very different. As for your flawed reasoning: [codereview.se] is *not* a site that verifies security-critical code - their goal is to help programmers learn how to write "better" code, usually in terms of architecture, readability or performance. — , Oct 21 '20 at 10:28

brynk · Accepted Answer · 2020-10-27T06:29:48.110

I looked into AWS KMS, and also found your other questions. It became clear to me that your plan is to use envelope encryption, and that both previous versions of my answer were not relevant. On reflection, AWS KMS clearly has this scenario in mind as well:

AWS KMS Getting Data Key using AWS Encryption SDK
AWS KMS Data Key Rotation
Envelope encryption and Wells '17 AWS KMS Envelope Encryption

You also asked a question about when to re-key the data key ("On what basis to create Data Keys"). The accepted answer there doesn't mention any limits associated with the symmetric cipher, which may or may not be relevant depending on how much data you plan on encrypting.

So, a revised idea:

- generate a new session data key at each service (re)start, and regenerate a new key at (or before):
  - the life of the service, or
  - up to the limits of the encryption cipher, or
  - some arbitrary time frame that you decided upon
- the service would connect to KMS in the region, and store this next key using the CMK ^
- the service will encrypt data symmetrically before writing to the datastore, as you have already decided in your separate post
- when the limit approaches, the service will block on further submissions, re-key, and start processing submissions again
  (obv. if you only have one service, then progress will be blocked during this short period)
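
The re-keying steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: `SessionKeyManager` and the `store_key` callback are hypothetical names (the callback stands in for "encrypt the new key under the CMK and store it"), the random `key_id` is my own addition, and the default limit of 2**32 uses per key is the NIST SP 800-38D cap for AES-GCM with random nonces, which may or may not be the limit you settle on.

```python
import secrets
import threading
from typing import Callable, Tuple

# Assumption: NIST SP 800-38D caps AES-GCM at 2**32 encryptions per key
# when nonces are chosen at random; choose a lower figure if you prefer.
GCM_RANDOM_NONCE_LIMIT = 2**32


class SessionKeyManager:
    """Hypothetical helper: hands out the current data key, re-keys at a limit."""

    def __init__(self, store_key: Callable[[str, bytes], None],
                 max_uses: int = GCM_RANDOM_NONCE_LIMIT) -> None:
        self._store_key = store_key      # stand-in for "wrap under the CMK and store"
        self._max_uses = max_uses
        self._lock = threading.Lock()    # callers block here while a re-key happens
        self._key_id = ""
        self._key = b""
        self._uses = 0
        self._rekey()                    # fresh key at every service (re)start

    def _rekey(self) -> None:
        self._key_id = secrets.token_hex(8)    # random label for this session key
        self._key = secrets.token_bytes(32)    # 256-bit key for AES-256
        self._uses = 0
        self._store_key(self._key_id, self._key)

    def key_for_next_encryption(self) -> Tuple[str, bytes]:
        """Return (key_id, key), re-keying first if the use limit was reached."""
        with self._lock:
            if self._uses >= self._max_uses:
                self._rekey()
            self._uses += 1
            return self._key_id, self._key
```

The service would call `key_for_next_encryption()` once per field it encrypts and keep the returned `key_id` next to the ciphertext. A time-based limit could be added in the same place as the use counter.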

^ Alternatively, it could encrypt the new session data key several times, once for each public key that it holds (e.g. test, prod, recovery, ...), and store all copies in KMS. Apparently KMS can encrypt up to 4 KB of data:

K_for_prod = sealed_box( StaticPK_prod, K_session_data )
KMS_payload = KMS_ENCRYPT( K_for_prod | K_for_test | K_for_recover | ... )

More on sealed box() and another alternative: crypto_box()

Depending on the PII, you might wish to disallow edits and instead use superseding or versioning in your data model, with the new data having to be provided again, which would avoid this scenario. In one data model that I worked on, we basically made all PII non-mandatory, so when the user wishes to change something through the forms system, they simply provide the new data. (The submission form is not used to retrieve data - instead, the person must request the information through a totally different channel.) This may not suit you, however, in which case my proposal above would need some way, at time of edit, to recover the data key for that earlier session.
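
One way to make that recovery possible, sketched here purely as an assumption on my part, is to store an identifier for the session data key next to each encrypted value and keep the wrapped (CMK-encrypted) keys in a lookup table. The names `EncryptedField`, `key_id` and `find_wrapped_key` are hypothetical and not part of any AWS API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EncryptedField:
    """What would be stored per encrypted column value (hypothetical layout)."""
    key_id: str        # which session data key encrypted this value
    nonce: bytes       # the GCM nonce used for this value
    ciphertext: bytes  # AES-GCM output, including the tag


def find_wrapped_key(record: EncryptedField, wrapped_keys: Dict[str, bytes]) -> bytes:
    """Look up the wrapped data key for the session that wrote this record."""
    try:
        return wrapped_keys[record.key_id]
    except KeyError:
        raise LookupError(f"no wrapped data key stored for session {record.key_id!r}") from None
```

The wrapped key returned here would still have to be decrypted through KMS before the field itself can be decrypted and re-encrypted under the current session key.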

I also noticed you clarified your requirements around HMAC-SHA256 in another question - https://security.stackexchange.com/a/239550/228961 - which describes your desire to be able to search or filter on the encrypted data in the database without exposing it.
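
A minimal sketch of such a searchable HMAC column, using only the Python standard library, might look like this. It assumes a dedicated `index_key` that is separate from the AES data key (which is what your question already leans towards), and the lower-casing and trimming of the email before hashing is my own assumption so that equal addresses produce equal index values; `blind_index` is a hypothetical name.

```python
import hashlib
import hmac


def blind_index(index_key: bytes, email: str) -> str:
    """Return a hex HMAC-SHA256 of the normalized email for an indexed lookup column."""
    normalized = email.strip().lower()   # assumption: treat case and padding as insignificant
    return hmac.new(index_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

On write, the service stores `blind_index(index_key, email)` in the separate column alongside the AES-GCM ciphertext. On read, it computes the same value for the search term and filters on equality, so the plaintext never has to be present in the database.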
