Data protection laws including GDPR state:
“Personal data shall be obtained only for one or more specified and lawful purposes, and shall not be further processed in any manner incompatible with that purpose or those purposes.”
GDPR stipulates that personal data should not be used in non-production systems unless it has been anonymized or pseudonymized.
Generally speaking, a customer would not expect their information to be used in a test environment or for developing new technology solutions, so it is arguable whether we do or do not have a case for legitimate processing of PII in test environments.
I have a requirement: I want to use personally identifiable information (PII) to develop new technology. The data quality is poor, so I need to ingest the PII into an AWS dev environment, clean it in a dev/test system, and send it to a production environment once I have proved that the data cleansing works. Obfuscating the data is not an option, because we need to work on the real poor-quality values in order to turn them into good data.
I will encrypt the relevant AWS services using KMS, and data access will be limited to a small group of developers. The data will be deleted at the end of the dev/test period. All AWS services will be tightly controlled via security groups and IAM policies. This seems like an easier option than anonymization or pseudonymization, which are difficult and cumbersome.
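To make the controls concrete, here is a minimal sketch of what I have in mind, assuming the PII lands in an S3 bucket (the bucket name, role ARN, and 30-day retention are hypothetical placeholders, not my real setup). It only builds the policy documents: one statement rejects uploads that do not request SSE-KMS, one denies object reads and writes to anyone outside the developer roles, and a lifecycle rule expires the data after the dev/test period.

```python
import json


def build_bucket_policy(bucket, allowed_role_arns):
    """Build an S3 bucket policy for a dev bucket holding PII (sketch only)."""
    objects_arn = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny uploads that do not request SSE-KMS encryption.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            },
            {
                # Deny object access to every principal outside the dev group.
                "Sid": "DenyAccessOutsideDevGroup",
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": objects_arn,
                "Condition": {
                    "ArnNotLike": {"aws:PrincipalArn": list(allowed_role_arns)}
                },
            },
        ],
    }


def build_lifecycle_rules(retention_days):
    """Build a lifecycle configuration that expires all objects after N days."""
    return {
        "Rules": [
            {
                "ID": "expire-dev-pii",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: applies to every object
                "Expiration": {"Days": retention_days},
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    policy = build_bucket_policy(
        "example-dev-pii-bucket",
        ["arn:aws:iam::123456789012:role/pii-cleansing-dev"],
    )
    print(json.dumps(policy, indent=2))
    print(json.dumps(build_lifecycle_rules(30), indent=2))
```

These documents would then be applied with something like boto3's `put_bucket_policy` (passing `json.dumps(policy)`) and `put_bucket_lifecycle_configuration`. Deny statements are only a guardrail here; the developer roles would still need explicit allows, including permission to use the KMS key.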
Does this seem like a good approach? How have others secured live PII data in non-prod environments?