PII Data Masking rules - what's acceptable and what's not

Question

Is there a security rule on correct masking for sensitive information? Let's say we want to use prod data in our UAT environment. We're thinking of creating masking logic that runs when we transfer prod data to UAT. The masked fields would look something like this:

Names: e.g. John Doe to JXXXXXe
Mobile number: e.g. +1-234-56799101 to X-X-XXXX9101
Birthdate: e.g. 05/23/1987 to 05/XX/XXXX

etc.
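
For concreteness, the masking described above could be sketched in Python roughly as follows. The function names are hypothetical and the rules are only inferred from the three examples (drop spaces and keep the first and last letter of a name, keep the last four digits of a number, keep the month of a date), so treat this as an illustration of the idea rather than a vetted scheme.

```python
def mask_name(name):
    """Keep the first and last letter, replace everything between with X."""
    letters = name.replace(" ", "")  # assumption: spaces are dropped, as in "JXXXXXe"
    if len(letters) <= 2:
        return "X" * len(letters)
    return letters[0] + "X" * (len(letters) - 2) + letters[-1]


def mask_phone(phone):
    """Collapse every leading group to one X and keep the last four digits."""
    groups = phone.split("-")
    last = groups[-1]
    masked_last = "X" * max(len(last) - 4, 0) + last[-4:]
    return "-".join(["X"] * (len(groups) - 1) + [masked_last])


def mask_birthdate(birthdate):
    """Keep the month of an MM/DD/YYYY date and X out the day and year."""
    month, day, year = birthdate.split("/")
    return "/".join([month, "X" * len(day), "X" * len(year)])


print(mask_name("John Doe"))          # JXXXXXe
print(mask_phone("+1-234-56799101"))  # X-X-XXXX9101
print(mask_birthdate("05/23/1987"))   # 05/XX/XXXX
```

Masked this way, each record still reveals the length and outer letters of the name, the last four digits of the number, and the birth month.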

Is there a standard on acceptable masking logic?

Yes. There are numerous regulations and a massive topic area called "pseudonymization". The regulatory details depend on which regulations apply to you. But for UAT, don't use live prod data, masked or not. Use generated data with no connection to real people and avoid this problem entirely. - schroeder, Apr 07 '21 at 14:08
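
A minimal sketch of what "generated data" could look like, using only the Python standard library. The name pools, field names, and function names are all made up for illustration, and the 555-01XX phone block is used on the assumption that it is the range conventionally reserved for fictional numbers.

```python
import random
from datetime import date, timedelta

# Hypothetical name pools; any list of invented names would do.
FIRST_NAMES = ["Alex", "Sam", "Robin", "Casey", "Jordan"]
LAST_NAMES = ["Example", "Sample", "Tester", "Placeholder"]


def generate_record(rng):
    """Build one fake record from random parts, never from prod data."""
    name = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
    # Random area code plus a number from the fictional 555-01XX block.
    phone = f"+1-{rng.randint(200, 999)}-555-01{rng.randint(0, 99):02d}"
    # Random birthdate within about 55 years of 1950-01-01.
    birthdate = date(1950, 1, 1) + timedelta(days=rng.randrange(55 * 365))
    return {
        "name": name,
        "phone": phone,
        "birthdate": birthdate.strftime("%m/%d/%Y"),
    }


def generate_records(count, seed=None):
    """Generate `count` records; a fixed seed makes the test data repeatable."""
    rng = random.Random(seed)
    return [generate_record(rng) for _ in range(count)]
```

Calling `generate_records(1000, seed=42)` would give the same 1000 rows on every run, which tends to be convenient for reproducing UAT defects.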

score 1 · Answer 1 · edited Apr 07 '21 at 18:49

To answer your question literally ("is there a standard?"): yes, for the USA it is NISTIR 8053 (De-Identification of Personal Information).

More specifically, you can de-identify or anonymize your data, depending on the nature of your dataset and the UAT. The difference is:

De-identification of data refers to the process of removing or obscuring any personally identifiable information from individual records in a way that minimizes the risk of unintended disclosure of the identity of individuals and information about them.
Anonymization of data refers to the process of data de-identification that produces data where individual records cannot be linked back to an original as they do not include the required translation variables to do so.
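
To make the "translation variables" part concrete, here is a rough sketch, not taken from NISTIR 8053. The `pseudonymize` helper is hypothetical: it swaps a field for a random token and returns the token-to-original table alongside the new records. Whoever holds that table can link records back; if the table is never stored, that particular link is gone.

```python
import secrets


def pseudonymize(records, field):
    """Replace `field` with a random token.

    Returns the new records and the token -> original table
    (the "translation variables" in the definition above).
    """
    table = {}    # token -> original value
    reverse = {}  # original value -> token, so repeats map to the same token
    out = []
    for rec in records:
        original = rec[field]
        if original not in reverse:
            token = secrets.token_hex(8)
            reverse[original] = token
            table[token] = original
        out.append({**rec, field: reverse[original]})
    return out, table


rows = [
    {"name": "John Doe", "city": "Springfield"},
    {"name": "Jane Roe", "city": "Shelbyville"},
]

# De-identified: the table exists somewhere, so re-linking is possible.
deidentified, table = pseudonymize(rows, "name")

# Closer to anonymized: the table is discarded and never stored.
anonymized, _ = pseudonymize(rows, "name")
```

Dropping the table is only part of the picture: other fields left in the record (city, birthdate, and so on) may still identify a person in combination.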

There are many methods that you can use, such as masking, shuffling, randomization, etc. Again, use whatever is appropriate for your use case.
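
As a rough illustration of two of those methods, here is a small Python sketch. The function names and the row layout are assumptions made for the example: shuffling permutes one column across rows so that values stay realistic but no longer belong to their original record, and randomization here means shifting a date by a random offset.

```python
import random
from datetime import date, timedelta


def shuffle_column(rows, column, rng):
    """Return copies of `rows` with the values of `column` permuted across rows."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def randomize_date(d, rng, max_days=365):
    """Shift a date by a random number of days in [-max_days, max_days]."""
    return d + timedelta(days=rng.randint(-max_days, max_days))


rng = random.Random()
rows = [
    {"id": 1, "birthdate": date(1987, 5, 23)},
    {"id": 2, "birthdate": date(1990, 1, 2)},
    {"id": 3, "birthdate": date(1975, 11, 30)},
]
shuffled = shuffle_column(rows, "birthdate", rng)
shifted = [{**r, "birthdate": randomize_date(r["birthdate"], rng)} for r in rows]
```

Shuffling preserves the overall distribution of a column, which helps when the UAT depends on realistic value ranges, while random shifting changes every value and so distorts statistics slightly.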
