In my company, numeric user IDs are considered PII and therefore need to be pseudonymized to be GDPR compliant.
To do so, we populate a lookup table in which each ID is assigned a monotonically decreasing gdpr_ID. Then, when users are inactive or request to be deleted, we simply overwrite the ID column with the gdpr_ID value, making it impossible to know that the gdpr_ID was originally linked to a specific ID.
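To make the current approach concrete, here is a rough Python sketch of it. It is only an illustration: the real thing lives in database tables, and the names `lookup`, `assign_gdpr_id` and `anonymize`, as well as the starting value of -1, are made up for this example.

```python
import itertools

# Hypothetical in-memory stand-in for the lookup table: ID -> gdpr_ID.
# Assumption: gdpr_IDs start at -1 and decrease by one (-1, -2, -3, ...).
_next_gdpr_id = itertools.count(-1, -1)
lookup = {}


def assign_gdpr_id(user_id: int) -> int:
    """Return the gdpr_ID for user_id, allocating a new one on first use."""
    if user_id not in lookup:
        lookup[user_id] = next(_next_gdpr_id)
    return lookup[user_id]


def anonymize(rows, user_id):
    """Return a copy of rows where user_id is replaced by its gdpr_ID."""
    gdpr_id = assign_gdpr_id(user_id)
    return [
        {**row, "ID": gdpr_id} if row["ID"] == user_id else row
        for row in rows
    ]
```

Every table that contains the ID needs the same replacement, and every replacement has to go through `lookup`, which is where the cost comes from.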
The downside of this approach is that these IDs appear in several different tables, all of which need to be anonymized, while the lookup table keeps growing, making our ETL pipelines slower and slower.
The solution we were thinking about is to use a hash function, but since IDs are just numbers we need to add a salt. How can we generate this salt in such a way that salt(ID) always returns the same value, without having to create or use lookup tables and, at the same time, without being able to reconstruct it?
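To make the hashing idea concrete, this is roughly what we have in mind, as a minimal Python sketch. The use of HMAC-SHA256 and the names `pseudonymize` and `secret_key` are assumptions for illustration only, not something we have settled on; here a single secret key plays the role of the salt.

```python
import hashlib
import hmac


def pseudonymize(user_id: int, secret_key: bytes) -> str:
    """Map a numeric ID to a fixed-length hex token using a keyed hash.

    The same (user_id, secret_key) pair always gives the same token,
    so no lookup table is needed to apply it consistently across tables.
    """
    message = str(user_id).encode("ascii")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Hypothetical key; in practice it would come from a secrets store.
    key = b"replace-with-a-real-secret"
    print(pseudonymize(12345, key))
```

The open question with this sketch is the key itself: anyone who holds it can run every possible numeric ID through the same function and rebuild the mapping, so it only helps if the key is kept out of reach of whoever could see the data.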