I'm trying to help anonymise users but still give them some control. This is different from, say, anonymising a data set, where you never need to go back to the original users.
Let me give you an example. I want a user to come in with their email address to log into my system, but I don't want to store the email address. I want to assign that user a UUID that I will then use. However, if the user loses the random identifier, they need to be able to re-enter their email address and get back into the system.
The simplest answer to this is a hash. Of course I could store the hash of the email address next to the UUID. If they lose the UUID, they can come back and I can re-hash the email address to find it. On the other hand, if my system is broken into, the bad guys can simply run a dictionary attack using a list of email addresses and then re-identify the UUIDs.
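To make the problem concrete, here is a rough Python sketch of the plain-hash approach and the attack on it. The function names (`email_key`, `register`, `recover`, `dictionary_attack`), the in-memory `store`, the choice of SHA-256, and the lower-casing of emails are all my own assumptions for illustration, not part of any real system.

```python
import hashlib
import uuid

# Hypothetical in-memory table: hash(email) -> UUID
store = {}

def email_key(email: str) -> str:
    # Assumption: emails are trimmed and lower-cased before hashing.
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

def register(email: str) -> str:
    # Assign a random UUID and store it next to the unsalted hash.
    user_id = str(uuid.uuid4())
    store[email_key(email)] = user_id
    return user_id

def recover(email: str):
    # Re-hash the email address to find the lost UUID (None if unknown).
    return store.get(email_key(email))

def dictionary_attack(stolen_store: dict, candidate_emails: list) -> dict:
    # What an intruder can do: one fast hash per guessed email address.
    found = {}
    for email in candidate_emails:
        key = email_key(email)
        if key in stolen_store:
            found[email] = stolen_store[key]
    return found
```

Recovery and the attack do exactly the same work here (one cheap hash per email), which is why this approach gives so little protection once the table leaks.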
Firstly, is there a standard way of doing this? bcrypt and PBKDF2 come to mind, but I obviously can't store a tuple of <email, salt, iterations> without making the intruder's job even easier.
I don't like to invent new security stuff, but I have had one idea. Basically, I store <SHA512(email), salt, iterations> in one table, and then I store <PBKDF2(email, salt, iterations), UUID> in a second table.
That way they have to first dictionary attack the first table, and then use the results of that to do individual dictionary attacks on each row of the second table, which should slow things down.
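Here is a minimal Python sketch of the two-table idea, just to pin down what I mean. The table names, the function names, the 16-byte salt, the iteration count, and the use of HMAC-SHA512 inside PBKDF2 are assumptions I picked for the example.

```python
import hashlib
import os
import uuid

ITERATIONS = 100_000  # assumed value, tune for your hardware

# Table 1: SHA512(email) -> (salt, iterations)
lookup_table = {}
# Table 2: PBKDF2(email, salt, iterations) -> UUID
uuid_table = {}

def _normalise(email: str) -> bytes:
    # Assumption: emails are trimmed and lower-cased before hashing.
    return email.strip().lower().encode("utf-8")

def register(email: str) -> str:
    data = _normalise(email)
    salt = os.urandom(16)
    # First table lets me find the per-user salt from the email alone.
    lookup_table[hashlib.sha512(data).hexdigest()] = (salt, ITERATIONS)
    # Second table links the slow, salted hash to the UUID.
    derived = hashlib.pbkdf2_hmac("sha512", data, salt, ITERATIONS).hex()
    user_id = str(uuid.uuid4())
    uuid_table[derived] = user_id
    return user_id

def recover(email: str):
    data = _normalise(email)
    entry = lookup_table.get(hashlib.sha512(data).hexdigest())
    if entry is None:
        return None  # email address was never registered
    salt, iterations = entry
    derived = hashlib.pbkdf2_hmac("sha512", data, salt, iterations).hex()
    return uuid_table.get(derived)
```

In this sketch the first table is still open to a fast dictionary attack, and each email address recovered that way then costs the attacker one PBKDF2 computation to link it to a UUID.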