I'm storing client data, and the clients are sensitive about the privacy and security of that data. In some cases I don't need the actual data and could work with a hash of it instead. Take a user's email address, for example: our application has no need for the address itself except to compare it for equality, in order to find records about the same person.
So, to minimise the exposure of that data, I was thinking of replacing the email with a bcrypt hash of the email before saving it to the database. That way I don't store the address itself but can still match like records, and if the client wants to look up a particular email they can type it in and still search for it.
But we will have hundreds of thousands of records, so the computational cost of bcrypt would quickly become a problem when cross-referencing records.
I'm thinking of just using the cheaper MD5 instead, since it's faster, but wanted to check my thinking:
- Does the reduced difficulty of MD5 vs bcrypt defeat the purpose of hashing in the first place, or is it a valid trade-off in this case?
- Does this approach in general have a security catch or loophole that I may have overlooked?
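
For concreteness, here is a rough sketch in Python of the MD5 variant I have in mind. The helper name `email_fingerprint` and the normalisation step (trimming whitespace and lower-casing the whole address) are just assumptions for illustration, not something we have settled on:

```python
import hashlib

def email_fingerprint(email: str) -> str:
    """Hypothetical helper: unsalted MD5 hex digest of a normalised email.

    The hash has to be deterministic (no per-record salt), otherwise two
    records for the same address would not compare equal.
    """
    # Assumed normalisation: trim whitespace and lower-case the address.
    normalised = email.strip().lower()
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()

# Value saved to the database instead of the raw address.
stored = email_fingerprint("Alice@Example.com")

# Later, the client types an address in to search for it.
lookup = email_fingerprint(" alice@example.com ")

print(stored == lookup)
```

The equality check works only because the same input always produces the same digest, which is the property I'm relying on for cross-referencing.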