We have a database of sort codes (6-digit numbers) and account numbers (8-digit numbers) that we use to reconcile monthly accounts with the table of supporters.
There is nothing in the data received from the bank that uniquely identifies the supporter, other than the sort code and account number. ... I know, it's annoying.
While this data is not as sensitive as card data (and not subject to PCI-DSS), it's still pretty sensitive and I'd like to find another way to do the reconciliation to reduce the liability of having all this data.
Combining sort code and account number gives up to 10^14 possibilities.
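To put that number in perspective, here is a quick back-of-envelope calculation of how long it would take to hash every possible combination. The hash rates are illustrative assumptions, not benchmarks of any particular hardware, and the names are made up for this sketch.

```python
# Size of the combined keyspace: 6-digit sort code x 8-digit account number.
SORT_CODES = 10 ** 6
ACCOUNT_NUMBERS = 10 ** 8
KEYSPACE = SORT_CODES * ACCOUNT_NUMBERS  # 10**14 combinations


def exhaustive_search_seconds(hashes_per_second: float) -> float:
    """Worst-case time to hash every possible sort code / account number pair."""
    return KEYSPACE / hashes_per_second


# ASSUMPTION: these rates are illustrative guesses, not measured figures.
for rate in (1e6, 1e9, 1e12):
    hours = exhaustive_search_seconds(rate) / 3600
    print(f"{rate:.0e} hashes/s -> {hours:,.1f} hours")
```

At an assumed 10^9 hashes per second the whole space takes 10^5 seconds, a bit over a day, so a plain, fast, unkeyed hash would not hold up against brute force for long.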
Is there a way (using a reliable and established PHP function) to hash the data and store only the hash, that would still allow me to take a monthly file of, say, 1000 records and match them up to the hashed data? Or is there really no point, and should I instead focus on hardening security around this DB?
The security advantage I'm seeking is that the database does not hold a ready-to-use list of people's bank details. The transactional monthly bank statement data can be considered to have a short lifespan (it is received encrypted, then decrypted, processed and deleted).
I've read a helpful detailed comparison of hashing functions, but obviously here we're not talking about passwords, and in effect we need to be able to crack them every month! Hmmm.
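For what it's worth, matching against hashed records would not require cracking anything: the monthly file arrives with the sort code and account number in the clear, so each record can be hashed the same way and looked up. Below is a minimal sketch, assuming a keyed hash (HMAC-SHA256) whose key is not stored alongside the database. It uses Python's `hmac` module for brevity; PHP's `hash_hmac` would be the equivalent. All names and data are hypothetical.

```python
import hashlib
import hmac


def account_token(sort_code: str, account_number: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a sort code / account number pair."""
    message = f"{sort_code}:{account_number}".encode("ascii")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def reconcile(records, stored_tokens, key):
    """Return the supporter id for each (sort_code, account_number), or None."""
    return [
        stored_tokens.get(account_token(sort_code, account_number, key))
        for sort_code, account_number in records
    ]


# Hypothetical data, for illustration only.
key = b"example-key-supplied-at-upload-time"
stored_tokens = {account_token("123456", "12345678", key): "supporter-42"}
monthly_file = [("123456", "12345678"), ("654321", "87654321")]
matches = reconcile(monthly_file, stored_tokens, key)
```

The key matters here: without it, anyone holding the table could enumerate all 10^14 combinations and rebuild the list.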
EDIT: Conclusion
Thanks to the answers below, here's what I plan to do:
Set-up
- Create a map for sort codes and account numbers to random ids.
- Replace real data with mapped data.
- Encrypt this map using PHP's Mcrypt with AES-256 and a user-provided key that is never stored on the server.
- Store the encrypted map on the server.
Now, if you take the database, you don't get the data, nor any way to recover it by brute force, because the ids in the map are random.
You can take the map also and figure out how it works (not relying on obscurity), but you still need to be able to crack the encryption to get access to the map. This feels like a suitable level of risk.
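The first two set-up steps could be sketched roughly as follows. This is only an illustration in Python; the field names (`supporter_id`, `account_ref`) and the sample rows are assumptions. The encryption step is left out because it needs a proper crypto library rather than anything hand-rolled.

```python
import secrets


def build_map(pairs):
    """Assign each distinct (sort_code, account_number) pair a random id."""
    mapping = {}
    for pair in pairs:
        if pair not in mapping:
            # 128 random bits, unrelated to the input, so nothing to brute-force.
            mapping[pair] = secrets.token_hex(16)
    return mapping


def replace_real_data(rows, mapping):
    """Return supporter rows with bank details swapped for their random id."""
    return [
        {
            "supporter_id": row["supporter_id"],
            "account_ref": mapping[(row["sort_code"], row["account_number"])],
        }
        for row in rows
    ]


# Hypothetical supporter rows, for illustration only.
supporters = [
    {"supporter_id": 1, "sort_code": "123456", "account_number": "12345678"},
    {"supporter_id": 2, "sort_code": "654321", "account_number": "87654321"},
]
mapping = build_map((s["sort_code"], s["account_number"]) for s in supporters)
mapped_rows = replace_real_data(supporters, mapping)
```

Because the ids come from a random source rather than from the account details, the mapped table on its own reveals nothing about the original numbers.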
Reconciliation
- Decrypt the PGP content from bank locally.
- Over SSL, upload the month's transactions and also provide the decryption key.
- Server decrypts the map, applies it to the uploaded data, stores mapped data for later processing, deletes raw uploaded file.
- User deletes decrypted bank data locally.
This means the key and the decrypted map only ever exist in RAM. The month's transactions are temporarily stored on disk, but that's an acceptable level of risk IMO (and I could use a secure deletion tool such as BleachBit).
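The "applies it to the uploaded data" step might look something like this sketch. It assumes the map has already been decrypted into an in-memory dictionary; the field names and sample values are hypothetical.

```python
def apply_map(transactions, mapping):
    """Swap bank details for random ids; report rows that have no known id."""
    mapped, unmatched_rows = [], []
    for row_number, tx in enumerate(transactions, start=1):
        ref = mapping.get((tx["sort_code"], tx["account_number"]))
        if ref is None:
            # Keep only the row number, so no raw bank details are retained.
            unmatched_rows.append(row_number)
        else:
            mapped.append({"account_ref": ref, "amount_pence": tx["amount_pence"]})
    return mapped, unmatched_rows


# Hypothetical decrypted map and upload, for illustration only.
decrypted_map = {("123456", "12345678"): "9f1c2a7e"}
upload = [
    {"sort_code": "123456", "account_number": "12345678", "amount_pence": 500},
    {"sort_code": "654321", "account_number": "87654321", "amount_pence": 1000},
]
mapped, unmatched_rows = apply_map(upload, decrypted_map)
```

Only `mapped` would be written to storage; the raw upload and the decrypted map would be discarded once the function returns.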
Updating the key is as simple as providing the existing and new keys: decrypt the map with the old key, re-encrypt it with the new one, and store it.
If there were concern that the decrypted map had been compromised, the map itself could be rebuilt too, although that's more effort as it means updating all the stored data.
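The key update could be sketched like this. The `decrypt` and `encrypt` callables are placeholders for whatever vetted cipher routines end up being used; they are assumptions of this sketch, not functions from any particular library.

```python
from typing import Callable

# (data, key) -> data; stands in for a real cipher routine.
Cipher = Callable[[bytes, bytes], bytes]


def rotate_key(
    encrypted_map: bytes,
    old_key: bytes,
    new_key: bytes,
    decrypt: Cipher,
    encrypt: Cipher,
) -> bytes:
    """Re-encrypt the stored map under new_key.

    The plaintext map exists only as a local variable, so it is never
    written to disk by this function.
    """
    plaintext = decrypt(encrypted_map, old_key)
    return encrypt(plaintext, new_key)
```

The returned bytes would then overwrite the stored encrypted map.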