Anonymizing IP addresses using (sha) hashes; how to circumvent rainbow table attacks?

Question

Under GDPR, IP addresses are personal data. I have no need to trace back IP to specific users, but I would like to limit downloads to one per IP.* I do not want to store plain IPs.

My first solution would be to hash the IP. I could store the hash:

12ca17b49af2289436f303e0166030a21e525d266e209267433801a8fd4071a0

The problem is that hashing all 4 294 967 296 possible IP addresses is simple, and someone will easily find that 127.0.0.1 is the stored IP.

Adding a salt has the same problem: anyone who knows the salt can hash every IP again with it and recover the stored address just as easily.
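To make the concern concrete, here is a minimal sketch of the attack, assuming the stored value is a plain SHA-256 over the salt followed by the dotted-quad string. The function names, the salt value, and the example address are made up for illustration, and the search is limited to a /16 so it finishes quickly.

```python
import hashlib
import ipaddress


def hash_ip(ip: str, salt: str = "") -> str:
    """SHA-256 of salt + IP, hex-encoded (the assumed storage scheme)."""
    return hashlib.sha256((salt + ip).encode("ascii")).hexdigest()


def recover_ip(target_hash: str, network: str, salt: str = ""):
    """Hash every address in `network`; return the one that matches, else None."""
    for addr in ipaddress.ip_network(network):
        if hash_ip(str(addr), salt) == target_hash:
            return str(addr)
    return None


# Hypothetical salt that the attacker has obtained along with the hashes.
LEAKED_SALT = "example-salt"

stored = hash_ip("192.168.37.5", LEAKED_SALT)
# A /16 has 65,536 addresses; the whole IPv4 space would be "0.0.0.0/0".
print(recover_ip(stored, "192.168.0.0/16", LEAKED_SALT))
```

The same loop over the full address space is only about 65,000 times more work, which is why a known salt does not change the picture.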

Is there a solution for this?

* Use case here is simplified, please do not comment on reasons why I want this. ;)

This really sounds like an [XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) to me. What you ask is "How can I store IP addresses in an anonymized format?", but it sounds to me as if the question you actually want to ask is "How can I limit downloads to one per IP?". I am aware it is simplified, but given the example that you made, it seems like something as ephemeral and easily changeable as an IP address is not the best way to do things. — , Aug 09 '19 at 12:06
Furthermore, there are 2^32 IPv4 addresses, so even if you were to store them as safely as passwords, with a slow KDF and a random salt, your keyspace is very small: just under 4.3 billion addresses. — , Aug 09 '19 at 12:11
@MechMK1 I agree that this is likely an XY Problem. I realize you've given a simplified example, but any kind of tracking by IP is rarely helpful. Not only can IP addresses change quickly (a user with a phone walks out the door and disconnects from Wi-Fi, getting a new IP address in 30 seconds), but a single IP address can often represent multiple users and devices, sometimes even hundreds or thousands of people/devices for offices behind a NAT. Unless you're absolutely sure you care about the IP address, you probably don't really. — Conor Mancone, Aug 09 '19 at 14:14
In an ideal world we would have a better way of blocking and throttling scumbags, but in the real world IP addresses are the best we have. — Peter Green, Aug 09 '19 at 17:50

score 4 · Accepted Answer · answered Aug 10 '19 at 10:49

From a pure security standpoint, I see 3 possible improvements to your system:

Use a slow hash like bcrypt. It is several orders of magnitude slower than SHA-1. Your application won't be noticeably impacted, but it will take a motivated attacker weeks to brute-force every possible IP, instead of a fraction of a second.
Change the salt regularly. The attacker will have to generate a rainbow table per salt. This is only possible if you also flush the table of stored hashes regularly, though. Using the current date as the salt should be enough, for example.
Use the file ID as part of the salt. If your users are downloading different files, you can make your salt depend on the file ID, so there will be several different salts in use in the database at the same time.

An attacker would now have to make a rainbow table every day, once per file available to download. Each rainbow table would cost him weeks, even with a good setup.
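The three suggestions combined could look roughly like the sketch below. It is not a definitive implementation: it uses PBKDF2-HMAC-SHA256 from Python's standard library as a stand-in for bcrypt (bcrypt itself needs a third-party package), and the function name, the `pepper` secret, the iteration count, and the example file IDs are all assumptions made for illustration.

```python
import hashlib
from datetime import date


def anonymize_ip(ip: str, file_id: str, day: date, pepper: bytes,
                 iterations: int = 200_000) -> str:
    """Slow, salted hash of an IP; the salt changes per day and per file.

    PBKDF2 stands in for bcrypt here. `iterations` is an assumed cost
    setting and should be tuned to what the server can afford per request.
    """
    salt = pepper + b"|" + day.isoformat().encode() + b"|" + file_id.encode()
    return hashlib.pbkdf2_hmac("sha256", ip.encode(), salt, iterations).hex()


# Hypothetical usage: one entry per (IP, file, day); flush the set daily.
seen = set()
token = anonymize_ip("203.0.113.7", "file-42", date(2019, 8, 10), b"server-secret")
if token in seen:
    print("download refused")
else:
    seen.add(token)
    print("download allowed")
```

Because the date is part of the salt, yesterday's hashes can no longer be matched today, so old rows have to be deleted when the day rolls over.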

That's still far from impossible (because of the very small number of IPv4 addresses), but it's definitely a massive improvement over the fraction of a second it took before.
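A rough back-of-the-envelope estimate shows where figures like "weeks" versus "a fraction of a second" come from. The hash rates and worker count below are assumed round numbers, not measurements; real throughput depends heavily on hardware and on the cost setting chosen.

```python
ADDRESSES = 2 ** 32  # size of the IPv4 address space


def exhaust_seconds(hashes_per_second: float, workers: int = 1) -> float:
    """Seconds needed to hash every IPv4 address once at the given rate."""
    return ADDRESSES / (hashes_per_second * workers)


# Assumed rates for illustration only.
scenarios = [
    ("fast hash, 1e9 hashes/s, 1 worker", 1e9, 1),
    ("slow KDF, 10 hashes/s, 1 worker", 10, 1),
    ("slow KDF, 10 hashes/s, 100 workers", 10, 100),
]
for label, rate, workers in scenarios:
    days = exhaust_seconds(rate, workers) / 86_400
    print(f"{label}: {days:.4f} days")
```

Under these assumptions the fast hash is exhausted in a few seconds, while the slow KDF takes years on one worker and on the order of weeks with a hundred, and that cost is paid again for every new salt.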

score 3 · Answer 2 · answered Aug 09 '19 at 12:18

As long as you are not storing IP addresses alongside other personally identifiable information, they do not have to be handled under GDPR rules. They only become sensitive when enriched with a user's name, email address, or any other such data. Just make sure that the systems limiting requests are separate from those handling logging in and user data, and you'll be fine. In this case, all you're really storing is a contextless number from a discrete set.

James


I wouldn't go as far as saying it's just a random number. Depending where in the EU you live, your IP address *may* be pretty static. My old ISP would rotate IP addresses once every few *months*. And storing `IP 123.45.67.89 downloaded "How_to_build_a_pipebomb.pdf"` may reveal a bit more than just a random number. – Aug 09 '19 at 12:22
To reframe this problem a bit, bear in mind that firewalls log and manage states of connections using IP addresses and port numbers. These aren't required to be obfuscated or encrypted. Like any data you store, you just have to take reasonable care with it. – James Aug 09 '19 at 12:26

score 2 · Answer 3 · answered Aug 09 '19 at 15:42

GDPR requires reasonable protection of such data but does not totally forbid storing it. Since the hashes are only stored on the server (where the attacker should have no access anyway), and since your use case only needs short-term storage and you can (and should) delete these hashes afterwards, I see no real problem here. Even if the attacker manages to get a copy of some hashes, this will only impact very few users, i.e. only those who had an active download at the time.

But if you feel otherwise you could add additional protection by including some (kind of random) seed in the hash where each seed is valid for only some time. This makes the validation of duplicate downloads a bit harder since you must check with all seeds within the possible time frame of the downloads (and thus preserve older seeds for a short time) but it makes precomputed mappings between IP and hash useless.
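A minimal sketch of this rotating-seed idea follows, assuming HMAC-SHA256 as the keyed hash and in-memory storage. The class name, the seed lifetime, the retention window, and the injectable clock are illustrative choices that the answer does not specify.

```python
import hashlib
import hmac
import os
import time


class RotatingSeedLimiter:
    """Allow one download per IP per window without storing plain IPs."""

    def __init__(self, seed_lifetime=3600.0, window=86400.0, clock=time.time):
        self.seed_lifetime = seed_lifetime  # how long a seed accepts new hashes
        self.window = window                # how long a download is remembered
        self.clock = clock
        self.seeds = []  # (created_at, seed, set_of_digests), oldest first

    @staticmethod
    def _digest(seed: bytes, ip: str) -> str:
        return hmac.new(seed, ip.encode(), hashlib.sha256).hexdigest()

    def _current(self, now: float):
        # Forget seeds (and their hashes) once nothing under them can still count.
        limit = self.window + self.seed_lifetime
        self.seeds = [s for s in self.seeds if now - s[0] < limit]
        if not self.seeds or now - self.seeds[-1][0] >= self.seed_lifetime:
            self.seeds.append((now, os.urandom(32), set()))
        return self.seeds[-1]

    def allow(self, ip: str) -> bool:
        """Return True and record the IP if it has no download in the window."""
        _, seed, digests = self._current(self.clock())
        # The IP must be re-hashed under every seed that is still kept.
        for _, old_seed, old_digests in self.seeds:
            if self._digest(old_seed, ip) in old_digests:
                return False
        digests.add(self._digest(seed, ip))
        return True


limiter = RotatingSeedLimiter()
print(limiter.allow("203.0.113.7"))  # first request from this address
print(limiter.allow("203.0.113.7"))  # repeat request within the window
```

The extra cost is one hash per retained seed on every request. In exchange, a table precomputed for one seed is useless once that seed is discarded.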
