MAC address anonymization
MAC address anonymization applies a one-way function to a MAC address so that the result may be used in tracking systems for reporting and for the general public, while making it nearly impossible to obtain the original MAC address from the result. The idea is that this process allows companies like Google,[1] Apple[2] and iInside,[3] which track users' movements via computer hardware, to simultaneously protect the identities of the people they are tracking, as well as the hardware itself.
Examples
An example of MAC address anonymization would be to use a simple hash algorithm. Given an address of 11:22:33:44:55:66, the MD5 hash algorithm produces eb341820cd3a3485461a61b1e97d31b1 (32 hexadecimal digits).[4]
An address only one character different (11:22:33:44:55:67) produces 391907146439938c9821856fa181052e,[5] an entirely different hash due to the avalanche effect.
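The hashing step above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name anonymize_mac and the normalization (separators stripped, digits lowercased, matching the separator-free input used in the cited md5sum commands) are assumptions.

```python
import hashlib

def anonymize_mac(mac: str) -> str:
    """Return the MD5 hex digest of a MAC address with separators removed."""
    # Assumed normalization: drop ':' and '-' and lowercase the hex digits.
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return hashlib.md5(digits.encode("ascii")).hexdigest()

first = anonymize_mac("11:22:33:44:55:66")
second = anonymize_mac("11:22:33:44:55:67")

# Count how many hex digits changed after altering one input character.
differing = sum(a != b for a, b in zip(first, second))

print(first)
print(second)
print(f"{differing} of {len(first)} hex digits differ")
```

Changing a single input character is expected to alter most of the digest's digits, which is the avalanche effect described above.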
Why this does not work in practice
Tracking companies rely on the assumption that address anonymization is akin to encryption. Given a message and an encryption method that is well known to both the encoder and a potential decryptor, modern encryption methods (such as the Advanced Encryption Standard (AES) or RSA) yield a result that is unbreakable in practice.
The problem lies in the fact that there are only 2^48 (281,474,976,710,656) possible MAC addresses. Given the encoding algorithm, an index can easily be created for each possible address. By using rainbow table compression, the index can be made small enough to be portable. Building the index is an embarrassingly parallel problem, and so the work can be accelerated greatly, e.g. by renting a large amount of cloud computing resources temporarily.
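The index idea can be illustrated on a deliberately tiny slice of the address space. The sketch below assumes a hypothetical scenario in which the top 32 bits of the address are already known, leaving only 2^16 candidates, and uses a plain dictionary rather than a rainbow table; the helper names md5_mac and build_index are illustrative.

```python
import hashlib

def md5_mac(value: int) -> str:
    """Hash a 48-bit integer MAC written as 12 lowercase hex digits."""
    return hashlib.md5(f"{value:012x}".encode("ascii")).hexdigest()

def build_index(prefix: int, free_bits: int) -> dict:
    """Map hash -> MAC for every address that shares the high-order prefix."""
    base = prefix << free_bits
    return {md5_mac(base + low): base + low for low in range(1 << free_bits)}

# Assumed scenario: the first four octets are known, so 2**16 addresses remain.
index = build_index(prefix=0x11223344, free_bits=16)

# The "anonymized" value an attacker might observe in a published data set.
observed = md5_mac(0x112233445566)

# A single dictionary lookup reverses the hash.
recovered = index[observed]
print(f"{recovered:012x}")
```

The full attack differs only in scale: the same loop runs over all 2^48 addresses, split across many machines, with the results compressed into a rainbow table instead of held in memory.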
For example, if a single CPU can compute 1,000,000 hashed MACs per second, then generating the full table takes 8.9 CPU-years. With a fleet of 1,000 CPUs, this would only take around 78 hours. Using a rainbow table with a "depth" of 1,000,000 hashes per entry, the resulting table would only contain a few hundred million entries (a few GB) and require 0.5 seconds (on average, ignoring I/O time) to reverse any hashed MAC into its original form.
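The figures in this estimate can be reproduced with a few lines of arithmetic. The sketch below assumes a 365.25-day year and takes the average lookup to walk half of one chain, which is a simplification; all variable names are illustrative.

```python
KEYSPACE = 2 ** 48                  # number of possible MAC addresses
HASHES_PER_SECOND = 1_000_000       # assumed throughput of a single CPU
SECONDS_PER_YEAR = 365.25 * 24 * 3600

total_seconds = KEYSPACE / HASHES_PER_SECOND
cpu_years = total_seconds / SECONDS_PER_YEAR
fleet_hours = total_seconds / 1_000 / 3600        # 1,000 CPUs working in parallel

chain_length = 1_000_000                          # rainbow table "depth"
table_entries = KEYSPACE / chain_length
avg_lookup_seconds = (chain_length / 2) / HASHES_PER_SECOND

print(f"full table: {cpu_years:.1f} CPU-years, {fleet_hours:.0f} hours on 1,000 CPUs")
print(f"rainbow table: {table_entries:,.0f} entries, {avg_lookup_seconds} s average lookup")
```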
One approach to mitigating this attack would be to use a deliberately slow one-way function for MAC addresses, such as a slow key derivation function (KDF). For instance, if the KDF were tuned to require 0.1 seconds per MAC address anonymization operation (on a typical consumer CPU), generating a rainbow table would require 892,000 CPU-years.
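As a rough sketch of this mitigation, the example below uses PBKDF2-HMAC-SHA256 from Python's standard library as the slow function. The choice of KDF, the salt value and the iteration count are all assumptions made for illustration; in practice the iteration count would be tuned on the target hardware until one call takes about 0.1 seconds.

```python
import hashlib
import time

# Assumed parameters: neither the salt nor the iteration count comes from the text.
SALT = b"example-deployment-salt"
ITERATIONS = 200_000

def slow_anonymize_mac(mac: str, iterations: int = ITERATIONS) -> str:
    """Derive a token from a MAC address with a deliberately slow KDF."""
    digits = mac.replace(":", "").lower().encode("ascii")
    return hashlib.pbkdf2_hmac("sha256", digits, SALT, iterations).hex()

start = time.perf_counter()
token = slow_anonymize_mac("11:22:33:44:55:66")
elapsed = time.perf_counter() - start
print(f"{token} ({elapsed:.3f} s)")

# Cost of enumerating every MAC address at 0.1 s per operation.
cpu_years = 0.1 * 2 ** 48 / (365.25 * 24 * 3600)
print(f"{cpu_years:,.0f} CPU-years at 0.1 s per MAC")
```

The slowdown applies to the legitimate operator as well, so the per-address cost has to be acceptable for normal data collection while remaining prohibitive for exhaustive enumeration.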
Truncating
Where data protection law requires anonymization, the method used should exclude any possibility of the original MAC address being identified. Some companies truncate IPv4 addresses by removing the final octet, thus in effect retaining information about the user's ISP or subnet, but not directly identifying the individual. The activity could then originate from any of 254 IP addresses. This may not always be enough to guarantee anonymization.[6]
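The IPv4 truncation described above can be sketched with Python's standard ipaddress module. The function name truncate_ipv4 and the sample address (taken from the 192.0.2.0/24 documentation range) are illustrative assumptions.

```python
import ipaddress

def truncate_ipv4(address: str) -> ipaddress.IPv4Network:
    """Drop the final octet of an IPv4 address, keeping only its /24 network."""
    ipaddress.IPv4Address(address)  # raises ValueError if not a valid IPv4 address
    # strict=False lets the host bits be non-zero; they are masked off.
    return ipaddress.IPv4Network(f"{address}/24", strict=False)

network = truncate_ipv4("192.0.2.57")
host_count = sum(1 for _ in network.hosts())

print(network)      # 192.0.2.0/24
print(host_count)   # 254
```

The host count excludes the network and broadcast addresses, which is where the figure of 254 candidate addresses comes from.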
References
- "Google Maps Has Been Tracking Your Every Move, And There's A Website To Prove It". Junkee. Retrieved 2016-04-10.
- "How your iPhone has been tracking your every move in secret | Metro News". Metro.co.uk. 2014-09-28. Retrieved 2016-04-11.
- "iInside retail brochure: Leading the market in indoor location techno…". 2014-03-10.
- echo -n "112233445566"|md5sum = eb341820cd3a3485461a61b1e97d31b1
- echo -n "112233445567"|md5sum = 391907146439938c9821856fa181052e
- "Opinion 1/2008 on data protection issues related to search engines" (PDF). Article 29 Data Protection Working Party. Retrieved 2017-10-20.