I wonder if it's better to standardize or randomize data for anonymity. For example, think of browser fingerprinting. If you standardize every parameter, you would have all browsers returning the same user-agent, the same installed fonts, the same window size, etc. Every browser would appear to be the same as the others, so it would be anonymous (except for the source IP of course). The other approach is randomization: every browser will randomize the data for every request, so for example the user-agent will keep on changing every time (or pretty often anyway) and might be chosen from a large set of common user-agents, or maybe even randomly generated. The same goes for all the other parameters, including window size, etc.
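
To make the comparison concrete, here is a minimal sketch (in TypeScript, with made-up attribute names) of the kind of fingerprint a tracker might derive from the parameters above; standardization would make everyone hash to the same value, while randomization would make one user's hash unstable across requests:

```typescript
// Illustrative only: a toy fingerprint built from the attributes mentioned
// above (user-agent, fonts, window size). Real trackers use many more signals.

interface BrowserAttributes {
  userAgent: string;
  fonts: string[];
  windowSize: [number, number];
}

// Join the attributes and hash them; any attribute that differs between two
// users (or between two requests) changes the resulting fingerprint.
function fingerprint(attrs: BrowserAttributes): string {
  const raw = [attrs.userAgent, attrs.fonts.join(","), attrs.windowSize.join("x")].join("|");
  let hash = 0;
  for (const ch of raw) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return hash.toString(16);
}

// Standardization: every browser reports the same attributes, so fingerprint()
// returns the same value for everyone. Randomization: the value keeps changing
// for the same user, but the way it changes can itself become a signal.
```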

I think there's basically no difference between these two approaches, except that maybe randomization might confuse the tracking systems a bit more, causing a little bit of damage by polluting their data. It might also be easier to spot any small differences between standardized data, while randomized data might be more difficult to analyze, at least at first, before the trackers have figured out a way to remove the noise.

reed
  • Well... do you want a bigger or smaller anonymity set? – forest Jul 09 '19 at 09:36
  • @forest, I'm not sure I understand what you mean. It sounds like you mean there's no difference after all, and that's what I suspect too, but I'm not completely sure. I kind of feel that randomization (and therefore a bigger anonymity set?) provides some more benefits. – reed Jul 09 '19 at 10:40
  • It's the other way around. Your anonymity set is the number of people you are indistinguishable from. Randomization _reduces_ your anonymity set, whereas using a standard profile increases it. It's extremely important _not_ to randomize if you want good anonymity. – forest Jul 12 '19 at 05:14

2 Answers

Complete fingerprint elimination is impossible for a wide variety of technical reasons -- in short, a lot of fingerprintable data is required for correct website operation, so you can't deny it to websites, or it can be inferred through side channels that would take a big performance hit to eliminate. The original version of this answer had a massive list of specific reasons why, but... it was over two pages. I removed it before posting. There are also legal reasons -- browser vendors might be considered to be breaching their fiduciary duties if they aren't constantly trying to innovate, which implicitly means newer versions have different features, which... is fingerprintable. TL;DR: Fingerprinting will always exist.

The damage can be minimized, though, generally through standardization. For example, browsers could:

  • Change the UA so it only contains the browser name and major version, so it leaks less data about the OS and hardware.
  • Remove certain features that shouldn't be necessary for websites and mainly serve to expose additional data.
  • Encourage people to browse full- or half-screen only, so that their window sizes are standardized to a small set of values which are nonetheless convenient.
  • Teach consumers to be more paranoid about cookies and require stricter, more granular permissions before setting them.
  • Default to behaving in some standard way, with extra behavior opt-in, so websites can't tell whether a feature doesn't exist or is just disabled.
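
As a rough sketch of what the list above amounts to in code, assuming illustrative values (real anti-fingerprinting of this kind is implemented inside the browser engine, for example Tor Browser's letterboxing, not in page script):

```typescript
// Sketch only: how a privacy extension might approximate two of the measures
// above. The specific values are illustrative, and page-level spoofing like
// this is itself detectable; browsers do it properly in the engine.

// Report a generic user-agent containing only the browser and major version.
Object.defineProperty(Navigator.prototype, "userAgent", {
  get: () => "Mozilla/5.0 (Windows NT 10.0; rv:115.0) Gecko/20100101 Firefox/115.0",
});

// Round the reported window size down to a coarse grid, so that many users
// end up reporting exactly the same size (similar in spirit to letterboxing).
function standardizedWindowSize(width: number, height: number): [number, number] {
  return [Math.floor(width / 200) * 200, Math.floor(height / 100) * 100];
}
```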

Randomization could also be used for those, but it carries the additional danger of the randomization patterns themselves being fingerprintable, offers no additional benefit, and is harder to confidently reason about. The temporary benefit of briefly confusing fingerprinters probably wouldn't be worth the lasting damage of giving much more accurate fingerprints (because currently, the UA can't really identify someone, as it's just a string that some browsers even let you change -- but how it's randomized is built into the browser's code, and therefore can identify it).
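
As a hedged illustration of why the randomization itself becomes a signal, here is a sketch (with made-up field names) of how a tracker could flag clients whose supposedly stable attributes change between requests -- behavior that is rare, and therefore identifying, as long as few people randomize:

```typescript
// Sketch from the tracker's side: for ordinary browsers the user-agent is
// constant within a session, so a session whose user-agent keeps changing
// stands out. Field names here are invented for the example.

interface Observation {
  sessionId: string;
  userAgent: string;
}

// Flag sessions whose user-agent changed mid-session; if only a handful of
// visitors randomize, being flagged is itself a near-unique fingerprint.
function suspectedRandomizers(log: Observation[]): Set<string> {
  const lastSeen = new Map<string, string>();
  const flagged = new Set<string>();
  for (const { sessionId, userAgent } of log) {
    const previous = lastSeen.get(sessionId);
    if (previous !== undefined && previous !== userAgent) {
      flagged.add(sessionId);
    }
    lastSeen.set(sessionId, userAgent);
  }
  return flagged;
}
```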

So ultimately, in practice, standardization works better for fingerprint reduction.

Nic
  • Even if you couldn't be fingerprinted via randomization patterns, you would end up being "that one guy who always gives a random fingerprint", which itself is a fingerprint that brings your anonymity set to 1: you. – forest Jul 09 '19 at 09:36
  • @forest, right, it would probably be a bad idea to randomize the data for every request. But maybe only randomizing it for every website and every session would work. – reed Jul 09 '19 at 10:43
  • 2
    @forest Well, like standardization, it depends on how many people adopt the technique. If _everyone_ is always randomizing _everything_ in a perfectly consistent way, then you do gain as close to perfect anonymity as you can get client-side. Likewise if everyone standardizes on everything and sticks to that standard. But... neither is practical, for many reasons. But you're right, if there's a very small subset of people who randomize, then it doesn't really work. – Nic Jul 09 '19 at 14:13
  • @NicHartley Yes that's true, but because people don't do that, randomization is a bad idea. There are other things that make it inferior, but the big reason is that you would need everyone to not only use randomization, but do it exactly the way you do. – forest Jul 12 '19 at 05:15
  • @forest Yup, what I mentioned was an edge case. That's why I recommended standardization in my answer :) – Nic Jul 12 '19 at 14:08

The main advantage of randomization is that it is harder to detect and that you get some privacy even when you're the only one doing it (a standardized fingerprint is still a fingerprint, and it only becomes effective when many people share it).

The main advantage of standardization is that you can test it better. It is hard to tell whether a value is really random or still correlated with something, but it is easy to check whether it is exactly the agreed standard value (e.g. 0).
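
A small sketch of that testing argument (the attribute and values are made up): checking a standardized value is a one-line assertion, whereas randomized output can only be checked statistically over many samples, and passing such tests still doesn't prove the values are uncorrelated with the user:

```typescript
// Standardized output is trivially testable: one equality check.
const STANDARD_TIMEZONE_OFFSET = 0; // e.g. everyone pretends to be in UTC

function isStandardized(reportedOffset: number): boolean {
  return reportedOffset === STANDARD_TIMEZONE_OFFSET;
}

// Randomized output can only be checked statistically, over many samples in
// [0, 1), and even a "pass" doesn't rule out subtle correlations.
function looksRoughlyUniform(samples: number[], buckets = 10): boolean {
  const counts = new Array<number>(buckets).fill(0);
  for (const s of samples) {
    counts[Math.min(buckets - 1, Math.floor(s * buckets))]++;
  }
  const expected = samples.length / buckets;
  // crude chi-squared-style statistic; a real test needs a proper threshold
  const chi2 = counts.reduce((acc, c) => acc + (c - expected) ** 2 / expected, 0);
  return chi2 < 2 * buckets;
}
```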

allo