
Modern web browsers leak a ridiculous amount of information through the User-Agent header. The following is an example for Safari on iPad, from Wikipedia:

Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us)
AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405

It is clear that the information provided here goes way beyond what makes any sense for the purposes of browser/OS targeting by legitimate web services. Indeed, the only thing that this level of detail seems to do is facilitate tracking and browser fingerprinting, and unsurprisingly it is used for that extensively.

Why do browser vendors allow/support this? Why isn't something like

Mozilla/5.0 (Safari 5.1; iOS/iPad; en-us)

enough? Even en-us should not be needed as it's duplicated by the Accept-Language header, and whether the server has any right to know what device I am using is debatable as well, leaving us with

Mozilla/5.0 (Safari 5.1; iOS)

which still captures the browser and OS version and should therefore be completely sufficient for all legitimate purposes.

user114974

2 Answers


User agent strings are madly complicated for historical reasons. It is a long story, but the short version is that everybody wanted to look like someone else to circumvent servers restricting access to webpages based on browsers. Yes, that used to be a thing back in the day. And now we are stuck with this sad mess.

The good news is that this means that even though the user agent string is long, a lot of it isn't really useful information. If we remove the boilerplate, the information about you that we can actually get from this string is the following (a rough parsing sketch follows the list):

  • iPad: You are on an iPad.
  • CPU OS 3_2_1 like Mac OS X: You are using iOS with specified version.
  • en-us: You have your language set to American English.
  • AppleWebKit/531.21.10: You are using the WebKit engine, with that particular version.
  • Mobile/7B405: The build number of your firmware.
  • You are using Safari. The string never explicitly says so, but it can be deduced from its structure.
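
As a rough illustration (not a robust parser!), here is how each of those pieces could be pulled out of the example string with simple regular expressions:

```ts
// Rough sketch: extracting the pieces listed above from the example
// user agent string. Real UA parsing is notoriously fragile; this is
// only meant to show where each bit of information lives.
const ua =
  "Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) " +
  "AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405";

const device = /\((iPad|iPhone|iPod)/.exec(ua)?.[1];        // "iPad"
const osVersion = /CPU OS ([\d_]+)/.exec(ua)?.[1];          // "3_2_1"
const language = /;\s*([a-z]{2}-[a-z]{2})\)/.exec(ua)?.[1]; // "en-us"
const engine = /AppleWebKit\/([\d.]+)/.exec(ua)?.[1];       // "531.21.10"
const build = /Mobile\/(\w+)/.exec(ua)?.[1];                // "7B405"

console.log({ device, osVersion, language, engine, build });
```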

That is a lot of information that could be useful for fingerprinting. I would still say that sending it is good practice:

  • All of this information could also be useful when determining what version of a site to serve. For example, if there is a known bug in a certain version of WebKit, you might want to include some CSS with a workaround (see the sketch after this list). There are legitimate uses for this.
  • If you keep your software up to date, your user agent will hardly be unique anyway.
  • There are just too many ways to fingerprint a browser for there to be any point in removing useful information to prevent it. Just look at Panopticlick or the evercookie.
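
As a minimal sketch of that first bullet's scenario, here is how a server might inspect the WebKit version in the User-Agent and serve an extra workaround stylesheet for old builds. The version cutoff (532) and the stylesheet names are invented for illustration:

```ts
// Hypothetical server that serves extra workaround CSS to old WebKit
// builds, based on the User-Agent header. Runs under Node 16+.
import { createServer } from "node:http";

createServer((req, res) => {
  const ua = req.headers["user-agent"] ?? "";
  const webkit = /AppleWebKit\/(\d+)/.exec(ua);
  // Invented cutoff: pretend builds before 532 have a rendering bug.
  const needsWorkaround = webkit !== null && Number(webkit[1]) < 532;

  res.setHeader("Content-Type", "text/html");
  res.end(
    `<link rel="stylesheet" href="main.css">` +
      (needsWorkaround
        ? `<link rel="stylesheet" href="webkit-workaround.css">`
        : "")
  );
}).listen(8080);
```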

My main point is that asking browser vendors to remove functionality will not solve the underlying problem. If you want to remain untraceable, you need to use a browser specifically designed for that. It will involve losing a lot of legitimate functionality (e.g. the ability to resize the browser window).

If you are only worried about the user agent header, there are plugins that will change it or even randomize it for you. But if you are worried about being traced on the web, you should probably be worried about a lot more than that.
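
The plugins mentioned above rewrite the header inside the browser itself. As a sketch of the same randomization idea from outside a page, here is a Node 18+ script (run as an ES module) that picks a random User-Agent per request; the UA pool below is a tiny made-up sample, not a curated list:

```ts
// Pick a random User-Agent from a small made-up pool and send it
// with an outgoing request. Requires Node 18+ (global fetch) and
// must run as an ES module (top-level await).
const pool = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
];
const randomUA = pool[Math.floor(Math.random() * pool.length)];

const res = await fetch("https://example.com/", {
  headers: { "User-Agent": randomUA },
});
console.log(res.status, "sent UA:", randomUA);
```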

Anders

  • To add to this: "Mozilla" was added when Netscape was the most functional browser, so servers sent the "best" version of the page to user agents with that in. "Gecko" is similar - the rendering engine used by Firefox. "KHTML" is the original rendering engine for Safari, which WebKit was based on. They're all to do with browser wars, and tricking servers into sending the "best" version of a page. – Matthew Jun 20 '16 at 09:44
  • 2
    All that junk info, and yet, User Agent doesn't contain the pieces of information that many web developers actually cares about: whether it's mobile/desktop and mouse/touch capability. Instead you have to build a database of User Agent lies to extract that knowledge. – Lie Ryan Jun 20 '16 at 12:17
  • @LieRyan Because user agents predate all that. Today a developer should use [feature detection](https://en.wikipedia.org/wiki/Feature_detection_(web_development)) to determine a feature and **not** user agents. But back then the browsers were so different it was normal to have multiple versions of a page served depending on the browser. The browser and how it would render/work could also change from version to version and platform to platform. User agents were at the time the only way to get these details on which page they should serve. This practice was called content tailoring. – Bacon Brad Jun 21 '16 at 20:13
  • Ironically enough I came here from Panopticlick because user agent was the second biggest fingerprint contributor at 10 bits. And that's on Chrome. WTF? – Fax Jan 08 '18 at 13:22

First, it is important to understand that user-agents were never intended for fingerprinting, but to give developers information about the browser, platform, rendering engine, and other device details.

User-agents were first added to the HTTP standard in 1996. At the time, browser standards were generally the wild west. Very rarely did two browsers render a page exactly the same. Sometimes the same browser would function entirely differently per platform, and even two different versions of the same browser could behave completely differently. As a result it was normal to have multiple versions of the same page, each tailored for a particular browser or range of browsers, and the user-agent was the only way to obtain the details needed to decide which page to serve.

Today, user-agent sniffing for content tailoring is discouraged in modern development. Instead, a developer should detect and adapt using feature detection. However, even though this is the better and preferred way, don't expect the user-agent to be stripped from browsers anytime soon: due to its wide usage, that would be a breaking change. There are also so-called undetectables, features or bugs whose presence cannot be tested directly, where the user-agent may be the last resort to detect their usability or to provide an alternate means. Or there is a known bug in a particular browser/renderer, and you wish to detect its likely presence via the user-agent.
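
A minimal sketch of that recommended order in browser code: feature-detect first, and fall back to the user-agent only for an undetectable. The "buggy engine" version threshold here is invented for illustration:

```ts
// Feature detection: ask the browser directly what it supports.
const hasTouch = "ontouchstart" in window || navigator.maxTouchPoints > 0;
const hasFlexGap = CSS.supports("gap", "1rem");

// UA fallback for an "undetectable": a hypothetical rendering bug
// that no API exposes. The version cutoff (537) is made up.
const webkit = /AppleWebKit\/(\d+)/.exec(navigator.userAgent);
const suspectEngine = webkit !== null && Number(webkit[1]) < 537;

console.log({ hasTouch, hasFlexGap, suspectEngine });
```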

Additionally, for security reasons, the browser typically sandboxes websites away from any detailed information about the machine. The user-agent is one of the few places where the browser can volunteer and surrender this otherwise unobtainable information. So it is often descriptive because there is no alternative way to pull it. Touch support, screen size, etc. can be determined by feature detection, making it unnecessary to include them.

The whole use of user-agent strings for fingerprinting is a side effect of their availability. It was never the intended purpose or specification of this feature.

Bacon Brad