A few days ago I was searching YouTube and noticed an "outlier" video that popped up in the suggestions list. This video had nothing to do with the topic being searched; instead, it was based on a search made from this same computer two weeks earlier.
This was impressive, considering that the setup should have made fingerprinting difficult. In particular:
- Neither I nor anyone else on this machine was logged in to YouTube or any other Google service.
- The browser is configured in "private" mode by default, so every cookie/storage/WebSQL/IndexedDB file is deleted when it is closed. This means whatever information they kept must have been stored on their servers.
- The IP changes at least once a day, so hundreds or thousands of IPs would be associated with this geographical area. I'm almost sure they can only pin it down to the city level, from what I have observed in Gmail's account activity page for several users in this same area.
- The user agent is nothing out of the ordinary, as the browser regularly updates to the latest version. I have not fully verified this, but I'm fairly sure their profiling method is somehow resistant to IP and user agent changes.
- Installed plugins are the usual ones, very common as well.
- The same goes for installed fonts and screen resolution.
In fact, there are more computers on the LAN with almost the same setup (OS/browser/plugins), and they share the same IP. How they were able to identify the correct one over a two-week period beats me.
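To illustrate why this puzzles me: as far as I understand it, a naive attribute-based fingerprint is essentially a hash over the values the browser exposes, so identically configured machines should collide. A minimal sketch of that idea follows; the attribute names and values are hypothetical, and this is not a claim about what Google's scripts actually do.

```python
import hashlib
import json


def naive_fingerprint(attrs: dict) -> str:
    """Hash a set of browser attributes into a single identifier.

    Keys are sorted so the same attributes always produce the same hash.
    """
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
machine_a = {
    "user_agent": "ExampleBrowser/1.0",
    "plugins": ["pdf-viewer", "flash"],
    "fonts": ["Arial", "Times New Roman"],
    "screen": "1920x1080",
}
machine_b = dict(machine_a)                      # identical setup on the same LAN
machine_c = dict(machine_a, screen="1280x1024")  # differs in one attribute

# Identical setups collide; a single differing attribute separates them.
print(naive_fingerprint(machine_a) == naive_fingerprint(machine_b))
print(naive_fingerprint(machine_a) == naive_fingerprint(machine_c))
```

If the machines here really were identical in every exposed attribute, a scheme like this could not tell them apart, which suggests that either some attribute differs without my noticing or a different technique is in use.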
The only thing that stopped the unrelated suggestions was disabling scripts for the google.com domain, at the expense of not being able to read the comments.
I've visited the Panopticlick site and nothing shown in the table looks unique enough to track an individual user over time. However, the page says the fingerprint is unique among 4 million, which is a number greater than the population of the geographical area (which is also on the order of millions).
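To put the 4 million figure in perspective, here is a rough back-of-the-envelope sketch of my own (not Panopticlick's code). It converts "1 in N" ratios into bits of identifying information and sums them, assuming the attributes are statistically independent, which is optimistic. The per-attribute ratios are hypothetical placeholders, not values from my actual results.

```python
import math


def surprisal_bits(one_in_n: float) -> float:
    """Bits of identifying information in a trait shared by 1 in `one_in_n` browsers."""
    return math.log2(one_in_n)


def combined_bits(ratios) -> float:
    """Total bits if the traits are independent (an optimistic assumption)."""
    return sum(surprisal_bits(r) for r in ratios)


# Hypothetical "1 in N" ratios per attribute, for illustration only.
hypothetical_ratios = {
    "user_agent": 1000,
    "plugins": 500,
    "fonts": 200,
    "screen": 40,
}

total = combined_bits(hypothetical_ratios.values())
needed = surprisal_bits(4_000_000)  # bits required to single out 1 browser in 4 million

print(f"combined: {total:.1f} bits, needed: {needed:.1f} bits")
```

The point is that no single attribute has to be unique: several individually common values can add up to more bits than are needed to single out one browser among millions.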
Questions:
- Is there anything in the browser unique enough to identify a machine inside a large city, or is it the combination of several characteristics that makes identification possible?
- Are Google's fingerprinting methods known? Has anyone conducted an analysis of their scripts?