Browser fingerprinting
The issue you are describing is one of many techniques in the field of browser fingerprinting. There are three main types of browser fingerprinting, and they are often related. In order of the amount of research done on each, they are:
Device fingerprinting, which aims to distinguish individual users. It focuses on finding unique identifiers on a browser when the IP address cannot be relied upon for identification. This is the most well-researched, with many proof-of-concepts available, and is the hardest to mitigate. In extreme cases, this can detect unique devices regardless of the operating system or browser.
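To make this first type concrete, here is a minimal sketch of attribute-based device fingerprinting that hashes a handful of browser properties into a single identifier. The particular attributes and the use of SHA-256 are my own illustrative choices; real trackers combine far more signals (fonts, WebGL, canvas, and so on).

```typescript
// Minimal sketch: combine a few higher-entropy browser attributes into one
// identifier. Real trackers use far more signals than this.
function collectAttributes(): string[] {
  return [
    navigator.userAgent,
    navigator.language,
    String(navigator.hardwareConcurrency),
    `${screen.width}x${screen.height}x${screen.colorDepth}`,
    Intl.DateTimeFormat().resolvedOptions().timeZone,
  ];
}

// Hash the joined attributes with SubtleCrypto so the identifier is compact.
async function deviceFingerprint(): Promise<string> {
  const data = new TextEncoder().encode(collectAttributes().join("||"));
  const digest = await crypto.subtle.digest("SHA-256", data);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

deviceFingerprint().then((id) => console.log("fingerprint:", id));
```

Fingerprinting scripts used in the wild combine dozens of such signals, which is what makes the resulting identifier stable enough to survive cookie clearing.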
Supercookies, which are persistent identifiers that a website can inject into a browser. They are not true cookies and so are not deleted when cookies are cleared. This can be implemented with clever cache fingerprinting, HTML5 web storage, HSTS or even HPKP, for example.
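As an illustration of just the web-storage variant (the cache, HSTS and HPKP variants are considerably more involved), here is a sketch of an identifier that survives clearing cookies but not clearing all site data; the storage key name is made up.

```typescript
// Sketch of the simplest supercookie variant: an identifier kept in HTML5
// web storage. Clearing cookies alone does not remove it; clearing all
// site data does.
const STORAGE_KEY = "tracking_id"; // hypothetical key name

function getOrCreateSupercookie(): string {
  let id = localStorage.getItem(STORAGE_KEY);
  if (id === null) {
    id = crypto.randomUUID(); // random identifier assigned on first visit
    localStorage.setItem(STORAGE_KEY, id);
  }
  return id;
}

console.log("persistent id:", getOrCreateSupercookie());
```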
History sniffing, which involves one website detecting whether or not a visitor has also visited a different, unaffiliated third-party website. This is similar to supercookies, but relies on the third-party site leaving behind detectable state unintentionally. History sniffing techniques can be used to implement supercookies, but the reverse is not always true. This is what you are asking about.
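One classic way to do this is cache timing: probe how quickly a resource from the third-party site loads, on the theory that a fast load means it was already cached by an earlier visit. A rough sketch follows, with a made-up URL and threshold; note that per-site cache partitioning in current browsers largely defeats this particular approach.

```typescript
// Sketch of cache-timing history sniffing: time how long a third-party
// resource takes to load. A very fast load suggests it was already cached,
// i.e. the user has probably visited that site before.
function probeVisited(resourceUrl: string, thresholdMs = 20): Promise<boolean> {
  return new Promise((resolve) => {
    const img = new Image();
    const start = performance.now();
    const finish = () => resolve(performance.now() - start < thresholdMs);
    img.onload = finish;
    img.onerror = finish; // treat load errors the same way for this sketch
    img.src = resourceUrl; // illustrative URL of a static asset on the target site
  });
}

probeVisited("https://example.org/logo.png").then((visited) =>
  console.log("probably visited before:", visited),
);
```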
There is also a type of fingerprinting called TCP/IP fingerprinting, also known as network stack fingerprinting or OS fingerprinting. It involves looking at the low-level structure of the networking protocols your computer uses (which TCP options are set and in what order, how initial sequence numbers are generated, etc.). Avoiding it requires a "true" proxy such as Tor (VPNs do not help, as they still use your own networking stack). The technique is limited in the amount of information it can gather, and since it is not unique to browsers, I won't go into detail about it. I suggest you look at an example TCP fingerprinting utility such as p0f to learn more about how it works and what it is capable of.
Mitigations
In order to prevent a website from detecting whether or not you have visited a given site, the browser must not keep any persistent state between the time that site is visited and the time another site attempts to confirm the visit. In many cases, private/incognito mode is not enough, as it only clears specific identifiers such as cookies. The most effective solution is an ephemeral data directory. This is the approach taken by Tor Browser, which, across browser restarts, defeats the last two fingerprinting techniques and attempts to limit the first through browser patches.
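If you wanted to approximate the ephemeral-data-directory approach with an ordinary browser (without Tor Browser's anti-fingerprinting patches), it can be as simple as launching it with a throwaway profile. Here is a sketch for Node.js and Firefox, assuming `firefox` is on the PATH:

```typescript
// Sketch: approximate an ephemeral data directory by launching Firefox with
// a throwaway profile that is deleted when the browser exits.
import { spawn } from "node:child_process";
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const profileDir = mkdtempSync(join(tmpdir(), "ephemeral-profile-"));

const browser = spawn("firefox", ["--no-remote", "-profile", profileDir], {
  stdio: "inherit",
});

browser.on("exit", () => {
  rmSync(profileDir, { recursive: true, force: true }); // wipe all stored state
});
```

This only removes persistent state; it does nothing about device fingerprinting, which is why Tor Browser's patches still matter.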
Using one of the numerous "privacy plugins" or custom configurations is not an effective mitigation. In many cases they actually add to your browser fingerprint, shrinking your anonymity set while not removing all, or even most, of the tracking vectors. A complete solution requires everyone running the same browser at the same version with the same configuration, heavily patched to remove functionality that can be abused for fingerprinting. Tor Browser does this, but even it is not perfect: there are many open tickets for extant fingerprinting vectors, with varying levels of impact. For example, there are multiple open tickets covering different ways to identify which operating system the browser is running on.
The most effective strategy to mitigate browser fingerprinting in general would be the following:
- Use an up-to-date and official version of the Tor Browser.
- Set the "security slider" to high, which disables JavaScript (a common fingerprinting vector).
- Do not modify any configuration settings or install any plugins.
- Restart the browser whenever you want to shed accumulated state, as it can only isolate identities between sessions.
This may be necessary even if you only want to avoid history sniffing and do not care about device fingerprinting or supercookies. As far as I know, all other browsers retain some state even after resetting the profile or invoking private/incognito mode, and that state can then be used to detect whether or not you have visited a given third-party site.
Real-world implications
There are various news articles and papers describing ad networks that have been caught using various forms of browser fingerprinting to track users across sites, as well as users who clear cookies or browse in private mode. I am not aware of any particular incidents involving the browser cache, most likely because other, more robust techniques are available. Ad companies typically use device fingerprinting; canvas fingerprinting is one of the most common vectors, and more advanced, stealthier techniques such as AudioContext fingerprinting are also in use.
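To show why canvas fingerprinting is so attractive, here is a minimal sketch: it renders some text and shapes to an off-screen canvas and hashes the encoded pixels, which differ subtly across GPUs, drivers, fonts and anti-aliasing settings. The drawing commands and the hash are arbitrary choices of mine, not what any particular ad network uses.

```typescript
// Minimal canvas fingerprinting sketch: the rendered pixels differ subtly
// across GPUs, drivers, fonts and anti-aliasing settings, so hashing them
// yields a fairly stable per-device identifier.
async function canvasFingerprint(): Promise<string> {
  const canvas = document.createElement("canvas");
  canvas.width = 200;
  canvas.height = 50;
  const ctx = canvas.getContext("2d")!;

  ctx.textBaseline = "top";
  ctx.font = "16px Arial";
  ctx.fillStyle = "#f60";
  ctx.fillRect(10, 10, 100, 30);
  ctx.fillStyle = "#069";
  ctx.fillText("fingerprint test", 2, 2);

  // Hash the PNG-encoded pixel data into a short identifier.
  const data = new TextEncoder().encode(canvas.toDataURL());
  const digest = await crypto.subtle.digest("SHA-256", data);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

canvasFingerprint().then((id) => console.log("canvas fingerprint:", id));
```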