Browser fingerprinting
The issue you are describing is one of many techniques in the field of browser fingerprinting. There are three main types of browser fingerprinting, and they are often related. In order of the amount of research done on each, they are:
Device fingerprinting, which aims to distinguish individual users. It focuses on finding unique identifiers on a browser when the IP address cannot be relied upon for identification. This is the most well-researched, with many proof-of-concepts available, and is the hardest to mitigate. In extreme cases, this can detect unique devices regardless of the operating system or browser.
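To make this first type concrete, here is a minimal sketch of attribute-based device fingerprinting that hashes a handful of browser properties into a single identifier. The particular attributes and the use of SHA-256 are my own illustrative choices; real trackers combine far more signals (fonts, WebGL, canvas, and so on).

```typescript
// Minimal sketch: combine a few higher-entropy browser attributes into one
// identifier. Real trackers use far more signals than this.
function collectAttributes(): string[] {
  return [
    navigator.userAgent,
    navigator.language,
    String(navigator.hardwareConcurrency),
    `${screen.width}x${screen.height}x${screen.colorDepth}`,
    Intl.DateTimeFormat().resolvedOptions().timeZone,
  ];
}

// Hash the joined attributes with SubtleCrypto so the identifier is compact.
async function deviceFingerprint(): Promise<string> {
  const data = new TextEncoder().encode(collectAttributes().join("||"));
  const digest = await crypto.subtle.digest("SHA-256", data);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

deviceFingerprint().then((id) => console.log("fingerprint:", id));
```

Fingerprinting scripts used in the wild combine dozens of such signals, which is what makes the resulting identifier stable enough to survive cookie clearing.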
Supercookies, which are persistent identifiers that a website can inject into a browser. They are not true cookies and so are not deleted when cookies are cleared. This can be implemented with clever cache fingerprinting, HTML5 web storage, HSTS or even HPKP, for example.
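As an illustration of just the web-storage variant (the cache, HSTS and HPKP variants are considerably more involved), here is a sketch of an identifier that survives clearing cookies but not clearing all site data; the storage key name is made up.

```typescript
// Sketch of the simplest supercookie variant: an identifier kept in HTML5
// web storage. Clearing cookies alone does not remove it; clearing all
// site data does.
const STORAGE_KEY = "tracking_id"; // hypothetical key name

function getOrCreateSupercookie(): string {
  let id = localStorage.getItem(STORAGE_KEY);
  if (id === null) {
    id = crypto.randomUUID(); // random identifier assigned on first visit
    localStorage.setItem(STORAGE_KEY, id);
  }
  return id;
}

console.log("persistent id:", getOrCreateSupercookie());
```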
History sniffing, which involves one website detecting whether or not a visitor has also visited a different, unaffiliated third-party website. This is similar to supercookies, but relies on the third-party site leaving behind detectable state unintentionally. History sniffing techniques can be used to implement supercookies, but the reverse is not always true. This is what you are asking about.
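One classic way to do this is cache timing: probe how quickly a resource from the third-party site loads, on the theory that a fast load means it was already cached by an earlier visit. A rough sketch follows, with a made-up URL and threshold; note that per-site cache partitioning in current browsers largely defeats this particular approach.

```typescript
// Sketch of cache-timing history sniffing: time how long a third-party
// resource takes to load. A very fast load suggests it was already cached,
// i.e. the user has probably visited that site before.
function probeVisited(resourceUrl: string, thresholdMs = 20): Promise<boolean> {
  return new Promise((resolve) => {
    const img = new Image();
    const start = performance.now();
    const finish = () => resolve(performance.now() - start < thresholdMs);
    img.onload = finish;
    img.onerror = finish; // treat load errors the same way for this sketch
    img.src = resourceUrl; // illustrative URL of a static asset on the target site
  });
}

probeVisited("https://example.org/logo.png").then((visited) =>
  console.log("probably visited before:", visited),
);
```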
There is also a type of fingerprinting called TCP/IP fingerprinting, also known as network stack fingerprinting or OS fingerprinting. It involves looking at the low-level structure of the networking protocols your computer uses (which TCP options are set and in what order, how initial sequence numbers are generated, etc.). Avoiding it requires a "true" proxy such as Tor (VPNs do not help, as they still use your own networking stack). The technique is limited in the amount of information it can gather, and since it is not unique to browsers, I won't go into detail about it. I suggest you look at an example TCP fingerprinting utility such as p0f to learn more about how it works and what it is capable of.
Mitigations
In order to prevent a website from detecting whether or not you have visited a given site, the browser must not keep any persistent state between the time that site is visited and the time another site attempts to confirm the visit. In many cases, private/incognito mode is not enough, as it only clears specific identifiers such as cookies. The most effective solution is an ephemeral data directory. This is the approach taken by Tor Browser, which, across browser restarts, defeats the last two fingerprinting techniques and attempts to limit the first through browser patches.
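If you wanted to approximate the ephemeral-data-directory approach with an ordinary browser (without Tor Browser's anti-fingerprinting patches), it can be as simple as launching it with a throwaway profile. Here is a sketch for Node.js and Firefox, assuming `firefox` is on the PATH:

```typescript
// Sketch: approximate an ephemeral data directory by launching Firefox with
// a throwaway profile that is deleted when the browser exits.
import { spawn } from "node:child_process";
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const profileDir = mkdtempSync(join(tmpdir(), "ephemeral-profile-"));

const browser = spawn("firefox", ["--no-remote", "-profile", profileDir], {
  stdio: "inherit",
});

browser.on("exit", () => {
  rmSync(profileDir, { recursive: true, force: true }); // wipe all stored state
});
```

This only removes persistent state; it does nothing about device fingerprinting, which is why Tor Browser's patches still matter.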
Using one of the numerous "privacy plugins" or custom configurations is not an effective mitigation. In many cases they actually add to your browser fingerprint, shrinking your anonymity set while not removing all, or even most, of the tracking vectors. A complete solution requires everyone running the same browser at the same version with the same configuration, heavily patched to remove functionality that can be abused for fingerprinting. Tor Browser does this, but even it is not perfect: there are many open tickets for extant fingerprinting vectors, with varying levels of impact. For example, there are multiple open tickets covering different ways to identify which operating system the browser is running on.
The most effective strategy to mitigate browser fingerprinting in general would be the following:
- Use an up-to-date and official version of the Tor Browser.
- Set the "security slider" to high, which disables JavaScript (a common fingerprinting vector).
- Do not modify any configuration settings or install any plugins.
- Restart the browser whenever you want to shed accumulated state, as it can only isolate identities between sessions.
This may be necessary even if you only want to avoid history sniffing and do not care about device fingerprinting or supercookies. As far as I know, all other browsers retain some state even after resetting the profile or invoking private/incognito mode, and that state can then be used to detect whether or not you have visited a given third-party site.
Real-world implications
There are various news articles and papers describing ad networks that have been caught using various forms of browser fingerprinting to track users across sites, as well as users who clear cookies or browse in private mode. I am not aware of any particular incidents involving the browser cache, most likely because other, more robust techniques are available. Ad companies typically use device fingerprinting; canvas fingerprinting is one of the most common vectors, and more advanced, stealthier techniques such as AudioContext fingerprinting are also in use.
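To show why canvas fingerprinting is so attractive, here is a minimal sketch: it renders some text and shapes to an off-screen canvas and hashes the encoded pixels, which differ subtly across GPUs, drivers, fonts and anti-aliasing settings. The drawing commands and the hash are arbitrary choices of mine, not what any particular ad network uses.

```typescript
// Minimal canvas fingerprinting sketch: the rendered pixels differ subtly
// across GPUs, drivers, fonts and anti-aliasing settings, so hashing them
// yields a fairly stable per-device identifier.
async function canvasFingerprint(): Promise<string> {
  const canvas = document.createElement("canvas");
  canvas.width = 200;
  canvas.height = 50;
  const ctx = canvas.getContext("2d")!;

  ctx.textBaseline = "top";
  ctx.font = "16px Arial";
  ctx.fillStyle = "#f60";
  ctx.fillRect(10, 10, 100, 30);
  ctx.fillStyle = "#069";
  ctx.fillText("fingerprint test", 2, 2);

  // Hash the PNG-encoded pixel data into a short identifier.
  const data = new TextEncoder().encode(canvas.toDataURL());
  const digest = await crypto.subtle.digest("SHA-256", data);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

canvasFingerprint().then((id) => console.log("canvas fingerprint:", id));
```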