43

Say that a web server supports both HTTP and HTTPS. If a browser fetches the same JavaScript file once with an HTTP GET and once with an HTTPS GET, and the file is cacheable, will the browser cache two copies of it?

The reason I'm asking is that, if only one copy is cached, an attacker could first trick the victim into downloading the JavaScript via HTTP and tamper with it in transit. Would that result in a cache poisoning attack against the HTTPS version?

Boann
SamTest
  • 1
    @MechMK1: it is not the same question. The OP clearly has an information security aspect now in the question. – Steffen Ullrich Dec 12 '19 at 09:05
  • @SteffenUllrich Making a new question is not the answer though. Editing the original question and waiting for re-opening is the way. –  Dec 12 '19 at 10:09
  • 1
    @MechMK1 you are correct, but now there are answers on this one. This one now has the use case, making the first part merely context and not the core question. – schroeder Dec 12 '19 at 10:51
  • 2
    @MechMK1, my apologies if I violated the rule again. I saw that my last question was closed and did not see a way to modify it or protest, so I opened a new question. – SamTest Dec 12 '19 at 15:50
  • @SamTest The system isn't explained all that well, so I don't blame you for not seeing what you should have done instead. As far as I am aware, the correct procedure would have been to edit your question and then wait for it to be processed in the re-opening queue. –  Dec 12 '19 at 16:27
  • 3
    @MechMK1 "_Making a new question is not the answer though_" Having to edit a question, esp. one that was fine and appropriate in the first place, and then wait until it's unblocked is humiliating. – curiousguy Dec 13 '19 at 00:02
  • @curiousguy I don't consider it *humiliating*, but I agree that this is not an ideal workflow. I would complain about it on Twitter, but I don't use Twitter. –  Dec 13 '19 at 08:49
  • 2
    @MechMK1 Humiliating was probably too strong, but I find it unpleasant, arbitrary, annoying and slightly vexatious. – curiousguy Dec 13 '19 at 08:50
  • @curiousguy Auto-reopen is not implemented because some questions are simply put off-topic, even if OP disagrees. It prevents pointless close-edit-close-edit cycles, that only eat into moderation resources that are already spread thin as it is. –  Dec 13 '19 at 09:04

7 Answers

69

Resources are cached by their URL, and the protocol (http:// or https://) is part of the URL. Since the protocol differs, the URL must also differ, and you have two separate cache entries.
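A minimal sketch of that idea, assuming a toy in-memory cache modelled as a dictionary keyed by the full URL string. The `ToyCache` class and the `example.com` URLs are hypothetical and only illustrate the keying; this is not any browser's real implementation.

```python
class ToyCache:
    """Toy stand-in for a browser cache: entries are keyed by the full URL."""

    def __init__(self):
        self._entries = {}

    def store(self, url, body):
        self._entries[url] = body

    def lookup(self, url):
        # Returns None on a cache miss.
        return self._entries.get(url)


cache = ToyCache()
# Hypothetical scenario: the HTTP copy was tampered with in transit.
cache.store("http://example.com/app.js", "tampered script")
cache.store("https://example.com/app.js", "genuine script")

# The scheme is part of the key, so neither entry overwrites the other.
http_copy = cache.lookup("http://example.com/app.js")
https_copy = cache.lookup("https://example.com/app.js")
```

In this model a poisoned `http://` entry is only ever returned for `http://` requests, which is the behaviour the answer describes.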

MSalters
46

It is perfectly fine if an http:// and an https:// resource provide different data, even if everything but the access method is the same. For example, access to http:// today often results in a redirect response, while access to https:// provides the real content. A browser will therefore cache these resources independently of each other.

Steffen Ullrich
  • 2
    Following your lead, if `http://example.com/example.js` is redirected to `https://example.com/example.js`, and `example.js` is finally fetched, will this script be cached under `http://example.com` or `https://example.com`? – SamTest Dec 12 '19 at 23:57
  • 9
    Depending on the exact response code and `Cache-Control` headers, the `http:…` response will be cached as a redirect. The cache entry for the `https:…` response will then separately store the actual JS. On subsequent requests for `http:…`, the browser will check its cache, see a redirect, and then begin a request for the redirect target -- possibly without sending any bytes over the network. In your scenario, the browser would look for the `https:…` JS, which it may serve directly from cache also or go to the network. The above may also be modified by HSTS. – josh3736 Dec 13 '19 at 02:03
  • 1
    I do not think this answers the question: are these resources handled as equal in cache? – Daan Dec 13 '19 at 08:29
  • 4
    @Daan: resources are cached by the URL which was *directly* (i.e. no redirect in between) used to request the cached resource. This means these resources have different cache entries. – Steffen Ullrich Dec 13 '19 at 08:54
  • Theoretically, you could express that the locations are equivalent using `Content-Location` or `Location` headers, and then a cache would be allowed to treat them as the same. I do not know if any common implementations actually do this. – OrangeDog Dec 13 '19 at 11:08
  • 2
    @StopHarmingMonica A user agent's cache is based on the _request_, not the _response_. That might include caching _the fact that the result was a redirect_, but that is not the same as _considering the two URIs as equivalent_. In particular, the relationship is one-way: if resource A redirects to resource B, then a change to resource A won't have any effect on the cache for resource B. – IMSoP Dec 13 '19 at 13:32
  • @IMSoP Sorry, I meant `Link`, not `Location`. No redirects are involved. A user agent's cache behaviour is very much determined by the response headers. – OrangeDog Dec 13 '19 at 14:18
  • 1
    @StopHarmingMonica If a response could cause a user agent to consider two resources as equivalent, that would in itself be a huge security problem. Consider if I write a page at `https://example.com/whatever` that claims, in whatever way you like, to be "equivalent to `https://apple.com`"; then later, I change that page to instead serve malicious content. If the browser simply merged the two cache entries, I could overwrite the user's cache of `https://apple.com` with my malicious content. The response can _narrow_ the cases where a cache is valid, but it cannot _widen_ it. – IMSoP Dec 13 '19 at 14:25
  • @IMSoP yes, doing it across origins would certainly be insecure – OrangeDog Dec 14 '19 at 14:08
8

Summary:

  • The primary cache key for any standards-compliant browser is an absolute URI
  • The absolute URI begins http: for all insecure requests and https: for all secure requests
  • Consequently, a resource fetched securely can never use the same cache key as a resource fetched insecurely

The current standard for HTTP is split across multiple "RFC" documents, with RFC 7234 dedicated entirely to caching, because there is a lot of complexity involved.

In section 2, "Overview of Cache Operation", there is this summary:

The primary cache key consists of the request method and target URI. However, since HTTP caches in common use today are typically limited to caching responses to GET, many caches simply decline other methods and use only the URI as the primary cache key.

This is more formally stated in the first bullet point in section 4, which says:

When presented with a request, a cache MUST NOT reuse a stored response, unless [...] the presented effective request URI (Section 5.5 of RFC7230) and that of the stored response match [...]

Section 5.5 of RFC 7230 starts by saying

For a user agent, the effective request URI is the target URI.

A browser is a "user agent", so this is the case we're concerned with here. "Target URI" is defined in section 5.1:

A URI reference (Section 2.7) is typically used as an identifier for the "target resource", which a user agent would resolve to its absolute form in order to obtain the "target URI". The target URI excludes the reference's fragment component, if any, since fragment identifiers are reserved for client-side processing (RFC3986, Section 3.5).

The generic definition of a URI is in RFC 3986, and HTTP-specific concerns take up three pages of RFC 7230. The most relevant part for our purposes is RFC 3986 section 4.1 which defines this grammar for Absolute URIs:

absolute-URI = scheme ":" hier-part [ "?" query ]

Crucially, note that scheme is a mandatory part of any Absolute URI. Since HTTP URIs always use the scheme http and HTTPS URIs always use the scheme https, this means that their absolute URIs, and thus their "primary cache keys" in a browser, can never collide.
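As a rough illustration of the chain of definitions above, here is a sketch that derives a cache key from a URI by dropping the fragment and insisting on a scheme. The function name `primary_cache_key` and the URLs are my own assumptions for the example; real browsers do considerably more than this.

```python
from urllib.parse import urldefrag, urlsplit


def primary_cache_key(url):
    """Toy cache key: the absolute URI with any fragment removed."""
    # The target URI excludes the fragment component.
    target_uri, _fragment = urldefrag(url)
    parts = urlsplit(target_uri)
    # An absolute URI always carries a scheme.
    if not parts.scheme:
        raise ValueError("an absolute URI must have a scheme")
    return target_uri


insecure_key = primary_cache_key("http://example.com/app.js#top")
secure_key = primary_cache_key("https://example.com/app.js#top")
```

The two keys differ only in their scheme, and that difference is enough to keep the cache entries apart.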


Other answers have mentioned ports. RFC 7230, Section 2.7.1 defines http URIs as including an "authority" section, which is defined in RFC 3986, Section 3.2:

authority = [ userinfo "@" ] host [ ":" port ]

The port is optional, with RFC 7230, Section 2.7.1 defining the default for the http URI Scheme:

If the port subcomponent is empty or not given, TCP port 80 (the reserved port for WWW services) is the default.

And the following section defining the default for "https":

All of the requirements listed above for the "http" scheme are also requirements for the "https" scheme, except that TCP port 443 is the default if the port subcomponent is empty or not given, and ...

It then follows that:

  • Any HTTP request not on port 80 must include a port number in its absolute URI
  • Any HTTPS request not on port 443 must include a port number in its absolute URI
  • No two requests with different port numbers specified will have the same cache key, since they will have distinct absolute URIs

Thus these URIs would all be cached separately:

The only thing I'm not clear on is whether the browser should, may, or must normalise URIs that explicitly mention the port that would be the default anyway; in other words, whether a URI with an explicit default port and the same URI without it would be cached separately or not.

I can't think of any practical consequence of normalising these to the same cache key, because by the definitions above they are guaranteed to represent the same resource.
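To make the port reasoning concrete, here is a sketch of one possible normalisation policy, assuming a cache that chooses to drop an explicit port when it matches the scheme's default. The helper `normalise_default_port` and the URLs are hypothetical, and the sketch deliberately ignores userinfo and IPv6 literals.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports for the http and https schemes (RFC 7230, Section 2.7).
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalise_default_port(url):
    """Drop an explicit port when it equals the scheme's default."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # The fragment is left empty because it is not part of the target URI.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, ""))


explicit_default = normalise_default_port("http://example.com:80/path")
implicit_default = normalise_default_port("http://example.com/path")
other_port = normalise_default_port("http://example.com:8080/path")
secure_default = normalise_default_port("https://example.com:443/path")
```

Under this policy the first two URIs share a key, while a non-default port or a different scheme still yields a distinct one.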

IMSoP
  • 1
    [RFC 7230 section 2.7.3](https://tools.ietf.org/html/rfc7230#section-2.7.3) and [RFC 3986 section 6.2.3](https://tools.ietf.org/html/rfc3986#section-6.2.3) make it pretty clear that browsers _may_ normalize `http://example.com:80/path` to `http://example.com/path`, but seem to stop short of saying that they _must_ do so for the purposes of caching. – Ilmari Karonen Dec 13 '19 at 19:19
  • I did some quick testing. Firefox, Safari, and Chrome all normalize the URL for all purposes. Edge normalizes the URL for caching, but not for display purposes. Internet Explorer is...odd (I couldn't get consistent cache behavior out of it regardless of URL). – Mark Dec 13 '19 at 21:41
  • In addition to having the absolute URI in the key, browsers will be adding the origin of the top frame into the key as well (called double-keyed caching). This is to protect against XS-Leaks. Safari already had double-keyed caching (but slightly differently) for a while. https://www.jefftk.com/p/shared-cache-is-going-away – Buge Dec 15 '19 at 04:47
  • @IlmariKaronen Given that caching is optional to begin with, it seems natural that they only require to differentiate between things that may refer to different resources, while they do not require to necessarily identify all cases of equal resources. – Hagen von Eitzen Dec 15 '19 at 12:43
2

Yes, because they are different network destinations. The TCP port is not shown in the location bar when the standard port is used.

HTTP defaults to TCP port 80: www.example.com:80

HTTPS defaults to TCP port 443: www.example.com:443

Even if the domain and IP address are the same, the ports are not. From the browser's perspective, it is communicating with different sites.

UPDATE

The network destination matters less than the "s" in https does. It's a different URI, too.

Jonathan
  • 15
    Shouldn't the first line here say "yes", if it's to answer the title? (yes, I realise that the question contains some rephrasing of the title with reversed sense to the title question). – Lie Ryan Dec 12 '19 at 11:34
  • Thank you!. I guess it is the same case at the CDN and reverse proxy level? – SamTest Dec 12 '19 at 15:53
  • 2
    This appears to be wrong. If the caching happened at the TCP/IP level, you would have issues with SNI (Server Name Indication). Multiple sites can share the same TCP/IP endpoint. – MSalters Dec 12 '19 at 16:30
  • 2
    I am 99% sure that caching is keyed on _URIs_ ("Uniform Resource Identifiers"), not network connections. If there was some scenario where HTTP and HTTPS were served on the same port, the two requests would still be distinct, because they are requesting different resources, with different URIs. – IMSoP Dec 12 '19 at 18:31
  • I agree with @IMSoP; at least the web browser cache is based on the URL. What I am not certain about is whether the protocol (HTTPS/HTTP) or the port (443/80) is considered part of the cache key, and whether this behavior is consistent across all browsers and intermediate caches (CDN or reverse proxy). – SamTest Dec 12 '19 at 18:48
  • 6
    I'm downvoting because you're right for the wrong reasons. It's not the port (implicit or otherwise) that makes an HTTP address distinct from an HTTPS address, it's the protocol. – Mark Dec 12 '19 at 21:39
  • 1
    @Mark no, `http://www.example.com:80` is also a different URI (and thus not cache-equivalent) to `http://www.example.com:8080` – OrangeDog Dec 13 '19 at 11:10
  • 1
    @Mark: Well, no, this answer is exactly correct. I would argue that the port is the primary discriminator, though it wouldn't surprise me if a browser _also_ discriminated on protocol (when the port is, for some weird reason, and defying logic, the same) – Lightness Races in Orbit Dec 13 '19 at 11:46
  • 1
    @LightnessRaceswithMonica Caching is an HTTP-level concept, not a network-level one, and defined in terms of URIs, not network destinations. The port is relevant _only because of its interaction with URIs_. I've just posted an answer with full references, so we don't need to speculate. – IMSoP Dec 13 '19 at 13:18
  • 1
    @IMSoP We never needed to speculate, because it's plainly obvious - see StopHarmingMonica's example. But I'm glad that the references you've found back this up. :) – Lightness Races in Orbit Dec 13 '19 at 14:15
  • 1
    @IMSoP I'm actually not sure what you were saying in your comment. Sure, a URI doesn't _have_ to describe a resource that's on a network. But, nobody claimed that it did. – Lightness Races in Orbit Dec 13 '19 at 14:17
  • @LightnessRaceswithMonica I meant we don't need to speculate about whether it's the port or the scheme that makes a difference, or how different browsers might work, we can look it up in the spec. And the answer is that the scheme is always part of the URI, so will always make a different cache entry, regardless of anything to do with ports. The mention of "network-level" was referring to the first line of this answer, which incorrectly states that the cache will vary "because they are different network destinations". – IMSoP Dec 13 '19 at 14:19
  • 1
    @IMSoP but it has nothing to do with what's network-level. Both protocol and port are part of the URI. They both make a difference. – OrangeDog Dec 13 '19 at 14:21
  • @StopHarmingMonica I agree. The question we are commenting under states otherwise. – IMSoP Dec 13 '19 at 14:25
  • @IMSoP Well, if they're different network destinations then they have different URIs ;) But, if we're to nitpick a bit, I agree that the statement is perhaps misleadingly narrow. Anyway I think we're all fundamentally in agreement. – Lightness Races in Orbit Dec 13 '19 at 14:25
  • Let's say HTTPS was implemented by some sort of `STARTTLS` style or `Upgrade` header procedure on port 80. Would that make `https://example.com:80/` equivalent to `http://example.com:80/` ? – curiousguy Dec 14 '19 at 08:12
  • 3
    @curiousguy no, those URLs are still different – OrangeDog Dec 14 '19 at 14:11
2

Leaving aside the fact that the spec is quite clear that different URLs should be treated as different resources, don't you think that someone might have noticed and exploited this by now if it were not the case? After all, the issues exposed by cookies (and addressed by the "secure" flag) have been known about for 20 years or more.

So the browser must retrieve both URLs. It is conceivable that a cache might retain a single copy of a file downloaded from different sources but accessed via different keys, or that this de-duplication might occur deeper in the filesystem. But this would only happen after the cache (or the filesystem) had determined that the files were the same.
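A sketch of what such de-duplication could look like, assuming a hypothetical content-addressed store in which every URL keeps its own key but identical bodies are stored once. The `DedupCache` class is invented for illustration and does not describe a known browser.

```python
import hashlib


class DedupCache:
    """Toy cache: one entry per URL, with identical bodies stored only once."""

    def __init__(self):
        self._key_to_digest = {}  # URL -> SHA-256 digest of the body
        self._blobs = {}          # digest -> body bytes

    def store(self, url, body):
        digest = hashlib.sha256(body).hexdigest()
        # The body is only shared after its content is known to be identical.
        self._blobs.setdefault(digest, body)
        self._key_to_digest[url] = digest

    def lookup(self, url):
        digest = self._key_to_digest.get(url)
        return None if digest is None else self._blobs[digest]

    def blob_count(self):
        return len(self._blobs)


dedup = DedupCache()
script = b"console.log('hello');"
# Both URLs still have to be fetched and stored separately.
dedup.store("http://example.com/app.js", script)
dedup.store("https://example.com/app.js", script)
shared_blobs = dedup.blob_count()

# A tampered HTTP copy gets its own blob and cannot replace the HTTPS one.
dedup.store("http://example.com/app.js", b"alert('tampered');")
https_body = dedup.lookup("https://example.com/app.js")
```

Storage is shared only between entries whose bytes already match, so a lookup for one URL never returns content that was fetched under another key with different bytes.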

symcbean
  • 7
    Well, thanks for your comment, but I can't agree with your first point. Most security vulnerabilities seem "simple" only after someone has found them. For me, I just happened to think of this point and am curious how it really behaves in practice. – SamTest Dec 12 '19 at 23:44
0

From a server-side perspective, the same URL (e.g. www.test.com) on different protocols (e.g. HTTP vs HTTPS) can be served from a different file source. So a URL with TLS can serve a completely different website than the same URL without TLS. This alone makes me think that browsers won't use the same cache entry for both files.

Martin
  • 1
    This feels like an exact duplicate of [Steffen Ullrich's answer](https://security.stackexchange.com/a/222648/51961). – IMSoP Dec 13 '19 at 14:06
0

Yes, these are different origins. While it's very likely they would serve the same content, they can technically serve entirely different content. For this reason, the browser is not allowed to treat them as the same resource.

Aaron Cicali
  • Hi, this doesn't really add anything that other answers haven't already said. It's also problematic to use the word "origin" here, because that has a particular meaning in web technologies, and many URLs can have the same origin but still require completely different cache entries. – IMSoP Dec 14 '19 at 15:31