28

The method I am describing would let ISPs cache images, videos, and CSS rather than relying only on the browser cache, and it would also prove the validity of the sender. Is there any reason why this semi-HTTPS approach is not considered?

The cost of asymmetric signing is one objection I can think of. But if we group chunks of static content and calculate SHA-256 checksums in batches, validating the signature in the browser could be a better trade-off than paying the end-to-end network cost for every request that is not in the browser cache.
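To make the batching idea concrete, here is a minimal sketch of the signing side in Python, assuming the pyca/`cryptography` package and an RSA key; the manifest layout (one "path SHA-256" line per file, signed as a single blob) and the file names are purely illustrative, not part of any standard:

```python
# Hypothetical build step: hash a batch of static files and sign the
# resulting manifest once with the site's private key.
import hashlib
from pathlib import Path

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def build_signed_manifest(files, key_path="site_key.pem"):
    # One "path sha256-hex" line per static file.
    lines = [f"{f} {hashlib.sha256(Path(f).read_bytes()).hexdigest()}" for f in files]
    manifest = "\n".join(lines).encode()

    private_key = serialization.load_pem_private_key(
        Path(key_path).read_bytes(), password=None
    )
    # Sign the whole manifest in one go instead of each file individually.
    signature = private_key.sign(manifest, padding.PKCS1v15(), hashes.SHA256())
    return manifest, signature

manifest, signature = build_signed_manifest(["logo.png", "site.css", "app.js"])
```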

kalyan
  • 5
    You want to spend lots of CPU time calculating these hashes? Also, how do you intend to verify that you received the correct hash(es) from the server? How will you "decrypt" using a hash? Hashing is not encryption. – Mark Buffalo Mar 12 '17 at 04:22
  • The hash would be signed with the site's private key, and the site's certificate, which carries the public key, can also be cached. The hash of the static content is then validated with that public key, provided the certificate is trusted. For adult content or dynamic content, HTTPS can be used. HTTPS and this hashing technique can coexist, even though HTTP and HTTPS objects can't coexist on the same page. (A small verification sketch follows this comment thread.) – kalyan Mar 12 '17 at 04:33
  • 11
    I think you're basically talking about the so-called "NULL" cipher suites like `TLS_RSA_WITH_NULL_SHA256`. They're highly discouraged and I've only ever seen them actually implemented by accident. – Scovetta Mar 12 '17 at 04:49
  • Yes, similar to the NULL suites. NULL cipher suites are discouraged by OpenSSL because they don't provide confidentiality. But static content can be visible, as long as the validity of the sender is proved. – kalyan Mar 12 '17 at 05:05
  • 6
    @MarkBuffalo I think you misunderstood the premise behind the question. Also, considering what TLS does already, calculating the hash of transferred data is trivial. – Luke Park Mar 12 '17 at 05:43
  • 3
    You might have a look at the [http signatures draft](https://datatracker.ietf.org/doc/draft-cavage-http-signatures/). But this one is still in the drafting process for nearly 4 years now. – Steffen Ullrich Mar 12 '17 at 09:39
  • I know the context is for browsers, but how about applications, mainly those that use end-to-end encryption? Can those use HTTP for data transport? What are the issues of doing that? MEGA, as an example, always uses HTTP when possible. – Gustavo Rodrigues Mar 12 '17 at 15:08
  • @MarkBuffalo If the CPU time argument doesn't work for TLS why do you think it would work for something easier than TLS? – user253751 Mar 13 '17 at 21:05
  • @Scovetta Well that's because it's assumed that anyone using TLS wants all the security of TLS and therefore using the null cipher is a mistake. But the question is asking: what if you don't want all the security features of TLS? (Also I'm not sure whether TLS with null ciphers still provides authentication or not; I'd have to look it up. The proposed scheme *would* provide authentication without encryption) – user253751 Mar 13 '17 at 21:06
  • I think the biggest problem by far with this is the replay attacks John Wu points out. I don't want attackers replacing parts of a website I access with other parts of it; could potentially cause serious harm. – sudo Mar 14 '17 at 06:03
  • Agreed, a substitution attack is very much possible @sudo – kalyan Mar 14 '17 at 06:05
  • 1
    Why do people keep looking for contrived reasons to *not* use HTTPS? It's usually both the best and easiest path! *(autistic screeching)* – xDaizu Mar 14 '17 at 07:50
  • @xDaizu I feel there is nothing wrong with discussing it, rather than holding a preconceived notion – kalyan Mar 14 '17 at 09:25
  • HTTPS is also about privacy. Digital signatures don't assure that. – user207421 Mar 14 '17 at 10:17
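To illustrate the verification flow described in kalyan's comment above, here is a matching sketch (again assuming the pyca/`cryptography` package, an RSA key, and the hypothetical manifest format from the signing sketch in the question; in the proposed scheme the public key would come from the site's cached, CA-validated certificate):

```python
# Hypothetical client-side check: verify the manifest signature with the
# site's public key, then compare the downloaded file's SHA-256 against
# the manifest entry for that path.
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def verify_static_file(path, content, manifest, signature, public_key_pem):
    # In the proposal, this key would be taken from the site's cached,
    # CA-validated certificate.
    public_key = serialization.load_pem_public_key(public_key_pem)
    try:
        public_key.verify(signature, manifest,
                          padding.PKCS1v15(), hashes.SHA256())
    except InvalidSignature:
        return False
    entries = dict(line.rsplit(" ", 1) for line in manifest.decode().splitlines())
    return entries.get(path) == hashlib.sha256(content).hexdigest()
```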

7 Answers

55

Why do we need HTTPS for static content? If we can have a checksum at the end signed by the private key, won't that prove the validity?

I think you're setting up a strawman with that question. We don't in fact need HTTPS for static content, and the purpose of HTTPS isn't just to prove the validity of the content; that's only one of several purposes. The move to switch a huge number of sites to HTTPS in the last couple of years (even those serving harmless static content, such as Wikipedia) didn't primarily happen because people were worried about getting the wrong content; it happened because people were worried about three-letter agencies spying on users. In other words, the large-scale move to HTTPS happened primarily for privacy reasons (see for example RFC 7258, Pervasive Monitoring Is an Attack).

Your idea of using a signed checksum is already in production all over the internet: Software which you download is often verified like that. Package managers / update systems of most operating systems do this, and individual software projects do it on a smaller scale by publishing pgp / gpg signatures along with their software downloads.
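As an illustration of that pattern, checking a detached GPG signature on a downloaded archive might look like the following sketch (it assumes the third-party `python-gnupg` package, placeholder file names, and that the project's signing key is already in the local keyring):

```python
# Sketch: verify a detached .asc signature over a downloaded release archive.
import gnupg

gpg = gnupg.GPG()
with open("myapp-1.2.3.tar.gz.asc", "rb") as sig_file:
    verified = gpg.verify_file(sig_file, "myapp-1.2.3.tar.gz")

if verified:
    print("Good signature from", verified.username)
else:
    print("Signature could not be verified:", verified.status)
```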

This all works irrespective of whether these downloads are delivered via https or not, although https is often used in addition.

You're suggesting adding a third protocol besides http and https, maybe one called httpv for "verified", that builds content verification into the protocol but leaves out the rest of ssl.

I agree there would be an advantage to serving some static content in the clear so it can be cached. But this is not an option if you're worried about privacy issues in light of the intelligence community's programs to spy on all our communication.

Any particular reason why this semi-HTTPS is not considered?

So I'd assume that your third protocol couldn't gather much steam because

  1. there are already solutions in place for when we really need to verify content, and

  2. with so much of the internet now becoming encrypted to guard our privacy, it seems like there wouldn't be much use for another protocol that didn't protect against spying.

Out of Band
  • 3
    Makes a lot of sense with respect to privacy. People could figure out who saw which videos or images. Privacy concerns reduce the use cases. – kalyan Mar 12 '17 at 12:08
  • 4
    You really should reference [Internet best current practice (BCP) 188](https://tools.ietf.org/html/bcp188). Also known as [RFC 7258](https://tools.ietf.org/html/rfc7258). In the words of its title, *Pervasive Monitoring Is an Attack*. From the bottom of its section 2: "current capabilities permit some actors to monitor content and metadata across the Internet at a scale never before seen. This pervasive monitoring is an attack on Internet privacy. The IETF will strive to produce specifications that mitigate pervasive monitoring attacks." You don't get a much clearer policy statement in a RFC. – user Mar 13 '17 at 21:33
16

Your suggestion is basically to split HTTPS into two parts, signing-only and encryption: signing-only prevents an active man in the middle from injecting their own content (such as script tags), while encryption protects sensitive data.

But who would decide what is sensitive data and what is not? This seems time-consuming and error-prone.

You could of course automatically declare images, CSS, videos, and JS files as non-sensitive. But are they?

  • JS files are used to exchange data
  • Images and videos can contain highly sensitive data
  • CSS files may provide insight into what specific page a person is viewing
  • All replies may contain sensitive data in the HTTP headers (such as cookies being set)

Your idea would also require some other changes:

  • What about HSTS? It forces all traffic via HTTPS and is strongly recommended. Would your sign-only HTTP count? How would that work out in practice?
  • What about Usability? The browser interface would need to indicate to the user what "mode" is currently used. And it would require further education of the user about what the modes mean, and when which mode is appropriate. This will lead to a lot of confusion (users don't even understand all elements of the current interface, such as the padlock or the various warnings).
  • You would somehow need to signal to the ISP caching server that you are requesting this file. You can't just send the request to the server via plain HTTP, because that would leak cookies and other sensitive data. Or you would need to specify that cookies are never to be sent to these non-sensitive files. But how would you do that reliably? Sure, they could be served from a different domain, but that requires some major re-design of existing websites. And what if those non-sensitive files should be protected by some sort of authentication?

Apart from that, the benefits of this approach seem low. From what I am reading, ISPs rarely cache anything anymore because CDNs take care of that already.

tl;dr: The approach would violate the privacy of users and would introduce security issues because of usability problems for the end-user as well as the developer.

tim
  • 1
    Privacy makes a lot of sense. Very common assets like Bootstrap CSS or jQuery could fit in. But privacy concerns reduce the use case, as the Referer can be visible in the HTTP header. – kalyan Mar 12 '17 at 12:09
  • Fun fact: there are already SSL ciphers that provide [authentication without encryption](https://www.rfc-editor.org/rfc/rfc4785.txt). Browsers allow these though, because these could be used to give the user a false sense of confidentiality. – user2428118 Mar 12 '17 at 17:35
  • @user2428118 You meant to say "Browsers DON'T allow these though", right? – Martijn Heemels Mar 13 '17 at 11:23
  • @MartijnHeemels Yes – user2428118 Mar 13 '17 at 12:15
  • 3
    @kalyan "A user agent MUST NOT send a Referer header field in an unsecured HTTP request if the referring page was received with a secure protocol." ([RFC 7231 § 5.5.2](https://tools.ietf.org/html/rfc7231#section-5.5.2)) – suriv Mar 13 '17 at 13:49
  • 1
    +1 I'm a bit shocked that, out of so many answers, you're the only one who mentioned cookie-HTTP-headers. It's BY FAR the most significant reason to use HTTPS. – Radu Murzea Mar 14 '17 at 12:18
  • @RaduMurzea By definition, static public data doesn't need any cookie. – curiousguy Jun 18 '18 at 13:57
13

Implementing some sort of checksum would work, yes. But using TLS is much easier.

It's also generally advised to use standard protocols unless you have a really good reason not to.

Finally, https provides a variety of other advantages. Within the bounds of the CA system, you know that you're receiving the file from whom you think you are. And it provides privacy for your users, preventing observers from seeing which specific URLs they request.

Xiong Chiamiov
  • 8
    But ISP caching of static content is not possible with HTTPS. And HTTP and HTTPS content staying together on the same page is not possible, which makes sense because the validity of the sender can't be verified. A signed checksum would verify that validity. The privacy of the dynamic parts could be preserved by letting HTTPS and digitally signed static content coexist. I am wondering why this technique is not considered, or has it been considered? If so, what are the disadvantages? – kalyan Mar 12 '17 at 08:34
  • 1
    Also, TLS is of course... well, **TL**S. – CompuChip Mar 12 '17 at 13:59
  • 6
    @kalyan I have enough trouble with browsers that don't respect cache settings and hard refreshes well. ISPs caching my static content would be a **nightmare**. – ceejayoz Mar 13 '17 at 16:36
  • @ceejayoz Agreed. But HTTP content is already cached by ISPs and institutions; Squid or other forward proxies do that. – kalyan Mar 13 '17 at 17:03
  • @kalyan All the more reason to go full-HTTPS for everything. – ceejayoz Mar 13 '17 at 17:07
  • @kalyan the ISP will quickly lose all their customers if their cached content no longer matches the checksum sent by the origin server. – user253751 Mar 13 '17 at 21:07
  • @ceejayoz: That could be resolved by incorporating a hash into the URL. The typical usage scenario I can see for authenticated static content would entail an https server that knows exactly what the static content should be, and should thus be able to include a hash within the URL. If the content needs to change, the https server should know that and supply a different URL for it, which would render the previous cached content irrelevant. – supercat Mar 13 '17 at 22:43
  • 1
    No need for ISP caching. CDNs already won. – sudo Mar 14 '17 at 05:58
13

There are a few security issues I can think of.

Replay and substitution

There is nothing preventing a man in the middle from replacing a signed resource with another signed resource (captured previously). For example, I might be able to make a green GO button appear where a STOP button should be, causing you to drive off a cliff. Or I could make it look like your cash transfer failed so that you submit it again and again and I get paid multiple times.

If some of a site's static content is a JavaScript file, perhaps I can swap it with a different JavaScript file from the same site. If I'm clever, maybe I can find one that has the same function names but does different things, or lacks certain validations.
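As the comments under this answer suggest (expiry dates on the signatures, or a hash embedded in the URL), one way to narrow this window would be to bind the URL and an expiry time into the signed data; a purely hypothetical sketch, not something the proposal itself specifies:

```python
# Hypothetical mitigation: sign the URL and an expiry alongside the content
# hash, so a signature for one resource can't be replayed for another, or
# after the resource has been updated. Assumes the pyca/cryptography package.
import hashlib
import time

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def sign_resource(url, content, private_key_pem, ttl_seconds=86400):
    private_key = serialization.load_pem_private_key(private_key_pem, password=None)
    expiry = int(time.time()) + ttl_seconds
    message = f"{url}\n{hashlib.sha256(content).hexdigest()}\n{expiry}".encode()
    signature = private_key.sign(message, padding.PKCS1v15(), hashes.SHA256())
    return message, signature
```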

Redirection

I could intercept your request for an image and reply with a 302 redirect to a URL whose query string parameters comprise an XSRF attack.

Loss of privacy

A hacker monitoring your traffic may be able to determine what pages you are visiting by examining the pattern of http requests for static resources.

DoS via cache poisoning

A hacker, acting as a man in the middle, substitutes an invalid file for one of the responses. The proxy caches the file. Browsers attempting to use the cached resource will find that it fails validation and will display an error, with no easy means of bypassing the problem.

P.S. Performance issue

Also, there is a performance issue: resources that are subject to a checksum cannot be progressively rendered, because the entire blob is required to compute the checksum. So your images will appear more slowly, and your movie won't start until the whole thing has made it through the pipe.
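One mitigation floated in the comments below is to hash (and sign) the resource in fixed-size chunks so a client could verify and render it incrementally; a rough sketch, where the 64 KiB chunk size and the list format are arbitrary choices:

```python
# Build a per-chunk hash list so a client could verify a large resource as it
# streams in, instead of waiting for the whole blob.
import hashlib

CHUNK_SIZE = 64 * 1024  # arbitrary illustration

def chunk_hashes(path):
    digests = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests

# A signed manifest would then cover this list rather than one whole-file
# digest, letting the client check each chunk as it arrives.
print(chunk_hashes("movie.mp4")[:3])
```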

John Wu
  • 2
    +1 Good answer. Regarding 1) "replay" is a bit misleading, I think just "substitution" would be clearer 2) I would assume that headers are somehow signed as well (maybe separately). Otherwise, there would be a whole lot of problems 4) I would assume most mitm sit between the user and the ISP, but here they would have to sit between the ISP and the server. Still good point though. – tim Mar 12 '17 at 20:15
  • Nice answer. Regarding DoS via poisoning: even now, with HTTPS, DoS via a man in the middle with a self-signed certificate or impersonation is possible. But the substitution part makes a lot of sense. – kalyan Mar 13 '17 at 03:29
  • 2
    "maybe I can find one that has the same function names but does different things, or lacks certain validations" -- most dangerously, use the version of the same file before they fixed some critical security flaw. If replay is possible then bugfixes (to the people being replay attacked) are impossible. Of course this is an issue for cached content in general, and the digital signatures could have expiry dates and so on. But you need a pretty high confidence that it's a good idea, before you let a third party cache something for which you'd normally rely on https for security. – Steve Jessop Mar 13 '17 at 10:03
  • How about making a hash of the signature be part of the URL? That would limit the usage cases to those where the entity serving up the URL knew what content was expected, but if the hashing function is good it should avoid any caching issues and block any substitution attacks. – supercat Mar 13 '17 at 22:47
  • 1
    You could maybe deal with the performance issue by signing parts of the file rather than the entire thing. More hash checking is bad, but at least you'll be able to start loading resources before they're fully received. – sudo Mar 14 '17 at 06:07
12

This is partly solved by subresource integrity. You serve the main page over HTTPS, and it includes the subresources' metadata/hashes; the subresources can then be fetched over HTTP to be cached by ISP proxies, or over HTTPS to be cached by a CDN, and you can be confident that a resource has not been tampered with by intermediaries as long as its hash matches.
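For reference, the SRI integrity value is just a base64-encoded digest with an algorithm prefix; computing one for a script could look like this sketch (the file name and URL are placeholders):

```python
# Compute a Subresource Integrity value (sha384 + base64) for a static file
# and print the corresponding <script> tag.
import base64
import hashlib

data = open("app.js", "rb").read()
digest = base64.b64encode(hashlib.sha384(data).digest()).decode()

print(f'<script src="https://cdn.example.com/app.js" '
      f'integrity="sha384-{digest}" crossorigin="anonymous"></script>')
```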

However, browsers still consider subresources delivered this way to be mixed content, because any model that permits ISP caching lacks confidentiality. Instead, the model that's being pushed nowadays is for site owners to set up a CDN to do edge-network caching, so the edge caches are paid for by the site owner rather than by the users. This also fits better with the "ISPs as dumb pipes" model of network neutrality.

If we can have a checksum at the end signed by the private key, won't that prove the validity? ... Any particular reason why this semi-HTTPS is not considered?

Probably because it's trivial to strip the signature, and you'd then fall back to unsigned content. The SRI model, on the other hand, requires the main document to indicate when a subresource is expected to be secured.

Lie Ryan
  • 1
    Subresource integrity is pretty much similar to the above strategy. But yes, confidentiality and net neutrality are an issue. – kalyan Mar 12 '17 at 16:37
  • 2
    Subresource integrity is for validating that the resource was not altered by *anyone*, including the legitimate owner of the resource. It has nothing to do with interception and is not interchangeable with https in any way. Indeed, all the examples in the [link](https://www.w3.org/TR/2015/WD-SRI-20150707/) you provided actually use https. – John Wu Mar 12 '17 at 20:19
  • 1
    +1 for mentioning subresource integrity which is something that is used irl and very close to what OP asked for – Matti Virkkunen Mar 13 '17 at 11:35
  • @JohnWu if the owner of the resource is also the owner of the page including it, then the owner can modify it by also modifying the checksum. – user253751 Mar 13 '17 at 21:09
  • 1
    Not sure what your point is immibis. If the owner of the resource is also the owner of the page, then it is not a third party resource, and SRI isn't needed. – John Wu Mar 13 '17 at 22:51
1

It should be understood that the browser vendors have had nothing but bad experiences with "middleboxes", and they have settled on end-to-end encryption as the best way to prevent them from interfering. This is why, for instance, they refuse to implement cleartext HTTP/2.0. For the same reason, anything along the lines of your idea is very unlikely to be adopted, ever.

zwol
  • I bet this is the real reason. Nothing to do with browser vendors being dissatisfied with non-encrypted connections. – user253751 Mar 13 '17 at 21:10
1

Techniques exist to sign static website content, especially HTML, for example by publishing OpenPGP signatures for the pages themselves.

Such initiatives have not yet become popular. I suspect this is for three reasons:

  • they are fiddly to implement;
  • the cultural shift towards more secure web development (of which widespread use of HTTPS is a facet) was still in its infancy when they were first proposed;
  • many web developers don't know how to use OpenPGP tools, let alone possess a securely-generated and securely-stored OpenPGP signing key.

Nevertheless, these techniques do in principle constitute an excellent mechanism for verifying the integrity of static resources on websites.
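For instance, producing a detached, ASCII-armoured OpenPGP signature for a static HTML page could look like this sketch (it assumes the third-party `python-gnupg` package; the key ID and file names are placeholders, and a passphrase may be required depending on the key):

```python
# Sketch: create a detached OpenPGP signature for index.html, to be published
# alongside the page as index.html.asc.
import gnupg

gpg = gnupg.GPG()
with open("index.html", "rb") as page:
    signed = gpg.sign_file(
        page,
        keyid="0xDEADBEEF",       # placeholder key ID
        clearsign=False,
        detach=True,
        output="index.html.asc",
    )

print("Status:", signed.status)   # e.g. "signature created" on success
```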

Even if such signing of static web resources were universal, however, there would still be a benefit to implementing HTTPS: confidentiality. Signing alone would not provide this, but the encryption provided by HTTPS would provide it. (At least, HTTPS would provide confidentiality as long as the applicable spec for HTTPS does not have critical bugs, and has been properly implemented.)

sampablokuper
  • 1
    Why does everyone seem to think "there is still a benefit to HTTPS which is that it's confidential" is a conclusive argument in favour of HTTPS? There is still a benefit to implementing HTTP+signing which is that it's cacheable. – user253751 Mar 13 '17 at 21:13
  • @immibis, I am not sure why you posted that question on my answer, as my answer does not state that confidentiality amounts to a conclusive argument in favour of HTTPS. Perhaps you are confusing me with somebody else? – sampablokuper Mar 15 '17 at 02:40