
Is it possible to make browsers verify that index.html matches some checksum?

Context:

With subresource integrity, you can specify SHA hashes for URLs, so that you know that you are getting the correct javascript and css files even when they are pulled from a CDN that doesn't use HTTPS. See:

https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity

<script src="https://example.com/example-framework.js"
        integrity="sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
        crossorigin="anonymous"></script>

It would be nice if something equivalent could be done with index.html. If, for example, DNSSEC could be used to deliver a TXT record containing a hash of index.html, then it would be possible to fetch index.html securely, knowing that it hasn't been tampered with, without having to rely on HTTPS.
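
For illustration, here is a rough sketch of what such a client-side check might look like. The record name (_integrity), the record format (sha256=<hex digest>), and the use of the third-party dnspython and requests packages are all assumptions made up for this sketch; a real implementation would also need to validate the DNSSEC signatures themselves, which a plain stub resolver does not do.

import hashlib

import dns.resolver   # dnspython (third-party)
import requests       # third-party

def fetch_expected_hash(domain):
    # Hypothetical record:  _integrity.example.com. IN TXT "sha256=<hex digest>"
    answers = dns.resolver.resolve("_integrity." + domain, "TXT")
    for rdata in answers:
        txt = b"".join(rdata.strings).decode()
        if txt.startswith("sha256="):
            return txt.split("=", 1)[1]
    return None

def verify_index(domain):
    # Fetch index.html over plain HTTP and compare it to the published hash.
    expected = fetch_expected_hash(domain)
    body = requests.get("http://" + domain + "/index.html").content
    actual = hashlib.sha256(body).hexdigest()
    return expected is not None and actual == expected

print(verify_index("example.com"))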

Imagine, a green badge over HTTP!!!

Max Murphy
  • Who serves the checksum? How do you handle the dynamic pages? – schroeder Feb 25 '20 at 15:23
  • This assumes that the "index" is static. So for a website like, let's say StackExchange, Facebook, Twitter, YouTube or any other dynamic content, this would not work at all. For what benefit? To avoid using encryption? –  Feb 25 '20 at 15:24
  • What problem are you trying to solve here? – Teun Vink Feb 25 '20 at 15:29
  • I believe some sites do use this technique and check for a "clean state" to prevent script injections. So it will generate the hash after pageload and insert it into the DOM. This will then be re-generated and checked when any data is sent from that page. If the two don't match, you know that something has been inserted. (I think this is mainly used for bot detection.) This wouldn't help HTTP become more secure, as HTTPS is mainly used to prevent spying, not necessarily injections. HTTPS actually does nothing to prevent client-side injection. – pcalkins Feb 26 '20 at 18:32

2 Answers


Is it possible?

Yes, in theory, it is possible to define a checksum for the index page of a website. If browser vendors wanted to support such a thing, it could definitely be done; similar mechanisms already exist in the form of SRI and CSP.

However...

It isn't practical for the modern web. /index.html pages used to be integral parts of the web, back when it was mostly static. Today, however, most content is generated dynamically. For example, if you look at the "index" page generated for this website, you will see that all the dynamic content from the Stack Exchange network, such as your avatar, your username, your reputation and the questions currently being asked, is generated server-side.

That means that when you visit https://security.stackexchange.com, you receive a different document than I do, or than anyone else does.

This alone renders your approach completely infeasible. But there is more.

It doesn't make sense

Subresource Integrity solves a particular problem, and you seem to fundamentally misunderstand this problem. You said:

With subresource integrity, you can specify SHA hashes for URLs, so that you know that you are getting the correct javascript and css files even when they are pulled from a CDN that doesn't use HTTPS.

But the goal isn't to replace HTTPS. The goal is to prevent a bad actor from replacing the legitimate jquery.min.js with a malicious version. If your site specifies what jquery.min.js should look like, then your browser will refuse to execute a tampered version. SRI was created in response to the "single point of attack" we created by loading all our JavaScript from the same CDN domains.
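
For context, the integrity value in a tag like the one quoted in the question is just a base64-encoded digest of the file's raw bytes. A minimal sketch of computing one (the file name is only a placeholder):

import base64
import hashlib

def sri_hash(path, algorithm="sha384"):
    # SRI integrity values have the form "<algorithm>-<base64 of the raw digest>".
    with open(path, "rb") as f:
        digest = hashlib.new(algorithm, f.read()).digest()
    return algorithm + "-" + base64.b64encode(digest).decode()

# Prints something like "sha384-..." for the given file.
print(sri_hash("jquery.min.js"))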

Furthermore, it doesn't solve any problem at all. You said:

Imagine, a green badge over HTTP!!!

but that isn't even a desirable goal. HTTP is fundamentally insecure and should die. As far as security is concerned, it should have died last decade. HTTPS already solves every problem that such a scheme would need to solve. And with Let's Encrypt, getting a trusted certificate for HTTPS is literally free and takes five minutes. There is no reason to keep HTTP around any longer.

  • *"The goal is to prevent a bad actor from replacing the legitimate jquery.min.js with a malicious version."* - I would focus on a more specific attack: maintaining the integrity of a third party resource without having control of what the third party actually does. One usually has control of its own resources so SRI is not that needed there. But one has usually no control over third party resources even though they have the same permissions as own resources if included. This gives the necessary control back. – Steffen Ullrich Feb 25 '20 at 15:45
  • Re: index.html and dynamic content, index.html can be completely static but load content dynamically. A static index.html doesn't mean boring websites. :-) – Max Murphy Feb 25 '20 at 16:29
  • Re: HTTP/HTTPS: I am concerned with protecting end users even when one of the servers is compromised. If a server is compromised, HTTPS doesn't buy anything any more. If the browser can validate the data it receives independently, there is a limit to how much damage the server can do. – Max Murphy Feb 25 '20 at 16:32
  • "The goal is to prevent a bad actor from replacing the legitimate jquery.min.js with a malicious version." - this is easy if we can first prevent the bad actor from replacing index.html with a malicious version. – Max Murphy Feb 25 '20 at 16:35
  • @MaxMurphy If your web server is compromised, it's very likely that the DNSSEC entry can be compromised in the same fashion. Further, if you make everything "interesting" dynamic, an attacker could just compromise the dynamic data that is not covered by the hash. –  Feb 25 '20 at 16:44
  • I don't think the fact that it is dynamic is particularly important. If you really want, you could create a hash on the fly after generating the content. It strikes me that something like that has been done for dynamic JSON resources, but I don't remember where (I could be making it up). The second half of this answer is spot on: doing this for index.html just doesn't make sense. – Fire Quacker Feb 25 '20 at 17:35
  • @FireQuacker Why does it not make sense? If index.html is compromised, it's game over. If there is one thing to secure, it's that. But if you can secure index.html, you can use that to bootstrap everything else. – Max Murphy Feb 26 '20 at 09:40
  • @MechMK1 Our DNSSEC is much, much better protected than the web servers and has a smaller attack surface. The web servers are scattered around the world, so one rogue employee in a datacentre could compromise a server, and one Heartbleed-like bug could let a remote attacker kill a server via SSL. In contrast, the DNSSEC entries are closely held and well protected. There is no comparison, in this case. For some amateur developer with all their keys on a single web server, sure, knock out the box and you are done. – Max Murphy Feb 26 '20 at 09:47
  • @MaxMurphy You conveniently ignored my point. If I take over the web server, I am in a position to inject anything I want into dynamic data. –  Feb 26 '20 at 10:08
  • I already have a solution for the dynamic data, so I'm not worried about it. Once I have bootstrapped the browser with trusted code, I can make the client validate dynamic data, because the data I am pulling (stored in git) is signed, and not by the server. So the server can do whatever it wants: it can't produce a fake commit, so it can't produce tampered dynamic data. – Max Murphy Feb 26 '20 at 11:01
  • @MaxMurphy So explain to me, how would you implement something like a user-login? –  Feb 26 '20 at 11:38
  • In my application that is not relevant; I am concerned with the integrity of the code and artefacts that can be pulled from the site, and there is no meaningful private data. For some other sites, sure, so we can talk hypotheticals if you like. I think the key piece that you would be interested in is that I would set up separate auth servers (auth server != web server, as web servers are a security nightmare, whereas with an auth server you can minimise the attack surface and afford to pay a one-time performance cost for security). Again, there is nothing the web server can do to inject bad responses. – Max Murphy Feb 26 '20 at 12:20
  • But these are hypotheticals and not relevant. – Max Murphy Feb 26 '20 at 12:20
  • @MaxMurphy If you are this convinced that it will be an amazing success, make a Proof-of-Concept and suggest it at a conference –  Feb 26 '20 at 12:56
  • @MaxMurphy How do you bootstrap the browser with trusted code? That is basically the hard part of all of this. If you already have a method to bootstrap the browser with trusted code then I feel like you've already solved your problem - after all, you now have a trusted application in the browser. It can generate its own html file client-side. That's effectively what modern front-end frameworks do anyway – Conor Mancone Feb 26 '20 at 13:17
  • @ConorMancone. That is exactly the question. The key unanswered part is how to get the browser to get a verified index.html. If that can be done, all else is easy. This is reminiscent of domains on the darknet, where intermediaries cannot be trusted so the domain name is cryptographically bound to the destination. Can it really be that there is darknet tech that could be really useful for securing the "normalnet"? :-) – Max Murphy Feb 26 '20 at 15:37
  • @MaxMurphy Ah, I thought you were saying that you already had figured out how to load the browser with trusted code. – Conor Mancone Feb 26 '20 at 15:58
  • One use case for this might be to verify the integrity of static web pages that perform some function using client-side javascript cryptography. See my answer below. – mti2935 Feb 27 '20 at 11:48

I can imagine a few good use cases for this. For example, a web page like https://coinb.in/#newAddress, which lets the user create a new bitcoin address, along with the corresponding private key, using client-side javascript-based crypto running in the web browser.

This is a handy tool, and there is no reason why this page should not be static. But how can the user trust that the newly generated private key is not sent back to the server? There is a statement at the bottom of the page that reads, 'This page uses javascript to generate your addresses and sign your transactions within your browser, this means we never receive your private keys...', but how can the user trust this?

This is the familiar chicken-and-egg problem with browser cryptography. If you can't trust the server with your secrets (the bitcoin private key), then how can you trust that the code that the server is serving is not malicious (and will steal the bitcoin private key)?

One way to solve this problem might be for a trusted reviewer to review the source code, then post an attestation on his (https) web site (or sign the attestation using his pgp key), saying 'I, [trusted reviewer], have reviewed the source code for the web page at https://coinb.in/#newAddress, with the SHA256 checksum xxxxx, and I have verified that this source code does not contain malicious code.'

But, even if the source code for the page has been reviewed by someone that the user trusts, and the user is able to verify the authenticity of the attestation by the trusted reviewer - how can the user be sure that the source code for the page is in fact static, and that the source code has not changed since the trusted reviewer reviewed the code? In other words, how can the user be sure that the code that is currently loaded in his browser is the same as the code that the trusted reviewer reviewed?

This is why it would be nice, as the OP alluded to, if web browsers provided a way for the user to view a hash-based checksum of the page that is currently loaded. That way, the user could view the checksum of the currently loaded page, verify that it matches the checksum posted in the attestation made by the trusted reviewer, and then rest assured that the page does not contain malicious code. But (as far as I know) there is no feature in any of the mainstream browsers that shows the checksum of the currently loaded page.

As a workaround, the user could load the page, save its source code to their system, use a tool like sha256sum to take a checksum of the saved file, verify that it matches the checksum in the attestation by the trusted reviewer (similar to the way one would verify the integrity of a downloaded ISO file), and then proceed to use the page.
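
A minimal sketch of that manual check, assuming the page was saved as saved_page.html and the reviewer published a hex-encoded SHA-256 digest (both names below are placeholders):

import hashlib

# The reviewer's published SHA-256 checksum (hex); placeholder value.
EXPECTED = "paste-the-checksum-from-the-attestation-here"

with open("saved_page.html", "rb") as f:   # the page as saved from the browser
    actual = hashlib.sha256(f.read()).hexdigest()

print("checksum matches" if actual == EXPECTED else "MISMATCH: " + actual)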

Of course, this would require that all supporting files (e.g. javascript files and css files) are referenced using subresource integrity (otherwise, code in these files could change without the code in the root document changing).
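
As a rough aid for that last check, one could scan the saved page for script and stylesheet references that lack an integrity attribute. This is a simplistic sketch: it ignores inline scripts and anything injected at runtime, and saved_page.html is again a placeholder name.

from html.parser import HTMLParser

class SRIChecker(HTMLParser):
    # Print external scripts and stylesheets that carry no integrity attribute.
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        external = (tag == "script" and "src" in attrs) or \
                   (tag == "link" and (attrs.get("rel") or "").lower() == "stylesheet")
        if external and "integrity" not in attrs:
            print("no integrity attribute:", attrs.get("src") or attrs.get("href"))

with open("saved_page.html", encoding="utf-8") as f:
    SRIChecker().feed(f.read())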

Related:

How To Prove That Client Side Javascript Is Secure?

What’s wrong with in-browser cryptography in 2017?

Javascript crypto in browser

Problems with in Browser Crypto

mti2935