
During a conversation in The DMZ, it was suggested that a SHA-256 hash could be used to check that content delivered from a CDN hasn't changed before it is executed, similar to what Kim Dotcom's MEGA recently attempted with CBC-MAC.

The mechanism would be implemented at the browser level, whereby a content hash would be embedded in the link to the content. For example:

<script src="http://example.cdn/jq/jquery-1.2.3.js" hash="sha256:kMufczNYKx9B2A7x7eICQVu18YDzEMqUe3G+h5QSifw=" />

The hash would be provided as part of the site code, so that only content matching the hash would execute. This would protect the user in cases where the CDN was compromised. It would also offer a minimal level of security when operating in mixed-content mode, where everything but the CDN is served over SSL.

Are there any flaws in this approach? Would there be any important cases to consider during implementation?

CodesInChaos
Polynomial
  • That wasn't just a spontaneous comment on DMZ. I've been thinking about this and related issues in distributed storage quite a lot over the last year or so. I'm also working on a secure tree hash that can verify partial files, and includes some extra information like size. – CodesInChaos Jan 25 '13 at 14:19
  • I recommend adding the ability for encryption as well so the public host doesn't know what is being hosted. – Matrix Jan 25 '13 at 16:24
  • @Matrix That wouldn't work. If the client can decrypt it, the CDN operators can decrypt it too. Also, HTTPS for transport security is often avoided on CDNs due to performance and caching issues. – Polynomial Jan 25 '13 at 16:25
  • @Matrix While you could do that, that seems like a different issue. I wouldn't couple the two issues. This problem is including standard javascript, like jquery from CDNs in a way that doesn't need to trust the CDN. Similar techniques could be applied to advertisement scripts. – CodesInChaos Jan 25 '13 at 16:38
  • @Polynomial How would CDN operators know the key? They'd need authorization from the website to load the HTML code containing the remote script, video or img tag. – Matrix Jan 26 '13 at 10:22
  • @Matrix If it's encrypted on the CDN server, the client has to know how to decrypt it. Therefore anyone who visits the site has to know the key to decrypt the file, including anyone from the CDN company that visits the site. – Polynomial Jan 26 '13 at 10:55
  • @Polynomial Yes, but the website can choose whether to load this content depending on who is accessing it. The signature, encryption and authentication keys are only sent to authorized users. Also, the Referer header is not sent if the website is accessed over HTTPS and the CDN is accessed over HTTP. Before the file is saved on the CDN, the file name can be randomized so as not to leak any information. This has applications beyond the traditional CDN setup. – Matrix Jan 26 '13 at 13:29
  • If it were so sensitive, it wouldn't (and shouldn't) be on a 3rd party CDN anyway. – Polynomial Jan 26 '13 at 14:23
  • @Polynomial Why? It's just one variant of secure distributed storage where you don't need to trust the storage provider. A server in the closet isn't the ideal solution either. – CodesInChaos Jan 27 '13 at 13:24
  • @CodesInChaos I don't see it being a major benefit or use-case. If the CDN company made a single account on the site, they could get access to the key and change the data. Not exactly ideal. And if you think I'm being overly paranoid about how far they'll go, then why do you need to encrypt it on the CDN box in the first place? – Polynomial Jan 27 '13 at 18:35
  • Related? http://patents.stackexchange.com/q/4697/267 – TRiG Nov 15 '14 at 22:32

3 Answers


Update: There is more information on Subresource Integrity at MDN, which (as of 12/12/16) shows support in Chrome 45+ and Firefox (Gecko) 43+.

Update: There is a W3C draft called Subresource Integrity describing a feature like this.

It's already implemented in Chromium.

For example:

<script src="file.js" integrity="ni://sha256;BpfBw7ivV8q2jLiT13…"></script>

The basic approach is sound IMO, but there are a few details to take care of:

  • You should support several hashes on a single tag. The browser doesn't need to validate all of them; validating one collision-resistant hash is enough (see the sketch after this list).

  • Being able to specify the expected size seems useful to avoid a DoS where your site is fed a huge resource.

  • Unless you're using a tree-hash, you can't verify incomplete files. That's not an issue for 100kB javascript files, but it is for a 5 GB video. So support for tree hashes should be added later on.

  • I'd use algorithm identifiers matching NI (Naming Things with Hashes), and URL-safe Base64 without padding.

  • I'd specify SHA-256 as the standard algorithm every browser should support, but allow browsers to add other algorithms. SHA-256 is:

    • collision resistant at a 128-bit level
    • a NIST standard; implementations are widely available (unlike SHA-3)
    • not the fastest hash, but still fast enough to keep up with typical network speeds on mobile devices

    IMO SHA-256 is the ideal choice for the default/mandatory algorithm.

  • You should support all embedded resources (CSS, images, videos, etc.), not just scripts.

  • One could consider using NI URLs instead, but I prefer the attribute-based approach here. An attribute is more flexible and doesn't require the cooperation of the target host to implement. NI can only specify a single hash per URL.

  • You could disable mixed-content warnings for securely hashed content that was fetched via HTTP.

  • It's a great way to see if your cache is still valid: it's valid if and only if the hash matches. No need for rechecking, dates, etc. This also works if you downloaded the resource from a different URL. For example, if you already have jQuery from Google in your cache, you don't need to reload it from another URL, since the same hash guarantees (assuming collision resistance) that the content is the same.

  • There are probably some issues related to authenticating HTTP headers, since those influence the interpretation of the resource; MIME type and charset/encoding are examples of such headers.
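
A rough sketch of the client-side check described by several of these points (size limit, digest verification, and caching by content hash) might look like the following. It is illustrative only: it relies on the fetch and Web Crypto APIs, which were not widely available when this was written, and the function and cache names are made up.

// Illustrative sketch, not a specification. Only SHA-256 digests
// (URL-safe Base64, no padding) are handled here; a full implementation
// could accept several algorithm:digest pairs, and validating any one
// collision-resistant hash would be enough.
const contentCache = new Map();   // digest -> bytes (hypothetical in-memory cache)

async function loadVerified(url, expectedSha256, maxSize) {
  // Content-addressed caching: the same digest means the same content,
  // so a previously verified copy can be reused regardless of which URL
  // it was originally fetched from.
  if (contentCache.has(expectedSha256)) return contentCache.get(expectedSha256);

  const response = await fetch(url);
  const bytes = await response.arrayBuffer();

  // Reject oversized responses before doing anything else with them.
  if (maxSize !== undefined && bytes.byteLength > maxSize) {
    throw new Error('resource exceeds declared size');
  }

  // SHA-256 digest, encoded as URL-safe Base64 without padding.
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  const b64 = btoa(String.fromCharCode(...new Uint8Array(digest)))
    .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');

  if (b64 !== expectedSha256) {
    throw new Error('integrity check failed for ' + url);
  }
  contentCache.set(b64, bytes);
  return bytes;
}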

So an example might look like this:

<script src="http://example.cdn/jq/jquery-1.2.3.js"
     hash="sha-256:UyaQV-Ev4rdLoHyJJWCi11OHfrYv9E1aGQAlMO2X_-Q; size:103457;
           other-hash: abc..." />
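
The digest and size values in a snippet like the one above could be generated with a short script. A sketch in Node.js follows; the file name is just the one from the example, and this is a one-off helper, not a standard tool.

// Sketch: compute the URL-safe Base64 (unpadded) SHA-256 digest and the
// byte size of a file, for pasting into the hash attribute shown above.
const crypto = require('crypto');
const fs = require('fs');

const file = process.argv[2] || 'jquery-1.2.3.js';   // hypothetical local copy
const data = fs.readFileSync(file);

const digest = crypto.createHash('sha256').update(data).digest('base64')
  .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');

console.log('hash="sha-256:' + digest + '; size:' + data.length + '"');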
CodesInChaos

To add to the good points from @CodesInChaos: there was a much older mechanism to support signed Javascript. This comes from the days of Netscape 4 and it is still documented, but it is unclear whether this is still supported in Firefox. Internet Explorer never supported it, although the people at Microsoft toyed with the idea. The system piggybacked on the Jar file format, which came from the Java world.

Your method has the appeal of looking simple to implement, and it could even be done in Javascript directly (the order of magnitude of Javascript performance for hashing is roughly 1 MB/s, which should be sufficient for scripts).

One drawback of your system to be aware of: if you modify the script, you must update every page which references it with an explicit hash; this can be quite inconvenient on a big site (ready for a search-and-replace over 10,000 static files?). This is where signatures can offer more flexibility.
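
As the comments below note, one practical mitigation is to regenerate the hash attributes from a template or build step rather than editing pages by hand. A hypothetical Node.js sketch (all paths and the jquery-1.2.3.js reference are made up for illustration):

// Sketch of a build step: recompute the digest of one library and rewrite
// the matching hash attribute in every generated page.
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

const lib = fs.readFileSync('static/jq/jquery-1.2.3.js');   // hypothetical path
const digest = crypto.createHash('sha256').update(lib).digest('base64')
  .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');

for (const name of fs.readdirSync('pages')) {               // hypothetical output dir
  if (!name.endsWith('.html')) continue;
  const p = path.join('pages', name);
  const html = fs.readFileSync(p, 'utf8');
  // Only touch hash attributes that sit on a tag referencing this library.
  const updated = html.replace(
    /(src="[^"]*jquery-1\.2\.3\.js"[^>]*hash="sha-256:)[^";]*/g,
    '$1' + digest
  );
  fs.writeFileSync(p, updated);
}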

Thomas Pornin
  • Yeah, but if your big site is using static pages, then you kind of deserve to get what's coming to you. Caching a dynamic system isn't that hard and makes updates like this across a big site pretty trivial. – AJ Henderson Jan 25 '13 at 19:58
  • @AJHenderson That's a naive view. Web servers like IIS and Apache have quite high overheads for sending small static files, whereas servers like nginx are optimised for serving static files extremely efficiently. Furthermore, configuring a server for high performance on content delivery, database access, and web applications all at once is difficult. For performance on a large site, it makes sense to split those functions out, so that the system configuration and server daemons can be optimised for a single function. Caching is *already* done on dynamic pages, but some things just can't be cached. – Polynomial Jan 25 '13 at 20:10
  • @Polynomial If it can be done as a static site, it can be done as a template-based system, even if you have to generate the static content. The key is that if you have duplication across a large area that would require manual updates in a bunch of spots for small changes on a large site, there are other issues that need to be addressed. I oversimplified, sure, but this isn't a question about proper content management. Not that there aren't many sites out there where this would be a legitimate concern in their current state. – AJ Henderson Jan 25 '13 at 20:30

The most important issue to address is how to handle a hash mismatch. You have to reject the content in that case, as otherwise you'd lose all the potential security gain. But a hash mismatch might have rather innocent reasons, like the resource being served in a fresh wire format or encoding (think of a lossless compressor rewriting a PNG image to produce the same pixels with a different representation for slightly better gzip compression), or a different character encoding or newline handling for your Javascript that does not maliciously alter it but gives the file a new hash.

To me, the solution is to find sources unlikely to change: your own CDN, or those public CDNs, like jsdelivr and cdnjs.com, that use explicit versioning rather than serving mutable, auto-updating URLs. Having chosen such sources, add reliability by coding fallbacks to alternate locations.

I know of two implementations of just that, not in browsers themselves but as Javascript resource loaders with SHA-256-based integrity verification:

  1. VerifyJS
  2. needjs

Both also offer fallback to alternate source URLs, which seems sensible given that hash verification introduces an additional failure mode: hash mismatch.
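
A stripped-down sketch of the pattern such loaders implement (verify the hash, and on mismatch or network error fall back to the next URL) might look like this. It is not the actual VerifyJS or needjs API; it leans on the fetch and Web Crypto APIs, which were not widely available when this was written, and it glosses over the CORS headers a cross-origin fetch would require on the CDN.

// Illustrative only: try each mirror in order and execute the first response
// whose SHA-256 digest (standard Base64 here) matches the expected value.
async function sha256Base64(bytes) {
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  return btoa(String.fromCharCode(...new Uint8Array(digest)));
}

async function loadScript(urls, expectedSha256) {
  for (const url of urls) {
    try {
      const bytes = await (await fetch(url)).arrayBuffer();
      if (await sha256Base64(bytes) !== expectedSha256) {
        continue;   // hash mismatch: never execute, just try the next mirror
      }
      // Verified: execute by injecting the exact bytes that were checked.
      const script = document.createElement('script');
      script.src = URL.createObjectURL(new Blob([bytes], { type: 'text/javascript' }));
      document.head.appendChild(script);
      return;
    } catch (e) {
      // Network error: fall through to the next URL.
    }
  }
  throw new Error('no source passed the integrity check');
}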

Disclaimer: I am the author of needjs and wouldn't recommend it for production usage quite yet. I also haven't looked into using it for resources other than Javascript and CSS yet.