
The CRIME attack taught us that using compression can endanger confidentiality. In particular, it is dangerous to concatenate attacker-supplied data with sensitive secret data and then compress and encrypt the concatenation; any time we see that occurring, at any layer of the system stack, we should be suspicious of the potential for CRIME-like attacks.

Now the CRIME attack, at least as it has been publicly described so far, is an attack on TLS compression. Background: TLS includes a built-in compression mechanism, which happens at the TLS level (the entire connection is compressed). Thus, we have a situation where attacker-supplied data (e.g., the body of a POST request) gets mixed with secrets (e.g., cookies in the HTTP headers), which is what enabled the CRIME attack.

However, there are also other layers of the system stack that may use compression. I am thinking especially of HTTP compression. The HTTP protocol has built-in support for compressing resources that you download over HTTP. When HTTP compression is enabled, compression is applied to the body of the response (but not the headers). HTTP compression is enabled only if both the browser and the server support it, but most browsers and many servers do, because it improves performance. Note that HTTP compression is a different mechanism from TLS compression; HTTP compression is negotiated at a higher level of the stack and applies only to the body of the response. However, HTTP compression can be applied to data that is downloaded over an SSL/TLS connection, i.e., to resources downloaded via HTTPS.
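
To make that negotiation concrete, here is a minimal sketch using Python's standard library (the URL is only a placeholder; any server that supports HTTP compression behaves this way):

import urllib.request

# The client advertises support with the Accept-Encoding request header;
# the server opts in by compressing the body and announcing it in the
# Content-Encoding response header.
req = urllib.request.Request(
    "https://www.example.com/",
    headers={"Accept-Encoding": "gzip"},
)
with urllib.request.urlopen(req) as resp:
    # Prints "gzip" if the server compressed the body, None if it did not.
    print(resp.headers.get("Content-Encoding"))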

My question: Is HTTP compression safe to use on HTTPS resources? Do I need to do something special to disable HTTP compression for resources that are accessed over HTTPS? Or, if HTTP compression is somehow safe, why is it safe?

D.W.
  • An update to this question is here: [What can be done to protect against BREACH](http://security.stackexchange.com/q/39925/396) (HTTP version of CRIME) – makerofthings7 Aug 02 '13 at 16:21
  • As I understand it only the body of a webpage is deflated. Perhaps the solution is to place the session token in the URL? –  Aug 03 '13 at 17:23

2 Answers


It seems risky to me. HTTP compression is fine for static resources, but for some dynamic resources served over SSL, it seems like HTTP compression might be dangerous. It looks to me like HTTP compression can, in some circumstances, allow for CRIME-like attacks.

Consider a web application that has a dynamic page with the following characteristics:

  1. It is served over HTTPS.

  2. HTTP compression is supported by the server (this page will be sent to the browser in compressed form, if the browser supports HTTP compression).

  3. The page has a CSRF token on it somewhere. The CSRF token is fixed for the lifetime of the session (say). This is the secret that the attack will try to learn.

  4. The page contains some dynamic content that can be specified by the user. For simplicity, let us suppose that there is some URL parameter that is echoed directly into the page (perhaps with some HTML escaping applied to prevent XSS, but that is fine and will not deter the attack described).

Then I think CRIME-style attacks might allow an attacker to learn the CSRF token and mount CSRF attacks on the web site.

Let me give an example. Suppose the target web application is a banking website at www.bank.com, and the vulnerable page is https://www.bank.com/buggypage.html. Suppose the bank ensures that the banking pages are accessible only over SSL (https). And suppose that if the browser visits https://www.bank.com/buggypage.html?name=D.W., then the server responds with an HTML document looking vaguely like this:

<html>...<body>
Hi, D.W.!  Pleasure to see you again.  Some actions you can take:
<a href="/closeacct&csrftoken=29238091">close my account</a>,
<a href="/viewbalance&csrftoken=...">view my balance</a>, ...
</body></html>

Suppose you are browsing the web over an open Wi-Fi connection, so that an attacker can eavesdrop on all of your network traffic. Suppose that you are currently logged into your bank, so your browser has an open session with your bank's website, but you're not actually doing any banking over the open Wi-Fi connection. Suppose moreover that the attacker can lure you into visiting the attacker's website http://www.evil.com/ (e.g., maybe by doing a man-in-the-middle attack on you and redirecting you when you try to visit some other http site).

Then, when your browser visits http://www.evil.com/, that page can trigger cross-domain requests to your bank's website, in an attempt to learn the secret CSRF token. Notice that Javascript is allowed to make cross-domain requests. The same-origin policy does prevent it from seeing the response to a cross-domain request. Nonetheless, since the attacker can eavesdrop on the network traffic, the attacker can observe the length of all encrypted packets and thus infer something about the length of the resources that are being downloaded over the SSL connection to your bank.

In particular, the malicious http://www.evil.com/ page can trigger a request to https://www.bank.com/buggypage.html?name=closeacct&csrftoken=1 and look at how well the resulting HTML page compresses (by eavesdropping on the packets and looking at the length of the SSL packet from the bank). Next, it can trigger a request to https://www.bank.com/buggypage.html?name=closeacct&csrftoken=2 and see how well the response compresses. And so on, for each possibility for the first digit of the CSRF token. One of those should compress a little bit better than the others: the one where the digit in the URL parameter matches the CSRF token in the page. This allows the attacker to learn the first digit of the CSRF token.

In this way, it appears that the attacker can learn each digit of the CSRF token, recovering them digit-by-digit, until the attacker learns the entire CSRF token. Then, once the attacker knows the CSRF token, he can have his malicious page on www.evil.com trigger a cross-domain request that contains the appropriate CSRF token -- successfully defeating the bank's CSRF protections.
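
To make the length oracle concrete, here is a toy simulation in Python. This is a sketch, not an exploit: zlib stands in for the server's HTTP compression, the page template and token are invented, and encryption is omitted entirely, since encryption hides the content of the response but not its length:

import zlib

# Invented stand-ins for the bank's dynamic page and its secret token.
CSRF_TOKEN = "29238091"
PAGE_TEMPLATE = ('<html><body>Hi, %s! Some actions you can take: '
                 '<a href="/closeacct&csrftoken=' + CSRF_TOKEN + '">'
                 'close my account</a></body></html>')

def observed_length(echoed_name: str) -> int:
    # The eavesdropper sees only the length of the compressed (and then
    # encrypted) response; zlib stands in for the server's gzip here.
    return len(zlib.compress((PAGE_TEMPLATE % echoed_name).encode()))

recovered = ""
for _ in range(len(CSRF_TOKEN)):
    # The digit that matches the real token extends a substring that already
    # occurs in the page, so that guess tends to compress slightly better.
    recovered += min("0123456789",
                     key=lambda d: observed_length("csrftoken=" + recovered + d))

print(recovered)  # ideally "29238091"; real traffic is noisier than this toy

In practice, ties between guesses and the size granularity of the cipher add noise, so a real attack would need repeated measurements and some care; but the principle is exactly the one above.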

It seems like this may allow an attacker to mount a successful CSRF attack on web applications, when the conditions above apply, if HTTP compression is enabled. The attack is possible because we are mixing secrets with attacker-controlled data into the same payload, and then compressing and encrypting that payload.

If there are other secrets that are stored in dynamic HTML, I could imagine that similar attacks might become possible to learn those secrets. This is just one example of the sort of attack I am thinking of. So, it seems to me that using HTTP compression on dynamic pages that are accessed over HTTPS is a bit risky. There might be good reasons to disable HTTP compression on all resources served over HTTPS, except for static pages/resources (e.g., CSS, Javascript).

D.W.
  • Wow - congrats on describing the BREACH attack 10 months before it was demoed at BlackHat! http://security.stackexchange.com/q/39925/396 and http://www.kb.cert.org/vuls/id/987798 And for giving a much clearer scenario than I've seen elsewhere.... – nealmcb Aug 08 '13 at 05:34
  • Your last line about leaving HTTP compression on for static resources (CSS/JS), even over HTTPS, makes this more palatable. As long as you aren't doing any tricky file handling, so that the static output is not affected by the input, it should be safe. – PadraigD Nov 11 '19 at 12:55
  • If the content that can be injected by the attacker is included in the page unescaped, this is comprehensible. But most secure web apps already escape user input (else there's another possible attack, like XSS). So when the secret has some special characters around it (`"` or `&`), then this should be near-impossible because the included escaped input won't be the same as the plain secret. And trying to find the secret without any surrounding pattern is a lot harder because you'd have to start with a batch of characters instead of trying them one by one. Is my understanding correct? – ygoe Sep 15 '20 at 13:02
  • @ygoe, no, I doubt it. XSS prevention often doesn't escape `&`. If it does, I suspect the same attack should work if `evil.com` triggers a request to `https://www.bank.com/buggypage.html?name=csrftoken=1`. – D.W. Sep 15 '20 at 16:25
  • At least in ASP.NET Core MVC the CSRF token is surrounded by quotes, and some more quotes away from a recognisable code pattern (the field name); and all quotes are always escaped when printed in the page, no matter where (content or input/textarea value). I verified this in my app. – ygoe Sep 15 '20 at 21:58

Compression, in general, alters the length of that which is compressed (that's exactly why we compress). Lossless compression alters the length depending on the data itself (whereas lossy compression can reach a fixed compression ratio, e.g. an MP3 file at a strict 128 kbit/s). Data length is what leaks through encryption, which is why we are interested in it.
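
A quick illustration of that data dependence, sketched with Python's zlib (any DEFLATE implementation behaves similarly):

import os
import zlib

# Three 1000-byte plaintexts; encryption hides the bytes, not the sizes.
print(len(zlib.compress(b"A" * 1000)))                             # extremely redundant: ~a dozen bytes
print(len(zlib.compress(("the quick brown fox " * 50).encode())))  # repetitive text: small
print(len(zlib.compress(os.urandom(1000))))                        # random data: slightly over 1000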

In a very generic way, a length leak can be fatal, even in the presence of a passive-only attacker; it is a kind of traffic analysis. An example comes from World War I, where French cryptographers could predict the importance of a message based on the length of the (encrypted) header: an important message was sent to a colonel (Oberst), whereas less important messages were tagged for a lieutenant (Oberleutnant, a much longer term).

Compression makes length leaks only worse, because it prevents you from fixing the length leaks by normalizing the lengths of the messages.

When the attacker can add some data of his own to the chunks which are compressed, he amplifies the length leak, which can become a practical attack vector for arbitrary target data, as the CRIME attack demonstrates. However, I argue that the problem was already there. In that view, HTTP-level compression is not a new risk; it is rather an aggravating factor for a pre-existing risk. Letting the attacker add some of his own data to the encrypted stream is yet another aggravating factor, and these factors add up.


I wager that you are not the first one to have this idea. Not only did quite a lot of people (me included) give some thought to it in the last 10 days, but if you try to access this URL:

http://www.google.com/sdfdfskfdjsdfhfkjsbkfbsjksalakjsflfa

then you get a 404 error from Google which contains the word "sdfdfskfdjsdfhfkjsbkfbsjksalakjsflfa". Hey, that's attacker-chosen reflected data; that could be fun! So let's try again with an HTTPS URL:

https://www.google.com/sdfdfskfdjsdfhfkjsbkfbsjksalakjsflfa

and then: no 404, no fun, you are unceremoniously redirected to Google's home page. This makes me think that some people at Google already thought of it, too, and proactively deactivated the reflection bit when using SSL (because when using SSL, you get the Google+ bells and whistles, hence potentially dangerous data).

Thomas Pornin
  • Would adding a random number of padding bytes in the payload help? Assuming the padding bytes themselves were generated by a cryptographically secure RNG, such that they're unlikely to be compressed, I'd imagine it'd make the attack _much_ harder to pull off reliably. – Polynomial Sep 20 '12 at 09:48
  • @Polynomial: it can help a bit, but you have to add a substantial number of such bytes. If you add an average of _n_ bytes (with a Gaussian distribution), then the attacker must make about _n^2_ times as many requests to cancel the effect of your padding. So we are talking about adding at least 1 kB of random padding -- it seems more efficient (for both CPU and network) to disable compression for non-static content. – Thomas Pornin Sep 20 '12 at 12:19
  • Interesting. 1kB seems a little excessive though - wouldn't 128 to 256 bytes do the job? That forces them to do ~65k times the requests, which is more than enough to make it difficult. – Polynomial Sep 20 '12 at 12:21
  • @Polynomial: I don't trust numbers less than 1 million to be "high enough". We use computers with CPUs in the _gigahertz_ range, and networks which transmit several _megabytes_ per second. Remember that the attacker can often afford to be patient. – Thomas Pornin Sep 20 '12 at 12:29
  • Good point, no idea why I didn't think of a _patient_ attacker. I've got my "user-friendly" hat on today, rather than my "evil bastard" hat - it must be warping my judgement! – Polynomial Sep 20 '12 at 12:52
  • @ThomasPornin, Even the HTTPS version https://www.google.com/sdfdfskfdjsdfhfkjsbkfbsjksalakjsflfa shows a 404... – Pacerier Oct 17 '14 at 14:37
  • What if the padding is not random, but a remainder to make the final payload have the size quantized to some large-ish step? For example, if we make the size of all responses divisible by 1kB. Then if the compressed body is 500 bytes, we will add 524 bytes of padding, or if it was 1000 or 2024 bytes, we will add 24 bytes, etc. Then small changes in data won't affect size *at all*, and replaying exactly the same request would be totally pointless. – Display Name Oct 01 '17 at 10:42
  • That by itself would be insufficient, since an attacker could add his own padding to push you up to near the next block size. Setting a minimum padding amount helps further, and detecting requests with abnormally long padding is relatively easy. The final step would be to randomize the padding, sometimes adding 1-2kb blocks at the end. – Perkins Nov 10 '18 at 19:15