To avoid BREACH, can we use gzip on non-token responses?

Question

I work on a site that has a web interface and an API. I'm trying to determine whether we can safely use gzip, or whether that will expose us to BREACH.

The BREACH attack website says:

If you have an HTTP response body that meets all the following conditions, you might be vulnerable:

Compression: Your page is served with HTTP compression enabled (GZIP / DEFLATE)

User Data: Your page reflects user data via query string parameters, POST..

A Secret: Your application page serves PII, a CSRF token, sensitive data...

Another site says:

It requires the attacker to be able to read the size of encrypted traffic and perform CSRF requests at will

We use a web framework that automatically adds a masked (different every time) CSRF token to every form that uses POST.
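Masking can be illustrated with a minimal Python sketch of one common scheme: XOR the real per-session token with a fresh random pad, then send the pad plus the result. Rails masks its authenticity token in broadly this way, but it is an assumption that the framework in question does the same, and `mask_token`, `unmask_token` and `csrf_ok` are hypothetical names. Because the pad is new on every render, the token bytes in the page never repeat between responses, so there is no fixed string for a compression oracle to converge on.

```python
import base64
import hmac
import secrets

TOKEN_BYTES = 32  # assumed size of the raw per-session CSRF secret


def mask_token(raw: bytes) -> str:
    """Encode `raw` differently on every call: base64(pad + (raw XOR pad))."""
    pad = secrets.token_bytes(len(raw))             # fresh one-time pad
    xored = bytes(a ^ b for a, b in zip(pad, raw))
    return base64.urlsafe_b64encode(pad + xored).decode("ascii")


def unmask_token(masked: str) -> bytes:
    """Invert mask_token: split pad and ciphertext, XOR them back together."""
    data = base64.urlsafe_b64decode(masked.encode("ascii"))
    half = len(data) // 2
    pad, xored = data[:half], data[half:]
    return bytes(a ^ b for a, b in zip(pad, xored))


def csrf_ok(submitted: str, session_raw: bytes) -> bool:
    """Check a submitted masked token against the session's raw secret."""
    try:
        candidate = unmask_token(submitted)
    except ValueError:  # bad base64 or non-ASCII input
        return False
    return hmac.compare_digest(candidate, session_raw)


if __name__ == "__main__":
    raw = secrets.token_bytes(TOKEN_BYTES)
    first, second = mask_token(raw), mask_token(raw)
    print(first != second)                               # True (fresh pad each time)
    print(csrf_ok(first, raw) and csrf_ok(second, raw))  # True
```

The server only ever stores the raw secret; every masked form of it validates, which is why a framework can hand out a different value on every page.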

We have pages (e.g., search results) that reflect user input but contain no sensitive data (the search form uses GET).

We have an API endpoint that serves an API token, but it's a different value every time because it's cryptographically signed with a timestamp.
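Signing with a timestamp could look like the following minimal Python sketch, assuming an HMAC-SHA256 signature and a `<raw>.<timestamp>.<signature>` layout; the format, `SIGNING_KEY`, `sign_token` and `verify_token` are hypothetical, not the site's actual scheme. One detail worth checking in whatever scheme is really used: layouts like this keep the raw value in clear text, so that part is identical in every response for a given user even though the full token changes. That is only harmless if the raw part is useless to an attacker without a fresh, valid signature.

```python
import hashlib
import hmac
import time
from typing import Optional

SIGNING_KEY = b"hypothetical-server-side-key"  # assumption: load a real key from config


def _mac(raw: str, ts: str) -> str:
    """Hex HMAC-SHA256 over '<raw>.<ts>'."""
    msg = f"{raw}.{ts}".encode("utf-8")
    return hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()


def sign_token(raw: str, now: Optional[int] = None) -> str:
    """Return '<raw>.<unix timestamp>.<hex HMAC>' (hypothetical format)."""
    ts = str(int(time.time()) if now is None else now)
    return f"{raw}.{ts}.{_mac(raw, ts)}"


def verify_token(token: str, max_age: int, now: Optional[int] = None) -> Optional[str]:
    """Return the raw value if the signature matches and the token is fresh."""
    try:
        raw, ts, sig = token.rsplit(".", 2)
        issued = int(ts)
    except ValueError:
        return None
    current = int(time.time()) if now is None else now
    if not hmac.compare_digest(sig.encode("utf-8"), _mac(raw, ts).encode("utf-8")):
        return None
    if current - issued > max_age:
        return None
    return raw


if __name__ == "__main__":
    raw = "d2a372efa35aab29028c49d71f56789"
    t1 = sign_token(raw, now=1_700_000_000)
    t2 = sign_token(raw, now=1_700_000_001)
    print(t1 != t2)                              # True: the whole token changes...
    print(t1.split(".")[0] == t2.split(".")[0])  # True: ...but its first part does not
```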

The majority of our responses are large JSON bodies that would benefit a lot from gzip.

Can we turn on gzip everywhere? Is it sufficient to turn it off when responding with an API token? Or can we just not have nice things?

the issue is compressing html interfaces, not api responses. — dandavis, Oct 31 '17 at 23:30
@dandavis http://www.apiacademy.co/can-your-api-be-breached/ shows a scenario where they say an API could be vulnerable, and helps me understand the attack much better. — Nathan Long, Nov 01 '17 at 13:43

Nathan Long · Answer 1 · 2019-08-27T14:42:11.970

I'm answering my own question because I think I now understand BREACH and how to prevent it. I'd love feedback.

How BREACH works (as I understand it)

(Expanding on an explanation here that helped me.)

Suppose you're an attacker. You are signed into a service as yourself. You notice that there's a search endpoint, and if you send the search term rabbits, you get back a response like this:

<SearchResponse>
  <AuthToken>d2a372efa35aab29028c49d71f56789</AuthToken>
  <SearchTerm>rabbits</SearchTerm>
  <Results>
    <Result>rabbits rock</Result>
    <Result>yay rabbits</Result>
  </Results>
</SearchResponse>

You also notice that the response is gzipped and encrypted (HTTPS).

You try searching for a string formatted like the <AuthToken> value, like aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa. The response is:

<SearchResponse>
  <AuthToken>d2a372efa35aab29028c49d71f56789</AuthToken>
  <SearchTerm>aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</SearchTerm>
  <Results>
  </Results>
</SearchResponse>

There are no results for this. You then modify your search term slightly:

<SearchResponse>
  <AuthToken>d2a372efa35aab29028c49d71f56789</AuthToken>
  <SearchTerm>d2a37aaaaaaaaaaaaaaaaaaaaaaaaaa</SearchTerm>
  <Results>
  </Results>
</SearchResponse>

As you hoped, something interesting is happening. Because the search term is nonsense, the <Results> are always the same: empty. The only thing changing is the <SearchTerm>. And because of compression, the more the <SearchTerm> value resembles the <AuthToken> value, the smaller the response is.

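This is because of how gzip compression (DEFLATE) works: when part of the input repeats an earlier string, the compressor replaces the repeat with a short back-reference to the earlier copy, and the decompressor expands it again. The longer the repeated run, the smaller the output.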
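The effect is easy to reproduce with Python's standard `gzip` module. This sketch rebuilds the empty-results response from the example (the template and `compressed_size` are illustrative, not the real service) and reports its gzipped size for three search terms. Exact byte counts depend on the zlib build, so only the comparison is meaningful: the exact match should come out clearly smallest, while a short partial match may save little, because DEFLATE only back-references runs of 3 or more bytes and the output is counted in whole bytes.

```python
import gzip

# Response layout copied from the XML example; only the two values vary.
TEMPLATE = (
    "<SearchResponse>\n"
    "  <AuthToken>{token}</AuthToken>\n"
    "  <SearchTerm>{term}</SearchTerm>\n"
    "  <Results>\n"
    "  </Results>\n"
    "</SearchResponse>\n"
)
TOKEN = "d2a372efa35aab29028c49d71f56789"


def compressed_size(term: str, token: str = TOKEN) -> int:
    """Size in bytes of the gzipped response for a given search term."""
    body = TEMPLATE.format(token=token, term=term).encode("utf-8")
    return len(gzip.compress(body))


if __name__ == "__main__":
    for term in ("7b1e04c9f8d3a6605bc21e97f40a8d3",         # unrelated hex, same length
                 "d2a37" + "7b1e04c9f8d3a6605bc21e97f4",  # first 5 chars match
                 TOKEN):                                   # exact match
        print(f"{term}: {compressed_size(term)} bytes")
```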

You search again, using the exact value of the <AuthToken>.

<SearchResponse>
  <AuthToken>d2a372efa35aab29028c49d71f56789</AuthToken>
  <SearchTerm>d2a372efa35aab29028c49d71f56789</SearchTerm>
  <Results>
  </Results>
</SearchResponse>

This time you make a note of how small the response is. Now you know that any time the response is this size, it means the search term matched the auth token exactly.

Now, because these are your own requests, you've been able to read the responses directly. If you could mount a MITM attack on another user of the site (e.g., by running a rogue router), you'd be able to see the size of their encrypted responses, but not the actual contents.

You think to yourself: if I can trick someone else into sending the search terms I want them to, and if I can see how big the encrypted response is, I can tweak the search term over and over. The closer I get to guessing the auth token, the smaller the response will be, and when it's the size of the response I just saw, I've guessed correctly. Once I know their auth token, I can sign in as them.
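The "tweak the search term over and over" loop can be sketched in Python as a greedy, one-character-at-a-time search. `size_oracle` stands in for "make the victim search for this term and measure the encrypted response"; it, `recover_secret`, `idealized_oracle`, and the assumption that the attacker knows the text just before the secret are all hypothetical. Real measurements are noisy and often tie (sizes come in whole bytes, and Huffman coding can hide a one-character improvement), so a practical attack needs extra requests to break ties; this sketch simply stops when it hits one.

```python
from typing import Callable

HEX_ALPHABET = "0123456789abcdef"


def recover_secret(
    size_oracle: Callable[[str], int],
    known_prefix: str,
    secret_length: int,
    alphabet: str = HEX_ALPHABET,
) -> str:
    """Guess the secret one character at a time, keeping the candidate
    that produces the smallest response."""
    recovered = ""
    for _ in range(secret_length):
        sizes = {c: size_oracle(known_prefix + recovered + c) for c in alphabet}
        best = min(sizes.values())
        winners = [c for c, size in sizes.items() if size == best]
        if len(winners) != 1:
            # Ambiguous measurement: a real attacker would need more requests.
            raise ValueError(f"ambiguous after {recovered!r}: {winners}")
        recovered += winners[0]
    return recovered


def idealized_oracle(secret_context: str) -> Callable[[str], int]:
    """Noise-free stand-in: the 'response' shrinks by one unit for every
    leading character the guess shares with `secret_context`."""
    def oracle(term: str) -> int:
        shared = 0
        for guess_char, real_char in zip(term, secret_context):
            if guess_char != real_char:
                break
            shared += 1
        return 1000 - shared
    return oracle


if __name__ == "__main__":
    token = "d2a372efa35aab29028c49d71f56789"
    oracle = idealized_oracle("<AuthToken>" + token)
    print(recover_secret(oracle, "<AuthToken>", len(token)))  # prints d2a372efa35aab29028c49d71f56789
```

The search needs at most `len(alphabet)` requests per character (16 for a hex token), which is why being able to trigger requests "at will" matters so much.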

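If you can get your victim's browser to run your script (for example, on a page you lure them to that sends cross-site requests to the service, or through an XSS attack), you can make it send the necessary requests.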

Mitigation

This attack would not work if:

The server did not use HTTP compression (like gzip, in our example)
The request could not be made successfully without a CSRF token, which the attacker could not know
The server never put both sensitive data (like an API token) and user-supplied data (like the search term) in the same response
The server never returned the same API token twice (e.g., if raw token values were timestamped and signed before sending, the timestamp would ensure the token in the response changed constantly)
The response always contained random-length padding, as @AndrolGenhald pointed out in a comment (although with enough requests, an attacker might separate the signal from this noise)
The request could not be made successfully without a session cookie, the site's session cookie had a SameSite attribute, and the would-be victim's browser supported that attribute, so it would not send the cookie with requests originating from another site.
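Random-length padding could be sketched in Python as below: append an XML/HTML comment containing a random number of random hex characters (XML allows comments after the root element), then gzip. Whitespace would not work as padding, because gzip compresses a run of spaces to almost nothing. `pad_markup`, `gzip_padded` and the cap of 64 random bytes are hypothetical choices. As the list notes, this only adds noise: an attacker who can average over many requests may still recover the size signal, so padding slows BREACH rather than stopping it.

```python
import gzip
import secrets


def pad_markup(body: str, max_pad_bytes: int = 64) -> str:
    """Append a comment holding 1..max_pad_bytes random bytes, hex-encoded.

    Hex digits never contain '--', so the comment stays valid XML/HTML.
    """
    n = secrets.randbelow(max_pad_bytes) + 1
    return f"{body}<!-- {secrets.token_hex(n)} -->"


def gzip_padded(body: str, max_pad_bytes: int = 64) -> bytes:
    """Pad, then gzip; the compressed length now varies between requests."""
    return gzip.compress(pad_markup(body, max_pad_bytes).encode("utf-8"))


if __name__ == "__main__":
    body = "<SearchResponse>\n</SearchResponse>\n"
    print(sorted({len(gzip_padded(body)) for _ in range(10)}))
```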

Another common mitigation is to add data of random length to any responses containing sensitive data. — AndrolGenhald, Nov 01 '17 at 14:48
@AndrolGenhald Yep. They say this will be done automatically in TLS 1.3 - https://tools.ietf.org/html/draft-ietf-tls-tls13-21#section-5.4, although padding "wastes bandwidth and breaks caching mechanisms", as https://www.sjoerdlangkemper.nl/2016/11/07/current-state-of-breach-attack/ says. — Nathan Long, Nov 01 '17 at 14:56
thank you very much for this explanation of BREACH. Your text make BREACH easy to understand. — guettli, Jul 14 '21 at 10:23
