Brotli compression algorithm and BREACH attack

Question

I spent a lot of time on BREACH attack implementations. In theory, Brotli compression, like other compressions that use lzz7 family algorithms, must be vulnerable to the BREACH attack.

Experiences and tests that I did on a test environment show BREACH attack validity and performance on gzip. But when I test on Brotli, the attack fails because the compression result length does not show the expected result. For example, it returns equal value for true and false characters!

I don't know the Brotli algorithm details, the BREACH attack is so sensitive to compression algorithm and decision is based on just one byte difference between correct character compression result and false characters. Use of Static dictionary or Context modelling maybe a reason for this behaviour.

Why does Brotli behave like this?

"true" and "false" are both common words, sure to be in the 13k dictionary. since it can (probably) swap a word for a token, there's no diff in output size. For mis-spellings, or partial passwords, i wouldn't be so sure. It should be better than gzip, but still (in theory) can be used w/breach in some situations. — dandavis, Oct 26 '17 at 06:17
edit, i checked, "true" and "false" are indeed both represented in the hard-coded listing — dandavis, Oct 26 '17 at 06:24
Related: https://security.stackexchange.com/q/150351/29865 (Possible dup?) — Ajedi32, Oct 26 '17 at 15:16
hi, my main problem is why two tries method that present in breach attack documents dont work for br compression. in my tests apache return same result for two request like this: Request Response Size canary=a{} 99 canary={}a 99 whiles if 'a' is correct character result should be like this: Request Response Size canary=a{} 99 canary={}a 100 — seyyed heydar javadi, Oct 28 '17 at 07:37

Polynomial · Accepted Answer · 2018-11-12T19:46:44.823

I thought this was quite an interesting question and problem, so I had a go at implementing the attacks myself, but against something other than "true" or "false".

To make things simple I used a static page as a file and injected content into it at the end. The target secret looked like this:

<input type="hidden" name="secret" value="7253b8f45f322b411ce39b12c6e9a84c" />

I used the source of an MSDN page as an example to build around. The full target page source I used can be found here; the secret is on line 33. An important factor here is that the page content includes a lot of other hexadecimal strings, which imitates a potentially more difficult attack scenario.

To attack this, I used a fairly naive method:

Append <input type="hidden" name="secret" value=" to the end of the page, then append our previous guesses (if any), then append 0000 to ffff in sequence and look for the smallest result when compressed.
For the 4-character hex string that produced the smallest result, add the first three characters to our guess. Dropping the last character is important as there is quite a bit of uncertainty in the last character, and if it does get it wrong it tends to just repeatedly generate the last character for all remaining guesses (e.g. 7251 on the first would generate 1111 on all subsequent guesses).
Loop steps 1 and 2 until we have <4 characters left to guess. Run through the remaining possible combinations, in each case adding " /> to the appended string to complete the tag.

I wrote a complete implementation with parallelism in C#, which you can find here.

Here are the results for GZip:

................................
Guessed f45a
Secret: 7253b8f45
................................
Guessed f322
Secret: 7253b8f45f32
................................
Guessed 2b41
Secret: 7253b8f45f322b4
................................
Guessed 11ca
Secret: 7253b8f45f322b411c
................................
Guessed e39a
Secret: 7253b8f45f322b411ce39
................................
Guessed b12a
Secret: 7253b8f45f322b411ce39b12
................................
Guessed c6e9
Secret: 7253b8f45f322b411ce39b12c6e
................................
Guessed 9a84
Secret: 7253b8f45f322b411ce39b12c6e9a8
Reached last block.

Guessed 4c
Secret: 7253b8f45f322b411ce39b12c6e9a84c
Done!

This took only a couple of minutes to run and it correctly discovers the secret.

Now let's try again with Brotli, using the "fastest" compression mode, with the same configuration for the attack:

................................
Guessed 7253
Secret: 725
................................
Guessed 3b77
Secret: 7253b7
................................
Guessed 4aa7
Secret: 7253b74aa
................................
Guessed 47aa
Secret: 7253b74aa47a
................................
Guessed 0aaa
Secret: 7253b74aa47a0aa
................................
Guessed 0aaa
Secret: 7253b74aa47a0aa0aa
................................
Guessed 3a0a
Secret: 7253b74aa47a0aa0aa3a0
................................
Guessed 0aaa
Secret: 7253b74aa47a0aa0aa3a00aa
................................
Guessed 4aaa
Secret: 7253b74aa47a0aa0aa3a00aa4aa
................................
Guessed 2aaa
Secret: 7253b74aa47a0aa0aa3a00aa4aa2aa
Reached last block.

Guessed 7a
Secret: 7253b74aa47a0aa0aa3a00aa4aa2aa7a
Done!

Not so good. The second guess goes wrong and this throws off the whole algorithm. So maybe we need to be a bit more cautious, and only accept 2/4 characters per block instead of 3/4.

................................
Guessed 411c
Secret: 7253b8f45f322b41
................................
Guessed 1c77
Secret: 7253b8f45f322b411c
................................
Guessed e377
Secret: 7253b8f45f322b411ce3
................................
Guessed 9b12
Secret: 7253b8f45f322b411ce39b
................................
Guessed 1277
Secret: 7253b8f45f322b411ce39b12
................................
Guessed c677
Secret: 7253b8f45f322b411ce39b12c6
................................
Guessed e977
Secret: 7253b8f45f322b411ce39b12c6e9
................................
Guessed a84c
Secret: 7253b8f45f322b411ce39b12c6e9a8
Reached last block.

Guessed 4c
Secret: 7253b8f45f322b411ce39b12c6e9a84c
Done!

And there we can see it works again! So it turns out that it isn't impossible to attack Brotli in this way.

The key part that explains why your attack failed is the pre-defined dictionary:

Unlike most general purpose compression algorithms, Brotli uses a pre-defined 120 kilobyte dictionary, in addition to the dynamically populated ("sliding window") dictionary. The pre-defined dictionary contains over 13000 common words, phrases and other substrings derived from a large corpus of text and HTML documents

(from Brotli - Wikipedia)

So do "true" and "false" appear in this dictionary? Yes! I found a textual copy of the dictionary here and you can find true and false on lines 6829 and 1204 respectively.

The reason my attack works and yours doesn't is that I'm looking for something that isn't in the dictionary. For a single token like "true" or "false", using the dictionary index is much more efficient than trying to fold the characters together with another instance in the stream using distance encoding.

But why was it harder for me to perform a successful compression oracle attack using Brotli than with GZip? There don't appear to be many hexadecimal strings in the dictionary. I believe this is simply because the Brotli compression system is more complex than GZip. GZip uses a combination of LZ77 and Huffman coding, which has a strong focus on duplicate elimination, which works great for a compression oracle attack. Brotli, on the other hand, has more tricks up its sleeve, leading to a higher likelihood of other compression mechanisms being used instead of duplicate elimination during the first few blocks when the duplicate isn't very long. One way to improve the attack for Brotli would be to brute-force a longer set of values in the first block (e.g. 6 or even 8 characters) to try to elicit a duplicate elimination. This is much slower but would likely force a reliable compression oracle out of the Brotli compressor.

I have gzip disabled in the server side. And I have cloudflare also. Clouflare uses Brotli. So, I understand the communication from browser to CF remains compressed and CF to server is uncompressed. Does this scenario saves me from BREACH? Because, even if we disable Brotli, the SSL scanner still shows I am vulnerable. — Anonymous Platypus, Nov 06 '19 at 06:28

Brotli compression algorithm and BREACH attack

1 Answers1

Linked