I thought this was quite an interesting question and problem, so I had a go at implementing the attacks myself, but against something other than "true" or "false".
To make things simple I used a static page as a file and injected content into it at the end. The target secret looked like this:
<input type="hidden" name="secret" value="7253b8f45f322b411ce39b12c6e9a84c" />
I used the source of an MSDN page as an example to build around. The full target page source I used can be found here; the secret is on line 33. An important factor here is that the page content includes a lot of other hexadecimal strings, which imitates a potentially more difficult attack scenario.
To attack this, I used a fairly naive method:
- Append
<input type="hidden" name="secret" value="
to the end of the page, then append our previous guesses (if any), then append 0000
to ffff
in sequence and look for the smallest result when compressed.
- For the 4-character hex string that produced the smallest result, add the first three characters to our guess. Dropping the last character is important as there is quite a bit of uncertainty in the last character, and if it does get it wrong it tends to just repeatedly generate the last character for all remaining guesses (e.g.
7251
on the first would generate 1111
on all subsequent guesses).
- Loop steps 1 and 2 until we have <4 characters left to guess. Run through the remaining possible combinations, in each case adding
" />
to the appended string to complete the tag.
I wrote a complete implementation with parallelism in C#, which you can find here.
Here are the results for GZip:
................................
Guessed f45a
Secret: 7253b8f45
................................
Guessed f322
Secret: 7253b8f45f32
................................
Guessed 2b41
Secret: 7253b8f45f322b4
................................
Guessed 11ca
Secret: 7253b8f45f322b411c
................................
Guessed e39a
Secret: 7253b8f45f322b411ce39
................................
Guessed b12a
Secret: 7253b8f45f322b411ce39b12
................................
Guessed c6e9
Secret: 7253b8f45f322b411ce39b12c6e
................................
Guessed 9a84
Secret: 7253b8f45f322b411ce39b12c6e9a8
Reached last block.
Guessed 4c
Secret: 7253b8f45f322b411ce39b12c6e9a84c
Done!
This took only a couple of minutes to run and it correctly discovers the secret.
Now let's try again with Brotli, using the "fastest" compression mode, with the same configuration for the attack:
................................
Guessed 7253
Secret: 725
................................
Guessed 3b77
Secret: 7253b7
................................
Guessed 4aa7
Secret: 7253b74aa
................................
Guessed 47aa
Secret: 7253b74aa47a
................................
Guessed 0aaa
Secret: 7253b74aa47a0aa
................................
Guessed 0aaa
Secret: 7253b74aa47a0aa0aa
................................
Guessed 3a0a
Secret: 7253b74aa47a0aa0aa3a0
................................
Guessed 0aaa
Secret: 7253b74aa47a0aa0aa3a00aa
................................
Guessed 4aaa
Secret: 7253b74aa47a0aa0aa3a00aa4aa
................................
Guessed 2aaa
Secret: 7253b74aa47a0aa0aa3a00aa4aa2aa
Reached last block.
Guessed 7a
Secret: 7253b74aa47a0aa0aa3a00aa4aa2aa7a
Done!
Not so good. The second guess goes wrong and this throws off the whole algorithm. So maybe we need to be a bit more cautious, and only accept 2/4 characters per block instead of 3/4.
................................
Guessed 411c
Secret: 7253b8f45f322b41
................................
Guessed 1c77
Secret: 7253b8f45f322b411c
................................
Guessed e377
Secret: 7253b8f45f322b411ce3
................................
Guessed 9b12
Secret: 7253b8f45f322b411ce39b
................................
Guessed 1277
Secret: 7253b8f45f322b411ce39b12
................................
Guessed c677
Secret: 7253b8f45f322b411ce39b12c6
................................
Guessed e977
Secret: 7253b8f45f322b411ce39b12c6e9
................................
Guessed a84c
Secret: 7253b8f45f322b411ce39b12c6e9a8
Reached last block.
Guessed 4c
Secret: 7253b8f45f322b411ce39b12c6e9a84c
Done!
And there we can see it works again! So it turns out that it isn't impossible to attack Brotli in this way.
The key part that explains why your attack failed is the pre-defined dictionary:
Unlike most general purpose compression algorithms, Brotli uses a pre-defined 120 kilobyte dictionary, in addition to the dynamically populated ("sliding window") dictionary. The pre-defined dictionary contains over 13000 common words, phrases and other substrings derived from a large corpus of text and HTML documents
(from Brotli - Wikipedia)
So do "true" and "false" appear in this dictionary? Yes! I found a textual copy of the dictionary here and you can find true and false on lines 6829 and 1204 respectively.
The reason my attack works and yours doesn't is that I'm looking for something that isn't in the dictionary. For a single token like "true" or "false", using the dictionary index is much more efficient than trying to fold the characters together with another instance in the stream using distance encoding.
But why was it harder for me to perform a successful compression oracle attack using Brotli than with GZip? There don't appear to be many hexadecimal strings in the dictionary. I believe this is simply because the Brotli compression system is more complex than GZip. GZip uses a combination of LZ77 and Huffman coding, which has a strong focus on duplicate elimination, which works great for a compression oracle attack. Brotli, on the other hand, has more tricks up its sleeve, leading to a higher likelihood of other compression mechanisms being used instead of duplicate elimination during the first few blocks when the duplicate isn't very long. One way to improve the attack for Brotli would be to brute-force a longer set of values in the first block (e.g. 6 or even 8 characters) to try to elicit a duplicate elimination. This is much slower but would likely force a reliable compression oracle out of the Brotli compressor.