
I'm just starting to learn about fuzzing and have made a dumb fuzzer that changes several random bytes in a PDF file to random values, opens it, and detects whether Acrobat Reader has crashed. What types of bugs can I expect to find using such a method?

I assume that a format string vulnerability could be found if the dumb fuzzer happens to insert a format string parameter at the right place. I also assume that integer overflows could be found. But what else could be found?

Is it possible to find buffer overflows? Since I change only a few random bytes, I assume that the chance of overflowing a buffer is small or even zero. Or am I wrong?

Anders
pineappleman
  • You are *very* unlikely to find any bugs in Acrobat Reader with a dumb fuzzer - especially format string vulns, since they are usually trivial to find with static analysis. But yes, buffer overflows are possible. Memory corruption bugs will probably cause most of your crashes. – grc Apr 08 '16 at 12:11
  • It's hard to determine the chance of finding something, since you don't know a vulnerability exists until you find it. I would agree that the chance of finding a buffer overflow is small, though. – Dane Apr 08 '16 at 12:14


Dumb fuzzing

I'm just starting out to learn about fuzzing and have made a dumb fuzzer that changes several random bytes in a pdf file to random values, opens it and detects if Acrobat Reader has crashed. What types of bugs can I expect to find using such a method?

None. Such "dumb" fuzzing has incredibly limited coverage. I would be extremely surprised if that application actually crashed just because a few bytes in the input randomly changed. This would be the case even if the fuzzer was trying millions of samples per second. However you also say that your fuzzer opens it and sees if it crashes. This would only get what, a few tries a second? A try every few seconds? Not even close to worth it.
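For concreteness, the mutation step described in the question amounts to something like the sketch below. It is a minimal illustration, not the asker's actual code; the function name `dumb_mutate` and the toy seed bytes are hypothetical. The mutation itself costs next to nothing, which is why the per-sample cost is dominated by launching and tearing down the reader.

```python
import random


def dumb_mutate(data: bytes, n_bytes: int, rng: random.Random) -> bytes:
    """Return a copy of data with n_bytes distinct positions set to random values.

    Hypothetical helper: it knows nothing about the PDF format,
    which is exactly what makes it "dumb".
    """
    out = bytearray(data)
    # Pick distinct positions so each one is mutated at most once.
    for pos in rng.sample(range(len(out)), n_bytes):
        out[pos] = rng.randrange(256)  # may, by chance, equal the old value
    return bytes(out)


# Usage: a few mutated variants of a tiny (hypothetical) PDF-like seed.
rng = random.Random(1234)
seed = b"%PDF-1.7\n1 0 obj << /Type /Catalog >> endobj\n"
variants = [dumb_mutate(seed, 4, rng) for _ in range(3)]
```

Each variant would then be written to disk and opened in the reader, one process launch per sample.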

Some professional fuzzers work by spawning a process, pausing it and saving its memory state right before it is about to parse the target data (to skip the bootstrapping stage), and then resuming it. If it doesn't crash within several tens of milliseconds, the fuzzer reverts it to the saved state and tries the next input. This allows fuzzing at hundreds of executions per second at the slowest, or tens of thousands at the fastest.
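On Unix, the "save state, then revert" idea can be approximated with `fork()`, which is the trick behind AFL's fork server. This is a minimal sketch assuming a POSIX system; `expensive_setup` and `parse` are hypothetical stand-ins for the target's start-up and parsing code. It is not how snapshot fuzzers for closed-source Windows programs like Acrobat Reader are built, but it shows the same principle: pay for start-up once, then give every input a fresh copy of the already-initialized process.

```python
import os
import signal
import time


def expensive_setup() -> dict:
    """Hypothetical stand-in for slow start-up work (loading libraries, fonts, ...)."""
    time.sleep(0.2)
    return {"ready": True}


def parse(state: dict, data: bytes) -> None:
    """Hypothetical toy parser with two planted bugs."""
    if data.startswith(b"%PDF\xff"):
        os.kill(os.getpid(), signal.SIGSEGV)  # simulate a memory-corruption crash
    if data.startswith(b"%PDF\xfe"):
        while True:  # simulate an infinite loop
            pass


def run_one(state: dict, data: bytes, timeout_s: int = 1) -> str:
    """Parse data in a forked copy of the already-initialized process.

    fork() gives the child a copy-on-write copy of memory as it is right now,
    so every input starts from the same saved state and the parent never
    repeats expensive_setup(). Throwing the child away is the "revert".
    """
    pid = os.fork()
    if pid == 0:  # child
        # Restore default actions so a crash or timeout really kills the child.
        signal.signal(signal.SIGSEGV, signal.SIG_DFL)
        signal.signal(signal.SIGALRM, signal.SIG_DFL)
        signal.alarm(timeout_s)
        try:
            parse(state, data)
        except BaseException:
            os._exit(2)  # Python-level error, not a native crash
        os._exit(0)
    _, status = os.waitpid(pid, 0)  # parent: wait for the child to finish
    if os.WIFSIGNALED(status):
        return "hang" if os.WTERMSIG(status) == signal.SIGALRM else "crash"
    return "ok" if os.WEXITSTATUS(status) == 0 else "error"


state = expensive_setup()  # paid once, not once per input
for sample in (b"%PDF-1.7", b"%PDF\xff"):
    print(sample, run_one(state, sample))
```

The one-second `alarm` granularity is a simplification; a real fuzzer would use millisecond timeouts, as described above.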

I assume that a format string vulnerability could be found if the dumb fuzzer happens to insert a format string parameter at the right place. I also assume that integer overflows could be found. But what else could be found?

This depends on how the parser works. Without the Acrobat Reader source code, I couldn't even begin to answer this question. Buffer overflows, format string vulnerabilities, integer overflows, logic errors, etc. are all possible. You also have to be aware that not all vulnerabilities will be easy to discover, since some parts of the input may be protected by a checksum or a complex magic number, or may be compressed.

Imagine you have a field protected with a simple checksum. The chance that random changes will still produce a matching checksum is very low, since detecting such changes is exactly what a checksum is designed for, and as a result a vulnerable code path behind the check may never be taken. An actual exploit, however, can simply make sure the checksum is valid. This is a common issue for new users fuzzing the RAR format. The contents are covered by checksums, so no matter how much naive fuzzing someone does, it will never crash. When the checksum verification is removed, the parser turns out to be quite buggy and easy to crash. To avoid this issue, you have to understand the format and either remove the checksum code, or make sure your inputs always carry correct checksums.
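The "ensure your input always uses correct checksums" option can be sketched as a mutator that recomputes the checksum after every mutation. The file layout here is hypothetical (a payload followed by a 4-byte little-endian CRC-32 of the payload), not RAR's actual structure; `zlib.crc32` is Python's standard CRC-32. With a 32-bit checksum, a random change slips past the check with a probability of roughly 1 in 2^32, so naive mutants essentially never reach the code behind it.

```python
import random
import struct
import zlib


def parse(blob: bytes) -> bytes:
    """Hypothetical format: payload followed by CRC-32(payload), 4 bytes little-endian."""
    payload, (stored,) = blob[:-4], struct.unpack("<I", blob[-4:])
    if zlib.crc32(payload) != stored:
        raise ValueError("bad checksum")  # naive mutants almost always stop here
    return payload  # ...the interesting (possibly buggy) parsing would continue here


def mutate(blob: bytes, rng: random.Random, n: int = 3, fix_checksum: bool = True) -> bytes:
    """Randomize n payload bytes, then optionally recompute the trailing checksum."""
    payload = bytearray(blob[:-4])
    for _ in range(n):
        payload[rng.randrange(len(payload))] = rng.randrange(256)
    crc = zlib.crc32(payload) if fix_checksum else struct.unpack("<I", blob[-4:])[0]
    return bytes(payload) + struct.pack("<I", crc)


def accepted(blob: bytes) -> bool:
    """True if the mutant gets past the checksum check."""
    try:
        parse(blob)
        return True
    except ValueError:
        return False


body = b"hello, checksummed world"
seed = body + struct.pack("<I", zlib.crc32(body))
rng = random.Random(0)
naive = sum(accepted(mutate(seed, rng, fix_checksum=False)) for _ in range(1000))
fixed = sum(accepted(mutate(seed, rng, fix_checksum=True)) for _ in range(1000))
print(naive, fixed)  # naive: mutants that happened to keep a valid checksum; fixed: all of them
```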

Is it possible to find buffer overflows? Since I change only a few random bytes, I assume that the chance of overflowing a buffer is small or even zero. Or am I wrong?

If you change a value that ends up specifying the size of a buffer, it may still result in an overflow even though you didn't change the size of the buffer itself. For example, take the hypothetical memory contents "here is some text17". This is composed of a buffer, "here is some text", and a size, 17. A mutation that lowers the size would result in an overflow once the record is processed, if the parser allocates memory based on the declared size and then copies the whole string into it. Now, it's not likely that Acrobat Reader has something this obvious, but a bug could hypothetically result in a similar outcome internally. I could imagine several possibilities.
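The "here is some text17" example can be modeled without actually corrupting memory. The sketch below assumes a hypothetical C parser that allocates `size + 1` bytes based on the size field and then copies the whole string with `strcpy`. It only computes how many bytes that code would write past the end of its buffer.

```python
import re

RECORD = re.compile(rb"(\D*)(\d+)")  # hypothetical layout: <text><decimal size>


def parse_record(record: bytes) -> tuple[bytes, int]:
    """Split a record into (text, declared size)."""
    m = RECORD.fullmatch(record)
    if m is None:
        raise ValueError("malformed record")
    return m.group(1), int(m.group(2))


def naive_overflow(record: bytes) -> int:
    """Bytes a naive C parser would write past its heap buffer.

    Models this (hypothetical) C code, which trusts the size field:
        char *buf = malloc(size + 1);   // +1 for the NUL terminator
        strcpy(buf, text);              // writes strlen(text) + 1 bytes
    Nothing is actually overflowed here; only the overshoot is computed.
    """
    text, size = parse_record(record)
    written = len(text) + 1
    allocated = size + 1
    return max(0, written - allocated)


original = b"here is some text17"
mutated = original[:-1] + b"2"  # a single byte changed: size 17 -> 12
print(naive_overflow(original), naive_overflow(mutated))  # 0 5
```

A single random byte change in the size field is enough to produce a 5-byte heap overflow, even though the text itself is untouched.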

Intelligent fuzzing

Professional fuzzers do not just throw random input at a program to see if it crashes. They modify the input in specific ways that are likely to change the target's behavior. The program is compiled with special instrumentation that records which branches are taken during each run and reports this to the fuzzer, so the fuzzer knows when a given input has resulted in a different code path being taken, even if the program's external behavior doesn't change. This is the technique used by AFL, one of the most popular general-purpose fuzzers. It is intelligent enough to create a valid JPEG file out of thin air, simply by observing the code paths a JPEG decoder takes as it feeds it input.
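The core feedback loop can be sketched in a few lines. This is a toy illustration, not AFL's implementation: the target is hypothetical, and a Python `set` of branch IDs stands in for AFL's compile-time instrumentation and shared coverage bitmap. The key rule is that any mutant reaching a branch not seen before is kept in the corpus as a stepping stone for further mutation.

```python
import random


def target(data: bytes, cov: set) -> bool:
    """Toy "instrumented" target (hypothetical).

    Each branch adds an ID to cov, playing the role of compiled-in
    instrumentation. Returns True on the planted "crash".
    """
    cov.add(0)
    if len(data) >= 4 and data[0] == ord("P"):
        cov.add(1)
        if data[1] == ord("D"):
            cov.add(2)
            if data[2] == ord("F"):
                cov.add(3)
                if data[3] == ord("!"):
                    return True
    return False


def fuzz(seed: bytes, max_iters: int, rng: random.Random):
    """Coverage-guided loop: keep any mutant that reaches a new branch."""
    corpus = [seed]
    seen: set = set()
    for i in range(max_iters):
        child = bytearray(rng.choice(corpus))
        child[rng.randrange(len(child))] = rng.randrange(256)  # one random byte
        cov: set = set()
        if target(bytes(child), cov):
            return i + 1, bytes(child)  # iterations used, crashing input
        if not cov <= seen:  # new coverage: keep this input for later mutation
            seen |= cov
            corpus.append(bytes(child))
    return None


result = fuzz(b"AAAA", 200_000, random.Random(0))
print(result)
```

Blind random guessing would need on the order of 2^32 tries to hit all four bytes at once; with coverage feedback each byte is solved separately, so the cost grows roughly additively (a few thousand executions per stage) rather than multiplicatively.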

In order to write a fuzzer that finds real-life bugs, you would need to either use instrumentation like AFL's, which lets the fuzzer learn about the format automatically, or write an application-specific, format-aware fuzzer (i.e. one that is deeply aware of the specifics of the PDF format and is able to create mutations on a per-syntax-element basis rather than a per-byte basis).
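As a small taste of per-syntax mutation, the sketch below finds `/Key <integer>` pairs in a PDF dictionary and replaces one integer with a boundary value, instead of flipping arbitrary bytes. It is a toy regex, not a real PDF parser, and the list of "interesting" values is an assumption. A real format-aware fuzzer would parse the object structure, decompress streams, and handle the cross-reference table, whose byte offsets shift when a token changes length.

```python
import random
import re

# Boundary values that often expose integer bugs (an illustrative, assumed list).
INTERESTING = [b"0", b"-1", b"2147483647", b"2147483648", b"4294967295"]

# A PDF name key followed by an integer, e.g. "/Length 44" or "/Count 3".
KEYED_INT = re.compile(rb"(/[A-Za-z]+)(\s+)(\d+)")


def mutate_keyed_int(pdf: bytes, rng: random.Random) -> bytes:
    """Replace the integer value of one randomly chosen /Key <int> pair."""
    matches = list(KEYED_INT.finditer(pdf))
    if not matches:
        return pdf
    m = rng.choice(matches)
    return pdf[: m.start(3)] + rng.choice(INTERESTING) + pdf[m.end(3):]


obj = b"<< /Length 44 /Filter /FlateDecode >>"
print(mutate_keyed_int(obj, random.Random(0)))
```

Because every mutation lands on a value the parser actually interprets as a number, far fewer executions are wasted on inputs rejected at the tokenizer.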

forest
  • Thanks for the answer! I have learned lots of interesting things from it. By the way, I have tried the dumb approach on Acrobat Reader and found several crashes. To achieve better code coverage, I downloaded 100,000 PDFs from the internet and used minset from the Peach framework to calculate the smallest set of PDFs with the best code coverage. After I submitted the crashes to Adobe, they issued a CVE and have since fixed the bug. – pineappleman Dec 13 '17 at 18:30
  • I suppose I overestimated the code quality of Adobe's product quite a bit! – forest Dec 14 '17 at 03:25

I don't think this approach will help you find any vulnerabilities here, since the Adobe PDF format is pretty complex. It's very unlikely that you will achieve good code coverage in a reasonable amount of time.

Instead, have a look at the reference for the PDF format and the Adobe extensions, and start writing tests for particular parts you find interesting. (This might lead you to the specifications of other file formats that can be embedded.) If you focus on particular implementation details, byte flipping can work very well.
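Focused byte flipping can be done deterministically rather than randomly: walk a single flipped bit across only the region you care about, similar in spirit to the deterministic bit-flip stage AFL runs. This is a minimal sketch; how you locate the region (here, a single hypothetical `/Count` value) depends on which part of the format you are studying.

```python
def walking_bitflips(data: bytes, start: int, end: int):
    """Yield every variant of data with exactly one bit flipped inside data[start:end].

    Restricting the flips to one region keeps the number of variants
    at 8 * (end - start), so the region can be covered exhaustively.
    """
    for pos in range(start, end):
        for bit in range(8):
            out = bytearray(data)
            out[pos] ^= 1 << bit  # flip exactly one bit
            yield bytes(out)


sample = b"%PDF-1.7 << /Count 3 >>"
pos = sample.index(b"3")  # region: just the "3"
variants = list(walking_bitflips(sample, pos, pos + 1))
print(len(variants))  # 8
```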

Noir
  • Actually fuzzing the PDF format is one of the most basic and elementary fuzzing tasks, right next to image format fuzzers. – forest Dec 12 '17 at 03:18