Dumb fuzzing
I'm just starting out to learn about fuzzing and have made a dumb fuzzer that changes several random bytes in a pdf file to random values, opens it and detects if Acrobat Reader has crashed. What types of bugs can I expect to find using such a method?
None. Such "dumb" fuzzing has incredibly limited coverage. I would be extremely surprised if that application actually crashed just because a few bytes in the input randomly changed. This would be the case even if the fuzzer were trying millions of samples per second. However, you also say that your fuzzer opens each file in the application and checks whether it crashes. That would only get what, a few tries a second? A try every few seconds? Not even close to worth it.
Some professional fuzzers work by spawning the process, snapshotting its memory state right before it is about to parse the target data (to skip the start-up phase), and then resuming it. If it doesn't crash within several tens of milliseconds, the fuzzer reverts the process to the snapshot. This allows hundreds of executions per second at the slowest, or tens of thousands at the fastest.
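As a rough analogue of that idea, here is a minimal sketch that uses fork() on a POSIX system instead of a true memory snapshot: the slow setup runs once, and every test case runs in a child process that inherits the already-initialised state. The names expensive_setup, parse and crashes are hypothetical, the planted bug exists only for illustration, and the timeout handling described above is left out for brevity.

```python
import os

def expensive_setup() -> dict:
    """Stand-in for slow start-up work (hypothetical)."""
    return {"tables": list(range(100_000))}

def parse(state: dict, data: bytes) -> None:
    """Hypothetical parser with a planted bug for demonstration."""
    if data.startswith(b"\xff"):
        os.abort()  # die with SIGABRT, much like a native crash would

def crashes(state: dict, data: bytes) -> bool:
    """Run one test case in a forked child; report whether a signal killed it."""
    pid = os.fork()
    if pid == 0:
        # Child: starts from the parent's initialised memory (copy-on-write),
        # so the setup cost is not paid again.
        try:
            parse(state, data)
        finally:
            os._exit(0)  # leave without running the parent's cleanup handlers
    _, status = os.waitpid(pid, 0)
    return os.WIFSIGNALED(status)

if __name__ == "__main__":
    state = expensive_setup()  # paid once, not once per test case
    for case in (b"abc", b"\xffabc"):
        print(case, "crashed" if crashes(state, case) else "ok")
```

A real harness would also kill children that run past a deadline and would fork as late as possible, right before the parsing code.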
I assume that a format string vulnerability could be found in case the dumb fuzzer happens to insert a format string parameter at the right place. I also assume that integer overflows could be found. But what else could be found?
This depends on how the parser works. Without having the Acrobat Reader source code, I couldn't even begin to answer this question. Buffer overflows, format string vulnerabilities, integer overflows, logic errors, etc. are all possible. You also have to be aware that not all vulnerabilities will be easy to discover, since some parts of the input may be protected with a checksum or a complex magic number, or may be compressed.
Imagine you have a field protected with a simple checksum. The chances that random changes will result in a collision are quite low, since that is what the checksum is designed for, and as a result a vulnerable code path may never be taken. However, an actual exploit could simply make sure the checksum is valid. This is a common issue with new users fuzzing the rar format. The entire file is covered with a checksum, so no matter how much naive fuzzing someone does, the parser rejects the input early and never crashes. Once the checksum check is removed, the parser turns out to be quite buggy and easy to crash. To avoid this issue, you have to understand the format and either remove the checksum check from the target, or ensure your input always carries a correct checksum.
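To illustrate the second option, here is a minimal sketch that assumes a made-up container format: a payload followed by a 4-byte little-endian CRC32. This is not the rar layout, and pack, parse and both case generators are hypothetical names. The naive generator mutates the whole blob, while the checksum-aware one mutates only the payload and then recomputes the checksum.

```python
import random
import struct
import zlib

def pack(payload: bytes) -> bytes:
    """Hypothetical container: payload + CRC32 of the payload (little-endian)."""
    return payload + struct.pack("<I", zlib.crc32(payload))

def parse(blob: bytes) -> bytes:
    """Toy parser that refuses any input whose trailing CRC32 does not match."""
    payload, stored = blob[:-4], struct.unpack("<I", blob[-4:])[0]
    if zlib.crc32(payload) != stored:
        raise ValueError("bad checksum")
    return payload  # the interesting (possibly buggy) code would start here

def mutate(data: bytes, rng: random.Random, count: int = 3) -> bytes:
    """Dumb mutation: overwrite `count` random positions with random bytes."""
    out = bytearray(data)
    for _ in range(count):
        out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)

def naive_case(blob: bytes, rng: random.Random) -> bytes:
    """Mutate everything, checksum included; almost always rejected."""
    return mutate(blob, rng)

def checksum_aware_case(blob: bytes, rng: random.Random) -> bytes:
    """Mutate only the payload, then recompute a valid checksum."""
    return pack(mutate(blob[:-4], rng))

if __name__ == "__main__":
    rng = random.Random(1234)
    blob = pack(b"hello, fuzzing world")
    for name, make in (("naive", naive_case), ("aware", checksum_aware_case)):
        accepted = 0
        for _ in range(1000):
            try:
                parse(make(blob, rng))
                accepted += 1
            except ValueError:
                pass
        print(name, "accepted by parser:", accepted, "of 1000")
```

Every checksum-aware case gets past the check, so the fuzzing effort is spent on the code behind it rather than on the check itself.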
Is it possible to find buffer overflows? Since I change only a few random bytes, I assume that the chance of overflowing a buffer is actually small or none. Or am I wrong?
If you change a value that ends up specifying the size of a buffer, then even if you don't change the length of the data itself, it may still result in an overflow. For example, take the hypothetical memory contents "here is some text17". This is composed of a buffer, "here is some text", and a size, 17. A mutation that lowers the size would result in the buffer overflowing once it is processed, because the program would reserve less space than the text needs. Now it's not likely that Acrobat Reader has something this obvious, but a bug could hypothetically result in a similar outcome internally. I could imagine several possibilities.
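The size example can be emulated in a few lines. This is only a sketch: Python has no raw memory, so a bytearray stands in for it, and a marker value placed right after the buffer (the hypothetical CANARY) shows whether the unchecked copy ran past the end. The load_record function and the memory layout are assumptions made for illustration.

```python
CANARY = b"\xde\xad\xbe\xef"  # stands in for whatever lives after the buffer

def load_record(text: bytes, declared_size: int) -> bytearray:
    """Emulate a C-style parser that trusts the declared size when allocating."""
    # Layout: [buffer of declared_size bytes][CANARY][16 bytes of slack].
    # The slack only keeps this emulation from raising IndexError on short texts.
    memory = bytearray(declared_size) + bytearray(CANARY) + bytearray(16)
    for i, byte in enumerate(text):  # unchecked copy, like strcpy
        memory[i] = byte
    return memory

def canary_intact(memory: bytearray, declared_size: int) -> bool:
    """True if the bytes right after the buffer were left untouched."""
    return bytes(memory[declared_size:declared_size + 4]) == CANARY

if __name__ == "__main__":
    text = b"here is some text"  # 17 bytes
    for size in (17, 7):
        memory = load_record(text, size)
        print("declared size", size, "-> canary intact:", canary_intact(memory, size))
```

With the correct size of 17 the marker survives. With the size lowered to 7 it is overwritten by the tail of the text, even though not a single byte of the text changed.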
Intelligent fuzzing
Professional fuzzers do not just throw random input at a program to see if it crashes. They modify the input in specific ways that are likely to result in changes in target behavior. The program is compiled with special code inserted (instrumentation) that causes it to report its internal state to the fuzzer at each branch, allowing the fuzzer to know when a given input resulted in a different code path being taken, even if its external behavior doesn't change. This is the technique used by AFL, one of the most popular general-purpose fuzzers. It is intelligent enough to create a valid JPEG file out of thin air, simply by observing the state of a JPEG decoder as it feeds it input.
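The feedback loop can be shown with a toy. This is not AFL's actual algorithm: the target below is a hypothetical stand-in that "instruments" itself by returning the set of branch IDs it reached, and the mutator simply tries every byte value at every position so that the run is reproducible. Any input that reaches a new branch is kept and mutated further.

```python
MAGIC = b"\xff\xd8\xff"  # the byte sequence the toy parser wants to see

def target(data: bytes) -> frozenset:
    """Toy instrumented parser: returns the IDs of the branches it executed."""
    hit = {0}
    for i, expected in enumerate(MAGIC):
        if len(data) <= i or data[i] != expected:
            return frozenset(hit)
        hit.add(i + 1)  # one new branch per correctly matched byte
    return frozenset(hit)

def fuzz(seed: bytes):
    """Keep every input that reaches a branch not seen before."""
    corpus = [seed]
    seen = set(target(seed))
    execs = 0
    index = 0
    while index < len(corpus):
        base = corpus[index]
        for pos in range(len(base)):
            for value in range(256):
                candidate = base[:pos] + bytes([value]) + base[pos + 1:]
                execs += 1
                coverage = target(candidate)
                if not coverage <= seen:  # new branch reached
                    seen |= coverage
                    corpus.append(candidate)
        index += 1
    return corpus, seen, execs

if __name__ == "__main__":
    corpus, seen, execs = fuzz(b"\x00\x00\x00")
    print("final input:", corpus[-1].hex(), "after", execs, "executions")
```

Starting from three zero bytes, the loop reaches the full magic sequence in 3,072 executions, because each correct byte is rewarded as soon as it is found. Without the coverage signal, guessing all three bytes at once means searching up to 256^3 = 16,777,216 combinations.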
In order to write a fuzzer that finds real-life bugs, you would need to either use an instrumentation-guided fuzzer like AFL, which can learn the format automatically, or write an application-specific format-aware fuzzer (i.e. one that is deeply aware of the specifics of the PDF format, and is able to create mutations on a per-syntax-keyword basis rather than a per-byte basis).
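To give a feel for what "per-keyword rather than per-byte" means, here is a deliberately tiny sketch. It splits a PDF-style dictionary on whitespace and replaces each integer token with a boundary value, leaving the surrounding syntax valid. The tokenizer is an oversimplification (a real PDF needs a proper parser, and this would also touch the numbers in indirect references), and the INTERESTING list and integer_mutations name are assumptions for illustration.

```python
import re

# Boundary values that tend to expose integer and size handling bugs.
INTERESTING = [b"0", b"-1", b"2147483647", b"4294967296"]

def integer_mutations(obj: bytes):
    """Yield copies of `obj` with one integer token replaced at a time."""
    parts = re.split(rb"(\s+)", obj)  # keep the whitespace so output stays intact
    for idx, part in enumerate(parts):
        if re.fullmatch(rb"[+-]?\d+", part):
            for value in INTERESTING:
                if value != part:
                    yield b"".join(parts[:idx] + [value] + parts[idx + 1:])

if __name__ == "__main__":
    image = b"<< /Type /XObject /Width 640 /Height 480 >>"
    for case in integer_mutations(image):
        print(case.decode())
```

Every generated case is still a syntactically well-formed dictionary, so the parser gets far enough to actually use the hostile width or height instead of rejecting the file at the first broken delimiter.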