@binarym's answer is pretty good. It already explains the reasons behind a buffer overflow, how to find a simple overflow, and how to look at the stack using a core file and/or GDB. I just want to add two extra details:
- A more in-depth black-box testing example, i.e. what the bounty asks for: "a description of how to consistently detect buffer overflows (black-box testing)".
- Compiler quirks, i.e. where black-box testing fails (or, more precisely, where a payload generated by black-box testing may fail).
The code we will use is a little more complex:
#include <stdio.h>
#include <string.h>

void do_post(void)
{
    char curr = 0, message[128] = {};
    int i = 0;
    while (EOF != (curr = getchar())) {
        if ('\n' == curr) {
            message[i] = 0;
            break;
        } else {
            message[i] = curr;
        }
        i++;
    }
    printf("I got your message, it is: %s\n", message);
    return;
}

int main(void)
{
    char curr = 0, request[8] = {};
    int i = 0;
    while (EOF != (curr = getchar())) {
        request[i] = curr;
        if (!strcmp(request, "GET\n")) {
            printf("It's a GET!\n");
            return 0;
        } else if (!strcmp(request, "POST\n")) {
            printf("It's a POST, get the message\n");
            do_post();
            return 0;
        } else if (5 < strlen(request)) {
            printf("Some rubbish\n");
            return 1;
        } /* else keep reading */
        i++;
    }
    printf("Assertion error, THIS IS A BUG please report it\n");
    return 0;
}
I'm poking fun at HTTP with POST and GET requests, and I am using getchar() to read STDIN character by character (a poor implementation, but it is educational). The code differentiates between GET, POST and "rubbish" (anything else), and does that using a more-or-less properly written loop (without overflows).
Yet, when parsing the POST message there is an overflow in the message[128] buffer. Unfortunately that buffer is deep inside the program (well, not really that deep, but a simple long argument will not find it). Let's compile it and try long strings:
[~]$ gcc -O2 -o over over.c
[~]$ perl -e 'print "A"x2000' | ./over
Some rubbish
Yeah, that does not work. Since we know the code, we know that adding "POST\n" to the beginning will trigger the overflow. But what if we do not know the code? Or if the code is too complex? Enter black-box testing.
Black Box Testing
The most popular black-box testing technique is fuzzing. Almost all other black-box techniques are a variation of it. Fuzzing is simply feeding the program random input until we find something interesting. I wrote a simple fuzzing script to check this program. Let's look at it:
#!/usr/bin/env python3
from itertools import product
from subprocess import Popen, PIPE, DEVNULL

prog = './over'
valid_returns = [ 0, 1 ]
all_chars = list(map(chr, range(256)))

# This assumes that we may find something with an input as small as 1024 bytes,
# which isn't realistic. In the real world several megabytes of input need to
# be tried.
for input_size in range(1,1024):
    # Building the whole list up front is what makes this script RAM hungry.
    input = [p for p in product(all_chars, repeat=input_size)]
    for single_input in input:
        child = Popen(prog, stdin=PIPE, stdout=DEVNULL)
        byte_input = (''.join(single_input)).encode("utf-8")
        child.communicate(input=byte_input)
        child.stdin.close()
        ret = child.wait()
        # A negative return code means the child was killed by a signal.
        if ret not in valid_returns:
            print("INPUT", repr(byte_input), "RETURN", ret)
            exit(0)
            # The exit(0) is not realistic either, in the real world I'd like
            # to have a full log of the entire search space.
It simply does that: it feeds increasingly large random input to the program. (WARNING: the script requires a good deal of RAM.) I ran this, and after a few hours I got an interesting output:
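The RAM warning comes from building every candidate input in a list before running any of them. A minimal sketch of a variant that draws one random input at a time, so only a single input is held in memory, could look like the following. The names run_once and fuzz, the trial count, the size limit and the seed parameter are my own assumptions for illustration, not part of the script above.

```python
import random
from subprocess import Popen, PIPE, DEVNULL

def run_once(cmd, data):
    """Feed data to cmd on stdin and return its exit status.

    A negative value means the child was killed by that signal number.
    """
    child = Popen(cmd, stdin=PIPE, stdout=DEVNULL, stderr=DEVNULL)
    child.communicate(input=data)
    return child.returncode

def fuzz(cmd, valid_returns=(0, 1), max_size=1024, trials=100000, seed=None):
    """Try random inputs; return (input, status) of the first unexpected exit.

    Returns None if every trial ended with one of the valid statuses.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randint(1, max_size)
        data = bytes(rng.randrange(256) for _ in range(size))
        ret = run_once(cmd, data)
        if ret not in valid_returns:
            return data, ret
    return None
```

Calling fuzz(["./over"]) would run the compiled program from this answer. Passing a seed makes a run repeatable, which helps when a crash needs to be reproduced later.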
INPUT b"POST\nXl_/.\xc3\x93\xc3\x90\xc2\x87\xc3\xa6dh\xc3\xaeH\xc2\xa0\xc2\x836\x16.\xc3\xb7\x1be\x1e,\xc3\x98\xc3\xa4\xc2\x81\xc2\x83 su\xc2\xb1\xc3\xb2\xc3\x8d^\xc2\xbc\xc2\xa11/\xc2\x9f\x12vY\x12[0\x0c]\xc3\xb6\x19zI\xc2\xb8\xc2\xb5\xc3\xbb\xc2\x9e\xc3\xab>^\xc2\x85\xc2\x91\xc2\xb5\xc2\xb5\xc3\xb6u\xc3\x8e).\xc3\xbcn\x1aM\xc3\xbb+{\x1c\xc3\x9a\xc3\x8b&\xc2\x93\xc2\xa1D\xc3\xad\xc3\xad\xc3\x81\xc2\xbd\xc2\x8d\xc2\xa3 \xc3\x87_\xc2\x82\xc3\x9asv\xc3\x92\xc2\x85IP\xc2\xb8\x1bS\xc3\xbe\xc3\x9e\\\xc2\x8e\xc3\x9f\xc2\xb1\xc3\xa4\xc2\xbe\x1fue\xc3\x81\xc3\x8a\xc2\x8b'\xc3\xaf\xc2\xa1\xc3\x95'\xc2\xaa\xc3\xa8P\xc2\xa7\xc2\x8f\xc3\x99\xc2\x94S5\xc2\x83\xc3\x85U" RETURN -11
The process exited with -11. Is it a segfault? Let's see:
kill -l | grep SIGSEGV
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
It is a segmentation fault alright (see this answer for clarification). Now I have an input sample which I can use to reproduce this segfault and discover (with GDB) where the overflow is.
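Python's subprocess module reports a child killed by signal N as return code -N, which is why the script printed -11. The lookup done with kill -l can also be done from Python. This is only a small sketch, and the helper name describe_return is made up for this example:

```python
import signal

def describe_return(ret):
    """Turn a subprocess return code into a human-readable description."""
    if ret >= 0:
        return "exited with status %d" % ret
    try:
        name = signal.Signals(-ret).name
    except ValueError:
        # Signal number not known on this platform
        name = "unknown signal"
    return "killed by signal %d (%s)" % (-ret, name)
```

On Linux, describe_return(-11) names SIGSEGV, matching the kill -l listing above.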
Compiler quirks
Did you see something strange above? There is a piece of information I omitted. I used a spoiler tag below so you can go back and try to figure it out. The answer is here:
Why the hell did I use gcc -O2 -o over over.c? Why is a plain gcc -o over over.c not enough? What is so special about compiler optimisation (-O2) in this context?
To be fair, I myself found it astonishing that I could find this behaviour in such a simple program. Compilers rewrite a good deal of code during compilation for performance reasons. Compilers also try to mitigate several risks (e.g. clearly visible overflows). Often the same code looks very different with and without optimisation enabled.
Let's have a look at this specific quirk, but let's go back to perl since we do know the vulnerability already:
[~]$ gcc -O2 -o over over.c
[~]$ perl -e 'print "POST\n" . "A"x2000' | ./over
It's a POST, get the message
I got your message, it is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAins
Segmentation fault (core dumped)
Yes, that is exactly what we expected. But now, let's disable optimisation:
[~]$ gcc -o over over.c
[~]$ perl -e 'print "POST\n" . "A"x2000' | ./over
It's a POST, get the message
I got your message, it is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAÿ}
$ echo $?
0
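What the hell! The unoptimised build shrugged off the vulnerability I crafted with so much love. If you look at the length of that message you will see that it is 141 bytes long, so the buffer did overflow, but the writes stopped before they reached anything important such as the return address. At first glance it looks as if the compiler added some kind of assembly to stop the writes. A more likely explanation (my reading of the output, not something confirmed with a disassembler) is stack layout: without optimisation, curr and i are kept on the stack right after message, so the overflow overwrites the loop index i itself and the following writes land back inside the buffer. The trailing ÿ} is consistent with that: ÿ would be curr holding EOF (0xff) and } the low byte of i. With -O2 both variables most likely live in registers, so nothing stops the overflow from running on to the return address.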
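The 141 bytes and the trailing ÿ} can be reproduced with a toy model of the unoptimised stack frame. The sketch below assumes that curr sits 139 bytes and i sits 140 bytes after the start of message. Those offsets are guesses chosen to be consistent with the output above, not values read from a disassembly, and the function name simulate_do_post is made up for this example.

```python
EOF_AS_CHAR = 0xFF  # EOF (-1) truncated to a one-byte char

def simulate_do_post(payload, curr_off=139, i_off=140, frame_size=144):
    """Model do_post() with curr and i stored in memory right after message.

    The offsets are assumptions. Returns what printf("%s", message) would
    print, i.e. the bytes up to the first zero byte.
    """
    mem = bytearray(frame_size)  # message[128] occupies mem[0:128]

    def load_i():
        return int.from_bytes(mem[i_off:i_off + 4], "little")

    def store_i(value):
        mem[i_off:i_off + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    for byte in payload:
        mem[curr_off] = byte            # curr = getchar()
        if byte == EOF_AS_CHAR:         # a raw 0xFF byte compares equal to EOF
            break
        if byte == ord("\n"):
            mem[load_i()] = 0           # message[i] = 0
            break
        mem[load_i()] = mem[curr_off]   # message[i] = curr (may hit curr or i)
        store_i(load_i() + 1)           # i++
    else:
        mem[curr_off] = EOF_AS_CHAR     # input exhausted: curr holds EOF

    end = mem.find(0)
    return bytes(mem if end < 0 else mem[:end])
```

With the payload b"A" * 2000 this model returns 139 A characters followed by the bytes 0xFF and }, 141 bytes in total. In the model, the write at index 140 replaces the low byte of i with 0x41, so i drops back to 65 and the loop keeps rewriting the same region instead of moving on towards the return address.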
For the skeptics, here is the compiler version I'm using to get the behavior above:
[~]$ gcc --version
gcc (GCC) 6.2.1 20160830
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The moral of the story is that most buffer overflow exploits only work with the same payload if the program is compiled by the same compiler with the same optimisation level (or even other parameters). Compilers do evil things to your code to make it run faster, and although there is a good chance that a payload will work on the same program built by two different compilers, that is not always true.
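One cheap way to act on this is to feed the same payload to several builds of the program and compare the exit statuses. A minimal sketch follows, where the helper name compare_builds and the binary names in the usage note are hypothetical:

```python
from subprocess import run, DEVNULL

def compare_builds(builds, payload):
    """Return {label: exit status} for the same payload fed to each build.

    builds maps a label to an argv list. A negative status means the
    process was killed by that signal number.
    """
    results = {}
    for label, cmd in builds.items():
        proc = run(cmd, input=payload, stdout=DEVNULL, stderr=DEVNULL)
        results[label] = proc.returncode
    return results
```

For example, after compiling the program twice into the hypothetical binaries ./over_O0 and ./over_O2, calling compare_builds({"O0": ["./over_O0"], "O2": ["./over_O2"]}, b"POST\n" + b"A" * 2000) would show side by side whether each build survives the payload.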
Postscript
I wrote this answer for fun and to keep a record for myself. I do not deserve the bounty because I do not fully answer your question; I only answer the extra question added in the bounty definition. binarym's answer deserves the bounty because it answers more parts of the original question.