I recently started learning about buffer overflows and how they work. Someone shared a binary to practice on (in a VM, don't worry). I've been feeding strings to the socket that the binary opens, and I noticed that at a certain length, the string causes the program to not respond with the message it is supposed to. Also, if I feed it strings of certain other lengths, parts of the message are sent back through the socket from the server, but the other parts just print to the console on the server end. I'm not entirely sure what causes this (this isn't the official question for this post, but I'd love to hear an answer in the comments).

And that brings me to my question: Are there any applications that can generate an image of the stack or dump it and generally what's written to it? I think it'd be really helpful for seeing what is happening when I feed the socket strings of different lengths. I'd love it if the size of each 'section' of the stack (don't know what it's called) were represented in the image with a size relative to the other sections (so, I can visualize the size of the stack), or in a readable way.

Something like this would be great if your answer is about generating the image, except it would be nice if it showed how much is written to it (that way I can see when it is overflowing)...

stack image

I'd probably generate an image when I start the program, and after I feed the socket the huge values. Then I'd compare. If there are any other ways / better ways of learning, I'd love to hear them.

Edit #1: I'm black-box testing.

Edit #2: While there already is an accepted answer to this question, I would appreciate other answers too. The more information, and responses, the more I will be able to learn. Thus, I will reward new answers (if deserved) with bounties. Appreciate it!

Aaron Esau
  • 3
    Any debugger can print memory. You simply need to print the memory from the `$esp` register forward. If you are on any *nix, GDB should be there (or in the package manager). PS: Most often you do not want the full memory dump of a program, that is often too big to analyze manually. – grochmal Oct 04 '16 at 01:09
  • 1
    Another comment from me. In the bounty definition you add an extra question, namely *"how to consistently detect buffer overflows (black-box testing)"*. I've added an answer to that question. Yet, it may be useful to push that question inside the question body. (I do not wanna do it myself 'cause it will look like I altered the question to suit my answer :) ) – grochmal Oct 10 '16 at 20:51
  • @grochmal I read the answer too late. The bounty is already over, so I don't know how/where to find the text I wrote in the bounty description. Do you know how? – Aaron Esau Oct 16 '16 at 20:56
  • 1
    If I'm not mistaken it was something along the lines of: *"I really want to know how to read the stack of a process and see a description of how to consistently detect buffer overflows (black-box testing)"*. Regarding your EDIT#2, I believe it is much better to try several things (e.g. get a book and/or do some experimentation) and then post another question. Also, there is a [reverseengineering.SE](http://reverseengineering.stackexchange.com/) and they're desperate for good questions there :) (shameless marketing, I know). – grochmal Oct 16 '16 at 21:51
  • Heh, thanks. I know this isn't really the right place to ask, but... The program prints two strings (separated by a newline) normally, when I have a string with a length under a certain number. When I feed a string of a certain length to the socket, the first part of the string it normally responds with (before the newline) is printed to the console where the binary is being executed, but the part after the newline is printed to the socket. I really don't know why it's doing that. Do you? Sorry if that's hard to understand, it's complicated to explain. :) – Aaron Esau Oct 17 '16 at 03:31
  • 1
    Sorry for the late answer. That kind of challenge is normally made a lot simpler by disassembling the binary with GDB. There is one [question on re.SE that I always come back to](http://reverseengineering.stackexchange.com/questions/1935/how-to-handle-stripped-binaries-with-gdb-no-source-no-symbols-and-gdb-only-sho) when I need to look at stripped binaries. 0xC0000022L's answer is just awesome. Another help is running the program through strace (`strace ./binary`). It will print all syscalls (including the reads and writes), so you do not need to guess. That is how I tackle binaries at least – grochmal Oct 18 '16 at 21:01

2 Answers

Getting a dump of memory the simple way

You can simply send your vulnerable process a SIGSEGV (kill -SEGV pid) and, if core dumps are allowed (ulimit -c unlimited, set in the shell before starting the process), you will get a core dump file with all of the process's memory in it.

Example:

On terminal #1:

/tmp$ ./test 
idling...
idling...
Segmentation fault <---- HERE I SEND THE 1st SIGSEGV
/tmp$ ulimit -c unlimited
/tmp$ ./test 
idling...
idling...
Segmentation fault (core dumped) <---- HERE IS THE 2nd SIGSEGV
/tmp$ ls test*
test    test.c  
/tmp$ ls -lah core 
-rw------- 1 1000 1000 252K Oct 10 17:42 core

On terminal #2

/tmp$ ps aux|grep test
1000  6529  0.0  0.0   4080   644 pts/1    S+   17:42   0:00 ./test
1000  6538  0.0  0.0  12732  2108 pts/2    S+   17:42   0:00 grep test
/tmp$ kill -SEGV 6529
/tmp$ ps aux|grep test
1000  6539  0.0  0.0   4080   648 pts/1    S+   17:42   0:00 ./test
1000  6542  0.0  0.0  12732  2224 pts/2    S+   17:42   0:00 grep test
/tmp$ kill -SEGV 6539

Please note that this gives you a dump of the process state at the moment the binary received the SIGSEGV. So, if your binary consists of main() and evil_function() and it was running evil_function() when the SIGSEGV arrived, you will get the stack of evil_function(). But you can also walk up the frames to get back to main()'s stack.
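The two terminals above can also be driven from a script. Here is a minimal sketch in Python, assuming a Linux system; the helper names allow_core_dumps and segv_and_wait are made up for this example, and the sleeping Python child is only a stand-in for the real target.

```python
import resource
import signal
import subprocess
import sys


def allow_core_dumps():
    """Raise the soft core-size limit as far as the hard limit allows."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def segv_and_wait(argv):
    """Start argv, send it SIGSEGV, and return its exit status."""
    # preexec_fn runs in the child before exec, so the limit only
    # affects the target process, not this script.
    child = subprocess.Popen(argv, preexec_fn=allow_core_dumps)
    child.send_signal(signal.SIGSEGV)   # same effect as `kill -SEGV <pid>`
    return child.wait()                 # negative value: killed by that signal


if __name__ == "__main__":
    # Hypothetical stand-in target: a child that just sleeps.
    # Replace it with the binary under test.
    target = [sys.executable, "-c", "import time; time.sleep(30)"]
    print("exit status:", segv_and_wait(target))
```

Whether a core file actually appears, and where, still depends on the system's hard limit and core pattern settings, so treat the script as a convenience and not as a guarantee.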

A good reference on all of this is Aleph One's paper "Smashing the Stack for Fun and Profit": http://insecure.org/stf/smashstack.html

Guessing the "mapping" by yourself

Let's imagine that your binary contains a basic buffer overflow, like the one in this code snippet:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


int evil_function(char *evil_input)
{
    char stack_buffer[10];
    strcpy(stack_buffer, evil_input);
    printf("input is: %s\n", stack_buffer);
    return 0;
}


int main (int ac, char **av)
{
    if (ac != 2) 
    {
        printf("Wrong parameter count.\nUsage: %s: <string>\n",av[0]);
        return EXIT_FAILURE;
    }
    evil_function(av[1]);

    return (EXIT_SUCCESS);
}

It's quite simple to work out where in the input the return address must go, just by using gdb. Let's try it with the example program above:

/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x10")
input is: AAAAAAAAAA
/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x11")
input is: AAAAAAAAAAA
/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x12")
input is: AAAAAAAAAAAA
/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x13")
input is: AAAAAAAAAAAAA
/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x14")
input is: AAAAAAAAAAAAAA
/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x15")
input is: AAAAAAAAAAAAAAA
/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x16")
input is: AAAAAAAAAAAAAAAA
Segmentation fault (core dumped)

OK, so the stack starts getting messed up after 6 extra chars... Let's have a look at the stack:

/tmp/bo-test$ gdb test-buffer-overflow core
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
[...]
Core was generated by `./test-buffer-overflow AAAAAAAAAAAAAAAA'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f2cb2c46508 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f2cb2c46508 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000000000000000 in ?? ()
(gdb) Quit

Let's continue by feeding it more extra chars:

/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x26")
input is: AAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)
/tmp/bo-test$ gdb test-buffer-overflow core
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
[...]
Core was generated by `./test-buffer-overflow AAAAAAAAAAAAAAAAAAAAAAAAAA'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000004141 in ?? ()
(gdb) 

Hey... look at this address: 0x0000000000004141! 0x41 is the hex ASCII code for... 'A' :p We just overwrote the low bytes of the return address :) Now, one last attempt, just to see:

/tmp/bo-test$ ./test-buffer-overflow AAAAAAAAAAAAAAAAAAAAAAAAABCDEFGHI
input is: AAAAAAAAAAAAAAAAAAAAAAAAABCDEFGHI
Segmentation fault (core dumped)
/tmp/bo-test$ gdb test-buffer-overflow core
GNU gdb
Core was generated by `./test-buffer-overflow AAAAAAAAAAAAAAAAAAAAAAAAABCDEFGHI'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000400581 in evil_function (
    evil_input=0x7fff7e2712a6 'A' <repeats 25 times>, "BCDEFGHI")
    at test-buffer-overflow.c:12
12  }
(gdb) bt
#0  0x0000000000400581 in evil_function (
    evil_input=0x7fff7e2712a6 'A' <repeats 25 times>, "BCDEFGHI")
    at test-buffer-overflow.c:12
#1  0x4847464544434241 in ?? ()
#2  0x00007fff7e260049 in ?? ()
#3  0x0000000200000000 in ?? ()
#4  0x0000000000000000 in ?? ()

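This time, look at the address of frame #1: 0x4847464544434241. Read as little-endian bytes, that is "ABCDEFGH", i.e. the last 'A' followed by the first seven letters of the suffix. Now you know exactly where to write...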
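To turn that address into an offset without counting by hand, the bytes can be decoded and searched for in the input. This is a small sketch using only the values shown in the gdb session above; the variable names are just for illustration.

```python
import struct

payload = b"A" * 25 + b"BCDEFGHI"      # the input from the last run
frame_1 = 0x4847464544434241           # frame #1 as printed by `bt`

# x86-64 is little-endian: the lowest byte of the value comes first in memory.
in_memory = struct.pack("<Q", frame_1)

# Position of those 8 bytes inside the input = offset of the return address.
offset = payload.find(in_memory)

print(in_memory)   # b'ABCDEFGH'
print(offset)      # 24
```

An offset of 24 also fits the earlier run with 26 'A's, where exactly two 'A's ended up in the address (0x4141). Using a non-repeating suffix like "BCDEFGHI" is what makes the search unambiguous.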

Aaron Esau
binarym
  • It doesn't print "Segmentation Fault", but I'm fairly certain it should. Is there any other way to tell if it is being overflowed? – Aaron Esau Oct 18 '16 at 05:30
  • 1
    You should get the segmentation fault in the terminal where the vulnerable binary runs, not in the terminal you launch the attack from. A segmentation fault will always cause the vulnerable program to crash, so if it's still running, it looks like you didn't overflow it enough or in the right way :) – binarym Oct 18 '16 at 15:40
  • (late response) Thanks. I fed it a string of 111K "A"s with no crash or segmentation fault. I must be trying to overflow it in the wrong way. The return address isn't being overwritten when I check in gdb anyways. I made a simple fuzzer to send huge strings and stop when it crashes. It doesn't seem to crash though (and if it did, it would probably be from application DOS, lol). Do you think I'm trying to exploit it wrong? – Aaron Esau Nov 03 '16 at 05:50
  • 1
    If your only way to interact with the vulnerable program is sending strings, I think your exploit attempts are fine. Maybe you should check your OS configuration to make sure it doesn't implement some security features that make it buffer-overflow proof. Also, check your system logs (especially /var/log/messages, syslog and debug). If your OS stopped the overflow, you'll probably have a line about it in the logs. – binarym Nov 03 '16 at 09:59
  • Oh hey @binarym, just thought you might want to know-- Thanks to you, I got it to work, and since, I've learned a lot about binary exploitation. If I were old enough to, I'd buy you a drink at Defcon. ;) – Aaron Esau Mar 19 '18 at 04:38
  • @Arin my pleasure ;-) – binarym Apr 10 '18 at 12:43

@binarym's answer is pretty good. He already explains the reasons behind a buffer overflow, how you can find a simple overflow and how we can look at the stack using a corefile and/or GDB. I just want to add two extra details:

  1. A more in-depth black-box test example, i.e, this:

a description of how to consistently detect buffer overflows (black-box testing)

  2. Compiler quirks, i.e. where black-box testing fails (or rather, where a payload generated by black-box testing may fail).

The code we will use is a little more complex:

#include <stdio.h>
#include <string.h>

void do_post(void)
{
    char curr = 0, message[128] = {};
    int i = 0;
    while (EOF != (curr = getchar())) {
        if ('\n' == curr) {
            message[i] = 0;
            break;
        } else {
            message[i] = curr;
        }
        i++;
    }
    printf("I got your message, it is: %s\n", message);
    return;
}

int main(void)
{
    char curr = 0, request[8] = {};
    int i = 0;
    while (EOF != (curr = getchar())) {
        request[i] = curr;
        if (!strcmp(request, "GET\n")) {
            printf("It's a GET!\n");
            return 0;
        } else if (!strcmp(request, "POST\n")) {
            printf("It's a POST, get the message\n");
            do_post();
            return 0;
        } else if (5 < strlen(request)) {
            printf("Some rubbish\n");
            return 1;
        }  /* else keep reading */
        i++;
    }
    printf("Assertion error, THIS IS A BUG please report it\n");
    return 0;
}

I'm poking fun at HTTP with POST and GET requests, and I am using getchar() to read STDIN character by character (that's a poor implementation but it is educational). The code differentiates between GET, POST and "rubbish" (anything else), and does so using a more-or-less properly written loop (without overflows).

Yet, when parsing the POST message there is an overflow, in the message[128] buffer. Unfortunately that buffer is deep inside the program (well, not really that deep but a simple long argument will not find it). Let's compile it and try long strings:

[~]$ gcc -O2 -o over over.c
[~]$ perl -e 'print "A"x2000' | ./over 
Some rubbish

Yeah, that does not work. Since we know the code, we know that if we add "POST\n" to the beginning we will trigger the overflow. But what if we do not know the code? Or if the code is too complex? Enter black-box testing.

Black Box Testing

The most popular black box testing technique is fuzzing. Almost all other (black box) techniques are a variation of it. Fuzzing is simply feeding the program random input until we find something interesting. I wrote a simple fuzzing script to check this program, let's look at it:

#!/usr/bin/env python3

from itertools import product
from subprocess import Popen, PIPE, DEVNULL

prog = './over'
valid_returns = [ 0, 1 ]

all_chars = list(map(chr, range(256)))
# This assumes that we may find something with an input as small as 1024 bytes,
# which isn't realistic.  In the real world several megabytes of input need to
# be tried.
for input_size in range(1, 1024):
    # Iterate over product() lazily; building the full list first would need
    # 256**input_size tuples in memory.
    for single_input in product(all_chars, repeat=input_size):
        child = Popen(prog, stdin=PIPE, stdout=DEVNULL)
        # Note: UTF-8 turns the chars 128-255 into two bytes each.
        byte_input = ''.join(single_input).encode("utf-8")
        # communicate() writes the input, closes stdin and waits for the exit.
        child.communicate(input=byte_input)
        ret = child.returncode
        if ret not in valid_returns:
            print("INPUT", repr(byte_input), "RETURN", ret)
            exit(0)

# The exit(0) is not realistic either, in the real world I'd like to have a
# full log of the entire search space.

It simply does that: feeds increasingly long inputs to the program, one child process per input. I ran this and after a few hours I got an interesting output:

INPUT b"POST\nXl_/.\xc3\x93\xc3\x90\xc2\x87\xc3\xa6dh\xc3\xaeH\xc2\xa0\xc2\x836\x16.\xc3\xb7\x1be\x1e,\xc3\x98\xc3\xa4\xc2\x81\xc2\x83 su\xc2\xb1\xc3\xb2\xc3\x8d^\xc2\xbc\xc2\xa11/\xc2\x9f\x12vY\x12[0\x0c]\xc3\xb6\x19zI\xc2\xb8\xc2\xb5\xc3\xbb\xc2\x9e\xc3\xab>^\xc2\x85\xc2\x91\xc2\xb5\xc2\xb5\xc3\xb6u\xc3\x8e).\xc3\xbcn\x1aM\xc3\xbb+{\x1c\xc3\x9a\xc3\x8b&\xc2\x93\xc2\xa1D\xc3\xad\xc3\xad\xc3\x81\xc2\xbd\xc2\x8d\xc2\xa3 \xc3\x87_\xc2\x82\xc3\x9asv\xc3\x92\xc2\x85IP\xc2\xb8\x1bS\xc3\xbe\xc3\x9e\\\xc2\x8e\xc3\x9f\xc2\xb1\xc3\xa4\xc2\xbe\x1fue\xc3\x81\xc3\x8a\xc2\x8b'\xc3\xaf\xc2\xa1\xc3\x95'\xc2\xaa\xc3\xa8P\xc2\xa7\xc2\x8f\xc3\x99\xc2\x94S5\xc2\x83\xc3\x85U" RETURN -11

The process exited with -11. Is that a segfault? Let's see:

kill -l | grep SIGSEGV
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM

It is a segmentation fault all right: Popen reports a child killed by signal N as return code -N (see this answer for clarification). Now I have an input sample that I can use to reproduce this segfault and discover (with GDB) where the overflow is.
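For comparison, a randomised variant of the same idea can be sketched as below. Everything specific in it is an assumption made for the example: STAND_IN is a hypothetical Python child that mimics ./over by killing itself with SIGSEGV on a long POST, KEYWORDS is a guessed list of protocol keywords, and describe() and fuzz() are made-up helper names.

```python
import random
import signal
import subprocess
import sys

# Hypothetical stand-in for ./over: dies with SIGSEGV when the input starts
# with "POST\n" and carries more than 128 further bytes.
STAND_IN = [sys.executable, "-c", (
    "import os, signal, sys\n"
    "data = sys.stdin.buffer.read()\n"
    "if data.startswith(b'POST\\n') and len(data) > 5 + 128:\n"
    "    os.kill(os.getpid(), signal.SIGSEGV)\n"
)]

KEYWORDS = [b"", b"GET\n", b"POST\n"]   # assumed protocol keywords


def describe(returncode):
    """Turn a Popen return code into a readable string."""
    if returncode < 0:   # negative: killed by signal number -returncode
        return "killed by " + signal.Signals(-returncode).name
    return "exited with %d" % returncode


def fuzz(argv, tries, rng, valid_returns=(0, 1)):
    """Feed random inputs to argv; return the first (input, status) that crashes."""
    for _ in range(tries):
        body = bytes(rng.randrange(256) for _ in range(rng.randrange(300)))
        data = rng.choice(KEYWORDS) + body
        result = subprocess.run(argv, input=data, stdout=subprocess.DEVNULL)
        if result.returncode not in valid_returns:
            return data, describe(result.returncode)
    return None


if __name__ == "__main__":
    print(fuzz(STAND_IN, tries=500, rng=random.Random(1)))
```

Seeding the generator makes a crashing input reproducible, and prefixing random bodies with known keywords is a cheap way to get past the request parsing, which purely random bytes would rarely do.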

Compiler quirks

Did you see something strange above? There is a piece of information I omitted, I used a spoiler tag below so you can go back and try to figure out. The answer is here:

Why the hell did I use gcc -O2 -o over over.c? Why is a plain gcc -o over over.c not enough? What is so special about compiler optimisation (-O2) in this context?

To be fair, I myself found it astonishing that I could find this behaviour in such a simple program. Compilers rewrite a good deal of code during compilation, for performance reasons. Compilers also do try to mitigate several risks (e.g. clearly visible overflows). Often the same code may look very different with and without optimisation enabled.

Let's have a look at this specific quirk, but let's go back to perl since we do know the vulnerability already:

[~]$ gcc -O2 -o over over.c
[~]$ perl -e 'print "POST\n" . "A"x2000' | ./over 
It's a POST, get the message
I got your message, it is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAins
Segmentation fault (core dumped)

Yes, that is exactly what we expected. But now, let's disable optimisation:

[~]$ gcc -o over over.c
[~]$ perl -e 'print "POST\n" . "A"x2000' | ./over 
It's a POST, get the message
I got your message, it is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAÿ}
$ echo $?
0

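What the hell! The vulnerability I crafted with so much love no longer crashes the program. If you look at the length of that message you will see that it is 141 bytes long. The buffer did overflow, but the writes never reached the return address. Judging from the output alone, the most likely reason is not that the compiler patched anything: without optimisation the locals curr and i are kept on the stack right after message, so the overflow overwrites the loop index i itself and the following writes land back inside the buffer. The trailing ÿ is 0xFF, i.e. the EOF that was stored in the char curr.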
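The trailing ÿ} in that output can be reproduced with a small simulation of the loop in do_post(). This is only a sketch: the frame layout is an assumption inferred from the output and not taken from a disassembly, namely that curr sits 139 bytes after the start of message and the 4-byte little-endian i right behind it.

```python
CURR_AT = 139          # assumed offset of `curr` relative to message[0]
I_AT = 140             # assumed offset of the 4-byte little-endian `i`


def simulate_do_post(data):
    """Replay do_post()'s loop on a byte array standing in for the stack frame."""
    frame = bytearray(I_AT + 4)
    for byte in list(data) + [0xFF]:          # 0xFF: EOF truncated to a char
        frame[CURR_AT] = byte                 # curr = getchar()
        if byte == 0xFF:                      # EOF == curr, the loop ends
            break
        i = int.from_bytes(frame[I_AT:I_AT + 4], "little")
        if byte == ord("\n"):
            frame[i] = 0                      # message[i] = 0
            break
        frame[i] = byte                       # message[i] = curr (may hit i itself)
        i = int.from_bytes(frame[I_AT:I_AT + 4], "little") + 1   # i++
        frame[I_AT:I_AT + 4] = i.to_bytes(4, "little")
    return bytes(frame).split(b"\0", 1)[0]    # what printf("%s") would print


printed = simulate_do_post(b"A" * 2000)
print(len(printed), printed[-2:])   # 141 b'\xff}'
```

Under that assumed layout, the 141st 'A' lands on the low byte of i and sets it to 0x41, so the index jumps back to 66 and the remaining input keeps cycling through message[66..140]. After 2000 bytes i ends at 125, which is the ASCII code of '}', matching the 141-byte message seen above.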

For the skeptics, here is the compiler version I'm using to get the behavior above:

[~]$ gcc --version
gcc (GCC) 6.2.1 20160830
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The moral of the story is that most buffer overflow vulnerabilities only work with the same payload if the program is compiled by the same compiler and with the same optimisation level (or even other parameters). Compilers do evil things to your code to make it run faster, and although there is a good chance that a payload will work on the same program compiled by two compilers, that is not always true.

Postscript

I wrote this answer for fun and to keep a record for myself. I do not deserve the bounty because I do not fully answer your question; I only answer the extra question added in the bounty definition. binarym's answer deserves the bounty because he answers more parts of the original question.

grochmal
  • I really appreciate this answer. Even though this technically isn't answering the main question, it does contribute a lot to the topic, and I'm very grateful for your answer. I don't have enough reputation right now to be able to start another bounty and award you with the amount of reputation that I think is deserved. Once I get enough, however, I'll be sure to award you. Thanks again! – Aaron Esau Oct 16 '16 at 20:47
  • 2
    @Arin - There's really no need, I have so many rep points that I stopped caring. I come to SE to hone some skills, read something new/interesting and learn something useful from time to time. There really is no point in getting 100k rep just for the sake of it. We should all be here for the fun of it. – grochmal Oct 16 '16 at 21:55