
I read the book "The Shellcoder's Handbook", and in it there is some C code that executes shellcode (it only calls the exit syscall).

char shellcode[] = "\xbb\x00\x00\x00\x00\xb8\x01\x00\x00\x00\xcd\x80";

int main(){
    int *ret;
    ret = (int *)&ret + 2;
    (*ret) = (int)shellcode;
}

I'm interested in the three lines in the main function. What exactly are they doing, and how do they execute the shellcode?

I might have figured it out: before main was called, the return address into the previous stack frame was pushed onto the stack, followed by the saved ebp, so here we are overwriting that return address with the address of our shellcode. Is that right?

Anders
  • 3
    Look at the disassembly of `main` and step through the code with a debugger – julian Dec 25 '17 at 05:28
  • 2
    Yes you're right. It is trying to patch the saved eip of main (which is somewhere in __libc_start_main normally) to the address of the shellcode. However you'd need -zexecstack as a compilation flag in gcc in order to execute the shellcode. – sudhackar Dec 25 '17 at 09:19
  • Modern systems do not compile that way by default, hence why it no longer works (though I suppose it would work on systems where read implies exec like MIPS32). – forest Dec 25 '17 at 10:33

1 Answer


TL;DR This is a way to execute shellcode which no longer works.

What is a function?

Shellcode is just machine code in places where it is not normally found, such as a variable of type char. In C, there's no distinction between functions and variables. A function is just a variable that points to executable code. This means that, if you create a variable that points to executable code and call it as if it were a function, it will run. To illustrate how it is just a variable, see this simple program:

#include <stdio.h>
#include <stdint.h>

void print_hello(void)
{
    printf("Hello, world!\n");
}

int main(void)
{
    uintptr_t new_print_hello;

    printf("print_hello = %p\n", (void *)print_hello);
    new_print_hello = (uintptr_t)print_hello;
    (*(void(*)())new_print_hello)();
    print_hello();

    return 0;
}

When compiled and executed, this program gives output like so:

$ ./a.out
print_hello = 0x28bc4bf6da
Hello, world!
Hello, world!

This makes it easy to see that a function is nothing more than an address in memory, one that fits in a variable of type uintptr_t. You can see how a function can be referenced simply as a variable, in this case by printing its value, or by copying it to another variable of a compatible type and calling the variable like a function, albeit with a bit of casting magic in order to make the C compiler happy. Once you see how a function is nothing more than a variable pointing to some executable memory, it's not a stretch to see how a variable pointing to some machine code you manually define can also be executed.

How do functions work?

Now that you know a function is just an address in memory, you need to know how a function is actually executed. When you call a function, typically with the call instruction, the instruction pointer (which points to the currently executing instruction) changes to point to the first instruction of the function. Before jumping, call pushes the address of the instruction that follows it (the return address) onto the stack. When the function is finished, it ends with the ret instruction, which pops that address off the stack and loads it back into the IP. So a (somewhat simplified) view is that call pushes the IP to the stack, and ret pops it back.

Depending on the architecture and OS you are on, the arguments to the function may be passed in registers or on the stack, and the return value may be in different registers, or on the stack. This is called the function call ABI, and it is specific to each type of system. Shellcode designed for one type of system may not work on another, even if only the operating system differs and the architecture is the same, or vice versa.

What does your shellcode do?

Let's look at the disassembly of the shellcode you provided:

0000000000201010 <shellcode>:
   201010:      bb 00 00 00 00          mov    ebx,0x0
   201015:      b8 01 00 00 00          mov    eax,0x1
   20101a:      cd 80                   int    0x80

This does three things. First, it sets the ebx register to 0. Second, it sets the eax register to 1. Finally, it triggers interrupt 0x80 which, on 32-bit x86 Linux, is the syscall interrupt. In the Linux i386 system call convention, the syscall number is placed in eax, and up to 6 arguments are passed in ebx, ecx, edx, esi, edi, and ebp. In this case, only ebx is set, meaning the syscall takes only one argument. Once interrupt 0x80 is triggered, the kernel takes over and looks at these values, executing the corresponding system call. The system call numbers are defined in /usr/include/asm/unistd_32.h. Looking at that, we see that syscall 1 is exit(). From that, we can see the three things this shellcode does:

  1. It sets the first argument of the syscall to 0 (which means exit success).
  2. It sets the syscall number to 1, which is the exit call.
  3. It invokes the syscall, causing the program to exit with status 0.

Looking at the big picture, the shellcode is essentially equivalent to exit(0). It does not need ret because it never returns; it causes the program to terminate instead. If you wanted the shellcode to return to its caller, you would need to add ret to the end. Without a ret, execution runs past the end of the shellcode and the program will most likely crash, unless it terminates first, as in your example with the exit() syscall.

What's wrong with your shellcode?

The method of calling shellcode you are showing does not work anymore. It used to, but nowadays Linux does not allow arbitrary data to be executed, so a different approach with some arcane casting is needed. The older technique is explained well in the famous Smashing The Stack For Fun And Profit article:

   Lets try to modify our first example so that it overwrites the return
address, and demonstrate how we can make it execute arbitrary code.  Just
before buffer1[] on the stack is SFP, and before it, the return address.
That is 4 bytes pass the end of buffer1[].  But remember that buffer1[] is
really 2 word so its 8 bytes long.  So the return address is 12 bytes from
the start of buffer1[].  We'll modify the return value in such a way that the
assignment statement 'x = 1;' after the function call will be jumped.  To do
so we add 8 bytes to the return address.  Our code is now:

example3.c:
------------------------------------------------------------------------------
void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
   int *ret;

   ret = buffer1 + 12;
   (*ret) += 8;
}

void main() {
  int x;

  x = 0;
  function(1,2,3);
  x = 1;
  printf("%d\n",x);
}
------------------------------------------------------------------------------

   What we have done is add 12 to buffer1[]'s address.  This new address is
where the return address is stored.  We want to skip pass the assignment to
the printf call.  How did we know to add 8 to the return address?  We used a
test value first (for example 1), compiled the program, and then started gdb

The correct version of your shellcode for newer systems would be:

const char shellcode[] = "\xbb\x00\x00\x00\x00\xb8\x01\x00\x00\x00\xcd\x80";

int main(){
    int (*ret)() = (int(*)())shellcode;
    ret();
}
forest
  • 4
    Won't `shellcode` end up in a non-executable page anyway, preventing this from working? – pipe Dec 25 '17 at 13:19
  • @pipe yes, -zexecstack would be needed. – sudhackar Dec 25 '17 at 13:22
  • 3
    @sudhackar Not if it's in the `.text` section. Making the shellcode `const` will put it there, so you won't need `-zexecstack`. – forest Dec 25 '17 at 13:30
  • @forest yeah, that makes sense. You just edited and made it const. – sudhackar Dec 25 '17 at 13:34
  • Yeah, just fixed an oversight. – forest Dec 25 '17 at 13:53
  • There is lots of good stuff in this answer. But I think you left out the most important part, namely why `(*ret) = (int)shellcode;` causes the code to be run. – kasperd Dec 25 '17 at 21:47
  • Fun fact: taking a function pointer to a GNU C nested function [causes gcc to emit `mov`-immediate instructions to build a trampoline on the stack](https://stackoverflow.com/questions/8179521/implementation-of-nested-functions), and **make the stack executable** with `.section .note.GNU-stack,"x",@progbits`. https://godbolt.org/g/PCyKEj. So you can get an executable stack "unexpectedly" if any part of your source uses that GNU C feature. (C only, not C++). – Peter Cordes Dec 26 '17 at 05:01
  • *In C, there's no distinction between functions and variables*: No, in *asm* bytes in memory have no type. (But some DSPs and microcontrollers have separate address spaces for code and data, not just read-only code, i.e. strict Harvard architectures, not Von Neumann). In C, it's undefined behaviour to cast `char[]` to a function pointer. But in most C implementations that compile to asm the normal way, compilers do what you expect and generate asm with a `call` to the address of your `char array[]`. Anyway yes, the underlying point is correct and important, but you could be more precise. – Peter Cordes Dec 26 '17 at 05:08
  • 1
    Related fun fact: [`const main=6;` is the shortest C program that generates a SIGILL when compiled for x86-64](https://codegolf.stackexchange.com/a/100557/30206). Note that `int` is the default type, so the un-golfed version is `const int main=0x06;` (`06` is the opcode for `push es`, which is invalid in long mode.) It's another fun way to get the compiler to stick a label on some bytes. – Peter Cordes Dec 26 '17 at 05:15
  • @PeterCordes I know it's UB and that technically there is a distinction in the C standard, but the point is that they are interchangeable for this purpose. – forest Dec 26 '17 at 11:34
  • Maybe there isn't a short way to phrase it, and I agree this answer doesn't need to get pedantic about it. But modern C isn't a portable assembly language, now that compilers optimize based on some kinds of UB, so you can't always reason about what the compiler will do with C UB based on knowing the target asm. It works for this use-case on normal implementations, though. (Hmm, I'm not even sure it's UB, maybe just implementation-defined, at least for the other way: `char*` can alias anything and read the object representation. IDK if functions have an object representation in ISO C.) – Peter Cordes Dec 26 '17 at 15:15