TL;DR: This is a way of executing shellcode that no longer works.
What is a function?
Shellcode is just machine code in a place where it is not normally found, such as a variable of type char. At the machine level there is little distinction between functions and variables: a function's name is just an address that points to executable code. This means that if you create a variable that points to executable code and call it as if it were a function, it will run. To illustrate, see this simple program:
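#include <stdio.h>
#include <stdint.h>

void print_hello(void)
{
    printf("Hello, world!\n");
}

int main(void)
{
    uintptr_t new_print_hello;

    /* %p expects a void *, so cast the function's address explicitly */
    printf("print_hello = %p\n", (void *)print_hello);

    /* store the address in an integer, then cast it back and call it */
    new_print_hello = (uintptr_t)print_hello;
    (*(void (*)(void))new_print_hello)();

    print_hello();
    return 0;
}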
When compiled and executed, this program gives output like so:
$ ./a.out
print_hello = 0x28bc4bf6da
Hello, world!
Hello, world!
This makes it easy to see that a function is nothing more than an address in memory, one that on this system fits in a uintptr_t. You can see how a function can be referenced simply as a variable, in this case by printing its value, or by copying it to another variable of a compatible type and calling that variable like a function, albeit with a bit of casting magic to make the C compiler happy. Once you see that a function is nothing more than a variable pointing to some executable memory, it's not a stretch to see how a variable pointing to some machine code you define by hand can also be executed.
How do functions work?
Now that you know a function is just an address in memory, you need to know how a function is actually executed. When you call a function, typically with the call instruction, the instruction pointer (which points to the currently executing instruction) changes to point to the first instruction of the function. Before jumping, call saves the address of the instruction that follows it (the return address) to the stack. Once the function is finished, it ends with the ret instruction, which pops that address from the stack back into the IP. So a (somewhat simplified) view is that call pushes the IP to the stack, and ret pops it back.
Depending on the architecture and OS you are on, the arguments to the function may be passed in registers or on the stack, and the return value may be in different registers or on the stack. This is called the calling convention, part of the ABI, and it is specific to each type of system. Shellcode designed for one type of system may not work on another, even if the architecture is the same and only the operating system differs, or vice versa.
What does your shellcode do?
Let's look at the disassembly of the shellcode you provided:
0000000000201010 <shellcode>:
201010: bb 00 00 00 00 mov ebx,0x0
201015: b8 01 00 00 00 mov eax,0x1
20101a: cd 80 int 0x80
This does three things. First, it sets the ebx register to 0. Second, it sets the eax register to 1. Finally, it triggers interrupt 0x80 which, on 32-bit x86 Linux, is the syscall interrupt. In this system call convention, the syscall number is placed in eax, and up to 6 arguments are passed in ebx, ecx, edx, esi, edi, and ebp. In this case only ebx is set, because the syscall takes only one argument. Once the 0x80 interrupt is triggered, the kernel takes over, looks at these values, and executes the corresponding system call. The system call numbers are defined in /usr/include/asm/unistd_32.h. Looking at that, we see that syscall 1 is exit(). From that, we can see the three things this shellcode does:
- It sets the first argument of the syscall to 0 (which means exit success).
- It sets the syscall number to 1, which is the exit call.
- It invokes the syscall, causing the program to exit with status 0.
Looking at the big picture, the shellcode is essentially equivalent to exit(0). It does not need ret because it never returns and instead causes the program to terminate. If you wanted the function to return, you would need to add ret to the end. Without at least a ret, execution runs past the end of the shellcode and the program will crash, unless it terminates first, as in your example with the exit() syscall.
What's wrong with your shellcode?
The method of calling shellcode you are showing does not work anymore. It used to, but nowadays Linux does not allow arbitrary data to be executed, so some arcane casting is needed. This older technique is explained well in the famous "Smashing The Stack For Fun And Profit" article:
Lets try to modify our first example so that it overwrites the return
address, and demonstrate how we can make it execute arbitrary code. Just
before buffer1[] on the stack is SFP, and before it, the return address.
That is 4 bytes pass the end of buffer1[]. But remember that buffer1[] is
really 2 word so its 8 bytes long. So the return address is 12 bytes from
the start of buffer1[]. We'll modify the return value in such a way that the
assignment statement 'x = 1;' after the function call will be jumped. To do
so we add 8 bytes to the return address. Our code is now:
example3.c:
------------------------------------------------------------------------------
void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
   int *ret;

   ret = buffer1 + 12;
   (*ret) += 8;
}

void main() {
  int x;

  x = 0;
  function(1,2,3);
  x = 1;
  printf("%d\n",x);
}
------------------------------------------------------------------------------
What we have done is add 12 to buffer1[]'s address. This new address is
where the return address is stored. We want to skip pass the assignment to
the printf call. How did we know to add 8 to the return address? We used a
test value first (for example 1), compiled the program, and then started gdb
The correct version of your shellcode for newer systems would be:
const char shellcode[] = "\xbb\x00\x00\x00\x00\xb8\x01\x00\x00\x00\xcd\x80";

int main(void)
{
    /* treat the address of the byte array as a function and call it */
    int (*ret)(void) = (int (*)(void))shellcode;
    ret();
    return 0;
}