
I've seen various scripts for buffer overflow attacks. Many of them include lines that look like this:

char code[] =
"\xdb\xd7\xd9\x74\x24\xf4\xb8\x79\xc4\x64\xb7\x33\xc9\xb1\x38"
"\x5d\x83\xc5\x04\x31\x45\x13\x03\x3c\xd7\x86\x42\x42\x3f\xcf"
/* ... more lines like these ... */;

This one is from the CastRipper [.m3u] 2.9.6 stack buffer overflow exploit. Can someone explain what this bit is, and what exactly it does?

It looks like the Linux header files I've seen in forensic tools.

Green Fly
BubbleMonster

2 Answers


It's actually machine code, which is sometimes loosely called byte code.

Strictly speaking, bytecode, also known as p-code (portable code), is a form of instruction set designed for efficient execution by a software interpreter rather than by the CPU itself. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normally numeric addresses) which encode the result of parsing and semantic analysis of things like type, scope, and nesting depths of program objects. They therefore allow much better performance than direct interpretation of source code.

It's a compiled program made up of instructions the CPU understands directly. It's often used to exploit vulnerabilities: the attacker overflows a buffer in the vulnerable running program and overwrites the return address with the address of the first instruction of the injected code, so the program ends up executing it. Often that code is used to spawn an interactive shell, in which case it's called shellcode.

In computer security, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine, but any piece of code that performs a similar task can be called shellcode. Because the function of a payload is not limited to merely spawning a shell, some have suggested that the name shellcode is insufficient. However, attempts at replacing the term have not gained wide acceptance. Shellcode is commonly written in machine code.

There's a good book on the subject called The Shellcoder's Handbook.

Lucas Kauffman
  • Also, if you are seeing these strings in the context of a specific vulnerability, what you are looking at is the whole exploit; the shellcode *and* the correct structure and address overwrite. – lynks Aug 05 '13 at 11:03

Background:

What you are seeing is machine language code. They are the data values that are actual instructions to a CPU chip.

When a programmer writes a program in a higher-level language like C or C++, tools called compilers translate those instructions into machine language. Normally, programmers don't care about those machine language instructions. But the programmers who write compilers care a lot about machine instructions, of course, and so do people who write hand-optimized versions of routines that would be inefficient in a high-level language. And so do hackers.

Even when programmers have a reason to care about machine instructions, they don't write their code as hard-to-read raw byte values like these. Instead, they use assembly language, a set of mnemonics: abbreviated names that stand for the numbers you see here. A simple example is MOV CL,0x38, which moves the byte value 0x38 (56 decimal) into CL, the low byte of the ECX register. Under the covers, "MOV CL" is a machine language byte with the value 0xB1, followed by the operand byte 0x38, so the whole instruction is the two bytes b1 38 that appear in your sample.

A tool called a disassembler translates the bytes you see back into assembly language that humans can read more easily. You could paste these bytes into http://onlinedisassembler.com/odaweb/# and look at the instructions they will cause the CPU to execute. Your sample contains these instructions:

.data:0x00000000    dbd7        fcmovnbe st,st(7)
.data:0x00000002    d97424f4    fnstenv [esp-0xc]
.data:0x00000006    b879c464b7  mov eax,0xb764c479
.data:0x0000000b    33c9        xor ecx,ecx
.data:0x0000000d    b138        mov cl,0x38
.data:0x0000000f    5d          pop ebp
.data:0x00000010    83c504      add ebp,0x4
.data:0x00000013    314513      xor DWORD PTR [ebp+0x13],eax
.data:0x00000016    033cd7      add edi,DWORD PTR [edi+edx*8]
.data:0x00000019    864242      xchg BYTE PTR [edx+0x42],al
.data:0x0000001c    3f          aas
.data:0x0000001d    cf          iret

(You can see the mnemonic assembler instructions above include mov, xor, pop, etc.)

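However, prepare for more disappointment. Beyond the first few instructions, most of these bytes are meaningless as they stand, because the payload is encoded. The start of the sample is a small decoder: the first six bytes (fcmovnbe followed by fnstenv [esp-0xc]) are a common trick for leaving the code's own address on the stack, pop ebp retrieves it, EAX is loaded with a key (0xb764c479) and ECX with a loop count (0x38), and each pass of the loop XORs the next four bytes of the code with the key in place. Until that loop has run, the bytes after it, including the end of the loop itself, are scrambled, which is why the disassembly ends in nonsense such as aas and iret. This pattern closely resembles the output of Metasploit's shikata_ga_nai encoder. That makes this exploit difficult to reverse engineer by reading the bytes alone.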

Closer to your answer:

More to your question, a buffer overflow works because of the way the stack is used on the x86 architecture: a function's local buffers sit in memory below that function's saved return address (usually with the saved frame pointer in between). When a program copies more data into a buffer than the buffer can hold, the extra data goes right over the top of whatever comes next. If you write the correct number of bytes, that something else is the function's saved return address. If you put a machine language routine in the buffer data and overwrite the return address so that it points back into your buffer (instead of its normal return location), then when the function returns the CPU will execute your code instead of the code it was intended to execute.

The part of the attack that tricks the instruction pointer into going where the hacker needs it to go is referred to as the exploit code. The exploit code is custom-written to take advantage of each specific bug.

The code that does something useful for the hacker is called the payload. Payloads are generally written in advance by somebody else, and using one is often a copy-and-paste affair for the hacker. A very common payload is called "shellcode", which is machine instructions that yield a command shell to the attacker.

An exploit needs both. Once the exploit has control, it invokes the shellcode, giving the hacker the access he desires.

A good resource:

There is a well-documented online tutorial on such an exploit here: http://www.tenouk.com/Bufferoverflowc/Bufferoverflow6.html Be warned that this is pretty deep material, and the author of that page assumes you understand the fundamentals of CPU operation, stacks, pointers, assembly language, machine language, etc.

John Deters