13

When executables are packed with a tool such as UPX, the real code and data sections are compressed, encrypted or obfuscated, then restored in memory by an injected unpacking stub. This makes static analysis of the on-disk file impractical.

In order to circumvent this, I would normally run the executable, attach a debugger, take a memory dump, then use that dump to produce an unpacked executable. Unfortunately, this destroys the import address table (IAT).

I'm aware of certain tools that can be used to patch the IAT, but I don't know how they work internally. How would I go about manually recreating the IAT?

Polynomial
It may be easier to dump the process memory with e.g. pdmp.c or OllyDbg's OllyDump so that the IAT is automatically rebuilt. – atdre Oct 16 '12 at 07:29

2 Answers

14

Ok, I said I would go for an answer and here it is, as promised.

Firstly, I wanted to build an actual target to play with. There's nothing like a working, touchable example, and this is very much a manual process. So, without further ado, I compiled this in MSVC with /MT:

#define WINVER 0x501
#define _WIN32_WINNT   0x0501
#define _WIN32_WINDOWS 0x0501
#define _WIN32_IE      0x0501
#define UNICODE

#include <wchar.h>
#include <windows.h>

int APIENTRY wWinMain(HINSTANCE hInstance,
                     HINSTANCE hPrevInstance,
                     LPTSTR    lpCmdLine,
                     int       nCmdShow)
{
    UNREFERENCED_PARAMETER(hPrevInstance);
    UNREFERENCED_PARAMETER(lpCmdLine);

    MessageBox(NULL, L"Hello, world.", L"A messagebox", MB_OK | MB_ICONEXCLAMATION);
    return 0;
}

We'd fully expect this beasty to import user32.dll!MessageBoxW and sure enough it actually does.

To find this inside the executable itself, let's use WinDbg. I use WinDbg for two reasons: firstly, I can't afford IDA, and secondly, it's the same debugger you'd use for a kernel driver (even with IDA - which uses its engine) so it's good, if a little unintuitive.

I compiled my executable to UnpackSimple.exe for the x86 architecture, as UPX doesn't support Win64. Now, the first thing you'll notice when you run this under WinDbg is this line:

ModLoad: 01360000 01370000   UnpackSimple.exe

Right there, we see the address range to which UnpackSimple.exe has been mapped. This is important, because all addresses in its header are relative to 01360000.
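As a minimal sketch of that arithmetic: every address in the header is a relative virtual address (RVA), and adding the image base from the ModLoad line turns it into a real in-memory address. The helper name rva_to_va is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: turn an RVA taken from the PE header into an
   absolute address, given the base the image was mapped at. */
uint32_t rva_to_va(uint32_t image_base, uint32_t rva)
{
    return image_base + rva;
}
```

With the base above, rva_to_va(0x01360000, 0x1000) gives 0x01361000. Because the base can change from run to run, it is safer to store RVAs and convert them late.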

So now, to find the header information, we can simply request it:

!dh UnpackSimple.exe
OPTIONAL HEADER VALUES

    10B magic #
   11.00 linker version
    5E00 size of code
    6C00 size of initialized data
       0 size of uninitialized data
    1193 address of entry point
    1000 base of code
         ----- new -----
01360000 image base            <--- this is the image base address.
    1000 section alignment
     200 file alignment
       2 subsystem (Windows GUI)
    6.00 operating system version
    0.00 image version

    ... blah blah blah ....

   0 [       0] address [size] of Export Directory
8D54 [      3C] address [size] of Import Directory
D000 [     1E0] address [size] of Resource Directory
   0 [       0] address [size] of Exception Directory
   0 [       0] address [size] of Security Directory
E000 [     6D0] address [size] of Base Relocation Directory
7150 [      38] address [size] of Debug Directory
   0 [       0] address [size] of Description Directory
   0 [       0] address [size] of Special Directory
   0 [       0] address [size] of Thread Storage Directory
89D8 [      40] address [size] of Load Configuration Directory
   0 [       0] address [size] of Bound Import Directory
7000 [     108] address [size] of Import Address Table Directory
   0 [       0] address [size] of Delay Import Directory
   0 [       0] address [size] of COR20 Header Directory
   0 [       0] address [size] of Reserved Directory

Oh look, one IAT! It lives at 01360000+7000 and we can persuade WinDbg to provide us that information in a fairly straightforward manner. The dps command we are about to use displays pointer-sized values and resolves each one against the loaded symbols. It takes a start address and a length counted in entries; the IAT is 0x108 bytes long and each entry is 4 bytes, hence L108/4 (WinDbg reads these numbers as hex by default).

dps 01360000+7000 L108/4
01367000  75b3d7ea kernel32!TerminateProcessStub
01367004  75b23f3c kernel32!CreateFileWImplementation
01367008  75b2520b kernel32!GetCommandLineWStub
... <snip> ...
013670f4  75b47ab2 kernel32!WriteConsoleW
013670f8  75b21410 kernel32!CloseHandleImplementation
013670fc  00000000
01367100  76bafd3f USER32!MessageBoxW
01367104  00000000

You can see some important features of the IAT here. First up is the address on the left hand side, which is the address of the IAT entry itself. The value next to it is the in-memory address of the imported function inside its DLL. Finally, note there are some zero entries. Each DLL's list of imports ends with a zero entry, telling the loader that there are no more entries to load for that DLL.
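To make that layout concrete, here is a small sketch that works on a copy of the IAT held in an ordinary array (not on live process memory). It computes the entry count from the directory size and counts the zero-terminated runs, one per DLL. Both function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit slots in an IAT directory of the given byte size. */
size_t iat_entry_count(size_t directory_size)
{
    return directory_size / sizeof(uint32_t);
}

/* Count the zero-terminated groups in a 32-bit IAT snapshot.
   Each group holds the resolved imports of one DLL. */
size_t count_iat_groups(const uint32_t *iat, size_t n_entries)
{
    size_t groups = 0;
    int in_group = 0;

    for (size_t i = 0; i < n_entries; i++) {
        if (iat[i] != 0) {
            in_group = 1;          /* a resolved function pointer */
        } else if (in_group) {
            groups++;              /* the zero entry closes the group */
            in_group = 0;
        }
    }
    if (in_group)
        groups++;                  /* tolerate a missing final terminator */
    return groups;
}
```

For the directory above, iat_entry_count(0x108) is 66 slots, which is the 0x42 that L108/4 evaluates to, and the dump shows two groups: one for kernel32 and one for USER32.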

Now, we shall look at the disassembly of the main function. To do this, type these two commands:

bp UnpackSimple!wWinMain
g

You'll then break on entry to the wWinMain above. Now, if you look closely:

UnpackSimple!wWinMain:
    01361000 6a30            push    30h
    01361002 68a0893601      push    offset UnpackSimple!`string' (013689a0)
    01361007 68bc893601      push    offset UnpackSimple!`string' (013689bc)
    0136100c 6a00            push    0
    0136100e ff1500713601    call    dword ptr [UnpackSimple!_imp__MessageBoxW (01367100)]
    01361014 33c0            xor     eax,eax
    01361016 c21000          ret     10h

Unsurprisingly, there's our function, _imp__MessageBoxW, referencing a nice IAT entry :)

So now the question is how on earth do we go backwards? Well, first we need something that is actually broken to work on, so we need to get hold of a packed executable.

To that end, I ran upx.exe -o UnpackSimplePacked.exe UnpackSimple.exe to produce a nice packed binary. Load this binary up in WinDbg and the first thing you'll notice is that:

!dh UnpackSimplePacked.exe

Does absolutely nothing. However, we can note that:

ModLoad: 01220000 01233000   image01220000

has been seen, so the image we need is there, in memory. It just doesn't exist as an on-disk image. At this stage, I am guessing that UPX actually loads DLLs directly itself, rather than using the NT Loader. We can check this two ways: firstly, if you type sxe ld into WinDbg, it'll break on all module loads. No breaks occur for user32.dll, nor is it loaded already.

Secondly, as you wanted to see:

    0 [       0] address [size] of Import Address Table Directory

It's completely empty.

However, if we check the loaded modules list:

lm
start    end        module name
00210000 00223000   image00210000   (deferred)             
74c20000 74c2c000   CRYPTBASE   (deferred)             
74c30000 74c90000   SspiCli    (deferred)             
74d80000 74e20000   ADVAPI32   (deferred)             
74eb0000 74f4d000   USP10      (deferred)             
74f50000 74f5a000   LPK        (deferred)             
75030000 750dc000   msvcrt     (deferred)             
755a0000 755b9000   sechost    (deferred)             
75720000 757b0000   GDI32      (deferred)             
757e0000 758d0000   RPCRT4     (deferred)             
75b10000 75c20000   kernel32   (deferred)             
76ab0000 76af7000   KERNELBASE   (deferred)             
76b40000 76c40000   USER32     (deferred)             
77550000 776d0000   ntdll      (pdb symbols) 

As you can see, USER32 has space allocated ready to go.

Now here's some trickery. I opened up another WinDbg Window with my UnpackSimple.exe version and found the address of user32!MessageBoxW. Having got this, I found it to be 76bafd3f from a base offset of 76b40000 giving me an offset of 6FD3F. In my packed version, I set a breakpoint on bp 76b40000+6FD3F and sure enough, there was a stack trace!

k
003cfd0c 013a1014 USER32!MessageBoxW
WARNING: Stack unwind information not available. Following frames may be wrong.
003cfd6c 75b233aa image013a0000+0x1014
003cfd78 77589ef2 kernel32!BaseThreadInitThunk+0xe
003cfdb8 77589ec5 ntdll!__RtlUserThreadStart+0x70
003cfdd0 00000000 ntdll!_RtlUserThreadStart+0x1b

The disassembly at the location in image0... looks like this:

013a100e ff1500713a01    call    dword ptr [image013a0000+0x7100 (013a7100)]
013a1014 33c0            xor     eax,eax
013a1016 c21000          ret     10h

Now we're getting somewhere! At that address you'll find some interesting data:

db 013a7100
013a7100  3f fd ba 76 00 00 00

Funnily enough, there's our address! 76bafd3f in good old little endian. Let's step back a bit:

db 013a7098
013a7098  a0 22 57 77 60 22 57 77-c9 14 b2 75 ff 10 b2 75  ."Ww`"Ww...u...u
013a70a8  7b 44 b2 75 9c 17 b2 75-91 d1 b4 75 71 51 b2 75  {D.u...u...uqQ.u
013a70b8  45 49 b2 75 c4 d1 b4 75-13 49 b2 75 b3 d1 b4 75  EI.u...u.I.u...u
013a70c8  46 e0 57 77 f1 24 59 77-0d 17 b2 75 46 19 b2 75  F.Ww.$Yw...uF..u
013a70d8  2a 30 58 77 61 48 ba 75-83 46 b2 75 07 7c bc 75  *0XwaH.u.F.u.|.u
013a70e8  28 13 b2 75 bf 45 ba 75-ef c7 b3 75 b2 7a b4 75  (..u.E.u...u.z.u
013a70f8  10 14 b2 75 00 00 00 00-3f fd ba 76 00 00 00 00  ...u....?..v....

I see pointers!

This is the IAT we found previously, or at least an array of resolved thunks of the kind an IMAGE_IMPORT_DESCRIPTOR's FirstThunk points to.
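Reading those pointers out of a raw byte dump is just a little-endian decode. A minimal sketch, with helper names made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit value from four little-endian bytes. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decode a run of bytes into 32-bit pointers; out must have room for
   n_bytes / 4 entries. Trailing bytes that do not fill a pointer are ignored. */
void decode_pointers(const uint8_t *bytes, size_t n_bytes, uint32_t *out)
{
    for (size_t i = 0; i + 4 <= n_bytes; i += 4)
        out[i / 4] = read_le32(bytes + i);
}
```

Fed the last row of the dump (10 14 b2 75 00 00 00 00 3f fd ba 76 00 00 00 00), this yields 75b21410, 0, 76bafd3f, 0: the final kernel32 entry, its terminator, MessageBoxW and its terminator.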

So having got this far, what remains? Well, we'd need to search through the dumped memory to find pointers that match functions in known Windows DLLs. From there, we can begin to reconstruct a full IAT for our new PE file. I suspect the process ImpRec goes through, as described in Viv's answer, is to attempt to find these entries as I have done manually. I've chosen to use breakpoints, but I imagine there are a number of ways this could be automated. I've never written an automated tool for this; I've only ever gone through the process by hand! That said, at a guess, hoovering up call dword ptr [...] instructions, reading the slots they reference, keeping the values that point outside the image's mapped region, then matching those against known DLL bases and offsets would do the trick.
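As a rough sketch of the "hoover up the calls" idea, the function below scans a buffer of 32-bit x86 code for the byte pattern FF 15 (call dword ptr [imm32]) and collects the referenced slot addresses that fall inside the image. It is a guess at an approach, not how ImpRec actually works. A plain byte scan can hit false positives in data or in the middle of other instructions, and it ignores jmp dword ptr [imm32] (FF 25) thunks, so a real tool would use a disassembler. The function name and parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan x86 code for "call dword ptr [imm32]" (FF 15 imm32) and record each
   operand address lying in [image_start, image_end). Returns how many slot
   addresses were stored in out (never more than max_out). */
size_t find_indirect_calls(const uint8_t *code, size_t len,
                           uint32_t image_start, uint32_t image_end,
                           uint32_t *out, size_t max_out)
{
    size_t found = 0;

    for (size_t i = 0; i + 6 <= len && found < max_out; i++) {
        if (code[i] != 0xFF || code[i + 1] != 0x15)
            continue;

        /* The 4 bytes after the opcode are the little-endian slot address. */
        uint32_t slot = (uint32_t)code[i + 2]
                      | ((uint32_t)code[i + 3] << 8)
                      | ((uint32_t)code[i + 4] << 16)
                      | ((uint32_t)code[i + 5] << 24);

        if (slot >= image_start && slot < image_end)
            out[found++] = slot;
    }
    return found;
}
```

Run over the bytes of wWinMain from the unpacked build, with the image range 01360000 to 01370000, it reports the single slot 01367100. The next step would be to read each slot and keep the values that land inside a loaded DLL.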

Two notes:

  1. I've written this answer as a "here's how I'd attack it manually" type answer, with relevant background, not a "use this tool".
  2. Watch out for ASLR. Those module addresses will keep shifting around, so keep an eye for changing module bases!
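One way to cope with shifting bases is to store each function as an offset from its module base and re-apply it to whatever base the module has in the current session. A minimal sketch, with function names invented for this example:

```c
#include <stdint.h>

/* Offset of an address from the base of the module that contains it. */
uint32_t offset_in_module(uint32_t address, uint32_t module_base)
{
    return address - module_base;
}

/* Translate an address seen under one module base to another base. */
uint32_t rebase(uint32_t address, uint32_t old_base, uint32_t new_base)
{
    return new_base + (address - old_base);
}
```

With the numbers above, offset_in_module(0x76bafd3f, 0x76b40000) is 0x6FD3F. If USER32 loaded at a different base in another session, rebase would give the address to break on there.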
7

First of all, for a general overview of the PE format, I recommend reading the PE/COFF file format specification published by Microsoft. Most packers destroy the import table, either partially or completely. ImpREC is usually the preferred choice for rebuilding the IAT (Import Address Table), but if you really want to get into the details you may read this excellent paper: PE Packers Used in Malicious Software by Paul Craig of the Security-Assessment team. You may also try the open source tool Scylla.

In Windows executables, the import table is the table (data directory) that contains all the import information (the names of the DLLs and functions referred to in the address space) of that image.

IMAGE_IMPORT_DESCRIPTOR has the following format:

struct IMAGE_IMPORT_DESCRIPTOR {
DWORD   OriginalFirstThunk; // RVA of the import name table; on disk its entries match those at FirstThunk
DWORD   TimeDateStamp;
DWORD   ForwarderChain;     // usually set to 0
DWORD   Name;               // RVA of the DLL name string
DWORD   FirstThunk;         // RVA of the IAT; before loading, its entries point to the API names
};

There is one descriptor for each imported DLL, and the last one consists of NULLs. The Windows loader loads every DLL referred to in the executable (remember, one DLL for each IMAGE_IMPORT_DESCRIPTOR) by reading the Name field of the structure. It then constructs the IAT in the following way. For each entry in the array pointed to by FirstThunk, it replaces the reference to the API's name with the API's actual address. The array pointed to by OriginalFirstThunk holds the same name references and is left untouched, so consider it a backup of FirstThunk; when OriginalFirstThunk is present, the loader reads the names from there. The overwritten FirstThunk array is your IAT! (All of this happens before the instruction pointer, i.e. eip, reaches the entry point of the exe.)
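The "last one consists of NULLs" rule is easy to sketch. The struct below is a local stand-in with the same five 32-bit fields, so the example builds without windows.h. The function name is invented.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for IMAGE_IMPORT_DESCRIPTOR (five 32-bit fields, 20 bytes). */
typedef struct {
    uint32_t OriginalFirstThunk;
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;
    uint32_t FirstThunk;
} ImportDescriptor;

/* Count descriptors up to, but not including, the all-zero terminator.
   max_entries bounds the walk in case the terminator is missing. */
size_t count_import_descriptors(const ImportDescriptor *table, size_t max_entries)
{
    size_t n = 0;

    while (n < max_entries &&
           (table[n].OriginalFirstThunk || table[n].TimeDateStamp ||
            table[n].ForwarderChain || table[n].Name || table[n].FirstThunk))
        n++;
    return n;
}
```

As a sanity check against the header in the other answer: an Import Directory of 0x3C bytes is three 20-byte descriptors, which fits two DLLs (kernel32 and user32) plus the NULL terminator.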

While building the fresh import table (or fixing the old one), ImpREC first finds the IAT, either by automatically searching through the code or from a value provided by the user. Then, working from the fresh IAT, it finds the API names from their addresses and populates the IMAGE_IMPORT_DESCRIPTORs. This eases our work, as the loader will now be able to find all the information in the import table. This is explained visually very well in Lena's tutorial no. 21.
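The first half of "find the API names from their addresses" is working out which loaded module an address belongs to. Below is a minimal sketch of that lookup over a hand-built module list. The Module type and find_module are hypothetical, not ImpREC internals. Turning the resulting offset into a name would additionally need the DLL's export table, which is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* One loaded module: address range [start, end) and its name. */
typedef struct {
    uint32_t start;
    uint32_t end;
    const char *name;
} Module;

/* Return the module whose range contains addr, or NULL if none does. */
const Module *find_module(const Module *mods, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= mods[i].start && addr < mods[i].end)
            return &mods[i];
    }
    return NULL;
}
```

Using the ranges from the lm output in the other answer, 76bafd3f falls inside USER32 (76b40000 to 76c40000), so the descriptor for user32.dll gets that slot.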

viv