Ok, I said I would go for an answer and here it is, as promised.
Firstly, I wanted to build an actual target to play with. There's nothing like a working, touchable example, and this is very much a manual process. So, without further ado, I compiled this in MSVC with /MT:
#define WINVER 0x501
#define _WIN32_WINNT 0x0501
#define _WIN32_WINDOWS 0x0501
#define _WIN32_IE 0x0501
#define UNICODE
#include <wchar.h>
#include <windows.h>
int APIENTRY wWinMain(HINSTANCE hInstance,
HINSTANCE hPrevInstance,
LPTSTR lpCmdLine,
int nCmdShow)
{
UNREFERENCED_PARAMETER(hPrevInstance);
UNREFERENCED_PARAMETER(lpCmdLine);
MessageBox(NULL, L"Hello, world.", L"A messagebox", MB_OK | MB_ICONEXCLAMATION);
return 0;
}
We'd fully expect this beasty to import user32.dll!MessageBoxW
and sure enough it actually does.
To find this inside the executable itself, let's use WinDbg. I use WinDbg for two reasons: firstly, I can't afford IDA, and secondly, it's the same debugger you'd use for a kernel driver (even with IDA - which uses its engine) so it's good, if a little unintuitive.
I compiled my executable to UnpackSimple.exe
for the x86 architecture, as UPX doesn't support Win64. Now, the first thing you'll notice when you run this under WinDbg is this line:
ModLoad: 01360000 01370000 UnpackSimple.exe
Right there, we see the address range to which UnpackSimple.exe has been mapped. This is important, because all addresses in its header are relative virtual addresses (RVAs), that is, offsets from the base 01360000.
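To make the "relative to the base" point concrete, here is a minimal sketch of the arithmetic, assuming a 32-bit image. The function names rva_to_va and va_to_rva are my own, not part of any Windows API.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a relative virtual address (RVA) taken from the PE header into
   the address it occupies once the image is mapped at image_base.
   Hypothetical helper, 32-bit images only. */
uint32_t rva_to_va(uint32_t image_base, uint32_t rva)
{
    return image_base + rva;
}

/* Inverse: turn an in-memory address back into an RVA. */
uint32_t va_to_rva(uint32_t image_base, uint32_t va)
{
    assert(va >= image_base); /* the address must lie at or above the base */
    return va - image_base;
}
```

With the base from the ModLoad line, an RVA of 1000 lands at 01361000.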
So now, to find the header information, we can simply request it:
!dh UnpackSimple.exe
OPTIONAL HEADER VALUES
10B magic #
11.00 linker version
5E00 size of code
6C00 size of initialized data
0 size of uninitialized data
1193 address of entry point
1000 base of code
----- new -----
01360000 image base <--- this is the image base address.
1000 section alignment
200 file alignment
2 subsystem (Windows GUI)
6.00 operating system version
0.00 image version
... blah blah blah ....
0 [ 0] address [size] of Export Directory
8D54 [ 3C] address [size] of Import Directory
D000 [ 1E0] address [size] of Resource Directory
0 [ 0] address [size] of Exception Directory
0 [ 0] address [size] of Security Directory
E000 [ 6D0] address [size] of Base Relocation Directory
7150 [ 38] address [size] of Debug Directory
0 [ 0] address [size] of Description Directory
0 [ 0] address [size] of Special Directory
0 [ 0] address [size] of Thread Storage Directory
89D8 [ 40] address [size] of Load Configuration Directory
0 [ 0] address [size] of Bound Import Directory
7000 [ 108] address [size] of Import Address Table Directory
0 [ 0] address [size] of Delay Import Directory
0 [ 0] address [size] of COR20 Header Directory
0 [ 0] address [size] of Reserved Directory
Oh look, one IAT! It lives at 01360000+7000 and we can persuade WinDbg to provide us that information in a fairly straightforward manner. The dps command we are about to use displays pointer-sized values and resolves each one against the loaded symbols. It takes a start address and an L count of entries to show; here the count is the directory size, 108 bytes, divided by 4 bytes per entry.
dps 01360000+7000 L108/4
01367000 75b3d7ea kernel32!TerminateProcessStub
01367004 75b23f3c kernel32!CreateFileWImplementation
01367008 75b2520b kernel32!GetCommandLineWStub
... <snip> ...
013670f4 75b47ab2 kernel32!WriteConsoleW
013670f8 75b21410 kernel32!CloseHandleImplementation
013670fc 00000000
01367100 76bafd3f USER32!MessageBoxW
01367104 00000000
You can see some important features of the IAT here. First up is the address on the left hand side: the address of the IAT entry itself. The value to its right is the in-memory address of the imported function inside its DLL. Finally, note there are some zero entries. Each DLL's run of imports ends with a zero entry, telling the loader that there are no more entries to load for that DLL.
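The layout just described can be walked mechanically. What follows is only a sketch under my own assumptions: a 32-bit IAT already copied into an array, and two hypothetical helpers (iat_entry_count and count_iat_groups) that I made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 4-byte slots in an IAT directory of dir_size bytes,
   the same arithmetic as the L108/4 argument given to dps. */
size_t iat_entry_count(uint32_t dir_size)
{
    return dir_size / sizeof(uint32_t);
}

/* Count the per-DLL groups in a dumped 32-bit IAT. A group is a run of
   non-zero function pointers; a zero entry closes the current group. */
size_t count_iat_groups(const uint32_t *iat, size_t entries)
{
    size_t groups = 0;
    int in_group = 0;

    assert(iat != NULL);
    for (size_t i = 0; i < entries; i++) {
        if (iat[i] != 0) {
            if (!in_group) {   /* first pointer of a new DLL's run */
                groups++;
                in_group = 1;
            }
        } else {
            in_group = 0;      /* zero terminator ends the run */
        }
    }
    return groups;
}
```

Fed the tail of the dump above (a few kernel32 pointers, a zero, the MessageBoxW pointer, a zero) it reports two groups, one per DLL.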
Now, we shall look at the disassembly of the main function. To do this, type these two commands
bp UnpackSimple!wWinMain
g
You'll then break on entry to the wWinMain
above. Now, if you look closely:
UnpackSimple!wWinMain:
01361000 6a30 push 30h
01361002 68a0893601 push offset UnpackSimple!`string' (013689a0)
01361007 68bc893601 push offset UnpackSimple!`string' (013689bc)
0136100c 6a00 push 0
0136100e ff1500713601 call dword ptr [UnpackSimple!_imp__MessageBoxW (01367100)]
01361014 33c0 xor eax,eax
01361016 c21000 ret 10h
Unsurprisingly, there's our function, _imp__MessageBoxW
, referencing a nice IAT entry :)
So now the question is how on earth do we go backwards? Well, first we need something where that lookup no longer works, so we need to get hold of a packed executable.
To that end, I ran upx.exe -o UnpackSimplePacked.exe UnpackSimple.exe
to produce a nice packed binary. Load this binary up in WinDbg and the first thing you'll notice is that:
!dh UnpackSimplePacked.exe
Does absolutely nothing. However, we can note that:
ModLoad: 01220000 01233000 image01220000
has been seen - so the image we need is there, in memory. It just doesn't exist as an on disk image. At this stage, I am guessing that upx
actually loads DLLs directly itself, rather than using the NT Loader. We can prove this two ways: firstly, if you type sxe ld
into WinDbg, it'll break on all module loads. No breaks occur for user32.dll
, nor is it loaded already.
Secondly, as you wanted to see:
0 [ 0] address [size] of Import Address Table Directory
It's completely empty.
However, if we check the loaded modules list:
lm
start end module name
00210000 00223000 image00210000 (deferred)
74c20000 74c2c000 CRYPTBASE (deferred)
74c30000 74c90000 SspiCli (deferred)
74d80000 74e20000 ADVAPI32 (deferred)
74eb0000 74f4d000 USP10 (deferred)
74f50000 74f5a000 LPK (deferred)
75030000 750dc000 msvcrt (deferred)
755a0000 755b9000 sechost (deferred)
75720000 757b0000 GDI32 (deferred)
757e0000 758d0000 RPCRT4 (deferred)
75b10000 75c20000 kernel32 (deferred)
76ab0000 76af7000 KERNELBASE (deferred)
76b40000 76c40000 USER32 (deferred)
77550000 776d0000 ntdll (pdb symbols)
As you can see, USER32
has space allocated ready to go.
Now here's some trickery. I opened up another WinDbg window with my UnpackSimple.exe version and found the address of user32!MessageBoxW. It turned out to be 76bafd3f, with USER32 loaded at a base of 76b40000, giving me an offset of 6FD3F. In my packed version, I set a breakpoint with bp 76b40000+6FD3F and sure enough, there was a stack trace!
k
003cfd0c 013a1014 USER32!MessageBoxW
WARNING: Stack unwind information not available. Following frames may be wrong.
003cfd6c 75b233aa image013a0000+0x1014
003cfd78 77589ef2 kernel32!BaseThreadInitThunk+0xe
003cfdb8 77589ec5 ntdll!__RtlUserThreadStart+0x70
003cfdd0 00000000 ntdll!_RtlUserThreadStart+0x1b
The disassembly at the location in image0... looks like this:
013a100e ff1500713a01 call dword ptr [image013a0000+0x7100 (013a7100)]
013a1014 33c0 xor eax,eax
013a1016 c21000 ret 10h
Now we're getting somewhere! At that address you'll find some interesting data:
db 013a7100
013a7100 3f fd ba 76 00 00 00
Funnily enough, there's our address! 76bafd3f
in good old little endian. Let's step back a bit:
db 013a7098
013a7098 a0 22 57 77 60 22 57 77-c9 14 b2 75 ff 10 b2 75 ."Ww`"Ww...u...u
013a70a8 7b 44 b2 75 9c 17 b2 75-91 d1 b4 75 71 51 b2 75 {D.u...u...uqQ.u
013a70b8 45 49 b2 75 c4 d1 b4 75-13 49 b2 75 b3 d1 b4 75 EI.u...u.I.u...u
013a70c8 46 e0 57 77 f1 24 59 77-0d 17 b2 75 46 19 b2 75 F.Ww.$Yw...uF..u
013a70d8 2a 30 58 77 61 48 ba 75-83 46 b2 75 07 7c bc 75 *0XwaH.u.F.u.|.u
013a70e8 28 13 b2 75 bf 45 ba 75-ef c7 b3 75 b2 7a b4 75 (..u.E.u...u.z.u
013a70f8 10 14 b2 75 00 00 00 00-3f fd ba 76 00 00 00 00 ...u....?..v....
I see pointers!
This is the IAT we found previously, or at least an array of resolved IMAGE_THUNK_DATA thunks of the kind an IMAGE_IMPORT_DESCRIPTOR points to.
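Reading that dump by eye gets old quickly, so here is a rough sketch of how the same check could be done in code: assemble each four-byte slot as a little-endian pointer, then see which module range from the lm listing contains it. The struct module_range type and both function names are my own inventions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the lm output: the range [start, end) and the module name. */
struct module_range {
    uint32_t start;
    uint32_t end;
    const char *name;
};

/* Assemble a 32-bit value from four bytes stored least significant first. */
uint32_t read_le32(const unsigned char *bytes)
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Return the module whose range contains address, or NULL if none does. */
const struct module_range *find_module(const struct module_range *mods,
                                       size_t count, uint32_t address)
{
    for (size_t i = 0; i < count; i++) {
        if (address >= mods[i].start && address < mods[i].end)
            return &mods[i];
    }
    return NULL;
}
```

For the bytes 3f fd ba 76 this gives 76bafd3f, which falls inside the USER32 range at offset 6FD3F. A zero slot matches no module, which is how the group terminators show up.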
So having got this far, what remains? Well, we'd need to search through the dumped memory to try to find pointers which match functions in known Windows DLLs. From there, we can begin to reconstruct a full IAT for our new PE file. I suspect the process ImpRec goes through, as described in Viv's answer, is to attempt to find these entries as I have done manually. I've chosen to use breakpoints, but I imagine there are a number of ways this could be automated. I've never written a tool to automate the process myself...! That said, at a guess, hoovering up call dword ptr [...] instructions, checking which of their targets hold addresses outside the image's own mapped region, and then matching those addresses against known DLL export offsets would do the trick.
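To give a flavour of the "hoover up the calls" idea, here is a naive sketch, not how ImpRec or any real tool necessarily works. It assumes 32-bit x86 code, where call dword ptr [address] is encoded as FF 15 followed by a four-byte little-endian address. The function name find_indirect_calls is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scan code bytes for the pattern FF 15 <imm32>, the 32-bit x86 encoding
   of `call dword ptr [imm32]`, and store each referenced slot address in
   out. Returns the number of slots stored, never more than max_out.
   This is a plain byte match, not a disassembler, so it can report false
   positives when FF 15 happens to appear inside another instruction. */
size_t find_indirect_calls(const unsigned char *code, size_t len,
                           uint32_t *out, size_t max_out)
{
    size_t found = 0;

    assert(code != NULL && out != NULL);
    for (size_t i = 0; i + 6 <= len && found < max_out; i++) {
        if (code[i] == 0xFF && code[i + 1] == 0x15) {
            out[found++] = (uint32_t)code[i + 2]
                         | ((uint32_t)code[i + 3] << 8)
                         | ((uint32_t)code[i + 4] << 16)
                         | ((uint32_t)code[i + 5] << 24);
            i += 5; /* skip the remaining bytes of this instruction */
        }
    }
    return found;
}
```

Run over the bytes of wWinMain from the unpacked build it finds a single slot, 01367100. Each slot found this way would then be read and matched against the loaded module ranges, and anything pointing into a system DLL becomes a candidate IAT entry.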
Two notes:
- I've written this answer as a "here's how I'd attack it manually" type answer, with relevant background, not a "use this tool".
- Watch out for ASLR. Those module addresses will keep shifting around between runs, so keep an eye out for changing module bases!