CONTENTS
0 the problem — aslr and no fixed addresses
1 why not use syscalls directly
2 finding kernel32.dll at runtime
3 parsing the export directory
4 donut — pe to shellcode
5 testing a shellcode
──[ 0. The Problem — ASLR and No Fixed Addresses ]──
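Shellcode lands in a memory region whose address is not known when the code is
assembled, and ASLR (Address Space Layout Randomization) makes the rest of the
address space unpredictable too.
ASLR randomises the base addresses of the stack, the heap, and loaded modules.
On Windows, system DLL bases are typically re-randomised at boot, while stacks
and heaps move on every process launch. Your shellcode can't contain hardcoded
absolute targets like JMP 0x12345678: they'd crash on the spot.
So the shellcode has to be Position Independent Code (PIC): it uses only
relative addressing and resolves everything it needs dynamically at runtime.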
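The difference between an absolute address and a base-relative one can be shown
without any shellcode at all. The sketch below is a hypothetical illustration in
portable C; blob_init, resolve, MSG_OFFSET and demo_relocation are names invented
here. It writes the same bytes into two buffers, standing in for two randomised
load addresses, and checks that an offset stays valid in both while a pointer
captured from the first copy does not follow the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 16-byte "blob": a message stored at a fixed offset. */
#define BLOB_SIZE  16
#define MSG_OFFSET 4   /* offset of the message from the blob's base */

/* Fill a buffer with the blob's bytes. */
static void blob_init(unsigned char *base) {
    memset(base, 0, BLOB_SIZE);
    memcpy(base + MSG_OFFSET, "hello", 6);   /* 5 chars plus the NUL */
}

/* Position-independent lookup: base + offset, computed at runtime. */
static const char *resolve(const unsigned char *base, size_t offset) {
    assert(offset < BLOB_SIZE);
    return (const char *)(base + offset);
}

/* Returns 1 if the offset-based lookup works at both "load addresses"
   and the absolute pointer taken from copy A does not point into copy B. */
static int demo_relocation(void) {
    unsigned char a[BLOB_SIZE], b[BLOB_SIZE];
    blob_init(a);
    blob_init(b);
    const char *absolute = resolve(a, MSG_OFFSET);   /* the "hardcoded" address */
    int relative_ok = strcmp(resolve(a, MSG_OFFSET), "hello") == 0 &&
                      strcmp(resolve(b, MSG_OFFSET), "hello") == 0;
    int absolute_misses_b = absolute != resolve(b, MSG_OFFSET);
    return relative_ok && absolute_misses_b;
}
```

Calling demo_relocation() from any main() exercises both cases. Real PIC applies
the same idea to code: branches and data references are encoded relative to the
current instruction or to a base discovered at runtime.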
──[ 1. Why Not Use Syscalls Directly ]──
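On Linux, shellcode usually calls the kernel directly: it loads a hardcoded
syscall number and executes the syscall instruction. That is simple, and it
works because Linux keeps syscall numbers stable across kernel versions for a
given architecture.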
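As a rough illustration of calling the kernel by number, the Linux-only sketch
below goes through the generic syscall() entry point from C instead of
hand-written assembly. raw_getpid and raw_write_stdout are hypothetical helper
names; SYS_getpid and SYS_write come from <sys/syscall.h>.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our PID by syscall number, bypassing the getpid() wrapper. */
static long raw_getpid(void) {
    return syscall(SYS_getpid);
}

/* Write a buffer to stdout (fd 1) through the raw write syscall. */
static long raw_write_stdout(const char *msg, unsigned long len) {
    assert(msg != NULL);
    return syscall(SYS_write, 1, msg, len);
}
```

The numeric value behind SYS_write differs between architectures (x86 and
x86-64 use different tables), but for one architecture it does not change
between kernel releases.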
On Windows, that doesn't fly:
- SSNs (System Service Numbers, the syscall numbers behind the Native API)
  change between OS versions (XP → 7 → 10 → 11) and can change between
  builds. An SSN hardcoded for NtAllocateVirtualMemory on Windows 7, for
  example, is wrong on Windows 10.
- Microsoft doesn't document or guarantee SSN stability.
The supported route is to call functions exported by system DLLs:
kernel32.dll   memory management, process/thread creation, DLL loading
               (documented Win32 API)
ntdll.dll      thin bridge to the kernel (Native API, mostly undocumented)
──[ 2. Finding kernel32.dll at Runtime ]──
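The shellcode needs kernel32.dll's base address so it can grab GetProcAddress,
then find everything else. The trick uses the TEB (Thread Environment Block)
and the PEB (Process Environment Block), two structures Windows keeps in
predictable places.
Steps:
1. Reach the TEB through the FS segment register (x86) or GS (x64).
2. The TEB holds a pointer to the PEB at a fixed offset: fs:[0x30] on x86,
   gs:[0x60] on x64.
3. PEB.Ldr points to PEB_LDR_DATA, which holds InMemoryOrderModuleList.
4. Walk the list: entry 0 is the exe, entry 1 is usually ntdll.dll, entry 2
   is usually kernel32.dll. Each link points at the InMemoryOrderLinks field
   inside an LDR_DATA_TABLE_ENTRY, not at the start of the entry, so DllBase
   is read relative to that field.
This order holds for ordinary processes on current Windows versions, but it is
not a documented guarantee. Comparing each entry's BaseDllName against
"kernel32.dll" is more robust than trusting the index.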
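The list walk in step 4 can be sketched with mock structures in portable C.
Everything here is a simplified stand-in, not the real Windows layout:
list_entry, mock_module, and the helper functions are hypothetical names, and
module names are plain char strings where real entries carry a UTF-16
UNICODE_STRING. The sketch shows two points: recovering the enclosing entry
from a link that sits in the middle of the structure, and matching by name
instead of by position.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Mock doubly linked list node, shaped like LIST_ENTRY. */
typedef struct list_entry {
    struct list_entry *flink, *blink;
} list_entry;

/* Mock loader entry: simplified, NOT the real LDR_DATA_TABLE_ENTRY. */
typedef struct mock_module {
    list_entry  in_load_order;     /* first link, like InLoadOrderLinks    */
    list_entry  in_memory_order;   /* second link, like InMemoryOrderLinks */
    uintptr_t   dll_base;
    const char *base_name;         /* real entries use a UNICODE_STRING    */
} mock_module;

/* Recover the enclosing entry from a pointer to its in_memory_order link. */
static mock_module *entry_from_link(list_entry *link) {
    return (mock_module *)((char *)link - offsetof(mock_module, in_memory_order));
}

/* ASCII case-insensitive string equality. */
static int name_equals(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++; b++;
    }
    return *a == *b;
}

/* Walk the circular list from its head; return the base, or 0 if absent. */
static uintptr_t find_module_base(list_entry *head, const char *name) {
    assert(head != NULL && name != NULL);
    for (list_entry *l = head->flink; l != head; l = l->flink) {
        mock_module *m = entry_from_link(l);
        if (name_equals(m->base_name, name)) return m->dll_base;
    }
    return 0;
}

/* Append a module to the tail of the in-memory-order list. */
static void append_module(list_entry *head, mock_module *m) {
    m->in_memory_order.flink = head;
    m->in_memory_order.blink = head->blink;
    head->blink->flink = &m->in_memory_order;
    head->blink = &m->in_memory_order;
}
```

To try it, initialise a list_entry head that points at itself, append a few
mock_module values, and call find_module_base(&head, "kernel32.dll"). The
lookup succeeds wherever the entry sits in the list.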
──[ 3. Parsing the Export Directory ]──
With kernel32.dll located, the shellcode has to find GetProcAddress inside it.
That means parsing the PE export table by hand.
1. At offset 0x3C from the DLL base: e_lfanew, the offset to the NT headers.
2. NT Headers → Optional Header → DataDirectory[0]: Export Directory RVA.
3. IMAGE_EXPORT_DIRECTORY points to three arrays:
   AddressOfNames          RVAs of function name strings
   AddressOfNameOrdinals   for each name, a 16-bit index into AddressOfFunctions
   AddressOfFunctions      RVAs of the actual function code
   The first two are parallel (NumberOfNames entries each). The third is
   indexed by ordinal and has NumberOfFunctions entries.
4. Loop over AddressOfNames and compare each string with "GetProcAddress".
   At the matching index i, read AddressOfNameOrdinals[i], use that value to
   index AddressOfFunctions, and add the DLL base to the resulting RVA to
   get the VA.
Once you have GetProcAddress, it resolves any other export by name from a module
that is already loaded. Pair it with LoadLibraryA to pull in DLLs that are not.
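The four steps above can be sketched as an ordinary PE parser in portable C,
run here against a tiny synthetic image, not a real DLL. find_export_rva,
build_demo_image, and the rd/wr helpers are hypothetical names. The field
offsets follow the published PE format, and the buffer is assumed to be laid
out the way the loader maps a module, so that an RVA equals an offset from the
base. The sketch does no bounds checking and ignores forwarded exports.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers and writers over a byte buffer. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void wr16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void wr32(uint8_t *p, uint32_t v) { wr16(p, (uint16_t)v); wr16(p + 2, (uint16_t)(v >> 16)); }

/* Return the RVA of the export called `name`, or 0 if it is not found. */
static uint32_t find_export_rva(const uint8_t *base, const char *name) {
    if (base[0] != 'M' || base[1] != 'Z') return 0;
    const uint8_t *nt = base + rd32(base + 0x3C);       /* step 1: e_lfanew */
    if (memcmp(nt, "PE\0\0", 4) != 0) return 0;
    const uint8_t *opt = nt + 4 + 20;                   /* skip signature + file header */
    uint16_t magic = rd16(opt);
    if (magic != 0x10B && magic != 0x20B) return 0;     /* PE32 or PE32+ */
    const uint8_t *dirs = opt + (magic == 0x20B ? 0x70 : 0x60);
    uint32_t exp_rva = rd32(dirs);                      /* step 2: DataDirectory[0] */
    if (exp_rva == 0) return 0;
    const uint8_t *exp   = base + exp_rva;              /* step 3: export directory */
    uint32_t n_names     = rd32(exp + 24);              /* NumberOfNames */
    const uint8_t *funcs = base + rd32(exp + 28);       /* AddressOfFunctions */
    const uint8_t *names = base + rd32(exp + 32);       /* AddressOfNames */
    const uint8_t *ords  = base + rd32(exp + 36);       /* AddressOfNameOrdinals */
    for (uint32_t i = 0; i < n_names; i++) {            /* step 4: search by name */
        const char *cur = (const char *)(base + rd32(names + 4 * i));
        if (strcmp(cur, name) == 0) {
            uint16_t idx = rd16(ords + 2 * i);          /* index into AddressOfFunctions */
            return rd32(funcs + 4 * idx);
        }
    }
    return 0;
}

/* Build a tiny synthetic PE32+ image with two exports (made-up test data). */
static void build_demo_image(uint8_t *img, size_t size) {
    assert(size >= 0x400);
    memset(img, 0, size);
    img[0] = 'M'; img[1] = 'Z';
    wr32(img + 0x3C, 0x80);                             /* e_lfanew */
    memcpy(img + 0x80, "PE\0\0", 4);
    wr16(img + 0x80 + 24, 0x20B);                       /* optional header magic */
    wr32(img + 0x80 + 24 + 0x70, 0x200);                /* export directory RVA */
    uint8_t *exp = img + 0x200;
    wr32(exp + 20, 2); wr32(exp + 24, 2);               /* NumberOfFunctions, NumberOfNames */
    wr32(exp + 28, 0x240);                              /* AddressOfFunctions */
    wr32(exp + 32, 0x250);                              /* AddressOfNames */
    wr32(exp + 36, 0x260);                              /* AddressOfNameOrdinals */
    wr32(img + 0x240, 0x1000); wr32(img + 0x244, 0x2000);   /* function RVAs */
    wr32(img + 0x250, 0x280);  wr32(img + 0x254, 0x290);    /* name RVAs */
    wr16(img + 0x260, 1);      wr16(img + 0x262, 0);        /* deliberately swapped */
    strcpy((char *)img + 0x280, "Alpha");               /* maps to functions[1] */
    strcpy((char *)img + 0x290, "GetProcAddress");      /* maps to functions[0] */
}
```

The ordinal table in the demo image is deliberately swapped, so the name at
index 1 maps to function slot 0. Using the name index directly on
AddressOfFunctions would return the wrong RVA, which is why step 4 goes
through AddressOfNameOrdinals.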
──[ 4. Donut — PE to Shellcode ]──
Writing the PEB-walking and export-parsing stubs by hand in assembly is tedious.
The open-source tool Donut (github.com/TheWover/donut) does the whole thing:
feed it a PE (EXE or DLL) and it wraps it in a position-independent shellcode
loader.
./donut.exe -i "rev_shell.exe" -a 3 -f 3 -o payload.c
-i input file (PE)
-a 3 architecture: x86+amd64 dual-mode (use -a 2 for x64 only)
-f 3 output format: C array
Flag values are those listed in the Donut v1.0 usage text and may differ in
other releases.
Donut embeds the PE, can compress it, handles relocation, resolves imports at
runtime, and calls the entry point, all in a self-contained shellcode blob.
──[ 5. Testing a Shellcode ]──
Minimal in-process test harness:
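    #include <windows.h>

    /* Shellcode bytes go here, e.g. the C array emitted by Donut. */
    unsigned char buf[] = { 0xC3 };   /* placeholder: a lone RET */

    int main(void) {
        // Allocate RWX memory
        void *mem = VirtualAlloc(
            NULL,
            sizeof(buf),
            MEM_COMMIT | MEM_RESERVE,
            PAGE_EXECUTE_READWRITE
        );
        if (mem == NULL)
            return 1;   // allocation failed
        // Copy shellcode
        RtlMoveMemory(mem, buf, sizeof(buf));
        // Execute as a function pointer
        ((void (*)(void))mem)();
        return 0;
    }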
buf[] holds the raw shellcode bytes (e.g. from Donut).
Simple loader: alloc an RWX page, copy bytes, call into it. In a real injection
scenario you'd do the same in a remote process: VirtualAllocEx +
WriteProcessMemory + CreateRemoteThread.