Building WazabiEDR — Intro
~ lululufr
CONTENTS
  0  what this series is
  1  the system in one diagram
  2  the stack — language per layer
  3  the parts
  4  three load-bearing decisions
  5  where to start

──[ 0. What This Series Is ]──

WazabiEDR is a working endpoint detection and response implementation, written
end to end. (*EDR*: a class of host-based security product that observes
process, file, registry and network activity from the kernel, forwards it to a
backend, and exposes alerts to an operator.) The series walks through it the way
you'd build it: kernel driver first, agent on top, server downstream, plugin
protocol last.

The code is real and runs. It is not feature-complete against any commercial
EDR: there is no detection rule engine yet, no console UI worth the name, no
WHQL-signed driver. What it does have is a working ingestion pipeline from a
KMDF driver through a Rust user-mode agent into a FastAPI server with three
backing stores, plus a plugin protocol with a reference plugin that bridges
Windows Defender. That's enough to teach the load-bearing decisions, and enough
to extend.

By the end of the series the reader should be able to read the seven repos at
lululufr/WazabiEDR_* and understand why each piece looks the way it does, not
just what it does.

──[ 1. The System in One Diagram ]──
flowchart TD
    subgraph EP["Endpoint (Windows host)"]
        direction TB
        DRV["Kernel driver<br/>(WazabiEDR_Driver, Rust + KMDF)"]
        AGT["Agent<br/>(WazabiEDR_Agent, Rust user-mode)"]
        PLG["Plugins<br/>(Defender Bridge, …)"]
        SDK["Plugin SDK<br/>(WazabiEDR_PluginSDK)"]
        DRV -- "IOCTL_GET_EVENT<br/>(inverted call)" --> AGT
        PLG -- "named pipe<br/>JSON length-prefixed" --> AGT
        SDK -. "linked into" .-> PLG
    end

    subgraph SRV["Server (Linux, Docker Compose)"]
        direction TB
        API["FastAPI<br/>(WazabiEDR_Server)"]
        PG[("PostgreSQL<br/>state & config")]
        OS[("OpenSearch<br/>events & alerts")]
        RD[("Redis<br/>cache & commands")]
        API --- PG
        API --- OS
        API --- RD
    end

    CON["Console (human admin)"]

    AGT -- "HTTPS:<br/>enroll · heartbeat · logs · alerts" --> API
    CON -- "HTTPS:<br/>auth · endpoints · rules" --> API

Seven repos make up the codebase. Five are on the diagram: driver, agent, plugin
SDK, an example plugin, the server. Two are not: `WazabiEDR_Utils` ships the
operator-side CLI for plugin enrolment (Part 9), and `WazabiEDR_Doc` carries
cross-repo documentation. The split into seven is not aesthetic: each repo is
one deployment unit with its own release cadence, build pipeline, and ACL story
(*ACL, Access Control List*: the per-object permission descriptor that governs
which principals can act on a Windows object).
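
The diagram labels the plugin-to-agent link as length-prefixed JSON over a named
pipe. As a rough illustration of what that kind of framing involves, here is a
minimal sketch that works over any `Read`/`Write` stream. The 4-byte
little-endian prefix, the `MAX_FRAME_LEN` cap, and the `write_frame` /
`read_frame` names are assumptions made for this sketch, not the protocol's
actual layout, which Part 7 covers.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound on one frame, so a corrupt prefix cannot trigger a
/// huge allocation. The real limit, if any, is not stated in this post.
const MAX_FRAME_LEN: u32 = 1 << 20;

/// Write one frame: a 4-byte little-endian length, then the JSON bytes.
fn write_frame<W: Write>(w: &mut W, json: &str) -> io::Result<()> {
    if json.len() > MAX_FRAME_LEN as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = json.len() as u32;
    w.write_all(&len.to_le_bytes())?;
    w.write_all(json.as_bytes())
}

/// Read one frame and return its body as text. JSON parsing is left to the
/// caller. Fails with `UnexpectedEof` if the stream ends inside a frame.
fn read_frame<R: Read>(r: &mut R) -> io::Result<String> {
    let mut prefix = [0u8; 4];
    r.read_exact(&mut prefix)?;
    let len = u32::from_le_bytes(prefix);
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut body = vec![0u8; len as usize];
    r.read_exact(&mut body)?;
    String::from_utf8(body).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because both helpers are generic over `Read` and `Write`, the same code can be
exercised against an in-memory `Vec<u8>` and `std::io::Cursor` before it is
pointed at a pipe handle.
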
──[ 2. The Stack — Language Per Layer ]──

The kernel-mode driver, the user-mode agent, the plugin SDK, and the operator
CLI are written in Rust. The server is in Python.

The driver runs in ring 0, where every panic is a bug check (the Blue Screen),
and operates against the WDK headers via `wdk-sys` bindings, with KMDF
(*Kernel-Mode Driver Framework*: Microsoft's higher-level layer over the raw WDM
driver model that handles object lifetime and IRP cancellation for you) on top.
The agent runs as a Windows service in user mode, exclusively against
`windows-sys` for the Win32 surface it needs.

The server is a FastAPI application on Python 3.12 with async SQLAlchemy 2
(against PostgreSQL), async OpenSearch (a fork of Elasticsearch used here as the
firehose store for events and alerts), and async Redis. The ingestion path is
bottlenecked by I/O, not memory layout; Python's productivity wins this layer
cleanly.

Rust is not chosen for novelty or for signal. It is chosen because the driver
and the agent both have correctness requirements that map well to what Rust
offers: no garbage collector to pause `IRP` completion paths, ownership of pool
allocations enforced through the type system, and no unwinding across kernel
callbacks.

──[ 3. The Parts ]──
   Part 0  ──  Intro (this post)
   Part 1  ──  Driver — DriverEntry, DriverUnload, dispatch table
   Part 2  ──  Driver — the five kernel callbacks
                (process, image, registry, thread, handle access)
   Part 3  ──  Driver — wire format (repr(C, packed))
   Part 4  ──  Driver — ring buffer + inverted-call IOCTL
   Part 5  ──  Agent — pump, spool, shipper
   Part 6  ──  Server — FastAPI + Postgres + OpenSearch + Redis
   Part 7  ──  Plugins — named pipe, handshake, framing
   Part 8  ──  Plugin SDK — trait Plugin, EventSink, Runner
   Part 9  ──  Manifests — wedr-plugin, ACL, SHA-256 via BCrypt CNG
   Part 10 ──  Defender Bridge — EvtSubscribe on the Defender channel
   Part 11 ──  Packaging — install layout, what hot-reloads

Each part can be read on its own. They are sequenced so the implementation can
be built incrementally: every part adds code that compiles against the previous
part's code without rework.

──[ 4. Three Load-Bearing Decisions ]──

Three design decisions shape the entire codebase and surface in every part.
Worth stating up front:

**Observe-only at the kernel.** The driver registers all five callbacks as
*preoperation* callbacks (*preoperation callback*: the callback variant that
runs before the kernel commits the operation, with the ability to alter or
refuse it), but it never alters or blocks. Every callback returns the equivalent
of "allow, no changes". A future blocking mode would slot in at the same call
sites, but the current scope is pure telemetry.

The rationale is operational: a blocking EDR can take a machine down if its
policy is wrong. A telemetry-only EDR can be wrong in ways that produce noise
but not outage. We will earn the right to block by being correct as observers
first.

**Inverted call IPC.** The driver does not push events to the agent. The agent
issues a `DeviceIoControl` IOCTL (`IOCTL_WEDR_GET_EVENT`) and the driver
completes it with one event: synchronously if an event is queued, by parking
the IRP otherwise (*IRP, I/O Request Packet*: the structure used to carry an I/O
operation through the Windows I/O stack). When a producer (callback) generates
an event, it either copies straight into the parked IRP's output buffer and
completes it, or pushes onto the ring buffer.

The advantage is that the driver owns no thread. Thread creation in
`DriverEntry` (`PsCreateSystemThread`) works but adds explicit shutdown
signalling, IRQL-respecting wait primitives, and ownership of the
producer-consumer queue. The inverted-call design folds all of that into the I/O
manager's existing IRP lifecycle.

**Bounded ring buffer with drop accounting.** Producers cannot block (a kernel
callback that waits for queue space is a kernel deadlock), so the ring is
bounded (4096 slots) and full means *evict oldest*. A counter (`DROP_COUNT`)
records evictions; the next successfully delivered event carries the count in
its header, and the agent surfaces the gap.

These three decisions are the answer to most "why is X done this way" questions
in subsequent parts.
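
The inverted-call and ring-buffer decisions interact, so it can help to see them
in one place. What follows is a user-mode model, not driver code: `parked`
stands in for a pended IRP, a `VecDeque` stands in for the ring, and `Event`,
`Delivered`, `EventQueue` and their fields are hypothetical names chosen for
this sketch. A real driver would also need locking and IRP cancellation
handling, both omitted here.

```rust
use std::collections::VecDeque;

/// Hypothetical event payload; the real wire format is the subject of Part 3.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    id: u64,
}

/// What the consumer receives: the event, plus how many events were evicted
/// since the previous successful delivery.
#[derive(Debug, PartialEq)]
struct Delivered {
    dropped_before: u32,
    event: Event,
}

/// Model of the driver-side queue with a single outstanding request.
struct EventQueue {
    ring: VecDeque<Event>,
    capacity: usize,
    drop_count: u32,
    parked: bool, // true while a "get event" request is waiting for data
}

impl EventQueue {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Self { ring: VecDeque::with_capacity(capacity), capacity, drop_count: 0, parked: false }
    }

    /// Producer side (a kernel callback in the real driver). Never waits.
    /// Returns `Some` when the event completes a parked request directly.
    fn produce(&mut self, event: Event) -> Option<Delivered> {
        if self.parked {
            self.parked = false;
            return Some(self.stamp(event));
        }
        if self.ring.len() == self.capacity {
            self.ring.pop_front(); // full means evict oldest
            self.drop_count = self.drop_count.saturating_add(1);
        }
        self.ring.push_back(event);
        None
    }

    /// Consumer side (the agent's IOCTL). Completes at once if an event is
    /// queued; otherwise parks the request and returns `None`.
    fn request(&mut self) -> Option<Delivered> {
        match self.ring.pop_front() {
            Some(event) => Some(self.stamp(event)),
            None => {
                self.parked = true;
                None
            }
        }
    }

    /// Attach the pending drop count to an outgoing event and reset it.
    fn stamp(&mut self, event: Event) -> Delivered {
        let dropped_before = std::mem::take(&mut self.drop_count);
        Delivered { dropped_before, event }
    }
}
```

With a 2-slot queue and no request parked, producing events 2, 3 and 4 evicts
event 2; the next `request` returns event 3 with `dropped_before == 1`, and the
one after returns event 4 with `dropped_before == 0`. That reset-on-delivery
behaviour is what lets the consumer report a gap exactly once.
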
──[ 5. Where to Start ]──

To build along:
    - A Windows 11 lab machine or VM with the WDK installed and
      test-signing enabled (see windows-internals/dev-setup
      in this site's older volumes for the exact steps).
    - Clone the seven repos under lululufr/WazabiEDR_*.
    - Each repo has an ARCHITECTURE.md (in French) that documents
      the corresponding code in depth. This series is the
      narrative tour across the seven; the per-repo doc is the
      reference.

Part 1 starts with the driver skeleton: a KMDF driver in Rust that loads, 
exposes the device, and does nothing else. The four sections it covers — Cargo
configuration, `no_std` + allocator + panic plumbing, `DriverEntry`,
`DriverUnload` — establish the unwind discipline that every later registration
relies on.