CONTENTS
0 the agent's contract
1 module layout
2 open and pump
3 buffer growth on demand
4 the local spool
5 the shipper
6 the plugin server (forward reference)
7 shutdown semantics
──[ 0. The Agent's Contract ]──
The agent is a Windows service (a long-running user-mode process managed by the
Service Control Manager — *SCM*: the Windows component that starts services at
boot, restarts them on failure, and gives them their own session-0 environment)
that sits between the kernel driver and the server. Its surface obligations are:
1. Open \\.\WazabiEDR, issue IOCTL_WEDR_GET_EVENT in a tight
loop, and decode every event the driver returns.
2. Write every decoded event as a JSON line to a local
persistent spool (a queue-like sequence of files on disk).
3. Periodically seal closed spool segments into compressed
batches and ship them to a remote server over HTTPS.
4. Host a named-pipe server through which plugins (Part 7)
can submit their own events into the same pipeline.
5. Provide a clean shutdown path that flushes everything in
flight when the service is stopped or the host reboots.
The agent runs as `LOCAL_SYSTEM` because it needs to open the driver device —
whose ACL grants access only to `SYSTEM` — and because it reads the plugin
manifest directory whose write access is restricted to Administrators.
──[ 1. Module Layout ]──
WazabiEDR_Agent/
└── src/
    ├── main.rs          wiring
    ├── config.rs        parses %ProgramData%\WazabiEDR\agent.json
    ├── shutdown.rs      Ctrl+C handler → SHUTDOWN flag
    ├── ipc/             driver-side IPC
    │   ├── device.rs    open + pump loop
    │   ├── events.rs    mirror of the wire-format structs
    │   ├── parser.rs    bytes → typed event
    │   └── json.rs      typed event → NDJSON line
    ├── spool/           on-disk persistence
    │   ├── writer.rs    thread owning the active file
    │   └── file.rs      per-file rotation
    ├── shipper/         HTTPS upload
    │   ├── run.rs       shipper thread
    │   └── secret.rs    HMAC + bearer credentials
    ├── plugin/          named-pipe server (Part 7)
    └── util/
Each top-level module owns one responsibility. `main.rs` is configuration and
wiring only — it parses the config, opens the driver, spawns the spool writer
and shipper threads, spawns the plugin server, then enters the pump loop on the
main thread.
──[ 2. Open and Pump ]──
Opening the device is a routine `CreateFileW` with one detail worth flagging:
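// Win32 items (CreateFileW, GetLastError, HANDLE, INVALID_HANDLE_VALUE,
// GENERIC_READ, FILE_SHARE_*, OPEN_EXISTING) come from the agent's Win32
// bindings crate; to_wide_nul is a local helper.
use std::{io, ptr};

pub fn open_device() -> io::Result<HANDLE> {
    // Win32 name of the driver's device.
    let path = to_wide_nul(r"\\.\WazabiEDR");
    let handle = unsafe {
        CreateFileW(
            path.as_ptr(),
            GENERIC_READ,
            FILE_SHARE_READ | FILE_SHARE_WRITE,
            ptr::null(),     // default security attributes
            OPEN_EXISTING,   // the device must already exist
            0,               // no flags: synchronous I/O
            ptr::null_mut(), // no template file
        )
    };
    if handle == INVALID_HANDLE_VALUE {
        // Snapshot the error before any other Win32 call can overwrite it.
        let err = unsafe { GetLastError() };
        return Err(io::Error::from_raw_os_error(err as i32));
    }
    Ok(handle)
}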
`GetLastError` is read *immediately* after the failure check, before any other
Win32 call. The value is per-thread, but many Win32 functions overwrite it,
including some that succeed, so a single intervening allocation, format call,
or logger call can replace it with an unrelated code. Treating it as a fragile
resource and snapshotting it at the first opportunity is the standard discipline.
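The snippet calls a `to_wide_nul` helper that the excerpt does not show. A minimal sketch of what it plausibly does, assuming it only has to turn a Rust `&str` into the NUL-terminated UTF-16 buffer that `CreateFileW` expects (the body is an illustration, not the agent's code):

```rust
use std::iter::once;

/// Hypothetical helper: encode `s` as UTF-16 and append the terminating NUL
/// that wide-character Win32 APIs such as `CreateFileW` require.
fn to_wide_nul(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(once(0)).collect()
}

fn main() {
    let path = to_wide_nul(r"\\.\WazabiEDR");
    // 13 UTF-16 code units for the path plus one terminating NUL.
    assert_eq!(path.len(), 14);
    assert_eq!(path.last(), Some(&0));
}
```

Keep the returned `Vec` alive for the whole `CreateFileW` call, since `path.as_ptr()` borrows from it; binding it to `path`, as `open_device` does, takes care of that.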
The pump loop is short:
const IOCTL_WEDR_GET_EVENT: u32 = 0x0022_6000; // must match the driver's definition
const INITIAL_BUF: usize = 4096;

let mut buf = vec![0u8; INITIAL_BUF];
while !SHUTDOWN.load(Ordering::Acquire) {
    let mut returned: u32 = 0;
    let ok = unsafe {
        DeviceIoControl(
            handle,
            IOCTL_WEDR_GET_EVENT,
            ptr::null(), 0,                               // no input buffer
            buf.as_mut_ptr() as *mut _, buf.len() as u32, // output buffer
            &mut returned, ptr::null_mut(),               // synchronous: no OVERLAPPED
        )
    };
    if ok == 0 {
        // Capture the error first; handling is covered in the next section.
        let err = unsafe { GetLastError() };
        // ...
        continue;
    }
    let payload = &buf[..returned as usize];
    // spool + (optional) print
}
The IOCTL is blocking by design — Part 4's `STATUS_PENDING` mechanism is what
allows this call to sit indefinitely until the driver has an event. No
additional wait primitive is needed in user mode; no select, no timer, no
overlapped I/O. The blocking syscall is the wait.
The IOCTL code (`0x0022_6000`) is duplicated from the driver, with a comment
pointing at the source. The duplication is deliberate: sharing the constant
through a common crate would drag the driver's `wdk-sys` transitive
dependencies into the agent build, while duplicating one `u32` costs next to
nothing.
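Since the literal is duplicated by hand, it helps to see which fields it encodes. The sketch below mirrors the bit layout of the WDK `CTL_CODE` macro and shows one decomposition that reproduces `0x0022_6000`: device type `FILE_DEVICE_UNKNOWN`, function `0x800`, `METHOD_BUFFERED`, `FILE_READ_ACCESS`. That decomposition is derived from the bit layout, not taken from the driver source; the `FILE_READ_ACCESS` field is at least consistent with the agent opening the device with `GENERIC_READ`.

```rust
/// Mirrors the WDK `CTL_CODE` macro:
/// (DeviceType << 16) | (Access << 14) | (Function << 2) | Method.
const fn ctl_code(device_type: u32, function: u32, method: u32, access: u32) -> u32 {
    (device_type << 16) | (access << 14) | (function << 2) | method
}

const FILE_DEVICE_UNKNOWN: u32 = 0x0000_0022;
const METHOD_BUFFERED: u32 = 0;
const FILE_READ_ACCESS: u32 = 0x0001;

// Inferred decomposition, not copied from the driver.
const IOCTL_WEDR_GET_EVENT: u32 =
    ctl_code(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_READ_ACCESS);

fn main() {
    assert_eq!(IOCTL_WEDR_GET_EVENT, 0x0022_6000);
    // Decode the fields back out of the duplicated literal.
    let code = 0x0022_6000u32;
    assert_eq!(code >> 16, FILE_DEVICE_UNKNOWN);
    assert_eq!((code >> 14) & 0x3, FILE_READ_ACCESS);
    assert_eq!((code >> 2) & 0xFFF, 0x800);
    assert_eq!(code & 0x3, METHOD_BUFFERED);
}
```

Writing the agent's constant as a `ctl_code(...)` call rather than a bare literal keeps the duplication but makes a mismatch with the driver's `CTL_CODE` arguments easier to spot in review.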
──[ 3. Buffer Growth on Demand ]──
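If the driver pops an event larger than the agent's current buffer, the IOCTL
completes with `STATUS_BUFFER_TOO_SMALL` (mapped to Win32
`ERROR_INSUFFICIENT_BUFFER`), and the driver sets `Information` to the required
size. The agent should not count on seeing that size in `returned`, though: the
`DeviceIoControl` documentation states that `lpBytesReturned` is zero when the
output buffer is too small, so the growth rule keeps a doubling term that makes
progress either way: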
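// In the pump loop's failure branch; `err` is the GetLastError() snapshot.
if err == ERROR_INSUFFICIENT_BUFFER {
    // Take the reported size or double the capacity, whichever is larger.
    let needed = returned.max(buf.len() as u32 * 2) as usize;
    eprintln!("[agent] buffer too small, growing {} → {}", buf.len(), needed);
    buf.resize(needed, 0);
    continue; // retry the IOCTL with the larger buffer
}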
The growth rule is "at least the size the driver asked for, and at least double
the current capacity". The doubling avoids reallocating on every iteration when
a run of slightly larger events follows. On the regular ring-pop path the event
is not lost: it stays in the queue and the agent retrieves it on the next
iteration. The exception, per Part 4, is the parked-IRP path, where a
buffer-too-small response drops the current event and increments `drop_count`.
In practice the initial 4 KiB comfortably exceeds the largest current event
(`ProcessCreateEvent` at roughly 1 KiB), so this branch effectively never fires.
It is present for forward compatibility with future event types that may carry
larger inline payloads.
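The growth rule can be isolated into a pure function, which makes it easy to test and shows that the doubling term still guarantees progress if `returned` comes back as 0. `next_buf_len` is a hypothetical name; the arithmetic is the rule from the snippet, done in `usize` with a saturating multiply instead of the snippet's `u32` product.

```rust
/// Hypothetical helper extracted from the growth branch: the larger of the
/// size the driver reported and double the current capacity.
fn next_buf_len(reported: u32, current: usize) -> usize {
    (reported as usize).max(current.saturating_mul(2))
}

fn main() {
    let mut buf = vec![0u8; 4096];
    // A reported size of 0 still doubles, so the loop always makes progress.
    buf.resize(next_buf_len(0, buf.len()), 0);
    assert_eq!(buf.len(), 8192);
    // A reported size above double the capacity wins outright.
    buf.resize(next_buf_len(20_000, buf.len()), 0);
    assert_eq!(buf.len(), 20_000);
}
```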
──[ 4. The Local Spool ]──
Every received event flows through a single pipeline:
driver bytes
     │  parse_and_decode
     ▼
typed event
     │  encode_kernel_event
     ▼
NDJSON line
     │
     ▼
spool writer thread
     │
     ▼
active.ndjson (uncompressed)
     │  rotated when:
     │    · file size > max_bytes_per_file
     │    · file age  > max_age
     ▼
batch-NNN.ndjson
     │  zstd
     ▼
batch-NNN.ndjson.zst
     │
     ▼
shipper picks up
The format is NDJSON (*Newline-Delimited JSON*: one independent JSON object per
line, the de facto format for log shipping). It is uncompressed during the
active phase so that each line is atomically appended and recoverable after a
crash (a torn last line is detectable and skippable). Compression happens at
rotation — the closed `batch-NNN.ndjson` is run through zstd (*zstd*: a fast
lossless compressor with adjustable speed/ratio levels) into
`batch-NNN.ndjson.zst`.
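The torn-last-line property is what makes the uncompressed active file safe to reopen after a crash. A minimal sketch of the recovery read, assuming every record is written as exactly one line ending in `\n` (so any bytes after the final newline are a partial append). `complete_lines` is a hypothetical helper; a real reader would also parse each line as JSON.

```rust
/// Return the complete NDJSON lines in `data`, ignoring a torn final line
/// (bytes after the last '\n') left behind by a crash mid-append.
fn complete_lines(data: &[u8]) -> Vec<&[u8]> {
    let end = match data.iter().rposition(|&b| b == b'\n') {
        Some(i) => i + 1,
        None => return Vec::new(), // not even one complete line
    };
    data[..end]
        .split(|&b| b == b'\n')
        .filter(|line| !line.is_empty()) // drops the empty tail after the last '\n'
        .collect()
}

fn main() {
    let spool: &[u8] = b"{\"id\":1}\n{\"id\":2}\n{\"id\":"; // last line torn
    let lines = complete_lines(spool);
    assert_eq!(lines.len(), 2);
    assert_eq!(lines[1], &b"{\"id\":2}"[..]);
}
```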
Three limits govern the spool; the first two trigger rotation and the third
caps retention:
   max_bytes_per_file   rotate when active.ndjson grows past this size
   max_age              rotate when active.ndjson reaches this age (so an
                        idle host doesn't keep events in the active file
                        indefinitely)
   max_total_bytes      directory-level cap; the oldest batches are removed
                        when the spool root exceeds this
The third setting matters on hosts where no shipper is configured, since the
directory would otherwise grow without bound. The policy is to lose old
telemetry rather than fill the disk: the agent always survives, and the operator
can re-prioritise.
The spool writer is a dedicated thread fed by a bounded mpsc channel (*mpsc,
multi-producer single-consumer channel*: the standard Rust inter-thread queue,
here used to decouple the pump from disk I/O). The pump submits with
`try_submit`, which never blocks: if the channel is full (the writer is falling
behind), the event is dropped, and only the first such drop in the agent's
lifetime produces a stderr warning. The rule that the pump loop never blocks on
disk I/O is what lets the agent keep up with the driver's worst-case event rate.
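A bounded `std::sync::mpsc::sync_channel` gives this non-blocking behaviour through `try_send`, which fails with `TrySendError::Full` instead of waiting. The sketch assumes the agent's `try_submit` wraps something like it; `SpoolTx` and the warning text are hypothetical, and the warn-once latch is an `AtomicBool` swapped on the first drop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

static DROP_WARNED: AtomicBool = AtomicBool::new(false);

/// Hypothetical stand-in for the pump's handle on the spool writer.
struct SpoolTx(SyncSender<String>);

impl SpoolTx {
    /// Never blocks. Returns false if the line was dropped.
    fn try_submit(&self, line: String) -> bool {
        match self.0.try_send(line) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                // Only the first drop in the process lifetime is reported.
                if !DROP_WARNED.swap(true, Ordering::Relaxed) {
                    eprintln!("[agent] spool channel full, dropping events");
                }
                false
            }
            Err(TrySendError::Disconnected(_)) => false, // writer thread has exited
        }
    }
}

fn main() {
    let (tx, rx): (SyncSender<String>, Receiver<String>) = sync_channel(2);
    let spool = SpoolTx(tx);
    assert!(spool.try_submit("a".into()));
    assert!(spool.try_submit("b".into()));
    // Nobody drains `rx`, so the third submit is dropped instead of blocking.
    assert!(!spool.try_submit("c".into()));
    assert_eq!(rx.try_iter().count(), 2);
}
```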
──[ 5. The Shipper ]──
If the `shipper` section is present in `agent.json`, a second thread watches the
spool directories for `.zst` files:
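let shipper = match cfg.shipper {
    Some(sc) => {
        // Watch both the kernel-event spool and the plugin spool.
        let dirs = vec![cfg.agent.spool_dir.clone(), plugin_spool_dir.clone()];
        spawn_shipper(sc, dirs).ok() // a spawn error degrades to None
    }
    None => None, // spool-only mode
};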
Each batch is signed with an HMAC (*HMAC, Hash-based Message Authentication
Code*: a symmetric-key construction over a hash function used here to bind the
batch content to a per-agent secret) and POSTed to the server. Outcomes:
   HTTP 2xx             batch deleted from disk
   HTTP 5xx, network    batch left in place; next pass retries
   HTTP 4xx             batch renamed to `.poisoned` and skipped forever
                        (the server has rejected its content; retrying
                        would loop)
   HTTP 401 / 403       same as 4xx, but additionally pause the shipper
                        for backoff (the operator likely needs to refresh
                        credentials)
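The outcome table maps naturally onto a `match` over the status code. In this sketch `Outcome`, `classify`, and the use of `None` for a network failure are illustrative conventions, not the agent's code. The one subtlety is ordering: 401 and 403 must be matched before the general 4xx range. Status classes the table doesn't cover (1xx, 3xx) are treated as retryable here, which is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    Delete,         // 2xx: batch accepted, remove from disk
    Retry,          // 5xx or network error: leave in place
    Poison,         // other 4xx: rename to .poisoned, never retry
    PoisonAndPause, // 401/403: poison and back off
}

/// `None` stands for a network-level failure (no HTTP response at all).
fn classify(status: Option<u16>) -> Outcome {
    match status {
        Some(200..=299) => Outcome::Delete,
        // Must precede the general 4xx arm, or it would never match.
        Some(401) | Some(403) => Outcome::PoisonAndPause,
        Some(400..=499) => Outcome::Poison,
        Some(500..=599) | None => Outcome::Retry,
        // Assumption: classes the table doesn't cover are retried.
        Some(_) => Outcome::Retry,
    }
}

fn main() {
    assert_eq!(classify(Some(204)), Outcome::Delete);
    assert_eq!(classify(None), Outcome::Retry);
    assert_eq!(classify(Some(422)), Outcome::Poison);
    assert_eq!(classify(Some(403)), Outcome::PoisonAndPause);
}
```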
The agent runs functionally without a shipper. Air-gapped installations leave
the `shipper` section out of `agent.json` and produce the same `.zst` batches;
an operator can copy them off-host through any side-channel. The shipper is
convenience, not infrastructure.
──[ 6. The Plugin Server (Forward Reference) ]──
`plugin::spawn_server` opens the named pipe `\\.\pipe\WazabiEDR_plugin` and accepts
plugin connections. Each successful handshake yields a session whose events flow
through the same spool/shipper pipeline as kernel events, into a separate
`plugins/` subdirectory.
The protocol, the SDK plugins are written against, and the manifest mechanism
that controls *which* plugins can complete the handshake are the subject of
Parts 7, 8, and 9 respectively.
──[ 7. Shutdown Semantics ]──
`shutdown.rs` installs a Windows console control handler (via
`SetConsoleCtrlHandler`) and a Service Control Manager stop-handler that both
flip the same atomic flag:
pub static SHUTDOWN: AtomicBool = AtomicBool::new(false);
Every long-running loop in the agent checks the flag at its natural iteration
boundary:
Pump loop — checked before each IOCTL issue
Spool writer — checked between channel drains
Shipper — checked between directory polls
Plugin server — checked before each pipe accept
No broadcast channel, no select primitive, no cross-thread condvar pyramid. One
atomic, polled at well-defined points. The cost of a polled-atomic design is
bounded latency at shutdown (worst case: the duration of one in-flight blocking
syscall), and that cost is acceptable for a process stopped once per uptime.
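A minimal sketch of the polled-flag pattern, with a sleep standing in for one unit of blocking work. The handler side stores `true` with `Release`, each loop loads with `Acquire` at its iteration boundary, and exit latency is at most one unit of work. `worker` is hypothetical; in the agent the unit would be one IOCTL, one channel drain, one directory poll, or one pipe accept.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

pub static SHUTDOWN: AtomicBool = AtomicBool::new(false);

/// Check the flag at each iteration boundary, then do one bounded unit of work.
fn worker() -> u32 {
    let mut iterations = 0;
    while !SHUTDOWN.load(Ordering::Acquire) {
        thread::sleep(Duration::from_millis(5)); // stand-in for one blocking call
        iterations += 1;
    }
    iterations
}

fn main() {
    let handle = thread::spawn(worker);
    thread::sleep(Duration::from_millis(20));
    // What the console control handler or SCM stop-handler would do.
    SHUTDOWN.store(true, Ordering::Release);
    // The worker notices the flag after at most one more unit of work.
    let n = handle.join().unwrap();
    println!("worker exited after {n} iterations");
}
```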
The clean-shutdown sequence is:
1. SHUTDOWN flag flips.
2. Pump loop wakes — either because the next IOCTL completes,
or because dispatch_cleanup in the driver cancels the
pending IRP (Part 4) when the agent's handle closes.
3. Spool writer drains its in-flight queue and closes the
active file (rotating and compressing on the way out).
4. Shipper finishes its current upload or aborts on network
timeout, then exits.
5. Plugin server sends a goodbye to each active session and
closes the pipe server.
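Step 3 relies on the writer noticing the flag even when no events arrive. One way to do that, sketched under assumptions because the agent's writer loop isn't shown here: wait on the channel with `recv_timeout` so the flag is rechecked at least every 100 ms, then drain anything still queued with the non-blocking `try_iter` before flushing. `writer_loop` is a hypothetical name, and rotation and zstd compression of the active file are left out.

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, RecvTimeoutError};
use std::time::Duration;

static SHUTDOWN: AtomicBool = AtomicBool::new(false);

/// Write NDJSON lines until shutdown, then drain whatever is still queued.
fn writer_loop<W: Write>(rx: Receiver<String>, out: &mut W) -> io::Result<()> {
    while !SHUTDOWN.load(Ordering::Acquire) {
        match rx.recv_timeout(Duration::from_millis(100)) {
            Ok(line) => writeln!(out, "{line}")?,
            Err(RecvTimeoutError::Timeout) => continue, // recheck the flag
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    // Drain the in-flight queue without blocking.
    for line in rx.try_iter() {
        writeln!(out, "{line}")?;
    }
    out.flush()
}

fn main() -> io::Result<()> {
    let (tx, rx) = sync_channel::<String>(16);
    for i in 0..3 {
        tx.send(format!("{{\"seq\":{i}}}")).unwrap();
    }
    // The flag flips before the writer runs, as if shutdown raced ahead of it.
    SHUTDOWN.store(true, Ordering::Release);
    let mut out: Vec<u8> = Vec::new();
    writer_loop(rx, &mut out)?;
    // Nothing queued before shutdown was lost.
    assert_eq!(String::from_utf8(out).unwrap().lines().count(), 3);
    Ok(())
}
```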
Next post: the server side. FastAPI, PostgreSQL, OpenSearch, Redis, the ingest
path, and the reasons each store is where it is.