CONTENTS
0 scope of the boundary
1 why not serde
2 the event header
3 the type discriminant
4 fixed-size buffers and truncation policy
5 utf-16 throughout
6 the seven event types
7 versioning
──[ 0. Scope of the Boundary ]──
The wire format described here lives at one boundary only: between the kernel
driver and the user-mode agent, on the same Windows host. Every byte we write
here is read by an agent process opening `\\.\WazabiEDR` on the same machine, in
the same OS install. Nothing on this wire ever crosses a network.
That constraint dictates almost every decision in this post. There is no need
for the format to be self-describing, evolvable across vendors, or readable
without the matching C/Rust headers. The two sides ship together; on a
mismatched build, refusing to parse is strictly better than guessing. The result
is a fixed-layout, version-stamped binary protocol that exists in two mirror
declarations — one in `WazabiEDR_Driver::events`, one in
`WazabiEDR_Agent::ipc::events` — built to match byte-for-byte.
──[ 1. Why Not Serde ]──
Serde (`serde`: the de facto serialization framework in the Rust ecosystem,
consisting of a format-agnostic core and per-format crates like `serde_json`,
`bincode`, `ciborium`) is the first reflex for any serialization work in Rust.
It does not fit here.
The kernel half of the driver is `no_std` and operates under tight
non-paged-pool budgets. `serde` itself is `no_std`-compatible, but most useful
binary serializers built on it — `bincode`, CBOR — bring transitive dependencies
that fight `no_std` or expect allocator behaviour we don't want on the hot path.
More importantly, the boundary is one process pair on one host. No format
negotiation, no cross-vendor compatibility, no inspection by intermediate hops.
The agent and driver always agree on the wire because they were built and
deployed together. A self-describing format adds bytes (type tags, field names)
and CPU (encoder/decoder loops) without buying anything.
So the wire format is hand-rolled `#[repr(C, packed)]` structs. The driver
writes them field by field into a pool buffer with `ptr::write_unaligned`
(plain `ptr::write` requires an aligned destination, which `packed` fields do
not guarantee); the agent reads them by `ptr::read_unaligned` into mirror
structs. No serializer, no parser, no framing beyond a length-bearing header.
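A minimal sketch of that round trip, under stated assumptions: `DemoRecord`,
`write_record`, and `read_record` are hypothetical names that stand in for the
real event structs and helpers, and the whole struct is written in one call
rather than field by field, purely to keep the example short.

```rust
use core::mem::size_of;
use core::ptr;

// Hypothetical two-field record standing in for a real event struct.
#[repr(C, packed)]
#[derive(Clone, Copy)]
pub struct DemoRecord {
    pub kind: u16,
    pub value: u64, // lands at offset 2, so it is misaligned for a u64
}

/// Producer side: copy the record into the front of `buf`.
/// Returns the number of bytes written, or None if `buf` is too small.
pub fn write_record(buf: &mut [u8], rec: DemoRecord) -> Option<usize> {
    let n = size_of::<DemoRecord>();
    if buf.len() < n {
        return None;
    }
    // SAFETY: the length check guarantees `n` writable bytes, and
    // `write_unaligned` places no alignment requirement on the destination.
    unsafe { ptr::write_unaligned(buf.as_mut_ptr() as *mut DemoRecord, rec) };
    Some(n)
}

/// Consumer side: rebuild the record from the front of `buf`.
pub fn read_record(buf: &[u8]) -> Option<DemoRecord> {
    if buf.len() < size_of::<DemoRecord>() {
        return None;
    }
    // SAFETY: length checked above; every bit pattern is a valid u16 / u64.
    Some(unsafe { ptr::read_unaligned(buf.as_ptr() as *const DemoRecord) })
}
```

Both sides depend only on `size_of::<DemoRecord>()` (10 bytes here) and on the
two declarations matching, which is the property the real mirror structs rely
on.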
──[ 2. The Event Header ]──
Every event on the wire begins with the same 20-byte header:
#[repr(C, packed)]
pub struct EventHeader {
    pub version: u16,
    pub type_: u16,
    pub timestamp: i64,
    pub size: u32,
    pub drop_count: u32,
}
Each field earns its slot:
`version` is the schema version (`EVENT_VERSION = 3`). The agent reads this
first and rejects the entire event on mismatch — it does not attempt to parse
the rest. Old driver against new agent (or vice versa) yields a clean, loud
error rather than silent misinterpretation.
`type_` is the event-type discriminant. It selects which of the seven structs
(see Section 6) the rest of the bytes represent.
`timestamp` is a 64-bit FILETIME-style value: a count of 100-nanosecond ticks
since January 1, 1601 UTC, carried on the wire as a signed 64-bit integer. It
is captured in the producer via `KeQuerySystemTimePrecise`, which offers
sub-microsecond precision. The agent does not re-stamp; the kernel time is
authoritative.
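When the agent needs Unix time for display or export, the conversion is a
fixed offset plus a unit change. A sketch of that arithmetic; the function and
constant names are hypothetical, not taken from the agent source:

```rust
/// 100 ns ticks between 1601-01-01 and 1970-01-01 (both UTC):
/// 11_644_473_600 seconds * 10_000_000 ticks per second.
pub const FILETIME_UNIX_EPOCH_DELTA: i64 = 116_444_736_000_000_000;
pub const TICKS_PER_SECOND: i64 = 10_000_000;

/// Split a FILETIME tick count into (seconds, nanoseconds) relative to the
/// Unix epoch. Euclidean division keeps the nanosecond part non-negative
/// even for timestamps before 1970.
pub fn filetime_to_unix(ticks: i64) -> (i64, u32) {
    let since_unix = ticks - FILETIME_UNIX_EPOCH_DELTA;
    let secs = since_unix.div_euclid(TICKS_PER_SECOND);
    let nanos = (since_unix.rem_euclid(TICKS_PER_SECOND) * 100) as u32;
    (secs, nanos)
}
```

The nanosecond part is always a multiple of 100, since that is the resolution
of the tick count.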
`size` is the total event size in bytes, header included. The agent uses it to
advance to the next event without needing the type.
`drop_count` is the running count of events evicted from the ring buffer since
the last successfully delivered event (Part 2). Zero in steady state; non-zero
values are a gap signal the agent surfaces upstream.
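A sketch of how the agent side might read and validate this header. The struct
and `EVENT_VERSION` mirror the post; `HeaderError` and `parse_header` are
hypothetical names, and the bounds check on `size` is an assumption about what
a careful reader would add, not a description of the shipped agent.

```rust
use core::mem::size_of;
use core::ptr;

pub const EVENT_VERSION: u16 = 3;

#[repr(C, packed)]
#[derive(Clone, Copy)]
pub struct EventHeader {
    pub version: u16,
    pub type_: u16,
    pub timestamp: i64,
    pub size: u32,
    pub drop_count: u32,
}

// Fails the build if the layout drifts from 2 + 2 + 8 + 4 + 4 bytes.
const _: () = assert!(size_of::<EventHeader>() == 20);

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    TooShort,
    VersionMismatch { got: u16 },
    BadSize { size: u32 },
}

/// Read the header at the front of `buf`, rejecting the event on any
/// version mismatch before looking at the payload.
pub fn parse_header(buf: &[u8]) -> Result<EventHeader, HeaderError> {
    if buf.len() < size_of::<EventHeader>() {
        return Err(HeaderError::TooShort);
    }
    // SAFETY: length checked above; all fields are plain integers.
    let header = unsafe { ptr::read_unaligned(buf.as_ptr() as *const EventHeader) };
    // Copy packed fields into locals; references to them are not allowed.
    let (version, size) = (header.version, header.size);
    if version != EVENT_VERSION {
        return Err(HeaderError::VersionMismatch { got: version });
    }
    if (size as usize) < size_of::<EventHeader>() || size as usize > buf.len() {
        return Err(HeaderError::BadSize { size });
    }
    Ok(header)
}
```

The version check comes first on purpose: on a mismatch none of the other
fields can be trusted, `size` included.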
──[ 3. The Type Discriminant ]──
#[repr(u16)]
pub enum EventType {
    ProcessCreate = 1,
    ProcessExit = 2,
    ImageLoad = 3,
    RegistryModify = 4,
    ThreadCreate = 5,
    ThreadExit = 6,
    ProcessHandleAccess = 7,
}
The discriminant values are explicit (`= 1`, `= 2`, …) rather than left to the
compiler. The on-wire value of the discriminant outlives the source code — a
captured event in a debug dump or a journaling layer downstream may be decoded
by a different build than the one that wrote it. Reordering variants in Rust
without specifying values would silently shift the on-wire numbers, breaking
cross-version parsing.
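On the reading side the raw `u16` has to be mapped back to a variant without
trusting it. One way to do that, sketched here as an assumption about the
agent rather than a quote from it, is a `TryFrom<u16>` impl that hands the
unknown value back as the error:

```rust
use core::convert::TryFrom;

#[repr(u16)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EventType {
    ProcessCreate = 1,
    ProcessExit = 2,
    ImageLoad = 3,
    RegistryModify = 4,
    ThreadCreate = 5,
    ThreadExit = 6,
    ProcessHandleAccess = 7,
}

impl TryFrom<u16> for EventType {
    /// The unrecognised raw value, so the caller can log it.
    type Error = u16;

    fn try_from(raw: u16) -> Result<Self, Self::Error> {
        // An explicit match instead of a transmute: an out-of-range
        // discriminant must never be materialised as an `EventType`.
        Ok(match raw {
            1 => EventType::ProcessCreate,
            2 => EventType::ProcessExit,
            3 => EventType::ImageLoad,
            4 => EventType::RegistryModify,
            5 => EventType::ThreadCreate,
            6 => EventType::ThreadExit,
            7 => EventType::ProcessHandleAccess,
            other => return Err(other),
        })
    }
}
```

The producer side needs no helper: `EventType::ImageLoad as u16` yields the
explicit value `3`.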
Two sub-discriminants live inside payloads where a parent type covers multiple
operations:
#[repr(u16)]
pub enum HandleAccessOp {
    Create = 1,    // OB_OPERATION_HANDLE_CREATE
    Duplicate = 2, // OB_OPERATION_HANDLE_DUPLICATE
}
#[repr(u16)]
pub enum RegistryOp {
    SetValue = 1,
    DeleteValue = 2,
    DeleteKey = 3,
    RenameKey = 4,
    CreateKey = 5,
}
Both sit in the event payload, not the header. They make sense only for their
containing event types and would be wasted bytes on the others.
──[ 4. Fixed-Size Buffers and Truncation Policy ]──
Every variable-length field on the wire is a fixed-size buffer paired with a
`len` companion. The `ProcessCreateEvent` is representative:
#[repr(C, packed)]
pub struct ProcessCreateEvent {
    pub header: EventHeader,
    pub process_id: u32,
    pub parent_process_id: u32,
    pub creating_process_id: u32,
    pub image_path: [u16; IMAGE_PATH_MAX], // IMAGE_PATH_MAX = 512
    pub image_path_len: u16,
}
The fixed `[u16; 512]` is 1024 bytes that are partially wasted on every event
with a path shorter than 512 UTF-16 units. The trade is deliberate.
A variable-size event in the kernel half would require a separate allocation
path per event size, multi-step copy logic into the ring buffer, and either
explicit framing or per-event length headers on both sides. The current layout
has one allocation, one copy, and `size_of::<T>()` is statically known on both
producer and consumer.
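The size arithmetic can be pinned down at compile time. A sketch using the
struct definitions from this post; the constant `PROCESS_CREATE_EVENT_SIZE`
and the `const` assertions are illustrative additions, not something the
post's `events.rs` is known to contain.

```rust
use core::mem::{align_of, size_of};

pub const IMAGE_PATH_MAX: usize = 512;

#[repr(C, packed)]
pub struct EventHeader {
    pub version: u16,
    pub type_: u16,
    pub timestamp: i64,
    pub size: u32,
    pub drop_count: u32,
}

#[repr(C, packed)]
pub struct ProcessCreateEvent {
    pub header: EventHeader,
    pub process_id: u32,
    pub parent_process_id: u32,
    pub creating_process_id: u32,
    pub image_path: [u16; IMAGE_PATH_MAX],
    pub image_path_len: u16,
}

// header (20) + three u32 ids (12) + path (512 * 2 = 1024) + length (2)
pub const PROCESS_CREATE_EVENT_SIZE: usize = 20 + 12 + IMAGE_PATH_MAX * 2 + 2;

// Both checks run at build time; a layout change breaks compilation
// instead of silently changing the wire format.
const _: () = assert!(size_of::<ProcessCreateEvent>() == PROCESS_CREATE_EVENT_SIZE);
const _: () = assert!(align_of::<ProcessCreateEvent>() == 1);
```

With `packed` there is no padding, so the total is simply the sum of the field
sizes: 1058 bytes per process-creation event.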
Truncation is uniform across every variable-length field:
- copy up to MAX - 1 units
- record the count in the matching len field
- leave the unused tail of the buffer at zero
The one-unit reservation below `MAX` is the truncation marker convention: when
`len < MAX - 1`, the source fit in full; when `len == MAX - 1`, the source was
at least `MAX - 1` units long and the agent treats the field as truncated. A
source of exactly `MAX - 1` units is therefore reported conservatively as
truncated. The reservation also guarantees that every buffer ends in at least
one zero unit.
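The three steps of the policy can be sketched as one small function.
`copy_truncated` and `hit_limit` are hypothetical helper names, and the sketch
assumes `MAX` is at least 1 and small enough for the length to fit in a `u16`.

```rust
/// Copy `src` into `dst` under the truncation policy and return the value
/// to store in the matching `len` field.
pub fn copy_truncated<const MAX: usize>(src: &[u16], dst: &mut [u16; MAX]) -> u16 {
    // Step 1: copy at most MAX - 1 units.
    let n = src.len().min(MAX - 1);
    dst[..n].copy_from_slice(&src[..n]);
    // Step 3: zero the unused tail, including the reserved last unit.
    dst[n..].fill(0);
    // Step 2: the caller records this count in the `len` field.
    n as u16
}

/// True when the recorded length reached MAX - 1, the marker value.
pub fn hit_limit<const MAX: usize>(len: u16) -> bool {
    len as usize == MAX - 1
}
```

For an 8-unit buffer, a 3-unit source yields `len == 3` with five trailing
zeros, while a 20-unit source yields `len == 7` and sets the marker.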
──[ 5. UTF-16 Throughout ]──
All string fields are `[u16; N]`, never `[u8; N]`. Windows hands us
`UNICODE_STRING` instances — UTF-16 little-endian buffers with byte-length and
no NUL terminator — and any conversion to UTF-8 in the kernel would require:
- a UTF-16 decoder running at potentially elevated IRQL
- a destination buffer whose size depends on the content
- re-encoding on every event
All for the benefit of the agent, which already has a perfectly good UTF-16 →
UTF-8 path in `std`. The driver ships UTF-16 raw, the agent decodes once at the
boundary.
The `len` field counts UTF-16 code units, not bytes. Byte counts would be a
small but recurring source of bugs ("did I divide by 2 here?"), so the field is
in units throughout. The matching agent-side decode bounds itself by the same
unit count.
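A sketch of that agent-side decode. `decode_wire_string` is a hypothetical
name; the choice of `String::from_utf16_lossy` (which replaces unpaired
surrogates with U+FFFD instead of failing) is an assumption, picked because
Windows paths are not guaranteed to be well-formed UTF-16.

```rust
/// Decode the first `len` code units of a fixed-size wire buffer.
pub fn decode_wire_string<const N: usize>(buf: &[u16; N], len: u16) -> String {
    // Clamp to the buffer size: `len` comes off the wire and is not trusted.
    let units = (len as usize).min(N);
    String::from_utf16_lossy(&buf[..units])
}
```

Because the event structs are `packed`, the array has to be copied out of the
struct first (for example `let path = event.image_path;`) before a reference
to it is passed in; borrowing a packed field directly is rejected by the
compiler.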
──[ 6. The Seven Event Types ]──
For completeness, the remaining six structs as they appear in `events.rs`
(`ProcessCreateEvent` is shown in Section 4):
#[repr(C, packed)] pub struct ProcessExitEvent { … pub process_id: u32, }
#[repr(C, packed)] pub struct ImageLoadEvent {
    pub header: EventHeader,
    pub process_id: u32,
    pub image_base: u64, // load address in target / kernel
    pub image_size: u64, // bytes
    pub image_path: [u16; IMAGE_PATH_MAX],
    pub image_path_len: u16,
}
#[repr(C, packed)] pub struct ThreadCreateEvent {
    pub header: EventHeader,
    pub process_id: u32,          // owner
    pub thread_id: u32,
    pub creating_process_id: u32, // != process_id → remote thread
}
#[repr(C, packed)] pub struct ThreadExitEvent { …
    pub process_id: u32,
    pub thread_id: u32,
}
#[repr(C, packed)] pub struct ProcessHandleAccessEvent {
    pub header: EventHeader,
    pub source_process_id: u32,
    pub target_process_id: u32,
    pub desired_access: u32,          // possibly modified by upstream
                                      // OB callbacks
    pub original_desired_access: u32, // as requested by caller
    pub operation: u16,               // HandleAccessOp
}
#[repr(C, packed)] pub struct RegistryEvent {
    pub header: EventHeader,
    pub process_id: u32,
    pub operation: u16,  // RegistryOp
    pub value_type: u32, // REG_SZ / REG_DWORD / …
    pub data_size: u32,  // real (untruncated) size
    pub key_path: [u16; REGISTRY_KEY_PATH_MAX],        // 512
    pub key_path_len: u16,
    pub value_name: [u16; REGISTRY_VALUE_NAME_MAX],    // 128
    pub value_name_len: u16,
    pub data_preview: [u8; REGISTRY_DATA_PREVIEW_MAX], // 256
    pub data_preview_len: u16,
}
`RegistryEvent` is the largest. Two design choices in this set deserve a closer
look.
`data_preview` carries up to 256 bytes of the value being written, captured in
the `RegNtPreSetValueKey` pre-operation callback. `data_size` reports the real,
untruncated length: small `REG_DWORD` / `REG_SZ` writes are captured in full,
and the agent can verify that with `data_size == data_preview_len`. For large
`REG_BINARY` blobs, the agent sees the first 256 bytes plus the real size, which
is enough to flag the write without ballooning per-event memory.
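The completeness check is a single comparison once the two fields are widened
to the same type. A sketch, with `PreviewCoverage` and `preview_coverage` as
hypothetical names:

```rust
// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum PreviewCoverage {
    /// The preview holds the whole value.
    Full,
    /// The preview is a prefix; this many bytes were not captured.
    Partial { missing_bytes: u32 },
}

/// Compare the real value size against the captured preview length.
pub fn preview_coverage(data_size: u32, data_preview_len: u16) -> PreviewCoverage {
    let captured = u32::from(data_preview_len);
    if data_size == captured {
        PreviewCoverage::Full
    } else {
        // saturating_sub guards against a malformed event where the
        // preview length exceeds the reported size.
        PreviewCoverage::Partial { missing_bytes: data_size.saturating_sub(captured) }
    }
}
```

A 4-byte `REG_DWORD` write reports `Full`; a 4096-byte blob with a 256-byte
preview reports 3840 missing bytes.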
`ProcessHandleAccessEvent` carries both the current and the original
`DesiredAccess`. The OB callback chain can have multiple registered drivers, and
any of them — competing EDRs, AV products — may strip bits before our callback
runs. The dual field lets the agent observe that the access surface visible to
us differs from what the caller asked for, which is itself signal.
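Extracting that signal is a bit-mask operation. A sketch: `stripped_access` is
a hypothetical helper, while the three constants are the standard Win32
process access rights with their documented values.

```rust
// Standard Win32 process access rights.
pub const PROCESS_VM_READ: u32 = 0x0010;
pub const PROCESS_VM_WRITE: u32 = 0x0020;
pub const PROCESS_QUERY_INFORMATION: u32 = 0x0400;

/// Bits the caller requested that were no longer present by the time the
/// event was recorded. Zero means nothing was stripped.
pub fn stripped_access(original_desired_access: u32, desired_access: u32) -> u32 {
    original_desired_access & !desired_access
}
```

If a caller asks for read, write, and query access and only the query bit
survives, the result is `PROCESS_VM_READ | PROCESS_VM_WRITE` (`0x30`).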
──[ 7. Versioning ]──
pub const EVENT_VERSION: u16 = 3;
The driver and the agent both consume the constant. The agent rejects any event
whose `header.version` doesn't equal its own; the driver writes its own value.
The bump policy:
- Any change to a field's offset, size, or semantic
  meaning requires a version bump.
- Adding a new EventType does NOT bump the version. The
  agent ignores unknown types: it advances using
  header.size and moves on. Forward-compatible by design.
- Adding a field to an existing struct DOES bump. Even
  if the new field appends, the agent's local size_of::<T>()
  would change, and the producer's would not match.
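The first two rules can be seen together in a reader loop. The sketch below
assumes events arrive back to back in one byte buffer, which the post does not
state; `walk_events`, `WalkError`, and `MAX_KNOWN_TYPE` are hypothetical names.

```rust
use core::mem::size_of;
use core::ptr;

pub const EVENT_VERSION: u16 = 3;
const MAX_KNOWN_TYPE: u16 = 7; // highest EventType this build understands

#[repr(C, packed)]
#[derive(Clone, Copy)]
pub struct EventHeader {
    pub version: u16,
    pub type_: u16,
    pub timestamp: i64,
    pub size: u32,
    pub drop_count: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WalkError {
    Truncated,
    VersionMismatch { got: u16 },
    BadSize { size: u32 },
}

/// Walk a buffer of events and return (type, offset) for each event of a
/// known type. Unknown types are skipped, not treated as errors.
pub fn walk_events(buf: &[u8]) -> Result<Vec<(u16, usize)>, WalkError> {
    let mut known = Vec::new();
    let mut offset = 0usize;
    while offset < buf.len() {
        let rest = &buf[offset..];
        if rest.len() < size_of::<EventHeader>() {
            return Err(WalkError::Truncated);
        }
        // SAFETY: length checked above; all fields are plain integers.
        let header = unsafe { ptr::read_unaligned(rest.as_ptr() as *const EventHeader) };
        let (version, type_, size) = (header.version, header.type_, header.size);
        // Rule 1: any layout change bumps the version, so a mismatch is fatal.
        if version != EVENT_VERSION {
            return Err(WalkError::VersionMismatch { got: version });
        }
        if (size as usize) < size_of::<EventHeader>() || size as usize > rest.len() {
            return Err(WalkError::BadSize { size });
        }
        // Rule 2: a new event type does not bump the version, so an
        // unrecognised type is simply stepped over using `size`.
        if (1..=MAX_KNOWN_TYPE).contains(&type_) {
            known.push((type_, offset));
        }
        offset += size as usize;
    }
    Ok(known)
}
```

The lower bound on `size` matters as much as the upper one: a `size` smaller
than the header would otherwise stall or misalign the walk.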
The agent's reject-on-mismatch behaviour is the load-bearing part of this
protocol's evolution story. A development VM running a stale driver against a
fresh agent surfaces the version difference at handshake time — the first event
the agent reads — rather than corrupting interpretation silently. The cost of
that strictness is one clean error message; the cost of the alternative is hours
of misdiagnosed downstream symptoms.
Next post: the ring buffer that absorbs the events on the way out, and the
inverted-call IOCTL that drains it.