Part 8 — The Plugin SDK
~ lululufr
CONTENTS
  0  what the sdk owns
  1  two surfaces — trait vs closure
  2  metadata and the macro
  3  the runner — handshake, heartbeat, shutdown
  4  eventsink — emit, mutex, error semantics
  5  sessioncontext — shutdown without busy-waiting
  6  single-shot — no internal reconnection
  7  out of scope

──[ 0. What the SDK Owns ]──

The protocol of Part 7 is straightforward but tedious to implement: open the
pipe, write the `Hello` frame, read the `HelloAck` or `Reject`, spawn a
heartbeat thread, handle Ctrl+C, send `Goodbye` on exit, close the pipe.
Without a shared library, every plugin would re-implement the same machinery.

The SDK (`wedr-plugin-sdk`) absorbs all of it. A plugin author writes one
thing: a function that produces events. The closure form below is a complete,
working plugin:
fn main() -> wedr_plugin_sdk::Result<()> {
    let meta = wedr_plugin_sdk::metadata!("acme-tick")?;
    wedr_plugin_sdk::run_with(meta, |sink, ctx| {
        while !ctx.is_shutting_down() {
            sink.emit("acme.tick", serde_json::json!({ "alive": true }))?;
            ctx.sleep(std::time::Duration::from_secs(10));
        }
        Ok(())
    })
}
Ten lines against the SDK; everything else — pipe management, framing, 
handshake, heartbeat, signal handling — is hidden.

──[ 1. Two Surfaces — Trait vs Closure ]──

The SDK exposes two API shapes for the same underlying machinery.

The `Plugin` trait suits long-lived state and a polled loop:
pub trait Plugin {
    fn metadata(&self) -> Metadata;
    fn run(&mut self, sink: &EventSink, ctx: &SessionContext) -> Result<()>;
}
The author defines a struct, implements two methods, and calls
`run(YourPlugin::new())`. This shape fits a plugin that has configuration,
internal state, or a natural object-oriented design.

`run_with` suits callback-driven sources:
pub fn run_with<F>(self, metadata: Metadata, body: F) -> Result<()>
where
    F: FnOnce(&EventSink, &SessionContext) -> Result<()>,
The author supplies a closure that runs once. This shape fits when the actual
event source is an OS callback, an `EvtSubscribe` (Part 10), a hook, or any
push-driven API: the closure registers its callback with the source, hands the
callback `sink.clone()`, and parks the main thread on
`ctx.wait_for_shutdown()`. Defender Bridge in Part 10 uses this shape.

Both shapes compile down to the same `run_inner` routine inside the SDK.
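
To make the "same routine underneath" point concrete, here is a minimal sketch of how a trait entry point can be adapted onto a closure entry point. Everything in it is a simplified stand-in written for this post: the in-memory `EventSink`, the `String` error type, the `name` method and the `Counter` plugin are assumptions, and the real `run_inner` also deals with the pipe, the handshake and the heartbeat.

```rust
use std::cell::RefCell;

/// Stand-in for the SDK's sink: records event kinds in memory.
#[derive(Default)]
pub struct EventSink {
    pub emitted: RefCell<Vec<String>>,
}

impl EventSink {
    pub fn emit(&self, kind: &str) {
        self.emitted.borrow_mut().push(kind.to_string());
    }
}

/// Trait surface: state lives in the implementing struct.
pub trait Plugin {
    fn name(&self) -> String;
    fn run(&mut self, sink: &EventSink) -> Result<(), String>;
}

/// Shared routine: both public entry points end up here.
fn run_inner<F>(name: &str, sink: &EventSink, body: F) -> Result<(), String>
where
    F: FnOnce(&EventSink) -> Result<(), String>,
{
    sink.emit(&format!("hello:{name}")); // stands in for the handshake
    let result = body(sink);
    sink.emit("goodbye"); // sent on every exit path
    result
}

/// Trait entry point: wraps the plugin's `run` method in a closure.
pub fn run<P: Plugin>(mut plugin: P, sink: &EventSink) -> Result<(), String> {
    let name = plugin.name();
    run_inner(&name, sink, move |s| plugin.run(s))
}

/// Closure entry point: already the shape `run_inner` expects.
pub fn run_with<F>(name: &str, sink: &EventSink, body: F) -> Result<(), String>
where
    F: FnOnce(&EventSink) -> Result<(), String>,
{
    run_inner(name, sink, body)
}

/// Example trait plugin with internal state (hypothetical).
pub struct Counter {
    pub ticks: u32,
}

impl Plugin for Counter {
    fn name(&self) -> String {
        "counter".to_string()
    }

    fn run(&mut self, sink: &EventSink) -> Result<(), String> {
        for _ in 0..self.ticks {
            sink.emit("tick");
        }
        Ok(())
    }
}
```

Calling `run(Counter { ticks: 2 }, &sink)` and `run_with("closure", &sink, |s| { s.emit("pushed"); Ok(()) })` both produce a hello, the body's events, then a goodbye. The trait surface is only a thin adapter over the closure surface.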

──[ 2. Metadata and the Macro ]──

Every plugin advertises identity in its `Hello` frame:
#[derive(Debug, Clone)]
pub struct Metadata {
    pub plugin_id:      String,   // UUID from the manifest
    pub plugin_version: String,   // crate version
    pub plugin_name:    String,   // human label
}
`plugin_id` is sourced from the environment (`WEDR_PLUGIN_ID`) rather than
hard-coded in the binary, because the same plugin binary may be installed and
enrolled multiple times under different UUIDs: for testing, for multi-tenant
deployments, for staged rollouts. Hard-coding the UUID would prevent that.

`Metadata::from_env` reads `WEDR_PLUGIN_ID`, rejects a missing or empty value,
and constructs the metadata:
impl Metadata {
    pub fn from_env(plugin_name: &str, plugin_version: &str) -> Result<Self> {
        let id = env::var("WEDR_PLUGIN_ID")
            .ok()
            .filter(|s| !s.is_empty())
            .ok_or(Error::MissingPluginId)?;
        Ok(Self::new(id, plugin_version, plugin_name))
    }
}
The `metadata!` macro is one small convenience on top:
#[macro_export]
macro_rules! metadata {
    ($plugin_name:expr) => {
        $crate::Metadata::from_env($plugin_name, env!("CARGO_PKG_VERSION"))
    };
}
The macro form matters because of how `env!` resolves:
`env!("CARGO_PKG_VERSION")` is evaluated at compile time and yields the version
of *whichever crate is being compiled where the call appears*. Written inside an
SDK function, it would make every plugin report the SDK's version forever.
Written inside an exported macro, it expands in the caller's crate and captures
the caller's version. Macros are the standard Rust mechanism for hoisting a
value out of the caller's compilation environment.
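
The call-site behaviour is easy to observe inside a single crate with `module_path!()`, a built-in macro that is resolved at the expansion site in the same way. The sketch below is an analogy, not SDK code: the `caller_module!` macro and the `sdk` and `plugin` modules are hypothetical stand-ins for the SDK crate and a plugin crate.

```rust
/// Hypothetical macro: `module_path!()` is resolved where the macro is
/// invoked, which is the property `metadata!` relies on for `env!`.
macro_rules! caller_module {
    () => {
        module_path!()
    };
}

/// Plays the role of the SDK crate.
pub mod sdk {
    /// A plain function: the built-in macro resolves here, inside `sdk`,
    /// no matter who calls the function.
    pub fn module_via_function() -> &'static str {
        module_path!()
    }
}

/// Plays the role of a plugin crate.
pub mod plugin {
    /// Goes through the function, so it reports the `sdk` module.
    pub fn via_function() -> &'static str {
        super::sdk::module_via_function()
    }

    /// Goes through the macro, so it reports the `plugin` module.
    pub fn via_macro() -> &'static str {
        caller_module!()
    }
}
```

`plugin::via_function()` yields a path ending in `sdk`, while `plugin::via_macro()` yields one ending in `plugin`. Swap "module" for "crate version" and the function form is the bug the macro avoids.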

──[ 3. The Runner — Handshake, Heartbeat, Shutdown ]──

`Runner` is a fluent builder:
let result = Runner::new()
    .pipe_path(r"\\.\pipe\WazabiEDR_plugin_dev")   // optional
    .install_signal_handler(false)                   // optional
    .heartbeat(Duration::from_secs(15))              // optional
    .run(my_plugin);
The defaults match the common case. Override the pipe path only for testing or
when running against a non-default agent. Disable the signal handler if the host
process already owns the Ctrl+C chain: Windows allows only one
console-control-handler chain per process, and the SDK's installer would
conflict.

`Runner::run` proceeds in a fixed sequence:
    1. Resolve the global shutdown flag (a process-wide
       AtomicBool that every active Runner shares, so a
       process-level Ctrl+C reaches all of them).
    2. Open the pipe with OpenOptions in read-write mode.
    3. Send Hello; read HelloAck or Reject. A Reject
       returns Err(Error::Rejected{reason}) immediately.
    4. Build SinkInner around the open pipe.
    5. If heartbeat is enabled, spawn the heartbeat thread.
    6. Invoke the plugin body (trait::run or closure).
    7. On any exit path: set shutdown, attempt Goodbye
       (best-effort), join the heartbeat thread, close the
       pipe.

A plugin that returns `Err(Error::Shutdown)` from its body is treated as a clean 
exit, not a failure — `Error::Shutdown` is the documented way for a plugin to
say "I noticed the shutdown flag and I'm leaving". Any other error propagates to
the plugin's `main`.
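
The teardown half of that sequence (steps 5 to 7, plus the `Error::Shutdown` mapping) can be sketched with the standard library alone. This is an illustration under assumptions, not the SDK's source: `run_session`, the `Frames` alias, the string frames and the two-variant `Error` are hypothetical, and an in-memory vector stands in for the pipe.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum Error {
    Shutdown,
    Other(String),
}

/// In-memory stand-in for the pipe: each frame is one string.
pub type Frames = Arc<Mutex<Vec<String>>>;

pub fn run_session<F>(frames: Frames, heartbeat: Duration, body: F) -> Result<(), Error>
where
    F: FnOnce(&AtomicBool) -> Result<(), Error>,
{
    let shutdown = Arc::new(AtomicBool::new(false));
    frames.lock().unwrap().push("Hello".into());

    // Heartbeat thread: one frame per interval until shutdown is set.
    let hb = {
        let (frames, shutdown) = (Arc::clone(&frames), Arc::clone(&shutdown));
        thread::spawn(move || {
            while !shutdown.load(Ordering::Acquire) {
                frames.lock().unwrap().push("Heartbeat".into());
                // Returns early when the teardown path calls unpark().
                thread::park_timeout(heartbeat);
            }
        })
    };

    let result = body(&shutdown);

    // Teardown runs on every exit path, success or failure:
    // set shutdown, best-effort Goodbye, join the heartbeat thread.
    shutdown.store(true, Ordering::Release);
    frames.lock().unwrap().push("Goodbye".into());
    hb.thread().unpark();
    hb.join().expect("heartbeat thread panicked");

    // A body that left because of shutdown is a clean exit.
    match result {
        Err(Error::Shutdown) => Ok(()),
        other => other,
    }
}
```

A body returning `Err(Error::Shutdown)` comes back as `Ok(())`; any other error passes through unchanged. The `unpark` call is what lets the join return promptly even with a long heartbeat interval.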

──[ 4. EventSink — Emit, Mutex, Error Semantics ]──

`EventSink` is the API surface every plugin actually touches:
pub fn emit(&self, kind: &str, payload: serde_json::Value) -> Result<()>;
pub fn emit_with_ts(&self, kind: &str, ts_unix_ns: u64,
                    payload: serde_json::Value) -> Result<()>;
`emit` stamps the current wall-clock time. `emit_with_ts` accepts an explicit
timestamp, for plugins that forward events from a source carrying its own time:
a log line being tailed has its own timestamp, and the time the plugin observed
it does not matter.

`EventSink` is `Clone` and the clone is cheap (an `Arc` refcount bump). The
common pattern is to clone the sink into producer callbacks:
run_with(metadata, |sink, ctx| {
    let cb_sink = sink.clone();
    let subscription = some_api::subscribe(move |evt| {
        let _ = cb_sink.emit("acme.event", evt.into_json());
    })?;
    ctx.wait_for_shutdown();
    drop(subscription);   // unsubscribe before the SDK tears down the pipe
    Ok(())
})
Under the hood the pipe is wrapped in a `Mutex<File>`. Multiple threads can call
`emit` concurrently; the mutex is held just long enough to perform one
`write_all` plus `flush`, so frames from different threads never interleave. The
mutex serialises *producers*, not consumers, so a slow agent at the other end
cannot deadlock the plugin; it only slows down emission.

`emit` returns `Err(Error::Shutdown)` once the shutdown flag is set. That makes
the failure path explicit at every emit site: a callback that fires after
shutdown was signalled discovers it the moment it tries to ship the next event,
without any explicit polling.
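
A stripped-down version of that design fits in a few dozen lines. The sketch below assumes an in-memory `Vec<u8>` in place of the pipe and a tab-separated text frame in place of the real wire format; `Sink`, `SinkInner`'s fields, `signal_shutdown` and `written` are names chosen for the example, not the SDK's.

```rust
use std::io::Write;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

#[derive(Debug)]
pub enum Error {
    Shutdown,
    Io(std::io::Error),
}

struct SinkInner {
    pipe: Mutex<Vec<u8>>, // in-memory stand-in for the real pipe handle
    shutdown: AtomicBool,
}

/// Cloning copies one `Arc`; all clones write to the same pipe.
#[derive(Clone)]
pub struct Sink {
    inner: Arc<SinkInner>,
}

impl Sink {
    pub fn new() -> Self {
        Sink {
            inner: Arc::new(SinkInner {
                pipe: Mutex::new(Vec::new()),
                shutdown: AtomicBool::new(false),
            }),
        }
    }

    /// Writes one newline-terminated frame while holding the lock.
    pub fn emit(&self, kind: &str, payload: &str) -> Result<(), Error> {
        if self.inner.shutdown.load(Ordering::Acquire) {
            return Err(Error::Shutdown);
        }
        // Build the frame before locking so the critical section is
        // only the write and the flush.
        let frame = format!("{kind}\t{payload}\n");
        let mut pipe = self.inner.pipe.lock().expect("pipe mutex poisoned");
        pipe.write_all(frame.as_bytes()).map_err(Error::Io)?;
        pipe.flush().map_err(Error::Io)
    }

    pub fn signal_shutdown(&self) {
        self.inner.shutdown.store(true, Ordering::Release);
    }

    /// Everything written so far, as text (for inspection in tests).
    pub fn written(&self) -> String {
        let pipe = self.inner.pipe.lock().expect("pipe mutex poisoned");
        String::from_utf8_lossy(&pipe).into_owned()
    }
}
```

With four threads each emitting fifty frames through clones of one `Sink`, the buffer ends up with two hundred intact lines. After `signal_shutdown`, the next `emit` on any clone returns `Err(Error::Shutdown)`.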

──[ 5. SessionContext — Shutdown Without Busy-Waiting ]──

`SessionContext` is the read side of the shutdown flag plus two helpers:
pub fn is_shutting_down(&self) -> bool;
pub fn wait_for_shutdown(&self);     // parks the calling thread
pub fn sleep(&self, dur: Duration);  // sleep that returns early on shutdown
`is_shutting_down` is a single `Ordering::Acquire` load. The polled-loop case
(the trait API) calls it at each iteration boundary.

`wait_for_shutdown` is the standard pattern for callback-driven plugins.
Register the callback, hand it `sink.clone()`, then call `wait_for_shutdown` on
the main thread. The function blocks on a condition variable (*condvar*: a
synchronisation primitive that lets a thread wait until another thread signals
it; the standard pairing is mutex + condvar) until the shutdown flag flips.

`sleep` combines the two: a thread can ask to sleep for a duration but be
woken early if shutdown is signalled. It waits on the same condvar
`wait_for_shutdown` uses, with a timeout. The behaviour is equivalent to "poll
`is_shutting_down` until true or `dur` elapses" but without the polling
overhead: the thread is genuinely parked, and the wake is signalled by the
shutdown setter rather than detected by the sleeper.
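
The mutex + condvar pairing looks like this in plain `std`. The sketch is a simplification under stated assumptions: `ShutdownSignal` and `signal` are hypothetical names, the flag lives inside the mutex instead of a separate atomic, and `sleep` returns a `bool` for illustration where the signature shown above returns nothing.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Shared shutdown flag guarded by a mutex and paired with a condvar.
#[derive(Clone, Default)]
pub struct ShutdownSignal {
    inner: Arc<(Mutex<bool>, Condvar)>,
}

impl ShutdownSignal {
    pub fn new() -> Self {
        Self::default()
    }

    /// Setter side: flip the flag, then wake every parked waiter.
    pub fn signal(&self) {
        let (flag, condvar) = &*self.inner;
        *flag.lock().unwrap() = true;
        condvar.notify_all();
    }

    pub fn is_shutting_down(&self) -> bool {
        *self.inner.0.lock().unwrap()
    }

    /// Parks the calling thread until `signal` has been called.
    pub fn wait_for_shutdown(&self) {
        let (flag, condvar) = &*self.inner;
        let guard = flag.lock().unwrap();
        // wait_while re-checks the flag, so spurious wakeups are harmless.
        let _guard = condvar.wait_while(guard, |down| !*down).unwrap();
    }

    /// Sleeps for at most `dur`; returns true if shutdown cut it short
    /// or had already been signalled.
    pub fn sleep(&self, dur: Duration) -> bool {
        let (flag, condvar) = &*self.inner;
        let guard = flag.lock().unwrap();
        let (guard, _timed_out) = condvar
            .wait_timeout_while(guard, dur, |down| !*down)
            .unwrap();
        *guard
    }
}
```

A `sleep` of thirty seconds returns within milliseconds of another thread calling `signal`, and a `sleep` with no signal simply runs out its timeout. No loop in the sleeper ever inspects the flag on a timer.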

──[ 6. Single-Shot — No Internal Reconnection ]──

`Runner` does not reconnect. If the agent goes away mid-session, the next `emit`
returns `Err(Error::Io(BrokenPipe))`, the plugin's `run` propagates that to
`main`, and the process exits.

The reasoning is one observable failure model per process. A plugin that exits
on pipe loss is observably either alive or not. A plugin with internal
reconnection logic has at least three states: connected, reconnecting (with
some events dropped), and "the SDK thinks it's connected but isn't". The third
is the worst kind of failure to debug.

Process supervisors that restart on exit are a standard OS facility:
`sc.exe failure` with appropriate `reset=` and `actions=` options, systemd
`Restart=on-failure` on Linux, the agent's own supervisor for plugins flagged
`auto_launch = true` (Part 9). All of them provide the restart behaviour
reconnection would, with strictly clearer semantics.
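
In code, single-shot means the first I/O error travels straight up through `?` and becomes the process exit status. The sketch below illustrates that path with a fake writer; `FlakyPipe`, `body` and `exit_code` are hypothetical names, and the text frame format is invented for the example.

```rust
use std::io::{self, ErrorKind, Write};

/// Hypothetical stand-in for a pipe whose peer disappears after
/// `remaining` successful writes.
pub struct FlakyPipe {
    pub remaining: usize,
}

impl Write for FlakyPipe {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.remaining == 0 {
            return Err(io::Error::new(ErrorKind::BrokenPipe, "agent went away"));
        }
        self.remaining -= 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Plugin body: no retry, no reconnect. The first I/O error ends the
/// session and is handed to the caller unchanged.
pub fn body(pipe: &mut impl Write, events: usize) -> io::Result<usize> {
    let mut sent = 0;
    for i in 0..events {
        pipe.write_all(format!("acme.tick {i}\n").as_bytes())?;
        sent += 1;
    }
    Ok(sent)
}

/// What `main` would do with the result: turn it into an exit status
/// and leave the restart decision to the supervisor.
pub fn exit_code(result: &io::Result<usize>) -> i32 {
    match result {
        Ok(_) => 0,
        Err(_) => 1,
    }
}
```

With a pipe that dies after three writes, `body` returns a `BrokenPipe` error on the fourth event and `exit_code` reports 1. From the outside the plugin has one failure signal: it exited non-zero.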

──[ 7. Out of Scope ]──

For completeness, the things the SDK explicitly does not try to be:
    - A retry / backoff library. The plugin author imports
      one (or writes ten lines) if needed.
    - A scheduler. Plain threads or any executor the author
      prefers.
    - A logger. Plugins log to stderr; the agent captures
      stderr when it auto-launches the plugin and forwards
      it to its own log.
    - A configuration framework. The manifest does not carry
      plugin configuration; the plugin reads its own config
      file via whatever mechanism it likes.
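
The "writes ten lines" remark about retry can be taken fairly literally. As an illustration of what a plugin author might write for themselves, here is a minimal exponential-backoff helper using only `std`; `retry_with_backoff` and its parameters are hypothetical and not part of the SDK.

```rust
use std::thread;
use std::time::Duration;

/// Calls `op` at least once and at most `attempts` times, doubling the
/// delay after each failure. Returns the first success or the last error.
pub fn retry_with_backoff<T, E>(
    attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the most recent error.
            Err(err) if attempt >= attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

A plugin would wrap its own flaky source in this, for example opening a log file that may not exist yet, and keep the SDK calls themselves single-shot.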

The SDK is small on purpose. The first iteration was substantially larger and
tried to be a generic event-collection framework; the second cut it back to what
is here. Less code on the path between event source and pipe means fewer
regressions and a clearer surface to reason about.

Next post: the manifest and the CLI that writes it, the mechanism that decides
which plugins the agent will accept handshakes from at all.