
Adora

Agentic Dataflow-Oriented Robotic Architecture – a 100% Rust framework for building real-time robotics and AI applications.

Why Adora?

Performance

  • 10-17x faster than ROS2 Python – 100% Rust internals with zero-copy shared memory IPC for messages >4KB, flat latency from 4KB to 4MB payloads
  • Apache Arrow native – columnar memory format end-to-end with zero serialization overhead; shared across all language bindings

Developer Experience

  • Single CLI, full lifecycle – adora run for local dev, adora up/start for distributed prod, plus build, logs, monitoring, and record/replay, all from one tool
  • Declarative YAML dataflows – define pipelines as directed graphs, connect nodes through typed inputs/outputs, optional type annotations with static validation
  • Multi-language nodes – write nodes in Rust, Python, C, or C++ with native APIs (not wrappers); mix languages freely in one dataflow
  • Reusable modules – compose sub-graphs as standalone YAML files with typed inputs/outputs, parameters, and nested composition
  • Hot reload – live-reload Python operators without restarting the dataflow

Production Readiness

  • Fault tolerance – per-node restart policies (never/on-failure/always), exponential backoff, health monitoring, circuit breakers with configurable input timeouts
  • Distributed by default – local shared memory between co-located nodes, automatic Zenoh pub-sub for cross-machine communication, SSH-based cluster management with label scheduling
  • Configurable queue policies – drop_oldest (default) or backpressure per input, with metrics on dropped messages
  • OpenTelemetry – built-in structured logging with rotation/routing, metrics, distributed tracing

Debugging and Observability

  • Record/replay – capture dataflow messages to .adorec files, replay offline at any speed with node substitution
  • Topic inspection – topic echo to print live data, topic hz TUI for frequency analysis, topic info for schema and bandwidth
  • Resource monitoring – adora top TUI showing per-node CPU, memory, queue depth, and network I/O across all machines
  • Log aggregation – subscribe to adora/logs to receive structured log messages from all nodes without extra wiring
  • Trace inspection – trace list and trace view for viewing coordinator spans without external infrastructure

Ecosystem

Next Steps

Installation

cargo install adora-cli           # CLI (adora command)
pip install adora-rs              # Python node/operator API

From source

git clone https://github.com/dora-rs/adora.git
cd adora
cargo build --release -p adora-cli
export PATH=$PATH:$(pwd)/target/release

# Python API (requires maturin >= 1.8: pip install maturin)
# Must run from the package directory for dependency resolution
cd apis/python/node && maturin develop --uv && cd ../../..

Platform installers

macOS / Linux:

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/dora-rs/adora/releases/latest/download/adora-cli-installer.sh | sh

Windows:

powershell -ExecutionPolicy ByPass -c "irm https://github.com/dora-rs/adora/releases/latest/download/adora-cli-installer.ps1 | iex"

Build features

Feature | Description | Default
tracing | OpenTelemetry tracing support | Yes
metrics | OpenTelemetry metrics collection | No
python | Python operator support (PyO3) | No
redb-backend | Persistent coordinator state (redb) | No
prometheus | Prometheus /metrics endpoint on coordinator | No

For example, to enable the redb backend:

cargo install adora-cli --features redb-backend

Verify

adora --version
adora status

Getting Started with Python

This guide walks you through writing Python nodes and operators for adora dataflows.

Prerequisites

cargo install adora-cli    # CLI (adora command)
pip install adora-rs       # Python node/operator API

The adora-rs package includes pyarrow as a dependency.

Building from source (instead of pip install adora-rs):

pip install maturin  # requires >= 1.8
cd apis/python/node && maturin develop --uv && cd ../../..

Hello World: Sender and Receiver

Create three files:

sender.py – sends 100 numbered messages:

import pyarrow as pa
from adora import Node

node = Node()
for i in range(100):
    node.send_output("message", pa.array([i]))

receiver.py – receives and prints messages:

from adora import Node

node = Node()
for event in node:
    if event["type"] == "INPUT":
        values = event["value"].to_pylist()
        print(f"Received {event['id']}: {values}")
    elif event["type"] == "STOP":
        break

dataflow.yml – connects sender to receiver:

nodes:
  - id: sender
    path: sender.py
    outputs:
      - message

  - id: receiver
    path: receiver.py
    inputs:
      message: sender/message

Run it:

adora run dataflow.yml

Events

Every call to node.next(), and each iteration of for event in node, returns an event dictionary:

Key | Type | Description
type | str | "INPUT", "INPUT_CLOSED", "STOP", or "ERROR"
id | str | Input name (e.g. "message") – only for INPUT events
value | pyarrow.Array or None | The data payload
metadata | dict | Tracing/routing metadata

Handle events by checking event["type"]:

for event in node:
    match event["type"]:
        case "INPUT":
            process(event["id"], event["value"])
        case "INPUT_CLOSED":
            print(f"Input {event['id']} closed")
        case "STOP":
            break

Working with Arrow Data

All data flows through adora as Apache Arrow arrays. Common patterns:

import pyarrow as pa

# Simple values
node.send_output("count", pa.array([42]))
node.send_output("names", pa.array(["alice", "bob"]))

# Read values back
values = event["value"].to_pylist()  # [42] or ["alice", "bob"]

# Structured data
struct = pa.StructArray.from_arrays(
    [pa.array([1.5]), pa.array(["hello"])],
    names=["x", "y"],
)
node.send_output("point", struct)

# Raw bytes (images, serialized data, etc.)
node.send_output("frame", pa.array(raw_bytes))

Operators

Operators are lightweight alternatives to nodes. They run inside the adora runtime process (no separate OS process), making them faster for simple transformations.

Define an Operator class with an on_event method:

# doubler_op.py
import pyarrow as pa
from adora import AdoraStatus

class Operator:
    def on_event(self, event, send_output) -> AdoraStatus:
        if event["type"] == "INPUT":
            value = event["value"].to_pylist()[0]
            send_output("doubled", pa.array([value * 2]), event["metadata"])
        return AdoraStatus.CONTINUE

Reference it in YAML with operator instead of path:

nodes:
  - id: timer
    path: adora/timer/millis/500
    outputs:
      - tick

  - id: doubler
    operator:
      python: doubler_op.py
      inputs:
        tick: timer/tick
      outputs:
        - doubled

When to use operators vs nodes:

 | Nodes | Operators
Process model | Separate OS process | In-process (shared runtime)
Startup cost | Higher | Lower
Isolation | Full process isolation | Shared memory space
Best for | Long-running, heavy compute | Lightweight transforms, filters

Async Nodes

For nodes that need async I/O (HTTP calls, database queries, etc.), use recv_async():

import asyncio
from adora import Node

async def main():
    node = Node()
    for _ in range(50):
        event = await node.recv_async()
        if event["type"] == "STOP":
            break
        # Do async work here
        result = await fetch_data(event["value"])
        node.send_output("result", result)

asyncio.run(main())

See examples/python-async for a complete example.

Logging

Use node.log() for structured logging that integrates with adora logs:

node.log("info", "Processing item", {"count": str(i)})

Or use Python’s standard logging module – adora captures stdout/stderr automatically:

import logging
logging.info("Processing item %d", i)

See examples/python-logging for logging module integration.

Timers

Built-in timer nodes generate periodic ticks without writing any code:

nodes:
  - id: tick-source
    path: adora/timer/millis/100    # tick every 100ms
    outputs:
      - tick

  - id: my-node
    path: my_node.py
    inputs:
      tick: tick-source/tick

Also available: adora/timer/hz/30 for 30 Hz.
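Timer paths encode their period directly. The helper below is not part of the adora API – it is a small illustrative parser showing how the millis, secs, and hz forms map to a tick period:

```python
# Hypothetical helper (not part of the adora API): compute the tick
# period in milliseconds from a timer node path.
def timer_period_ms(path: str) -> float:
    """Parse 'adora/timer/millis/<N>', 'adora/timer/secs/<N>', or 'adora/timer/hz/<N>'."""
    parts = path.split("/")
    if parts[:2] != ["adora", "timer"] or len(parts) != 4:
        raise ValueError(f"not a timer path: {path}")
    unit, value = parts[2], float(parts[3])
    if unit == "millis":
        return value                 # adora/timer/millis/100 -> 100 ms
    if unit == "secs":
        return value * 1000.0        # adora/timer/secs/2 -> 2000 ms
    if unit == "hz":
        return 1000.0 / value        # adora/timer/hz/30 -> ~33.3 ms
    raise ValueError(f"unknown timer unit: {unit}")
```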

Next Steps

Adora Architecture

Comprehensive architecture reference for Adora (AI-Dora, Agentic Dataflow-Oriented Robotic Architecture) — a 100% Rust framework for real-time robotics and AI applications.

Overview and Design Philosophy

Adora is built on four core principles:

  1. Dataflow-oriented: Applications are directed graphs of nodes connected by typed data channels. Nodes declare inputs and outputs; the framework handles routing, scheduling, and lifecycle.
  2. Zero-copy performance: Messages above 4 KiB use shared memory with 128-byte aligned buffers and atomic coordination, achieving 10-17x lower latency than ROS2.
  3. Multi-language: First-class support for Rust, Python (PyO3), C, and C++ nodes — all sharing the same Apache Arrow data format.
  4. Four-layer stack: Message protocol, core libraries, daemon/runtime execution, and CLI/coordinator orchestration.

Architecture Stack

┌─────────────────────────────────────────────────┐
│  CLI (adora)          Coordinator (orchestrator) │  Layer 4: Orchestration
├─────────────────────────────────────────────────┤
│  Daemon (per-machine)    Runtime (operators)     │  Layer 3: Execution
├─────────────────────────────────────────────────┤
│  adora-core    shared-memory-server    Node API  │  Layer 2: Core Libraries
├─────────────────────────────────────────────────┤
│  adora-message (protocol + Arrow types)          │  Layer 1: Protocol
└─────────────────────────────────────────────────┘

Workspace Structure

Rust edition 2024, MSRV 1.85.0, workspace version 0.1.0. All crates share the workspace version.

Binaries (7)

Path | Crate | Role
binaries/cli | adora-cli | CLI binary (adora command) – build, run, stop dataflows
binaries/coordinator | adora-coordinator | Orchestrates distributed multi-daemon deployments; WebSocket server
binaries/daemon | adora-daemon | Spawns nodes, manages shared-memory/TCP communication per machine
binaries/runtime | adora-runtime | In-process operator execution (Python/C/C++ via dlopen/PyO3)
binaries/ros2-bridge-node | adora-ros2-bridge-node | ROS2 integration node
binaries/record-node | adora-record-node | Records dataflow messages to .adorec format
binaries/replay-node | adora-replay-node | Replays recorded messages from .adorec files

Core Libraries (6)

Path | Crate | Role
libraries/message | adora-message | All inter-component message types, protocol definitions, Arrow metadata
libraries/core | adora-core | Dataflow descriptor parsing, build utilities, Zenoh config
libraries/shared-memory-server | shared-memory-server | Zero-copy IPC for messages >= 4 KiB
libraries/recording | adora-recording | Recording format (.adorec): bincode header + entries + footer
libraries/arrow-convert | adora-arrow-convert | Arrow type conversions (numeric, datetime)
libraries/coordinator-store | adora-coordinator-store | State persistence for coordinator (in-memory or redb backend)

Extension Libraries (5)

Path | Crate | Role
libraries/extensions/telemetry/tracing | adora-tracing | OpenTelemetry distributed tracing (OTLP exporter)
libraries/extensions/telemetry/metrics | adora-metrics | System metrics collection (CPU, memory, disk)
libraries/extensions/download | adora-download | HTTP file download utility for operator/node binaries
libraries/extensions/ros2-bridge | adora-ros2-bridge | ROS2 integration: topic pub/sub, services, actions
libraries/log-utils | adora-log-utils | Log parsing, merging, filtering, formatting

API Crates (9)

Path | Crate | Language
apis/rust/node | adora-node-api | Rust
apis/rust/operator | adora-operator-api | Rust
apis/rust/operator/macros | adora-operator-api-macros | Rust (proc-macro)
apis/rust/operator/types | adora-operator-api-types | Rust (FFI-safe types)
apis/python/node | adora-node-api-python | Python (PyO3) – builds the adora module
apis/python/operator | adora-operator-api-python | Python (PyO3) – compiled into adora-node-api-python
apis/c/node | adora-node-api-c | C
apis/c/operator | adora-operator-api-c | C/C++

Component Architecture

CLI

The adora command provides three command groups:

Lifecycle (run, up, down, build, start, stop, restart):

  • adora run executes a dataflow locally without coordinator/daemon (single-machine shortcut)
  • adora up / adora down manage coordinator + daemon infrastructure
  • adora start / adora stop control dataflows on a running coordinator

Monitoring (list, logs, inspect, topic, node, record, replay, trace):

  • Real-time inspection with adora inspect top
  • Topic subscription and data inspection
  • Recording and replay via .adorec files

Setup (status, new, graph, system, completion, self):

  • Project scaffolding, dataflow visualization, self-update

Coordinator

The coordinator is an Axum-based WebSocket server that orchestrates distributed deployments.

                        ┌──────────────────┐
                        │   Coordinator    │
        WS /api/control │  ┌────────────┐  │ WS /api/daemon
  CLI ◄────────────────►│  │   State    │  │◄──────────────► Daemon(s)
                        │  │   Store    │  │
                        │  └────────────┘  │
                        │  /api/artifacts  │
                        │  /health         │
                        └──────────────────┘

WebSocket routes:

  • /api/control — CLI control plane (build, start, stop, list, logs, topic subscribe)
  • /api/daemon — Daemon registration and event stream
  • /api/artifacts/{build_id}/{node_id} — Binary artifact downloads
  • /health — Health check endpoint

State management: In-memory by default, optional persistent storage via redb backend.

Daemon

The daemon runs one per machine and manages the lifecycle of all nodes on that machine.

┌──────────────────────────────────────────────────────┐
│                     Daemon                           │
│                                                      │
│  ┌──────────┐  ┌───────────┐  ┌──────────────────┐  │
│  │ Event    │  │ Spawner   │  │ Node Comm        │  │
│  │ Loop     │──│ (nodes)   │  │ ┌──────────────┐ │  │
│  │          │  └───────────┘  │ │ TCP listener │ │  │
│  │ Sources: │  ┌───────────┐  │ │ Shmem server │ │  │
│  │ • Coord  │  │ Fault     │  │ │ Unix socket  │ │  │
│  │ • Nodes  │──│ Tolerance │  │ └──────────────┘ │  │
│  │ • Zenoh  │  └───────────┘  └──────────────────┘  │
│  │ • Timers │                                        │
│  └──────────┘                                        │
│                                                      │
│  ┌──────────────────────────────────────────────┐    │
│  │ Running Dataflows                            │    │
│  │  ├─ Node A (process) ◄──► TCP/Shmem          │    │
│  │  ├─ Node B (process) ◄──► TCP/Shmem          │    │
│  │  └─ Runtime (operators) ◄──► TCP/Shmem       │    │
│  └──────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────┘

Event loop (Daemon::run_inner()): an async Tokio event loop that merges:

  • Coordinator commands (WebSocket)
  • Node events (TCP/shared memory)
  • Inter-daemon events (Zenoh)
  • Heartbeat (5s interval), metrics collection (2s), health checks (5s default)

Node spawning:

  1. Create working directory for the node
  2. Set up communication channel (TCP, shmem, or Unix domain socket)
  3. Serialize NodeConfig to environment variable
  4. Spawn process with sanitized environment (blocks LD_PRELOAD, DYLD_INSERT_LIBRARIES, etc.)
  5. Monitor via ProcessHandle

Runtime

The runtime executes in-process operators (Python, shared library, WASM) in a dedicated process.

┌──────────────────────────────┐
│          Runtime             │
│                              │
│  ┌────────────────────────┐  │
│  │ Operator Runner        │  │
│  │ (separate thread)      │  │
│  │                        │  │
│  │ SharedLibrary → dlopen │  │
│  │ Python → PyO3          │  │
│  │ Wasm → (planned)       │  │
│  └──────────┬─────────────┘  │
│             │ flume(2)       │
│  ┌──────────▼─────────────┐  │
│  │ Event Merge Loop       │  │
│  │ ├─ OperatorEvent       │  │
│  │ └─ DaemonEvent         │  │
│  └────────────────────────┘  │
└──────────────────────────────┘

  • Single-threaded Tokio runtime
  • Operator runs in a separate thread, communicates via flume::bounded(2) channel
  • Input queue size per data ID configurable (default: 10)

Nodes

Nodes are standalone processes that communicate with the daemon.

Lifecycle:

  1. Node starts, reads NodeConfig from environment
  2. Registers with daemon via DaemonRequest::Register
  3. Subscribes to events via DaemonRequest::Subscribe
  4. Processes events in a loop (NextEvent → handle → SendMessage)
  5. Reports drop tokens for shared memory cleanup
  6. Signals completion via OutputsDone

Communication Protocols

CLI to Coordinator (WebSocket)

Property | Value
Transport | WebSocket over TCP
Default port | 6013
Auth | Bearer token in Authorization header
Control messages | JSON text frames (request/response/event)
Topic data | Binary frames: [16-byte UUID][bincode payload]
Rate limit | 20 connections per IP per 60s
Max connections | 256

JSON-RPC-like message format:

// Request (client → server)
{"id": "uuid", "method": "control", "params": {...}}

// Response (server → client)
{"id": "uuid", "result": {...}}
// or
{"id": "uuid", "error": "message"}

// Event (fire-and-forget, either direction)
{"event": "log", "payload": {...}}

Key control methods: Build, Start, Stop, List, Logs, TopicSubscribe, TopicUnsubscribe, Reload, Restart, Destroy.
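The request/response frames above are plain JSON, so composing them is mechanical. This is an illustrative sketch of the framing only (the WebSocket transport and auth header are omitted); the helper names are not part of the adora codebase:

```python
import json
import uuid

def make_request(method: str, params: dict) -> str:
    """Build a client -> server request frame with a fresh correlation id."""
    return json.dumps({"id": str(uuid.uuid4()), "method": method, "params": params})

def parse_reply(frame: str):
    """Return (id, result) from a server reply, raising on an error frame."""
    msg = json.loads(frame)
    if "error" in msg:
        raise RuntimeError(msg["error"])
    return msg["id"], msg["result"]
```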

Coordinator to Daemon (WebSocket)

Property | Value
Transport | WebSocket (daemon connects to coordinator)
Route | /api/daemon
Retry | Exponential backoff 1s → 30s, max 50 attempts
Registration | DaemonRegisterRequest with version, machine_id, labels

Daemon events (daemon → coordinator): BuildResult, SpawnResult, AllNodesReady, AllNodesFinished, Heartbeat, StatusReport, Log, NodeMetrics, Exit.

Coordinator commands (coordinator → daemon): Build, Spawn, AllNodesReady, StopDataflow, ReloadDataflow, Logs, Destroy, Heartbeat.

Daemon to Node (Local)

Three transport options, configured via LocalCommunicationConfig:

TCP (default):

  • Binds 127.0.0.1:0 (ephemeral port), TCP_NODELAY enabled
  • Frame format: [8-byte u64 LE length][bincode payload]
  • Max message: 64 MiB, read timeout: 30s
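The TCP frame layout above is a length-prefixed bincode payload. Here Python stands in for the Rust implementation to make the byte layout concrete; the payload encoding itself (bincode) is not shown:

```python
import struct

MAX_MESSAGE_BYTES = 64 * 1024 * 1024  # 64 MiB limit from the spec above

def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as an 8-byte little-endian u64."""
    if len(payload) > MAX_MESSAGE_BYTES:
        raise ValueError("message exceeds 64 MiB limit")
    return struct.pack("<Q", len(payload)) + payload

def decode_frame(buf: bytes) -> bytes:
    """Read the u64 LE length prefix and return the payload bytes."""
    (length,) = struct.unpack_from("<Q", buf, 0)
    if length > MAX_MESSAGE_BYTES:
        raise ValueError("message exceeds 64 MiB limit")
    return buf[8:8 + length]
```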

Shared Memory (zero-copy):

  • Four 4 KiB regions per node: control, events, drop tokens, events-close
  • Used for messages >= 4096 bytes (ZERO_COPY_THRESHOLD)
  • Atomic synchronization with acquire/release ordering

Unix Domain Socket (Unix only):

  • Socket at /tmp/{dataflow_id}/{node_id}.sock
  • Permissions: 0o700
  • Same bincode frame format as TCP

Node → Daemon requests: Register, Subscribe, SendMessage, CloseOutputs, OutputsDone, NextEvent, ReportDropTokens, SubscribeDrop, NodeConfig.

Daemon → Node replies: Result, PreparedMessage, NextEvents, NextDropEvents, NodeConfig, Empty.

Node events: Stop, Reload, Input, InputClosed, InputRecovered, NodeRestarted, AllInputsClosed.

Daemon to Daemon (Zenoh)

Property | Value
Transport | Zenoh pub-sub
Router port | 7447
Peer port | 5456
Routing | linkstate
Serialization | bincode

Topic pattern:

adora/{network_id}/{dataflow_id}/output/{node_id}/{output_id}

Default network_id is "default".
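The topic pattern is a fixed six-segment path, so it can be built and parsed with simple string handling. These helpers are illustrative only, not adora API:

```python
def output_topic(dataflow_id: str, node_id: str, output_id: str,
                 network_id: str = "default") -> str:
    """Format a Zenoh key expression following the pattern above."""
    return f"adora/{network_id}/{dataflow_id}/output/{node_id}/{output_id}"

def parse_output_topic(topic: str) -> dict:
    """Split a topic back into its components (ids contain no '/')."""
    prefix, network_id, dataflow_id, kind, node_id, output_id = topic.split("/")
    if prefix != "adora" or kind != "output":
        raise ValueError(f"not an adora output topic: {topic}")
    return {"network_id": network_id, "dataflow_id": dataflow_id,
            "node_id": node_id, "output_id": output_id}
```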

InterDaemonEvent:

  • Output { dataflow_id, node_id, output_id, metadata, data } — data message
  • OutputClosed { dataflow_id, node_id, output_id } — stream end

Message Types and Wire Formats

Timestamped Wrapper

All inter-component messages are wrapped in a timestamp:

pub struct Timestamped<T> {
    pub inner: T,
    pub timestamp: uhlc::Timestamp,  // hybrid logical clock
}

DataMessage

Transport abstraction for payloads:

pub enum DataMessage {
    Vec(AVec<u8, ConstAlign<128>>),    // inline, 128-byte aligned
    SharedMemory {
        shared_memory_id: String,
        len: usize,
        drop_token: DropToken,          // UUIDv7, tracks lifetime
    },
}

LogMessage

pub struct LogMessage {
    pub build_id: Option<BuildId>,
    pub dataflow_id: Option<DataflowId>,
    pub node_id: Option<NodeId>,
    pub daemon_id: Option<DaemonId>,
    pub level: LogLevelOrStdout,       // Stdout | LogLevel(Error/Warn/Info/Debug/Trace)
    pub target: Option<String>,
    pub module_path: Option<String>,
    pub file: Option<String>,
    pub line: Option<u32>,
    pub message: String,
    pub timestamp: DateTime<Utc>,
    pub fields: Option<BTreeMap<String, String>>,
}

NodeError

pub struct NodeError {
    pub timestamp: uhlc::Timestamp,
    pub cause: NodeErrorCause,         // GraceDuration | Cascading | FailedToSpawn | Other
    pub exit_status: NodeExitStatus,   // Success | IoError | ExitCode | Signal | Unknown
}

Data Format and Metadata

Apache Arrow

All data payloads use Apache Arrow columnar format with 128-byte alignment. Arrow type information is carried in every message via ArrowTypeInfo:

pub struct ArrowTypeInfo {
    pub data_type: DataType,           // Arrow DataType
    pub len: usize,
    pub null_count: usize,
    pub validity: Option<Vec<u8>>,     // null bitmap
    pub offset: usize,
    pub buffer_offsets: Vec<BufferOffset>,
    pub child_data: Vec<ArrowTypeInfo>,  // recursive for nested types
}

Metadata

Every message carries structured metadata:

pub struct Metadata {
    metadata_version: u16,
    timestamp: uhlc::Timestamp,
    pub type_info: ArrowTypeInfo,
    pub parameters: MetadataParameters,   // BTreeMap<String, Parameter>
}

Parameter Types

pub enum Parameter {
    Bool(bool),
    Integer(i64),
    String(String),
    ListInt(Vec<i64>),
    Float(f64),
    ListFloat(Vec<f64>),
    ListString(Vec<String>),
    Timestamp(DateTime<Utc>),
}

Well-Known Metadata Keys

Key | Purpose
request_id | Service request/reply correlation
goal_id | Action goal identifier
goal_status | Action completion: succeeded, aborted, canceled
session_id | Streaming session identifier
segment_id | Streaming segment within a session
seq | Streaming chunk sequence number
fin | Last chunk of a streaming segment
flush | Discard older queued messages on input
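For streaming, several of these keys travel together on each chunk. The helper below is hypothetical – it simply shows one way to assemble such a metadata dict, assuming the Python API accepts plain dicts for the metadata argument of node.send_output (as in the operator example earlier, which forwards event["metadata"]):

```python
# Hypothetical helper: build per-chunk metadata for a streaming segment.
def chunk_metadata(session_id: str, segment_id: int, seq: int, last: bool) -> dict:
    return {
        "session_id": session_id,  # identifies the streaming session
        "segment_id": segment_id,  # segment within the session
        "seq": seq,                # chunk sequence number within the segment
        "fin": last,               # True on the final chunk of the segment
    }
```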

Zero-Copy Shared Memory

Architecture

┌──────────────────────────────────────────────────────────┐
│                   Shared Memory Region                   │
│                                                          │
│  ┌──────────┐ ┌──────────┐ ┌────────┐ ┌──────┐ ┌──────┐ │
│  │ Server   │ │ Client   │ │ Discon │ │ Len  │ │ Data │ │
│  │ Event    │ │ Event    │ │ (bool) │ │(u64) │ │      │ │
│  └──────────┘ └──────────┘ └────────┘ └──────┘ └──────┘ │
│  (raw_sync_2)  (raw_sync_2)  AtomicBool AtomicU64        │
└──────────────────────────────────────────────────────────┘

ShmemChannel

pub struct ShmemChannel {
    memory: Shmem,
    server_event: Box<dyn EventImpl>,
    client_event: Box<dyn EventImpl>,
    disconnect_offset: usize,
    len_offset: usize,
    data_offset: usize,
    server: bool,
}

Synchronization Protocol

Send (write → release store length → signal event → check disconnect):

  1. Copy data to shared memory buffer
  2. Store message length with Release ordering (publishes data)
  3. Signal event to wake receiver
  4. Check disconnect flag with Acquire ordering

Receive (wait event → check disconnect → acquire load length → read data):

  1. Wait for event signal
  2. Check disconnect flag with Acquire ordering
  3. Load message length with Acquire ordering (ensures all writes visible)
  4. Read and deserialize data from buffer

Thresholds and Limits

Parameter | Value
ZERO_COPY_THRESHOLD | 4096 bytes
Control region size | 4 KiB per node
Events region size | 4 KiB per node
Drop region size | 4 KiB per node
Max cache count | 20 regions
Max cache bytes | 256 MiB

DropToken Lifecycle

  1. Sender allocates shared memory, generates DropToken (UUIDv7)
  2. Sender transmits DataMessage::SharedMemory { shared_memory_id, len, drop_token }
  3. Receiver processes data, returns drop_token via ReportDropTokens
  4. Sender receives confirmed token, returns memory to cache for reuse
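The steps above can be modeled as a tiny state machine on the sender side: a region stays "in flight" until its drop token is confirmed, then returns to a reuse cache. This is a toy sketch with illustrative names (the real implementation lives in shared-memory-server, and the real token is a UUIDv7):

```python
import uuid

class SenderCache:
    """Toy model of the sender-side DropToken lifecycle."""

    def __init__(self):
        self.in_flight = {}   # drop_token -> shared_memory_id
        self.free = []        # region ids available for reuse

    def send(self, shared_memory_id: str) -> str:
        """Step 1-2: allocate a token and mark the region as in flight."""
        token = str(uuid.uuid4())  # stand-in for the real UUIDv7 token
        self.in_flight[token] = shared_memory_id
        return token

    def report_drop_token(self, token: str) -> None:
        """Step 3-4: receiver confirmed the token; recycle the region."""
        self.free.append(self.in_flight.pop(token))
```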

Dataflow Specification

YAML Format

nodes:
  # Standard node (executable)
  - id: my-node
    build: cargo build --release
    path: target/release/my-node
    inputs:
      tick: adora/timer/millis/100
      data: other-node/output
    outputs:
      - result
    restart_policy: on-failure
    max_restarts: 3
    restart_delay: 1.0
    env:
      DEBUG: true

  # Single operator (Python)
  - id: processor
    operator:
      python: process.py
      inputs:
        image: camera/frame
      outputs:
        - detection

  # Multi-operator runtime
  - id: pipeline
    operators:
      - id: stage1
        python: stage1.py
        inputs:
          data: source/output
        outputs:
          - intermediate
      - id: stage2
        shared-library: target/release/libstage2.so
        inputs:
          data: stage1/intermediate
        outputs:
          - final

  # ROS2 bridge
  - id: ros-input
    ros2:
      topic: /robot/state
      message_type: sensor_msgs/JointState
      direction: subscribe
      qos:
        reliable: true
    outputs:
      - joints

Descriptor Structs

pub struct Descriptor {
    pub nodes: Vec<Node>,
    pub communication: CommunicationConfig,
    pub deploy: Option<Deploy>,
    pub debug: Debug,
    pub health_check_interval: Option<f64>,  // default 5.0s
}

Node types (mutually exclusive fields):

  • path — standard executable/script
  • operator — single in-process operator
  • operators — multiple in-process operators
  • custom — legacy configuration
  • ros2 — declarative ROS2 bridge

Timer Nodes

Built-in timer nodes generate periodic ticks:

  • adora/timer/millis/<N> — every N milliseconds
  • adora/timer/secs/<N> — every N seconds

Operator Sources

pub enum OperatorSource {
    SharedLibrary(String),   // .so/.dll path
    Python(PythonSource),    // Python module
    Wasm(String),            // WebAssembly (planned)
}

Deploy Configuration

pub struct Deploy {
    pub machine: Option<String>,
    pub working_dir: Option<PathBuf>,
    pub labels: BTreeMap<String, String>,
    pub distribute: DistributeStrategy,  // Local | Scp | Http
}

Fault Tolerance

Restart Policies

pub enum RestartPolicy {
    Never,       // default
    OnFailure,   // restart on non-zero exit
    Always,      // restart unless user-stopped or inputs closed
}

Configuration fields per node:

  • max_restarts — 0 = unlimited
  • restart_delay — initial backoff in seconds (doubles each attempt)
  • max_restart_delay — caps exponential backoff
  • restart_window — reset counter after N seconds (enables “N restarts per M seconds”)
  • health_check_timeout — kill node if no activity within this duration
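Putting the backoff fields together: the delay starts at restart_delay, doubles on each attempt, and is capped by max_restart_delay. A minimal illustrative computation (not adora code):

```python
def restart_delay_secs(attempt: int, restart_delay: float,
                       max_restart_delay: float) -> float:
    """Backoff before restart number `attempt` (0-based):
    restart_delay doubles each attempt, capped at max_restart_delay."""
    return min(restart_delay * (2 ** attempt), max_restart_delay)
```

For example, with restart_delay: 1.0 and max_restart_delay: 30.0, the delays run 1s, 2s, 4s, 8s, 16s, then stay at 30s.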

Health Monitoring

  • Heartbeat interval: 5 seconds (daemon → coordinator)
  • Health check interval: 5 seconds (configurable per dataflow)
  • Metrics collection: 2-second interval (CPU, memory, disk, pending messages)

Circuit Breaker

Per-input timeout detection with automatic recovery:

  1. Input configured with input_timeout: <seconds>
  2. If no data arrives within timeout → InputClosed event sent to node
  3. Node marks input as degraded, can use cached last-known value
  4. When upstream recovers → InputRecovered event, circuit breaker re-opens
  5. Node status transitions: Running → Degraded → Running
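On the node side, steps 2-4 amount to caching the last value per input and falling back to it while the breaker is open. A sketch using the Python event format from the getting-started section (the cache class itself is illustrative, not adora API):

```python
class DegradedInputCache:
    """Fall back to the last known value while an input is degraded."""

    def __init__(self):
        self.last = {}        # input id -> last received value
        self.degraded = set() # inputs whose circuit breaker is open

    def handle(self, event):
        kind, input_id = event["type"], event.get("id")
        if kind == "INPUT":
            # Fresh data: cache it and clear any degraded mark.
            self.last[input_id] = event["value"]
            self.degraded.discard(input_id)
            return event["value"]
        if kind == "INPUT_CLOSED":
            # Breaker open: serve the cached last-known value, if any.
            self.degraded.add(input_id)
            return self.last.get(input_id)
        return None
```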

Cascading Error Tracking

pub struct CascadingErrorCauses {
    pub caused_by: BTreeMap<NodeId, NodeId>,
}

Tracks which node failure caused downstream failures, enabling root-cause analysis.

Fault Tolerance Metrics

pub struct FaultToleranceSnapshot {
    pub restarts: u64,
    pub health_check_kills: u64,
    pub input_timeouts: u64,
    pub circuit_breaker_recoveries: u64,
}

Reported per daemon via heartbeat events. Visible via adora inspect top.

Distributed Deployment

Multi-Daemon Architecture

  ┌──────────┐       Zenoh        ┌──────────┐
  │ Daemon A │◄──────────────────►│ Daemon B │
  │ Machine 1│      pub/sub       │ Machine 2│
  │          │                    │          │
  │ Node 1   │                    │ Node 3   │
  │ Node 2   │                    │ Node 4   │
  └────┬─────┘                    └────┬─────┘
       │ WS                            │ WS
       └─────────────┐  ┌──────────────┘
                     ▼  ▼
               ┌─────────────┐
               │ Coordinator │
               │    :6013    │
               └─────────────┘

Zenoh Topic Naming

adora/{network_id}/{dataflow_id}/output/{node_id}/{output_id}

  • network_id isolates separate Adora clusters (default: "default")
  • Zenoh router port: 7447, peer port: 5456
  • Routing mode: linkstate

Build Distribution

Three strategies via DistributeStrategy:

  • Local — each daemon builds from source (default)
  • Scp — CLI pushes built binaries via SSH/SCP
  • Http — daemons pull from coordinator’s /api/artifacts endpoint

Machine Labels

Nodes can target specific machines via labels:

_unstable_deploy:
  labels:
    gpu: "true"
    arch: "arm64"

Recording and Replay

.adorec Binary Format

[HEADER]
├─ MAGIC: 8 bytes ("ADORAREC")
├─ version: u16 LE (currently 1)
├─ start_nanos: u64 LE (Unix epoch nanoseconds)
├─ dataflow_id: 16 bytes (UUID)
├─ yaml_len: u32 LE
└─ descriptor_yaml: [u8; yaml_len]

[ENTRIES] (repeated)
├─ record_len: u32 LE
├─ node_id_len: u16 LE
├─ node_id: [u8; node_id_len]
├─ output_id_len: u16 LE
├─ output_id: [u8; output_id_len]
├─ timestamp_offset_nanos: u64 LE
├─ event_bytes_len: u32 LE
└─ event_bytes: [u8; event_bytes_len]    (bincode InterDaemonEvent)

[FOOTER] (optional, written on clean finish)
├─ FOOTER_MAGIC: 8 bytes ("ADORAEND")
├─ total_messages: u64 LE
└─ total_bytes: u64 LE
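The header layout above can be packed and unpacked with fixed-offset byte handling. Python stands in here for the Rust adora-recording crate, and only the header is shown (the entry payloads are bincode, which this sketch does not decode):

```python
import struct
import uuid

MAGIC = b"ADORAREC"

def pack_header(version: int, start_nanos: int, dataflow_id: uuid.UUID,
                descriptor_yaml: bytes) -> bytes:
    """Serialize the .adorec header exactly as laid out above."""
    return (MAGIC
            + struct.pack("<H", version)        # version: u16 LE
            + struct.pack("<Q", start_nanos)    # start_nanos: u64 LE
            + dataflow_id.bytes                 # 16-byte UUID
            + struct.pack("<I", len(descriptor_yaml))  # yaml_len: u32 LE
            + descriptor_yaml)

def unpack_header(buf: bytes):
    """Parse the header back into (version, start_nanos, dataflow_id, yaml)."""
    assert buf[:8] == MAGIC, "not an .adorec file"
    version, start_nanos = struct.unpack_from("<HQ", buf, 8)
    dataflow_id = uuid.UUID(bytes=buf[18:34])
    (yaml_len,) = struct.unpack_from("<I", buf, 34)
    yaml = buf[38:38 + yaml_len]
    return version, start_nanos, dataflow_id, yaml
```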

Writer/Reader API

pub struct RecordingWriter<W: Write> { /* ... */ }
impl<W: Write> RecordingWriter<W> {
    pub fn new(inner: W, header: &RecordingHeader) -> Result<Self>;
    pub fn write_entry(&mut self, entry: &RecordEntry) -> Result<()>;
    pub fn finish(self) -> Result<RecordingFooter>;
}

pub struct RecordingReader<R: Read> { /* ... */ }
impl<R: Read> RecordingReader<R> {
    pub fn open(inner: R) -> Result<Self>;
    pub fn header(&self) -> &RecordingHeader;
    pub fn next_entry(&mut self) -> Result<Option<RecordEntry>>;
}

Extensions

Telemetry

Distributed Tracing (adora-tracing):

  • OpenTelemetry with OTLP exporter (compatible with Jaeger, Zipkin, Tempo)
  • Context propagation across nodes
  • Setup: set_up_tracing(name: &str)

Metrics (adora-metrics):

  • System metrics via sysinfo (CPU, memory, disk)
  • OpenTelemetry meter with OTLP exporter
  • Async process observer: run_metrics_monitor(meter_id)

ROS2 Bridge

Declarative YAML-based ROS2 integration supporting:

Topics — subscribe (ROS2 → Adora) or publish (Adora → ROS2):

ros2:
  topic: /camera/image
  message_type: sensor_msgs/Image
  direction: subscribe

Services — client or server role:

ros2:
  service: /add_two_ints
  service_type: example_interfaces/AddTwoInts
  role: client

Actions — goal/feedback/result lifecycle:

ros2:
  action: /fibonacci
  action_type: example_interfaces/Fibonacci
  role: client

QoS configuration:

qos:
  reliable: true
  durability: transient_local
  keep_last: 10

Download

File download utility for fetching operator/node binaries from HTTP URLs. Sanitizes filenames, sets executable permissions on Unix.

Key Constants and Defaults

Constant | Value | Location
ADORA_COORDINATOR_PORT_WS_DEFAULT | 6013 | Coordinator WebSocket port
ADORA_DAEMON_LOCAL_LISTEN_PORT_DEFAULT | 53291 | Daemon TCP listener port
ZERO_COPY_THRESHOLD | 4096 bytes | Shared memory activation
MAX_MESSAGE_BYTES | 64 MiB | Max TCP/bincode message
MAX_CONTROL_MESSAGE_BYTES | 1 MiB | Max control plane JSON message
TCP_READ_TIMEOUT | 30 seconds | Socket read timeout
WS_PING_INTERVAL | 10 seconds | WebSocket keepalive
MAX_WS_CONNECTIONS | 256 | Concurrent WebSocket limit
MAX_CONNECTIONS_PER_IP | 20 / 60s | Rate limiting
MAX_TOPICS_PER_SUBSCRIBE | 64 | Topic batch limit
MAX_SUBSCRIPTIONS_PER_CONNECTION | 16 | Per-connection limit
MAX_BINARY_PAYLOAD_BYTES | 64 MiB | Topic data frame limit
WATCHDOG_INTERVAL | 5 seconds | Heartbeat to coordinator
METRICS_INTERVAL | 2 seconds | Metrics collection
HEALTH_CHECK_INTERVAL | 5 seconds | Default node health check
MAX_BUFFERED_LOG_MESSAGES | 10,000 | Log buffer capacity
MAX_PENDING_REPLIES | 256 | Pending coordinator replies
MAX_ERROR_BYTES | 4096 | Max error message size
Default input queue size | 10 | Per-input message buffer

Identifiers and Data Structures

ID Types

| Type | Underlying | Validation |
|---|---|---|
| DataflowId | uuid::Uuid | Assigned on dataflow start |
| SessionId | uuid::Uuid (v7) | Per CLI session |
| BuildId | uuid::Uuid (v7) | Per build operation |
| DaemonId | { machine_id: Option<String>, uuid: Uuid (v7) } | Persisted in .daemon-id |
| NodeId | String | Validated: [a-zA-Z0-9_.-], non-empty |
| DataId | String | Same validation as NodeId |
| OperatorId | String | No validation |
| DropToken | Uuid (v7) | Per shared-memory message |
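The NodeId/DataId validation rule can be sketched as a simple character-class check. This is an illustrative Python model of the documented rule, not the actual Rust implementation:

```python
import re

# Matches the documented NodeId character set: [a-zA-Z0-9_.-], non-empty.
_NODE_ID_RE = re.compile(r"[a-zA-Z0-9_.-]+")

def is_valid_node_id(candidate: str) -> bool:
    """True if the string is a valid NodeId (the same rule applies to DataId)."""
    return _NODE_ID_RE.fullmatch(candidate) is not None
```

Note that `/` is rejected, which is why `<node-id>/<output-id>` input references are unambiguous.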

Authentication

pub struct AuthToken(String);  // 64 hex chars (32 bytes)

  • Generated via cryptographically random bytes
  • Stored at <working_dir>/.adora-token
  • Constant-time comparison to prevent timing attacks
  • Applied to all WebSocket routes
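The token scheme above can be modeled in a few lines of Python. The helper names are hypothetical; the point is the two documented properties — cryptographically random 32 bytes rendered as 64 hex chars, and constant-time comparison:

```python
import secrets
import hmac

def generate_token() -> str:
    # 32 cryptographically random bytes -> 64 hex characters
    return secrets.token_hex(32)

def verify_token(presented: str, stored: str) -> bool:
    # compare_digest avoids early-exit timing leaks on mismatch
    return hmac.compare_digest(presented.encode(), stored.encode())
```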

Node Status

pub enum NodeStatus {
    Running,     // healthy
    Restarting,  // restart in progress
    Degraded,    // circuit breaker open (input timeout)
    Failed,      // terminal failure
}

Serialization Summary

| Channel | Format | Notes |
|---|---|---|
| CLI ↔ Coordinator | JSON text frames | Preserves u128 for HLC timestamps |
| Coordinator ↔ Daemon | JSON text frames | Direct string serialization |
| Daemon ↔ Node (TCP) | bincode over length-prefixed frames | 8-byte LE length prefix |
| Daemon ↔ Node (shmem) | bincode via shared memory | Atomic synchronization |
| Daemon ↔ Daemon | bincode over Zenoh | Apache Arrow data format |
| Recording | bincode entries in .adorec | Custom binary container |
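The Daemon ↔ Node TCP framing (an 8-byte little-endian length prefix before each bincode payload) can be modeled as follows. The helper names are illustrative, and `MAX_MESSAGE_BYTES` is taken from the constants table above:

```python
import struct

MAX_MESSAGE_BYTES = 64 * 1024 * 1024  # 64 MiB, per the constants table

def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length as an 8-byte little-endian u64."""
    if len(payload) > MAX_MESSAGE_BYTES:
        raise ValueError("message exceeds MAX_MESSAGE_BYTES")
    return struct.pack("<Q", len(payload)) + payload

def unframe(buf: bytes) -> bytes:
    """Read one length-prefixed message from the start of a buffer."""
    (n,) = struct.unpack_from("<Q", buf, 0)
    return buf[8:8 + n]
```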

Dataflow YAML Specification

Dataflows are defined in YAML files. Each file describes a graph of nodes, their inputs/outputs, and execution parameters.

A JSON Schema is available at the repo root (adora-schema.json) for editor autocompletion and validation.

Quick Start

nodes:
  - id: sender
    path: sender.py
    outputs:
      - message

  - id: receiver
    path: receiver.py
    inputs:
      message: sender/message

Run with adora run dataflow.yml (local mode) or adora up && adora start dataflow.yml (networked mode).

Editor Setup

Add a schema comment at the top of your YAML file for VS Code autocompletion (requires the YAML extension):

# yaml-language-server: $schema=https://raw.githubusercontent.com/dora-rs/adora/main/adora-schema.json
nodes:
  - id: my-node
    # ... autocompletion works here

Root-Level Fields

| Field | Type | Default | Description |
|---|---|---|---|
| nodes | list | required | List of node configurations |
| strict_types | bool | false | Treat type warnings as errors in validate and build |
| type_rules | list | [] | User-defined type compatibility rules (see Type Annotations) |
| health_check_interval | float | 5.0 | Seconds between daemon health check sweeps. For each node with health_check_timeout set, the daemon checks whether the node has communicated within its timeout; if not, the node is killed and its restart_policy is evaluated |
| _unstable_deploy | object | | Root-level deployment config (see Deployment) |
| _unstable_debug | object | | Debug options (see Debug) |

Node Configuration

Every node requires an id. All other fields are optional (though most nodes need at least path or operator/operators).

Identity

| Field | Type | Description |
|---|---|---|
| id | string | Required. Unique identifier. Must not contain /. Whitespace is discouraged |
| name | string | Human-readable display name (metadata only, used in tooling and logs) |
| description | string | Documentation string (metadata only, not used at runtime) |

Source

A node’s executable comes from a local path, a git repository, a module reference, or is implicit (operator/ROS2 nodes).

| Field | Type | Description |
|---|---|---|
| path | string | Path to executable or script. Can also be a URL (legacy) |
| module | string | Path to a module definition file (mutually exclusive with path). See Modules Guide |
| git | string | Git repo URL. adora build clones it and uses the clone dir as working directory |
| branch | string | Branch to checkout (requires git, mutually exclusive with tag/rev) |
| tag | string | Tag to checkout (requires git, mutually exclusive with branch/rev) |
| rev | string | Commit hash to checkout (requires git, mutually exclusive with branch/tag) |
| build | string | Build commands run during adora build. Each line runs separately. pip/pip3 lines use uv when --uv is passed |
| args | string | Command-line arguments (space-separated) |

Example with git source:

- id: rust-node
  git: https://github.com/dora-rs/adora.git
  branch: main
  build: cargo build -p example-node --release
  path: target/release/example-node

Data I/O

Inputs

Inputs subscribe to another node’s output using the format <node-id>/<output-id>:

inputs:
  # Short form
  image: camera/frames
  tick: adora/timer/millis/100

  # Long form with options
  sensor_data:
    source: sensor/frames
    queue_size: 10
    queue_policy: drop_oldest
    input_timeout: 5.0

  # Lossless input (blocks sender when full)
  commands:
    source: controller/cmd
    queue_size: 100
    queue_policy: backpressure

| Input option | Type | Default | Description |
|---|---|---|---|
| source | string | required | <node-id>/<output-id> or timer path |
| queue_size | integer | 10 | Input buffer size |
| queue_policy | string | drop_oldest | drop_oldest: drops oldest message when full. backpressure: buffers up to 10x queue_size without dropping (drops with ERROR log at hard cap) |
| input_timeout | float | | Circuit breaker timeout in seconds. If no message arrives within this period, the daemon closes the input and the node receives an InputClosed event for graceful degradation |
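The two queue policies can be sketched as a toy model. This is a simplification under the documented rules: drop_oldest drops at queue_size, while backpressure (which really blocks the sender) only drops at a hard cap of 10x queue_size:

```python
from collections import deque

class InputQueue:
    """Toy model of the drop_oldest and backpressure input queue policies."""

    def __init__(self, queue_size: int, policy: str = "drop_oldest"):
        self.queue_size = queue_size
        self.policy = policy
        self.buf = deque()
        self.dropped = 0  # the daemon exposes this as a metric

    def push(self, msg):
        # drop_oldest caps at queue_size; backpressure at the 10x hard cap
        cap = self.queue_size if self.policy == "drop_oldest" else self.queue_size * 10
        if len(self.buf) >= cap:
            self.buf.popleft()
            self.dropped += 1
        self.buf.append(msg)
```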

Built-in Timers

Timers are virtual nodes that emit ticks at fixed intervals:

inputs:
  tick: adora/timer/millis/100   # every 100ms
  slow: adora/timer/millis/1000  # every 1s
  fast: adora/timer/hz/30        # 30 Hz (~33ms)
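A timer path maps to an interval straightforwardly; here is a small illustrative parser for the two documented units:

```python
def timer_interval(path: str) -> float:
    """Interval in seconds for an adora/timer/<unit>/<value> path."""
    _, _, unit, value = path.split("/")
    v = float(value)
    if unit == "millis":
        return v / 1000.0
    if unit == "hz":
        return 1.0 / v
    raise ValueError(f"unknown timer unit: {unit}")
```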

Built-in Log Aggregation

Subscribe to structured log messages from all (or filtered) nodes:

inputs:
  all_logs: adora/logs               # all nodes, all levels
  errors:   adora/logs/error         # error+ from all nodes
  sensor:   adora/logs/info/sensor   # info+ from specific node

Each message arrives as a JSON-encoded LogMessage string. See Logging for details.

Outputs

A list of output identifiers the node produces:

outputs:
  - processed_image
  - metadata

Type Annotations

Optional type annotations for inputs and outputs. Types are never required – unannotated ports remain fully dynamic.

- id: camera
  path: camera.py
  outputs:
    - image
    - depth
  output_types:
    image: std/media/v1/Image
    depth: std/media/v1/Image

- id: detector
  path: detect.py
  inputs:
    image: camera/image
  input_types:
    image: std/media/v1/Image
  outputs:
    - bbox
  output_types:
    bbox: std/vision/v1/BoundingBox

| Field | Type | Default | Description |
|---|---|---|---|
| output_types | object | {} | Maps output IDs to type URNs. Keys must match entries in outputs |
| input_types | object | {} | Maps input IDs to expected type URNs. Keys must match entries in inputs |
| output_metadata | object | {} | Maps output IDs to lists of required metadata keys |
| pattern | string | | Communication pattern shorthand: service-server, service-client, action-server, action-client |

Type URNs use the format std/<category>/v<version>/<TypeName> and support parameters (e.g. std/media/v1/AudioFrame[sample_type=f32]). See the Type Annotations Guide for the full standard type library, parameterized types, compatibility rules, and user-defined types.

Run adora validate <file> to check type annotations statically. For runtime checking, set ADORA_RUNTIME_TYPE_CHECK=warn or error:

adora validate dataflow.yml
ADORA_RUNTIME_TYPE_CHECK=warn adora run dataflow.yml

Types also appear on adora graph edge labels when annotated.

Module Parameters

When using module:, pass configuration values via params::

- id: fast_pipeline
  module: modules/transform.module.yml
  inputs:
    data: sender/value
  params:
    speed: "2.0"
    mode: turbo

Inside the module, params are available as $PARAM_<UPPERCASE_KEY> in args: and as environment variables. See the Modules Guide for full documentation.

Environment

env:
  MY_VAR: "value"          # string
  DEBUG: true               # boolean
  PORT: 8080                # integer
  RATE: 1.5                 # float
  FROM_HOST:
    __adora_env: HOST_VAR   # read from host environment at runtime

Environment variables apply to both build commands and node execution. Values support $VAR expansion syntax.

Logging

| Field | Type | Default | Description |
|---|---|---|---|
| send_stdout_as | string | | Route raw stdout/stderr lines as a data output. Each line is sent as a separate Arrow message |
| send_logs_as | string | | Route structured log entries as a data output. Each entry is a JSON string with fields: timestamp, level, node_id, message, target, fields |
| min_log_level | string | | Suppress logs below this level from file output, coordinator forwarding, and send_logs_as. Levels from most to least verbose: stdout (all output including raw stdout), trace, debug, info, warn, error |
| max_log_size | string | | Rotate log file at this size (e.g. "50MB", "1GB") |
| max_rotated_files | integer | 5 | Number of rotated log files to keep |

Example:

- id: sensor
  path: ./sensor
  min_log_level: info
  send_stdout_as: raw_output
  send_logs_as: log_entries
  max_log_size: "100MB"
  max_rotated_files: 3
  outputs:
    - data
    - raw_output
    - log_entries

When using send_stdout_as or send_logs_as, include the output name in the outputs list so downstream nodes can subscribe to it.

For a complete guide to all logging features, see Logging.

Fault Tolerance

| Field | Type | Default | Description |
|---|---|---|---|
| restart_policy | string | never | never, on-failure, or always |
| max_restarts | integer | 0 | Max restart attempts. 0 = unlimited |
| restart_delay | float | | Initial backoff in seconds. Doubles each attempt |
| max_restart_delay | float | | Cap for exponential backoff |
| restart_window | float | | Time window for counting restarts. The counter resets after this many seconds since the first restart in the current window. Enables “N restarts per M seconds” semantics with max_restarts |
| health_check_timeout | float | | If the node does not communicate with the daemon (send outputs, subscribe, etc.) for this many seconds, the daemon kills the process and evaluates the restart_policy |

Restart policies:

  • never (default): no automatic restart
  • on-failure: restart only on non-zero exit code
  • always: restart on any exit, except when stopped by user or all inputs closed with success

Example with exponential backoff:

- id: sensor
  path: ./sensor
  restart_policy: on-failure
  max_restarts: 5
  restart_delay: 1.0         # 1s, 2s, 4s, 8s, 16s
  max_restart_delay: 30.0    # capped at 30s
  restart_window: 300.0      # 5 restarts per 5 minutes
  health_check_timeout: 30.0
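The backoff schedule in the example above (doubling from restart_delay, capped at max_restart_delay) can be computed directly; this is an illustrative helper, not part of the Adora API:

```python
def restart_delays(restart_delay: float, max_restart_delay: float, attempts: int):
    """Backoff delay before each restart attempt: doubling, then capped."""
    delay, out = restart_delay, []
    for _ in range(attempts):
        out.append(min(delay, max_restart_delay))
        delay *= 2
    return out
```

For the YAML above, six attempts would wait 1s, 2s, 4s, 8s, 16s, then 30s (the cap).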

Deployment

Assign nodes to specific machines using _unstable_deploy:

- id: camera-driver
  _unstable_deploy:
    machine: robot-arm
  path: ./target/debug/camera
  outputs:
    - frames

- id: ml-inference
  _unstable_deploy:
    machine: gpu-server
    labels:
      gpu: "true"
    distribute: scp
  path: ./target/debug/inference
  inputs:
    frames: camera-driver/frames

| Deploy field | Type | Default | Description |
|---|---|---|---|
| machine | string | | Target machine/daemon ID. The coordinator routes the node to the daemon registered with this ID |
| working_dir | string | | Working directory on the target machine |
| labels | object | | Key-value labels for scheduling. The coordinator matches these against labels reported by each daemon at registration |
| distribute | string | local | How built binaries reach the target daemon: local – each daemon builds from source independently; scp – CLI pushes the built binary via SSH/SCP before spawn; http – daemon pulls the binary from the coordinator’s HTTP artifact store |

When nodes are on different machines, communication automatically switches from shared memory to Zenoh pub/sub.

Operator Nodes

Operators run in-process inside a shared runtime (no separate process). Use operator for a single operator or operators for multiple.

Single Operator

The id field is optional for single operators (defaults to the node id):

- id: detector
  operator:
    python: detect.py
    build: pip install -r requirements.txt
    inputs:
      image: camera/frames
    outputs:
      - bbox

Multiple Operators

Each operator in operators requires a unique id:

- id: runtime-node
  operators:
    - id: preprocessor
      shared-library: ../../target/debug/libpreprocess
      inputs:
        raw: sensor/data
      outputs:
        - processed
    - id: analyzer
      shared-library: ../../target/debug/libanalyze
      inputs:
        data: runtime-node/preprocessor/processed
      outputs:
        - result

Operator Source Types

| Field | Description |
|---|---|
| python | Python script path, or {source: "script.py", conda_env: "myenv"} |
| shared-library | Path to a shared library (.so/.dylib/.dll) |

Operators also support inputs, outputs, build, send_stdout_as, send_logs_as, min_log_level, max_log_size, and max_rotated_files with the same semantics as node-level fields.

ROS2 Bridge

Declare a node as a ROS2 bridge to automatically convert between ROS2 DDS messages and Adora’s Arrow format. No custom code needed.

Single Topic

- id: camera_bridge
  ros2:
    topic: /camera/image_raw
    message_type: sensor_msgs/Image
    direction: subscribe
  outputs:
    - image

Multiple Topics

- id: robot_bridge
  ros2:
    topics:
      - topic: /camera/image_raw
        message_type: sensor_msgs/Image
        direction: subscribe
        output: image
      - topic: /cmd_vel
        message_type: geometry_msgs/Twist
        direction: publish
        input: velocity
    qos:
      reliable: true
  inputs:
    velocity: planner/cmd_vel
  outputs:
    - image

Service Bridge

- id: add_service
  ros2:
    service: /add_two_ints
    service_type: example_interfaces/AddTwoInts
    role: server
  inputs:
    request: client_node/request
  outputs:
    - response

Action Bridge

- id: nav_action
  ros2:
    action: /navigate
    action_type: nav2_msgs/NavigateToPose
    role: client
  inputs:
    goal: planner/goal
  outputs:
    - feedback
    - result

QoS Configuration

QoS can be set at the bridge level (applies to all topics) or per-topic:

| QoS field | Type | Default | Description |
|---|---|---|---|
| reliable | bool | false | Reliable vs best-effort transport |
| durability | string | volatile | volatile or transient_local |
| liveliness | string | automatic | automatic, manual_by_participant, manual_by_topic |
| lease_duration | float | infinity | Lease duration in seconds |
| max_blocking_time | float | | Max blocking time for reliable transport |
| keep_last | integer | 1 | History depth (KeepLast policy) |
| keep_all | bool | false | Use KeepAll history instead of KeepLast |

Other ROS2 Fields

| Field | Type | Default | Description |
|---|---|---|---|
| namespace | string | / | ROS2 namespace |
| node_name | string | node id | ROS2 node name |

Debug

_unstable_debug:
  publish_all_messages_to_zenoh: true

Required for adora topic echo, adora topic hz, and adora topic info commands.

Communication Patterns

Adora supports four communication patterns built on top of the dataflow:

  • Topic (default): pub/sub dataflow
  • Service: request/reply via request_id metadata
  • Action: goal/feedback/result via goal_id/goal_status metadata, with cancellation support
  • Streaming: session/segment/chunk via session_id/segment_id/seq/fin/flush metadata, with queue flush for interruption

See Communication Patterns for details and examples.

Full Example

health_check_interval: 10.0

_unstable_debug:
  publish_all_messages_to_zenoh: true

nodes:
  - id: webcam
    operator:
      python: webcam.py
      inputs:
        tick: adora/timer/millis/100
      outputs:
        - image

  - id: detector
    operator:
      python: detect.py
      build: pip install ultralytics
      inputs:
        image: webcam/image
      outputs:
        - bbox

  - id: plotter
    operator:
      python: plot.py
      inputs:
        image: webcam/image
        bbox: detector/bbox

  - id: logger
    path: ./logger
    inputs:
      bbox: detector/bbox
    send_stdout_as: logs
    min_log_level: info
    restart_policy: on-failure
    max_restarts: 3
    outputs:
      - logs

Type Annotations

Optional type annotations on dataflow inputs and outputs. Types are never required – unannotated ports remain fully dynamic. Type checking runs at build time and validate time (no runtime overhead by default).

Quick Start

nodes:
  - id: camera
    path: camera.py
    outputs:
      - image
    output_types:
      image: std/media/v1/Image

  - id: detector
    path: detect.py
    inputs:
      image: camera/image
    input_types:
      image: std/media/v1/Image
    outputs:
      - bbox
    output_types:
      bbox: std/vision/v1/BoundingBox

Validate with:

adora validate dataflow.yml

# Fail with non-zero exit code on warnings (for CI)
adora validate --strict-types dataflow.yml

# Type checks also run during build
adora build dataflow.yml --strict-types

You can also set strict_types: true at the top level of the YAML to enable strict mode without the CLI flag:

strict_types: true
nodes:
  # ...

Type URN Format

Type URNs follow the pattern std/<category>/v<version>/<TypeName>:

std/core/v1/Float32
std/media/v1/Image
std/vision/v1/BoundingBox

Parameterized Types

Some struct types accept parameters to distinguish variants:

std/media/v1/AudioFrame[sample_type=f32]
std/media/v1/AudioFrame[sample_type=f32,channels=2]

Matching rules:

  • Same base + same params -> compatible
  • Same base + one side unparameterized -> compatible (wildcard)
  • Same base + different param values -> mismatch

# These are compatible (wildcard):
output_types:
  audio: std/media/v1/AudioFrame[sample_type=f32]
input_types:
  audio: std/media/v1/AudioFrame

# These are a mismatch:
output_types:
  audio: std/media/v1/AudioFrame[sample_type=f32]
input_types:
  audio: std/media/v1/AudioFrame[sample_type=i16]
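The matching rules above can be sketched as a small checker. This is an illustrative Python model; treating partially overlapping parameter sets as "shared keys must agree" is an assumption beyond the three documented rules:

```python
import re

def parse_urn(urn: str):
    """Split 'base[k=v,...]' into (base, {k: v})."""
    m = re.fullmatch(r"([^\[\]]+)(?:\[([^\]]*)\])?", urn)
    base, raw = m.group(1), m.group(2)
    params = dict(p.split("=", 1) for p in raw.split(",")) if raw else {}
    return base, params

def params_compatible(out_urn: str, in_urn: str) -> bool:
    out_base, out_p = parse_urn(out_urn)
    in_base, in_p = parse_urn(in_urn)
    if out_base != in_base:
        return False               # different base types -> mismatch
    if not out_p or not in_p:
        return True                # one side unparameterized -> wildcard
    # assumption: parameter keys present on both sides must agree
    return all(out_p[k] == in_p[k] for k in out_p.keys() & in_p.keys())
```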

Standard Type Library

std/core/v1

| Type | Arrow Type | Description |
|---|---|---|
| Float32 | Float32 | 32-bit float |
| Float64 | Float64 | 64-bit float |
| Int32 | Int32 | 32-bit signed integer |
| Int64 | Int64 | 64-bit signed integer |
| UInt8 | UInt8 | 8-bit unsigned integer |
| UInt32 | UInt32 | 32-bit unsigned integer |
| UInt64 | UInt64 | 64-bit unsigned integer |
| String | Utf8 | UTF-8 string |
| Bytes | LargeBinary | Raw bytes (universal sink – any type is compatible) |
| Bool | Boolean | Boolean |

std/math/v1

| Type | Arrow Type | Fields | Description |
|---|---|---|---|
| Vector3 | Struct | x, y, z (Float64) | 3D vector |
| Quaternion | Struct | x, y, z, w (Float64) | Quaternion |
| Pose | Struct | position, orientation | 6-DOF pose |
| Transform | Struct | translation, rotation | Coordinate transform |

std/control/v1

| Type | Arrow Type | Description |
|---|---|---|
| Twist | Struct | Linear and angular velocity |
| JointState | Struct | Joint positions, velocities, efforts |
| Odometry | Struct | Pose + Twist in a reference frame |

std/media/v1

| Type | Arrow Type | Parameters | Description |
|---|---|---|---|
| Image | Struct | encoding | Raw image (width, height, encoding, data) |
| CompressedImage | LargeBinary | format | JPEG/PNG compressed image |
| PointCloud | Struct | point_type | 3D point cloud |
| AudioFrame | Struct | sample_type (default: f32) | Audio samples |

std/vision/v1

| Type | Arrow Type | Description |
|---|---|---|
| BoundingBox | Struct | 2D bounding box with confidence and label |
| Detection | Struct | Object detection result (list of BoundingBox) |
| Segmentation | Struct | Pixel-level segmentation mask |

Validation Rules

adora validate and adora build check:

  1. Key existence: output_types keys must appear in outputs, input_types keys must appear in inputs
  2. URN resolution: All type URNs must exist in the standard or user-defined type library. Typos get “did you mean?” suggestions.
  3. Edge compatibility: Connected edges must have compatible types (exact match, implicit widening, or user-defined rules)
  4. Timer auto-typing: Timer inputs (adora/timer/*) are automatically typed as std/core/v1/UInt64
  5. Type inference: When only the upstream side annotates a type, it is inferred on the downstream input and reported
  6. Parameterized types: Parameter mismatches are detected (see above)
  7. Metadata patterns: output_metadata keys and pattern shorthands are validated (see below)
  8. Schema compatibility: Struct types are checked at the field level – missing fields or wrong field types are flagged

All checks produce warnings (non-fatal by default). Use --strict-types to treat warnings as errors for CI pipelines.

Type warnings:
  - node "camera": output_types key "framez" not found in outputs list
  - node "detector": unknown type "std/vision/v1/BoundingBx" on output "bbox"
    (did you mean "std/vision/v1/BoundingBox"?)
  - node "detector": type mismatch on input "image": upstream camera/image
    declares "std/core/v1/Bytes", but expected "std/media/v1/Image"

Inferred types:
  inferred std/core/v1/Float64 on processor/reading (from sensor/reading)

Type Compatibility Rules

Beyond exact matching, the type checker supports implicit widening conversions:

| From | To |
|---|---|
| UInt8 | UInt32 |
| UInt32 | UInt64 |
| Int32 | Int64 |
| Float32 | Float64 |
| Any type | Bytes (universal sink) |

Widening is transitive up to depth 3 (e.g. UInt8 -> UInt32 -> UInt64 works, but chains of 4+ do not).
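The widening rules can be modeled as a depth-limited walk over the conversion table. This is an illustrative Python sketch of the documented behavior, not the checker's actual implementation:

```python
# One-step widening conversions from the table above.
WIDENING = {
    "UInt8": "UInt32",
    "UInt32": "UInt64",
    "Int32": "Int64",
    "Float32": "Float64",
}

def widens_to(src: str, dst: str, max_depth: int = 3) -> bool:
    """True if src equals dst, dst is the Bytes universal sink,
    or src reaches dst in at most max_depth widening steps."""
    if dst == "Bytes":
        return True
    current = src
    for _ in range(max_depth + 1):
        if current == dst:
            return True
        current = WIDENING.get(current)
        if current is None:
            return False
    return False
```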

User-Defined Compatibility Rules

Add custom rules in the dataflow YAML:

type_rules:
  - from: myproject/SensorV1
    to: myproject/SensorV2

nodes:
  # ...

Metadata Patterns

Nodes that implement communication patterns (services, actions) can declare required metadata keys on their outputs.

Explicit metadata

- id: server
  path: server.py
  outputs:
    - response
  output_metadata:
    response: [request_id]

Pattern shorthand

Use the pattern field to auto-imply required metadata keys:

- id: server
  path: server.py
  pattern: service-server
  outputs:
    - response

| Pattern | Required metadata keys |
|---|---|
| service-server | request_id |
| service-client | request_id |
| action-server | goal_id, goal_status |
| action-client | goal_id |

User-Defined Types

Projects can define custom types in a types/ directory next to the dataflow. The directory structure determines the URN prefix:

project/
  dataflow.yml
  types/
    myproject/
      sensors/
        v1.yml    # URN prefix: myproject/sensors/v1

Type YAML files use the same format as the standard library:

types:
  MySensor:
    arrow: Struct
    description: Custom sensor reading
    fields:
      - name: temperature
        type: Float32
      - name: humidity
        type: Float32

This creates the URN myproject/sensors/v1/MySensor.

The std/ prefix is reserved and cannot be used for user types.

User types are loaded automatically by adora validate and adora build when a types/ directory exists.

Runtime Type Checking

In addition to static validation, Adora supports optional runtime type checking on send_output(). When enabled, the actual Arrow data type is compared against the declared output_types at send time.

Enable via environment variable:

# Warn on mismatches (log and continue)
ADORA_RUNTIME_TYPE_CHECK=warn adora run dataflow.yml

# Error on mismatches (node returns error)
ADORA_RUNTIME_TYPE_CHECK=error adora run dataflow.yml

Valid values: 1, warn, true (warn mode), error (error mode). Unset or any other value disables checking (zero overhead).

Scope:

  • Validates output_types on the sender side (send_output() calls). input_types are checked statically by adora validate but not enforced at runtime
  • Covers all languages that send Arrow arrays (Rust, Python, C++ Arrow path)
  • Raw byte sends (send_output_bytes, C nodes) are untyped and skip checking
  • Complex types (Struct-based: Image, Vector3, etc.) are skipped – only primitive types, String, Bytes, and Bool are validated at runtime

Graph Visualization

When outputs have type annotations, adora graph shows the type on edge labels:

adora graph dataflow.yml --open

Edges display as output_name [TypeName] (e.g. image [Image]).

Operators

Operators support the same output_types, input_types, output_metadata, and pattern fields:

- id: runtime-node
  operators:
    - id: preprocessor
      python: preprocess.py
      inputs:
        raw: sensor/data
      input_types:
        raw: std/core/v1/Bytes
      outputs:
        - processed
      output_types:
        processed: std/media/v1/Image

Modules (Reusable Sub-Dataflows)

Modules let you define reusable sub-graphs of nodes in separate YAML files and compose them into larger dataflows. Modules are expanded at compile time – the runtime never sees them.

Quick Start

Module file (modules/transform_module.yml):

module:
  name: transform_pipeline
  inputs: [raw_data]
  outputs: [filtered]

nodes:
  - id: doubler
    path: doubler.py
    inputs:
      data: _mod/raw_data
    outputs:
      - doubled

  - id: filter
    path: filter_even.py
    inputs:
      data: doubler/doubled
    outputs:
      - filtered

Dataflow file (dataflow.yml):

nodes:
  - id: sender
    path: sender.py
    outputs:
      - value

  - id: pipeline
    module: modules/transform_module.yml
    inputs:
      raw_data: sender/value

  - id: receiver
    path: receiver.py
    inputs:
      filtered: pipeline/filtered

After expansion, pipeline becomes two nodes: pipeline.doubler and pipeline.filter, with all wiring resolved automatically.

Module Definition File

A module file has two sections:

module: header

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Module name (metadata only) |
| inputs | list | no | Required input port names |
| inputs_optional | list | no | Optional input ports (silently skipped if not wired) |
| outputs | list | no | Output port names exposed to the parent dataflow |

nodes: list

Standard node definitions, with one special syntax: _mod/port_name references a module input port. When expanded, _mod/port_name is replaced with whatever the parent wired to that port.

module:
  name: my_module
  inputs: [camera_feed]
  outputs: [detections]

nodes:
  - id: detector
    path: detect.py
    inputs:
      image: _mod/camera_feed    # resolved to parent's wiring
    outputs:
      - detections

Module-level build

Modules can have a top-level build: command that runs before any inner node builds:

module:
  name: ml_pipeline
  inputs: [image]
  outputs: [result]

build: pip install -r requirements.txt

nodes:
  - id: model
    path: model.py
    inputs:
      image: _mod/image
    outputs:
      - result

Using Modules

Reference a module in a dataflow node using the module: field instead of path::

- id: nav_stack
  module: modules/navigation.module.yml
  inputs:
    goal_pose: localization/goal

The module node’s inputs: map wires parent outputs to module input ports. External nodes reference module outputs as <module_id>/<output_name> (e.g., nav_stack/cmd_vel).

Parameters

Pass configuration values to modules via params::

- id: fast_pipeline
  module: modules/transform_module.yml
  inputs:
    raw_data: sender/value
  params:
    speed: "2.0"
    mode: turbo

Inside the module, reference params in args: using $PARAM_<UPPERCASE_KEY>:

nodes:
  - id: processor
    path: processor.py
    args: --speed $PARAM_SPEED --mode $PARAM_MODE
    inputs:
      data: _mod/raw_data
    outputs:
      - result

Parameters are also injected as environment variables (PARAM_SPEED, PARAM_MODE) into every node inside the module.

Expansion Rules

  1. Load the module YAML file and validate its header
  2. Prefix all internal node IDs with {module_id}. (e.g., nav_stack.planner)
  3. Replace _mod/port_name references with the actual sources from the parent’s input map
  4. Rewrite internal cross-references (e.g., planner/path becomes nav_stack.planner/path)
  5. Map module-declared outputs to internal node outputs, so nav_stack/cmd_vel resolves to nav_stack.controller/cmd_vel
  6. Replace the module node with the expanded flat nodes
  7. Substitute params: values in args: fields and inject as env vars
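Steps 2–4 of the expansion can be sketched as a small pure function over node dicts. This is a simplified illustration (it ignores outputs mapping, params, and nesting):

```python
def expand_module(module_id, module_nodes, input_map):
    """Toy expansion: prefix internal IDs and rewrite _mod/ and internal refs."""
    internal = {n["id"] for n in module_nodes}
    expanded = []
    for node in module_nodes:
        new = dict(node)
        new["id"] = f"{module_id}.{node['id']}"  # step 2: prefix IDs
        inputs = {}
        for port, src in node.get("inputs", {}).items():
            if src.startswith("_mod/"):
                # step 3: resolve module input ports to the parent's wiring
                inputs[port] = input_map[src[len("_mod/"):]]
            else:
                # step 4: rewrite references to other nodes inside the module
                src_node, _, output = src.partition("/")
                inputs[port] = (f"{module_id}.{src_node}/{output}"
                                if src_node in internal else src)
        if inputs:
            new["inputs"] = inputs
        expanded.append(new)
    return expanded
```

Applied to the quick-start module, `pipeline` expands to `pipeline.doubler` and `pipeline.filter` with the wiring resolved.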

Use adora expand to see the result:

adora expand dataflow.yml

Nested Modules

Modules can reference other modules. The expansion is recursive with a depth limit of 8 levels:

# outer_module.yml
module:
  name: outer
  inputs: [data]
  outputs: [result]

nodes:
  - id: inner
    module: inner_module.yml
    inputs:
      raw: _mod/data

  - id: postprocess
    path: postprocess.py
    inputs:
      data: inner/processed
    outputs:
      - result

After expansion, node IDs are fully qualified: outer.inner.some_node.

Optional Inputs

Declare inputs as optional when a module should work with or without certain connections:

module:
  name: flexible_processor
  inputs: [data]
  inputs_optional: [config]
  outputs: [result]

nodes:
  - id: processor
    path: processor.py
    inputs:
      data: _mod/data
      config: _mod/config    # silently dropped if not wired
    outputs:
      - result

When the parent doesn’t wire config, the input is simply omitted from the expanded node.

Visualization

adora graph renders module boundaries as Mermaid subgraphs, making it easy to see which nodes came from which module:

adora graph dataflow.yml --open

Validation

Validate a standalone module file without a full dataflow:

adora expand --module modules/transform_module.yml

This checks:

  • Valid YAML structure
  • Module header is present with name, inputs, outputs
  • All _mod/ references correspond to declared inputs or optional inputs
  • No duplicate node IDs
  • Internal wiring is consistent

Security

  • Path confinement: Module file paths must resolve within the dataflow’s base directory. Absolute paths and directory traversal (../) outside the base are rejected.
  • File size limit: Module files are capped at 1 MB.
  • Depth limit: Recursive nesting is capped at 8 levels.
  • Param key validation: Parameter keys must be alphanumeric with underscores only.

Example

See examples/module-dataflow/ for a complete working example with a sender, transform module (doubler + filter), and receiver.

adora run examples/module-dataflow/dataflow.yml

Communication Patterns

Adora is a dataflow framework based on pub/sub message passing. On top of basic topics, the framework supports service (request/reply), action (goal/feedback/result), and streaming (session/segment/chunk) patterns using well-known metadata keys. No changes to the daemon, coordinator, or YAML syntax are required – the patterns are implemented as conventions at the node API level.

1. Topic (pub/sub)

The default pattern. A node publishes data on an output, and any node that subscribes to that output receives it.

nodes:
  - id: publisher
    outputs:
      - data
  - id: subscriber
    inputs:
      data: publisher/data

Use when: streaming sensor data, periodic status, fire-and-forget events.

2. Service (request/reply)

A client sends a request and expects exactly one response, correlated by a request_id metadata key.

Well-known metadata keys

| Key | Constant | Description |
|---|---|---|
| request_id | adora_node_api::REQUEST_ID | UUID v7 correlating request and response |

YAML

nodes:
  - id: client
    inputs:
      tick: adora/timer/millis/500
      response: server/response
    outputs:
      - request

  - id: server
    inputs:
      request: client/request
    outputs:
      - response

Node API helpers

// Client: send request with auto-generated request_id
let rid = node.send_service_request("request".into(), params, data)?;

// Server: pass through metadata.parameters (includes request_id)
node.send_service_response("response".into(), metadata.parameters, result)?;

The server MUST pass through the request_id from the incoming request’s metadata parameters into the response. The client matches responses to requests using this key.
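The correlation logic can be simulated without the framework. This is a toy Python model of the convention (the real node API uses UUID v7 and Arrow messages):

```python
import uuid

class ServiceClient:
    """Toy client-side correlation table keyed by request_id."""

    def __init__(self):
        self.pending = {}

    def send_request(self, payload):
        rid = str(uuid.uuid4())  # the real API uses UUID v7
        self.pending[rid] = payload
        return {"request_id": rid}, payload  # (metadata, data)

    def handle_response(self, metadata, result):
        # KeyError here would mean an uncorrelated response
        request = self.pending.pop(metadata["request_id"])
        return request, result

def serve(metadata, request):
    """Toy server: MUST echo metadata (incl. request_id) back unchanged."""
    return metadata, request * 2
```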

Example: examples/service-example/

3. Action (goal/feedback/result)

A client sends a goal and receives periodic feedback plus a final result. Actions support cancellation.

Well-known metadata keys

| Key | Constant | Description |
|---|---|---|
| goal_id | adora_node_api::GOAL_ID | UUID v7 identifying the goal |
| goal_status | adora_node_api::GOAL_STATUS | Final status of the goal |

Goal status values:

| Value | Constant | Meaning |
|---|---|---|
| succeeded | GOAL_STATUS_SUCCEEDED | Goal completed successfully |
| aborted | GOAL_STATUS_ABORTED | Goal aborted by server |
| canceled | GOAL_STATUS_CANCELED | Goal canceled by client |

YAML

nodes:
  - id: client
    inputs:
      tick: adora/timer/millis/2000
      feedback: server/feedback
      result: server/result
    outputs:
      - goal
      - cancel

  - id: server
    inputs:
      goal: client/goal
      cancel: client/cancel
    outputs:
      - feedback
      - result

Cancel pattern

The client sends a message on the cancel output with goal_id in the metadata. The server checks for cancel requests between processing steps and sends a result with goal_status = "canceled".
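In outline, a cooperative-cancellation server loop looks like the following sketch. This is Python pseudocode for the contract, not Adora API: `run_goal`, `cancel_requests`, and `send_result` are illustrative names.

```python
def run_goal(goal_id, steps, cancel_requests, send_result):
    """Process a goal step by step, honoring cancel requests between steps.

    cancel_requests: set of goal_ids for which a cancel message has arrived.
    send_result(params, data): stand-in for sending on the `result` output.
    """
    for step in steps:
        if goal_id in cancel_requests:
            # Client asked to cancel: report the well-known status and stop.
            send_result({"goal_id": goal_id, "goal_status": "canceled"}, None)
            return "canceled"
        step()
    send_result({"goal_id": goal_id, "goal_status": "succeeded"}, "done")
    return "succeeded"
```

Checking between steps keeps cancellation latency bounded by the duration of a single step.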

Example: examples/action-example/

4. Streaming (session/segment/chunk)

For real-time pipelines (voice, video, sensor streams) where a user can interrupt mid-stream and queued data must be discarded.

Well-known metadata keys

| Key | Type | Constant | Description |
| --- | --- | --- | --- |
| `session_id` | String | `SESSION_ID` | Identifies the conversation/session |
| `segment_id` | Integer | `SEGMENT_ID` | Logical unit within a session (e.g. one utterance) |
| `seq` | Integer | `SEQ` | Chunk sequence number within a segment |
| `fin` | Bool | `FIN` | `true` on the last chunk of a segment |
| `flush` | Bool | `FLUSH` | `true` to discard older queued messages on this input |

YAML

nodes:
  - id: asr
    inputs:
      mic: mic-source/audio
    outputs:
      - text

  - id: llm
    inputs:
      text: asr/text
    outputs:
      - tokens

  - id: tts
    inputs:
      tokens: llm/tokens
    outputs:
      - audio

Node API

#![allow(unused)]
fn main() {
use adora_node_api::{StreamSegment, AdoraNode};

let mut seg = StreamSegment::new();

// Send chunks with auto-incrementing seq (e.g. inside an ASR node)
node.send_stream_chunk("text".into(), &mut seg, false, chunk_data)?;
// Mark final chunk of a segment
node.send_stream_chunk("text".into(), &mut seg, true, last_chunk)?;

// On user interruption: flush downstream queues and start a new segment.
// The prior segment ends without a fin=true signal -- old data is discarded.
let flush_params = seg.flush();
node.send_output("text".into(), flush_params, empty_data)?;
}
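The metadata these helpers attach per chunk can be approximated with a Python sketch. This is an illustrative model of the well-known keys above, not the actual `StreamSegment` implementation:

```python
import uuid

class SegmentSketch:
    """Approximates StreamSegment bookkeeping: one session, numbered segments,
    an auto-incrementing per-chunk seq, and a flush that starts a new segment."""

    def __init__(self):
        self.session_id = str(uuid.uuid4())
        self.segment_id = 1
        self.seq = 0

    def chunk_params(self, fin):
        params = {
            "session_id": self.session_id,
            "segment_id": self.segment_id,
            "seq": self.seq,
            "fin": fin,
        }
        self.seq += 1
        return params

    def flush(self):
        # Interruption: abandon the current segment and tell receivers to
        # drop anything still queued. The old segment never gets fin=true.
        self.segment_id += 1
        self.seq = 0
        return {
            "session_id": self.session_id,
            "segment_id": self.segment_id,
            "seq": self.seq,
            "fin": False,
            "flush": True,
        }
```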

Queue flush behavior

When a message arrives with flush: true in its metadata, the receiver’s input queue is cleared of all older messages before the flush message is delivered. This enables instant interruption in voice pipelines – when the user speaks over TTS output, the ASR node sends a new segment with flush: true, and the TTS node immediately discards any queued audio chunks from the previous response.

Note: flush discards all queued messages on the input regardless of session_id. Do not multiplex independent sessions on a single input when using flush.
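The receiver-side rule can be modeled with a simple queue sketch. This is a behavioral model in Python, not the daemon's implementation:

```python
from collections import deque

def enqueue(queue, message):
    """Deliver a message into an input queue, honoring the flush convention:
    a message whose metadata carries flush=True evicts everything older."""
    if message.get("metadata", {}).get("flush"):
        queue.clear()  # drop all queued messages, regardless of session_id
    queue.append(message)

queue = deque()
enqueue(queue, {"data": "old-tts-chunk-1", "metadata": {}})
enqueue(queue, {"data": "old-tts-chunk-2", "metadata": {}})
enqueue(queue, {"data": "new-utterance", "metadata": {"flush": True}})
# queue now holds only the flush message; the stale chunks are gone
```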

Python

# Streaming metadata is a plain dict
params = {
    "session_id": session_id,
    "segment_id": 1,
    "seq": 0,
    "fin": False,
    "flush": True,  # flush older queued messages
}
node.send_output("text", data, metadata={"parameters": params})

5. Choosing a pattern

| Need a response? | Long-running? | Cancelable? | Real-time stream? | Pattern |
| --- | --- | --- | --- | --- |
| No | - | - | No | Topic |
| Yes | No | No | No | Service |
| Yes | Yes | Optional | No | Action |
| No | Yes | Via flush | Yes | Streaming |
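
Read as code, the decision table collapses to a small selector (illustrative only):

```python
def choose_pattern(needs_response, long_running, realtime_stream):
    """Encode the decision table: topic for fire-and-forget, service for a
    single reply, action for long-running (optionally cancelable) work,
    streaming for real-time chunked data with interruption."""
    if realtime_stream:
        return "streaming"
    if not needs_response:
        return "topic"
    return "action" if long_running else "service"
```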

6. Important details

  • goal_status matching is case-sensitive. Always use the exact lowercase values: "succeeded", "aborted", "canceled". The ROS2 bridge defaults to Aborted for unrecognized values.
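
A defensive consumer can normalize statuses while preserving the documented fallback. A sketch (illustrative, not the bridge's actual code):

```python
TERMINAL_STATUSES = {"succeeded", "aborted", "canceled"}

def classify_goal_status(value):
    """Exact, case-sensitive match on the well-known lowercase values;
    anything else falls back to "aborted", mirroring the documented
    ROS2-bridge default for unrecognized statuses."""
    return value if value in TERMINAL_STATUSES else "aborted"
```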

7. Python compatibility

Python nodes use the same metadata conventions. Parameters are plain dicts with string keys:

import uuid

# Service client (uuid7 for time-ordered IDs, matching Rust API)
params = {"request_id": str(uuid.uuid7())}
node.send_output("request", data, metadata={"parameters": params})

# Service server -- pass through parameters
node.send_output("response", result, metadata=event["metadata"])

Note: uuid.uuid7() requires Python 3.14+. On older versions, use the uuid_utils package or uuid.uuid4() (random v4 also works for correlation, but loses time-ordering).

Rust API Reference

This document covers the two main Rust crates for building Adora dataflow components:

  • adora-node-api – for standalone node executables
  • adora-operator-api – for in-process operators managed by the Adora runtime

Node API (adora-node-api)

Add to your Cargo.toml:

[dependencies]
adora-node-api = { workspace = true }

AdoraNode

The primary struct for sending outputs and retrieving node information. Obtained through one of the initialization functions below.

Initialization

#![allow(unused)]
fn main() {
// Recommended: auto-detect environment (daemon, testing, or interactive).
pub fn init_from_env() -> NodeResult<(Self, EventStream)>

// Same as init_from_env but errors instead of falling back to interactive mode.
pub fn init_from_env_force() -> NodeResult<(Self, EventStream)>

// For dynamic nodes: connect to the daemon by node ID.
pub fn init_from_node_id(node_id: NodeId) -> NodeResult<(Self, EventStream)>

// Try init_from_env first; fall back to init_from_node_id.
pub fn init_flexible(node_id: NodeId) -> NodeResult<(Self, EventStream)>

// Standalone interactive mode (prompts for inputs on the terminal).
pub fn init_interactive() -> NodeResult<(Self, EventStream)>

// Integration test mode with synthetic inputs/outputs.
pub fn init_testing(
    input: TestingInput,
    output: TestingOutput,
    options: TestingOptions,
) -> NodeResult<(Self, EventStream)>
}

init_from_env is the recommended entry point. It checks, in order:

  1. Thread-local testing state set by setup_integration_testing
  2. ADORA_NODE_CONFIG environment variable (set by the daemon)
  3. ADORA_TEST_WITH_INPUTS environment variable (file-based integration testing)
  4. Interactive terminal fallback (only if stdin is a TTY)

Sending Outputs

All send methods silently ignore output IDs not declared in the dataflow YAML.

#![allow(unused)]
fn main() {
// Send an Arrow array. Copies data into shared memory when beneficial.
pub fn send_output(
    &mut self,
    output_id: DataId,
    parameters: MetadataParameters,
    data: impl Array,
) -> NodeResult<()>

// Send raw bytes. Copies into shared memory when beneficial.
pub fn send_output_bytes(
    &mut self,
    output_id: DataId,
    parameters: MetadataParameters,
    data_len: usize,
    data: &[u8],
) -> NodeResult<()>

// Send raw bytes via a closure for zero-copy writing.
pub fn send_output_raw<F>(
    &mut self,
    output_id: DataId,
    parameters: MetadataParameters,
    data_len: usize,
    data: F,
) -> NodeResult<()>
where
    F: FnOnce(&mut [u8])

// Send raw bytes with explicit Arrow type information.
pub fn send_typed_output<F>(
    &mut self,
    output_id: DataId,
    type_info: ArrowTypeInfo,
    parameters: MetadataParameters,
    data_len: usize,
    data: F,
) -> NodeResult<()>
where
    F: FnOnce(&mut [u8])

// Send a pre-allocated DataSample with type information.
pub fn send_output_sample(
    &mut self,
    output_id: DataId,
    type_info: ArrowTypeInfo,
    parameters: MetadataParameters,
    sample: Option<DataSample>,
) -> NodeResult<()>

// Report output IDs as closed. No further sends allowed for those IDs.
pub fn close_outputs(&mut self, outputs_ids: Vec<DataId>) -> NodeResult<()>
}

Service, Action, and Streaming Helpers

Higher-level methods for the communication patterns. These use well-known metadata keys to correlate requests, goals, responses, and streaming segments.

#![allow(unused)]
fn main() {
// Generate a unique, time-ordered ID (UUID v7) for correlation.
pub fn new_request_id() -> String
pub fn new_goal_id() -> String   // alias for new_request_id

// Send a service request. Injects a `request_id` into parameters and returns it.
pub fn send_service_request(
    &mut self,
    output_id: DataId,
    parameters: MetadataParameters,
    data: impl Array,
) -> NodeResult<String>

// Send a service response. Semantic alias for send_output.
// Caller must pass through the request_id from the incoming request's metadata.
pub fn send_service_response(
    &mut self,
    output_id: DataId,
    parameters: MetadataParameters,
    data: impl Array,
) -> NodeResult<()>
}

Service example (client sends request, server replies):

#![allow(unused)]
fn main() {
// Client: auto-generates and injects request_id
let rid = node.send_service_request("request".into(), params, data)?;

// Server: pass through metadata.parameters (includes request_id)
node.send_service_response("response".into(), metadata.parameters, result)?;
}

Action example (client sends goal, server streams feedback + result):

#![allow(unused)]
fn main() {
use adora_node_api::{GOAL_ID, GOAL_STATUS, GOAL_STATUS_SUCCEEDED, Parameter};

// Client: generate goal_id, attach to params
let goal_id = AdoraNode::new_goal_id();
params.insert(GOAL_ID.to_string(), Parameter::String(goal_id));
node.send_output("goal".into(), params, data)?;

// Server: extract goal_id, send feedback/result with goal_status
let gid = get_string_param(&metadata.parameters, GOAL_ID);
}

Streaming example (real-time voice/video pipeline with interruption):

#![allow(unused)]
fn main() {
use adora_node_api::StreamSegment;

// Create a streaming segment builder (auto-generates session_id)
let mut seg = StreamSegment::new();

// Send chunks with auto-incrementing seq
node.send_stream_chunk("text".into(), &mut seg, false, chunk_data)?;
// Mark final chunk of a segment
node.send_stream_chunk("text".into(), &mut seg, true, last_chunk)?;

// On user interruption: flush downstream queues and start a new segment
let flush_params = seg.flush();
node.send_output("text".into(), flush_params, empty_data)?;
}

See patterns.md for the full guide and examples/service-example and examples/action-example for working code.

Data Allocation

#![allow(unused)]
fn main() {
// Allocate a DataSample of the given size.
// Uses shared memory for data >= ZERO_COPY_THRESHOLD (4096 bytes).
pub fn allocate_data_sample(&mut self, data_len: usize) -> NodeResult<DataSample>
}

Node Information

#![allow(unused)]
fn main() {
// Node ID from the dataflow YAML.
pub fn id(&self) -> &NodeId

// Unique identifier for this dataflow run.
pub fn dataflow_id(&self) -> &DataflowId

// Input/output configuration for this node.
pub fn node_config(&self) -> &NodeRunConfig

// True if this node was restarted after a previous exit or failure.
pub fn is_restart(&self) -> bool

// Number of times this node has been restarted (0 on first run).
pub fn restart_count(&self) -> u32

// Parsed dataflow YAML descriptor.
pub fn dataflow_descriptor(&self) -> NodeResult<&Descriptor>
}

Logging

Rust nodes have two ways to emit structured logs. Both produce identical structured log entries in the daemon.

Option 1: Node API (recommended for most cases)

All log methods emit structured JSONL to stdout, which the daemon parses automatically. Works with min_log_level filtering, send_logs_as routing, and adora/logs subscribers.

#![allow(unused)]
fn main() {
// General structured log. Level: "error", "warn", "info", "debug", "trace".
pub fn log(&self, level: &str, message: &str, target: Option<&str>)

// Structured log with additional key-value fields.
pub fn log_with_fields(
    &self,
    level: &str,
    message: &str,
    target: Option<&str>,
    fields: Option<&BTreeMap<String, String>>,
)

// Convenience methods (no target parameter).
pub fn log_error(&self, message: &str)
pub fn log_warn(&self, message: &str)
pub fn log_info(&self, message: &str)
pub fn log_debug(&self, message: &str)
pub fn log_trace(&self, message: &str)
}

Option 2: Rust tracing crate

When adora’s tracing subscriber is initialized (via init_tracing() or the default feature), tracing::info!() etc. output structured JSON to stdout that the daemon parses identically:

#![allow(unused)]
fn main() {
tracing::info!("Sensor started");
tracing::warn!(sensor_id = "temp-01", "High temperature");
}

Use tracing when you want ecosystem integration (spans, instrumentation, OpenTelemetry). Use node.log_*() when you want explicit control or structured fields as BTreeMap.

| Method | Structured? | Fields? | OpenTelemetry? | Best for |
| --- | --- | --- | --- | --- |
| `node.log_info(msg)` | Yes | No | No | Quick one-liner |
| `node.log_with_fields(...)` | Yes | Yes (BTreeMap) | No | Structured key-value context |
| `tracing::info!(key = val, msg)` | Yes | Yes (spans) | Yes | Ecosystem integration, OTel |
| `println!()` | No (stdout level) | No | No | Quick debugging |

EventStream

Asynchronous iterator over incoming events destined for this node. Implements the futures::Stream trait.

The event stream closes itself after a Stop event is received. Nodes should exit once the stream ends.

#![allow(unused)]
fn main() {
// Block until the next event arrives. Returns None when the stream closes.
// Uses an internal EventScheduler that may reorder events for fairness.
pub fn recv(&mut self) -> Option<Event>

// Block with a timeout. Returns an Event::Error on timeout.
pub fn recv_timeout(&mut self, dur: Duration) -> Option<Event>

// Async receive with EventScheduler reordering.
pub async fn recv_async(&mut self) -> Option<Event>

// Async receive with a timeout. Returns Event::Error on timeout.
pub async fn recv_async_timeout(&mut self, dur: Duration) -> Option<Event>

// Non-blocking receive. Returns TryRecvError::Empty if nothing is ready.
pub fn try_recv(&mut self) -> Result<Event, TryRecvError>

// Drain all buffered events without blocking.
// Returns Some(Vec::new()) if nothing is ready; None if the stream is closed.
pub fn drain(&mut self) -> Option<Vec<Event>>

// True if no events are buffered in the scheduler or receiver.
pub fn is_empty(&self) -> bool

// Returns and resets accumulated drop counts per input ID.
// For `drop_oldest` inputs, drops happen at `queue_size`.
// For `backpressure` inputs, drops happen at 10x `queue_size` (hard safety cap).
pub fn drain_drop_counts(&mut self) -> HashMap<DataId, u64>
}

EventStream also implements futures::Stream<Item = Event>, so it can be used with StreamExt::next() and other combinators. Unlike recv/recv_async, the Stream implementation does not use the EventScheduler, preserving chronological event order.


Event

Represents an incoming event. This enum is #[non_exhaustive] – ignore unknown variants to stay forward-compatible.

#![allow(unused)]
fn main() {
#[non_exhaustive]
pub enum Event {
    // An input was received from another node.
    Input {
        id: DataId,           // input ID from the YAML (not the sender's output ID)
        metadata: Metadata,   // timestamp and type information
        data: ArrowData,      // Apache Arrow data
    },

    // The sender mapped to this input exited; no more data will arrive.
    InputClosed { id: DataId },

    // A previously closed input recovered (e.g., upstream node came back after timeout).
    InputRecovered { id: DataId },

    // An upstream node has restarted. Useful for resetting caches or state.
    NodeRestarted { id: NodeId },

    // The event stream is about to close. See StopCause for the reason.
    Stop(StopCause),

    // Instructs the node to reload an operator (used internally by the runtime).
    Reload { operator_id: Option<OperatorId> },

    // An unexpected internal error. Log it for debugging.
    Error(String),
}
}

StopCause

#![allow(unused)]
fn main() {
#[non_exhaustive]
pub enum StopCause {
    // Explicit stop via `adora stop` or Ctrl-C. Exit promptly or be killed.
    Manual,

    // All inputs were closed (upstream nodes exited). Only sent if the node has inputs.
    AllInputsClosed,
}
}

Supporting Types

DataSample

A data region suitable for sending as an output message. Uses shared memory for data >= ZERO_COPY_THRESHOLD to enable zero-copy transfer.

Implements Deref<Target = [u8]> and DerefMut for reading and writing the underlying bytes.

Metadata and MetadataParameters

#![allow(unused)]
fn main() {
// Full metadata attached to every input event.
pub struct Metadata {
    // Contains timestamp, Arrow type info, and user-defined parameters.
}

// User-controlled metadata fields attached when sending outputs.
// Type alias for BTreeMap<String, Parameter>.
// Default is empty. Pass metadata.parameters from an input to forward metadata.
pub type MetadataParameters = BTreeMap<String, Parameter>;

// A single metadata parameter value.
pub enum Parameter {
    Bool(bool), Integer(i64), Float(f64), String(String),
    ListInt(Vec<i64>), ListFloat(Vec<f64>), ListString(Vec<String>),
    Timestamp(DateTime<Utc>),
}

// Extract typed parameters, returning None if missing or wrong type.
pub fn get_string_param<'a>(params: &'a MetadataParameters, key: &str) -> Option<&'a str>
pub fn get_integer_param(params: &MetadataParameters, key: &str) -> Option<i64>
pub fn get_bool_param(params: &MetadataParameters, key: &str) -> Option<bool>
}
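
These helpers return `None` on a missing key or a type mismatch rather than erroring. The same discipline carries over to Python parameter dicts; a small sketch (the helper name is hypothetical, not part of the adora package):

```python
def get_typed_param(params, key, ty):
    """Return params[key] only if present and of the expected type, else None,
    mirroring get_string_param / get_integer_param / get_bool_param."""
    value = params.get(key)
    # bool is a subclass of int in Python, so reject bools when an int is asked for.
    if ty is int and isinstance(value, bool):
        return None
    return value if isinstance(value, ty) else None
```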

Well-known metadata keys (for communication patterns):

| Constant | Value | Used by |
| --- | --- | --- |
| `REQUEST_ID` | `"request_id"` | Service request/response correlation |
| `GOAL_ID` | `"goal_id"` | Action goal identification |
| `GOAL_STATUS` | `"goal_status"` | Action result status |
| `GOAL_STATUS_SUCCEEDED` | `"succeeded"` | Goal completed successfully |
| `GOAL_STATUS_ABORTED` | `"aborted"` | Goal aborted by server |
| `GOAL_STATUS_CANCELED` | `"canceled"` | Goal canceled by client |
| `SESSION_ID` | `"session_id"` | Streaming session identifier |
| `SEGMENT_ID` | `"segment_id"` | Streaming segment within a session |
| `SEQ` | `"seq"` | Streaming chunk sequence number |
| `FIN` | `"fin"` | Last chunk of a streaming segment |
| `FLUSH` | `"flush"` | Discard older queued messages on input |

All constants are re-exported from adora_node_api.

Identity Types

#![allow(unused)]
fn main() {
// Unique identifier for a running dataflow instance (UUID v4).
pub struct DataflowId(/* ... */);

// Node identifier, as defined in the dataflow YAML.
pub struct NodeId(/* ... */);

// Input/output identifier, as defined in the dataflow YAML.
pub struct DataId(/* ... */);
}

Error Types

#![allow(unused)]
fn main() {
#[derive(Debug, Error)]
pub enum NodeError {
    Init(String),        // config parsing, env vars, daemon handshake
    Connection(String),  // daemon connection lost
    Output(String),      // send or close failure
    Data(String),        // allocation or descriptor parsing
    Internal(eyre::Report),  // catch-all for unexpected errors
}

pub type NodeResult<T> = Result<T, NodeError>;
}

TryRecvError

#![allow(unused)]
fn main() {
pub enum TryRecvError {
    Empty,   // no event available right now
    Closed,  // event stream has been closed
}
}

ZERO_COPY_THRESHOLD

#![allow(unused)]
fn main() {
pub const ZERO_COPY_THRESHOLD: usize = 4096;
}

Messages smaller than this threshold are sent via TCP. Messages at or above this size use shared memory for zero-copy transfer.
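
The transport choice is a pure function of payload size. A behavioral sketch in Python (the function name is illustrative; the constant mirrors ZERO_COPY_THRESHOLD above):

```python
ZERO_COPY_THRESHOLD = 4096  # bytes; mirrors the Rust constant

def transport_for(data_len):
    """Messages below the threshold travel over TCP; at or above it they are
    placed in shared memory for zero-copy transfer."""
    return "shared_memory" if data_len >= ZERO_COPY_THRESHOLD else "tcp"
```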

ArrowData

#![allow(unused)]
fn main() {
// Wrapper around arrow::array::ArrayRef. Implements Deref to the inner ArrayRef.
pub struct ArrowData(pub arrow::array::ArrayRef);
}

Data from Event::Input arrives as ArrowData. Use TryFrom conversions or Arrow APIs to extract typed values.


InputTracker

Helper for tracking input health and caching the last received value per input. Useful for graceful degradation when upstream nodes time out.

#![allow(unused)]
fn main() {
pub struct InputTracker { /* ... */ }

impl InputTracker {
    pub fn new() -> Self

    // Update state from an event. Returns true if the event was relevant.
    pub fn process_event(&mut self, event: &Event) -> bool

    // Current state of an input (Healthy or Closed), if tracked.
    pub fn state(&self, id: &DataId) -> Option<InputState>

    // True if the input is currently closed.
    pub fn is_closed(&self, id: &DataId) -> bool

    // Last received value for an input. Available even when closed.
    pub fn last_value(&self, id: &DataId) -> Option<&ArrowData>

    // All inputs currently in Closed state.
    pub fn closed_inputs(&self) -> Vec<&DataId>

    // True if any tracked input is closed.
    pub fn any_closed(&self) -> bool
}

pub enum InputState {
    Healthy,  // receiving data normally
    Closed,   // upstream exited or timed out
}
}
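
The tracker's semantics can be summarized in a short behavioral sketch. This is illustrative Python only; the event-dict shape is an assumption for the sketch, not the Rust Event type:

```python
class InputTrackerSketch:
    """Track per-input health and cache the last value, following the
    Input / InputClosed / InputRecovered event semantics described above."""

    def __init__(self):
        self.state = {}       # input id -> "healthy" | "closed"
        self.last_value = {}  # input id -> last data seen (kept even when closed)

    def process_event(self, event):
        kind, input_id = event.get("kind"), event.get("id")
        if kind == "input":
            self.state[input_id] = "healthy"
            self.last_value[input_id] = event.get("data")
        elif kind == "input_closed":
            self.state[input_id] = "closed"
        elif kind == "input_recovered":
            self.state[input_id] = "healthy"
        else:
            return False  # event is not relevant to input tracking
        return True

    def any_closed(self):
        return any(s == "closed" for s in self.state.values())
```

Caching the last value is what enables graceful degradation: a consumer can keep acting on stale-but-recent data while an upstream node restarts.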

Integration Testing

The integration_testing module provides tools for testing nodes without a running daemon.

setup_integration_testing

Sets up thread-local state so that the next call to AdoraNode::init_from_env on the same thread initializes in test mode.

#![allow(unused)]
fn main() {
pub fn setup_integration_testing(
    input: TestingInput,
    output: TestingOutput,
    options: TestingOptions,
)
}

TestingInput

#![allow(unused)]
fn main() {
pub enum TestingInput {
    // Load events from a JSON file (must deserialize to IntegrationTestInput).
    FromJsonFile(PathBuf),

    // Provide events directly.
    Input(IntegrationTestInput),
}
}

TestingOutput

#![allow(unused)]
fn main() {
pub enum TestingOutput {
    // Write outputs to a JSONL file (created or overwritten).
    ToFile(PathBuf),

    // Write outputs as JSONL to any writer.
    ToWriter(Box<dyn std::io::Write + Send>),

    // Send each output as a JSON object to a flume channel.
    ToChannel(flume::Sender<serde_json::Map<String, serde_json::Value>>),
}
}

TestingOptions

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Default)]
pub struct TestingOptions {
    // Skip time offsets in outputs for deterministic comparison.
    pub skip_output_time_offsets: bool,
}
}

Environment Variable Testing

Nodes using init_from_env also support file-based testing via environment variables:

| Variable | Description |
| --- | --- |
| `ADORA_TEST_WITH_INPUTS` | Path to a JSON input file (IntegrationTestInput format) |
| `ADORA_TEST_WRITE_OUTPUTS_TO` | Path for the output JSONL file (default: `outputs.jsonl` next to inputs) |
| `ADORA_TEST_NO_OUTPUT_TIME_OFFSET` | If set, omit time offsets for deterministic outputs |

Operator API (adora-operator-api)

Operators are in-process components managed by the Adora runtime. They are compiled as shared libraries (.so/.dylib/.dll) and loaded by the runtime.

Add to your Cargo.toml:

[dependencies]
adora-operator-api = { workspace = true }

[lib]
crate-type = ["cdylib"]

AdoraOperator Trait

#![allow(unused)]
fn main() {
pub trait AdoraOperator: Default {
    fn on_event(
        &mut self,
        event: &Event,
        output_sender: &mut AdoraOutputSender,
    ) -> Result<AdoraStatus, String>;
}
}

Implement this trait to define your operator’s behavior. The runtime calls on_event for each incoming event. Return AdoraStatus to control execution flow.

Event (Operator)

The operator Event enum is simpler than the node Event and uses &str for IDs.

#![allow(unused)]
fn main() {
#[non_exhaustive]
pub enum Event<'a> {
    // An input was received.
    Input { id: &'a str, data: ArrowData },

    // Failed to parse the input data as an Arrow array.
    InputParseError { id: &'a str, error: String },

    // An input was closed by the sender.
    InputClosed { id: &'a str },

    // The operator should stop.
    Stop,
}
}

AdoraOutputSender

#![allow(unused)]
fn main() {
pub struct AdoraOutputSender<'a>(/* ... */);

impl AdoraOutputSender<'_> {
    // Send an output. `id` is the output ID from your dataflow YAML.
    pub fn send(&mut self, id: String, data: impl Array) -> Result<(), String>
}
}

AdoraStatus

Returned from on_event to control the operator lifecycle.

#![allow(unused)]
fn main() {
pub enum AdoraStatus {
    Continue,  // keep running, wait for the next event
    Stop,      // stop this operator
    StopAll,   // stop the entire dataflow
}
}

register_operator! Macro

Generates the FFI entry points required by the Adora runtime to load and call your operator.

#![allow(unused)]
fn main() {
use adora_operator_api::register_operator;

register_operator!(MyOperator);
}

This must be called exactly once per crate, at the top level, with the type that implements AdoraOperator.


Quick Start Example: Node

A minimal node that receives tick inputs and sends a random number as output.

use adora_node_api::{AdoraNode, Event, IntoArrow, adora_core::config::DataId};

fn main() -> eyre::Result<()> {
    let (mut node, mut events) = AdoraNode::init_from_env()?;

    let output = DataId::from("random".to_owned());

    while let Some(event) = events.recv() {
        match event {
            Event::Input { id, metadata, data } => {
                if id.as_str() == "tick" {
                    let value: u64 = fastrand::u64(..);
                    node.send_output(
                        output.clone(),
                        metadata.parameters,
                        value.into_arrow(),
                    )?;
                }
            }
            Event::Stop(_) => {}
            _ => {}
        }
    }

    Ok(())
}

Corresponding dataflow YAML:

nodes:
  - id: timer
    path: adora/timer/millis/100
    outputs:
      - tick

  - id: my-node
    path: ./target/debug/my-node
    inputs:
      tick: timer/tick
    outputs:
      - random

  - id: sink
    path: ./target/debug/sink
    inputs:
      data: my-node/random

Quick Start Example: Operator

A minimal operator that counts ticks and forwards formatted messages.

#![warn(unsafe_op_in_unsafe_fn)]

use adora_operator_api::{
    AdoraOperator, AdoraOutputSender, AdoraStatus, Event, IntoArrow, register_operator,
};

register_operator!(MyOperator);

#[derive(Debug, Default)]
struct MyOperator {
    ticks: usize,
}

impl AdoraOperator for MyOperator {
    fn on_event(
        &mut self,
        event: &Event,
        output_sender: &mut AdoraOutputSender,
    ) -> Result<AdoraStatus, String> {
        match event {
            Event::Input { id, .. } => match *id {
                "tick" => {
                    self.ticks += 1;
                    let msg = format!("tick count: {}", self.ticks);
                    output_sender.send("status".into(), msg.into_arrow())?;
                }
                other => eprintln!("ignoring unexpected input {other}"),
            },
            Event::InputClosed { id } => {
                if *id == "tick" {
                    return Ok(AdoraStatus::Stop);
                }
            }
            Event::Stop => {}
            other => {
                eprintln!("received unknown event {other:?}");
            }
        }

        Ok(AdoraStatus::Continue)
    }
}

Corresponding dataflow YAML:

nodes:
  - id: timer
    path: adora/timer/millis/500
    outputs:
      - tick

  - id: runtime-node
    operator:
      shared_library: ./target/debug/libmy_operator
      inputs:
        tick: timer/tick
      outputs:
        - status

Python API Reference

This document covers the Python APIs for building adora nodes, operators, and dataflows. Install with:

pip install adora-rs


Node API

from adora import Node

The Node class is the primary interface for custom nodes. It connects to a running dataflow, receives input events, and sends outputs.

Node class

__init__(node_id=None)

Create a new node and connect to the running dataflow.

# Standard: node ID is read from environment variables set by the daemon
node = Node()

# Dynamic: connect to a running dataflow by explicit node ID
node = Node(node_id="my-dynamic-node")

Parameters:

  • node_id (str, optional) – Explicit node ID for dynamic nodes. When omitted, the node reads its identity from environment variables set by the adora daemon.

Raises: RuntimeError if the node cannot connect to the dataflow.


next(timeout=None)

Retrieve the next event from the event stream. Blocks until an event is available or the timeout expires.

event = node.next()              # block indefinitely
event = node.next(timeout=2.0)   # block up to 2 seconds

Parameters:

  • timeout (float, optional) – Maximum wait time in seconds.

Returns: dict | None – An event dictionary, or None if all senders have been dropped or the timeout expired.


drain()

Retrieve all buffered events without blocking.

events = node.drain()
for event in events:
    print(event["type"])

Returns: list[dict] – A list of event dictionaries. Returns an empty list if no events are buffered.


try_recv()

Non-blocking receive. Returns the next buffered event if one is available.

event = node.try_recv()
if event is not None:
    print(event["type"])

Returns: dict | None – An event dictionary, or None if no event is buffered.


recv_async(timeout=None)

Asynchronous receive. For use with asyncio.

event = await node.recv_async()
event = await node.recv_async(timeout=5.0)

Parameters:

  • timeout (float, optional) – Maximum wait time in seconds. Returns an error if the timeout is reached.

Returns: dict | None – An event dictionary, or None if all senders have been dropped.

Note: This method is experimental. The pyo3 async (Rust-Python FFI) integration is still in development.


is_empty()

Check whether there are any buffered events in the event stream.

if not node.is_empty():
    event = node.try_recv()

Returns: bool


send_output(output_id, data, metadata=None)

Send data on an output channel.

import pyarrow as pa

# Send raw bytes
node.send_output("status", b"OK")

# Send an Apache Arrow array (zero-copy capable)
node.send_output("values", pa.array([1, 2, 3]))

# Send with metadata
node.send_output("image", pa.array(pixels), {"camera_id": "front"})

Parameters:

  • output_id (str) – The output name as declared in the dataflow YAML.
  • data (bytes | pyarrow.Array) – The payload. Use bytes for simple data or pyarrow.Array for zero-copy shared-memory transport.
  • metadata (dict, optional) – Key-value pairs attached to the message. Supported value types: bool, int, float, str, list[int], list[float], list[str], datetime.datetime.

Raises: RuntimeError if data is neither bytes nor a pyarrow.Array.

Service, action, and streaming patterns

Python nodes use the same metadata key conventions as Rust for communication patterns. Parameters are plain dicts with string keys.

Well-known metadata keys:

| Key | Description |
| --- | --- |
| `"request_id"` | Service request/response correlation (UUID v7) |
| `"goal_id"` | Action goal identification (UUID v7) |
| `"goal_status"` | Action result status: `"succeeded"`, `"aborted"`, or `"canceled"` |
| `"session_id"` | Streaming session identifier |
| `"segment_id"` | Streaming segment within a session (integer) |
| `"seq"` | Streaming chunk sequence number (integer) |
| `"fin"` | Last chunk of a streaming segment (bool) |
| `"flush"` | Discard older queued messages on input (bool) |

Service client example:

import uuid

# Send a request with a unique request_id
request_id = str(uuid.uuid7())  # Python 3.14+; use uuid_utils or uuid.uuid4() on older versions
node.send_output("request", data, {"request_id": request_id})

Service server example:

# Pass through the metadata (includes request_id) from the incoming request
node.send_output("response", result, event["metadata"])

Action client example:

goal_id = str(uuid.uuid7())
node.send_output("goal", data, {"goal_id": goal_id})

Streaming example (flush downstream queues on user interruption):

params = {
    "session_id": session_id,
    "segment_id": 1,
    "seq": 0,
    "fin": False,
    "flush": True,
}
node.send_output("text", data, metadata={"parameters": params})

See patterns.md for the full guide.


Logging

Python nodes can log using either Python’s built-in logging module (recommended) or the explicit node API.

Python logging module (auto-bridged):

When Node() is created, it automatically installs a handler that routes records from Python’s logging module through the adora daemon. No configuration is needed:

import logging
from adora import Node

node = Node()  # Installs the logging bridge

logging.info("Sensor initialized")       # -> structured "info" log entry
logging.warning("High temperature")      # -> structured "warn" log entry
logging.debug("Raw bytes: %s", data)     # -> structured "debug" log entry

These log entries are captured with full metadata (level, message, file path, line number) and work with min_log_level filtering, send_logs_as routing, and adora/logs subscribers.

Note: Do not call logging.basicConfig() before creating Node(). The constructor sets up the bridge; calling basicConfig() first may install a conflicting handler.

Explicit node API:

log(level, message, target=None, fields=None)

Emit a structured log message with optional target and key-value fields.

node.log("info", "Processing frame", target="vision")
node.log("error", "Sensor timeout", fields={"sensor": "lidar", "retry": "3"})

Parameters:

  • level (str) – Log level: "error", "warn", "info", "debug", or "trace".
  • message (str) – The log message.
  • target (str, optional) – Target module or subsystem name.
  • fields (dict[str, str], optional) – Structured key-value context fields.

Works with the daemon’s min_log_level filtering, send_logs_as routing, and adora/logs subscribers.


log_error(message), log_warn(message), log_info(message), log_debug(message), log_trace(message)

Convenience methods for common log levels:

node.log_error("Connection failed")
node.log_warn("Temperature elevated")
node.log_info("Sensor initialized")
node.log_debug("Raw bytes received")
node.log_trace("Entering loop iteration")

Each is equivalent to node.log(level, message).

When to use which:

| Method | Structured? | Fields? | Best for |
|---|---|---|---|
| logging.info() | Yes | No | General-purpose logging |
| node.log("info", msg, fields={...}) | Yes | Yes | Structured context (sensor_id, etc.) |
| node.log_info(msg) | Yes | No | Quick one-liner |
| print() | No | No | Legacy code, quick debugging |

dataflow_descriptor()

Return the full dataflow descriptor (the parsed dataflow YAML) as a Python dictionary.

descriptor = node.dataflow_descriptor()
print(descriptor["nodes"])

Returns: dict


node_config()

Return the configuration block for this node from the dataflow descriptor.

config = node.node_config()
model_path = config.get("model", "default.pt")

Returns: dict


dataflow_id()

Return the unique identifier of the running dataflow.

print(node.dataflow_id())  # e.g. "a1b2c3d4-..."

Returns: str


is_restart()

Check whether this node was restarted after a previous exit or failure. Useful for deciding whether to restore saved state or start fresh.

if node.is_restart():
    restore_checkpoint()

Returns: bool


restart_count()

Return how many times this node has been restarted. Returns 0 on the first run, 1 after the first restart, and so on.

print(f"Restart #{node.restart_count()}")

Returns: int
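In practice, is_restart() and restart_count() often feed a single startup decision. A minimal sketch of such a policy, with the node calls stubbed out as plain arguments; the max_restores cutoff is an illustrative assumption, not an adora feature:

```python
def startup_mode(is_restart, restart_count, max_restores=3):
    """Hypothetical policy: restore a checkpoint after a restart, but
    start fresh if the node has already restarted too many times."""
    if is_restart and restart_count <= max_restores:
        return "restore"
    return "fresh"

startup_mode(False, 0)  # "fresh" -- first run
startup_mode(True, 1)   # "restore"
startup_mode(True, 9)   # "fresh" -- likely a crash loop; discard state
```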


merge_external_events(subscription)

Merge a ROS2 subscription stream into the node’s main event loop. After calling this method, ROS2 messages arrive as events with kind set to "external".

from adora import Node, Ros2Context, Ros2Node, Ros2NodeOptions, Ros2Topic

node = Node()
ros2_context = Ros2Context()
ros2_node = ros2_context.new_node("listener", Ros2NodeOptions())
topic = Ros2Topic("/chatter", "std_msgs/String", ros2_node)
subscription = ros2_node.create_subscription(topic)

node.merge_external_events(subscription)

for event in node:
    if event["kind"] == "external":
        print("ROS2:", event["value"])
    elif event["type"] == "INPUT":
        print("Adora:", event["id"])

Parameters:

  • subscription (adora.Ros2Subscription) – A ROS2 subscription created via the adora ROS2 bridge.

Iteration support

The Node class implements __iter__ and __next__, so you can iterate directly:

for event in node:
    match event["type"]:
        case "INPUT":
            process(event["value"])
        case "STOP":
            break

The iterator calls next() with no timeout on each iteration. When the event stream is closed, next() returns None and iteration stops, ending the loop.
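The same contract can be emulated with a stand-in class, which is handy for unit-testing event-handling code without a running daemon. FakeNode here is illustrative, not part of adora:

```python
class FakeNode:
    """Stand-in mimicking the documented Node iteration contract."""

    def __init__(self, events):
        self._events = iter(events)

    def next(self, timeout=None):
        # Like Node.next(): returns None once the stream is closed.
        return next(self._events, None)

    def __iter__(self):
        return self

    def __next__(self):
        event = self.next()
        if event is None:
            raise StopIteration  # ends any for-loop over the node
        return event

types = [e["type"] for e in FakeNode([{"type": "INPUT"}, {"type": "STOP"}])]
# types == ["INPUT", "STOP"]
```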


Event dictionary

Events are returned as plain Python dictionaries. The structure depends on the event type.

INPUT

An input message arrived from another node.

{
    "type": "INPUT",
    "id": "camera_image",          # input ID as declared in the dataflow YAML
    "kind": "adora",               # "adora" for dataflow events, "external" for ROS2
    "value": <pyarrow.Array>,      # the payload as an Apache Arrow array
    "metadata": {
        "timestamp": datetime,     # UTC-aware datetime.datetime
        "open_telemetry_context": "...",  # tracing context (if enabled)
        ...                        # any user-supplied metadata
    },
}

Access the data:

values = event["value"].to_pylist()     # convert to Python list
array = event["value"].to_numpy()       # convert to NumPy array

INPUT_CLOSED

An input channel was closed (the upstream node finished).

{
    "type": "INPUT_CLOSED",
    "id": "camera_image",
    "kind": "adora",
}

STOP

The dataflow is shutting down.

{
    "type": "STOP",
    "id": "MANUAL" | "ALL_INPUTS_CLOSED",   # stop cause
    "kind": "adora",
}

ERROR

An error occurred in the runtime.

{
    "type": "ERROR",
    "error": "description of the error",
    "kind": "adora",
}

External (ROS2)

When using merge_external_events, ROS2 messages arrive as:

{
    "kind": "external",
    "value": <pyarrow.Array>,   # the ROS2 message as an Arrow array
}
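Because every variant is a plain dictionary, dispatch logic can be tested offline with synthetic events. A sketch of such a dispatcher (illustrative, no runtime required):

```python
def describe(event):
    """Illustrative dispatcher over the event shapes documented above."""
    if event.get("kind") == "external":
        return "ros2"
    t = event["type"]
    if t == "INPUT":
        return f"input:{event['id']}"
    if t == "INPUT_CLOSED":
        return f"closed:{event['id']}"
    if t == "STOP":
        return f"stop:{event['id']}"
    return "error"

describe({"type": "INPUT", "id": "camera_image", "kind": "adora",
          "value": None, "metadata": {}})            # "input:camera_image"
describe({"type": "STOP", "id": "MANUAL", "kind": "adora"})  # "stop:MANUAL"
```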

AdoraStatus enum

Used as the return value from operator on_event methods to control the event loop.

from adora import AdoraStatus

| Value | Meaning |
|---|---|
| AdoraStatus.CONTINUE | Continue processing events (value 0) |
| AdoraStatus.STOP | Stop this operator (value 1) |
| AdoraStatus.STOP_ALL | Stop the entire dataflow (value 2) |

Operator API

Operators run inside the adora runtime process (no separate OS process). They are defined as a Python class named Operator with an on_event method.

Operator class (user-defined)

Create a Python file with an Operator class:

from adora import AdoraStatus

class Operator:
    def __init__(self):
        # Initialize state here
        self.count = 0

    def on_event(self, adora_event, send_output) -> AdoraStatus:
        if adora_event["type"] == "INPUT":
            self.count += 1
            # Process the input and optionally send output
            send_output("result", b"processed", adora_event["metadata"])
        return AdoraStatus.CONTINUE

Methods:

  • __init__(self) – Called once when the operator is loaded. Initialize any state or models here.
  • on_event(self, adora_event, send_output) -> AdoraStatus – Called for every incoming event. Must return an AdoraStatus value.

Parameters of on_event:

  • adora_event (dict) – An event dictionary.
  • send_output (callable) – Callback to send output data (see below).

The runtime also sets self.dataflow_descriptor on the operator instance with the parsed dataflow YAML as a dictionary.
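Since on_event receives only a dictionary and a callback, an Operator can be exercised outside the runtime. A sketch using a list-capturing stand-in for send_output; the AdoraStatus shim mirrors the documented values so the snippet runs without adora installed:

```python
class AdoraStatus:
    """Shim mirroring the documented enum values."""
    CONTINUE, STOP, STOP_ALL = 0, 1, 2

class Operator:
    def __init__(self):
        self.count = 0

    def on_event(self, adora_event, send_output):
        if adora_event["type"] == "INPUT":
            self.count += 1
            send_output("result", b"processed", adora_event["metadata"])
        return AdoraStatus.CONTINUE

# Drive the operator with a synthetic event and a capturing callback.
sent = []
op = Operator()
status = op.on_event(
    {"type": "INPUT", "id": "tick", "value": None, "metadata": {}},
    lambda out_id, data, metadata=None: sent.append((out_id, data)),
)
# status == AdoraStatus.CONTINUE, sent == [("result", b"processed")]
```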

send_output callback

The send_output callback is passed to on_event for sending data from an operator.

send_output(output_id, data, metadata=None)

Parameters:

  • output_id (str) – The output name as declared in the dataflow YAML.
  • data (bytes | pyarrow.Array) – The payload.
  • metadata (dict, optional) – Metadata to attach. Pass adora_event["metadata"] to propagate tracing context.

Example:

import pyarrow as pa
from adora import AdoraStatus

class Operator:
    def on_event(self, adora_event, send_output) -> AdoraStatus:
        if adora_event["type"] == "INPUT":
            result = pa.array([42], type=pa.int64())
            send_output("output", result, adora_event["metadata"])
        return AdoraStatus.CONTINUE

DataflowBuilder

from adora.builder import DataflowBuilder, Node, Operator, Output

Build dataflow YAML programmatically in Python.

DataflowBuilder class

__init__(name="adora-dataflow")

Create a new dataflow builder.

flow = DataflowBuilder("my-robot")

Parameters:

  • name (str, optional) – Name of the dataflow. Defaults to "adora-dataflow".

add_node(id, **kwargs) -> Node

Add a node to the dataflow. Returns a Node object for further configuration.

sender = flow.add_node("sender")

Parameters:

  • id (str) – Unique node identifier.
  • **kwargs – Additional node configuration passed through to the YAML.

Returns: Node (builder)

to_yaml(path=None) -> str | None

Generate the YAML representation of the dataflow. If path is given, writes to file and returns None. Otherwise returns the YAML string.

# Write to file
flow.to_yaml("dataflow.yml")

# Get as string
yaml_str = flow.to_yaml()

Parameters:

  • path (str, optional) – File path to write the YAML.

Returns: str | None

Context manager

DataflowBuilder supports the with statement:

with DataflowBuilder("my-flow") as flow:
    flow.add_node("sender").path("sender.py")
    flow.to_yaml("dataflow.yml")

Node class (builder)

Returned by DataflowBuilder.add_node(). All setter methods return self for chaining.

path(path) -> Node

Set the path to the node’s executable or script.

node.path("my_node.py")

args(args) -> Node

Set command-line arguments for the node.

node.args("--verbose --port 8080")

env(env) -> Node

Set environment variables for the node.

node.env({"MODEL_PATH": "/models/yolo.pt"})

build(command) -> Node

Set the build command for the node (run before starting).

node.build("pip install -r requirements.txt")

git(url, branch=None, tag=None, rev=None) -> Node

Set a Git repository as the source for the node.

node.git("https://github.com/org/repo.git", branch="main")

add_operator(operator) -> Node

Attach an Operator to this node.

op = Operator("detector", python="object_detection.py")
node.add_operator(op)

add_output(output_id) -> Output

Declare an output on this node and return an Output reference for use as an input source.

output = sender.add_output("data")

add_input(input_id, source, queue_size=None, queue_policy=None) -> Node

Subscribe this node to an output from another node.

# Using an Output object
output = sender.add_output("data")
receiver.add_input("data", output)

# Using a string reference
receiver.add_input("tick", "adora/timer/millis/100")

# With a custom queue size
receiver.add_input("images", camera_output, queue_size=2)

# Lossless input (blocks sender when full)
receiver.add_input("commands", cmd_output, queue_size=100, queue_policy="backpressure")

Parameters:

  • input_id (str) – Name of the input on this node.
  • source (str | Output) – Either a string ("node_id/output_id") or an Output object.
  • queue_size (int, optional) – Maximum number of buffered messages for this input.
  • queue_policy (str, optional) – "drop_oldest" (default) or "backpressure" (buffers up to 10x queue_size before dropping).

to_dict() -> dict

Return the dictionary representation of the node for YAML serialization.


Output class (builder)

Returned by Node.add_output(). Represents a reference to a node’s output, used as a source in add_input().

output = sender.add_output("data")
receiver.add_input("sensor_data", output)
str(output)  # "sender/data"

Operator class (builder)

Defines an operator for embedding in a node’s YAML configuration.

__init__(id, name=None, description=None, build=None, python=None, shared_library=None, send_stdout_as=None)

op = Operator(
    id="detector",
    python="object_detection.py",
    send_stdout_as="detection_text",
)

Parameters:

  • id (str) – Unique operator identifier.
  • name (str, optional) – Display name.
  • description (str, optional) – Human-readable description.
  • build (str, optional) – Build command to run before loading.
  • python (str, optional) – Path to the Python operator file.
  • shared_library (str, optional) – Path to a shared library operator.
  • send_stdout_as (str, optional) – Route the operator’s stdout as an output with this ID.

to_dict() -> dict

Return the dictionary representation for YAML serialization.


CUDA Module

from adora.cuda import torch_to_ipc_buffer, ipc_buffer_to_ipc_handle, open_ipc_handle

Utilities for zero-copy GPU tensor sharing between nodes via CUDA IPC. Requires PyTorch with CUDA and Numba with CUDA support.

torch_to_ipc_buffer(tensor) -> tuple[pyarrow.Array, dict]

Convert a PyTorch CUDA tensor into an Arrow array containing the CUDA IPC handle, plus a metadata dictionary. Send both through the dataflow to share GPU memory without copying.

import torch
import pyarrow as pa
from adora import Node
from adora.cuda import torch_to_ipc_buffer

node = Node()
tensor = torch.randn(1024, 768, device="cuda")
ipc_buffer, metadata = torch_to_ipc_buffer(tensor)
node.send_output("gpu_data", ipc_buffer, metadata)

Parameters:

  • tensor (torch.Tensor) – A CUDA tensor.

Returns: tuple[pyarrow.Array, dict] – The IPC handle as an int8 Arrow array, and metadata with shape, strides, dtype, size, offset, and source info.


ipc_buffer_to_ipc_handle(handle_buffer, metadata) -> IpcHandle

Reconstruct a CUDA IPC handle from a received Arrow buffer and metadata.

from adora.cuda import ipc_buffer_to_ipc_handle

event = node.next()
ipc_handle = ipc_buffer_to_ipc_handle(event["value"], event["metadata"])

Parameters:

  • handle_buffer (pyarrow.Array) – The Arrow array from event["value"].
  • metadata (dict) – The metadata from event["metadata"].

Returns: numba.cuda.cudadrv.driver.IpcHandle


open_ipc_handle(ipc_handle, metadata) -> ContextManager[torch.Tensor]

Open a CUDA IPC handle and yield a PyTorch tensor. Use as a context manager to ensure proper cleanup.

from adora.cuda import ipc_buffer_to_ipc_handle, open_ipc_handle

event = node.next()
ipc_handle = ipc_buffer_to_ipc_handle(event["value"], event["metadata"])

with open_ipc_handle(ipc_handle, event["metadata"]) as tensor:
    result = tensor * 2  # use the GPU tensor directly

Parameters:

  • ipc_handle (IpcHandle) – Handle from ipc_buffer_to_ipc_handle.
  • metadata (dict) – The metadata dictionary with shape, strides, and dtype info.

Returns: Context manager yielding a torch.Tensor on CUDA.


Quick Start Example

A complete node that receives images, processes them, and sends results:

#!/usr/bin/env python3
"""Example node: receives messages, transforms them, and sends output."""

import logging

import pyarrow as pa
from adora import Node


def main():
    node = Node()

    for event in node:
        if event["type"] == "INPUT":
            input_id = event["id"]

            if input_id == "message":
                values = event["value"].to_pylist()
                number = values[0]

                # Create a struct array with multiple fields
                result = pa.StructArray.from_arrays(
                    [
                        pa.array([number * 2]),
                        pa.array([f"Message #{number}"]),
                    ],
                    names=["doubled", "description"],
                )
                node.send_output("transformed", result)
                logging.info("Transformed message %d", number)

        elif event["type"] == "STOP":
            logging.info("Node stopping")
            break


if __name__ == "__main__":
    main()

Run with:

adora run dataflow.yml

DataflowBuilder Example

Build a dataflow programmatically instead of writing YAML by hand:

#!/usr/bin/env python3
"""Build a simple sender -> receiver dataflow."""

from adora.builder import DataflowBuilder, Operator

flow = DataflowBuilder("example-flow")

# Add a timer-driven sender node
sender = flow.add_node("sender")
sender.path("sender.py")
tick_output = sender.add_output("message")

# Add a receiver that subscribes to the sender
receiver = flow.add_node("receiver")
receiver.path("receiver.py")
receiver.add_input("message", tick_output)

# Add a node with a timer input
timed_node = flow.add_node("periodic")
timed_node.path("periodic.py")
timed_node.add_input("tick", "adora/timer/millis/100")

# Add a node with an operator
runtime_node = flow.add_node("runtime-node")
op = Operator("detector", python="object_detection.py")
runtime_node.add_operator(op)
runtime_node.add_input("image", "camera/image")

# Write or print the YAML
flow.to_yaml("dataflow.yml")
print(flow.to_yaml())
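For reference, the sender/receiver portion of this script should produce YAML roughly like the following; this is a sketch based on the descriptor schema shown elsewhere in this book, and exact field ordering may differ:

```yaml
nodes:
  - id: sender
    path: sender.py
    outputs:
      - message
  - id: receiver
    path: receiver.py
    inputs:
      message: sender/message
```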

C API Reference

This document covers the two C APIs provided by the Adora framework: the Node API for standalone C processes and the Operator API for shared-library operators loaded by the Adora runtime.

Node API (adora-node-api-c)

Header: apis/c/node/node_api.h
Crate: adora-node-api-c (builds as staticlib)

The Node API is used by standalone C executables that participate in an Adora dataflow as external processes. The daemon spawns the process and sets environment variables that the node reads during initialization.

Initialization

init_adora_context_from_env

void *init_adora_context_from_env();

Initializes an Adora node context from environment variables set by the daemon. Returns an opaque pointer to the context on success, or NULL on failure.

The returned pointer must be passed to all subsequent Node API calls that expect a context argument. When the node is finished, free it with free_adora_context.

free_adora_context

void free_adora_context(void *adora_context);

Frees a context previously created by init_adora_context_from_env. Each context must be freed exactly once. After freeing, the pointer must not be used again.

Event Loop

adora_next_event

void *adora_next_event(void *adora_context);

Blocks until the next event is available for this node. Returns an opaque pointer to the event, or NULL when all event streams have closed (indicating the node should exit).

The returned pointer must not be dereferenced directly. Use the read_adora_* functions to extract the event type and payload. Free the event with free_adora_event when done.

free_adora_event

void free_adora_event(void *adora_event);

Frees an event previously returned by adora_next_event. Each event must be freed exactly once. After freeing, the event pointer and all derived pointers (from read_adora_input_id, read_adora_input_data) become invalid.

Event Inspection

read_adora_event_type

enum AdoraEventType read_adora_event_type(void *adora_event);

Returns the type of the given event. See AdoraEventType for possible values.

read_adora_input_id

void read_adora_input_id(void *adora_event, char **out_ptr, size_t *out_len);

Reads the input ID from an AdoraEventType_Input event. Writes the string start pointer to *out_ptr and its byte length to *out_len. The string is valid UTF-8 but not null-terminated; use out_len to determine its bounds.

If the event is not an input event, sets *out_ptr = NULL and *out_len = 0.

The returned pointer borrows from the event. It becomes invalid after free_adora_event is called.

read_adora_input_data

void read_adora_input_data(void *adora_event, char **out_ptr, size_t *out_len);

Reads the raw data bytes from an AdoraEventType_Input event. Writes the data start pointer to *out_ptr and its byte length to *out_len.

Sets *out_ptr = NULL and *out_len = 0 if the event is not an input event or the input carries no data.

Currently only UInt8 Arrow arrays are supported. Other Arrow data types will cause a runtime panic. Future versions will use the Arrow C Data Interface for full type support.

The returned pointer borrows from the event. It becomes invalid after free_adora_event is called.

read_adora_input_timestamp

unsigned long long read_adora_input_timestamp(void *adora_event);

Returns the hybrid logical clock timestamp from an input event’s metadata as a uint64 value. Returns 0 if the event is not an input event.

Output

adora_send_output

int adora_send_output(
    void *adora_context,
    const char *id_ptr,
    size_t id_len,
    const char *data_ptr,
    size_t data_len
);

Sends output data to all downstream subscribers. The output ID (id_ptr/id_len) must be a valid UTF-8 string matching one of the node’s declared outputs in the dataflow YAML. The data (data_ptr/data_len) is sent as raw bytes (UInt8 Arrow array).

Returns 0 on success, -1 on error. Errors are logged via tracing.

Returns -1 immediately if any pointer argument is NULL.

Logging

adora_log

int adora_log(
    void *adora_context,
    const char *level_ptr,
    size_t level_len,
    const char *msg_ptr,
    size_t msg_len
);

Sends a structured log message through the Adora logging pipeline. Both level and msg must be valid UTF-8 strings.

Valid log levels: "error", "warn", "info", "debug", "trace".

Returns 0 on success, -1 on error. Returns -1 immediately if any pointer argument is NULL.

Enums

AdoraEventType

enum AdoraEventType {
    AdoraEventType_Stop,        // Graceful shutdown requested
    AdoraEventType_Input,       // New input data available
    AdoraEventType_InputClosed, // An input stream was closed
    AdoraEventType_Error,       // An error occurred
    AdoraEventType_Unknown,     // Unrecognized event type
};

Operator API (adora-operator-api-c)

Headers: apis/c/operator/operator_api.h, apis/c/operator/operator_types.h
Crate: adora-operator-api-c

The Operator API is used by shared libraries (.so/.dylib/.dll) loaded into the Adora runtime process. Unlike nodes, operators do not have their own main function. Instead, they export three functions that the runtime calls at the appropriate lifecycle points.

The operator_types.h header is auto-generated by safer-ffi and defines all C-compatible struct and enum types.

Lifecycle Functions

adora_init_operator

AdoraInitResult_t adora_init_operator(void);

Called once when the runtime loads the operator. Allocate and initialize any operator state, then return it via the operator_context field. The runtime passes this pointer back on every subsequent call.

Return an AdoraInitResult_t with .result.error = NULL on success.

adora_drop_operator

AdoraResult_t adora_drop_operator(void *operator_context);

Called once when the operator is being unloaded. Free all resources associated with operator_context.

Return an AdoraResult_t with .error = NULL on success.

Event Handling

adora_on_event

OnEventResult_t adora_on_event(
    RawEvent_t *event,
    const SendOutput_t *send_output,
    void *operator_context
);

Called by the runtime each time an event arrives for this operator. Inspect the event fields to determine the event type:

| Field | Meaning |
|---|---|
| event->input != NULL | New input available |
| event->stop == true | Graceful shutdown requested |
| event->error.ptr != NULL | An error occurred (UTF-8 string in error.ptr/error.len) |
| event->input_closed.ptr != NULL | An input stream closed (input ID in input_closed.ptr/input_closed.len) |

Use send_output to emit data to downstream nodes (see adora_send_operator_output). Return an OnEventResult_t with the appropriate AdoraStatus_t to control the operator lifecycle.

Input Reading

adora_read_input_id

char *adora_read_input_id(const Input_t *input);

Returns a newly allocated null-terminated string containing the input ID. The caller must free it with adora_free_input_id.

adora_read_data

Vec_uint8_t adora_read_data(Input_t *input);

Reads the input data as a byte array. Consumes the underlying Arrow array from the input (the data can only be read once per event). Returns a Vec_uint8_t with .ptr = NULL if the input has no data or the data has already been consumed.

The caller must free the returned data with adora_free_data.

Output Sending

adora_send_operator_output

AdoraResult_t adora_send_operator_output(
    const SendOutput_t *send_output,
    const char *id,
    const uint8_t *data_ptr,
    size_t data_len
);

Sends output data to downstream subscribers. The id must be a null-terminated string matching one of the operator’s declared outputs. The data (data_ptr/data_len) is converted to a UInt8 Arrow array internally.

Returns an AdoraResult_t with .error = NULL on success.

Memory Management

The Operator API allocates memory that the caller must free using the corresponding functions:

| Allocation source | Free function |
|---|---|
| adora_read_input_id | adora_free_input_id |
| adora_read_data | adora_free_data |

void adora_free_input_id(char *input_id);
void adora_free_data(Vec_uint8_t data);

Failing to call these functions will leak memory. Do not use free() on these allocations – they are allocated by the Rust runtime and must be freed through the API.

Structs

Vec_uint8_t

typedef struct Vec_uint8 {
    uint8_t *ptr;
    size_t len;
    size_t cap;
} Vec_uint8_t;

A Rust-allocated byte vector. Access len bytes starting at ptr. Do not modify cap. Free with adora_free_data.

AdoraResult_t

typedef struct AdoraResult {
    Vec_uint8_t *error;  // NULL on success, points to error string on failure
} AdoraResult_t;

Generic result type. A NULL error pointer indicates success. When non-NULL, the error pointer contains a UTF-8 error message.

AdoraInitResult_t

typedef struct AdoraInitResult {
    AdoraResult_t result;
    void *operator_context;  // opaque pointer to operator state
} AdoraInitResult_t;

Returned by adora_init_operator. On success, result.error is NULL and operator_context holds the operator state pointer.

OnEventResult_t

typedef struct OnEventResult {
    AdoraResult_t result;
    AdoraStatus_t status;
} OnEventResult_t;

Returned by adora_on_event. Contains both an error/success result and a status code controlling the operator lifecycle.

RawEvent_t

typedef struct RawEvent {
    Input_t *input;           // non-NULL when this is an input event
    Vec_uint8_t input_closed; // non-empty when an input stream closed
    bool stop;                // true when shutdown is requested
    Vec_uint8_t error;        // non-empty on error
} RawEvent_t;

Represents an event delivered to the operator. Multiple fields may be set simultaneously; check them in order of priority.

Input_t

typedef struct Input Input_t;  // opaque

Opaque type representing an input event’s data. Use adora_read_input_id and adora_read_data to extract its contents.

Output_t

typedef struct Output Output_t;  // opaque

Opaque type used internally by adora_send_operator_output. Not created directly by user code.

SendOutput_t

typedef struct SendOutput {
    ArcDynFn1_AdoraResult_Output_t send_output;
} SendOutput_t;

Callback handle passed to adora_on_event. Pass it to adora_send_operator_output to emit data. Do not store it beyond the scope of the current adora_on_event call.

Metadata_t

typedef struct Metadata {
    Vec_uint8_t open_telemetry_context;
} Metadata_t;

Event metadata containing an OpenTelemetry trace context string.

Operator Enums

AdoraStatus_t

enum AdoraStatus {
    ADORA_STATUS_CONTINUE = 0,  // Keep running
    ADORA_STATUS_STOP     = 1,  // Stop this operator
    ADORA_STATUS_STOP_ALL = 2,  // Stop the entire dataflow
};
typedef uint8_t AdoraStatus_t;

Returned in OnEventResult_t to control operator lifecycle after processing an event.


Node Example

A complete C node that receives timer ticks and sends output messages:

#include <stdio.h>
#include <string.h>
#include "node_api.h"

int main() {
    void *ctx = init_adora_context_from_env();
    if (ctx == NULL) {
        fprintf(stderr, "failed to init adora context\n");
        return 1;
    }

    for (int i = 0; i < 100; i++) {
        void *event = adora_next_event(ctx);
        if (event == NULL)
            break;  // all streams closed

        enum AdoraEventType ty = read_adora_event_type(event);

        if (ty == AdoraEventType_Input) {
            char *id;
            size_t id_len;
            read_adora_input_id(event, &id, &id_len);

            // Send a response
            char out_id[] = "message";
            char out_data[64];
            int out_len = snprintf(out_data, sizeof(out_data),
                                   "iteration %d", i);

            adora_send_output(ctx, out_id, strlen(out_id),
                              out_data, out_len);
        } else if (ty == AdoraEventType_Stop) {
            free_adora_event(event);
            break;
        }

        free_adora_event(event);
    }

    free_adora_context(ctx);
    return 0;
}

Dataflow YAML for the node:

nodes:
  - id: c_node
    path: build/c_node
    inputs:
      timer: adora/timer/millis/100
    outputs:
      - message

Operator Example

A complete C operator that reads input, maintains state, and sends output:

#include "operator_api.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

AdoraInitResult_t adora_init_operator(void) {
    // Allocate operator state (a simple counter)
    int *counter = (int *)calloc(1, sizeof(int));

    AdoraInitResult_t result = {.operator_context = counter};
    return result;
}

AdoraResult_t adora_drop_operator(void *operator_context) {
    free(operator_context);
    AdoraResult_t result = {.error = NULL};
    return result;
}

OnEventResult_t adora_on_event(
    RawEvent_t *event,
    const SendOutput_t *send_output,
    void *operator_context)
{
    OnEventResult_t result = {.status = ADORA_STATUS_CONTINUE};
    int *counter = (int *)operator_context;

    if (event->input != NULL) {
        char *id = adora_read_input_id(event->input);
        Vec_uint8_t data = adora_read_data(event->input);

        if (data.ptr != NULL) {
            *counter += 1;
            printf("received input '%s', counter: %d\n", id, *counter);

            // Send counter value as string
            char buf[64];
            int len = snprintf(buf, sizeof(buf), "count=%d", *counter);
            result.result = adora_send_operator_output(
                send_output, "counter", (uint8_t *)buf, len);

            adora_free_data(data);
        }

        adora_free_input_id(id);
    }

    if (event->stop) {
        result.status = ADORA_STATUS_STOP;
    }

    return result;
}

Dataflow YAML for the operator:

nodes:
  - id: runtime-node
    operators:
      - id: c_operator
        shared-library: build/operator
        inputs:
          data: source_node/output
        outputs:
          - counter

Building and Linking

Node (static library)

C nodes link against adora-node-api-c, which builds as a static library.

Step 1: Build the static library

cargo build -p adora-node-api-c --release

This produces target/release/libadora_node_api_c.a (or .lib on Windows).

Step 2: Compile and link

clang node.c -ladora_node_api_c -L ../../target/release -o build/c_node <FLAGS>

Platform-specific linker flags:

| Platform | Flags |
|---|---|
| Linux | -lm -lrt -ldl -pthread |
| macOS | -framework CoreServices -framework Security -lSystem -lresolv -lpthread -lc -lm |
| Windows | -ladvapi32 -luserenv -lkernel32 -lws2_32 -lbcrypt -lncrypt -lschannel -lntdll -liphlpapi -lcfgmgr32 -lcredui -lcrypt32 -lcryptnet -lfwpuclnt -lgdi32 -lmsimg32 -lmswsock -lole32 -lopengl32 -lsecur32 -lshell32 -lsynchronization -luser32 -lwinspool -Wl,-nodefaultlib:libcmt -D_DLL -lmsvcrt |

On Windows, add the .exe extension to the output file.

Operator (shared library)

C operators are compiled into shared libraries that the Adora runtime loads at startup.

Step 1: Compile to object file

clang -c operator.c -o build/operator.o -fdeclspec -fPIC

Omit -fPIC on Windows.

Step 2: Link as shared library

# Linux
clang -shared build/operator.o -o build/liboperator.so

# macOS
clang -shared build/operator.o -o build/liboperator.dylib

# Windows
clang -shared build/operator.o -o build/operator.dll

Step 3: Reference in dataflow YAML

operators:
  - id: c_operator
    shared-library: build/operator   # without lib prefix or extension
    inputs:
      data: source/output
    outputs:
      - result

The shared-library path omits the platform-specific prefix (lib) and extension (.so/.dylib/.dll). The runtime resolves the correct file for the current platform.

Include Paths

The Node API header is at apis/c/node/node_api.h. The Operator API headers are at apis/c/operator/operator_api.h and apis/c/operator/operator_types.h. Adjust your include paths accordingly:

# Node
clang -I path/to/adora/apis/c/node node.c ...

# Operator
clang -I path/to/adora/apis/c/operator operator.c ...

C++ Compatibility

The operator headers include extern "C" guards, and the node header uses C-compatible declarations, so both can be included directly from C++ source files.

C++ API Reference

Adora provides C++ bindings for both standalone nodes and in-process operators via CXX (Rust-C++ interop). The CXX bridge generates type-safe C++ headers from Rust definitions – no raw FFI or manual extern "C" declarations are needed.

Two crates provide the C++ surface:

| Crate | Library | Use case |
|---|---|---|
| adora-node-api-cxx | libadora_node_api_cxx.a | Standalone node executable |
| adora-operator-api-cxx | libadora_operator_api_cxx.a | Shared-library operator loaded by the runtime |

Generated headers: adora-node-api.h and adora-operator-api.h.


Node API (adora-node-api-cxx)

Initialization

#include "adora-node-api.h"

// Initialize a node from environment variables set by the Adora daemon.
// Returns an AdoraNode struct containing the event stream and output sender.
// Throws on failure.
AdoraNode init_adora_node();

AdoraNode

Returned by init_adora_node(). Owns the event stream and the output sender for the lifetime of the node.

struct AdoraNode {
    rust::Box<Events>        events;       // event stream (blocking receiver)
    rust::Box<OutputSender>  send_output;  // output sender
};

Events

Opaque Rust type exposed to C++. Provides blocking iteration over the node’s incoming events.

// Member function -- call on the boxed object directly.
rust::Box<AdoraEvent> Events::next();

// Free function form -- equivalent to events->next().
rust::Box<AdoraEvent> next_event(rust::Box<Events>& events);

Both forms block until the next event arrives and return an owned AdoraEvent.

AdoraEvent

Opaque Rust type. Inspect its kind with event_type(), then downcast with event_as_input() or event_as_arrow_input().

// Determine the event kind.
AdoraEventType event_type(const rust::Box<AdoraEvent>& event);

// Downcast to a raw-byte input. Throws if the event is not Input.
AdoraInput event_as_input(rust::Box<AdoraEvent> event);

// Downcast to an Arrow FFI input (writes Arrow C Data Interface structs).
// out_array and out_schema must point to valid ArrowArray / ArrowSchema structs.
// Returns AdoraResult with empty error on success.
AdoraResult event_as_arrow_input(
    rust::Box<AdoraEvent> event,
    uint8_t* out_array,
    uint8_t* out_schema);

// Same as above, but also returns the input ID and metadata.
ArrowInputInfo event_as_arrow_input_with_info(
    rust::Box<AdoraEvent> event,
    uint8_t* out_array,
    uint8_t* out_schema);

AdoraEventType

enum class AdoraEventType : uint8_t {
    Stop,             // graceful shutdown requested
    Input,            // new data arrived on an input
    InputClosed,      // a single input was closed
    Error,            // an error occurred
    Unknown,          // unrecognized event variant
    AllInputsClosed,  // all inputs closed (stream ended)
};

AdoraInput

Returned by event_as_input(). Contains raw bytes.

struct AdoraInput {
    rust::String       id;    // input identifier (e.g. "tick", "image")
    rust::Vec<uint8_t> data;  // raw payload bytes
};

ArrowInputInfo

Returned by event_as_arrow_input_with_info(). Contains the input ID, metadata, and an error string.

struct ArrowInputInfo {
    rust::String       id;        // input identifier
    rust::Box<Metadata> metadata; // attached metadata
    rust::String       error;     // empty on success
};

AdoraResult

Returned by output-sending functions. Check the error field – empty means success.

struct AdoraResult {
    rust::String error;  // empty string on success
};

OutputSender

Opaque Rust type. All methods take rust::Box<OutputSender>& as the first argument (the sender from AdoraNode::send_output).

send_output

Send raw bytes on a named output.

AdoraResult send_output(
    rust::Box<OutputSender>& sender,
    rust::String id,
    rust::Slice<const uint8_t> data);

send_output_with_metadata

Send raw bytes with attached metadata.

AdoraResult send_output_with_metadata(
    rust::Box<OutputSender>& sender,
    rust::String id,
    rust::Slice<const uint8_t> data,
    rust::Box<Metadata> metadata);

send_arrow_output

Send an Arrow array via the C Data Interface. The pointers must reference valid ArrowArray and ArrowSchema structs. Ownership of the Arrow data transfers to Rust on success.

AdoraResult send_arrow_output(
    rust::Box<OutputSender>& sender,
    rust::String id,
    uint8_t* array_ptr,
    uint8_t* schema_ptr);

// Overload with metadata (same C++ name via cxx_name attribute).
AdoraResult send_arrow_output(
    rust::Box<OutputSender>& sender,
    rust::String id,
    uint8_t* array_ptr,
    uint8_t* schema_ptr,
    rust::Box<Metadata> metadata);

log_message

Send a log message through the Adora logging system.

AdoraResult log_message(
    const rust::Box<OutputSender>& sender,
    rust::String level,    // e.g. "info", "warn", "error"
    rust::String message);

Metadata

Opaque Rust type for attaching typed key-value pairs to outputs.

Construction

rust::Box<Metadata> new_metadata();

Reading

uint64_t     Metadata::timestamp() const;

bool         Metadata::get_bool(const rust::Str key) const;        // throws on missing/wrong type
int64_t      Metadata::get_int(const rust::Str key) const;
double       Metadata::get_float(const rust::Str key) const;
rust::String Metadata::get_str(const rust::Str key) const;

rust::Vec<int64_t>      Metadata::get_list_int(const rust::Str key) const;
rust::Vec<double>       Metadata::get_list_float(const rust::Str key) const;
rust::Vec<rust::String> Metadata::get_list_string(const rust::Str key) const;

int64_t      Metadata::get_timestamp(const rust::Str key) const;   // nanoseconds since epoch
rust::String Metadata::get_json(const rust::Str key) const;        // single value as JSON string

Writing

All setters throw on failure.

void Metadata::set_bool(const rust::Str key, bool value);
void Metadata::set_int(const rust::Str key, int64_t value);
void Metadata::set_float(const rust::Str key, double value);
void Metadata::set_string(const rust::Str key, rust::String value);

void Metadata::set_list_int(const rust::Str key, rust::Vec<int64_t> value);
void Metadata::set_list_float(const rust::Str key, rust::Vec<double> value);
void Metadata::set_list_string(const rust::Str key, rust::Vec<rust::String> value);

void Metadata::set_timestamp(const rust::Str key, int64_t nanos);  // nanoseconds since epoch

Introspection

MetadataValueType Metadata::type(const rust::Str key) const;  // throws if key missing
rust::String      Metadata::to_json() const;                   // full metadata as JSON
rust::Vec<rust::String> Metadata::list_keys() const;

MetadataValueType

enum class MetadataValueType : uint8_t {
    Bool,
    Integer,
    Float,
    String,
    ListInt,
    ListFloat,
    ListString,
    Timestamp,
};

Service, Action, and Streaming Patterns

C++ nodes can implement communication patterns using the metadata API. The well-known metadata keys are:

| Key | Description |
|-----|-------------|
| `"request_id"` | Service request/response correlation (UUID v7) |
| `"goal_id"` | Action goal identification (UUID v7) |
| `"goal_status"` | Action result status: `"succeeded"`, `"aborted"`, or `"canceled"` |
| `"session_id"` | Streaming session identifier |
| `"segment_id"` | Streaming segment within a session (integer) |
| `"seq"` | Streaming chunk sequence number (integer) |
| `"fin"` | Last chunk of a streaming segment (bool) |
| `"flush"` | Discard older queued messages on input (bool) |

// Service server: pass through request_id from input metadata
struct ArrowArray c_array;
struct ArrowSchema c_schema;
auto input_info = event_as_arrow_input_with_info(
    std::move(event),
    reinterpret_cast<uint8_t*>(&c_array),
    reinterpret_cast<uint8_t*>(&c_schema));
send_output_with_metadata(sender, "response", result, std::move(input_info.metadata));

// Action server: set goal_id and goal_status on result
auto meta = new_metadata();
meta->set_string("goal_id", goal_id);
meta->set_string("goal_status", "succeeded");
send_output_with_metadata(sender, "result", result_data, std::move(meta));

CombinedEvents (ROS2 integration)

When using the optional ros2-bridge feature, node events and ROS2 subscription events can be merged into a single stream.

// Convert Adora events into a combined stream.
CombinedEvents adora_events_into_combined(rust::Box<Events> events);

// Create an empty combined stream (for ROS2-only nodes).
CombinedEvents empty_combined_events();

CombinedEvents struct

struct CombinedEvents {
    rust::Box<MergedEvents> events;

    CombinedEvent next();  // blocking -- returns the next merged event
};

CombinedEvent struct

struct CombinedEvent {
    rust::Box<MergedAdoraEvent> event;

    bool is_adora() const;  // true if this is a standard Adora event
};

// Downcast a combined event back to an AdoraEvent. Throws if not an Adora event.
rust::Box<AdoraEvent> downcast_adora(CombinedEvent event);

ROS2 subscriptions add their own events to the merged stream. Use subscription->matches(event) and subscription->downcast(event) to handle ROS2-specific events (see the ROS2 Bridge docs).


Operator API (adora-operator-api-cxx)

Operators are shared libraries loaded by the Adora runtime. The C++ side implements two functions that the CXX bridge calls into.

Required C++ interface

You must provide a header operator.h and an implementation file. The header declares an Operator class and two free functions:

// operator.h
#pragma once
#include <memory>
#include "adora-operator-api.h"

class Operator {
public:
    Operator();
    // Add any state your operator needs.
};

std::unique_ptr<Operator> new_operator();

AdoraOnInputResult on_input(
    Operator& op,
    rust::Str id,
    rust::Slice<const uint8_t> data,
    OutputSender& output_sender);

  • new_operator() – called once at startup; returns the operator instance.
  • on_input() – called for every input event; process data and optionally send outputs.

OutputSender (operator)

Available inside on_input(). Sends data on a named output.

AdoraSendOutputResult send_output(
    OutputSender& sender,
    rust::Str id,
    rust::Slice<const uint8_t> data);

Result types

struct AdoraOnInputResult {
    rust::String error;  // empty on success
    bool         stop;   // true to request graceful shutdown
};

struct AdoraSendOutputResult {
    rust::String error;  // empty on success
};

Quick Start: Node Example

A minimal node that receives timer ticks and sends a counter.

#include "adora-node-api.h"
#include <iostream>
#include <vector>

int main() {
    auto adora_node = init_adora_node();
    unsigned char counter = 0;

    for (;;) {
        auto event = next_event(adora_node.events);
        auto ty = event_type(event);

        if (ty == AdoraEventType::AllInputsClosed) {
            break;
        }
        if (ty == AdoraEventType::Stop) {
            break;
        }
        if (ty == AdoraEventType::Input) {
            auto input = event_as_input(std::move(event));
            counter += 1;

            std::cout << "Input: " << std::string(input.id)
                      << " counter=" << (int)counter << std::endl;

            std::vector<unsigned char> out{counter};
            rust::Slice<const uint8_t> slice{out.data(), out.size()};
            auto result = send_output(adora_node.send_output, "counter", slice);
            if (!result.error.empty()) {
                std::cerr << "Send error: " << std::string(result.error) << std::endl;
                return 1;
            }
        }
    }
    return 0;
}

Dataflow YAML:

nodes:
  - id: cxx-node
    path: build/my_node
    inputs:
      tick: adora/timer/millis/300
    outputs:
      - counter

Quick Start: Arrow Node Example

A node that receives and sends Arrow arrays via the C Data Interface, with metadata.

#include "adora-node-api.h"
#include <arrow/api.h>
#include <arrow/c/bridge.h>
#include <iostream>

int main() {
    auto adora_node = init_adora_node();

    for (int i = 0; i < 10; i++) {
        auto event = adora_node.events->next();
        auto ty = event_type(event);

        if (ty == AdoraEventType::AllInputsClosed || ty == AdoraEventType::Stop) {
            break;
        }
        if (ty == AdoraEventType::Input) {
            // Receive Arrow input with metadata
            struct ArrowArray c_array;
            struct ArrowSchema c_schema;
            auto info = event_as_arrow_input_with_info(
                std::move(event),
                reinterpret_cast<uint8_t*>(&c_array),
                reinterpret_cast<uint8_t*>(&c_schema));

            if (!info.error.empty()) {
                std::cerr << std::string(info.error) << std::endl;
                continue;
            }

            std::cout << "Input: " << std::string(info.id)
                      << " ts=" << info.metadata->timestamp() << std::endl;

            auto imported = arrow::ImportArray(&c_array, &c_schema);
            auto array = imported.ValueOrDie();
            std::cout << "Arrow: " << array->ToString() << std::endl;

            // Build an output Arrow array (Append/Finish return arrow::Status)
            arrow::Int32Builder builder;
            std::shared_ptr<arrow::Array> out_array;
            if (!builder.Append(i * 10).ok() || !builder.Finish(&out_array).ok()) {
                std::cerr << "Arrow build error" << std::endl;
                continue;
            }

            // Export and send with metadata
            struct ArrowArray out_c_array;
            struct ArrowSchema out_c_schema;
            arrow::ExportArray(*out_array, &out_c_array, &out_c_schema);

            auto meta = new_metadata();
            meta->set_string("source", "cpp-arrow-node");
            meta->set_int("iteration", i);

            auto result = send_arrow_output(
                adora_node.send_output, "counter",
                reinterpret_cast<uint8_t*>(&out_c_array),
                reinterpret_cast<uint8_t*>(&out_c_schema),
                std::move(meta));

            if (!result.error.empty()) {
                std::cerr << "Send error: " << std::string(result.error) << std::endl;
            }
        }
    }
    return 0;
}

Quick Start: Operator Example

A minimal operator shared library. This example assumes the Operator class in operator.h declares a counter member (e.g. unsigned char counter = 0;).

// operator.cc
#include "operator.h"
#include <iostream>
#include <vector>

Operator::Operator() {}

std::unique_ptr<Operator> new_operator() {
    return std::make_unique<Operator>();
}

AdoraOnInputResult on_input(
    Operator& op,
    rust::Str id,
    rust::Slice<const uint8_t> data,
    OutputSender& output_sender)
{
    op.counter += 1;

    std::vector<unsigned char> out{op.counter};
    rust::Slice<const uint8_t> slice{out.data(), out.size()};
    auto send_result = send_output(output_sender, rust::Str("status"), slice);

    return AdoraOnInputResult{send_result.error, false};
}

Dataflow YAML:

nodes:
  - id: runtime-node
    operators:
      - id: my-operator
        shared-library: build/my_operator
        inputs:
          data: some-node/output
        outputs:
          - status

Build Integration (CMake)

The recommended build approach uses CMake with the DoraTargets.cmake helper (see examples/cmake-dataflow/).

Project structure

my-project/
  CMakeLists.txt
  DoraTargets.cmake       # copied from examples/cmake-dataflow/
  node/main.cc
  operator/operator.h
  operator/operator.cc
  dataflow.yml

CMakeLists.txt

cmake_minimum_required(VERSION 3.21)
project(my-dataflow LANGUAGES C CXX)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_FLAGS "-fPIC")

include(DoraTargets.cmake)
link_directories(${adora_link_dirs})

# Standalone node (executable)
add_executable(my_node node/main.cc ${node_bridge})
add_dependencies(my_node Adora_cxx)
target_include_directories(my_node PRIVATE ${adora_cxx_include_dir})
target_link_libraries(my_node adora_node_api_cxx)

# Operator (shared library)
add_library(my_operator SHARED
    operator/operator.cc ${operator_bridge})
add_dependencies(my_operator Adora_cxx)
target_include_directories(my_operator PRIVATE
    ${adora_cxx_include_dir} ${adora_c_include_dir}
    ${CMAKE_CURRENT_SOURCE_DIR}/operator)
target_link_libraries(my_operator adora_operator_api_cxx)

install(TARGETS my_node DESTINATION ${CMAKE_CURRENT_SOURCE_DIR}/bin)
install(TARGETS my_operator DESTINATION ${CMAKE_CURRENT_SOURCE_DIR}/lib)

What DoraTargets.cmake provides

| Variable | Description |
|----------|-------------|
| `adora_cxx_include_dir` | Path to generated CXX headers (`adora-node-api.h`, `adora-operator-api.h`) |
| `adora_c_include_dir` | Path to C API headers (for mixed C/C++ projects) |
| `adora_link_dirs` | Library search path for `libadora_node_api_cxx.a` / `libadora_operator_api_cxx.a` |
| `node_bridge` | Generated CXX bridge source file for nodes (`node_bridge.cc`) |
| `operator_bridge` | Generated CXX bridge source file for operators (`operator_bridge.cc`) |
| `Adora_cxx` | CMake target dependency that builds the CXX crates |

Build steps

# Option A: Build against local Adora source
mkdir build && cd build
cmake .. -DDORA_ROOT_DIR=/path/to/adora
cmake --build .

# Option B: Build against Adora from GitHub (cloned automatically)
mkdir build && cd build
cmake ..
cmake --build .

Requirements

  • C++20 compiler
  • Rust toolchain (for building the Adora static libraries via Cargo)
  • CMake 3.21+
  • For Arrow integration: Apache Arrow C++ library

CXX Bridge Notes

  • All Rust opaque types (Events, OutputSender, AdoraEvent, Metadata, MergedEvents, MergedAdoraEvent) are accessed through rust::Box<T>.
  • rust::String, rust::Vec<T>, and rust::Slice<const T> are CXX bridge types that interoperate with their C++ standard library counterparts. See the CXX type reference.
  • Functions that return Result<T> in Rust throw C++ exceptions on the error path.
  • Arrow FFI functions (event_as_arrow_input, send_arrow_output) are unsafe on the Rust side. The caller must pass valid pointers to ArrowArray / ArrowSchema structs cast to uint8_t*.
  • The node library is a static archive (staticlib). Link it into your executable with -ladora_node_api_cxx.
  • The operator library is also a static archive. Link it into your shared library with -ladora_operator_api_cxx.

Adora CLI Reference

Adora (Agentic Dataflow-Oriented Robotic Architecture) is a 100% Rust framework for building real-time robotics and AI applications. This document covers the adora CLI from both an end-user and a developer perspective.

Quick Start

# Create a new project
adora new my-robot --kind dataflow --lang rust

# Run locally (no coordinator/daemon needed)
adora run dataflow.yml

# Or use coordinator/daemon for production
adora up
adora start dataflow.yml --attach
# Ctrl-C to stop
adora down

Installation

cargo install adora-cli

From source

cargo install --path binaries/cli --locked

Verify

adora --version
adora status

Core Concepts

Dataflow

A dataflow is a directed graph of nodes connected by typed data channels. Nodes produce outputs that other nodes consume as inputs. The framework handles data routing, serialization (Apache Arrow), and lifecycle management.

Execution Modes

| Mode | Command | Infrastructure | Use case |
|------|---------|----------------|----------|
| Local | `adora run` | None | Development, testing, single-machine |
| Distributed | `adora up` + `adora start` | Coordinator + Daemon(s) | Production, multi-machine |

Component Roles

CLI --> Coordinator --> Daemon(s) --> Nodes / Operators
        (control plane) (per machine) (user code)

  • CLI: User interface. Sends commands, displays logs.
  • Coordinator: Orchestrates dataflow lifecycle across machines.
  • Daemon: Spawns node processes, manages IPC, collects metrics.
  • Node: A standalone process that produces and consumes Arrow data.
  • Operator: In-process code running inside a shared runtime (lower latency than nodes).

Data Format

All data flows through the system as Apache Arrow columnar arrays. This enables zero-copy shared-memory transfer between co-located nodes and avoids serialization overhead entirely.


Dataflow Descriptor

Dataflows are defined in YAML files. Here is the complete schema:

Minimal Example

nodes:
  - id: sender
    path: sender.py
    outputs:
      - message

  - id: receiver
    path: receiver.py
    inputs:
      message: sender/message

Full Schema

# Dataflow-level settings
health_check_interval: 5.0    # health check sweep interval in seconds (default: 5.0)

nodes:
  - id: my-node                 # unique identifier (required)
    name: "My Node"             # human-readable name (optional)
    description: "..."          # description (optional)

    # --- Source (pick one) ---
    path: ./target/debug/my-node          # local executable
    # path: https://example.com/node.zip  # download from URL
    # git: https://github.com/org/repo.git  # build from git
    #   branch: main            # git branch (mutually exclusive with tag/rev)
    #   tag: v1.0               # git tag
    #   rev: abc123             # git commit hash

    # --- Build ---
    build: cargo build -p my-node   # shell command to build (optional)

    # --- Inputs ---
    inputs:
      # Short form: source_node/output_id
      tick: adora/timer/millis/100
      data: other-node/output

      # Long form with options
      sensor_data:
        source: sensor/frames
        queue_size: 10            # input buffer size (default: 10)
        queue_policy: drop_oldest # or "backpressure" (buffers up to 10x queue_size)
        input_timeout: 5.0        # circuit breaker timeout in seconds

    # --- Outputs ---
    outputs:
      - processed
      - status

    # --- Environment ---
    env:
      MY_VAR: "value"
      FROM_ENV:
        __adora_env: HOST_VAR     # read from host environment
    args: "--verbose"             # command-line arguments

    # --- Fault tolerance ---
    restart_policy: on-failure    # never (default) | on-failure | always
    max_restarts: 5               # 0 = unlimited
    restart_delay: 1.0            # initial backoff in seconds
    max_restart_delay: 30.0       # backoff cap in seconds
    restart_window: 300.0         # reset counter after N seconds
    health_check_timeout: 30.0    # kill if no activity for N seconds

    # --- Logging ---
    min_log_level: info           # source-level filter (daemon-side)
    send_stdout_as: raw_output    # route raw stdout as data output
    send_logs_as: log_entries     # route structured logs as data output
    max_log_size: "50MB"          # rotate log files at this size
    max_rotated_files: 5          # number of rotated files to keep (1-100)

    # --- Deployment ---
    _unstable_deploy:
      machine: A                  # target machine/daemon ID

# Debug settings
_unstable_debug:
  publish_all_messages_to_zenoh: true   # required for topic echo/hz/info

Built-in Timer Nodes

Timers are virtual nodes that emit ticks at fixed intervals:

inputs:
  tick: adora/timer/millis/100   # every 100ms
  slow: adora/timer/millis/1000  # every 1s
  fast: adora/timer/hz/30        # 30 Hz (~33ms)

Operator Nodes

Operators run in-process inside a shared runtime (no separate process):

nodes:
  # Single operator (shorthand)
  - id: detector
    operator:
      python: detect.py
      build: pip install -r requirements.txt
      inputs:
        image: camera/frames
      outputs:
        - bbox

  # Multiple operators sharing a runtime
  - id: runtime-node
    operators:
      - id: preprocessor
        shared-library: ../../target/debug/libpreprocess
        inputs:
          raw: sensor/data
        outputs:
          - processed
      - id: analyzer
        shared-library: ../../target/debug/libanalyze
        inputs:
          data: runtime-node/preprocessor/processed
        outputs:
          - result

Distributed Deployment

Assign nodes to specific machines using _unstable_deploy:

nodes:
  - id: camera-driver
    _unstable_deploy:
      machine: robot-arm
    path: ./target/debug/camera
    outputs:
      - frames

  - id: ml-inference
    _unstable_deploy:
      machine: gpu-server
    path: ./target/debug/inference
    inputs:
      frames: camera-driver/frames
    outputs:
      - predictions

When nodes are on different machines, communication automatically switches from shared memory to Zenoh pub/sub.


Command Reference

Lifecycle Commands

adora run

Run a dataflow locally without coordinator or daemon. Best for development and testing.

adora run <PATH> [OPTIONS]

| Argument/Flag | Default | Description |
|---------------|---------|-------------|
| `<PATH>` | required | Path to dataflow descriptor YAML |
| `--stop-after <DURATION>` | | Auto-stop after duration (e.g., `30s`, `5m`) |
| `--uv` | false | Use uv for Python node management |
| `--debug` | false | Enable debug topics (equivalent to `publish_all_messages_to_zenoh: true`) |
| `--allow-shell-nodes` | false | Enable shell-based node execution |
| `--log-level <LEVEL>` | stdout | Min display level: `error\|warn\|info\|debug\|trace\|stdout` |
| `--log-format <FORMAT>` | pretty | Output format: `pretty\|json\|compact` |
| `--log-filter <FILTER>` | | Per-node level overrides: `"node1=debug,node2=warn"` |

Examples:

# Basic run
adora run dataflow.yml

# Stop after 10 seconds, only show warnings
adora run dataflow.yml --stop-after 10s --log-level warn

# Python dataflow with uv
adora run dataflow.yml --uv

# Debug one node, silence others
adora run dataflow.yml --log-level warn --log-filter "sensor=debug"

# JSON output for CI pipelines
adora run dataflow.yml --log-format json --stop-after 30s 2>test.json

adora up

Start coordinator and daemon in local mode.

adora up

Spawns adora coordinator and adora daemon as background processes. Waits for both to be ready before returning. Idempotent: if already running, does nothing.

adora down (alias: adora destroy)

Tear down coordinator and daemon. Stops all running dataflows first.

adora down [OPTIONS]

| Flag | Default | Description |
|------|---------|-------------|
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |

adora build

Run build commands defined in the dataflow descriptor.

adora build <PATH> [OPTIONS]

| Flag | Default | Description |
|------|---------|-------------|
| `<PATH>` | required | Dataflow descriptor path |
| `--uv` | false | Use uv for Python builds |
| `--local` | false | Force local build (skip coordinator) |
| `--strict-types` | false | Treat type warnings as errors (non-zero exit code) |

Type checking: After expanding modules, build runs the same type checks as validate. Warnings are printed by default; use --strict-types (or set strict_types: true in the YAML) to fail the build on type mismatches. User-defined types in a types/ directory next to the dataflow are loaded automatically.

Build strategy: If nodes have _unstable_deploy sections and a coordinator is reachable, builds are distributed to target machines. Otherwise, builds run locally.

Git sources: Nodes with a git: field are cloned/updated before building. The build command runs from the git repository root.

adora start

Start a dataflow on a running coordinator.

adora start <PATH> [OPTIONS]

| Flag | Default | Description |
|------|---------|-------------|
| `<PATH>` | required | Dataflow descriptor path |
| `--name <NAME>, -n` | | Assign a name to the dataflow |
| `--attach` | auto | Attach to log stream and wait for completion |
| `--detach` | auto | Return immediately after spawn |
| `--debug` | false | Enable debug topics (equivalent to `publish_all_messages_to_zenoh: true`) |
| `--hot-reload` | false | Watch Python files and reload on change |
| `--uv` | false | Use uv for Python nodes |
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |

If neither --attach nor --detach is specified: attaches if running in a TTY, detaches otherwise.

Attach mode: Streams logs, handles Ctrl-C gracefully (first = stop, second = force kill).

Hot reload: Watches Python operator source files. On change, sends a reload request to the coordinator which propagates to the daemon.

adora stop

Stop a running dataflow.

adora stop [UUID_OR_NAME] [OPTIONS]

| Flag | Default | Description |
|------|---------|-------------|
| `[UUID_OR_NAME]` | interactive | Dataflow UUID or name |
| `--name <NAME>, -n` | | Alternative name specification |
| `--grace-duration <DURATION>` | | Graceful shutdown timeout |
| `--force, -f` | false | Immediate termination |
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |

If no identifier is given and running in a TTY, presents an interactive picker.

Stop sequence: Send Event::Stop -> wait grace duration -> SIGTERM -> hard kill.

adora restart

Restart a running dataflow (stop + re-start with stored descriptor). No YAML path needed – the coordinator retains the original descriptor.

adora restart [UUID] [OPTIONS]

| Flag | Default | Description |
|------|---------|-------------|
| `[UUID]` | | Dataflow UUID |
| `--name <NAME>, -n` | | Restart by name instead of UUID |
| `--grace-duration <DURATION>` | | Graceful shutdown timeout for the stop phase |
| `--force, -f` | false | Force kill before restart |
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |

Examples:

# Restart by name
adora restart --name my-app

# Restart by UUID with forced stop
adora restart a1b2c3d4-... --force

adora record

Record dataflow messages to an .adorec file for offline replay. See Debugging Guide for full workflows.

adora record <DATAFLOW_YAML> [OPTIONS]

| Flag | Default | Description |
|------|---------|-------------|
| `<DATAFLOW_YAML>` | required | Path to dataflow descriptor |
| `-o, --output <PATH>` | `recording_{timestamp}.adorec` | Output file path |
| `--topics <TOPICS>` | all | Comma-separated node/output topics to record |
| `--proxy` | false | Stream via WebSocket instead of recording on target |
| `--output-yaml <PATH>` | | Write modified YAML without running (dry run) |

Default mode injects a record node into the dataflow. --proxy mode requires a running dataflow and publish_all_messages_to_zenoh: true.

adora replay

Replay a recorded .adorec file by replacing source nodes with replay nodes. See Debugging Guide for full workflows.

adora replay <FILE> [OPTIONS]

| Flag | Default | Description |
|------|---------|-------------|
| `<FILE>` | required | Path to `.adorec` recording |
| `--speed <FLOAT>` | 1.0 | Playback speed (0 = max speed) |
| `--loop` | false | Loop the recording |
| `--replace <NODE_IDS>` | all recorded | Comma-separated nodes to replace |
| `--output-yaml <PATH>` | | Write modified YAML without running (dry run) |

Monitoring Commands

adora list (alias: adora ps)

List running dataflows with metrics.

adora list [OPTIONS]

| Flag | Default | Description |
|------|---------|-------------|
| `--format <FMT>, -f` | table | Output format: `table\|json` |
| `--status <STATUS>` | | Filter: `running\|finished\|failed` |
| `--name <PATTERN>` | | Filter by name (case-insensitive substring) |
| `--sort-by <FIELD>` | | Sort by: `cpu\|memory` |
| `--quiet, -q` | false | Print only UUIDs |
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |

Output columns: UUID, Name, Status, Nodes, CPU, Memory

adora logs

Show and follow logs of a dataflow and node.

adora logs [UUID_OR_NAME] [NODE] [OPTIONS]

| Flag | Default | Description |
|------|---------|-------------|
| `[UUID_OR_NAME]` | | Dataflow UUID or name |
| `[NODE]` | | Node name (required unless `--all-nodes`) |
| `--all-nodes` | false | Merge logs from all nodes by timestamp |
| `--tail <N>` | all | Show last N lines |
| `--follow, -f` | false | Stream new log entries |
| `--local` | false | Read from local `out/` directory |
| `--since <DURATION>` | | Show logs newer than duration ago |
| `--until <DURATION>` | | Show logs older than duration ago |
| `--level <LEVEL>` | stdout | Min log level |
| `--log-format <FORMAT>` | pretty | Output format |
| `--log-filter <FILTER>` | | Per-node level overrides |
| `--grep <PATTERN>` | | Case-insensitive text search |
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |

Filter pipeline: Read/Parse -> Time filters -> Grep -> Tail -> Display

Examples:

# Follow all nodes live
adora logs my-dataflow --all-nodes --follow

# Last 50 errors from a specific node
adora logs my-dataflow sensor --level error --tail 50

# Search logs from last 5 minutes
adora logs my-dataflow --all-nodes --since 5m --grep "timeout"

# Read local files (no coordinator needed)
adora logs --local --all-nodes --tail 100

# Post-mortem analysis: errors in time window
adora logs --local sensor --since 1h --until 30m --level error

Duration formats: 30 (seconds), 30s, 5m, 1h, 2d

adora inspect top (alias: adora top)

Real-time TUI monitor for node resource usage (like top).

adora inspect top [OPTIONS]
adora top [OPTIONS]

| Flag | Default | Description |
|------|---------|-------------|
| `--refresh-interval <SECONDS>` | 2 | Update interval (min: 1) |
| `--once` | false | Print a single JSON snapshot and exit (for scripting/CI) |
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |

Requires an interactive terminal (unless --once is used).

| Key | Action |
|-----|--------|
| `q` / `Esc` | Quit |
| `Up` / `k` | Select previous node |
| `Down` / `j` | Select next node |
| `n` | Sort by node name |
| `c` | Sort by CPU |
| `m` | Sort by memory |
| `r` | Force refresh |

Columns: NODE, STATUS, DATAFLOW, PID, CPU%, MEMORY (MB), RESTARTS, QUEUE, NET TX, NET RX, I/O READ (MB/s), I/O WRITE (MB/s)

  • STATUS: Running, Restarting, Degraded (broken inputs), or Failed
  • RESTARTS: Current restart count per node
  • QUEUE: Pending messages in the node’s input queue
  • NET TX/RX: Cumulative cross-daemon network bytes sent/received via Zenoh

CPU values are per-core (can exceed 100% with multiple cores). Metrics come from daemons, so this works for distributed deployments.

Scripting example:

# JSON snapshot for CI/monitoring pipelines
adora top --once | jq '.[].cpu_usage'

adora topic list

List all topics (outputs) in a running dataflow.

adora topic list [OPTIONS]

| Flag | Default | Description |
|------|---------|-------------|
| `-d <DATAFLOW>, --dataflow` | interactive | Dataflow UUID or name |
| `--format <FMT>` | table | Output format: `table\|json` |

adora topic echo

Subscribe to topics and display messages in real-time.

adora topic echo [OPTIONS] [DATA...]

| Flag | Default | Description |
|------|---------|-------------|
| `-d <DATAFLOW>, --dataflow` | required | Dataflow UUID or name |
| `[DATA...]` | all outputs | Topics to echo (e.g., `node1/output`) |
| `--format <FMT>` | table | Output format: `table\|json` |

Requires _unstable_debug.publish_all_messages_to_zenoh: true in the descriptor.

adora topic hz

Measure topic publish frequency with a TUI dashboard.

adora topic hz [OPTIONS] [DATA...]

| Flag | Default | Description |
|------|---------|-------------|
| `-d <DATAFLOW>, --dataflow` | required | Dataflow UUID or name |
| `[DATA...]` | all outputs | Topics to measure |
| `--window <SECONDS>` | 10 | Sliding window (min: 1) |

Requires an interactive terminal. Displays: Avg (ms), Avg (Hz), Min (ms), Max (ms), Std (ms), plus a rate sparkline and histogram for the selected topic.

adora topic info

Show detailed metadata of a single topic.

adora topic info [OPTIONS] DATA
Flag                       Default   Description
-d <DATAFLOW>, --dataflow  required  Dataflow UUID or name
DATA                       required  Single topic (e.g., camera/image)
--duration <SECONDS>       5         Collection duration (min: 1)

Subscribes to the topic for the specified duration and reports: type (Arrow schema), publisher, subscribers, message count, bandwidth.

adora node

Manage and inspect dataflow nodes.

adora node list
adora node list [OPTIONS]

Lists nodes in a running dataflow with their status, CPU, memory, and restart count.

Columns: NODE, STATUS, PID, CPU%, MEMORY (MB), RESTARTS, DATAFLOW

adora node info

Show detailed information about a specific node including status, inputs, outputs, and metrics.

adora node info <NODE> [OPTIONS]
Flag                       Default      Description
<NODE>                     required     Node ID to inspect
-d <DATAFLOW>, --dataflow  interactive  Dataflow UUID or name
-f <FORMAT>, --format      table        Output format: table|json

adora node restart

Restart a single node within a running dataflow. The daemon stops the node process and respawns it.

adora node restart <NODE> [OPTIONS]
Flag                       Default      Description
<NODE>                     required     Node ID to restart
-d <DATAFLOW>, --dataflow  interactive  Dataflow UUID or name
--grace <DURATION>         (none)       Grace period before force-killing the node

adora node stop

Stop a single node within a running dataflow without stopping the entire dataflow.

adora node stop <NODE> [OPTIONS]
Flag                       Default      Description
<NODE>                     required     Node ID to stop
-d <DATAFLOW>, --dataflow  interactive  Dataflow UUID or name
--grace <DURATION>         (none)       Grace period before force-killing the node

adora topic pub

Publish JSON data to a topic in a running dataflow. Requires publish_all_messages_to_zenoh: true.

adora topic pub <TOPIC> [DATA] [OPTIONS]
Flag                       Default   Description
<TOPIC>                    required  Topic to publish to (format: node_id/output_id)
[DATA]                     (none)    JSON data to publish (required unless --file)
--file <PATH>              (none)    Read data from a JSON file instead of command line
--count <N>                1         Number of messages to publish
-d <DATAFLOW>, --dataflow  required  Dataflow UUID or name

Examples:

# Publish a single value
adora topic pub -d my-app sensor/threshold '[42]'

# Publish from file, 10 times
adora topic pub -d my-app sensor/config --file config.json --count 10

adora param

Manage runtime parameters for nodes. Parameters are persisted in the coordinator store and optionally forwarded to running nodes.

adora param list

List all runtime parameters for a node.

adora param list <NODE> [OPTIONS]
Flag                       Default      Description
<NODE>                     required     Node ID
-d <DATAFLOW>, --dataflow  interactive  Dataflow UUID or name
--format <FMT>             table        Output format: table|json

adora param get

Get a single runtime parameter value.

adora param get <NODE> <KEY> [OPTIONS]
Flag                       Default      Description
<NODE>                     required     Node ID
<KEY>                      required     Parameter key
-d <DATAFLOW>, --dataflow  interactive  Dataflow UUID or name

adora param set

Set a runtime parameter. The value is JSON. The parameter is stored in the coordinator and forwarded to the node if it is running.

adora param set <NODE> <KEY> <VALUE> [OPTIONS]
Flag                       Default      Description
<NODE>                     required     Node ID
<KEY>                      required     Parameter key (max 256 bytes)
<VALUE>                    required     Parameter value as JSON (max 64KB serialized)
-d <DATAFLOW>, --dataflow  interactive  Dataflow UUID or name

Examples:

# Set a numeric parameter
adora param set -d my-app sensor threshold 42

# Set a string parameter
adora param set -d my-app camera resolution '"1080p"'

# Set a complex parameter
adora param set -d my-app detector config '{"confidence": 0.8, "nms": 0.5}'

adora param delete

Delete a runtime parameter.

adora param delete <NODE> <KEY> [OPTIONS]
Flag                       Default      Description
<NODE>                     required     Node ID
<KEY>                      required     Parameter key
-d <DATAFLOW>, --dataflow  interactive  Dataflow UUID or name

adora doctor

Diagnose environment, coordinator/daemon connectivity, and optionally validate a dataflow YAML.

adora doctor [OPTIONS]
Flag               Default  Description
--dataflow <PATH>  (none)   Path to a dataflow YAML to validate

Checks performed:

  1. Coordinator reachability
  2. Daemon connectivity
  3. Active dataflow status
  4. Dataflow YAML validation (if --dataflow provided)

Examples:

# Basic health check
adora doctor

# Check environment + validate a dataflow
adora doctor --dataflow dataflow.yml

adora trace list

List recent traces captured by the coordinator. The coordinator captures spans from adora_coordinator and adora_core crates in-memory (up to 4096 spans). No external tracing infrastructure required.

adora trace list [OPTIONS]
Flag                       Default    Description
--coordinator-addr <IP>    127.0.0.1  Coordinator address
--coordinator-port <PORT>  6013       Coordinator port

Output columns: TRACE ID (first 12 chars), ROOT SPAN, SPANS, STARTED, DURATION

Example:

adora trace list
TRACE ID      ROOT SPAN          SPANS  STARTED              DURATION
a1b2c3d4e5f6  spawn_dataflow     12     2026-03-01 10:30:05  1.234s
f8e7d6c5b4a3  build_dataflow     5      2026-03-01 10:29:58  0.500s

adora trace view

View spans for a specific trace as an indented tree. Supports prefix matching on trace IDs.

adora trace view <TRACE_ID> [OPTIONS]
Argument/Flag              Default    Description
<TRACE_ID>                 required   Full trace ID or unique prefix
--coordinator-addr <IP>    127.0.0.1  Coordinator address
--coordinator-port <PORT>  6013       Coordinator port

Example:

adora trace view a1b2c3d4
spawn_dataflow [INFO 1.234s] {build_id="abc", session_id="def"}
  build_dataflow [INFO 0.500s]
    download_node [DEBUG 0.200s] {url="..."}
  start_inner [INFO 0.734s]
    spawn_node [INFO 0.100s] {node_id="camera"}
    spawn_node [INFO 0.080s] {node_id="detector"}

Trace IDs are prefix-matched: if the prefix uniquely identifies a trace, it resolves automatically. If ambiguous, you’ll be prompted to use a longer prefix.
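The prefix-resolution rule above can be sketched in a few lines. This is an illustrative Python sketch of the behavior described (unique prefix resolves, ambiguous prefix is rejected), not Adora's actual implementation; the function name `resolve_trace_id` is hypothetical.

```python
def resolve_trace_id(prefix: str, trace_ids: list[str]) -> str:
    """Resolve a trace-ID prefix: a unique match wins, an ambiguous or
    unknown prefix is an error. Sketch of the documented CLI behavior."""
    matches = [t for t in trace_ids if t.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"no trace matches prefix {prefix!r}")
    raise ValueError(f"ambiguous prefix {prefix!r}: use a longer prefix")
```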


Setup Commands

adora status (alias: adora check)

Check system health and connectivity.

adora status [OPTIONS]

Reports coordinator connectivity, daemon status, and active dataflow count.

adora new

Generate a new project or node from templates.

adora new <NAME> [OPTIONS]
Flag           Default   Description
<NAME>         required  Project or node name
--kind <KIND>  dataflow  dataflow|node
--lang <LANG>  rust      rust|python|c|cxx

adora expand

Expand module references in a dataflow and print the resulting flat YAML. Useful for debugging module composition.

adora expand <PATH> [OPTIONS]
Flag      Default   Description
<PATH>    required  Dataflow descriptor (or module file with --module)
--module  false     Validate a standalone module file instead of a full dataflow

Examples:

# Expand a dataflow with modules
adora expand dataflow.yml

# Validate a module file
adora expand --module modules/navigation.module.yml

See the Modules Guide for full documentation on module composition.

adora graph

Visualize a dataflow as a graph.

adora graph <PATH> [OPTIONS]
Flag       Default   Description
<PATH>     required  Dataflow descriptor path
--mermaid  false     Output Mermaid diagram text
--open     false     Open HTML in browser

Without --mermaid, generates an interactive HTML file using mermaid.js. When outputs have type annotations, edge labels include the type name (e.g. image [Image]).

# Generate HTML
adora graph dataflow.yml --open

# Generate Mermaid for GitHub markdown
adora graph dataflow.yml --mermaid

adora validate

Validate a dataflow YAML file and check type annotations.

adora validate <PATH> [OPTIONS]
Flag            Default   Description
<PATH>          required  Dataflow descriptor path
--strict-types  false     Treat warnings as errors (non-zero exit code for CI)

Checks:

  1. Key existence: output_types/input_types keys exist in the corresponding outputs/inputs lists
  2. URN resolution: All type URNs resolve in the standard or user-defined type library
  3. Edge compatibility: Connected edges have compatible types (exact match, widening, or user-defined rules)
  4. Parameterized types: Parameter mismatches (e.g. AudioFrame[sample_type=f32] vs AudioFrame[sample_type=i16])
  5. Timer auto-typing: Timer inputs are automatically typed as std/core/v1/UInt64
  6. Type inference: When only upstream annotates a type, it is inferred on the downstream input
  7. Metadata patterns: output_metadata keys and pattern shorthands are validated
  8. Schema compatibility: Struct types are checked at the field level (missing/wrong fields)

User-defined types in a types/ directory next to the dataflow are loaded automatically.

# Validate with warnings
adora validate dataflow.yml

# Strict mode for CI (exit 1 on warnings)
adora validate --strict-types dataflow.yml

See the Type Annotations Guide for the full type library and usage details.


Utility Commands

adora completion

Generate shell completion scripts.

adora completion [SHELL]

Shell is auto-detected if omitted. Supported: bash, zsh, fish, elvish, powershell.

# Bash
eval "$(adora completion bash)"
echo 'eval "$(adora completion bash)"' >> ~/.bashrc

# Zsh
eval "$(adora completion zsh)"
echo 'eval "$(adora completion zsh)"' >> ~/.zshrc

# Fish
adora completion fish > ~/.config/fish/completions/adora.fish

adora system

System management commands.

adora system status [OPTIONS]

Currently, status is the only subcommand; it is equivalent to adora status.


Self-Management Commands

adora self update

Check for and install CLI updates.

adora self update [--check-only]

Downloads from GitHub releases (dora-rs/adora).

adora self uninstall

Remove the CLI from the system.

adora self uninstall [--force]

Without --force, prompts for confirmation (requires a TTY). Tries uv pip uninstall first, then pip uninstall, then binary self-delete.


Environment Variables

All environment variables serve as fallbacks. CLI flags always take precedence.

Variable                  Default    Commands                  Description
ADORA_COORDINATOR_ADDR    127.0.0.1  All coordinator commands  Coordinator IP address
ADORA_COORDINATOR_PORT    6013       All coordinator commands  Coordinator WebSocket port
ADORA_LOG_LEVEL           stdout     run, logs                 Default minimum log level
ADORA_LOG_FORMAT          pretty     run, logs                 Default output format
ADORA_LOG_FILTER          (none)     run, logs                 Default per-node level overrides
ADORA_ALLOW_SHELL_NODES   (none)     run                       Enable shell node execution
ADORA_RUNTIME_TYPE_CHECK  (none)     run, start                Runtime type checking: warn (log mismatches) or error (fail on mismatch). See Type Annotations

# Set defaults for a development session
export ADORA_COORDINATOR_ADDR=192.168.1.10
export ADORA_LOG_LEVEL=info
export ADORA_LOG_FORMAT=compact
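The precedence rule ("env vars are fallbacks, flags win") amounts to a three-step lookup. A minimal sketch, with a hypothetical `resolve_setting` helper (not part of the Adora codebase):

```python
import os

def resolve_setting(flag_value, env_var, default):
    """CLI flag wins; the environment variable is the fallback;
    the built-in default applies last."""
    if flag_value is not None:
        return flag_value
    return os.environ.get(env_var, default)
```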

Architecture Guide

This section is for developers who want to understand the framework internals, extend it, or debug issues.

Communication Stack

                    ┌─────────────────────────────────────┐
                    │           CLI (adora)                │
                    │   WebSocket (JSON request/reply)     │
                    └─────────────┬───────────────────────┘
                                  │
                    ┌─────────────▼───────────────────────┐
                    │        Coordinator                   │
                    │   WebSocket control + daemon mgmt    │
                    │   State: InMemoryStore | RedbStore   │
                    └──┬──────────────────────────────┬───┘
                       │                              │
          ┌────────────▼──────────┐     ┌─────────────▼──────────┐
          │     Daemon A          │     │     Daemon B           │
          │  (machine: robot)     │     │  (machine: gpu-server) │
          │                       │     │                        │
          │  ┌─────┐  ┌─────┐    │     │  ┌──────┐  ┌───────┐  │
          │  │Node1│  │Node2│    │     │  │Node3 │  │Node4  │  │
          │  └──┬──┘  └──┬──┘    │     │  └──┬───┘  └───┬───┘  │
          │     │shmem    │shmem  │     │     │shmem      │shmem │
          │     └────┬────┘       │     │     └─────┬─────┘      │
          └──────────┼────────────┘     └───────────┼────────────┘
                     │                              │
                     └──────── Zenoh pub/sub ────────┘
                              (cross-machine)

Protocol Layers

Layer                    Transport          Format                         Use
CLI <-> Coordinator      WebSocket          JSON (ControlRequest/Reply)    Commands, log streaming
Coordinator <-> Daemon   WebSocket          JSON (DaemonCoordinatorEvent)  Node lifecycle, metrics
Daemon <-> Node (small)  TCP / Unix socket  Custom binary                  Control messages, small data
Daemon <-> Node (large)  Shared memory      Zero-copy Arrow                Data messages > 4KB
Daemon <-> Daemon        Zenoh pub/sub      Arrow + metadata               Cross-machine data routing

Coordinator Internals

The coordinator is an event-driven async server:

Event Sources:
  - CLI WebSocket connections (ControlRequest)
  - Daemon WebSocket connections (DaemonEvent)
  - Heartbeat timer (3s interval)
  - External events (for embedding)

Event Loop:
  merge_all(cli_events, daemon_events, heartbeat, external)
    -> handle_event()
    -> update state
    -> persist to store (if redb)
    -> send replies

Key types:

#![allow(unused)]
fn main() {
// State
RunningDataflow { uuid, name, descriptor, daemons, node_metrics, ... }
RunningBuild    { build_id, errors, log_subscribers, pending_results, ... }
DaemonConnection { sender, pending_replies, last_heartbeat }

// Store trait
trait CoordinatorStore: Send + Sync {
    fn put_dataflow(&self, record: &DataflowRecord) -> Result<()>;
    fn get_dataflow(&self, uuid: &Uuid) -> Result<Option<DataflowRecord>>;
    fn list_dataflows(&self) -> Result<Vec<DataflowRecord>>;
    // ... daemon and build methods
}
}

Store backends:

  • memory (default): In-memory, lost on restart.
  • redb: Persistent to disk (~/.adora/coordinator.redb). Survives crashes. Requires redb-backend feature.
adora coordinator --store redb
adora coordinator --store redb:/custom/path.redb

Daemon Internals

The daemon manages node processes on a single machine:

Per Node:
  1. Build (if build command specified)
  2. Spawn process with ADORA_NODE_CONFIG env var
  3. Node registers via TCP/shmem handshake
  4. Route inputs/outputs between nodes
  5. Collect metrics (CPU, memory, I/O)
  6. Handle restart policy on exit
  7. Forward logs to coordinator

Communication:
  - Shared memory for messages > 4KB (zero-copy)
  - TCP for control messages and small data
  - flume channels for internal event routing

Metrics collection:

#![allow(unused)]
fn main() {
struct NodeMetrics {
    pid: u32,
    cpu_usage: f32,      // per-core percentage
    memory_mb: f64,
    disk_read_mb_s: Option<f64>,
    disk_write_mb_s: Option<f64>,
    status: NodeStatus,  // Running | Restarting | Degraded | Failed
    restart_count: u32,
    pending_messages: u64,
}
}

Message Types

All inter-component messages are defined in libraries/message/:

#![allow(unused)]
fn main() {
// Node identification
struct NodeId(String);      // [a-zA-Z0-9_.-]
struct DataId(String);      // same validation
type DataflowId = uuid::Uuid;

// Data metadata
struct Metadata {
    timestamp: uhlc::Timestamp,    // hybrid logical clock
    type_info: ArrowTypeInfo,      // Arrow schema
    parameters: MetadataParameters, // custom key-value pairs
}

// Node events (daemon -> node)
enum NodeEvent {
    Stop,
    Reload { operator_id },
    Input { id, metadata, data },
    InputClosed { id },
    InputRecovered { id },
    NodeRestarted { id },
    AllInputsClosed,
}
}

Timestamping

Adora uses a Unified Hybrid Logical Clock (UHLC) for distributed causality. Every message carries a uhlc::Timestamp that preserves causal ordering across machines without synchronized clocks.
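The core idea of a hybrid logical clock can be sketched in Python. This is an illustrative simplification of the standard HLC algorithm, not the uhlc crate's implementation: each timestamp is a (physical, logical) pair, and merging a remote timestamp on receive keeps the clock ahead of everything it has seen, so causal order survives wall-clock skew.

```python
import time

class HybridLogicalClock:
    """Minimal HLC sketch: timestamps are (physical_ns, logical) pairs."""

    def __init__(self):
        self.physical = 0
        self.logical = 0

    def now(self):
        """Timestamp a local event (e.g. sending a message)."""
        wall = time.time_ns()
        if wall > self.physical:
            self.physical, self.logical = wall, 0
        else:
            self.logical += 1  # wall clock stalled; advance logically
        return (self.physical, self.logical)

    def update(self, remote_phys, remote_log):
        """Merge a timestamp received from another machine."""
        wall = time.time_ns()
        new_phys = max(self.physical, remote_phys, wall)
        if new_phys == self.physical and new_phys == remote_phys:
            new_log = max(self.logical, remote_log) + 1
        elif new_phys == self.physical:
            new_log = self.logical + 1
        elif new_phys == remote_phys:
            new_log = remote_log + 1
        else:
            new_log = 0
        self.physical, self.logical = new_phys, new_log
        return (new_phys, new_log)
```

Comparing timestamps as tuples gives a total order consistent with causality: every local event and every receive produces a timestamp strictly greater than anything observed before it.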

Zero-Copy Shared Memory

For large messages (> 4KB), the daemon uses shared memory regions:

  1. Sender node requests a shared memory slot from daemon
  2. Daemon allocates a region and returns the ID
  3. Sender writes Arrow data directly into shared memory
  4. Daemon notifies receiver node of the region ID
  5. Receiver reads directly from shared memory (zero-copy)
  6. Receiver sends a drop token when done

This achieves 10-17x lower latency than ROS2 for large payloads.
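The idea behind the handshake can be demonstrated with Python's standard shared-memory primitives. This is an illustration of the zero-copy pattern only, not Adora's daemon protocol: the payload is written into a shared region once, and only the small region name crosses the control channel.

```python
from multiprocessing import shared_memory

# Sender side: allocate a region and write the payload in place.
payload = b"\x01" * (8 * 1024)  # an 8 KB message (above the 4 KB threshold)
region = shared_memory.SharedMemory(create=True, size=len(payload))
region.buf[:len(payload)] = payload  # written directly, no serialization

# Only this small identifier travels over the control channel;
# the 8 KB payload itself is never copied between processes.
region_id = region.name

# Receiver side: attach to the same region and read in place.
view = shared_memory.SharedMemory(name=region_id)
received = bytes(view.buf[:len(payload)])

# "Drop token": the receiver releases its handle, then the owner frees it.
view.close()
region.close()
region.unlink()
```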


Writing Nodes

Rust Node

use adora_node_api::{AdoraNode, Event, IntoArrow};
use adora_core::config::DataId;

fn main() -> eyre::Result<()> {
    let (mut node, mut events) = AdoraNode::init_from_env()?;

    let output = DataId::from("result".to_owned());

    while let Some(event) = events.recv() {
        match event {
            Event::Input { id, metadata, data } => {
                // Process input data (Arrow array)
                let result: u64 = 42;
                node.send_output(
                    output.clone(),
                    metadata.parameters,
                    result.into_arrow(),
                )?;
            }
            Event::Stop(_) => break,
            Event::InputClosed { id } => {
                eprintln!("input {id} closed");
            }
            Event::InputRecovered { id } => {
                eprintln!("input {id} recovered");
            }
            _ => {}
        }
    }
    Ok(())
}

Cargo.toml:

[dependencies]
adora-node-api = { workspace = true }
eyre = "0.6"

Python Node

import pyarrow as pa
from adora import Node

node = Node()

for event in node:
    if event["type"] == "INPUT":
        # event["value"] is a PyArrow array
        values = event["value"].to_pylist()
        result = pa.array([sum(values)])
        node.send_output("result", result)
    elif event["type"] == "STOP":
        break

C Node

#include "node_api.h"

int main() {
    void *ctx = init_adora_context_from_env();
    // ... event loop using adora_next_event / adora_send_output
    free_adora_context(ctx);
    return 0;
}

Node Logging

Nodes can emit structured logs:

Rust:

#![allow(unused)]
fn main() {
// Via tracing (recommended)
tracing::info!("processing frame {}", frame_id);

// Via node API
node.log_info("processing complete");
node.log_with_fields("info", "reading", None, Some(&fields));
}

Python:

import logging
logging.info("processing frame %d", frame_id)

# Or via node API
node.log("info", "processing complete")

Writing Operators

Operators run in-process inside a shared runtime, avoiding process spawn overhead.

Rust Operator

#![allow(unused)]
fn main() {
use adora_operator_api::{register_operator, AdoraOperator, AdoraOutputSender, AdoraStatus, Event};

#[register_operator]
#[derive(Default)]
pub struct MyOperator {
    counter: u32,
}

impl AdoraOperator for MyOperator {
    fn on_event(
        &mut self,
        event: &Event,
        output_sender: &mut AdoraOutputSender,
    ) -> Result<AdoraStatus, String> {
        match event {
            Event::Input { id, data } => {
                self.counter += 1;
                output_sender.send(
                    "count".to_string(),
                    arrow::array::UInt32Array::from(vec![self.counter]),
                )?;
                Ok(AdoraStatus::Continue)
            }
            Event::Stop => Ok(AdoraStatus::Stop),
            _ => Ok(AdoraStatus::Continue),
        }
    }
}
}

Cargo.toml:

[lib]
crate-type = ["cdylib"]

[dependencies]
adora-operator-api = { workspace = true }
arrow = "53"

Python Operator

nodes:
  - id: my-node
    operator:
      python: my_operator.py
      inputs:
        data: source/output
      outputs:
        - result

# my_operator.py
import pyarrow as pa

class Operator:
    def __init__(self):
        self.counter = 0

    def on_event(self, event, send_output):
        if event["type"] == "INPUT":
            self.counter += 1
            send_output("result", pa.array([self.counter]))

Distributed Deployments

Setup

# Machine A (coordinator + daemon)
adora up

# Machine B (daemon only, pointing to coordinator on Machine A)
adora daemon --interface 0.0.0.0 --coordinator-addr 192.168.1.10 --machine-id B

# Machine C (same)
adora daemon --interface 0.0.0.0 --coordinator-addr 192.168.1.10 --machine-id C

Dataflow with Machine Assignment

nodes:
  - id: camera
    _unstable_deploy:
      machine: robot
    path: ./camera-driver
    outputs:
      - frames

  - id: inference
    _unstable_deploy:
      machine: gpu-server
    path: ./ml-model
    inputs:
      frames: camera/frames
    outputs:
      - predictions

  - id: actuator
    _unstable_deploy:
      machine: robot
    path: ./actuator-driver
    inputs:
      commands: inference/predictions

Build and Start

# From any machine with coordinator access
adora build dataflow.yml       # distributed build on target machines
adora start dataflow.yml --name my-robot --attach

Monitor

# Resource usage across all machines
adora top

# Logs from any node regardless of machine
adora logs my-robot inference --follow

# List all dataflows
adora list

Coordinator Persistence

For production, use the redb store backend so the coordinator survives restarts:

adora coordinator --store redb

State is persisted to ~/.adora/coordinator.redb. On restart, stale dataflows are marked as failed and the coordinator resumes normal operation.

For managed cluster deployments (cluster.yml, SSH-based lifecycle, label scheduling, systemd services, rolling upgrades), see the Distributed Deployment Guide.


Troubleshooting

For a comprehensive debugging guide covering record/replay workflows, topic inspection, resource monitoring, and end-to-end debugging scenarios, see Debugging and Observability Guide.

Common Issues

“Could not connect to adora-coordinator”

  • Run adora up first, or check ADORA_COORDINATOR_ADDR/ADORA_COORDINATOR_PORT
  • Verify with adora status

“publish_all_messages_to_zenoh not enabled”

  • Use --debug flag: adora start dataflow.yml --debug or adora run dataflow.yml --debug
  • Or add to your dataflow YAML:
    _unstable_debug:
      publish_all_messages_to_zenoh: true
    
  • Required for topic echo, topic hz, topic info

adora top requires an interactive terminal”

  • These TUI commands need a real terminal (not piped output)
  • Same applies to topic hz

Node not receiving inputs

  • Check that output names match: source_node/output_id
  • Verify the source node lists the output in its outputs: array
  • Check adora topic list for available topics

Logs not appearing

  • Check --log-level setting (default stdout shows everything)
  • Check min_log_level in YAML (filters at source)
  • For distributed: verify coordinator/daemon connectivity

Build fails with git source

  • Verify git: URL is accessible
  • Check that branch, tag, or rev exists
  • Build command runs from the git repo root, not the dataflow directory

Debug Workflow

# 1. Full environment diagnosis
adora doctor --dataflow dataflow.yml

# 2. Start with verbose logging and debug topics
adora run dataflow.yml --log-level trace --debug

# 3. Inspect a specific node
adora node info -d my-dataflow problem-node

# 4. Monitor specific node logs
adora logs my-dataflow problem-node --follow --level debug

# 5. Check resource usage
adora top

# 6. Inspect topic data
adora topic echo -d my-dataflow problem-node/output

# 7. Publish test data to a topic
adora topic pub -d my-dataflow problem-node/input '[1, 2, 3]'

# 8. Measure frequencies
adora topic hz -d my-dataflow --window 5

# 9. View/modify runtime parameters
adora param list -d my-dataflow problem-node
adora param set -d my-dataflow problem-node threshold 42

# 10. Restart a misbehaving node without stopping the dataflow
adora node restart -d my-dataflow problem-node

# 11. View coordinator traces (no external infra needed)
adora trace list
adora trace view <trace-id-prefix>

# 12. Visualize dataflow graph
adora graph dataflow.yml --open

Log File Locations

out/
  <dataflow-uuid>/
    log_<node-id>.jsonl          # current log
    log_<node-id>.1.jsonl        # rotated (previous)
    log_<node-id>.2.jsonl        # rotated (older)

Read directly with:

adora logs --local --all-nodes
adora logs --local <node-name> --tail 50
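The rotation scheme shown above (current file, then .1, .2, ... with the oldest dropped) can be sketched as a shift-and-rename. This is an illustrative sketch of size-based rotation driven by max_log_size and max_rotated_files, not Adora's implementation; `rotate_if_needed` is a hypothetical helper name.

```python
import os

def rotate_if_needed(path, max_log_size, max_rotated_files=2):
    """Shift log_<node>.jsonl -> .1.jsonl -> .2.jsonl, dropping the oldest,
    once the current file reaches max_log_size bytes."""
    if not os.path.exists(path) or os.path.getsize(path) < max_log_size:
        return False
    base, ext = path.rsplit(".", 1)  # ".../log_sensor", "jsonl"
    oldest = f"{base}.{max_rotated_files}.{ext}"
    if os.path.exists(oldest):
        os.remove(oldest)  # the oldest rotation falls off the end
    for i in range(max_rotated_files - 1, 0, -1):
        src = f"{base}.{i}.{ext}"
        if os.path.exists(src):
            os.replace(src, f"{base}.{i + 1}.{ext}")
    os.replace(path, f"{base}.1.{ext}")  # current log becomes .1
    return True
```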

Logging

Adora provides a structured logging system for real-time robotics and AI dataflows. Logs are captured per-node as structured JSONL files, forwarded to the coordinator for live streaming, and optionally routed through the dataflow graph as data messages.

Which Logging Approach Should I Use?

Start here if you’re unsure which approach fits your use case.

I want to...                     Approach                                     Config
Log from Python                  Use Python's logging module (auto-bridged)   Nothing – just import logging
Log from Rust                    Use node.log_info() / node.log_error() etc.  Nothing – works out of the box
Log from C/C++                   Use adora_log() / log_message()              Nothing – works out of the box
Filter noisy nodes               Set min_log_level in YAML                    Per-node YAML field
Watch all logs in one place      Subscribe to adora/logs virtual input        inputs: logs: adora/logs
Process one node's logs as data  Use send_logs_as on that node                Per-node YAML + wire the output
Rotate log files                 Set max_log_size in YAML                     Per-node YAML field
Build a custom log sink          Use adora-log-utils crate                    Rust dependency
Filter CLI display               Use --log-level / --log-filter flags         CLI flags or env vars

Language-Specific Quick Start

Python – the simplest path is Python’s built-in logging module:

import logging
from adora import Node

node = Node()  # Automatically bridges Python logging -> adora

logging.info("Sensor started")       # Captured as structured "info" log
logging.warning("High temp: 42C")    # Captured as structured "warn" log
print("raw debug output")            # Captured as "stdout" level

When Node() is created, it installs a handler that routes all Python logging calls through Rust’s tracing system. The daemon parses these as structured log entries with level, message, file, and line number. No extra configuration needed.

You can also use the explicit API for structured fields:

node.log_info("Reading acquired")
node.log("info", "Reading acquired", fields={"sensor_id": "temp-01"})

Rust – use the node API convenience methods:

#![allow(unused)]
fn main() {
let (node, mut events) = AdoraNode::init_from_env()?;

// Convenience methods (recommended for most cases)
node.log_info("Sensor started");
node.log_warn("High temperature");

// With structured fields
let mut fields = BTreeMap::new();
fields.insert("sensor_id".into(), "temp-01".into());
node.log_with_fields("info", "Reading acquired", None, Some(&fields));
}

Alternatively, Rust nodes can use the tracing crate. When adora’s tracing subscriber is initialized (via init_tracing()), tracing::info!() etc. output structured JSON to stdout, which the daemon parses automatically:

#![allow(unused)]
fn main() {
// Also works -- parsed as structured logs by the daemon
tracing::info!("Sensor started");
tracing::warn!(sensor_id = "temp-01", "High temperature");
}

Use node.log_*() when you want explicit control over the log format. Use tracing::*!() when you want ecosystem integration (spans, instrumentation, OpenTelemetry). Both produce identical structured log entries in the daemon.

C – use the adora_log() function:

adora_log(ctx, "info", 4, "Sensor started", 14);

C++ – use the log_message() function:

log_message(node.send_output, "info", "Sensor started");

Features at a Glance

Feature                   Scope                   Config
Log level filtering       CLI display             --log-level, ADORA_LOG_LEVEL
Output formats            CLI display             --log-format, ADORA_LOG_FORMAT
Per-node level overrides  CLI display             --log-filter, ADORA_LOG_FILTER
Source-level filtering    Per-node YAML           min_log_level
Stdout-as-data routing    Per-node YAML           send_stdout_as
Structured log routing    Per-node YAML           send_logs_as
Log file rotation         Per-node YAML           max_log_size
Rotation file limit       Per-node YAML           max_rotated_files
Node log API              Rust/Python/C/C++ node  node.log(), adora_log(), etc.
Log utilities library     Rust crate              adora-log-utils
Log aggregation           Dataflow input          adora/logs virtual input
Time-range filtering      adora logs              --since, --until
Live log streaming        adora logs              --follow
Text search               adora logs              --grep
Local log reading         adora logs              --local, --all-nodes

Log File Format

Each node produces a JSONL file (one JSON object per line) at:

<working_dir>/out/<dataflow_uuid>/log_<node_id>.jsonl

Each line has this structure:

{
  "timestamp": "2024-01-15T10:30:00.123Z",
  "level": "info",
  "node_id": "sensor",
  "message": "Starting sensor...",
  "target": "sensor::module",
  "fields": { "key": "value" }
}

Field      Type     Description
timestamp  string   RFC3339 timestamp with millisecond precision
level      string   "error", "warn", "info", "debug", "trace", or "stdout"
node_id    string   Node ID
message    string   The log message text
target     string?  Rust module target (e.g. "sensor::module"), null if absent
fields     object?  Structured key-value fields from the logging framework. Trust model: fields originate from node stdout and are passed through without sanitization. In mixed-trust environments, log consumers should validate field contents before acting on them

How Node Output Becomes Log Entries

The daemon captures each line of stdout/stderr from a node process and attempts to parse it as a structured log message (JSON with level, message, timestamp, and optional fields). If parsing succeeds, the structured fields are preserved. If parsing fails, the raw line becomes a "stdout"-level entry.

This means nodes using Rust’s tracing or log crate with JSON output get full structured logging automatically. Nodes that simply println! produce "stdout"-level entries.
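The parse-or-fallback step the daemon performs per line can be sketched as follows. This is an illustrative Python sketch of the described behavior, not daemon code; `capture_line` is a hypothetical helper.

```python
import json

def capture_line(node_id, raw_line):
    """One line of node stdout becomes a log entry: a structured entry if
    it parses as JSON with the expected fields, otherwise a raw
    "stdout"-level entry."""
    try:
        entry = json.loads(raw_line)
        if isinstance(entry, dict) and "level" in entry and "message" in entry:
            entry.setdefault("node_id", node_id)
            return entry  # structured fields preserved
    except json.JSONDecodeError:
        pass
    return {"level": "stdout", "node_id": node_id, "message": raw_line}
```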


Viewing Logs: adora run

When running a dataflow with adora run, logs from all nodes are displayed in real-time on the terminal.

Flags

adora run dataflow.yml [OPTIONS]
Flag                 Default  Env Var           Description
--log-level LEVEL    stdout   ADORA_LOG_LEVEL   Minimum level to display
--log-format FORMAT  pretty   ADORA_LOG_FORMAT  Output format: pretty, json, compact
--log-filter FILTER  none     ADORA_LOG_FILTER  Per-node level overrides

Log Levels

From most to least verbose:

Level   Description
stdout  Everything including raw stdout from nodes (default)
trace   Fine-grained diagnostic messages
debug   Developer-level diagnostic messages
info    General informational messages
warn    Warning conditions
error   Error conditions only

Setting --log-level info hides stdout, trace, and debug messages. The stdout level is a special catch-all that passes everything.

Level Filtering Logic

The level filter uses LogLevelOrStdout::passes():

Message level    Filter level    Displayed?
─────────────    ────────────    ──────────
stdout           stdout          yes
stdout           info            no       (stdout only passes stdout filter)
info             stdout          yes      (any log level passes stdout filter)
debug            info            no       (debug is more verbose than info)
error            info            yes      (error is less verbose than info)
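The table above can be condensed into three rules. A Python sketch of the described semantics (not the Rust `LogLevelOrStdout::passes()` code itself):

```python
LEVELS = ["error", "warn", "info", "debug", "trace"]  # least -> most verbose

def passes(message_level, filter_level):
    """Sketch of the level-filter rules: the stdout filter passes
    everything; stdout messages pass only the stdout filter; otherwise
    compare verbosity."""
    if filter_level == "stdout":
        return True
    if message_level == "stdout":
        return False
    return LEVELS.index(message_level) <= LEVELS.index(filter_level)
```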

Per-Node Overrides

The --log-filter flag lets you set different levels for different nodes:

adora run dataflow.yml --log-level info --log-filter "sensor=debug,planner=warn"

This shows info and above for all nodes, except sensor (shows debug and above) and planner (shows warn and above).

Format: "node1=level,node2=level" (comma-separated name=level pairs).
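Parsing this format is straightforward. An illustrative sketch (not the actual CLI parser); `parse_log_filter` and `effective_level` are hypothetical helper names:

```python
def parse_log_filter(spec):
    """Parse "node1=level,node2=level" into a dict of overrides."""
    overrides = {}
    for pair in filter(None, spec.split(",")):
        node, _, level = pair.partition("=")
        overrides[node.strip()] = level.strip()
    return overrides

def effective_level(node, default_level, overrides):
    """A node uses its override if present, else the global --log-level."""
    return overrides.get(node, default_level)
```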

Output Formats

Pretty (default) – colored, human-readable:

10:30:00 INFO   sensor: Starting sensor...

10:30:01 INFO   [adora]: spawning node processor

10:30:01 stdout sensor: raw output line
  • Timestamp in local timezone (HH:MM:SS)
  • Level colored: ERROR (red), WARN (yellow), INFO (green), DEBUG (blue), TRACE (dimmed), stdout (italic dimmed blue)
  • Node name in bold with a unique color based on the name
  • System messages prefixed with [adora]
  • Lifecycle messages (spawning, node finished, stopping) get visual separation with blank lines

Json – full LogMessage struct as JSON, one per line:

{"build_id":null,"dataflow_id":"abc-123","node_id":"sensor","level":"INFO","message":"Starting...","timestamp":"2024-01-15T10:30:00Z",...}

Useful for piping to jq or ingesting into log aggregation systems.

Compact – minimal, no color:

10:30:00 INFO sensor: Starting sensor...

Useful for CI/CD environments and log files.


Viewing Logs: adora logs

Read historical logs or stream live logs from a running dataflow.

Basic Usage

# Read logs for a specific node (via coordinator)
adora logs <dataflow_uuid> <node_name>

# Read local log files directly
adora logs --local <node_name>
adora logs --local --all-nodes

# Stream live logs
adora logs <dataflow_uuid> <node_name> --follow
adora logs --local <node_name> --follow

Flags

Flag                     Short  Default    Description
--local                         false      Read from local out/ directory instead of coordinator
--all-nodes                     false      Merge logs from all nodes, sorted by timestamp
--tail N                 -n     all        Show only the last N lines
--follow                 -f     false      Stream new log entries as they arrive
--since DURATION                none       Only show logs newer than this duration ago
--until DURATION                none       Only show logs older than this duration ago
--level LEVEL                   stdout     Minimum log level (env: ADORA_LOG_LEVEL)
--grep PATTERN                  none       Case-insensitive text search
--coordinator-addr IP           127.0.0.1  Coordinator address
--coordinator-port PORT         default    Coordinator control port

Time Filters

--since and --until accept duration strings relative to now:

# Logs from the last 5 minutes
adora logs --local sensor --since 5m

# Logs from 1 hour ago to 30 minutes ago
adora logs --local sensor --since 1h --until 30m

# Last 10 errors from the past hour
adora logs --local sensor --since 1h --level error --tail 10

Supported duration formats: 30 (seconds), 30s, 5m, 1h, 2d.
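The duration semantics can be sketched in a few lines; this is illustrative Python, not the CLI's parser:

```python
def parse_duration(s: str) -> int:
    """Convert a duration string (30, 30s, 5m, 1h, 2d) to seconds.

    A plain number is interpreted as seconds.
    """
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    if s[-1] in units:
        return int(s[:-1]) * units[s[-1]]
    return int(s)

assert parse_duration("5m") == 300
assert parse_duration("1h") == 3600
assert parse_duration("30") == 30
```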

--grep performs case-insensitive substring matching against:

  • The log message text
  • The node ID
  • The module target

# Find all timeout-related messages
adora logs --local --all-nodes --grep "timeout"

# Find errors from a specific module
adora logs --local sensor --grep "camera::driver" --level error

Filter Pipeline

All filters are applied in this order:

Read/Parse -> Time Filters -> Grep -> Tail -> Display

When --since, --until, or --grep are used in coordinator mode, the CLI fetches all logs from the server (ignoring --tail server-side) and applies all filters client-side. This ensures correct results when combining filters.
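The client-side ordering can be sketched as follows (illustrative Python; the entry shape is an assumption, not the CLI's internal representation):

```python
from datetime import datetime, timezone

def apply_filters(entries, since=None, until=None, grep=None, tail=None):
    """Apply log filters in pipeline order: time window -> grep -> tail.

    `entries` is a list of dicts with "timestamp" (aware datetime)
    and "message" keys; `since`/`until` are timedeltas relative to now.
    """
    now = datetime.now(timezone.utc)
    if since is not None:
        entries = [e for e in entries if e["timestamp"] >= now - since]
    if until is not None:
        entries = [e for e in entries if e["timestamp"] <= now - until]
    if grep is not None:
        needle = grep.lower()  # case-insensitive substring match
        entries = [e for e in entries if needle in e["message"].lower()]
    if tail is not None:
        entries = entries[-tail:]  # tail runs last, after all other filters
    return entries
```

Running tail last is what makes combinations like `--since 1h --tail 10` return the last 10 entries *within* the window rather than filtering an arbitrary tail.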

Local vs Coordinator Mode

Local mode (--local) reads JSONL files directly from the out/ directory in the current working directory. No coordinator or daemon needs to be running. If --all-nodes is used or no node name is given, all log files are merged and sorted by timestamp.

Coordinator mode (default) connects to a running coordinator via WebSocket. The coordinator reads log files from the daemon’s working directory and streams them back. This works for both local and distributed deployments.

Follow Mode

Local follow (--local --follow): Polls log files every 200ms for new content. New lines are parsed, filtered by --grep, and printed. Time/tail filters only apply to the initial historical output.

Coordinator follow (--follow): Opens a WebSocket subscription to the coordinator. The coordinator forwards log messages from the daemon in real-time. Level filtering is applied server-side for efficiency. --grep and --since are applied client-side on the stream.


Environment Variables

All environment variables serve as fallbacks – CLI flags always take precedence.

Variable          Used By                Values                                   Description
ADORA_LOG_LEVEL   adora run, adora logs  error, warn, info, debug, trace, stdout  Default minimum log level
ADORA_LOG_FORMAT  adora run              pretty, json, compact                    Default output format
ADORA_LOG_FILTER  adora run              "node1=level,node2=level"                Default per-node overrides
ADORA_QUIET       daemon                 any value                                Suppress log forwarding to display (file writing continues)

Example:

# Set defaults for a development session
export ADORA_LOG_LEVEL=info
export ADORA_LOG_FORMAT=pretty
export ADORA_LOG_FILTER="sensor=debug"

# These are equivalent:
adora run dataflow.yml
adora run dataflow.yml --log-level info --log-format pretty --log-filter "sensor=debug"

# CLI flag overrides env var:
adora run dataflow.yml --log-level debug   # overrides ADORA_LOG_LEVEL=info

YAML Configuration

min_log_level

Filter logs at the source (daemon-side) before they reach log files, the coordinator, or send_logs_as routing.

nodes:
  - id: noisy-sensor
    path: ./target/debug/sensor
    min_log_level: info    # suppress debug/trace/stdout from this node

Valid values: error, warn, info, debug, trace, stdout.

When set, the daemon drops log messages below this level immediately after parsing. This reduces disk I/O, network traffic, and log file size. The filtering uses the same passes() logic as the CLI display filter.

send_stdout_as

Route raw stdout/stderr lines as dataflow output messages.

nodes:
  - id: legacy-node
    path: ./legacy-script.py
    send_stdout_as: raw_output
    outputs:
      - raw_output
      - data

  - id: log-consumer
    inputs:
      logs: legacy-node/raw_output

Each stdout/stderr line is sent as an Arrow-encoded string. This is useful for integrating legacy nodes that output data on stdout (e.g., Python scripts using print()).

Both send_stdout_as and normal log file writing happen – stdout routing does not suppress log files.

send_logs_as

Route parsed structured log entries as dataflow output messages.

nodes:
  - id: sensor
    path: ./target/debug/sensor
    send_logs_as: log_entries
    outputs:
      - data
      - log_entries

  - id: log-aggregator
    inputs:
      sensor_logs: sensor/log_entries

Unlike send_stdout_as, this only sends lines that were successfully parsed as structured logs (not raw stdout). Each entry is serialized as a full JSON LogMessage string. The min_log_level filter applies before routing – suppressed messages are not sent.

Use this to build log aggregation, alerting, or monitoring nodes within the dataflow itself.

adora/logs – Automatic Log Aggregation

Subscribe to logs from all nodes with a single input line – no manual wiring needed:

nodes:
  - id: sensor
    path: sensor.py
    inputs:
      tick: adora/timer/millis/200
    outputs:
      - reading

  - id: processor
    path: processor.py
    inputs:
      reading: sensor/reading
    outputs:
      - result

  - id: log-viewer
    path: log_viewer.py
    inputs:
      logs: adora/logs              # all nodes, all levels
      errors: adora/logs/error      # only error+ from all nodes
      sensor: adora/logs/info/sensor  # info+ from one node

The adora/logs virtual input works like adora/timer – the daemon handles subscription internally. Each log message arrives as a JSON-encoded LogMessage string in an Arrow array. To prevent infinite loops, a node never receives its own log messages.

Syntax:

Input                         Description
adora/logs                    All logs from all nodes
adora/logs/<level>            Logs at <level> or above from all nodes
adora/logs/<level>/<node-id>  Logs at <level> or above from a specific node

Levels: stdout, error, warn, info, debug, trace.

When to use adora/logs vs send_logs_as:

                adora/logs             send_logs_as
Scope           All nodes at once      One node at a time
YAML changes    Only the consumer      Each source node
Adding a node   Zero wiring changes    Must update consumer
Use case        Dashboard, monitoring  Per-node log processing

See examples/log-aggregator/ for a complete working example.

max_log_size

Enable size-based log file rotation.

nodes:
  - id: sensor
    path: ./target/debug/sensor
    max_log_size: "50MB"

Value            Bytes
"1KB" or "1K"    1,024
"50MB" or "50M"  52,428,800
"1GB" or "1G"    1,073,741,824
"1000"           1,000 (plain number = bytes)

When the active log file exceeds the configured size, the daemon:

  1. Flushes and closes the current file
  2. Renames existing rotated files: .4.jsonl -> .5.jsonl, .3.jsonl -> .4.jsonl, etc.
  3. Renames the current file: log_sensor.jsonl -> log_sensor.1.jsonl
  4. Creates a fresh log_sensor.jsonl
  5. Deletes any file beyond the rotation limit (default 5, configurable via max_rotated_files)

Naming convention:

log_sensor.jsonl       # current (active)
log_sensor.1.jsonl     # previous
log_sensor.2.jsonl     # older
log_sensor.3.jsonl
log_sensor.4.jsonl
log_sensor.5.jsonl     # oldest (deleted on next rotation)

Maximum disk usage per node: max_log_size * (1 + max_rotated_files) (1 active + N rotated).

Without max_log_size, log files grow unbounded. For long-running dataflows, always set this.

The adora logs --local command automatically reads all rotated files for a node and merges them in chronological order (oldest rotated file first, current file last).
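The renaming steps above can be sketched as follows (illustrative Python; the daemon's actual implementation is in Rust and also flushes and reopens the active file):

```python
import os

def rotate(path: str, max_rotated: int = 5) -> None:
    """Size-based rotation for a log file like log_sensor.jsonl."""
    base, ext = path.rsplit(".", 1)           # "log_sensor", "jsonl"
    oldest = f"{base}.{max_rotated}.{ext}"
    if os.path.exists(oldest):
        os.remove(oldest)                     # drop file beyond the rotation limit
    for n in range(max_rotated - 1, 0, -1):   # shift .4 -> .5, .3 -> .4, ...
        src = f"{base}.{n}.{ext}"
        if os.path.exists(src):
            os.rename(src, f"{base}.{n + 1}.{ext}")
    if os.path.exists(path):
        os.rename(path, f"{base}.1.{ext}")    # current file becomes .1
    open(path, "w").close()                   # create a fresh active file
```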

max_rotated_files

Control how many rotated log files to keep (default: 5, range: 1-100).

nodes:
  - id: sensor
    path: ./target/debug/sensor
    max_log_size: "50MB"
    max_rotated_files: 10    # keep 10 rotated files instead of 5

With max_rotated_files: 10 and max_log_size: "50MB", maximum disk usage is 50MB * 11 = 550MB per node. Lower values save disk space; higher values preserve more history.

Runtime Node Restrictions

For runtime nodes (operators), only one of each logging field is allowed per runtime:

# OK -- single operator
nodes:
  - id: runtime-node
    operator:
      python: process.py
      send_logs_as: logs
      min_log_level: info
      max_log_size: "100MB"

# ERROR -- multiple operators with conflicting configs
nodes:
  - id: runtime-node
    operators:
      - id: op1
        python: a.py
        send_logs_as: logs1
      - id: op2
        python: b.py
        send_logs_as: logs2    # Error: multiple send_logs_as

When a single operator in a runtime sets these fields, the output name is prefixed with the operator ID (e.g., op1/logs).


Node Log API

Nodes can emit structured log messages programmatically using the node API. These are equivalent to writing JSON-formatted log lines to stdout – the daemon parses them identically.

Rust

#![allow(unused)]
fn main() {
use adora_node_api::AdoraNode;
use std::collections::BTreeMap;

let (node, mut events) = AdoraNode::init_from_env()?;

// General log with level string and optional target
node.log("info", "sensor initialized", Some("sensor::init"));

// Convenience methods (no target parameter)
node.log_error("connection failed");
node.log_warn("temperature elevated");
node.log_info("reading acquired");
node.log_debug("raw bytes received");
node.log_trace("entering loop iteration");

// Structured fields (key-value context preserved through send_logs_as)
let mut fields = BTreeMap::new();
fields.insert("sensor_id".to_string(), "temp-01".to_string());
fields.insert("reading".to_string(), "42.5".to_string());
node.log_with_fields("info", "reading acquired", None, Some(&fields));
}

The level parameter accepts "error", "warn" (or "warning"), "info", "debug", "trace". Unknown levels default to "info". Fields are capped at 60 KB total to match the downstream 64 KB parse limit.
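The documented level normalization can be sketched as (illustrative Python, not the crate's code):

```python
VALID_LEVELS = {"error", "warn", "info", "debug", "trace"}

def normalize_level(level: str) -> str:
    """Normalize a level string: "warning" aliases "warn",
    unknown values fall back to "info"."""
    level = level.lower()
    if level == "warning":
        return "warn"
    return level if level in VALID_LEVELS else "info"

assert normalize_level("WARNING") == "warn"
assert normalize_level("fatal") == "info"
```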

Python

Python nodes have three ways to log, all producing structured log entries:

from adora import Node
import logging

node = Node()

# Option 1: Python's logging module (recommended -- auto-bridged by Node())
logging.info("sensor initialized")
logging.warning("temperature elevated")
logging.debug("raw bytes: %s", data)

# Option 2: Explicit adora API with level string
node.log("info", "sensor initialized", target="sensor.init")
node.log("info", "reading acquired", fields={"sensor_id": "temp-01", "reading": "42.5"})

# Option 3: Convenience methods
node.log_error("connection failed")
node.log_warn("temperature elevated")
node.log_info("reading acquired")
node.log_debug("raw bytes received")
node.log_trace("entering loop iteration")

# This also works but produces "stdout"-level entries (no structure):
print("raw output")

How the Python logging bridge works: When Node() is created, it installs a custom logging.Handler that routes all Python logging calls through Rust’s tracing system. The daemon parses these as structured log entries with level, message, file path, and line number. This happens automatically – no configuration needed.

Method                               Structured?        Fields support?                        When to use
logging.info()                       Yes                No (use extra= for custom formatters)  General-purpose logging
node.log("info", msg, fields={...})  Yes                Yes                                    When you need structured key-value context
node.log_info(msg)                   Yes                No                                     Quick one-liner, same as node.log("info", msg)
print()                              No (stdout level)  No                                     Legacy code, quick debugging

Common pitfall: Do not call logging.basicConfig() before creating Node(). The node constructor sets up the logging bridge; calling basicConfig() first may install a conflicting handler. If you need custom formatters, configure them after Node() creation.
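For intuition, here is an illustrative logging.Handler along the lines of what Node() installs. The real bridge forwards each record to Rust's tracing system; this stand-in just collects structured entries in a list:

```python
import logging

class ForwardingHandler(logging.Handler):
    """Illustrative stand-in for the bridge handler Node() installs."""

    def __init__(self):
        super().__init__()
        self.entries = []

    def emit(self, record: logging.LogRecord) -> None:
        # Convert each LogRecord into a structured entry with level,
        # message, file path, and line number.
        self.entries.append({
            "level": record.levelname,
            "message": record.getMessage(),
            "file": record.pathname,
            "line": record.lineno,
        })

# Attaching it to the root logger mimics what Node() does on creation:
handler = ForwardingHandler()
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)
logging.info("sensor initialized")
```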

C

#include <string.h>
#include "node_api.h"

void *ctx = init_adora_context_from_env();
const char *level = "info";
const char *msg = "sensor initialized";
adora_log(ctx, level, strlen(level), msg, strlen(msg));

C++

// Via the cxx bridge
auto node = init_adora_node();
log_message(node.send_output, "info", "sensor initialized");

Log Utilities Library (adora-log-utils)

The adora-log-utils crate provides parsing, merging, filtering, and formatting utilities for working with LogMessage entries in custom sink nodes. Use it when building nodes that consume log data via send_logs_as.

API

#![allow(unused)]
fn main() {
use adora_log_utils;

// Parse a LogMessage from JSON (as received from send_logs_as)
let log = adora_log_utils::parse_log(json_str)?;

// Parse directly from Arrow input data (convenience for event handlers)
let log = adora_log_utils::parse_log_from_arrow(&data)?;

// Merge multiple log streams into a single timeline
let merged = adora_log_utils::merge_by_timestamp(vec![stream_a, stream_b]);

// Filter by minimum level
let errors = adora_log_utils::filter_by_level(&logs, &min_level);

// Format as JSON (one line, no trailing newline)
let json = adora_log_utils::format_json(&log);

// Format as compact single-line: "<timestamp> <node> <LEVEL>: <message>"
let compact = adora_log_utils::format_compact(&log);

// Format as pretty: "[<timestamp>][<LEVEL>][<node>] <message>"
let pretty = adora_log_utils::format_pretty(&log);
}

Dependency

Add to your sink node’s Cargo.toml:

[dependencies]
adora-log-utils = { workspace = true }

Log Sink Examples

Three example sink nodes demonstrate how to consume logs routed via send_logs_as and forward them to external destinations.

File Sink (examples/log-sink-file/)

Merges log streams from multiple nodes into a single JSONL file. Useful for unified log collection.

nodes:
  - id: sensor
    path: sensor.py
    send_logs_as: log_entries
    inputs:
      tick: adora/timer/millis/200
    outputs:
      - reading
      - log_entries

  - id: processor
    path: processor.py
    send_logs_as: log_entries
    inputs:
      reading: sensor/reading
    outputs:
      - result
      - log_entries

  - id: file_sink
    path: log-sink-file
    inputs:
      sensor_logs: sensor/log_entries
      processor_logs: processor/log_entries
    env:
      LOG_FILE: "./combined.jsonl"

The file sink reads LOG_FILE from the environment (default ./combined.jsonl), parses each incoming Arrow message with adora_log_utils::parse_log_from_arrow(), formats it as JSON, and appends it to the file.

TCP Sink (examples/log-sink-tcp/)

Forwards log entries over a TCP socket to a remote log collector. Useful for embedded systems that lack local filesystems and need to stream logs off-device.

nodes:
  - id: source
    path: source.py
    send_logs_as: log_entries
    inputs:
      tick: adora/timer/millis/500
    outputs:
      - data
      - log_entries

  - id: tcp_sink
    path: log-sink-tcp
    inputs:
      logs: source/log_entries
    env:
      SINK_ADDR: "127.0.0.1:9876"

The TCP sink reads SINK_ADDR from the environment (default 127.0.0.1:9876), connects to the server on startup, and sends each log entry as a JSON line. It reconnects automatically on write failure.

Alert Router (examples/log-sink-alert/)

Splits incoming log entries by severity. All logs are forwarded to the all_logs output; only error and warn logs are forwarded to the alerts output. This enables downstream nodes to handle alerts differently (e.g., trigger notifications, write to a dedicated file).

nodes:
  - id: source
    path: my_node.py
    send_stdout_as: log_entries
    inputs:
      tick: adora/timer/millis/200
    outputs:
      - log_entries

  - id: alert_router
    path: log-sink-alert
    inputs:
      logs: source/log_entries
    outputs:
      - all_logs
      - alerts

The source node uses send_stdout_as to route its stdout lines as Arrow string data. The router parses each log entry with adora_log_utils::parse_log_from_arrow(), checks the level, and uses node.send_output() to forward data to the appropriate outputs. Nodes using the node API can alternatively use send_logs_as to route structured logs from node.log().

Building a Custom Sink

To build your own sink node, follow this pattern:

use adora_node_api::{AdoraNode, Event};

fn main() -> eyre::Result<()> {
    let (_node, mut events) = AdoraNode::init_from_env()?;

    while let Some(event) = events.recv() {
        match event {
            Event::Input { data, .. } => {
                let log = adora_log_utils::parse_log_from_arrow(&data)?;
                // Process the log entry: write to file, send over network, etc.
                let json = adora_log_utils::format_json(&log);
                println!("{json}");
            }
            Event::Stop(_) => break,
            _ => {}
        }
    }
    Ok(())
}

How the Daemon Processes Logs

Understanding the internal pipeline helps with debugging and tuning. For each node, the daemon runs a dedicated async task that processes log lines in order:

Node Process (stdout/stderr)
    |
    v
[1] Capture: lines buffered in mpsc channel (capacity 100)
    |
    v
[2] send_stdout_as: raw line -> Arrow data -> dataflow output
    |
    v
[3] Parse: try JSON structured log, fall back to Stdout-level
    |
    v
[4] min_log_level filter: drop messages below threshold
    |
    v
[5] send_logs_as: LogMessage -> JSON -> Arrow data -> dataflow output
    |
    v
[6] Write JSONL: compact format to log file, track bytes written
    |
    v
[7] Rotation check: if bytes_written >= max_log_size, rotate files
    |
    v
[8] Forward: send LogMessage to display channel (unless ADORA_QUIET)
    |
    v
[9] Sync: fsync log file to disk

Key details:

  • Step 2 happens before parsing, so send_stdout_as captures every line including non-structured output
  • Step 4 happens before Steps 5-8, so min_log_level suppresses messages from all downstream processing
  • Step 5 only fires for successfully parsed structured logs (Step 3 success path)
  • Step 8 sends to either a flume channel (adora run direct mode) or the coordinator (distributed mode)
  • Step 9 calls sync_all() after every write, ensuring durability at the cost of some I/O overhead
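Step 3's parse-with-fallback can be sketched as follows (simplified Python; the actual parser also extracts target, timestamp, and fields, and the "message" key check is an assumption for illustration):

```python
import json

def parse_line(line: str) -> dict:
    """Try to parse a structured JSON log line; fall back to stdout-level."""
    try:
        entry = json.loads(line)
        if isinstance(entry, dict) and "message" in entry:
            return entry  # structured log: eligible for send_logs_as (step 5)
    except json.JSONDecodeError:
        pass
    # Non-JSON output (e.g. a raw print()) becomes a stdout-level entry
    return {"level": "stdout", "message": line}

assert parse_line('{"level": "INFO", "message": "ok"}')["level"] == "INFO"
assert parse_line("raw print output")["level"] == "stdout"
```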

Structured Log Parsing

When a node emits JSON-formatted log output (e.g., from tracing-subscriber with JSON formatting), the daemon extracts:

  • level: log severity
  • message: the log text
  • target: module path
  • timestamp: when the log was emitted
  • fields: arbitrary key-value pairs
  • build_id, dataflow_id, node_id, daemon_id: extracted from fields as fallback

The daemon also sets dataflow_id, node_id, and daemon_id on all messages to ensure they are always present in the log file.


Coordinator Log Streaming Protocol

When a daemon runs under a coordinator (distributed mode), log forwarding works via WebSocket:

  1. Daemon -> Coordinator: Each LogMessage is wrapped in DaemonEvent::Log(message) and sent over the daemon’s WebSocket connection
  2. Coordinator storage: The coordinator stores/forwards logs
  3. CLI subscription: The CLI sends ControlRequest::LogSubscribe { dataflow_id, level } over its WebSocket connection
  4. Server-side filtering: The coordinator only forwards messages where msg_level <= subscription_level. This reduces network traffic for filtered subscriptions
  5. CLI receive: Messages arrive as serialized LogMessage structs

The --level flag maps to log::LevelFilter:

  • stdout -> LevelFilter::Trace (most permissive, receives everything)
  • info -> LevelFilter::Info (receives Error, Warn, Info)
  • etc.
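The server-side check amounts to a numeric severity comparison. A sketch with assumed rank values (lower = more severe; stdout shares trace's rank so a stdout subscription receives everything):

```python
RANK = {"error": 1, "warn": 2, "info": 3, "debug": 4, "trace": 5, "stdout": 5}

def passes(msg_level: str, subscription_level: str) -> bool:
    """Forward a message only if msg_level <= subscription_level."""
    return RANK[msg_level] <= RANK[subscription_level]

assert passes("error", "info")       # Error reaches an info subscription
assert not passes("debug", "info")   # Debug is filtered out server-side
```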

Complete YAML Reference

nodes:
  - id: sensor
    path: ./target/debug/sensor
    outputs:
      - data
      - raw_output       # for send_stdout_as
      - log_entries       # for send_logs_as

    # Source-level log filtering (daemon-side)
    min_log_level: info          # suppress debug/trace/stdout

    # Route stdout to dataflow
    send_stdout_as: raw_output   # every stdout line becomes a data message

    # Route structured logs to dataflow
    send_logs_as: log_entries    # parsed log entries become data messages

    # Log file rotation
    max_log_size: "50MB"         # rotate when file exceeds 50MB
    max_rotated_files: 5         # keep 5 rotated files (default, range 1-100)

    inputs:
      tick: adora/timer/millis/100

Complete Example

The examples/python-logging/ directory contains a runnable three-node pipeline that exercises every logging feature:

sensor (noisy, high-volume) --> processor (structured logs) --> monitor (log aggregator)

Dataflow configuration highlights:

nodes:
  - id: sensor
    path: sensor.py
    min_log_level: info       # suppress debug noise at source
    max_log_size: "1KB"       # small for demo (triggers rotation quickly)
    inputs:
      tick: adora/timer/millis/50
    outputs:
      - reading

  - id: processor
    path: processor.py
    send_logs_as: log_entries  # route structured logs as data
    inputs:
      reading: sensor/reading
    outputs:
      - result
      - log_entries

  - id: monitor
    path: monitor.py
    inputs:
      logs: processor/log_entries
      reading: sensor/reading

What each node demonstrates:

  • sensor – Mixes print() (raw stdout), logging.info(), logging.debug(), and logging.warning(). With min_log_level: info, debug messages are dropped by the daemon before reaching log files. With max_log_size: "1KB", log rotation kicks in after a few seconds.
  • processor – Uses send_logs_as: log_entries to route its structured log entries as dataflow data. Raw print() output is not routed (only parsed structured entries are).
  • monitor – Subscribes to processor/log_entries and counts warnings/errors, demonstrating in-dataflow log aggregation.

Direct mode (adora run – single process, good for quick testing):

# Basic run
adora run examples/python-logging/dataflow.yml --stop-after 5s

# Only warnings and above
adora run examples/python-logging/dataflow.yml --log-level warn --stop-after 5s

# Per-node overrides
adora run examples/python-logging/dataflow.yml --log-filter "monitor=debug,sensor=warn" --stop-after 5s

# JSON output for machine parsing
adora run examples/python-logging/dataflow.yml --log-format json --stop-after 3s

# Environment variable control
ADORA_LOG_LEVEL=warn adora run examples/python-logging/dataflow.yml --stop-after 5s

Distributed mode (adora up + adora start – coordinator/daemon architecture, required for multi-machine deployments):

# Start infrastructure
adora up

# Start attached (live log stream)
adora start examples/python-logging/dataflow.yml --attach

# Or start detached and query logs separately
adora start examples/python-logging/dataflow.yml
adora logs <dataflow-id> sensor --follow                    # stream one node
adora logs <dataflow-id> sensor --follow --level warn       # only warnings
adora logs <dataflow-id> --all-nodes --tail 20              # last 20 lines
adora logs <dataflow-id> processor --grep "error" --since 5m  # targeted search

In distributed mode, logs flow Node -> Daemon -> Coordinator -> CLI over WebSocket. The coordinator buffers log messages until a subscriber connects, so you won’t miss logs even if you attach late. YAML-level settings (min_log_level, send_logs_as, max_log_size) work identically since they are applied at the daemon.

                    adora run                                adora start
Display filtering   --log-level, --log-format, --log-filter  --level on adora logs
Per-node overrides  --log-filter "sensor=debug"              Separate adora logs per node
Remote nodes        No                                       Yes
Live streaming      Always attached                          --attach or adora logs --follow

Post-run log analysis (works the same for both modes):

# Read all local logs
adora logs --local --all-nodes --tail 20

# Search for warnings in sensor logs
adora logs --local sensor --grep "high temp"

# Check that rotation created multiple files
ls -la out/*/log_sensor*.jsonl

Use Case Scenarios

1. Debugging a Noisy Sensor Pipeline

A camera sensor node floods the logs with debug messages, making it hard to see errors from other nodes.

nodes:
  - id: camera
    path: ./target/debug/camera
    min_log_level: warn          # suppress info/debug/trace at the source
    max_log_size: "10MB"         # limit disk usage

  - id: detector
    path: ./target/debug/detector

  - id: planner
    path: ./target/debug/planner

# During development: see everything from detector, only warnings from camera
adora run dataflow.yml --log-level debug --log-filter "camera=warn,detector=debug"

# In production: only errors
export ADORA_LOG_LEVEL=error
adora run dataflow.yml

What happens:

  • Camera node’s debug/info messages are dropped by the daemon before reaching the log file (min_log_level: warn)
  • The CLI further filters display based on --log-filter
  • Log files rotate at 10MB, keeping at most 60MB on disk for the camera node

2. Log Aggregation Within the Dataflow

Build an in-dataflow log monitoring node that watches for errors across multiple nodes and sends alerts.

nodes:
  - id: camera
    path: ./target/debug/camera
    send_logs_as: logs
    outputs:
      - frames
      - logs

  - id: detector
    path: ./target/debug/detector
    send_logs_as: logs
    outputs:
      - detections
      - logs

  - id: log-monitor
    path: ./target/debug/log-monitor
    inputs:
      camera_logs: camera/logs
      detector_logs: detector/logs
    outputs:
      - alerts

Node-side handling in the log monitor (using adora-log-utils):

#![allow(unused)]
fn main() {
use adora_node_api::{AdoraNode, Event};
use adora_message::common::{LogLevel, LogLevelOrStdout};

let (mut node, mut events) = AdoraNode::init_from_env()?;
while let Some(event) = events.recv() {
    match event {
        Event::Input { data, .. } => {
            let log = adora_log_utils::parse_log_from_arrow(&data)?;

            let is_error = matches!(log.level,
                LogLevelOrStdout::LogLevel(LogLevel::Error));

            if is_error || log.message.contains("timeout") {
                // Send alert downstream
                node.send_output("alerts", /* ... */)?;
            }
        }
        Event::Stop(_) => break,
        _ => {}
    }
}
}

See also the Log Sink Examples section for complete runnable examples.

3. Post-Mortem Debugging of a Crash

After a dataflow crashes, investigate what happened in the last few minutes.

# Find available dataflows
ls out/

# Read the last 50 lines from all nodes around the crash
adora logs --local --all-nodes --tail 50

# Focus on errors in the last 5 minutes
adora logs --local --all-nodes --since 5m --level error

# Search for a specific error pattern
adora logs --local --all-nodes --grep "out of memory"

# Drill into a specific node
adora logs --local detector --since 2m

# Export as JSON for external analysis
adora run dataflow.yml --log-format json 2>logs.json

4. Long-Running Production Dataflow

A dataflow runs for days or weeks. Without log rotation, disk space fills up.

nodes:
  - id: ingest
    path: ./target/debug/ingest
    min_log_level: info        # no debug noise in production
    max_log_size: "100MB"      # ~600MB max per node (100MB * 6)
    restart_policy: always
    inputs:
      tick: adora/timer/millis/1000
    outputs:
      - data

  - id: processor
    path: ./target/debug/processor
    min_log_level: warn        # only warnings and errors
    max_log_size: "50MB"
    restart_policy: on-failure
    inputs:
      data: ingest/data
    outputs:
      - results

  - id: writer
    path: ./target/debug/writer
    min_log_level: error       # minimal logging
    max_log_size: "20MB"
    inputs:
      results: processor/results

Disk budget:

  • ingest: up to 600MB (100MB x 6 files)
  • processor: up to 300MB (50MB x 6 files)
  • writer: up to 120MB (20MB x 6 files)
  • Total: ~1GB maximum disk usage for all logs
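The budget arithmetic, assuming binary megabytes as in the max_log_size table (1 active file plus 5 rotated files per node by default):

```python
MB = 1024 * 1024  # max_log_size uses binary units: "50MB" = 52,428,800 bytes

def max_disk(max_log_size_mb: int, max_rotated_files: int = 5) -> int:
    """Worst-case bytes per node: 1 active file + N rotated files."""
    return max_log_size_mb * MB * (1 + max_rotated_files)

budget = max_disk(100) + max_disk(50) + max_disk(20)  # ingest + processor + writer
print(budget // MB)  # 1020 MB, roughly 1GB
```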

5. Live Monitoring of a Distributed Deployment

Multiple daemons running on different machines, monitored from a central workstation.

# Start infrastructure (coordinator + local daemon)
adora up

# On remote machines, start a daemon pointing to the coordinator:
#   adora daemon --coordinator-addr 192.168.1.10

# Start the dataflow (detached)
adora start dataflow.yml

# Open targeted log streams in separate terminals:

# Terminal 1: all sensor warnings
adora logs <dataflow-id> sensor --follow --level warn

# Terminal 2: processor errors with text search
adora logs <dataflow-id> processor --follow --level error --grep "timeout"

# Terminal 3: all nodes merged
adora logs <dataflow-id> --all-nodes --follow

# Terminal 4: historical + live (errors from the last hour, then stream)
adora logs <dataflow-id> processor --since 1h --level error --follow

# Monitor a remote coordinator from another machine:
adora logs <dataflow-id> sensor --follow --coordinator-addr 192.168.1.10

How it works internally:

  1. CLI connects to the coordinator (default localhost:6013, or --coordinator-addr)
  2. For historical logs: request-reply with filters applied client-side (--since, --grep, --tail)
  3. For --follow: opens a WebSocket subscription to the coordinator
  4. Coordinator filters by --level server-side before forwarding (reduces network traffic)
  5. CLI applies --grep and --since client-side on the live stream
  6. Coordinator buffers log messages until a subscriber connects, so late-joining subscribers see recent history

6. CI/CD Pipeline with Structured Logging

In CI, use JSON format for machine-parseable output and compact format for readable logs.

# Machine-parseable logs for CI tooling
adora run dataflow.yml --log-format json --stop-after 30s 2>test-logs.json

# Compact logs for CI console output
adora run dataflow.yml --log-format compact --log-level info --stop-after 30s

# Post-run analysis: count errors per node
adora logs --local --all-nodes --level error | wc -l

With JSON format, each line is a complete LogMessage that can be processed by jq, log aggregators, or custom scripts:

# Extract error messages with jq
cat test-logs.json | jq -r 'select(.level == "ERROR") | "\(.node_id): \(.message)"'

Performance Considerations

Logging adds I/O overhead proportional to log volume. Here’s how to tune it:

min_log_level is the most impactful setting. It filters at the daemon before any I/O: no log file write, no coordinator forwarding, no send_logs_as routing. A node emitting 1000 debug lines/sec at min_log_level: info incurs only the parse-and-drop cost for those lines – no disk or network overhead.

send_logs_as adds a dataflow message per log line. Each parsed log entry is serialized to JSON, converted to Arrow, and sent through the dataflow. For high-volume nodes, this can consume significant bandwidth. Use min_log_level to limit what gets routed.

adora/logs subscribers share a single serialization. The daemon converts each log line to Arrow once and clones the result for each subscriber. The cost scales linearly with subscriber count, not log volume x subscriber count. For most dataflows (1-3 log subscribers), this is negligible.

Log line size is capped at 1 MB. Lines longer than 1 MB from node stdout/stderr are truncated to prevent heap exhaustion. This protects against buggy nodes that dump large binary data to stdout.

Log file rotation is recommended for long-running dataflows. Without max_log_size, log files grow unbounded. A node emitting 100 lines/sec at ~200 bytes/line fills 1 GB in ~14 hours.
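Checking that estimate:

```python
lines_per_sec = 100
bytes_per_line = 200
gib = 1024 ** 3  # 1 GiB

# Seconds to write 1 GiB at 20,000 bytes/sec, converted to hours
hours_to_fill = gib / (lines_per_sec * bytes_per_line) / 3600
print(round(hours_to_fill, 1))  # 14.9
```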

Recommended production settings:

nodes:
  - id: my-node
    path: ./my-node
    min_log_level: info        # drop debug/trace at source
    max_log_size: "50MB"       # rotate at 50MB
    max_rotated_files: 5       # keep 5 rotated files (300MB max)

Best Practices

Set min_log_level in production. Source-level filtering at the daemon prevents debug noise from reaching log files and the network. This is the most effective way to reduce log volume since it filters before any I/O.

Always set max_log_size for long-running dataflows. Without rotation, a single noisy node can fill the disk. Start with "50MB" (300MB total per node with rotation) and adjust based on your storage budget. Use max_rotated_files to tune how much history to keep (default 5, range 1-100).

Use environment variables for team defaults. Set ADORA_LOG_LEVEL and ADORA_LOG_FORMAT in your shell profile or CI configuration. Individual developers can override with CLI flags.

Use --log-filter during development. Instead of changing YAML config, use per-node display overrides to focus on the node you’re debugging: --log-filter "my-node=debug".

Use send_logs_as for operational monitoring. Build monitoring nodes that watch for error patterns, compute error rates, or forward alerts. This keeps monitoring logic within the dataflow graph. Use adora-log-utils to parse and format log entries in custom sink nodes (see examples/log-sink-file/ and examples/log-sink-tcp/).

Prefer send_logs_as over send_stdout_as for structured data. send_stdout_as captures every stdout line (including raw prints), while send_logs_as only captures parsed structured log entries with full metadata.

Use --local for post-mortem debugging. After a crash, adora logs --local --all-nodes works without a running coordinator and merges all node logs chronologically.

Combine --since with --grep for targeted debugging. Instead of scrolling through thousands of lines, narrow the window: adora logs --local sensor --since 5m --grep "error".

Use JSON format for log pipelines. When feeding logs to external systems (ELK, Grafana Loki, Datadog), use --log-format json for structured ingestion.

Debugging and Observability Guide

This guide covers how to debug, record, replay, and monitor adora dataflows. It is written for new users who want to understand what went wrong in a dataflow, measure performance, or reproduce issues offline.


Prerequisites

Before using topic inspection commands (topic echo, topic hz, topic info), enable debug message publishing using either approach:

Option 1: CLI flag (recommended)

adora start dataflow.yml --debug
adora run dataflow.yml --debug

Option 2: YAML descriptor

_unstable_debug:
  publish_all_messages_to_zenoh: true

This tells the daemon to publish all inter-node messages to Zenoh, where the coordinator can proxy them to CLI clients via WebSocket. Without this flag, topic inspection commands will return an error.

The record, replay, logs, list, top, graph, node info/restart/stop, param, and doctor commands do not require this flag. The topic pub command does require it.


Quick Debugging Checklist

When something goes wrong, follow this sequence:

# 1. Run full environment diagnosis
adora doctor --dataflow dataflow.yml

# 2. What dataflows are active?
adora list

# 3. Inspect the problem node
adora node info -d my-dataflow problem-node

# 4. Check node resource usage
adora top

# 5. Stream logs from the problem node
adora logs my-dataflow problem-node --follow --level debug

# 6. Is the node producing output?
adora topic echo -d my-dataflow problem-node/output

# 7. Inject test data
adora topic pub -d my-dataflow problem-node/input '[1, 2, 3]'

# 8. Is it publishing at the expected rate?
adora topic hz -d my-dataflow --window 5

# 9. Check/modify runtime parameters
adora param list -d my-dataflow problem-node
adora param set -d my-dataflow problem-node debug_level 2

# 10. Restart a misbehaving node (without stopping the dataflow)
adora node restart -d my-dataflow problem-node

# 11. View coordinator traces (no external infra needed)
adora trace list
adora trace view <trace-id-prefix>

# 12. Visualize the dataflow graph
adora graph dataflow.yml --open

# 13. Record for offline analysis
adora record dataflow.yml -o debug-capture.adorec

Record and Replay

Record captures live dataflow messages to a file. Replay substitutes source nodes with recorded data, letting you reproduce behavior without hardware.

Recording a Dataflow

# Record all topics (default output: recording_{timestamp}.adorec)
adora record dataflow.yml

# Specify output file
adora record dataflow.yml -o my-capture.adorec

This injects a hidden __adora_record__ node into the dataflow that subscribes to all node outputs and writes them to an .adorec file. The record node binary (adora-record-node) is auto-built on first use.

The recording runs until you press Ctrl-C or the dataflow stops.

Recording Specific Topics

# Only record camera and lidar
adora record dataflow.yml --topics sensor/image,lidar/points

Topic names use the format node_id/output_id. Available topics can be discovered with adora topic list -d <dataflow>.

Proxy Recording (Remote / Diskless)

When the target machine has no local disk or you want to record on your local machine:

# Start the dataflow first (detached)
adora start dataflow.yml --detach

# Record via WebSocket proxy -- data streams through coordinator to CLI
adora record dataflow.yml --proxy -o capture.adorec

# Record specific topics via proxy
adora record dataflow.yml --proxy --topics sensor/image,lidar/points

How proxy mode works:

  1. The dataflow must already be running (adora start --detach)
  2. The CLI connects to the coordinator via WebSocket
  3. The coordinator subscribes to Zenoh on the CLI’s behalf
  4. Message data streams through WebSocket binary frames to the CLI
  5. The CLI writes the .adorec file locally

This requires publish_all_messages_to_zenoh: true in the descriptor.

When to use --proxy:

  • Embedded targets with no local disk
  • Remote machines where you want the recording on your workstation
  • When you only have WebSocket connectivity (no direct Zenoh access)

When to use default mode (no --proxy):

  • Same machine or shared filesystem
  • High-throughput scenarios (no WebSocket overhead)
  • No need for publish_all_messages_to_zenoh

Replaying a Recording

# Replay at original speed
adora replay recording.adorec

# Replay at 2x speed
adora replay recording.adorec --speed 2.0

# Replay as fast as possible (speed 0)
adora replay recording.adorec --speed 0

Replay works by:

  1. Reading the .adorec file header to get the original dataflow descriptor
  2. Identifying which nodes produced the recorded data
  3. Replacing those source nodes with adora-replay-node instances
  4. Running the modified dataflow – downstream nodes receive replayed data identically to live data

The replay node binary (adora-replay-node) is auto-built on first use.

Replay Options

| Flag | Default | Description |
|------|---------|-------------|
| --speed <FLOAT> | 1.0 | Playback speed multiplier. 2.0 = 2x, 0.5 = half speed, 0 = as fast as possible |
| --loop | off | Loop the recording continuously |
| --replace <NODES> | all recorded | Comma-separated list of nodes to replace |
| --output-yaml <PATH> | - | Write modified descriptor YAML without running |

Selective Replay

Replace only specific source nodes while keeping others live:

# Only replace the sensor node, keep camera live
adora replay recording.adorec --replace sensor

# Replace sensor and lidar, keep everything else live
adora replay recording.adorec --replace sensor,lidar

This is useful when you want to debug a specific processing pipeline with known input data while keeping other parts of the system live.

Dry Run (Output YAML)

Both record and replay support --output-yaml to see the modified descriptor without running:

# See what the record-injected descriptor looks like
adora record dataflow.yml --output-yaml record-modified.yml

# See what the replay-modified descriptor looks like
adora replay recording.adorec --output-yaml replay-modified.yml

Recording File Format

The .adorec format is a simple binary file:

┌──────────────────────────────────┐
│ Header (bincode)                 │
│   version: u32                   │
│   start_nanos: u64               │
│   dataflow_id: Uuid              │
│   descriptor_yaml: Vec<u8>       │
├──────────────────────────────────┤
│ Entry 1 (bincode)                │
│   node_id: String                │
│   output_id: String              │
│   timestamp_offset_nanos: u64    │
│   event_bytes: Vec<u8>           │
├──────────────────────────────────┤
│ Entry 2 ...                      │
├──────────────────────────────────┤
│ ...                              │
├──────────────────────────────────┤
│ Footer (bincode)                 │
│   total_messages: u64            │
│   total_bytes: u64               │
└──────────────────────────────────┘

The event_bytes field contains the raw Timestamped<InterDaemonEvent> bincode payload – the same format used on the wire between daemons. The descriptor_yaml in the header stores the original dataflow descriptor so replay can reconstruct the dataflow.


Node Management

Node Info

Get detailed information about a specific node including its status, inputs, outputs, metrics, and restart count:

adora node info -d my-dataflow camera

# JSON output
adora node info -d my-dataflow camera --format json

Node Restart

Restart a single node without stopping the entire dataflow. Useful for recovering a misbehaving node or picking up configuration changes:

# Restart with default grace period
adora node restart -d my-dataflow camera

# Restart with custom grace period
adora node restart -d my-dataflow camera --grace 10s

The daemon sends a stop event, waits for the grace period, then respawns the node process.

Node Stop

Stop a single node without stopping the entire dataflow:

adora node stop -d my-dataflow camera

# With custom grace period
adora node stop -d my-dataflow camera --grace 5s

Topic Inspection

Topic inspection commands subscribe to live dataflow messages via the coordinator’s WebSocket proxy. They require the --debug flag or publish_all_messages_to_zenoh: true in the descriptor.

Listing Topics

# List all topics in a running dataflow
adora topic list -d my-dataflow

# JSON output
adora topic list -d my-dataflow --format json

Shows each output, which node publishes it, and which nodes subscribe to it. This command reads from the descriptor and does not require publish_all_messages_to_zenoh.

Echoing Topic Data

Stream live topic data to the terminal:

# Echo a single topic
adora topic echo -d my-dataflow camera_node/image

# Echo multiple topics
adora topic echo -d my-dataflow robot1/pose robot2/vel

# JSON output (useful for piping to jq or other tools)
adora topic echo -d my-dataflow robot1/pose --format json

# Echo all topics
adora topic echo -d my-dataflow

Each line shows the topic name, Arrow data content, and metadata parameters. Use --format json for machine-readable output:

{"timestamp":1709000000000,"name":"robot1/pose","data":[1.0,2.0,3.0],"metadata":null}

Measuring Frequency

Interactive TUI showing per-topic publish frequency:

# All topics with 10-second sliding window
adora topic hz -d my-dataflow --window 10

# Specific topics with 5-second window
adora topic hz -d my-dataflow robot1/pose robot2/vel --window 5

The TUI displays:

  • Average frequency (Hz)
  • Average, min, max interval
  • Standard deviation
  • Sparkline showing recent activity

Press q or Ctrl-C to exit. Requires an interactive terminal.

Publishing Test Data

Inject data into a running dataflow for testing. Requires publish_all_messages_to_zenoh: true.

# Publish a single Arrow array
adora topic pub -d my-dataflow sensor/threshold '[42]'

# Publish from a JSON file
adora topic pub -d my-dataflow sensor/config --file test-config.json

# Publish multiple messages
adora topic pub -d my-dataflow sensor/trigger '[1]' --count 10

This is useful for:

  • Testing node behavior with known input data
  • Triggering specific code paths in downstream nodes
  • Simulating sensor inputs without hardware

Topic Metadata and Stats

One-shot statistics collection:

# Collect stats for 5 seconds (default)
adora topic info -d my-dataflow camera_node/image

# Collect for 10 seconds
adora topic info -d my-dataflow camera_node/image --duration 10

Reports:

  • Arrow data type
  • Publisher node
  • Subscriber nodes (from descriptor)
  • Message count and bandwidth
  • Publishing frequency

Runtime Parameters

Runtime parameters let you read and modify node configuration while a dataflow is running, without restarting. Parameters are stored in the coordinator and optionally forwarded to running nodes.

# List all parameters for a node
adora param list -d my-dataflow detector

# Get a single parameter
adora param get -d my-dataflow detector confidence

# Set a parameter (value is JSON)
adora param set -d my-dataflow detector confidence 0.8
adora param set -d my-dataflow detector config '{"nms": 0.5, "classes": ["car", "person"]}'

# Delete a parameter
adora param delete -d my-dataflow detector confidence

Parameters are persisted in the coordinator store (in-memory or redb). When a node is running, param set also forwards the new value to the node’s daemon. Nodes can read parameters through the node event stream.

Limits: keys are capped at 256 bytes; serialized values are capped at 64 KB.
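A hypothetical client-side check mirroring those limits (illustrative only, not part of the adora API):

```rust
/// Mirror the coordinator's documented limits: keys up to 256 bytes,
/// serialized values up to 64 KB. Hypothetical helper for pre-validation.
fn param_within_limits(key: &str, serialized_value: &[u8]) -> bool {
    key.len() <= 256 && serialized_value.len() <= 64 * 1024
}

fn main() {
    assert!(param_within_limits("confidence", b"0.8"));
    // An oversized key is rejected before it ever reaches the coordinator.
    assert!(!param_within_limits(&"k".repeat(257), b"0.8"));
}
```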


Environment Diagnosis

adora doctor performs a comprehensive health check of your environment:

# Basic diagnosis
adora doctor

# Diagnosis + dataflow validation
adora doctor --dataflow dataflow.yml

Checks performed:

  1. Coordinator reachability
  2. Connected daemon status
  3. Active dataflow health
  4. Dataflow YAML validation (if --dataflow provided)

Use this as a first step when debugging any issue, or in CI to validate the environment before running tests.


Trace Inspection

The coordinator captures tracing spans in-memory from adora_coordinator and adora_core crates (up to 4096 spans in a ring buffer). You can view these traces without any external tracing infrastructure (no Jaeger, Tempo, etc. required).

Listing Traces

adora trace list

Shows all captured traces with their root span name, span count, start time, and total duration:

TRACE ID      ROOT SPAN          SPANS  STARTED              DURATION
a1b2c3d4e5f6  spawn_dataflow     12     2026-03-01 10:30:05  1.234s
f8e7d6c5b4a3  build_dataflow     5      2026-03-01 10:29:58  0.500s

Viewing a Trace

# Full trace ID
adora trace view a1b2c3d4-e5f6-7890-abcd-1234567890ab

# Or use a unique prefix
adora trace view a1b2c3d4

Displays spans as an indented tree showing parent-child relationships, log levels, durations, and span fields:

spawn_dataflow [INFO 1.234s] {build_id="abc", session_id="def"}
  build_dataflow [INFO 0.500s]
    download_node [DEBUG 0.200s] {url="..."}
  start_inner [INFO 0.734s]
    spawn_node [INFO 0.100s] {node_id="camera"}
    spawn_node [INFO 0.080s] {node_id="detector"}

When to Use Trace Inspection

  • Quick debugging – see what the coordinator did during a start, stop, or build without setting up Jaeger/Tempo
  • Performance analysis – identify slow spans in dataflow lifecycle operations
  • Deployment troubleshooting – understand the sequence and timing of coordinator operations

For full distributed tracing across daemons and nodes, set ADORA_OTLP_ENDPOINT and use an OTLP-compatible backend.


Resource Monitoring

adora top (also adora inspect top) provides a real-time TUI showing per-node resource usage:

# Default 2-second refresh
adora top

# Custom refresh interval
adora top --refresh-interval 5

# JSON snapshot for scripting/CI
adora top --once | jq .

Displays for each node:

  • CPU usage (% of a single core)
  • Memory (RSS)
  • Node status (Running, Restarting, Degraded, Failed)
  • Restart count
  • Queue depth (pending messages)
  • Network TX/RX (cross-daemon bytes via Zenoh)
  • Disk I/O read/write

Metrics are collected by daemons and reported to the coordinator, so this works for distributed dataflows across multiple machines. Press q or Ctrl-C to exit.

Use --once to print a single JSON snapshot and exit, useful for CI pipelines and monitoring integrations.

Note: CPU percentages are per-core, so values can exceed 100% for multi-threaded nodes. Nodes on different machines may have different CPUs, so percentages are not directly comparable across machines.


Log Analysis

Live Log Streaming

# Stream logs from a specific node
adora logs my-dataflow sensor-node --follow

# Stream logs from all nodes
adora logs my-dataflow --all-nodes --follow

# Filter by log level
adora logs my-dataflow sensor-node --follow --level debug

# Stream with grep filter
adora logs my-dataflow --all-nodes --follow --grep "error"

Without --follow, the command reads from local log files. With --follow, it streams live from the coordinator via WebSocket.

Local Log Files

Logs are stored in the out/ directory:

out/
  <dataflow-uuid>/
    log_<node-id>.jsonl          # current log
    log_<node-id>.1.jsonl        # rotated (previous)
    log_<node-id>.2.jsonl        # rotated (older)

Read directly:

# All nodes, local files
adora logs --local --all-nodes

# Specific node, last 50 lines
adora logs --local sensor-node --tail 50

Filtering and Searching

| Flag | Example | Description |
|------|---------|-------------|
| --level <LEVEL> | --level debug | Minimum level: error, warn, info, debug, trace, stdout |
| --log-filter <FILTER> | --log-filter "sensor=debug,processor=warn" | Per-node level filter |
| --grep <PATTERN> | --grep "timeout" | Case-insensitive substring match |
| --since <DURATION> | --since 5m | Only logs newer than this |
| --until <DURATION> | --until 1h | Only logs older than this |
| --tail <N> | --tail 100 | Show last N lines |
| --log-format <FMT> | --log-format json | Output format: pretty (default) or json |

Environment variables:

  • ADORA_LOG_LEVEL – default log level
  • ADORA_LOG_FORMAT – default log format
  • ADORA_LOG_FILTER – default per-node filter

Dataflow Visualization

Generate a visual graph of your dataflow:

# Generate HTML and open in browser
adora graph dataflow.yml --open

# Generate Mermaid diagram text
adora graph dataflow.yml --mermaid

The Mermaid output can be pasted into mermaid.live or used in GitHub markdown:

```mermaid
graph TD
    sensor --> processor
    processor --> controller
```

The HTML mode generates a self-contained file with an interactive mermaid.js diagram.


Monitoring Running Dataflows

# Full environment diagnosis
adora doctor

# List all dataflows (active and completed)
adora list

# List nodes in a specific dataflow
adora node list -d my-dataflow

# Get detailed info on a specific node
adora node info -d my-dataflow camera

# Check coordinator/daemon status
adora status

# View/modify runtime parameters
adora param list -d my-dataflow detector
adora param set -d my-dataflow detector threshold 0.5

adora list shows each dataflow’s UUID, name, status, and node count. Use -d <name> with other commands to target a specific dataflow.


End-to-End Debugging Workflows

Workflow 1: Node Not Producing Output

# 1. Verify the node is running
adora list
adora top

# 2. Check its logs
adora logs my-dataflow problem-node --follow --level trace

# 3. Check if upstream nodes are publishing
adora topic echo -d my-dataflow upstream-node/output

# 4. Verify topic wiring
adora topic list -d my-dataflow
adora graph dataflow.yml --open

Workflow 2: Unexpected Data or Wrong Values

# 1. Echo the topic to see raw data
adora topic echo -d my-dataflow node/output --format json

# 2. Record for offline analysis
adora record dataflow.yml -o debug.adorec

# 3. Replay with known input to isolate the issue
adora replay debug.adorec --replace sensor --speed 0

Workflow 3: Performance Issues

# 1. Check CPU/memory per node
adora top

# 2. Measure publish frequencies
adora topic hz -d my-dataflow --window 10

# 3. Get bandwidth stats for suspected bottleneck
adora topic info -d my-dataflow heavy-node/output --duration 10

# 4. Record and replay at max speed to find throughput limits
adora record dataflow.yml -o perf.adorec
adora replay perf.adorec --speed 0

Workflow 4: Reproducing a Field Issue

# On the robot / target machine:
adora start dataflow.yml --detach
adora record dataflow.yml --proxy -o field-capture.adorec

# Transfer the .adorec file to your workstation, then:
adora replay field-capture.adorec
adora replay field-capture.adorec --speed 0.5  # slow motion
adora replay field-capture.adorec --loop        # continuous replay

Workflow 5: Remote Debugging (No Direct Access)

When you only have WebSocket connectivity to the coordinator:

# All these commands work over WebSocket -- no Zenoh needed
adora list
adora top
adora logs my-dataflow --all-nodes --follow
adora topic echo -d my-dataflow node/output
adora topic hz -d my-dataflow
adora record dataflow.yml --proxy -o remote-capture.adorec

Fault Tolerance

Adora provides built-in fault tolerance for robotic and AI dataflows. Nodes can automatically restart on failure, detect stale upstream connections, gracefully degrade when inputs are unavailable, and the coordinator can persist state to disk so it survives crashes and restarts.

Features at a Glance

| Feature | Scope | Config |
|---------|-------|--------|
| Restart policies | Per-node | restart_policy, max_restarts, restart_delay, … |
| Health monitoring | Per-node | health_check_timeout, health_check_interval (dataflow-level) |
| Input timeouts | Per-input | input_timeout |
| Circuit breaker | Automatic | Triggered by input_timeout, auto-recovers |
| NodeRestarted event | Downstream nodes | Automatic when upstream restarts |
| InputTracker API | Rust nodes | adora_node_api::InputTracker |
| Observability | Daemon-wide | Atomic counters logged periodically |
| Distributed health | Multi-daemon | Coordinator heartbeat monitoring |
| Coordinator state persistence | Coordinator | --store redb (requires redb-backend feature) |

Restart Policies

Control what happens when a node exits or crashes.

Configuration

nodes:
  - id: my-node
    path: ./target/debug/my-node
    restart_policy: on-failure  # never | on-failure | always
    max_restarts: 5             # 0 = unlimited (default: 0)
    restart_delay: 1.0          # initial delay in seconds
    max_restart_delay: 30.0     # cap for exponential backoff
    restart_window: 300.0       # reset counter after this many seconds

Policy Types

never (default) – Node is not restarted. Failure propagates normally.

on-failure – Restart only when the node exits with a non-zero exit code. Clean exits (code 0) are not restarted.

always – Restart on any exit, except:

  • The dataflow was stopped by the user (adora stop or Ctrl-C)
  • All inputs were closed and the node exited with a non-zero code

How Restarts Work Internally

When a node process exits, the daemon evaluates the restart decision in this order:

  1. Policy check: Does the restart policy allow it?
    • Never -> no restart
    • OnFailure -> restart only if exit code != 0
    • Always -> restart
  2. Disable check: Has disable_restart been set? (set when all inputs close or during manual stop via stop_all)
  3. Window check: If restart_window is set and the window has elapsed since the first restart, reset the counter to 0
  4. Limit check: If max_restarts > 0 and the window counter exceeds it, give up permanently
  5. Backoff: If restart_delay is set, sleep for the computed delay (re-checking disable_restart after waking)
  6. Respawn: The node process is spawned fresh with the same configuration

The daemon tracks restart state per node instance in the spawn/prepared.rs lifecycle loop. Each node runs in its own tokio task, so restarts don’t block other nodes.

Backoff

When restart_delay is set, the daemon waits before restarting. The delay doubles on each attempt (exponential backoff) and is capped by max_restart_delay.

The backoff exponent is capped at 16 internally to prevent overflow (2^16 = 65536x multiplier).

Example with restart_delay: 1.0 and max_restart_delay: 10.0:

Attempt 1: wait 1s    (1.0 * 2^0)
Attempt 2: wait 2s    (1.0 * 2^1)
Attempt 3: wait 4s    (1.0 * 2^2)
Attempt 4: wait 8s    (1.0 * 2^3)
Attempt 5: wait 10s   (capped at max_restart_delay)
Attempt 6: wait 10s   (capped)

During the backoff sleep, the daemon continuously monitors the disable_restart flag. If all inputs close while the node is waiting to restart, the restart is cancelled with the log message: “restart cancelled: inputs closed during backoff wait”.
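The schedule above can be reproduced with a small function. This is a sketch of the documented behavior (doubling per attempt, max_restart_delay cap, exponent clamped at 16), not the daemon's actual implementation:

```rust
/// Compute the backoff delay (seconds) for a 1-indexed restart attempt.
/// The exponent is clamped at 16 to prevent overflow, and the result is
/// capped by `max_delay` (max_restart_delay in the YAML config).
fn backoff_delay(attempt: u32, base: f64, max_delay: f64) -> f64 {
    let exp = attempt.saturating_sub(1).min(16);
    (base * 2f64.powi(exp as i32)).min(max_delay)
}

fn main() {
    // Reproduces the example: restart_delay: 1.0, max_restart_delay: 10.0
    let delays: Vec<f64> = (1..=6).map(|a| backoff_delay(a, 1.0, 10.0)).collect();
    assert_eq!(delays, vec![1.0, 2.0, 4.0, 8.0, 10.0, 10.0]);
}
```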

Restart Window

When restart_window is set, the restart counter resets after the window elapses (measured from the first restart in the current window). This enables “N restarts per M seconds” semantics.

Example: max_restarts: 5, restart_window: 300.0 means “at most 5 restarts per 5 minutes”. If the window elapses without hitting the limit, the counter resets and the node gets another 5 attempts.
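One way to model this window logic, following steps 3 and 4 of the restart decision order above. This is an illustrative sketch with an injected clock (seconds since daemon start), not adora's code:

```rust
/// "N restarts per M seconds" counter, per the window/limit checks above.
struct RestartWindow {
    max_restarts: u32,         // 0 = unlimited
    window_secs: f64,
    count: u32,                // restarts in the current window
    window_start: Option<f64>, // time of the first restart in the window
}

impl RestartWindow {
    fn allow_restart(&mut self, now: f64) -> bool {
        // Step 3: reset the counter once the window has elapsed.
        if let Some(start) = self.window_start {
            if now - start >= self.window_secs {
                self.count = 0;
                self.window_start = None;
            }
        }
        // Step 4: refuse once the limit is hit within the window.
        if self.max_restarts > 0 && self.count >= self.max_restarts {
            return false;
        }
        if self.window_start.is_none() {
            self.window_start = Some(now);
        }
        self.count += 1;
        true
    }
}

fn main() {
    // max_restarts: 5, restart_window: 300.0 -> at most 5 restarts per 5 minutes
    let mut w = RestartWindow { max_restarts: 5, window_secs: 300.0, count: 0, window_start: None };
    for t in 0..5 { assert!(w.allow_restart(t as f64)); }
    assert!(!w.allow_restart(10.0)); // 6th attempt within the window: denied
    assert!(w.allow_restart(301.0)); // window elapsed: counter reset
}
```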

Restart Disable During Shutdown

When the daemon stops a dataflow (via stop_all), it calls disable_restart() on every node before sending Stop events. This prevents the restart mechanism from fighting the shutdown process. The disable_restart flag is an Arc<AtomicBool> shared between the daemon event loop and the node’s spawn lifecycle task.

NodeRestarted Event

When a node restarts, the daemon sends a NodeRestarted event to all downstream nodes that consume its outputs. This allows downstream nodes to:

  • Reset internal state or caches
  • Log the upstream recovery
  • Re-initialize connections or sessions

The event carries the NodeId of the restarting node. Downstream nodes receive it automatically via the event stream:

match event {
    Event::NodeRestarted { id } => {
        println!("upstream node {id} restarted, resetting state");
        // Clear any cached state from the old node instance
    }
    _ => {}
}

The daemon finds downstream nodes via dataflow.mappings, which maps each node’s outputs to all subscribing (receiver_node, input_id) pairs. Each unique receiver gets one NodeRestarted event per restart.
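The per-restart fan-out can be sketched as a small function. The map below is a simplified stand-in for dataflow.mappings (string pairs instead of the real daemon types):

```rust
use std::collections::{HashMap, HashSet};

/// Collect the unique downstream receivers of a restarted node's outputs.
/// `mappings` is keyed by (node_id, output_id); each value holds the
/// subscribing (receiver_node, input_id) pairs.
fn restart_notify_targets(
    restarted: &str,
    mappings: &HashMap<(String, String), Vec<(String, String)>>,
) -> HashSet<String> {
    let mut receivers = HashSet::new();
    for ((node, _output), subscribers) in mappings {
        if node == restarted {
            for (receiver, _input) in subscribers {
                // HashSet deduplicates: one NodeRestarted event per receiver
                receivers.insert(receiver.clone());
            }
        }
    }
    receivers
}

fn main() {
    let mut mappings = HashMap::new();
    mappings.insert(
        ("camera".to_string(), "image".to_string()),
        vec![
            ("detector".to_string(), "frames".to_string()),
            ("recorder".to_string(), "in".to_string()),
        ],
    );
    mappings.insert(
        ("camera".to_string(), "meta".to_string()),
        vec![("detector".to_string(), "meta".to_string())],
    );
    // detector subscribes to two of camera's outputs but is notified once
    assert_eq!(restart_notify_targets("camera", &mappings).len(), 2);
}
```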


Health Monitoring

Passive monitoring detects hung nodes that stop communicating with the daemon.

health_check_interval: 2.0  # seconds (default: 5.0, dataflow-level)
nodes:
  - id: my-node
    path: ./target/debug/my-node
    health_check_timeout: 30.0  # seconds (per-node)
    restart_policy: on-failure

Configurable Health Check Interval

The health_check_interval is a dataflow-level setting that controls how often the daemon checks node health. Default is 5.0 seconds. Lower values detect hung nodes faster but add more overhead. Set this at the top level of your dataflow YAML, not per-node.

How It Works Internally

The daemon runs a health check sweep at the configured health_check_interval (via a tokio interval stream emitting Event::NodeHealthCheckInterval).

Each RunningNode has a last_activity: Arc<AtomicU64> field storing the timestamp (milliseconds since epoch) of the last communication. This is updated atomically by the node’s communication handler (node_communication/mod.rs) every time the node sends any request to the daemon (event subscriptions, output sends, etc.).

The health check function (check_node_health) iterates all running nodes:

  1. Skip nodes without health_check_timeout set
  2. Skip nodes with last_activity == 0 (not yet connected)
  3. Compute elapsed_ms = now - last_activity
  4. If elapsed_ms > timeout_ms, log a warning and kill the node process

After killing, the normal exit handling runs, which evaluates the restart policy. This means health_check_timeout combined with restart_policy: on-failure automatically recovers hung nodes.
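Steps 1-4 reduce to a small decision function. The sketch below uses plain millisecond timestamps in place of the daemon's atomics:

```rust
/// Health check decision per the steps above: skip nodes without a timeout,
/// skip nodes that never connected, otherwise kill if elapsed > timeout.
fn should_kill(
    timeout_ms: Option<u64>, // the node's health_check_timeout, if set
    last_activity_ms: u64,   // 0 = node not yet connected
    now_ms: u64,
) -> bool {
    match timeout_ms {
        None => false,                             // 1. no timeout configured
        Some(_) if last_activity_ms == 0 => false, // 2. not yet connected
        Some(t) => now_ms.saturating_sub(last_activity_ms) > t, // 3-4.
    }
}

fn main() {
    // 30s timeout, last activity 31s ago: kill (restart policy then decides recovery)
    assert!(should_kill(Some(30_000), 1_000, 32_001));
    assert!(!should_kill(Some(30_000), 10_000, 32_000));
    assert!(!should_kill(None, 1_000, 999_999));
}
```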

What Counts as “Activity”

Any message from the node to the daemon counts:

  • Event subscription requests
  • Output data sends (via shared memory or TCP)
  • Timer tick acknowledgments

Normal input data received from other nodes does not reset the timer – the node must actively communicate with the daemon.


Input Timeouts and Circuit Breaker

Per-input timeouts detect when an upstream node stops producing data.

Configuration

nodes:
  - id: downstream-node
    path: ./target/debug/downstream
    inputs:
      sensor_data:
        source: camera-node/frames
        input_timeout: 5.0  # seconds

The input_timeout is set per input, not per node. Different inputs can have different timeouts.

How It Works Internally

The daemon maintains an InputDeadline for each input with a timeout:

struct InputDeadline {
    timeout: Duration,        // configured timeout
    last_received: Instant,   // last time data arrived
}

These are stored in RunningDataflow.input_deadlines keyed by (NodeId, DataId).

Timeout detection runs during the same health check sweep (every health_check_interval, default 5 seconds). The check_input_timeouts function:

  1. Scans all input_deadlines entries
  2. If last_received.elapsed() > timeout, the input is “broken”
  3. The (node_id, input_id) pair is moved from input_deadlines to broken_inputs
  4. The daemon calls break_input() which sends InputClosed { id } to the downstream node
  5. If all of a node’s inputs are now closed (and none are broken/recoverable), AllInputsClosed is sent and the node’s restart is disabled

Deadline reset: Every time data arrives on an input, its last_received is reset to Instant::now().
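The sweep can be sketched like this, with simplified (node, input) string keys instead of the real (NodeId, DataId), and without the InputClosed delivery to the downstream node:

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

// Per-input deadline, mirroring the struct shown above.
struct InputDeadline {
    timeout: Duration,
    last_received: Instant,
}

fn is_expired(elapsed: Duration, timeout: Duration) -> bool {
    elapsed > timeout
}

/// Move expired inputs from `deadlines` to `broken`, as in steps 1-3 above.
fn check_input_timeouts(
    deadlines: &mut HashMap<(String, String), InputDeadline>,
    broken: &mut HashSet<(String, String)>,
) {
    let expired: Vec<_> = deadlines
        .iter()
        .filter(|(_, d)| is_expired(d.last_received.elapsed(), d.timeout))
        .map(|(key, _)| key.clone())
        .collect();
    for key in expired {
        deadlines.remove(&key);
        broken.insert(key); // the daemon would also send InputClosed { id } here
    }
}

fn main() {
    let mut deadlines = HashMap::new();
    deadlines.insert(
        ("downstream-node".to_string(), "sensor_data".to_string()),
        InputDeadline { timeout: Duration::ZERO, last_received: Instant::now() },
    );
    let mut broken = HashSet::new();
    std::thread::sleep(Duration::from_millis(5)); // let the deadline expire
    check_input_timeouts(&mut deadlines, &mut broken);
    assert!(deadlines.is_empty());
    assert_eq!(broken.len(), 1);
}
```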

Circuit Breaker: Auto-Recovery

The circuit breaker tracks broken inputs in RunningDataflow.broken_inputs. When new data arrives on a broken input:

  1. The data is delivered to the node normally
  2. The broken_inputs entry is removed
  3. The input is re-added to open_inputs
  4. A new InputDeadline is created (restarting the timeout)
  5. An InputRecovered { id } event is sent to the node
  6. The circuit_breaker_recoveries counter is incremented

This means recovery is fully automatic. If the upstream node restarts (via restart policy) and begins producing data again, downstream nodes seamlessly resume receiving it.

Node-Side Handling

In Rust nodes, handle these events in your event loop:

use adora_node_api::{AdoraNode, Event};

let (mut node, mut events) = AdoraNode::init_from_env()?;
while let Some(event) = events.recv() {
    match event {
        Event::Input { id, data, .. } => {
            // Normal processing
        }
        Event::InputClosed { id } => {
            // Upstream stopped producing on this input.
            // You can: use cached data, skip processing, alert operator, etc.
        }
        Event::InputRecovered { id } => {
            // Upstream is back online for this input.
            // Resume normal processing.
        }
        Event::Stop(_) => break,
        _ => {}
    }
}

InputTracker API (Rust)

The InputTracker helper tracks input health and caches the last received value per input, making graceful degradation easy.

use adora_node_api::{AdoraNode, Event, InputTracker, InputState};

let (mut node, mut events) = AdoraNode::init_from_env()?;
let mut tracker = InputTracker::new();

while let Some(event) = events.recv() {
    tracker.process_event(&event);

    match event {
        Event::Input { id, data, .. } => {
            // Fresh data available
        }
        Event::InputClosed { id } => {
            // Input timed out -- fall back to cached data
            if let Some(stale_data) = tracker.last_value(&id) {
                // Use stale_data as fallback
            }
        }
        Event::Stop(_) => break,
        _ => {}
    }

    // Check overall health
    if tracker.any_closed() {
        let closed: Vec<_> = tracker.closed_inputs();
        // Log or adjust behavior
    }
}

Internal Design

InputTracker maintains two HashMaps:

  • states: HashMap<DataId, InputState> – current state per input (Healthy or Closed)
  • cache: HashMap<DataId, ArrowData> – last received value per input

On Event::Input, both maps are updated (state = Healthy, cache = data clone). On Event::InputClosed, only state changes (cache is preserved). On Event::InputRecovered, state is set back to Healthy. The cache is never cleared, so last_value() always returns the most recent data even after the input closes.

Note: ArrowData wraps Arc<dyn arrow::array::Array>, so the cache clone is reference-counted (cheap).
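The design is small enough to reimplement as a sketch. The types below (String ids, Vec<f64> payloads) are simplified stand-ins for the real DataId and ArrowData:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum InputState { Healthy, Closed }

// Simplified event type covering the three cases the tracker cares about.
enum Event {
    Input { id: String, data: Vec<f64> },
    InputClosed { id: String },
    InputRecovered { id: String },
}

#[derive(Default)]
struct MiniTracker {
    states: HashMap<String, InputState>,
    cache: HashMap<String, Vec<f64>>,
}

impl MiniTracker {
    fn process_event(&mut self, event: &Event) {
        match event {
            Event::Input { id, data } => {
                self.states.insert(id.clone(), InputState::Healthy);
                self.cache.insert(id.clone(), data.clone());
            }
            // Only the state changes; the cache is preserved on close.
            Event::InputClosed { id } => {
                self.states.insert(id.clone(), InputState::Closed);
            }
            Event::InputRecovered { id } => {
                self.states.insert(id.clone(), InputState::Healthy);
            }
        }
    }
    fn last_value(&self, id: &str) -> Option<&Vec<f64>> { self.cache.get(id) }
    fn is_closed(&self, id: &str) -> bool {
        self.states.get(id) == Some(&InputState::Closed)
    }
}

fn main() {
    let mut t = MiniTracker::default();
    t.process_event(&Event::Input { id: "pose".into(), data: vec![1.0, 2.0] });
    t.process_event(&Event::InputClosed { id: "pose".into() });
    assert!(t.is_closed("pose"));
    assert_eq!(t.last_value("pose"), Some(&vec![1.0, 2.0])); // stale data survives
    t.process_event(&Event::InputRecovered { id: "pose".into() });
    assert!(!t.is_closed("pose"));
}
```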

API Reference

| Method | Returns | Description |
|--------|---------|-------------|
| new() | InputTracker | Create empty tracker |
| process_event(&Event) | bool | Update state. Returns true if event was relevant |
| state(&DataId) | Option<InputState> | Current state (Healthy or Closed) |
| is_closed(&DataId) | bool | Check if input is closed |
| last_value(&DataId) | Option<&ArrowData> | Last received value (available even when closed) |
| closed_inputs() | Vec<&DataId> | All currently closed inputs |
| any_closed() | bool | True if any tracked input is closed |

Observability

The daemon tracks fault tolerance events with atomic counters (FaultToleranceStats) and logs a summary on each health check sweep (every health_check_interval, default 5 seconds).

Counters

| Counter | Type | Incremented when |
|---------|------|------------------|
| restarts | AtomicU64 | A node restart is initiated (in spawn lifecycle) |
| health_check_kills | AtomicU64 | A node is killed by the health check (unresponsive) |
| input_timeouts | AtomicU64 | An input timeout fires (circuit breaker trips) |
| circuit_breaker_recoveries | AtomicU64 | Data arrives on a broken input (auto-recovery) |

All counters use Ordering::Relaxed since they are informational and don’t need strict ordering guarantees.
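
A minimal sketch of such a counter struct, with field names mirroring the table above (the method names are assumptions, not the daemon's actual API):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative stats struct; AtomicU64 implements Default (zero).
#[derive(Default)]
struct FaultToleranceStats {
    restarts: AtomicU64,
    health_check_kills: AtomicU64,
    input_timeouts: AtomicU64,
    circuit_breaker_recoveries: AtomicU64,
}

impl FaultToleranceStats {
    // Relaxed is enough: the counters are informational and never used
    // to synchronize other memory accesses.
    fn record_restart(&self) {
        self.restarts.fetch_add(1, Ordering::Relaxed);
    }

    fn snapshot(&self) -> (u64, u64, u64, u64) {
        (
            self.restarts.load(Ordering::Relaxed),
            self.health_check_kills.load(Ordering::Relaxed),
            self.input_timeouts.load(Ordering::Relaxed),
            self.circuit_breaker_recoveries.load(Ordering::Relaxed),
        )
    }
}

fn main() {
    let stats = FaultToleranceStats::default();
    stats.record_restart();
    stats.record_restart();
    assert_eq!(stats.snapshot(), (2, 0, 0, 0));
}
```

Note that &self (not &mut self) suffices for the increments, which is what lets the daemon update the counters from multiple tasks without locking.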

Log Output

When any counter is non-zero, the daemon emits a structured log line:

INFO fault tolerance stats restarts=3 health_kills=0 input_timeouts=1 cb_recoveries=1

These counters are cumulative for the lifetime of the daemon process. They are not reset between dataflows.


Distributed Health

In multi-daemon deployments, the coordinator monitors daemon heartbeats.

Protocol

  • Heartbeat interval: 3 seconds (coordinator sends heartbeat to each daemon)
  • Disconnect threshold: 30 seconds without a response
  • Detection: On each heartbeat sweep, the coordinator removes daemons that haven’t responded within the threshold
  • Notification: The coordinator broadcasts PeerDaemonDisconnected { daemon_id } to all remaining daemons
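
The detection sweep can be sketched as follows (the data structures are assumptions; only the 30-second threshold comes from the protocol above):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Illustrative heartbeat sweep: remove daemons whose last response is
// older than the disconnect threshold and return their IDs so the
// caller can broadcast PeerDaemonDisconnected.
fn sweep(last_seen: &mut HashMap<String, Instant>, now: Instant) -> Vec<String> {
    let threshold = Duration::from_secs(30);
    let dead: Vec<String> = last_seen
        .iter()
        .filter(|(_, t)| now.duration_since(**t) > threshold)
        .map(|(id, _)| id.clone())
        .collect();
    for id in &dead {
        last_seen.remove(id); // drop from the active daemon set
    }
    dead
}

fn main() {
    let t0 = Instant::now();
    let mut seen = HashMap::new();
    seen.insert("machine-A".to_string(), t0 + Duration::from_secs(58));
    seen.insert("machine-B".to_string(), t0);

    // At t0 + 60s, machine-B has been silent for 60s (> 30s threshold).
    let dead = sweep(&mut seen, t0 + Duration::from_secs(60));
    assert_eq!(dead, vec!["machine-B".to_string()]);
    assert!(seen.contains_key("machine-A"));
}
```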

DaemonInfo

The ConnectedMachines CLI query returns Vec<DaemonInfo>:

#![allow(unused)]
fn main() {
pub struct DaemonInfo {
    pub daemon_id: DaemonId,
    pub last_heartbeat_ago_ms: u64,  // milliseconds since last heartbeat
}
}

This allows monitoring tools to detect daemons that are alive but slow to respond.

Daemon-Side Handling

When a daemon receives PeerDaemonDisconnected, it logs a structured warning:

WARN peer daemon disconnected daemon_id=machine-B

Currently this is informational. Future work may include automatic migration of nodes from the disconnected daemon.


Coordinator State Persistence

By default the coordinator holds all state in memory. If the coordinator process crashes or is restarted, all knowledge of running dataflows is lost – daemons continue running but become orphaned, and users must manually re-run dataflows.

The redb store backend solves this by persisting coordinator state to a single file on disk using redb, a pure-Rust embedded key-value store with copy-on-write B-trees that are crash-safe by design.

Design: Stateless Coordinator with Stateful Backend

The coordinator itself remains stateless in the Kubernetes sense – it can be stopped and restarted at any time. All durable state lives in the store backend behind the CoordinatorStore trait:

Coordinator (stateless process)
    |
    v
CoordinatorStore trait
    |
    +-- InMemoryStore (default, no persistence)
    +-- RedbStore     (persists to ~/.adora/coordinator.redb)

This separation means:

  • The coordinator event loop never reads from the filesystem during normal operation (only at startup recovery)
  • All state mutations are written to the store at well-defined persistence points
  • The store can be swapped without changing coordinator logic

Enabling Persistence

# Use default path (~/.adora/coordinator.redb)
adora coordinator --store redb

# Use custom path
adora coordinator --store redb:/path/to/coordinator.redb

# Default: in-memory only (no persistence)
adora coordinator --store memory

The redb backend requires the redb-backend Cargo feature, which is enabled in the default CLI build.

What Is Persisted

The store tracks three record types:

| Record | Key | Persisted fields |
|---|---|---|
| DataflowRecord | UUID (16 bytes) | uuid, name, descriptor (JSON), status, daemon IDs, generation counter, created/updated timestamps |
| BuildRecord | UUID (16 bytes) | build ID, status, errors, created/updated timestamps |
| DaemonInfo | DaemonId (bincode) | daemon ID, machine ID |

Records are serialized with bincode for compact, fast encoding.

Dataflow Status Lifecycle

The coordinator persists dataflow status at every state transition:

Start command     -->  Pending
All daemons ready -->  Running
Stop command      -->  Stopping
All nodes finish  -->  Succeeded  or  Failed { error }
Spawn failure     -->  Failed { error: "spawn failed: ..." }

Each persist call increments the record’s generation counter, providing a monotonic version for conflict detection.

Persistence Points

The coordinator writes to the store at these moments in the event loop:

  1. Dataflow started (ControlRequest::Start) – record created with status Pending
  2. Dataflow spawned (DataflowSpawnResult success from all daemons) – updated to Running
  3. Spawn failed (DataflowSpawnResult error) – updated to Failed with the actual error message
  4. Stop requested (ControlRequest::Stop or StopByName) – updated to Stopping
  5. All nodes finished (DataflowFinishedOnDaemon) – updated to Succeeded or Failed with per-node error details
  6. Graceful shutdown (Ctrl-C or Destroy command) – all running dataflows marked Stopping before stop messages are sent

If a store write fails, the coordinator logs a warning and continues operating with in-memory state. This prevents a store failure from blocking the dataflow lifecycle.

Startup Recovery

When the coordinator starts with a redb store that contains data from a previous run, it performs recovery:

  1. Read all persisted dataflow records via store.list_dataflows()
  2. For any record with a non-terminal status (Pending, Running, Stopping):
    • Mark it as Failed { error: "coordinator restarted" }
    • Increment the generation counter
    • Write the updated record back to the store
  3. Terminal records (Succeeded, Failed) are left unchanged

This ensures that stale dataflows from a crashed coordinator are not confused with actively running ones. The daemons that were running those dataflows will detect the coordinator disconnect independently.
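
The recovery pass can be sketched as follows (the record and status types are simplified stand-ins for the persisted schema):

```rust
// Illustrative status enum mirroring the lifecycle described above.
#[derive(Clone, PartialEq, Debug)]
enum Status { Pending, Running, Stopping, Succeeded, Failed(String) }

struct Record {
    name: String,
    status: Status,
    generation: u64,
}

// Mark every non-terminal record as Failed and bump its generation;
// terminal records (Succeeded, Failed) are left untouched.
fn recover(records: &mut [Record]) {
    for r in records.iter_mut() {
        let terminal = matches!(r.status, Status::Succeeded | Status::Failed(_));
        if !terminal {
            r.status = Status::Failed("coordinator restarted".into());
            r.generation += 1; // newer version for conflict detection
        }
    }
}

fn main() {
    let mut records = vec![
        Record { name: "my-pipeline".into(), status: Status::Running, generation: 3 },
        Record { name: "done".into(), status: Status::Succeeded, generation: 7 },
    ];
    recover(&mut records);
    assert_eq!(records[0].status, Status::Failed("coordinator restarted".into()));
    assert_eq!(records[0].generation, 4);
    assert_eq!(records[1].generation, 7); // terminal record unchanged
    let _ = &records[0].name;
}
```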

Error Detail Preservation

When a dataflow fails, the Failed status includes the actual per-node error messages rather than a generic string:

Failed { error: "node-1: exited with code 137; node-2: failed to spawn node: binary not found" }

Errors are collected from DataflowDaemonResult.node_results across all daemons, formatted as node_id: error_message, and joined with "; ".
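
A sketch of that formatting step (the input shape is a simplification of DataflowDaemonResult.node_results):

```rust
// Join per-node errors into one message, "node_id: error" separated by "; ".
// Nodes without an error (None) succeeded and are skipped.
fn join_node_errors(results: &[(&str, Option<&str>)]) -> String {
    results
        .iter()
        .filter_map(|&(node, err)| err.map(|e| format!("{node}: {e}")))
        .collect::<Vec<_>>()
        .join("; ")
}

fn main() {
    let results = [
        ("node-1", Some("exited with code 137")),
        ("node-2", Some("failed to spawn node: binary not found")),
        ("node-3", None), // succeeded
    ];
    assert_eq!(
        join_node_errors(&results),
        "node-1: exited with code 137; node-2: failed to spawn node: binary not found"
    );
}
```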

Schema Versioning

The redb database includes a meta table with a schema_version key. On open:

  • If no version exists (fresh database), the current version is written
  • If the stored version matches the binary’s version, the database opens normally
  • If there is a mismatch, the database is rejected with an error

This prevents silent data corruption when the serialization format of stored records changes between Adora versions. The current schema version is 1.
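
The three open-time cases can be sketched as follows (the error type and constant name are illustrative):

```rust
// Current schema version, per the docs above.
const SCHEMA_VERSION: u64 = 1;

// Decide how to handle the stored version found in the meta table:
// - None: fresh database, write the current version
// - matching version: open normally
// - mismatch: reject with an error rather than risk silent corruption
fn check_schema(stored: Option<u64>) -> Result<u64, String> {
    match stored {
        None => Ok(SCHEMA_VERSION),
        Some(v) if v == SCHEMA_VERSION => Ok(v),
        Some(v) => Err(format!(
            "schema version mismatch: database has {v}, binary expects {SCHEMA_VERSION}"
        )),
    }
}

fn main() {
    assert_eq!(check_schema(None), Ok(1));
    assert_eq!(check_schema(Some(1)), Ok(1));
    assert!(check_schema(Some(2)).is_err());
}
```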

File Security

On Unix systems:

  • The database file is set to 0600 (owner read/write only) after creation
  • The default directory (~/.adora/) is set to 0700 (owner only)
  • Custom paths provided via redb:/path are validated to reject .. components
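
The path validation rule can be sketched with std::path (the function name is hypothetical; the Unix permission calls are shown as comments):

```rust
use std::path::{Component, Path};

// Reject any store path that contains a `..` component, as described above.
fn validate_store_path(p: &Path) -> Result<(), String> {
    if p.components().any(|c| matches!(c, Component::ParentDir)) {
        return Err(format!("path {:?} contains '..'", p));
    }
    Ok(())
}

fn main() {
    assert!(validate_store_path(Path::new("/var/lib/adora/coordinator.redb")).is_ok());
    assert!(validate_store_path(Path::new("/tmp/../etc/passwd")).is_err());

    // On Unix, tightening the file mode would look roughly like (assumption):
    // use std::os::unix::fs::PermissionsExt;
    // std::fs::set_permissions(path, std::fs::Permissions::from_mode(0o600))?;
}
```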

Internal Architecture

#![allow(unused)]
fn main() {
// Store trait (libraries/coordinator-store/src/lib.rs)
pub trait CoordinatorStore: Send + Sync {
    fn put_dataflow(&self, record: &DataflowRecord) -> Result<()>;
    fn get_dataflow(&self, uuid: &Uuid) -> Result<Option<DataflowRecord>>;
    fn list_dataflows(&self) -> Result<Vec<DataflowRecord>>;
    fn delete_dataflow(&self, uuid: &Uuid) -> Result<()>;
    // ... daemon and build methods
}
}

The RedbStore implementation uses three redb tables (daemons, dataflows, builds) with UUID-based binary keys and bincode-serialized values. All operations are synchronous (redb is a synchronous library); the coordinator calls them directly from the async event loop since they are fast in-process operations.

A bincode deserialization limit of 64 MiB guards against corrupted data that could encode huge allocation sizes in length prefixes.
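
The idea can be illustrated with a hand-rolled length-prefix guard (this is not the actual bincode configuration, just the failure mode the limit prevents):

```rust
// 64 MiB limit, matching the value above.
const LIMIT: u64 = 64 * 1024 * 1024;

// Read a little-endian u64 length prefix and refuse to proceed when the
// declared length exceeds the limit -- a corrupted prefix could otherwise
// request a multi-gigabyte allocation before any real data is read.
fn read_len_prefix(buf: &[u8]) -> Result<u64, String> {
    let bytes: [u8; 8] = buf
        .get(..8)
        .ok_or_else(|| "truncated prefix".to_string())?
        .try_into()
        .unwrap();
    let len = u64::from_le_bytes(bytes);
    if len > LIMIT {
        return Err(format!("declared length {len} exceeds {LIMIT}-byte limit"));
    }
    Ok(len)
}

fn main() {
    // A corrupted prefix claiming u64::MAX bytes is rejected up front.
    assert!(read_len_prefix(&u64::MAX.to_le_bytes()).is_err());
    assert_eq!(read_len_prefix(&1024u64.to_le_bytes()), Ok(1024));
}
```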


Complete YAML Reference

# Dataflow-level settings
health_check_interval: 2.0    # health check sweep interval (default: 5.0s)

nodes:
  - id: sensor-node
    path: ./target/debug/sensor
    inputs:
      tick: adora/timer/millis/100
    outputs:
      - frames

  - id: processor
    path: ./target/debug/processor

    # Restart policy
    restart_policy: on-failure    # never | on-failure | always
    max_restarts: 5               # 0 = unlimited
    restart_delay: 1.0            # initial backoff delay (seconds)
    max_restart_delay: 30.0       # max backoff cap (seconds)
    restart_window: 300.0         # reset counter after N seconds

    # Health monitoring
    health_check_timeout: 30.0    # kill if no activity for N seconds

    inputs:
      frames:
        source: sensor-node/frames
        input_timeout: 5.0        # circuit breaker timeout (seconds)
        queue_size: 10            # input buffer size (default: 10)
    outputs:
      - result

Use Case Scenarios

1. Camera Pipeline with Intermittent Hardware Failures

A camera driver node occasionally crashes due to USB disconnects. The processing pipeline should survive these outages and resume when the camera reconnects.

nodes:
  - id: camera-driver
    path: ./target/debug/camera-driver
    restart_policy: on-failure
    max_restarts: 0               # unlimited -- hardware failures are expected
    restart_delay: 2.0            # wait for USB to re-enumerate
    max_restart_delay: 30.0
    inputs:
      tick: adora/timer/millis/33  # ~30 FPS
    outputs:
      - frames

  - id: object-detector
    path: ./target/debug/detector
    inputs:
      frames:
        source: camera-driver/frames
        input_timeout: 5.0        # tolerate 5s camera outage
    outputs:
      - detections

  - id: planner
    path: ./target/debug/planner
    inputs:
      detections:
        source: object-detector/detections
        input_timeout: 10.0       # longer tolerance -- can plan with stale data
      lidar:
        source: lidar-driver/points
        input_timeout: 3.0

What happens when the camera crashes:

  1. camera-driver exits with non-zero code
  2. Daemon evaluates on-failure policy -> restart after 2s backoff
  3. During the outage, object-detector receives InputClosed { id: "frames" } after 5s
  4. planner receives InputClosed { id: "detections" } after 10s
  5. Camera restarts, begins producing frames
  6. object-detector receives new frame data + InputRecovered { id: "frames" } (circuit breaker recovers)
  7. planner receives detections + InputRecovered { id: "detections" }

Node-side handling in the planner:

#![allow(unused)]
fn main() {
use adora_node_api::{AdoraNode, Event, InputTracker};

let (mut node, mut events) = AdoraNode::init_from_env()?;
let mut tracker = InputTracker::new();

while let Some(event) = events.recv() {
    tracker.process_event(&event);

    match event {
        Event::Input { id, data, .. } => match id.as_ref() {
            "detections" => plan_with_detections(&data),
            "lidar" => update_lidar_map(&data),
            _ => {}
        },
        Event::InputClosed { id } => match id.as_ref() {
            "detections" => {
                // Camera pipeline down -- plan with lidar only
                plan_lidar_only();
            }
            "lidar" => {
                // LiDAR down -- use last known detection data
                if let Some(stale) = tracker.last_value(&"detections".into()) {
                    plan_with_stale_detections(stale);
                }
            }
            _ => {}
        },
        Event::Stop(_) => break,
        _ => {}
    }
}
}

2. ML Inference Node with OOM Crashes

An ML inference node occasionally runs out of memory on large inputs. It should restart quickly but give up after repeated failures (indicating a systemic issue).

nodes:
  - id: ml-inference
    path: ./target/debug/ml-inference
    restart_policy: on-failure
    max_restarts: 3
    restart_delay: 0.5
    restart_window: 60.0          # 3 restarts per minute
    health_check_timeout: 60.0    # ML inference can be slow
    inputs:
      images:
        source: preprocessor/images
    outputs:
      - predictions

Behavior:

  • Node crashes from OOM -> restarts after 0.5s
  • Crashes again on another large input -> restarts after 1.0s
  • Crashes a third time -> restarts after 2.0s
  • Crashes a fourth time within 60s -> max_restarts exceeded, node fails permanently
  • If the node runs stably for 60s after the first crash, the restart window resets and it gets 3 more chances
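
The doubling-with-cap schedule implied by these delays can be sketched as (a simplification that ignores the restart-window reset):

```rust
// Exponential backoff: delay doubles per consecutive restart, capped at
// max_restart_delay. Values match the ml-inference example above
// (restart_delay: 0.5).
fn backoff_delay(attempt: u32, initial: f64, max: f64) -> f64 {
    (initial * 2f64.powi(attempt as i32)).min(max)
}

fn main() {
    let delays: Vec<f64> = (0..4).map(|n| backoff_delay(n, 0.5, 30.0)).collect();
    assert_eq!(delays, vec![0.5, 1.0, 2.0, 4.0]);
    assert_eq!(backoff_delay(10, 0.5, 30.0), 30.0); // capped at max_restart_delay
}
```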

3. Multi-Sensor Fusion with Graceful Degradation

A robot fuses data from multiple sensors. Individual sensors may fail, but the system should continue operating with reduced capability.

nodes:
  - id: sensor-fusion
    path: ./target/debug/sensor-fusion
    inputs:
      camera:
        source: camera-node/frames
        input_timeout: 3.0
      lidar:
        source: lidar-node/points
        input_timeout: 3.0
      imu:
        source: imu-node/readings
        input_timeout: 1.0        # IMU is critical, short timeout
      gps:
        source: gps-node/fix
        input_timeout: 10.0       # GPS can be intermittent
    outputs:
      - fused-state

Node-side with InputTracker:

#![allow(unused)]
fn main() {
use adora_node_api::{AdoraNode, Event, InputTracker};

let (mut node, mut events) = AdoraNode::init_from_env()?;
let mut tracker = InputTracker::new();

while let Some(event) = events.recv() {
    tracker.process_event(&event);

    match event {
        Event::Input { id, data, .. } => {
            // Process fresh data from any sensor
            update_sensor(&id, &data);
            compute_and_send_fusion(&mut node, &tracker);
        }
        Event::InputClosed { id } => {
            // Sensor went offline -- adjust fusion weights
            eprintln!("sensor {id} offline, degrading");
            compute_and_send_fusion(&mut node, &tracker);
        }
        Event::InputRecovered { id } => {
            // Sensor back online
            eprintln!("sensor {id} recovered");
        }
        Event::Stop(_) => break,
        _ => {}
    }
}

fn compute_and_send_fusion(node: &mut AdoraNode, tracker: &InputTracker) {
    // Use fresh data where available, stale cache for degraded sensors
    let camera = tracker.last_value(&"camera".into());
    let lidar = tracker.last_value(&"lidar".into());
    let imu = tracker.last_value(&"imu".into());

    if tracker.is_closed(&"imu".into()) {
        // IMU is critical -- switch to emergency mode
        emergency_stop(node);
        return;
    }

    // Fuse available sensors, weighting active ones higher
    let closed = tracker.closed_inputs();
    let active_count = 4 - closed.len();
    // ... fusion logic using active_count for confidence weighting
}
}

4. Long-Running Data Processing Pipeline

A batch processing pipeline runs continuously. The processing node occasionally hangs due to a third-party library bug. Health monitoring detects and recovers from these hangs.

nodes:
  - id: data-ingest
    path: ./target/debug/ingest
    restart_policy: always        # always restart -- this is a long-running service
    max_restarts: 0               # unlimited
    restart_delay: 1.0
    inputs:
      tick: adora/timer/millis/1000
    outputs:
      - records

  - id: processor
    path: ./target/debug/processor
    restart_policy: on-failure
    max_restarts: 10
    restart_delay: 0.5
    restart_window: 600.0         # 10 restarts per 10 minutes
    health_check_timeout: 30.0    # kill if hung for 30s
    inputs:
      records: data-ingest/records
    outputs:
      - results

  - id: writer
    path: ./target/debug/writer
    restart_policy: on-failure
    max_restarts: 5
    restart_delay: 2.0            # give DB time to recover
    max_restart_delay: 60.0
    inputs:
      results:
        source: processor/results
        input_timeout: 60.0       # processor may be slow

What happens when the processor hangs:

  1. Processor stops communicating with daemon
  2. After 30s, health check detects the hang and kills the process
  3. health_check_kills counter increments
  4. Daemon evaluates on-failure -> restart after 0.5s
  5. New processor instance starts, resumes consuming from data-ingest
  6. writer may have received InputClosed during the 60s timeout – or may not if the restart was fast enough
  7. If writer did receive InputClosed, it gets InputRecovered when new results arrive

5. Distributed Deployment with Daemon Failure Detection

A multi-machine deployment where the coordinator monitors daemon health.

Machine A (coordinator + daemon):  camera-driver, preprocessor
Machine B (daemon):                ml-inference, postprocessor
Machine C (daemon):                planner, actuator-driver

What happens when Machine B loses network:

  1. Coordinator’s heartbeat to Machine B fails
  2. After 30s without response, coordinator removes Machine B from active daemons
  3. Coordinator broadcasts PeerDaemonDisconnected { daemon_id: "machine-B" } to Machine A and Machine C
  4. Daemons on A and C log: WARN peer daemon disconnected daemon_id=machine-B
  5. Nodes on A and C with inputs from Machine B’s nodes receive InputClosed events (via their input timeouts)
  6. CLI queries to ConnectedMachines show only A and C with their last_heartbeat_ago_ms

6. Coordinator Crash Recovery with redb Persistence

A long-running multi-daemon deployment where the coordinator must survive restarts without losing track of dataflow history.

# Start coordinator with persistent store
adora coordinator --store redb

# In another terminal, start a dataflow
adora start examples/rust-dataflow/dataflow.yml --name my-pipeline --detach

# Coordinator crashes or is killed (e.g., OOM, hardware failure)
# ... time passes ...

# Restart coordinator with the same store
adora coordinator --store redb

What happens on restart:

  1. Coordinator opens ~/.adora/coordinator.redb and reads persisted dataflow records
  2. Finds my-pipeline with status Running
  3. Marks it as Failed { error: "coordinator restarted" }, increments generation
  4. Logs: INFO recovering stale dataflow <uuid> ("my-pipeline") -> marking as Failed
  5. adora list now shows my-pipeline with its final status and timestamps
  6. Daemons detect the coordinator disconnect independently and stop their nodes
  7. User can start a fresh dataflow – the coordinator is fully operational

The key benefit: the coordinator retains a complete history of dataflow lifecycle events across restarts. Without --store redb, all state would be lost and the operator would have no record of what was running before the crash.

7. Periodic Batch Job with Always-Restart

A node that processes batches and exits when done. It should restart to process the next batch.

nodes:
  - id: batch-processor
    path: ./target/debug/batch-proc
    restart_policy: always        # restart even on clean exit
    max_restarts: 0               # unlimited
    restart_delay: 10.0           # wait 10s between batches
    max_restart_delay: 10.0       # no exponential growth
    inputs:
      trigger: adora/timer/millis/1  # immediate first trigger
    outputs:
      - batch-result

The node processes one batch, exits with code 0, waits 10s, then restarts to process the next. The always policy ensures restarts even on success. Setting restart_delay == max_restart_delay gives a constant delay.


Best Practices

Start with on-failure. Use always only for nodes that are expected to exit and restart (e.g., periodic batch jobs).

Set max_restarts. Unlimited restarts can mask bugs. Start with 3-5 and increase if needed. Use max_restarts: 0 only for nodes where crashes are expected and unavoidable (hardware drivers, external API clients).

Use restart_window. Prevents permanent restart loops. A window of 60-300 seconds is typical. Without a window, a node that crashes at startup will exhaust its restart budget immediately.

Tune restart_delay. Start with 0.5-1.0 seconds. Too short causes thrashing; too long delays recovery. Match the delay to your node’s typical startup time and the root cause of failures:

  • USB/hardware reconnection: 2-5s
  • Network service reconnection: 1-3s
  • OOM/transient bugs: 0.5-1.0s

Set health_check_timeout generously. Should be at least 2-3x your node’s longest expected processing time. ML inference nodes may need 60s+. If too short, healthy nodes get killed during normal processing.

Set input_timeout per input. Not all inputs need the same timeout. Use shorter timeouts for high-frequency inputs (IMU, camera) and longer timeouts for slow/bursty sources (GPS, batch results). A good starting point is 3-5x the expected publish interval.

Use InputTracker for critical paths. When a node must keep running even with degraded inputs, use InputTracker to fall back to cached data. This is essential for sensor fusion, planning, and control nodes.

Use --store redb for production deployments. The redb backend ensures the coordinator retains dataflow history across crashes and restarts. The in-memory default is fine for development but loses all state on exit. The redb file is small (proportional to the number of dataflow records) and adds negligible overhead.

Combine features for defense in depth:

  • restart_policy + restart_delay -> recover from node crashes
  • health_check_timeout -> recover from hung nodes
  • input_timeout -> detect stale upstream data
  • InputTracker -> graceful degradation in node code
  • --store redb -> survive coordinator crashes

Distributed Deployment Guide

Adora supports deploying dataflows across multiple machines for multi-robot fleets, edge AI pipelines, and distributed robotics systems. This guide covers cluster management, node scheduling, binary distribution, auto-recovery, and operational best practices.


Overview

Adora’s distributed architecture has three tiers:

CLI  -->  Coordinator  -->  Daemon(s)  -->  Nodes / Operators
              (one)          (per machine)     (user code)

  • CLI sends control commands (build, start, stop) to the coordinator.
  • Coordinator orchestrates daemons, resolves node placement, and manages dataflow lifecycle.
  • Daemons run on each machine, spawning and supervising node processes.
  • Nodes communicate via shared memory (same machine) or Zenoh pub-sub (cross-machine).

There are two paths to distributed deployment:

Ad-hoc – manually start adora daemon on each machine, then use the coordinator for control. Good for development and testing. See Distributed Deployments in the CLI reference.

Managed (cluster.yml) – define your cluster topology in a YAML file, then use adora cluster commands for SSH-based lifecycle management. This guide focuses on the managed path.


Quick Start

  1. Create a cluster.yml:

coordinator:
  addr: 10.0.0.1
machines:
  - id: robot
    host: 10.0.0.2
    user: ubuntu
  - id: gpu-server
    host: 10.0.0.3
    user: ubuntu

  2. Bring up the cluster:

adora cluster up cluster.yml

  3. Start a dataflow:

adora start dataflow.yml --name my-app --attach

  4. Check cluster health:

adora cluster status

  5. Tear down:

adora cluster down

Features at a Glance

| Feature | Command / Config | Description |
|---|---|---|
| Cluster lifecycle | adora cluster up/status/down | SSH-based daemon management from a single machine |
| Label scheduling | _unstable_deploy.labels | Route nodes to daemons by key-value labels |
| Binary distribution | _unstable_deploy.distribute | local, scp, or http strategies |
| systemd services | adora cluster install/uninstall | Persistent daemon services that survive reboots |
| Auto-recovery | Automatic | Re-spawn nodes when a daemon reconnects |
| Rolling upgrade | adora cluster upgrade | SCP binary + restart per machine, sequentially |
| Dataflow restart | adora cluster restart | Restart a running dataflow by name or UUID |

Cluster Configuration Reference

A cluster.yml file defines the coordinator address and the set of machines in the cluster.

Full Schema

coordinator:
  addr: 10.0.0.1            # IP address the coordinator binds to (required)
  port: 6013                 # WebSocket port (default: 6013)

machines:
  - id: edge-01              # Unique machine identifier (required)
    host: 10.0.0.2           # SSH-reachable hostname or IP (required)
    user: ubuntu              # SSH user (optional, defaults to current user)
    labels:                   # Key-value labels for scheduling (optional)
      gpu: "true"
      arch: arm64

  - id: edge-02
    host: 10.0.0.3
    labels:
      arch: arm64

Fields

coordinator

| Field | Type | Default | Description |
|---|---|---|---|
| addr | IP address | (required) | Address the coordinator binds to |
| port | u16 | 6013 | WebSocket port |

machines[]

| Field | Type | Default | Description |
|---|---|---|---|
| id | string | (required) | Unique machine identifier, used in _unstable_deploy.machine |
| host | string | (required) | SSH-reachable hostname or IP address |
| user | string | current user | SSH username |
| labels | map | empty | Key-value pairs for label-based scheduling |

Validation Rules

  • At least one machine must be defined.
  • Machine IDs must be non-empty and unique.
  • Machine hosts must be non-empty.
  • Unknown fields are rejected (deny_unknown_fields).

Example: 3-Machine GPU Cluster

coordinator:
  addr: 192.168.1.1

machines:
  - id: coordinator-host
    host: 192.168.1.1
    labels:
      role: control

  - id: gpu-a100
    host: 192.168.1.10
    user: ml
    labels:
      gpu: a100
      arch: x86_64

  - id: jetson-01
    host: 192.168.1.20
    user: nvidia
    labels:
      gpu: jetson
      arch: arm64

Cluster Commands Reference

All adora cluster commands operate on a cluster.yml file and use SSH to manage remote machines.

SSH options used: BatchMode=yes, ConnectTimeout=10, StrictHostKeyChecking=accept-new.

adora cluster up

Bring up a multi-machine cluster from a cluster.yml file. Starts the coordinator locally, then SSH-es into each machine to start a daemon.

adora cluster up <PATH>

Arguments:

| Argument | Description |
|---|---|
| PATH | Path to the cluster configuration file |

Behavior:

  1. Loads and validates the cluster config.
  2. Starts the coordinator locally on addr:port.
  3. For each machine, SSH-es in and runs nohup adora daemon --machine-id <id> --coordinator-addr <addr> --coordinator-port <port> [--labels k1=v1,k2=v2] --quiet.
  4. Polls until all expected daemons register with the coordinator (30s timeout).

Example:

$ adora cluster up cluster.yml
Starting coordinator on 10.0.0.1:6013...
Starting daemon on robot (ubuntu@10.0.0.2)... OK
Starting daemon on gpu-server (ubuntu@10.0.0.3)... OK
All 2 daemons connected.

adora cluster status

Show the current status of the cluster. Displays connected daemons and active dataflow count.

adora cluster status [--coordinator-addr ADDR] [--coordinator-port PORT]

Flags:

| Flag | Default | Description |
|---|---|---|
| --coordinator-addr | localhost | Coordinator hostname or IP |
| --coordinator-port | 6013 | Coordinator WebSocket port |

Example:

$ adora cluster status
DAEMON ID      LAST HEARTBEAT
robot          2s ago
gpu-server     1s ago

Active dataflows: 1

adora cluster down

Tear down the cluster (coordinator and all daemons).

adora cluster down [--coordinator-addr ADDR] [--coordinator-port PORT]

Terminates all daemons and the coordinator process.

adora cluster install

Install adora-daemon as a systemd service on each machine. SSH-es into each machine, writes a systemd unit file, and enables the service.

adora cluster install <PATH>

Arguments:

| Argument | Description |
|---|---|
| PATH | Path to the cluster configuration file |

Behavior:

For each machine, creates and enables a systemd service named adora-daemon-<id>. The unit file:

[Unit]
Description=Adora Daemon (<id>)
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=adora daemon --machine-id <id> --coordinator-addr <addr> --coordinator-port <port> --labels k1=v1,k2=v2 --quiet
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Example:

$ adora cluster install cluster.yml
Installing adora-daemon-robot on ubuntu@10.0.0.2... OK
Installing adora-daemon-gpu-server on ubuntu@10.0.0.3... OK
2/2 succeeded.

adora cluster uninstall

Uninstall adora-daemon systemd services from each machine. Stops, disables, and removes the systemd unit.

adora cluster uninstall <PATH>

Behavior:

For each machine, runs:

sudo systemctl stop adora-daemon-<id>
sudo systemctl disable adora-daemon-<id>
sudo rm -f /etc/systemd/system/adora-daemon-<id>.service
sudo systemctl daemon-reload

adora cluster upgrade

Rolling upgrade: SCP the local adora binary to each machine and restart daemons. Processes machines sequentially to maintain availability.

adora cluster upgrade <PATH>

Behavior:

For each machine sequentially:

  1. SCP the local adora binary to /usr/local/bin/adora on the target machine.
  2. Restart the systemd service via sudo systemctl restart adora-daemon-<id>.
  3. Poll the coordinator until the daemon reconnects (30s timeout, 500ms intervals).

Nodes on other machines continue running while each machine is being upgraded.

Example:

$ adora cluster upgrade cluster.yml
Upgrading robot (ubuntu@10.0.0.2)...
  SCP binary... OK
  Restart service... OK
  Waiting for reconnect... OK (3.2s)
Upgrading gpu-server (ubuntu@10.0.0.3)...
  SCP binary... OK
  Restart service... OK
  Waiting for reconnect... OK (2.8s)
2/2 succeeded.

adora cluster restart

Restart a running dataflow by name or UUID. Stops the dataflow and immediately re-starts it using the stored descriptor (no YAML path needed).

adora cluster restart <DATAFLOW>

Arguments:

| Argument | Description |
|---|---|
| DATAFLOW | Name or UUID of the dataflow to restart |

Example:

$ adora cluster restart my-app
Restarting dataflow `my-app`
dataflow restarted: a1b2c3d4-... -> e5f6a7b8-...

Node Scheduling

When the coordinator receives a dataflow, it decides which daemon runs each node based on the _unstable_deploy section in the dataflow YAML. Resolution priority: machine > labels > unnamed.

Machine-based scheduling

Assign a node to a specific machine by its id from cluster.yml:

nodes:
  - id: camera
    _unstable_deploy:
      machine: robot
    path: ./camera-driver
    outputs:
      - frames

The coordinator looks up the daemon whose machine-id matches. If no matching daemon is connected, the deployment fails with: no matching daemon for machine id "robot".

Label-based scheduling

Assign a node by requiring specific labels on the target daemon:

nodes:
  - id: inference
    _unstable_deploy:
      labels:
        gpu: "true"
    path: ./ml-model
    inputs:
      frames: camera/frames
    outputs:
      - predictions

The coordinator finds the first connected daemon whose labels are a superset of the required labels. All required key-value pairs must match exactly. If no daemon satisfies the requirements, deployment fails with: no daemon matches labels {"gpu": "true"}.

Unassigned nodes

Nodes without an _unstable_deploy section (or with an empty one) are assigned to the first unnamed daemon – one that connected without a --machine-id flag.

How resolve_daemon() works internally

The coordinator resolves node placement in coordinator/run/mod.rs:

resolve_daemon(connections, deploy) -> DaemonId
  1. If deploy.machine is Some(id):
       -> look up daemon by machine-id
  2. Else if deploy.labels is non-empty:
       -> find first daemon where all required labels match
  3. Else:
       -> pick first unnamed daemon

The label matching function iterates over all connected daemons and checks that every required key-value pair exists in the daemon’s label set (conn.labels.get(k) == Some(v)). This is a superset check: a daemon with {gpu: "true", arch: "arm64", role: "edge"} satisfies the requirement {gpu: "true"}.
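
A minimal sketch of that superset check (the map types are assumptions):

```rust
use std::collections::HashMap;

// Every required key-value pair must exist, verbatim, in the daemon's
// label set -- extra daemon labels are ignored (superset semantics).
fn labels_match(daemon: &HashMap<String, String>, required: &HashMap<String, String>) -> bool {
    required.iter().all(|(k, v)| daemon.get(k) == Some(v))
}

fn to_map(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs.iter().map(|&(k, v)| (k.to_string(), v.to_string())).collect()
}

fn main() {
    let daemon = to_map(&[("gpu", "true"), ("arch", "arm64"), ("role", "edge")]);

    // {gpu: "true"} is satisfied by the larger label set.
    assert!(labels_match(&daemon, &to_map(&[("gpu", "true")])));

    // A single mismatching value fails the whole requirement.
    assert!(!labels_match(&daemon, &to_map(&[("gpu", "true"), ("arch", "x86_64")])));
}
```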


Binary Distribution

Control how node binaries are delivered to remote daemons via the distribute field.

Local (default)

Each daemon builds from source on its own machine. This is the current default behavior.

nodes:
  - id: my-node
    _unstable_deploy:
      machine: edge-01
      distribute: local
    path: ./my-node

SCP mode

The CLI pushes the locally-built binary to the target machine via SSH/SCP before spawning.

nodes:
  - id: my-node
    _unstable_deploy:
      machine: edge-01
      distribute: scp
    path: ./my-node

HTTP mode

The coordinator runs an artifact store. Daemons pull binaries from the coordinator via HTTP before spawning.

nodes:
  - id: my-node
    _unstable_deploy:
      machine: edge-01
      distribute: http
    path: ./my-node

Artifacts are served from GET /api/artifacts/{build_id}/{node_id} on the coordinator’s WebSocket port. The endpoint requires authentication (Bearer token) and sanitizes node IDs to prevent path traversal.

When to use each strategy

| Strategy | Best for | Tradeoffs |
| --- | --- | --- |
| `local` | Homogeneous clusters, CI builds | Requires build toolchain on every machine |
| `scp` | Heterogeneous clusters, cross-compiled binaries | Requires SSH access from CLI to all machines |
| `http` | Air-gapped daemons, firewalled networks | Requires coordinator reachability from all daemons |

systemd Service Management

For production deployments, install daemons as systemd services so they survive reboots and auto-restart on failure.

Install

adora cluster install cluster.yml

Creates a systemd unit file on each machine (see adora cluster install for the full unit template). Key properties:

  • Restart=on-failure with RestartSec=5: daemon auto-restarts if it crashes.
  • After=network-online.target: waits for network before starting.
  • WantedBy=multi-user.target: starts on boot.

Uninstall

adora cluster uninstall cluster.yml

Stops, disables, and removes the unit file from each machine, then reloads the systemd daemon.

Verifying service status

After install, check services directly:

ssh ubuntu@10.0.0.2 sudo systemctl status adora-daemon-robot

Auto-Recovery

When a daemon disconnects and reconnects (e.g., after a network blip, machine reboot, or service restart), the coordinator automatically re-spawns any missing dataflows on that daemon.

How it works

  1. Daemon reconnects and sends a StatusReport listing its currently running dataflows.
  2. Coordinator compares the report against its expected state (dataflows that should have nodes on this daemon).
  3. For each running dataflow with nodes assigned to this daemon that the daemon did not report, the coordinator sends a SpawnDataflowNodes command to re-spawn the missing nodes.
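Step 3 boils down to a set difference between expected and reported state; a hypothetical sketch of the reconciliation:

```python
def dataflows_to_respawn(expected: dict, reported: set) -> list:
    """expected: dataflow name -> node IDs assigned to this daemon.
    reported: dataflow names the reconnecting daemon says are running.
    Returns the dataflows whose nodes must be re-spawned on this daemon."""
    return [name for name, nodes in expected.items()
            if nodes and name not in reported]

expected = {"vision": {"camera", "detector"}, "logging": {"recorder"}}
reported = {"logging"}  # the daemon came back with only `logging` running
```

Here the coordinator would send a `SpawnDataflowNodes` command for `vision` only.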

30-second backoff

To prevent crash loops (e.g., a node that immediately crashes on spawn), recovery uses a per-daemon, per-dataflow backoff:

  • After a recovery attempt, the coordinator records the timestamp.
  • Subsequent recovery for the same daemon/dataflow pair is skipped until 30 seconds have elapsed.
  • The backoff clears when the daemon reports the dataflow as running again.

This means a node that crashes immediately will only be re-spawned once every 30 seconds, not in a tight loop.
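A minimal sketch of this throttle (hypothetical Python, mirroring the described behavior rather than the actual coordinator code):

```python
class RecoveryBackoff:
    """Per-(daemon, dataflow) recovery throttle (illustrative model)."""

    def __init__(self, window_secs: float = 30.0):
        self.window = window_secs
        self.last_attempt = {}  # (daemon_id, dataflow_id) -> last recovery timestamp

    def should_recover(self, daemon_id: str, dataflow_id: str, now: float) -> bool:
        key = (daemon_id, dataflow_id)
        last = self.last_attempt.get(key)
        if last is not None and now - last < self.window:
            return False  # throttled: recovered this pair too recently
        self.last_attempt[key] = now
        return True

    def clear(self, daemon_id: str, dataflow_id: str):
        # Called when the daemon reports the dataflow as running again.
        self.last_attempt.pop((daemon_id, dataflow_id), None)
```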

Limitations

  • Auto-recovery only applies to dataflows started via adora start (coordinator-managed). Local adora run dataflows are not tracked by the coordinator.
  • Recovery re-spawns all nodes assigned to the reconnecting daemon, not individual nodes. For per-node restart on crash, use restart policies.

Rolling Upgrade

Upgrade the adora binary on all cluster machines with zero downtime using sequential per-machine upgrades.

Process

adora cluster upgrade cluster.yml

For each machine, sequentially:

  1. SCP the local adora binary to /usr/local/bin/adora on the target.
  2. Restart the systemd service (systemctl restart adora-daemon-<id>).
  3. Poll the coordinator until the daemon reconnects (30s timeout).

Because machines are upgraded one at a time, nodes on other machines continue running. After the daemon reconnects, auto-recovery re-spawns any dataflow nodes that were running on that machine.

Prerequisites

  • Daemons must be installed as systemd services (adora cluster install).
  • The local adora binary must be compatible with the cluster’s coordinator version.
  • SSH access with sudo permissions on all target machines.

Use Cases

1. Edge AI Pipeline (Robot + GPU Server)

A camera node runs on the robot, sends frames to a GPU server for inference, and results flow back to an actuator on the robot.

cluster.yml:

coordinator:
  addr: 192.168.1.1

machines:
  - id: robot
    host: 192.168.1.10
    user: ubuntu
    labels:
      role: edge
  - id: gpu-server
    host: 192.168.1.20
    user: ml
    labels:
      gpu: "true"

dataflow.yml:

nodes:
  - id: camera
    _unstable_deploy:
      machine: robot
    path: ./camera-driver
    outputs:
      - frames

  - id: inference
    _unstable_deploy:
      labels:
        gpu: "true"
    path: ./ml-model
    inputs:
      frames: camera/frames
    outputs:
      - predictions

  - id: actuator
    _unstable_deploy:
      machine: robot
    path: ./actuator-driver
    inputs:
      commands: inference/predictions

2. Multi-Robot Fleet

A central coordinator manages N robots with heterogeneous hardware. Label scheduling routes nodes to the right machines without hardcoding machine IDs.

cluster.yml:

coordinator:
  addr: 10.0.0.1

machines:
  - id: bot-01
    host: 10.0.0.11
    user: robot
    labels:
      fleet: warehouse
      lidar: "true"

  - id: bot-02
    host: 10.0.0.12
    user: robot
    labels:
      fleet: warehouse
      camera: rgbd

  - id: bot-03
    host: 10.0.0.13
    user: robot
    labels:
      fleet: warehouse
      lidar: "true"
      camera: rgbd

dataflow.yml:

nodes:
  - id: lidar-driver
    _unstable_deploy:
      labels:
        lidar: "true"
    path: ./lidar-driver
    outputs:
      - scans

  - id: camera-driver
    _unstable_deploy:
      labels:
        camera: rgbd
    path: ./camera-driver
    outputs:
      - frames

With this configuration, lidar-driver runs on bot-01 or bot-03, and camera-driver runs on bot-02 or bot-03.

3. CI/CD Pipeline for Robotics

Automate cluster management in CI:

# Setup
adora cluster install cluster.yml

# Deploy new version
adora cluster upgrade cluster.yml

# Run integration tests
adora start test-dataflow.yml --name integration-test --attach

# Monitor
adora cluster status
adora top

# Cleanup
adora stop integration-test

4. Development to Production

| Stage | Approach | Command |
| --- | --- | --- |
| Local dev | Single-process, no coordinator | `adora run dataflow.yml` |
| Staging | Ad-hoc daemons, manual setup | `adora up` + `adora daemon` on each machine |
| Production | Managed cluster, systemd services | `adora cluster install cluster.yml` |

Operations Runbook

Initial Setup Checklist

  1. SSH keys: Distribute SSH keys so the CLI machine can reach all cluster machines without a password (BatchMode=yes).
  2. Adora binary: Install the adora binary on all machines (same version).
  3. Network: Ensure coordinator port (default 6013) is reachable from all machines. Ensure Zenoh ports are open between daemons for cross-machine node communication.
  4. cluster.yml: Create the cluster configuration with correct IPs, users, and labels.

Day-to-Day Operations

# Start a dataflow
adora start dataflow.yml --name my-app --attach

# List running dataflows
adora list

# Monitor resource usage
adora top

# View node logs
adora logs my-app <node-id> --follow

# Stop a dataflow
adora stop my-app

# Check cluster health
adora cluster status

Upgrading

  1. Build or download the new adora binary locally.
  2. Run adora cluster upgrade cluster.yml.
  3. Verify with adora cluster status that all daemons reconnected.
  4. Running dataflows are automatically re-spawned via auto-recovery.

Troubleshooting

Daemon not connecting

  • Verify the coordinator is running and reachable: curl http://<addr>:6013/api/health (or check coordinator logs).
  • Check daemon logs: journalctl -u adora-daemon-<id> -f (systemd) or the daemon’s stderr output (ad-hoc).
  • Confirm the --coordinator-addr and --coordinator-port match the coordinator’s actual bind address.

SSH failures during cluster commands

  • Ensure ssh -o BatchMode=yes <user>@<host> echo ok works from the CLI machine.
  • Check that StrictHostKeyChecking=accept-new is acceptable for your environment (first connection auto-accepts the host key).
  • Verify the user field in cluster.yml matches a valid SSH user on the target.

Label mismatch errors

  • Error: no daemon matches labels {"gpu": "true"}.
  • Check that the daemon was started with the correct --labels flag.
  • Run adora cluster status to see connected daemons and their labels. Labels are set at daemon startup from cluster.yml and cannot be changed at runtime.

Auto-recovery not triggering

  • Auto-recovery only applies to coordinator-managed dataflows (adora start), not adora run.
  • Check coordinator logs for auto-recovery: re-spawning messages.
  • If the node crashes immediately, recovery is throttled to once every 30 seconds per daemon per dataflow.

Deployment YAML Reference

The _unstable_deploy section on each node controls placement and distribution. All fields are optional.

nodes:
  - id: my-node
    _unstable_deploy:
      machine: edge-01                # Target machine ID from cluster.yml
      labels:                          # Label requirements (superset match)
        gpu: "true"
        arch: arm64
      distribute: local                # local | scp | http
      working_dir: /opt/my-app         # Working directory on the target machine
    path: ./my-node

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `machine` | string | none | Target machine ID. Takes priority over labels. |
| `labels` | map | empty | Required daemon labels. All key-value pairs must match. |
| `distribute` | string | `local` | Binary distribution strategy: `local`, `scp`, or `http`. |
| `working_dir` | path | none | Working directory on the target machine. |

Resolution priority

  1. machine – if set, the node is assigned to the daemon with that machine ID.
  2. labels – if set (and machine is not), the node is assigned to the first daemon whose labels are a superset of the required labels.
  3. Fallback – if neither is set, the node is assigned to the first unnamed (no machine-id) daemon.

Best Practices

  • Use labels over machine IDs for flexibility. Labels decouple your dataflow from specific machines, making it easier to add, remove, or replace hardware.
  • Use systemd install for production. Daemon services survive reboots and auto-restart on failure with Restart=on-failure.
  • Use coordinator persistence (adora coordinator --store redb) with clusters so the coordinator survives restarts. See Coordinator State Persistence.
  • Set restart policies on nodes for per-node resilience. Combine with auto-recovery for defense in depth. See Restart Policies.
  • Monitor with multiple tools: adora cluster status for daemon health, adora top for resource usage, adora logs for node output.
  • Test locally first. Develop with adora run dataflow.yml, then deploy to a cluster. The same dataflow YAML works in both modes – _unstable_deploy fields are ignored in local mode.
  • Use rolling upgrades instead of stopping the entire cluster. adora cluster upgrade processes one machine at a time to maintain availability.
  • Keep cluster.yml in version control alongside your dataflow definitions.

Performance

Adora achieves 10-17x lower latency than ROS2 Python through zero-copy shared memory IPC, Apache Arrow columnar format, and 100% Rust internals. This document covers methodology, reproduction, and tuning.

Architecture Advantages

| Layer | Adora | ROS2 (rclpy) |
| --- | --- | --- |
| Runtime | Rust async (tokio) | Python + C++ middleware |
| IPC (>4KB) | Zero-copy shared memory | DDS serialization + copy |
| IPC (<4KB) | TCP with bincode | DDS serialization + copy |
| Data format | Apache Arrow (zero-serde) | CDR serialization |
| Threading | Lock-free channels (flume) | GIL-bound callbacks |

Benchmark Suite

Internal benchmarks (examples/benchmark/)

Measures Adora’s own latency and throughput across 10 payload sizes (0B to 4MB).

cd examples/benchmark
./compare.sh          # Rust vs Python sender comparison

Metrics reported: avg, p50, p95, p99, p99.9, min, max latency; msg/s throughput.
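For reference, percentiles over raw nanosecond samples can be computed with the classic nearest-rank method (an illustrative sketch; the benchmark's actual aggregation code may differ):

```python
import math

def percentile(sorted_samples: list, p: float) -> int:
    """Nearest-rank percentile (p in [0, 100]) over pre-sorted latency samples."""
    n = len(sorted_samples)
    k = max(0, math.ceil(p / 100 * n) - 1)  # 1-based rank ceil(p/100 * n), clamped
    return sorted_samples[k]

samples = list(range(1, 101))  # stand-in latencies: 1..100 ns, already sorted
```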

ROS2 comparison (examples/ros2-comparison/)

Apples-to-apples comparison using identical Python workloads on both frameworks.

cd examples/ros2-comparison
./run_comparison.sh   # Requires ROS2 Humble+

Both sides embed time.perf_counter_ns() timestamps in the first 8 bytes of each payload. The same message counts, sizes, and sleep intervals ensure comparable results.
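The timestamping scheme can be sketched as follows (the byte order is an assumption here; the comparison scripts' actual encoding may differ):

```python
import struct
import time

def make_payload(size: int) -> bytes:
    """Sender: embed a perf_counter_ns timestamp in the first 8 bytes,
    padding the rest with zeros (assumes little-endian signed 64-bit)."""
    ts = time.perf_counter_ns()
    return struct.pack("<q", ts) + bytes(max(0, size - 8))

def latency_ns(payload: bytes) -> int:
    """Receiver: recover the send timestamp and compute elapsed time."""
    (sent,) = struct.unpack("<q", payload[:8])
    return time.perf_counter_ns() - sent
```

Since both processes run on the same machine, `perf_counter_ns()` readings are directly comparable without clock synchronization.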

Criterion micro-benchmarks

Isolated benchmarks for internal hot paths:

# Daemon message routing (fan-out x payload size matrix)
cargo bench -p adora-daemon

# Message serialization/deserialization
cargo bench -p adora-message

CI tracks these via benchmark-action/github-action-benchmark with a 120% alert threshold.

Reproducing Results

Requirements

  • Linux or macOS (shared memory IPC)
  • Rust 1.85+ with release profile
  • Python 3.10+ with numpy, pyarrow
  • ROS2 Humble+ (for comparison only)

Steps

  1. Build Adora:

    cargo install --path binaries/cli --locked
    
  2. Run internal benchmark:

    cd examples/benchmark
    BENCH_CSV=results/rust.csv adora run dataflow.yml
    
  3. Run ROS2 comparison:

    cd examples/ros2-comparison
    ./run_comparison.sh
    

Environment Notes

  • Close background applications to reduce variance
  • Use taskset or cpuset to pin processes for consistent results
  • Run at least 3 iterations and report median
  • Shared memory benefits appear at payloads >4KB

Performance Tuning

Queue sizes

Default queue size is 10. For high-throughput outputs, increase it:

inputs:
  data:
    source: producer/output
    queue_size: 1000

Payload size

Adora automatically uses shared memory for messages >4KB, avoiding copies. Structure data to exceed this threshold when low latency matters.

Arrow format

Use Arrow arrays directly instead of converting to/from Python lists:

# Fast: pass Arrow array directly
node.send_output("out", pa.array(data, type=pa.uint8()))

# Slow: convert through Python list
node.send_output("out", pa.array(list(data), type=pa.uint8()))

Operator vs Node

Operators run in-process with the runtime (zero IPC overhead) but share the GIL in Python. Use Rust operators for compute-heavy work, Python operators for glue logic.

Distributed deployment

For cross-machine communication, Adora uses Zenoh pub-sub. Latency depends on network quality. Use local deployment (single-machine) when sub-millisecond latency is required.

CSV Output Format

All benchmarks support BENCH_CSV environment variable for machine-readable output:

latency,<bytes>,<label>,<n>,<avg_ns>,<p50_ns>,<p95_ns>,<p99_ns>,<p999_ns>,<min_ns>,<max_ns>
throughput,<bytes>,<label>,<n>,<msg_per_sec>,<elapsed_ns>,0,0,0,0,0
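A latency row can be parsed like this (illustrative Python; the field names follow the format above):

```python
def parse_latency_row(line: str) -> dict:
    """Parse one `latency` row of the BENCH_CSV format into a dict."""
    fields = line.strip().split(",")
    if fields[0] != "latency":
        raise ValueError(f"not a latency row: {fields[0]!r}")
    stat_names = ["avg_ns", "p50_ns", "p95_ns", "p99_ns", "p999_ns", "min_ns", "max_ns"]
    return {
        "bytes": int(fields[1]),
        "label": fields[2],
        "n": int(fields[3]),
        **dict(zip(stat_names, (int(v) for v in fields[4:11]))),
    }
```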

ROS2 Bridge

Adora provides a declarative YAML-based ROS2 bridge that lets any Adora node communicate with ROS2 topics, services, and actions without importing ROS2 libraries. You define the bridge in your dataflow YAML using the ros2: key, and the framework automatically spawns a bridge binary that converts between Apache Arrow (Adora’s native format) and ROS2 CDR/DDS. Your user nodes stay ROS2-free – they send and receive pure Arrow StructArray data.

Features at a Glance

| Feature | Config | Description |
| --- | --- | --- |
| Topic subscribe | `topic` + `direction: subscribe` | Receive from ROS2, forward as Arrow |
| Topic publish | `topic` + `direction: publish` | Receive Arrow, publish to ROS2 |
| Multi-topic | `topics` | Multiple topics on a single ROS2 node |
| Service client | `service` + `role: client` | Send requests, receive responses |
| Service server | `service` + `role: server` | Receive requests, send responses |
| Action client | `action` + `role: client` | Send goals, receive feedback + result |
| Action server | `action` + `role: server` | Receive goals, send feedback + result |
| QoS policies | `qos` | Reliability, durability, history, liveliness |
| Auto-spawn | Automatic | Bridge binary spawned by daemon as a Custom node |

Architecture

When the Adora descriptor resolver encounters a ros2: key on a node, it converts it into a Custom node pointing to the adora-ros2-bridge-node binary. The bridge config is serialized as JSON into the ADORA_ROS2_BRIDGE_CONFIG environment variable.

User Node <--(Arrow/SharedMem)--> Bridge Binary <--(CDR/DDS)--> ROS2

The bridge binary:

  1. Reads AMENT_PREFIX_PATH to locate installed ROS2 message packages
  2. Parses message/service/action definitions at startup
  3. Creates a ros2_client node and the appropriate publishers, subscribers, clients, or servers
  4. Converts incoming ROS2 CDR messages to Arrow StructArray (subscribe/response/feedback)
  5. Converts incoming Arrow StructArray to ROS2 CDR messages (publish/request/goal)

Your user nodes never link against ROS2 – all ROS2 communication is isolated in the bridge binary.


Prerequisites

  • ROS2 environment sourced: AMENT_PREFIX_PATH must be set and point to a workspace containing the required message packages
  • Message packages installed: e.g., turtlesim, geometry_msgs, example_interfaces
  • For service client: A ROS2 service server must be running (or use a companion server dataflow)
  • For action client: A ROS2 action server must be running before starting the dataflow (no wait_for_action_server mechanism)
  • For action server: A ROS2 action client sends goals to the bridge (e.g., ros2 action send_goal)

Topic Bridge

Single Topic (Subscribe)

Subscribe to a ROS2 topic and forward messages as Arrow data to downstream Adora nodes.

nodes:
  - id: pose_bridge
    ros2:
      topic: /turtle1/pose
      message_type: turtlesim/Pose
      direction: subscribe       # default, can be omitted
    outputs:
      - pose

The bridge creates a ROS2 subscription on /turtle1/pose, deserializes each incoming turtlesim/Pose message into an Arrow StructArray, and sends it on the pose output.

Single Topic (Publish)

Receive Arrow data from Adora nodes and publish to a ROS2 topic.

nodes:
  - id: cmd_bridge
    ros2:
      topic: /turtle1/cmd_vel
      message_type: geometry_msgs/Twist
      direction: publish
    inputs:
      cmd_vel: planner/cmd_vel

The bridge receives Arrow data on the cmd_vel input, serializes it to geometry_msgs/Twist CDR, and publishes to /turtle1/cmd_vel.

Multi-Topic

Bridge multiple topics on a single ROS2 node context, mixing subscribe and publish directions.

nodes:
  - id: turtle_bridge
    ros2:
      topics:
        - topic: /turtle1/pose
          message_type: turtlesim/Pose
          direction: subscribe
          output: pose
        - topic: /turtle1/cmd_vel
          message_type: geometry_msgs/Twist
          direction: publish
          input: velocity
      qos:
        reliable: true
        keep_last: 10
    inputs:
      velocity: planner/cmd_vel
    outputs:
      - pose

Multi-topic mode supports up to 64 topics per bridge node.

Input/Output ID Mapping

By default, topic names are converted to Adora IDs by stripping the leading / and replacing remaining / with _:

| ROS2 Topic | Default Adora ID |
| --- | --- |
| `/turtle1/pose` | `turtle1_pose` |
| `/camera/image_raw` | `camera_image_raw` |

In multi-topic mode, you can override this with explicit output (for subscribe) or input (for publish) fields. In single-topic mode, the node’s declared outputs or inputs are used directly.
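The default mapping is mechanical; a one-line Python equivalent:

```python
def topic_to_id(topic: str) -> str:
    """Default mapping: strip the leading '/' and replace remaining '/' with '_'."""
    return topic.removeprefix("/").replace("/", "_")
```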


Service Bridge

Service Client

Send requests from Adora to an external ROS2 service and receive responses.

nodes:
  - id: add_client
    ros2:
      service: /add_two_ints
      service_type: example_interfaces/AddTwoInts
      role: client
    inputs:
      request: requester/data
    outputs:
      - response

The bridge waits for the service to become available (up to 10 retries, 2 seconds each), then for each Arrow input it receives:

  1. Serializes the Arrow data as an AddTwoInts_Request CDR message
  2. Sends the request to the ROS2 service
  3. Waits for a response (30-second timeout)
  4. Deserializes the response into Arrow and sends it on the response output

Service Server

Expose an Adora handler node as a ROS2 service that external ROS2 clients can call.

nodes:
  - id: add_server
    ros2:
      service: /adora_add_two_ints
      service_type: example_interfaces/AddTwoInts
      role: server
    inputs:
      response: handler/result
    outputs:
      - request

  - id: handler
    path: path/to/handler-node
    inputs:
      request: add_server/request
    outputs:
      - result

The bridge receives ROS2 service requests, assigns each a unique request_id (UUID v7), forwards the request data as Arrow on the request output with request_id in metadata, and waits for the handler node to send a response back on the response input with the same request_id. The response is then returned to the correct ROS2 client.

See examples/ros2-bridge/yaml-bridge-service/ for a working example.

Request ID Correlation

Each incoming ROS2 request is assigned a request_id metadata parameter. The handler node must include the same request_id in metadata when sending the response. The simplest approach is to pass through metadata.parameters:

#![allow(unused)]
fn main() {
Event::Input { id, metadata, data } => {
    // metadata.parameters contains request_id
    let result = compute(data);
    node.send_service_response("response".into(), metadata.parameters, result)?;
}
}

Responses can arrive in any order – the bridge correlates them by request_id, not by arrival order. Stale pending requests are evicted after 30 seconds. The maximum pending request queue is 64 – additional requests are dropped when full.
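The correlation table can be modeled as a map from request_id to the waiting client, with the 64-entry cap and 30-second eviction described above (a hypothetical sketch, not bridge code; `uuid4` stands in for the bridge's UUID v7):

```python
import uuid

class PendingRequests:
    """Request/response correlation table (illustrative model)."""
    MAX_PENDING = 64   # additional requests are dropped when full
    TTL_SECS = 30.0    # stale pending requests are evicted

    def __init__(self):
        self.pending = {}  # request_id -> (client, deadline)

    def register(self, client, now: float):
        """New ROS2 request: assign a request_id, or None if the queue is full."""
        self.evict(now)
        if len(self.pending) >= self.MAX_PENDING:
            return None
        request_id = str(uuid.uuid4())
        self.pending[request_id] = (client, now + self.TTL_SECS)
        return request_id

    def resolve(self, request_id: str, now: float):
        """Handler responded: return the waiting client, or None if unknown/expired."""
        entry = self.pending.pop(request_id, None)
        if entry is None or entry[1] < now:
            return None
        return entry[0]

    def evict(self, now: float):
        self.pending = {rid: e for rid, e in self.pending.items() if e[1] >= now}
```

Because responses are looked up by ID rather than order, out-of-order handler replies still reach the right client.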

Service Wait and Timeouts

| Behavior | Value |
| --- | --- |
| Service client: wait for availability | 10 retries, 2s each (20s total) |
| Service client: response timeout | 30 seconds |
| Service server: pending request limit | 64 |

Action Bridge

Action Client

Send goals from Adora to an external ROS2 action server, receiving feedback and results.

nodes:
  - id: fib_client
    ros2:
      action: /fibonacci
      action_type: example_interfaces/Fibonacci
      role: client
    inputs:
      goal: goal_sender/goal
    outputs:
      - feedback
      - result

For each Arrow goal input:

  1. Serializes the Arrow data as a Fibonacci_Goal CDR message
  2. Sends the goal to the action server (30-second timeout)
  3. If accepted, spawns background threads for feedback and result
  4. Feedback messages arrive on the feedback output as they stream in
  5. The final result arrives on the result output (5-minute timeout)

Feedback and Result Streams

The action bridge sends feedback and results on separate outputs:

  • feedback: Streamed as each feedback message arrives from the action server. Contains the action’s feedback message as Arrow (e.g., {partial_sequence: int32[]} for Fibonacci)
  • result: Sent once when the action completes. Contains the action’s result message as Arrow (e.g., {sequence: int32[]} for Fibonacci)

Concurrent Goals

The bridge supports up to 8 concurrent in-flight goals (MAX_CONCURRENT_GOALS). Additional goals are dropped with a warning. Each goal spawns dedicated feedback and result reader threads.

Timeouts

| Behavior | Value |
| --- | --- |
| Goal send timeout | 30 seconds |
| Result retrieval timeout | 5 minutes |
| Feedback | No timeout (streams until action completes) |

Action Server

Expose an Adora handler node as a ROS2 action server that external ROS2 clients can call.

nodes:
  - id: fib_server
    ros2:
      action: /fibonacci
      action_type: example_interfaces/Fibonacci
      role: server
    inputs:
      feedback: handler/feedback
      result: handler/result
    outputs:
      - goal

  - id: handler
    path: path/to/handler-node
    inputs:
      goal: fib_server/goal
    outputs:
      - feedback
      - result

The bridge receives goals from ROS2 clients, auto-accepts them, and forwards the goal data on the goal output. The handler computes feedback and results and sends them back on the feedback and result inputs.

See examples/ros2-bridge/yaml-bridge-action-server/ for a working Fibonacci example.

Goal ID Metadata

Each goal is identified by a UUID string passed as a goal_id metadata parameter. The bridge sets goal_id on every goal output. The handler must include the same goal_id in metadata when sending feedback and result so the bridge can correlate them to the correct goal.

The simplest approach is to pass through metadata.parameters from the goal event:

#![allow(unused)]
fn main() {
Event::Input { id, metadata, data } => match id.as_str() {
    "goal" => {
        let params = metadata.parameters; // contains goal_id
        // ... compute ...
        node.send_output("feedback".into(), params.clone(), feedback)?;
        node.send_output("result".into(), params, result)?;
    }
    // ...
}
}

Action Server Lifecycle

  1. ROS2 client sends a goal request
  2. Bridge auto-accepts the goal and starts executing
  3. Bridge sends goal data on goal output with goal_id in metadata
  4. Handler sends feedback (zero or more times) with same goal_id
  5. Handler sends result (once) with same goal_id; bridge returns it to the ROS2 client
  6. Result send times out after 5 minutes if the client never requests it

Goals that contain no data or cannot be forwarded to the handler are automatically aborted – the bridge sends Aborted status back to the ROS2 client so it does not hang indefinitely.

Goal Status

By default, results are returned with Succeeded status. The handler can override this by setting a goal_status metadata parameter on the result output:

| `goal_status` value | ROS2 Status | Use case |
| --- | --- | --- |
| `"succeeded"` (or omitted) | Succeeded | Goal completed successfully |
| `"aborted"` | Aborted | Goal failed during execution |
| `"canceled"` | Canceled | Goal was canceled by the handler |

Unrecognized goal_status values default to Aborted with a warning logged. Omitting goal_status entirely defaults to Succeeded.
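The status resolution described above amounts to a small lookup with two defaults (an illustrative Python sketch, not bridge code):

```python
import logging

STATUS_MAP = {"succeeded": "Succeeded", "aborted": "Aborted", "canceled": "Canceled"}

def resolve_goal_status(result_metadata: dict) -> str:
    """Map the optional goal_status metadata parameter to a ROS2 terminal status."""
    value = result_metadata.get("goal_status")
    if value is None:
        return "Succeeded"  # omitted -> Succeeded
    status = STATUS_MAP.get(value)
    if status is None:
        logging.warning("unrecognized goal_status %r, defaulting to Aborted", value)
        return "Aborted"    # unrecognized -> Aborted, with a warning
    return status
```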

Rust example:

#![allow(unused)]
fn main() {
use adora_node_api::{GOAL_STATUS, GOAL_STATUS_ABORTED, Parameter};

let mut params = metadata.parameters; // contains goal_id
params.insert(GOAL_STATUS.to_string(), Parameter::String(GOAL_STATUS_ABORTED.to_string()));
node.send_output("result".into(), params, error_result)?;
}

Action Server Limits

| Behavior | Value |
| --- | --- |
| Max concurrent goals | 8 (additional goals receive Aborted status) |
| Auto-accept | All goals are auto-accepted |
| Result send timeout | 5 minutes |

Python Action Server Handler

Python nodes receive goal data as PyArrow arrays with goal_id in the metadata dictionary. Pass it through on feedback/result outputs:

for event in node:
    if event["type"] == "INPUT" and event["id"] == "goal":
        goal_id = event["metadata"]["goal_id"]
        order = event["value"]["order"][0].as_py()

        # Send feedback
        node.send_output("feedback", feedback_array, {"goal_id": goal_id})

        # Send result (with optional status)
        node.send_output("result", result_array, {
            "goal_id": goal_id,
            "goal_status": "succeeded",  # or "aborted", "canceled"
        })

C++ Action Server Handler

C++ nodes access goal_id via type-safe metadata accessors:

auto goal_id = metadata->get_str("goal_id");

// Send feedback with goal_id
auto fb_metadata = new_metadata();
fb_metadata->set_string("goal_id", goal_id);
send_arrow_output_with_metadata("feedback", feedback_data, fb_metadata);

// Send result with goal_id
auto res_metadata = new_metadata();
res_metadata->set_string("goal_id", goal_id);
send_arrow_output_with_metadata("result", result_data, res_metadata);

Quality of Service (QoS)

Configuration

Set QoS at the bridge level (applies to all topics/channels) or per-topic in multi-topic mode.

nodes:
  - id: my_bridge
    ros2:
      topic: /sensor/data
      message_type: sensor_msgs/LaserScan
      qos:
        reliable: true
        durability: transient_local
        keep_last: 10
        liveliness: automatic
        lease_duration: 5.0
        max_blocking_time: 0.5

Defaults

| Field | Default |
| --- | --- |
| `reliable` | `false` (best effort) |
| `durability` | `volatile` |
| `liveliness` | `automatic` |
| `lease_duration` | infinity |
| `max_blocking_time` | 100ms (only applies when `reliable: true`) |
| `keep_last` | 1 |
| `keep_all` | `false` |

Per-Topic QoS Override

In multi-topic mode, each topic can override the bridge-level QoS:

ros2:
  topics:
    - topic: /fast_sensor
      message_type: sensor_msgs/Imu
      direction: subscribe
      qos:
        reliable: false          # override: best effort for this topic
        keep_last: 1
    - topic: /cmd
      message_type: geometry_msgs/Twist
      direction: publish
      # inherits bridge-level QoS (reliable: true)
  qos:
    reliable: true               # default for all topics
    keep_last: 10

Validation Rules

| Field | Valid Values |
| --- | --- |
| `reliable` | `true`, `false` |
| `durability` | `"volatile"`, `"transient_local"` |
| `liveliness` | `"automatic"`, `"manual_by_participant"`, `"manual_by_topic"` |
| `keep_last` | 1 to 10000 |
| `keep_all` | `true`, `false` (mutually exclusive in intent with `keep_last`) |
| `lease_duration` | Finite non-negative float (seconds) |
| `max_blocking_time` | Finite non-negative float (seconds) |

Data Format: Arrow Structs

All data exchanged between your nodes and the bridge uses Arrow StructArray with a single row. Each field in the ROS2 message becomes a column in the struct.

How to Build Arrow Messages

Rust example: building an AddTwoInts_Request ({a: i64, b: i64}):

#![allow(unused)]
fn main() {
use std::sync::Arc;
use arrow::array::{Array, Int64Array, StructArray};
use arrow::datatypes::{DataType, Field};

fn make_add_request(a: i64, b: i64) -> StructArray {
    let fields = vec![
        Arc::new(Field::new("a", DataType::Int64, false)),
        Arc::new(Field::new("b", DataType::Int64, false)),
    ];
    let arrays: Vec<Arc<dyn Array>> = vec![
        Arc::new(Int64Array::from(vec![a])),
        Arc::new(Int64Array::from(vec![b])),
    ];
    StructArray::try_new(fields.into(), arrays, None)
        .expect("failed to create struct array")
}
}

Reading a response ({sum: i64}):

#![allow(unused)]
fn main() {
use arrow::array::{Int64Array, StructArray};

fn read_response(data: &dyn arrow::array::Array) -> i64 {
    let struct_array = data
        .as_any()
        .downcast_ref::<StructArray>()
        .expect("expected struct array");
    struct_array
        .column_by_name("sum")
        .expect("missing 'sum' field")
        .as_any()
        .downcast_ref::<Int64Array>()
        .expect("expected Int64Array")
        .value(0)
}
}

Mapping ROS2 Types to Arrow Types

| ROS2 Type | Arrow Type | Rust Arrow Array |
| --- | --- | --- |
| `bool` | Boolean | `BooleanArray` |
| `int8` | Int8 | `Int8Array` |
| `int16` | Int16 | `Int16Array` |
| `int32` | Int32 | `Int32Array` |
| `int64` | Int64 | `Int64Array` |
| `uint8` / `byte` / `char` | UInt8 | `UInt8Array` |
| `uint16` | UInt16 | `UInt16Array` |
| `uint32` | UInt32 | `UInt32Array` |
| `uint64` | UInt64 | `UInt64Array` |
| `float32` | Float32 | `Float32Array` |
| `float64` | Float64 | `Float64Array` |
| `string` | Utf8 | `StringArray` |
| `wstring` | Utf8 (encoded as UTF-16 on CDR side) | `StringArray` |
| Nested message | Struct | `StructArray` |

Sequences and Arrays

| ROS2 Type | Arrow Type | Rust Arrow Array |
| --- | --- | --- |
| Variable-length sequence (`int32[]`) | List | `ListArray` |
| Bounded sequence (`int32[<=10]`) | List (length validated) | `ListArray` |
| Fixed-size array (`int32[3]`) | FixedSizeList | `FixedSizeListArray` |

Example: reading a ListArray from Fibonacci feedback ({partial_sequence: int32[]}):

#![allow(unused)]
fn main() {
use arrow::array::{Int32Array, ListArray, StructArray};

let struct_array = data.as_any().downcast_ref::<StructArray>().unwrap();
let list = struct_array
    .column_by_name("partial_sequence")
    .unwrap()
    .as_any()
    .downcast_ref::<ListArray>()
    .unwrap();
let values = list
    .value(0)
    .as_any()
    .downcast_ref::<Int32Array>()
    .unwrap()
    .values()
    .to_vec();
}

Complete YAML Reference

nodes:
  - id: my_bridge
    ros2:
      # --- Mode (exactly one required) ---

      # Single topic mode
      topic: /topic_name               # ROS2 topic name
      message_type: package/TypeName    # ROS2 message type
      direction: subscribe             # subscribe (default) | publish

      # Multi-topic mode (mutually exclusive with topic)
      topics:
        - topic: /topic_a
          message_type: package/TypeA
          direction: subscribe
          output: custom_output_id     # override default ID mapping
          qos:                         # per-topic QoS override
            reliable: true
        - topic: /topic_b
          message_type: package/TypeB
          direction: publish
          input: custom_input_id       # override default ID mapping

      # Service mode (mutually exclusive with topic/topics/action)
      service: /service_name           # ROS2 service name
      service_type: package/TypeName   # ROS2 service type
      role: client                     # client | server

      # Action mode (mutually exclusive with topic/topics/service)
      action: /action_name             # ROS2 action name
      action_type: package/TypeName    # ROS2 action type
      role: client                     # client | server

      # --- QoS (optional, applies to all channels) ---
      qos:
        reliable: false                # true | false (default: false = best effort)
        durability: volatile           # volatile (default) | transient_local
        liveliness: automatic          # automatic | manual_by_participant | manual_by_topic
        lease_duration: 5.0            # seconds (default: infinity)
        max_blocking_time: 0.1         # seconds (default: 0.1, reliable only)
        keep_last: 1                   # 1-10000 (default: 1)
        keep_all: false                # true | false (default: false)

      # --- Optional ROS2 node config ---
      namespace: /                     # ROS2 namespace (default: "/")
      node_name: my_ros_node           # ROS2 node name (default: adora node id)

    # --- Standard Adora node fields ---
    inputs:
      input_id: source_node/output_id
    outputs:
      - output_id

Use Case Scenarios

1. Subscribe to Sensor Data (turtlesim pose)

nodes:
  - id: pose_bridge
    ros2:
      topic: /turtle1/pose
      message_type: turtlesim/Pose
    outputs:
      - pose

  - id: my_processor
    path: ./target/debug/my-processor
    inputs:
      pose: pose_bridge/pose

#![allow(unused)]
fn main() {
// In my_processor: receive turtlesim/Pose as Arrow
Event::Input { id, data, .. } if id.as_str() == "pose" => {
    let s = data.as_any().downcast_ref::<StructArray>().unwrap();
    let x = s.column_by_name("x").unwrap()
        .as_any().downcast_ref::<Float32Array>().unwrap().value(0);
    let y = s.column_by_name("y").unwrap()
        .as_any().downcast_ref::<Float32Array>().unwrap().value(0);
    println!("Turtle at ({x}, {y})");
}
}

2. Publish Velocity Commands

nodes:
  - id: planner
    path: ./target/debug/planner
    inputs:
      tick: adora/timer/millis/100
    outputs:
      - cmd_vel

  - id: cmd_bridge
    ros2:
      topic: /turtle1/cmd_vel
      message_type: geometry_msgs/Twist
      direction: publish
    inputs:
      cmd_vel: planner/cmd_vel

#![allow(unused)]
fn main() {
// In planner: send geometry_msgs/Twist as Arrow
// Twist has nested Vector3 fields: linear {x,y,z} and angular {x,y,z}
fn make_twist(linear_x: f64, angular_z: f64) -> StructArray {
    let vec3_fields = vec![
        Arc::new(Field::new("x", DataType::Float64, false)),
        Arc::new(Field::new("y", DataType::Float64, false)),
        Arc::new(Field::new("z", DataType::Float64, false)),
    ];
    let linear = StructArray::try_new(
        vec3_fields.clone().into(),
        vec![
            Arc::new(Float64Array::from(vec![linear_x])) as _,
            Arc::new(Float64Array::from(vec![0.0])) as _,
            Arc::new(Float64Array::from(vec![0.0])) as _,
        ],
        None,
    ).unwrap();
    let angular = StructArray::try_new(
        vec3_fields.into(),
        vec![
            Arc::new(Float64Array::from(vec![0.0])) as _,
            Arc::new(Float64Array::from(vec![0.0])) as _,
            Arc::new(Float64Array::from(vec![angular_z])) as _,
        ],
        None,
    ).unwrap();

    let fields = vec![
        Arc::new(Field::new("linear", linear.data_type().clone(), false)),
        Arc::new(Field::new("angular", angular.data_type().clone(), false)),
    ];
    StructArray::try_new(
        fields.into(),
        vec![Arc::new(linear) as _, Arc::new(angular) as _],
        None,
    ).unwrap()
}
}

3. Multi-Topic Bidirectional Bridge

Subscribe to pose and publish velocity on a single ROS2 node.

nodes:
  - id: turtle_bridge
    ros2:
      topics:
        - topic: /turtle1/pose
          message_type: turtlesim/Pose
          direction: subscribe
          output: pose
        - topic: /turtle1/cmd_vel
          message_type: geometry_msgs/Twist
          direction: publish
          input: velocity
      qos:
        reliable: true
        keep_last: 10
    inputs:
      velocity: planner/cmd_vel
    outputs:
      - pose

  - id: planner
    path: ./target/debug/planner
    inputs:
      pose: turtle_bridge/pose
      tick: adora/timer/millis/100
    outputs:
      - cmd_vel

4. Service Client: Call an External ROS2 Service

nodes:
  - id: requester
    path: ./target/debug/requester
    inputs:
      tick: adora/timer/millis/1000
      response: add_client/response
    outputs:
      - request

  - id: add_client
    ros2:
      service: /add_two_ints
      service_type: example_interfaces/AddTwoInts
      role: client
    inputs:
      request: requester/request
    outputs:
      - response

Prerequisites: run a ROS2 service first:

ros2 run examples_rclcpp_minimal_service service_main

5. Service Server: Expose an Adora Handler as ROS2 Service

nodes:
  - id: add_server
    ros2:
      service: /add_two_ints
      service_type: example_interfaces/AddTwoInts
      role: server
    inputs:
      response: handler/response
    outputs:
      - request

  - id: handler
    path: ./target/debug/handler
    inputs:
      request: add_server/request
    outputs:
      - response

The handler receives {a: i64, b: i64} as Arrow, computes the result, and sends {sum: i64} back. External ROS2 clients can call this service:

ros2 service call /add_two_ints example_interfaces/srv/AddTwoInts "{a: 3, b: 5}"
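The handler's logic can be sketched in plain Rust, with metadata modeled as a string map for illustration (the actual Adora event API is not shown here). Note the request_id passthrough, which the bridge needs to correlate the response (see Best Practices below):

```rust
use std::collections::HashMap;

// Handle one AddTwoInts request: compute the sum and echo the request_id
// metadata so the bridge can match the response to the original ROS2 request.
fn handle_add(
    a: i64,
    b: i64,
    request_meta: &HashMap<String, String>,
) -> (i64, HashMap<String, String>) {
    let mut response_meta = HashMap::new();
    if let Some(id) = request_meta.get("request_id") {
        // Without this, the bridge cannot correlate the response.
        response_meta.insert("request_id".to_string(), id.clone());
    }
    (a + b, response_meta)
}

fn main() {
    let mut meta = HashMap::new();
    meta.insert("request_id".to_string(), "42".to_string());
    let (sum, reply_meta) = handle_add(3, 5, &meta);
    println!("sum = {sum}, request_id = {:?}", reply_meta.get("request_id"));
}
```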

6. Action Client: Long-Running Fibonacci Goal

nodes:
  - id: goal_sender
    path: ./target/debug/goal-sender
    inputs:
      tick: adora/timer/millis/5000
      feedback: fib_client/feedback
      result: fib_client/result
    outputs:
      - goal

  - id: fib_client
    ros2:
      action: /fibonacci
      action_type: example_interfaces/Fibonacci
      role: client
    inputs:
      goal: goal_sender/goal
    outputs:
      - feedback
      - result

Prerequisites: start the action server before the dataflow:

ros2 run examples_rclcpp_action_server fibonacci_action_server

The goal node sends {order: int32}, receives streamed {partial_sequence: int32[]} feedback, and a final {sequence: int32[]} result.

7. Action Server: Expose an Adora Handler as ROS2 Action

nodes:
  - id: fib_server
    ros2:
      action: /fibonacci
      action_type: example_interfaces/Fibonacci
      role: server
    inputs:
      feedback: handler/feedback
      result: handler/result
    outputs:
      - goal

  - id: handler
    path: ./target/debug/handler
    inputs:
      goal: fib_server/goal
    outputs:
      - feedback
      - result

The handler receives {order: int32} goals with a goal_id in metadata, sends {partial_sequence: int32[]} feedback, and a final {sequence: int32[]} result – all with the same goal_id in metadata. External ROS2 clients can send goals:

ros2 action send_goal /fibonacci example_interfaces/action/Fibonacci "{order: 10}"
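The sequence computation at the heart of such a handler can be sketched as follows (Adora's send APIs are omitted; each element of the returned vector corresponds to one feedback message):

```rust
// Compute the Fibonacci sequence for a goal {order: int32}, collecting the
// partial sequences a handler would stream as feedback. The final element is
// what goes into the {sequence: int32[]} result. For order <= 2 no feedback
// is produced.
fn fibonacci_partials(order: usize) -> Vec<Vec<i32>> {
    let mut seq: Vec<i32> = vec![0, 1];
    let mut partials = Vec::new();
    while seq.len() < order {
        let next = seq[seq.len() - 1] + seq[seq.len() - 2];
        seq.push(next);
        partials.push(seq.clone()); // one {partial_sequence: int32[]} feedback frame
    }
    partials
}

fn main() {
    for partial in fibonacci_partials(6) {
        println!("feedback partial_sequence: {partial:?}");
    }
}
```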

Limitations and Known Constraints

  • Action server auto-accept: All incoming goals are automatically accepted. The handler cannot reject goals before execution starts.
  • No action cancel support: Neither client nor server handles ROS2 cancel requests.
  • No wait_for_action_server: The ros2_client library does not provide this API. Start the action server before the dataflow. The first goal will time out (30s) if the server is unavailable.
  • Single-flight service client: The service client processes requests sequentially – each request blocks until the response arrives (or times out at 30s).
  • QoS uniform for service/action channels: The qos config applies to all service/action sub-channels (goal, result, cancel, feedback, status). Per-channel QoS is not configurable.
  • AMENT_PREFIX_PATH required: The bridge fails at startup if no ROS2 message definitions are found.
  • Max 64 topics: Multi-topic mode supports at most 64 topics per bridge node.
  • Max 8 concurrent action goals: Additional goals receive Aborted status when the limit is reached.
  • Max 64 pending service requests (server): Requests are dropped when the queue is full.

Best Practices

Source your ROS2 environment before running. Ensure AMENT_PREFIX_PATH is set and includes all required message packages. The bridge logs an error if no definitions are found.

Start action servers before the dataflow. There is no wait mechanism for action servers. If the server is not ready, the first goal send will time out after 30 seconds.

Use multi-topic mode for related topics. Bridging /turtle1/pose (subscribe) and /turtle1/cmd_vel (publish) on the same bridge node reduces resource usage compared to two separate bridge nodes.

Match Arrow field names exactly. The bridge validates that Arrow struct field names match the ROS2 message definition. Missing fields use default values (zero for numbers, empty string). Extra fields cause an error.

Use explicit output/input in multi-topic mode. The default ID mapping (strip the leading /, replace remaining / with _) can produce confusing IDs for deeply nested topic names. Explicit IDs make the dataflow YAML self-documenting.
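A sketch of that default mapping, assuming it strips the leading slash and replaces the remaining separators with underscores:

```rust
// Assumed default topic-name-to-ID mapping: strip the leading '/' and
// replace remaining '/' with '_'. Illustrative only; not the bridge's code.
fn default_id(topic: &str) -> String {
    topic.trim_start_matches('/').replace('/', "_")
}

fn main() {
    // "/turtle1/pose" becomes "turtle1_pose"
    println!("{}", default_id("/turtle1/pose"));
    // deep names get long: "/robot/arm/joint_states" -> "robot_arm_joint_states"
    println!("{}", default_id("/robot/arm/joint_states"));
}
```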

Set QoS to match the ROS2 publisher/subscriber. QoS mismatches (e.g., reliable subscriber with best-effort publisher) cause silent communication failures. Check with ros2 topic info -v /topic_name to see the existing QoS settings.

Pass through request_id in service responses. The bridge correlates responses to requests using the request_id metadata parameter. If the handler does not include request_id in the response metadata, the bridge cannot match the response to the original ROS2 request.

WebSocket Control Plane

Adora’s control plane uses WebSocket connections for all communication between the CLI, coordinator, and daemons. A single Axum server exposes three routes on one port, replacing the previous multi-port TCP design. JSON text frames carry a UUID-correlated request-reply protocol with fire-and-forget events for log streaming.

Features at a Glance

| Feature | Detail |
|---|---|
| Routes | /api/control (CLI), /api/daemon (daemons), /health |
| Wire format | JSON text frames + binary frames for topic data |
| Protocol | UUID-correlated request-reply + fire-and-forget events |
| Message size limit | 1 MiB (MAX_CONTROL_MESSAGE_BYTES) |
| Concurrency limit | 256 connections (MAX_WS_CONNECTIONS) |
| Server framework | Axum + Tower middleware |
| Client library | tokio-tungstenite (integration tests, daemon), custom WsSession (CLI) |
| Security | Re-register guard, daemon ID verification, machine ID length limit |

Architecture

                        Single Axum server (one port)
                       ┌─────────────────────────────┐
                       │  /api/control   (CLI)       │
  CLI ──── WS ────────>│  /api/daemon    (Daemons)   │
                       │  /health        (HTTP GET)  │
  Daemon ── WS ───────>│                             │
                       └──────────────┬──────────────┘
                                      │ mpsc::Sender<Event>
                                      v
                                Coordinator
                               (event loop)

The coordinator binds a single TcpListener and serves an Axum router. Each WebSocket upgrade spawns a handler task that communicates with the coordinator’s main event loop through an mpsc::Sender<Event> channel.

Key source files

| File | Role |
|---|---|
| binaries/coordinator/src/ws_server.rs | Router, serve(), constants, ShutdownTrigger |
| binaries/coordinator/src/ws_control.rs | /api/control handler |
| binaries/coordinator/src/ws_daemon.rs | /api/daemon handler, security, event translation |
| binaries/cli/src/ws_client.rs | WsSession synchronous client wrapper |
| libraries/message/src/ws_protocol.rs | WsRequest, WsResponse, WsEvent, WsMessage types |

Wire Protocol

All messages are JSON text frames. Three message shapes exist:

WsRequest (client -> server)

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "method": "control",
  "params": { "List": null }
}
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique request identifier for reply correlation |
| method | string | "control" for CLI requests, "daemon_event" / "daemon_command" for daemon |
| params | object | Serialized ControlRequest or Timestamped<CoordinatorRequest> |

WsResponse (server -> client)

Success:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "result": { "DataflowList": [] }
}

Error:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "error": "no running dataflow with id ..."
}
| Field | Type | Description |
|---|---|---|
| id | UUID | Matches the originating request id |
| result | object? | Present on success (serialized ControlRequestReply) |
| error | string? | Present on failure |

WsEvent (either direction)

{
  "event": "log",
  "payload": { "message": "sensor started", "level": "info" }
}

Used for log streaming after a LogSubscribe/BuildLogSubscribe is acknowledged.

Dispatch

Each handler parses incoming frames with its own strategy to preserve u128 fidelity (see u128 serialization):

  • CLI (ws_client.rs): Uses a flat IncomingFrame struct with serde_json::value::RawValue for the result/payload fields, avoiding serde_json::Value entirely. Discriminates by presence of event (log push) or id (response).
  • Coordinator control handler (ws_control.rs): Parses as WsRequest (always a request from CLI).
  • Coordinator daemon handler (ws_daemon.rs): Checks for "method" key to distinguish requests vs responses. Uses DaemonWsRequestRaw helper for requests.
  • Daemon (coordinator.rs): Uses CoordinatorCommandRaw / RegisterReplyRaw helper structs to parse directly from raw JSON text.

A WsMessage untagged enum is defined in ws_protocol.rs for generic dispatch but is not used by the production handlers:

#![allow(unused)]
fn main() {
#[serde(untagged)]
pub enum WsMessage {
    Request(WsRequest),
    Response(WsResponse),
    Event(WsEvent),
}
}

CLI Control Plane (/api/control)

The CLI connects to /api/control to send ControlRequest commands and receive ControlRequestReply responses.

Connection lifecycle

  1. Connect – HTTP upgrade to WebSocket
  2. Request-reply – CLI sends WsRequest, coordinator processes the ControlRequest, sends WsResponse
  3. Log subscribe (optional) – CLI sends LogSubscribe/BuildLogSubscribe, coordinator acks with WsResponse, then pushes WsEvent{event:"log"} frames
  4. Close – CLI sends Close frame or drops connection

Supported ControlRequest variants

| Variant | Description |
|---|---|
| List | List all running dataflows |
| Build | Trigger a dataflow build |
| WaitForBuild | Block until build completes |
| Start | Start a dataflow |
| WaitForSpawn | Block until nodes are spawned |
| Stop / StopByName | Stop a running dataflow |
| Reload | Hot-reload a node/operator |
| Check | Check dataflow status |
| Destroy | Tear down all daemons |
| Logs | Retrieve historical logs |
| Info | Get dataflow details |
| DaemonConnected | Check if any daemon is connected |
| ConnectedMachines | List connected daemons |
| LogSubscribe | Subscribe to live dataflow logs |
| BuildLogSubscribe | Subscribe to live build logs |
| CliAndDefaultDaemonOnSameMachine | Check co-location |
| GetNodeInfo | Get node metadata |
| TopicSubscribe | Subscribe to live topic data via binary WS frames (details) |
| TopicUnsubscribe | Cancel a topic subscription |

Log subscription flow

CLI                         Coordinator
 │                              │
 │─── WsRequest{LogSubscribe} ─>│
 │                              │  (check dataflow exists)
 │<── WsResponse{subscribed} ───│
 │                              │
 │<── WsEvent{event:"log"} ────│  (repeated)
 │<── WsEvent{event:"log"} ────│
 │                              │
 │─── Close ───────────────────>│  (log_subscribers dropped)

If the dataflow is not found, the coordinator returns WsResponse with an error and no events are sent.

WsSession (CLI client)

WsSession is a synchronous wrapper that bridges blocking CLI code to the async WebSocket connection. It creates an internal tokio::runtime::Runtime (current-thread) and spawns an async session_loop task.

CLI thread (sync)                       session_loop (async)
     │                                        │
     │── SessionCommand::Request ────────────>│── WsRequest ──> server
     │                                        │<── WsResponse ──
     │<── oneshot reply ─────────────────────│
     │                                        │
     │── SessionCommand::SubscribeLogs ──────>│── WsRequest ──> server
     │                                        │<── WsResponse (ack)
     │<── oneshot ack ───────────────────────│
     │<── std_mpsc log events ───────────────│<── WsEvent ──

The session loop maintains:

  • pending_requests: HashMap<Uuid, oneshot::Sender> – for request-reply correlation
  • pending_subscribes: HashMap<Uuid, (ack_tx, log_tx)> – for subscribe ack routing
  • log_subscribers: Vec<std_mpsc::Sender> – for broadcasting log events
  • pending_topic_subscribes: HashMap<Uuid, (ack_tx, data_tx)> – for topic subscribe ack routing
  • topic_subscribers: HashMap<Uuid, std_mpsc::Sender> – for binary frame dispatch by subscription UUID

Binary WS frames (topic data) are dispatched separately from text frames. See WebSocket Topic Data Channel for details.

On disconnect, all pending requests receive an error via their oneshot channels.
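The correlation and disconnect behavior can be sketched with std channels standing in for the async oneshot senders (a u64 stands in for the request UUID; illustrative only):

```rust
use std::collections::HashMap;
use std::sync::mpsc;

// Sketch: route replies to requests by id, and fail every still-pending
// request when the connection drops.
fn main() {
    let mut pending: HashMap<u64, mpsc::Sender<Result<String, String>>> = HashMap::new();

    // Two in-flight requests awaiting replies.
    let (tx1, rx1) = mpsc::channel();
    let (tx2, rx2) = mpsc::channel();
    pending.insert(1, tx1);
    pending.insert(2, tx2);

    // A WsResponse for request 1 arrives: remove and resolve it.
    if let Some(tx) = pending.remove(&1) {
        let _ = tx.send(Ok("DataflowList: []".to_string()));
    }

    // The connection closes: every remaining pending request gets an error.
    for (_, tx) in pending.drain() {
        let _ = tx.send(Err("WS connection closed".to_string()));
    }

    println!("{:?}", rx1.recv().unwrap()); // Ok("DataflowList: []")
    println!("{:?}", rx2.recv().unwrap()); // Err("WS connection closed")
}
```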


Daemon Plane (/api/daemon)

Daemons connect to /api/daemon for registration, event reporting, and receiving coordinator commands.

Registration flow

Daemon                       Coordinator
  │                              │
  │── WsRequest{Register} ─────>│
  │                              │  (validate, assign daemon_id)
  │                              │  (track connection + cmd channel)
  │                              │
  │── WsRequest{Event{...}} ───>│  (subsequent events)

  1. Daemon sends a Register request containing DaemonRegisterRequest (version + machine ID)
  2. Coordinator validates version compatibility and machine ID length
  3. Coordinator assigns a DaemonId and stores the DaemonConnection (includes cmd_tx channel for sending commands back to the daemon)
  4. The connection is tracked via tracked_daemon_id for cleanup on disconnect

Event translation

Daemon events are translated into coordinator-internal Event variants:

| DaemonEvent | Coordinator Event |
|---|---|
| AllNodesReady | Event::Dataflow { ReadyOnDaemon } |
| AllNodesFinished | Event::Dataflow { DataflowFinishedOnDaemon } |
| Heartbeat | Event::DaemonHeartbeat |
| Log(message) | Event::Log(message) |
| Exit | Event::DaemonExit |
| NodeMetrics | Event::NodeMetrics |
| BuildResult | Event::DataflowBuildResult |
| SpawnResult | Event::DataflowSpawnResult |

Bidirectional communication

The coordinator can send commands back to daemons via the cmd_tx channel stored in DaemonConnection. The daemon handler maintains a pending_replies: HashMap<Uuid, oneshot::Sender> to correlate daemon responses to coordinator-initiated requests.

Message routing on the daemon handler:

  • Frame has "method" key -> daemon request (registration or event)
  • Frame lacks "method" key -> daemon response to a coordinator command

u128 serialization workaround

uhlc::ID contains a NonZeroU128, which exceeds the range serde_json::Value::Number can represent (i64/u64/f64 only). serde_json::to_value() fails with “number out of range”, and serde_json::from_slice::<Value>() silently loses precision by storing the number as an f64.

All production code bypasses serde_json::Value for data containing uhlc::Timestamp:

| Component | Serialization | Deserialization |
|---|---|---|
| Daemon (coordinator.rs) | to_string + format! | Helper structs (RegisterReplyRaw, CoordinatorCommandRaw) + from_str |
| Coordinator control (ws_control.rs) | to_string + format! for replies | N/A (CLI requests don’t contain u128) |
| Coordinator daemon (ws_daemon.rs) | N/A | DaemonWsRequestRaw + from_str |
| Coordinator state (state.rs) | str::from_utf8 + format! (raw bytes embedding) | N/A |
| CLI (ws_client.rs) | N/A (requests don’t contain u128) | IncomingFrame with serde_json::value::RawValue |

Integration tests similarly construct WsRequest JSON strings manually via format!() + serde_json::to_string() (not to_value()) to match the real wire format.
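The underlying hazard is easy to demonstrate without serde_json: an f64 (what a JSON Number is effectively stored as) has only 53 bits of mantissa, so a 128-bit ID cannot survive the round trip, while the string form is exact:

```rust
fn main() {
    // A value like uhlc::ID's NonZeroU128 payload: far beyond u64/f64 range.
    let id: u128 = 0x0123_4567_89ab_cdef_0123_4567_89ab_cdef;

    // Round-trip through f64, as storing it in a JSON Number effectively does.
    let lossy = (id as f64) as u128;
    assert_ne!(lossy, id); // precision silently lost

    // Round-trip through the string form instead: exact.
    let exact: u128 = id.to_string().parse().unwrap();
    assert_eq!(exact, id);

    println!("string round-trip preserves the ID: {exact}");
}
```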


Security

Re-register guard

Each daemon WebSocket connection allows exactly one Register request. If a connection attempts a second registration, the coordinator logs a warning and closes the connection:

daemon attempted re-register on same connection, rejecting

Daemon ID verification

After registration, every Event message must include a daemon_id matching the one assigned during registration. Mismatched IDs cause connection termination:

daemon sent event with mismatched id: expected `X`, got `Y` -- closing connection

Machine ID length validation

The machine_id field in DaemonRegisterRequest is limited to 256 bytes. Oversized values cause connection termination.

Connection and message limits

| Limit | Value | Enforced by |
|---|---|---|
| Max message size | 1 MiB | WebSocketUpgrade::max_message_size |
| Max concurrent connections | 256 | Tower ConcurrencyLimitLayer |

Connection Lifecycle & Keepalive

Establishment

Both /api/control and /api/daemon use standard HTTP/1.1 WebSocket upgrade. The Axum WebSocketUpgrade extractor handles the handshake.

Ping/pong

Both handlers respond to Ping frames with Pong frames containing the same payload:

#![allow(unused)]
fn main() {
Ok(Message::Ping(data)) => {
    let _ = ws_tx.send(Message::Pong(data)).await;
    continue;
}
}

Graceful close

When a Close frame is received:

  • Control handler: breaks the handler loop, dropping log subscriber channels
  • Daemon handler: breaks the loop, then emits Event::DaemonExit { daemon_id } for immediate cleanup

Cleanup on disconnect

Control connections:

  • log_tx channel is dropped, stopping log forwarding to that client
  • No coordinator state to clean up (control connections are stateless)

Daemon connections:

  • DaemonExit event is emitted if a daemon_id was tracked
  • cmd_tx and pending_replies are dropped
  • Coordinator removes the daemon from its connection map

WsSession (CLI client):

  • All entries in pending_requests receive Err("WS connection closed")
  • All entries in pending_subscribes receive Err("WS connection closed")

Message Flow Examples

CLI lists dataflows

CLI                          WsSession                    Coordinator
 │                              │                              │
 │── request(&List) ───────────>│                              │
 │                              │── WsRequest ────────────────>│
 │                              │   id: "abc-123"              │
 │                              │   method: "control"          │
 │                              │   params: "List"             │
 │                              │                              │
 │                              │                    ControlEvent::IncomingRequest
 │                              │                    reply via oneshot
 │                              │                              │
 │                              │<── WsResponse ──────────────│
 │                              │   id: "abc-123"              │
 │                              │   result: {DataflowList:[]}  │
 │                              │                              │
 │<── ControlRequestReply ─────│                              │

Daemon registration

Daemon                                    Coordinator
  │                                           │
  │── WsRequest ─────────────────────────────>│
  │   method: "daemon_event"                  │
  │   params: {inner: Register{...},          │
  │            timestamp: ...}                │
  │                                           │  validate version
  │                                           │  validate machine_id
  │                                           │  assign daemon_id
  │                                           │  store DaemonConnection
  │                                           │
  │── WsRequest{Event{Heartbeat}} ──────────>│
  │                                           │  Event::DaemonHeartbeat
  │                                           │
  │                        (on WS close) ────>│  Event::DaemonExit

Log subscription lifecycle

CLI                    WsSession              Coordinator
 │                        │                        │
 │── subscribe_logs() ───>│                        │
 │                        │── WsRequest ──────────>│
 │                        │   params: LogSubscribe │
 │                        │                        │  find dataflow
 │                        │<── WsResponse ────────│  {subscribed: true}
 │<── ack (Ok) ──────────│                        │
 │                        │                        │
 │                        │<── WsEvent{log} ──────│  (node produces log)
 │<── log_rx.recv() ─────│                        │
 │                        │<── WsEvent{log} ──────│
 │<── log_rx.recv() ─────│                        │
 │                        │                        │
 │   (drop session) ─────>│── Close ─────────────>│  (log_subscribers dropped)

Test Coverage

Test tiers

| Tier | Location | Tests | What’s covered |
|---|---|---|---|
| Unit (protocol) | libraries/message/src/ws_protocol.rs | 10 | Roundtrip serialization, untagged dispatch, error cases |
| Unit (client) | binaries/cli/src/ws_client.rs | 6 | Response routing, subscribe ack, topic subscribe ack, orphan handling, disconnect |
| Integration (control) | binaries/coordinator/tests/ws_control_tests.rs | 11 | Health check, List, invalid JSON/params, Destroy, DaemonConnected, ping/pong, concurrent requests, connection close, log subscribe |
| Integration (daemon) | binaries/coordinator/tests/ws_daemon_tests.rs | 4 | Register, register-then-status, disconnect cleanup, ping/pong |
| E2E (WsSession) | tests/ws-cli-e2e.rs | 4 | WsSession + coordinator: list, status, stop, multi-request |
| Total | | 35 | |

Key test patterns

Poll-with-timeout: Integration tests poll coordinator state (e.g., DaemonConnected) with a 2-second deadline and 20ms sleep intervals, avoiding flaky timing assumptions.
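The pattern can be sketched as a plain polling loop, with the deadline and interval stated above:

```rust
use std::time::{Duration, Instant};

// Retry a condition every 20 ms until it holds or a 2-second deadline expires.
fn poll_until(mut check: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + Duration::from_secs(2);
    while Instant::now() < deadline {
        if check() {
            return true;
        }
        std::thread::sleep(Duration::from_millis(20));
    }
    false // deadline expired without the condition holding
}

fn main() {
    // Stand-in for a coordinator-state check such as DaemonConnected.
    let mut attempts = 0;
    let ok = poll_until(|| {
        attempts += 1;
        attempts >= 3 // succeeds on the third poll
    });
    println!("connected = {ok} after {attempts} attempts");
}
```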

No nested runtimes: E2E tests run the coordinator on a background std::thread with its own tokio runtime, while WsSession (which creates its own current-thread runtime) runs on the test’s main thread. This avoids the “cannot start a runtime from within a runtime” panic.

u128 workaround in tests: Daemon test helpers construct WsRequest JSON strings manually via format!() + serde_json::to_string() (not serde_json::to_value()) to preserve uhlc::ID u128 values on the wire.

Test coordinator setup: Both integration and E2E tests use adora_coordinator::start_testing() which binds to port 0 (OS-assigned) and accepts an empty external event stream.


Configuration Reference

Constants

| Constant | Value | File | Purpose |
|---|---|---|---|
| MAX_CONTROL_MESSAGE_BYTES | 1 MiB (1,048,576) | ws_server.rs | Max WebSocket frame size |
| MAX_WS_CONNECTIONS | 256 | ws_server.rs | Tower concurrency limit |

Server setup

#![allow(unused)]
fn main() {
// Production: called by coordinator's main startup
let (port, shutdown, future) = ws_server::serve(bind_addr, event_tx, clock).await?;
tokio::spawn(future);
// ...
shutdown.shutdown(); // graceful stop
}

Test setup

#![allow(unused)]
fn main() {
// Binds to port 0, returns (port, future)
let (port, future) = adora_coordinator::start_testing(
    "127.0.0.1:0".parse().unwrap(),
    futures::stream::empty(),
).await?;
}

Shutdown

ShutdownTrigger wraps a oneshot::Sender<()>. Calling .shutdown() sends the signal, which the Axum server receives via with_graceful_shutdown. In-flight requests complete; new connections are rejected.
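A minimal sketch of such a trigger, with a std channel standing in for the oneshot sender and a thread standing in for the server task (illustrative only; the real type wraps tokio's oneshot):

```rust
use std::sync::mpsc;

// A one-shot signal wrapper: calling shutdown() sends the signal that
// unblocks a server loop waiting on the receiving end.
struct ShutdownTrigger(mpsc::Sender<()>);

impl ShutdownTrigger {
    fn shutdown(self) {
        let _ = self.0.send(()); // ignore error if the server already exited
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let trigger = ShutdownTrigger(tx);

    let server = std::thread::spawn(move || {
        // A real server would select between accepting work and this signal.
        rx.recv().ok();
        "stopped gracefully"
    });

    trigger.shutdown();
    println!("{}", server.join().unwrap()); // prints "stopped gracefully"
}
```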

WebSocket Topic Data Channel

The topic data channel extends the WebSocket control plane to proxy live dataflow messages from the coordinator to CLI clients. Instead of requiring direct Zenoh network access, CLI commands like topic echo, topic hz, and topic info receive message data over the existing WebSocket connection as binary frames.

Motivation

| Scenario | Before (Zenoh direct) | After (WS proxy) |
|---|---|---|
| CLI on same machine as daemon | Works | Works |
| CLI remote, Zenoh reachable | Works | Works |
| CLI remote, no Zenoh access | Fails | Works |
| Browser-based web UI | Impossible | Possible |
| Embedded target, no local disk | Cannot record locally | --proxy streams to CLI |

The key insight: CLI and future web UIs connect to the coordinator via WebSocket. By having the coordinator subscribe to Zenoh on their behalf and forward messages as binary frames, topic inspection works anywhere the WebSocket connection reaches.


Architecture

CLI  ──── WS (binary frames) ────>  Coordinator  ──── Zenoh sub ────>  Daemon
                                    (Zenoh proxy)                      (debug publish)

The coordinator acts as a Zenoh proxy:

  1. CLI sends a TopicSubscribe request over the existing text-frame WS protocol
  2. Coordinator validates the dataflow and opens Zenoh subscribers
  3. Coordinator forwards each Zenoh sample as a binary WS frame back to the CLI
  4. CLI dispatches binary frames by subscription UUID to the appropriate consumer

Key source files

| File | Role |
|---|---|
| libraries/message/src/cli_to_coordinator.rs | TopicSubscribe, TopicUnsubscribe request variants |
| libraries/message/src/coordinator_to_cli.rs | TopicSubscribed reply variant |
| binaries/coordinator/src/ws_control.rs | Zenoh proxy: subscribe, forward binary frames |
| binaries/coordinator/src/control.rs | ControlEvent::TopicSubscribe for validation |
| binaries/cli/src/ws_client.rs | WsSession::subscribe_topics(), binary frame dispatch |
| binaries/cli/src/command/topic/echo.rs | Topic echo via WS |
| binaries/cli/src/command/topic/hz.rs | Topic frequency measurement via WS |
| binaries/cli/src/command/topic/info.rs | Topic metadata/stats via WS |
| binaries/cli/src/command/record.rs | --proxy flag for WS-based recording |

Wire Protocol

Subscription handshake (JSON text frames)

The subscription uses the existing UUID-correlated request-reply protocol:

Request (CLI -> Coordinator):

{
  "id": "abc-123",
  "method": "control",
  "params": {
    "TopicSubscribe": {
      "dataflow_id": "550e8400-...",
      "topics": [["camera_node", "image"], ["lidar_node", "points"]]
    }
  }
}

Response (Coordinator -> CLI):

{
  "id": "abc-123",
  "result": {
    "TopicSubscribed": {
      "subscription_id": "7f1b3a00-..."
    }
  }
}

Unsubscribe (CLI -> Coordinator):

{
  "id": "def-456",
  "method": "control",
  "params": {
    "TopicUnsubscribe": {
      "subscription_id": "7f1b3a00-..."
    }
  }
}

Binary data frames

After the handshake, the coordinator pushes binary WS frames. Each frame has a fixed-size header:

 0                   16                               N
 ├───────────────────┼────────────────────────────────┤
 │  subscription_id  │  Timestamped<InterDaemonEvent> │
 │  (16 bytes UUID)  │  (bincode serialized)          │
 └───────────────────┴────────────────────────────────┘

| Field | Size | Description |
|---|---|---|
| subscription_id | 16 bytes | UUID matching the TopicSubscribed ack, for multiplexing |
| payload | variable | Raw Timestamped<InterDaemonEvent> bincode bytes from Zenoh |

The 16-byte UUID prefix allows multiplexing multiple subscriptions on a single WS connection without additional framing overhead.
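The dispatch step on the receiving side reduces to splitting each frame at the 16-byte boundary; a sketch, keeping the UUID as raw bytes rather than using the uuid crate:

```rust
// Split a binary WS frame into the 16-byte subscription UUID prefix and the
// bincode payload that follows it.
fn split_frame(frame: &[u8]) -> Option<([u8; 16], &[u8])> {
    if frame.len() < 16 {
        return None; // malformed: too short to carry the UUID prefix
    }
    let mut sub_id = [0u8; 16];
    sub_id.copy_from_slice(&frame[..16]);
    Some((sub_id, &frame[16..]))
}

fn main() {
    let mut frame = vec![0xAAu8; 16]; // subscription_id bytes
    frame.extend_from_slice(b"payload-bytes"); // Timestamped<InterDaemonEvent> bincode
    let (sub_id, payload) = split_frame(&frame).unwrap();
    println!("sub_id[0] = {:#x}, payload len = {}", sub_id[0], payload.len());
}
```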


Data Flow

CLI                         WsSession                     Coordinator
 │                              │                              │
 │── subscribe_topics() ───────>│                              │
 │                              │── WsRequest{TopicSubscribe} >│
 │                              │                              │ validate dataflow
 │                              │                              │ open Zenoh session (lazy)
 │                              │                              │ spawn subscriber tasks
 │                              │<── WsResponse{TopicSubscribed}│
 │<── (sub_id, data_rx) ───────│                              │
 │                              │                              │
 │                              │       ┌── Zenoh sample ──────│ Daemon publishes
 │                              │<──────│ Binary frame         │
 │<── data_rx.recv() ──────────│       │ (sub_id + payload)   │
 │                              │       │                      │
 │                              │<──────│ Binary frame         │
 │<── data_rx.recv() ──────────│       │                      │
 │                              │       └                      │
 │                              │                              │
 │   (drop session) ───────────>│── Close ────────────────────>│ abort subscriber tasks

Coordinator internals

  1. Validation: ControlEvent::TopicSubscribe is sent to the coordinator event loop, which checks that the dataflow exists and has publish_all_messages_to_zenoh: true enabled
  2. Lazy Zenoh: The coordinator’s Zenoh session is opened on the first TopicSubscribe request and reused for subsequent subscriptions on the same WS connection
  3. Per-topic tasks: Each (node_id, data_id) pair spawns a tokio task that subscribes to the corresponding Zenoh topic and forwards samples to the binary frame channel
  4. Backpressure: The binary frame channel has capacity 64. try_send is used – if the channel is full (slow consumer), samples are silently dropped rather than blocking the Zenoh subscriber
  5. Cleanup: When the WS connection closes, all subscriber tasks are aborted
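The drop-on-full forwarding in step 4 can be sketched with a bounded channel. Here a std `sync_channel` stands in for the tokio binary-frame channel; `CHANNEL_CAPACITY`, `forward_sample`, and the frame layout are illustrative names, not Adora's actual API:

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

const CHANNEL_CAPACITY: usize = 64;

/// Forward one Zenoh sample to the binary-frame channel.
/// Returns false if the frame was dropped because the consumer is slow.
fn forward_sample(tx: &SyncSender<Vec<u8>>, sub_id: [u8; 16], payload: &[u8]) -> bool {
    // Frame layout: 16-byte subscription UUID followed by the raw payload.
    let mut frame = Vec::with_capacity(16 + payload.len());
    frame.extend_from_slice(&sub_id);
    frame.extend_from_slice(payload);
    match tx.try_send(frame) {
        Ok(()) => true,
        // Slow consumer: drop the sample instead of blocking the subscriber.
        Err(TrySendError::Full(_)) => false,
        Err(TrySendError::Disconnected(_)) => false,
    }
}

fn main() {
    let (tx, rx) = sync_channel::<Vec<u8>>(CHANNEL_CAPACITY);
    let sent = forward_sample(&tx, [0u8; 16], b"payload");
    assert!(sent);
    let frame = rx.recv().unwrap();
    assert_eq!(&frame[..16], &[0u8; 16]);
    assert_eq!(&frame[16..], b"payload");
}
```

The key property is that `try_send` never blocks: when the consumer falls behind, the subscriber task keeps draining Zenoh and freshness wins over completeness.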

WsSession (CLI side)

The WsSession::subscribe_topics() method:

  1. Serializes a TopicSubscribe request
  2. Sends SessionCommand::SubscribeTopics through the internal command channel
  3. The async session_loop wraps it as a WsRequest and sends it
  4. On receiving the TopicSubscribed ack, registers the data_tx sender in topic_subscribers keyed by subscription_id
  5. Binary frames are dispatched by extracting the first 16 bytes as UUID and sending the remainder to the matching data_tx

State maintained in session_loop:

  • pending_topic_subscribes: HashMap<Uuid, (ack_tx, data_tx)> – awaiting ack
  • topic_subscribers: HashMap<Uuid, Sender> – active subscriptions receiving binary data
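The binary-frame dispatch in step 5 can be sketched as follows; `split_frame` and the `HashMap` standing in for the per-subscription `data_tx` senders are illustrative, not Adora's actual types:

```rust
use std::collections::HashMap;

type SubId = [u8; 16];

/// Split a binary frame into (subscription id, payload).
/// Frames shorter than 16 bytes are invalid and dropped.
fn split_frame(frame: &[u8]) -> Option<(SubId, &[u8])> {
    if frame.len() < 16 {
        return None; // too short: warn and drop
    }
    let mut id = [0u8; 16];
    id.copy_from_slice(&frame[..16]);
    Some((id, &frame[16..]))
}

fn main() {
    // Maps subscription id -> received payloads, standing in for data_tx.
    let mut topic_subscribers: HashMap<SubId, Vec<Vec<u8>>> = HashMap::new();
    topic_subscribers.insert([7u8; 16], Vec::new());

    let mut frame = vec![7u8; 16];
    frame.extend_from_slice(b"arrow-bytes");

    if let Some((id, payload)) = split_frame(&frame) {
        if let Some(bucket) = topic_subscribers.get_mut(&id) {
            bucket.push(payload.to_vec()); // known subscription: dispatch
        } // unknown UUID: frame dropped silently
    }
    assert_eq!(topic_subscribers[&[7u8; 16]][0], b"arrow-bytes");
}
```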

Prerequisites

The dataflow descriptor must enable debug message publishing:

_unstable_debug:
  publish_all_messages_to_zenoh: true

Without this, the coordinator rejects the TopicSubscribe with:

dataflow {id} not found or publish_all_messages_to_zenoh not enabled

CLI Commands

adora topic echo

Stream topic data to the terminal in real-time.

# Echo a single topic
adora topic echo -d my-dataflow camera_node/image

# Echo multiple topics
adora topic echo -d my-dataflow robot1/pose robot2/vel

# JSON output for piping
adora topic echo -d my-dataflow robot1/pose --format json

Internally: calls session.subscribe_topics(), receives Timestamped<InterDaemonEvent> from the data_rx channel, deserializes Arrow data, and renders as table or JSON.

adora topic hz

Interactive TUI displaying per-topic publish frequency statistics.

# All topics
adora topic hz -d my-dataflow --window 10

# Specific topics
adora topic hz -d my-dataflow robot1/pose robot2/vel --window 5

Uses ratatui for the TUI. A background std::thread receives events from data_rx and dispatches to per-topic HzStats trackers via a BTreeMap<(node_id, data_id), index> lookup.
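A minimal frequency tracker along these lines, assuming a simple sliding-window design; the struct and method names are illustrative, not Adora's actual HzStats:

```rust
use std::collections::VecDeque;
use std::time::Duration;

struct HzStats {
    window: Duration,
    // Arrival timestamps relative to some start instant.
    arrivals: VecDeque<Duration>,
}

impl HzStats {
    fn new(window: Duration) -> Self {
        Self { window, arrivals: VecDeque::new() }
    }

    /// Record a message arrival at time `now` and evict samples
    /// that have fallen out of the window.
    fn record(&mut self, now: Duration) {
        self.arrivals.push_back(now);
        while let Some(&front) = self.arrivals.front() {
            if now - front > self.window {
                self.arrivals.pop_front();
            } else {
                break;
            }
        }
    }

    /// Messages per second over the current window.
    fn hz(&self) -> f64 {
        self.arrivals.len() as f64 / self.window.as_secs_f64()
    }
}

fn main() {
    let mut stats = HzStats::new(Duration::from_secs(5));
    // Simulate a 10 Hz topic for one second.
    for i in 0..10 {
        stats.record(Duration::from_millis(i * 100));
    }
    println!("~{:.1} messages/s over the window", stats.hz());
}
```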

adora topic info

One-shot topic metadata and statistics.

adora topic info -d my-dataflow camera_node/image --duration 5

Collects messages for --duration seconds, then displays type information, publisher, subscribers (from descriptor), message count, and bandwidth.
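The count and bandwidth derivation is straightforward arithmetic over the collected samples; a sketch, with an illustrative helper name:

```rust
/// Derive (message count, bytes per second) from the sizes of messages
/// collected over `duration_secs`.
fn bandwidth_stats(message_sizes: &[usize], duration_secs: f64) -> (usize, f64) {
    let count = message_sizes.len();
    let total_bytes: usize = message_sizes.iter().sum();
    (count, total_bytes as f64 / duration_secs)
}

fn main() {
    // 50 messages of 4 KiB collected over 5 seconds.
    let sizes = vec![4096usize; 50];
    let (count, bw) = bandwidth_stats(&sizes, 5.0);
    println!("{count} messages, {:.1} KiB/s", bw / 1024.0);
}
```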

adora record --proxy

Stream dataflow data through WebSocket for local recording.

# Start dataflow first
adora start dataflow.yml --detach

# Record via proxy (data streams through coordinator to CLI)
adora record dataflow.yml --proxy -o capture.adorec

# Record specific topics
adora record dataflow.yml --proxy --topics sensor/image,lidar/points

Use case: the target machine (running the daemon) has no local disk or limited storage. The --proxy flag routes data through the coordinator WebSocket to the CLI machine, where the .adorec file is written locally.

Without --proxy (default), a record node is injected into the dataflow and records directly on the daemon’s machine.


Zenoh Topic Format

The coordinator subscribes to Zenoh topics using the format from adora_core::topics::zenoh_output_publish_topic():

adora/{dataflow_id}/{node_id}/{data_id}

Each topic carries Timestamped<InterDaemonEvent> as its payload, serialized with bincode. The coordinator forwards these bytes as-is (prepended with subscription UUID) – no re-serialization.
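Assembling the topic key is plain string interpolation; this sketch borrows the helper's name from the format above but is only illustrative, not the actual implementation in adora_core:

```rust
// Illustrative reimplementation of the topic-key format; the real helper
// lives in adora_core::topics.
fn zenoh_output_publish_topic(dataflow_id: &str, node_id: &str, data_id: &str) -> String {
    format!("adora/{dataflow_id}/{node_id}/{data_id}")
}

fn main() {
    let topic = zenoh_output_publish_topic("df-1234", "camera_node", "image");
    assert_eq!(topic, "adora/df-1234/camera_node/image");
    println!("{topic}");
}
```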


Backpressure and Performance

| Parameter | Value | Rationale |
|---|---|---|
| Binary frame channel capacity | 64 | Balance between latency and memory |
| Drop policy | Drop on full | Prefer freshness over completeness |
| Binary format | Raw bincode (no base64) | Avoid 33% overhead for large payloads |

For high-throughput topics (camera images, point clouds), the binary frame channel may fill up if the WS connection is slow. Dropped samples are silent – the CLI will show reduced frequency in topic hz but won’t stall.


Error Handling

| Error | Source | Response |
|---|---|---|
| Dataflow not found | Coordinator validation | WsResponse with error message |
| publish_all_messages_to_zenoh not enabled | Coordinator validation | WsResponse with error message |
| Zenoh session open failure | Coordinator | WsResponse with error message |
| Zenoh subscriber failure | Per-topic task | Warning log, task exits |
| Binary frame too short (<16 bytes) | CLI session_loop | Warning log, frame dropped |
| Unknown subscription UUID | CLI session_loop | Frame dropped silently |
| WS connection closed | Either side | All tasks aborted, pending acks get error |

Test Coverage

| Tier | Location | What’s covered |
|---|---|---|
| Unit (client) | binaries/cli/src/ws_client.rs | handle_response_topic_subscribe_ack – verifies ack routing and subscriber registration |
| Unit (all existing) | binaries/cli/src/ws_client.rs | Updated to pass topic subscribe state through handle_response |

The TopicSubscribe / binary frame path is primarily validated via integration testing with a running coordinator and Zenoh session. See Testing Guide for smoke test instructions.

Adora Testing Guide

This guide covers how to run, write, and troubleshoot tests across the Adora workspace.

Quick Start (5-minute validation)

Run these three commands to validate that the workspace is healthy:

# 1. Format check (~5s)
cargo fmt --all -- --check

# 2. Lint (~60s first run, cached after)
cargo clippy --all \
  --exclude adora-node-api-python \
  --exclude adora-operator-api-python \
  --exclude adora-ros2-bridge-python \
  -- -D warnings

# 3. Unit + integration tests (~90s first run)
cargo test --all \
  --exclude adora-node-api-python \
  --exclude adora-operator-api-python \
  --exclude adora-ros2-bridge-python

All three must pass before opening a PR. Python packages are excluded because they require maturin.

Test Tiers

| Tier | What it covers | Command | Speed |
|---|---|---|---|
| Format | Code style | cargo fmt --all -- --check | ~5s |
| Lint | Warnings, correctness | cargo clippy --all ... | ~60s |
| Unit | Individual functions | cargo test --all ... | ~90s |
| CLI | Command parsing, validation | cargo test -p adora-cli | ~5s |
| Integration | Node I/O via env vars | cargo test --test example-tests | ~30s |
| Smoke | Full CLI lifecycle | cargo test --test example-smoke -- --test-threads=1 | ~3min |
| E2E | Multi-dataflow scenarios | cargo test --test ws-cli-e2e -- --ignored --test-threads=1 | ~2min |
| Fault tolerance | Restart policies, timeouts | cargo test --test fault-tolerance-e2e | ~45s |
| Typos | Spelling | Install typos-cli, then typos | ~2s |

Tier Details

Unit Tests

Unit tests live alongside the code they test using #[cfg(test)] modules. Key crates with tests:

| Crate | Test count | What’s tested |
|---|---|---|
| adora-arrow-convert | ~26 | Round-trip Arrow type conversions |
| adora-cli | ~96 | Command parsing, value parsers, log grep/filtering, JSON parsing, WebSocket client, cluster config |
| adora-coordinator | ~24 | WS control/daemon plane, health check, concurrent requests, artifact store, rate limiter, error sanitization |
| adora-coordinator-store | ~10 | In-memory and redb CRUD, schema versioning, persistence |
| adora-core | ~8 | Dataflow descriptor validation |
| adora-daemon | ~2 | Shlex argument parsing |
| adora-node-api | ~10 | Input tracking, service/action helpers (ID generation, send_service_request/response) |
| adora-log-utils | ~11 | Log parsing utilities |
| adora-message | ~36 | Common types, WS protocol, node/data IDs, metadata, auth tokens |
| ros2-bridge | ~30 | ROS2 message/service/action parsing |

Run a single crate’s tests:

cargo test -p adora-cli
cargo test -p adora-core
cargo test -p adora-arrow-convert

CLI Tests

CLI tests verify command parsing, argument validation, and value parsers without running any commands. They live in #[cfg(test)] modules inside the CLI crate.

What’s tested:

  • Clap schema validation (Args::command().debug_assert())
  • Parsing of every subcommand (run, up, down, start, stop, list, logs, build, graph, new, status, inspect top, topic list/hz/echo, node list)
  • Rejection of unknown subcommands
  • --help and --version exit codes
  • Value parsers: parse_store_spec (coordinator store backend), parse_window (topic hz window)
  • Utility functions: parse_version_from_pip_show

How to run:

cargo test -p adora-cli

How to add new tests:

When adding a new CLI subcommand or value parser, add a corresponding test in the #[cfg(test)] module of the same file. For subcommand parsing, add a parse_ok call in binaries/cli/src/command/mod.rs. For value parsers, add tests in the file that defines the parser function.

Integration Tests (Node I/O)

File: tests/example-tests.rs

These tests run compiled node executables with pre-recorded inputs and compare outputs against expected baselines. No coordinator or daemon is needed.

cargo test --test example-tests

How it works:

  1. Builds and runs a node crate (e.g., rust-dataflow-example-node)
  2. Sets ADORA_TEST_WITH_INPUTS to a JSON file with timed events
  3. Sets ADORA_TEST_NO_OUTPUT_TIME_OFFSET=1 for deterministic output
  4. Compares JSONL output against tests/sample-inputs/expected-outputs-*.jsonl

Sample input/output files live in tests/sample-inputs/.
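For orientation, a timed-events input file has roughly this shape, inferred from the TimedIncomingEvent fields used in the Rust testing API later in this guide. Treat the exact field names and enum encoding as an assumption; a recorded file in tests/sample-inputs/ is the authoritative reference:

```json
[
  { "time_offset_secs": 0.01,
    "event": { "Input": { "id": "tick", "metadata": null, "data": null } } },
  { "time_offset_secs": 0.055, "event": "Stop" }
]
```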

Smoke Tests

File: tests/example-smoke.rs

Two execution modes are tested for each applicable example:

  • Networked (adora up + adora start --detach + poll + adora stop + adora down): exercises the full coordinator/daemon WS control plane.
  • Local (adora run --stop-after): runs everything in-process, testing the single-process dataflow path.

# Must run single-threaded (shared coordinator port)
cargo test --test example-smoke -- --test-threads=1

# Run only networked or local tests
cargo test --test example-smoke smoke_rust -- --test-threads=1
cargo test --test example-smoke smoke_local -- --test-threads=1

A bash script is also available for quick local validation:

./scripts/smoke-all.sh              # all examples
./scripts/smoke-all.sh --rust-only  # Rust examples only
./scripts/smoke-all.sh --python-only # Python examples only

Networked tests (17):

| Test | Example | Timeout |
|---|---|---|
| smoke_rust_dataflow | rust-dataflow/dataflow.yml | 30s |
| smoke_rust_dataflow_dynamic | rust-dataflow/dataflow_dynamic.yml | 30s |
| smoke_rust_dataflow_socket | rust-dataflow/dataflow_socket.yml | 30s |
| smoke_rust_dataflow_url | rust-dataflow-url/dataflow.yml | 30s |
| smoke_benchmark | benchmark/dataflow.yml | 30s |
| smoke_log_sink_file | log-sink-file/dataflow.yml | 30s |
| smoke_log_sink_alert | log-sink-alert/dataflow.yml | 30s |
| smoke_log_sink_tcp | log-sink-tcp/dataflow.yml | 30s |
| smoke_python_dataflow | python-dataflow/dataflow.yml | 30s |
| smoke_python_async | python-async/dataflow.yaml | 15s |
| smoke_python_drain | python-drain/dataflow.yaml | 15s |
| smoke_python_log | python-log/dataflow.yaml | 15s |
| smoke_python_logging | python-logging/dataflow.yml | 15s |
| smoke_python_multiple_arrays | python-multiple-arrays/dataflow.yml | 15s |
| smoke_python_concurrent_rw | python-concurrent-rw/dataflow.yml | 15s |
| smoke_service_example | service-example/dataflow.yml | 30s |
| smoke_action_example | action-example/dataflow.yml | 30s |

Local tests (9):

| Test | Example | stop-after |
|---|---|---|
| smoke_local_python_dataflow | python-dataflow/dataflow.yml | 30s |
| smoke_local_python_async | python-async/dataflow.yaml | 10s |
| smoke_local_python_drain | python-drain/dataflow.yaml | 10s |
| smoke_local_python_log | python-log/dataflow.yaml | 10s |
| smoke_local_python_logging | python-logging/dataflow.yml | 10s |
| smoke_local_python_multiple_arrays | python-multiple-arrays/dataflow.yml | 10s |
| smoke_local_python_concurrent_rw | python-concurrent-rw/dataflow.yml | 10s |
| smoke_local_service_example | service-example/dataflow.yml | 10s |
| smoke_local_action_example | action-example/dataflow.yml | 10s |

Examples requiring special dependencies (webcam, CUDA, ROS2, C/C++ toolchain, multi-machine deploy) are not included in smoke tests.

E2E Tests (WebSocket CLI)

File: tests/ws-cli-e2e.rs

Two groups:

Non-ignored (fast): Start an in-process coordinator and test WsSession directly:

cargo test --test ws-cli-e2e

  • cli_list_empty – empty dataflow listing
  • cli_status_no_daemon – daemon connectivity check
  • cli_stop_nonexistent – error for missing dataflows
  • cli_multiple_requests_same_session – session reuse

Ignored (full stack): Use adora up with real nodes:

cargo test --test ws-cli-e2e -- --ignored --test-threads=1

  • e2e_start_list_stop – start, list, stop lifecycle
  • e2e_sequential_dataflows – two dataflows in sequence

Fault Tolerance Tests

File: tests/fault-tolerance-e2e.rs

These test restart policies and input timeouts using Daemon::run_dataflow directly (no CLI needed).

cargo test --test fault-tolerance-e2e

Tests:

  • restart_recovers_from_failure – node with restart_policy: on-failure survives panics (15s)
  • max_restarts_limit_reached – node exhausts max_restarts: 2 budget (15s)
  • input_timeout_closes_stale_input – input_timeout: 2.0s fires when upstream stops (10s)

Dataflow YAMLs for these tests live in tests/dataflows/.
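For orientation, the fields these tests exercise look roughly like this in a dataflow descriptor. The field placement here is an assumption, so consult the YAMLs in tests/dataflows/ for the authoritative schema:

```yaml
nodes:
  - id: flaky_node
    restart_policy: on-failure   # restart after panics
    max_restarts: 2              # give up after two restarts
  - id: consumer
    inputs:
      data: flaky_node/out
    input_timeout: 2.0           # seconds of upstream silence before the input closes
```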

Coordinator Integration Tests

Files: binaries/coordinator/tests/ws_control_tests.rs, binaries/coordinator/tests/ws_daemon_tests.rs

These start an in-process coordinator and test the WebSocket control/daemon planes.

cargo test -p adora-coordinator

Topics covered: health check, list/stop/destroy requests, invalid JSON/params, concurrent requests, ping/pong, daemon registration, disconnect cleanup, error sanitization (no internal chain leaks), artifact store cleanup on drop.

CI Pipeline

CI runs on push/PR to main. See .github/workflows/ci.yml.

fmt  ──────────────┐
clippy ────────────┤ (all run in parallel)
test ──────────────┤
typos ─────────────┘
                   │
              e2e (depends on test)

| Job | Runner | What runs |
|---|---|---|
| fmt | ubuntu-latest | cargo fmt --all -- --check |
| clippy | ubuntu-latest | cargo clippy --all ... -- -D warnings |
| test | ubuntu-latest | cargo test --all ... (excl. Python + adora-examples) |
| e2e | ubuntu-latest | example-tests, fault-tolerance, smoke tests, WS E2E |
| typos | ubuntu-latest | crate-ci/typos@master |

The e2e job only runs after test passes. All other jobs run in parallel.

Writing New Tests

Unit tests

Add a #[cfg(test)] module in the same file as the code under test:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_valid_input() {
        let result = parse("valid");
        assert_eq!(result, expected);
    }
}

Integration tests for nodes

Use the integration testing framework in adora-node-api. Three approaches:

1. setup_integration_testing (recommended)

Call before the node’s main function to inject inputs and capture outputs:

#[test]
fn test_main_function() -> eyre::Result<()> {
    let events = vec![
        TimedIncomingEvent {
            time_offset_secs: 0.01,
            event: IncomingEvent::Input {
                id: "tick".into(),
                metadata: None,
                data: None,
            },
        },
        TimedIncomingEvent {
            time_offset_secs: 0.055,
            event: IncomingEvent::Stop,
        },
    ];
    let inputs = TestingInput::Input(
        IntegrationTestInput::new("node_id".parse().unwrap(), events),
    );
    let (tx, rx) = flume::unbounded();
    let outputs = TestingOutput::ToChannel(tx);
    let options = TestingOptions { skip_output_time_offsets: true };

    integration_testing::setup_integration_testing(inputs, outputs, options);
    crate::main()?;

    let outputs = rx.try_iter().collect::<Vec<_>>();
    assert_eq!(outputs, expected_outputs);
    Ok(())
}

2. Environment variable mode

Test the compiled executable directly, closest to production behavior:

ADORA_TEST_WITH_INPUTS=path/to/inputs.json \
ADORA_TEST_NO_OUTPUT_TIME_OFFSET=1 \
ADORA_TEST_WRITE_OUTPUTS_TO=/tmp/out.jsonl \
cargo run -p my-node

3. AdoraNode::init_testing

For testing node logic without going through main:

let (node, events) = AdoraNode::init_testing(inputs, outputs, Default::default())?;

Generating test input files

Record real dataflow events by setting ADORA_WRITE_EVENTS_TO:

ADORA_WRITE_EVENTS_TO=/tmp/recorded-events adora run examples/rust-dataflow/dataflow.yml

This writes inputs-{node_id}.json files that can be used directly with ADORA_TEST_WITH_INPUTS.

Workspace-level integration tests

Add new test files in the tests/ directory. For tests that need the full CLI stack, follow the patterns in tests/example-smoke.rs:

Networked pattern (exercises coordinator + daemon):

  1. Build nodes with Once guards (avoid rebuilding per test)
  2. Clean up stale processes with adora down
  3. Start cluster with adora up
  4. Run dataflow with adora start --detach
  5. Poll adora list --json for completion
  6. Clean up with adora stop --all and adora down

Local pattern (single-process, in-process coordinator):

  1. Build CLI with Once guard
  2. Run adora run <yaml> --stop-after <duration>
  3. Assert exit code is success

Conventions

  • Use assert2::assert! for better error messages (available as dev-dependency)
  • Use tempfile::NamedTempFile for temporary output files
  • E2E tests that need exclusive port access should be #[ignore] and run with --test-threads=1
  • Async tests use #[tokio::test(flavor = "multi_thread")]
  • Fault tolerance test dataflows go in tests/dataflows/
  • Sample input/output baselines go in tests/sample-inputs/

Troubleshooting

cargo test fails to compile Python packages

Always exclude Python packages:

cargo test --all \
  --exclude adora-node-api-python \
  --exclude adora-operator-api-python \
  --exclude adora-ros2-bridge-python

Smoke/E2E tests fail with “address already in use”

A stale coordinator or daemon is still running. Clean up:

adora down
# or kill processes manually:
pkill -f adora-coordinator
pkill -f adora-daemon

Smoke tests hang or timeout

  • Increase the timeout in the test if your machine is slow (look for Duration::from_secs(...))
  • Check that example nodes build successfully:
    cargo build -p rust-dataflow-example-node -p rust-dataflow-example-status-node \
      -p rust-dataflow-example-sink -p rust-dataflow-example-sink-dynamic
    cargo build -p log-sink-file -p log-sink-alert -p log-sink-tcp
    cargo build --release -p benchmark-example-node -p benchmark-example-sink
    
  • For Python smoke tests, ensure pyarrow and numpy are installed

E2E tests fail when run in parallel

Smoke and ignored E2E tests must run single-threaded:

cargo test --test example-smoke -- --test-threads=1
cargo test --test ws-cli-e2e -- --ignored --test-threads=1

Integration test output doesn’t match expected

  1. Check that ADORA_TEST_NO_OUTPUT_TIME_OFFSET=1 is set (time offsets vary per machine)
  2. Regenerate baselines if the node’s behavior intentionally changed:
    ADORA_TEST_WITH_INPUTS=tests/sample-inputs/inputs-rust-node.json \
    ADORA_TEST_NO_OUTPUT_TIME_OFFSET=1 \
    ADORA_TEST_WRITE_OUTPUTS_TO=tests/sample-inputs/expected-outputs-rust-node.jsonl \
    cargo run -p rust-dataflow-example-node
    

Typos check fails

The typos config is in _typos.toml. To add a false-positive exclusion:

[default.extend-identifiers]
MyCustomIdent = "MyCustomIdent"

Tests pass locally but fail in CI

  • CI runs on Ubuntu; check for platform-specific assumptions (paths, process signals)
  • CI uses rust-cache so dependency versions may differ from your local lockfile
  • Ensure cargo fmt --all -- --check passes (CI enforces this)