Adora
Agentic Dataflow-Oriented Robotic Architecture – a 100% Rust framework for building real-time robotics and AI applications.
Why Adora?
Performance
- 10-17x faster than ROS2 Python – 100% Rust internals with zero-copy shared memory IPC for messages >= 4 KiB, flat latency from 4 KiB to 4 MB payloads
- Apache Arrow native – columnar memory format end-to-end with zero serialization overhead; shared across all language bindings
Developer Experience
- Single CLI, full lifecycle – `adora run` for local dev, `adora up`/`adora start` for distributed prod, plus build, logs, monitoring, and record/replay all from one tool
- Declarative YAML dataflows – define pipelines as directed graphs, connect nodes through typed inputs/outputs, optional type annotations with static validation
- Multi-language nodes – write nodes in Rust, Python, C, or C++ with native APIs (not wrappers); mix languages freely in one dataflow
- Reusable modules – compose sub-graphs as standalone YAML files with typed inputs/outputs, parameters, and nested composition
- Hot reload – live-reload Python operators without restarting the dataflow
Production Readiness
- Fault tolerance – per-node restart policies (never/on-failure/always), exponential backoff, health monitoring, circuit breakers with configurable input timeouts
- Distributed by default – local shared memory between co-located nodes, automatic Zenoh pub-sub for cross-machine communication, SSH-based cluster management with label scheduling
- Configurable queue policies – `drop_oldest` (default) or `backpressure` per input, with metrics on dropped messages
- OpenTelemetry – built-in structured logging with rotation/routing, metrics, distributed tracing
Debugging and Observability
- Record/replay – capture dataflow messages to `.adorec` files, replay offline at any speed with node substitution
- Topic inspection – `topic echo` to print live data, `topic hz` TUI for frequency analysis, `topic info` for schema and bandwidth
- Resource monitoring – `adora top` TUI showing per-node CPU, memory, queue depth, network I/O across all machines
- Log aggregation – subscribe to `adora/logs` to receive structured log messages from all nodes without extra wiring
- Trace inspection – `trace list` and `trace view` for viewing coordinator spans without external infrastructure
Ecosystem
- Communication patterns – built-in service (request/reply), action (goal/feedback/result), and streaming (session/segment/chunk) patterns via well-known metadata keys
- ROS2 bridge – bidirectional interop with ROS2 topics, services, and actions
- In-process operators – lightweight functions that run inside a shared runtime, avoiding per-node process overhead
Next Steps
- Install Adora
- Quick Start tutorial
- Architecture overview
- Dataflow YAML reference
- Type annotations
- Communication patterns
Installation
From crates.io (recommended)
cargo install adora-cli # CLI (adora command)
pip install adora-rs # Python node/operator API
From source
git clone https://github.com/dora-rs/adora.git
cd adora
cargo build --release -p adora-cli
export PATH="$PATH:$(pwd)/target/release"
# Python API (requires maturin >= 1.8: pip install maturin)
# Must run from the package directory for dependency resolution
cd apis/python/node && maturin develop --uv && cd ../../..
Platform installers
macOS / Linux:
curl --proto '=https' --tlsv1.2 -LsSf \
https://github.com/dora-rs/adora/releases/latest/download/adora-cli-installer.sh | sh
Windows:
powershell -ExecutionPolicy ByPass -c "irm https://github.com/dora-rs/adora/releases/latest/download/adora-cli-installer.ps1 | iex"
Build features
| Feature | Description | Default |
|---|---|---|
tracing | OpenTelemetry tracing support | Yes |
metrics | OpenTelemetry metrics collection | No |
python | Python operator support (PyO3) | No |
redb-backend | Persistent coordinator state (redb) | No |
prometheus | Prometheus /metrics endpoint on coordinator | No |
cargo install adora-cli --features redb-backend
Verify
adora --version
adora status
Getting Started with Python
This guide walks you through writing Python nodes and operators for adora dataflows.
Prerequisites
cargo install adora-cli # CLI (adora command)
pip install adora-rs # Python node/operator API
The adora-rs package includes pyarrow as a dependency.
Building from source (instead of pip install adora-rs):
pip install maturin # requires >= 1.8
cd apis/python/node && maturin develop --uv && cd ../../..
Hello World: Sender and Receiver
Create three files:
sender.py – sends 100 numbered messages:
import pyarrow as pa
from adora import Node
node = Node()
for i in range(100):
node.send_output("message", pa.array([i]))
receiver.py – receives and prints messages:
from adora import Node
node = Node()
for event in node:
if event["type"] == "INPUT":
values = event["value"].to_pylist()
print(f"Received {event['id']}: {values}")
elif event["type"] == "STOP":
break
dataflow.yml – connects sender to receiver:
nodes:
- id: sender
path: sender.py
outputs:
- message
- id: receiver
path: receiver.py
inputs:
message: sender/message
Run it:
adora run dataflow.yml
Events
Every call to `node.next()` or iteration over `for event in node` returns an event dictionary:
| Key | Type | Description |
|---|---|---|
type | str | "INPUT", "INPUT_CLOSED", "STOP", or "ERROR" |
id | str | Input name (e.g. "message") – only for INPUT events |
value | pyarrow.Array or None | The data payload |
metadata | dict | Tracing/routing metadata |
Handle events by checking event["type"]:
for event in node:
match event["type"]:
case "INPUT":
process(event["id"], event["value"])
case "INPUT_CLOSED":
print(f"Input {event['id']} closed")
case "STOP":
break
Working with Arrow Data
All data flows through adora as Apache Arrow arrays. Common patterns:
import pyarrow as pa
# Simple values
node.send_output("count", pa.array([42]))
node.send_output("names", pa.array(["alice", "bob"]))
# Read values back
values = event["value"].to_pylist() # [42] or ["alice", "bob"]
# Structured data
struct = pa.StructArray.from_arrays(
[pa.array([1.5]), pa.array(["hello"])],
names=["x", "y"],
)
node.send_output("point", struct)
# Raw bytes (images, serialized data, etc.)
node.send_output("frame", pa.array(raw_bytes))
Operators
Operators are lightweight alternatives to nodes. They run inside the adora runtime process (no separate OS process), making them faster for simple transformations.
Define an Operator class with an on_event method:
# doubler_op.py
import pyarrow as pa
from adora import AdoraStatus
class Operator:
def on_event(self, event, send_output) -> AdoraStatus:
if event["type"] == "INPUT":
value = event["value"].to_pylist()[0]
send_output("doubled", pa.array([value * 2]), event["metadata"])
return AdoraStatus.CONTINUE
Reference it in YAML with operator instead of path:
nodes:
- id: timer
path: adora/timer/millis/500
outputs:
- tick
- id: doubler
operator:
python: doubler_op.py
inputs:
tick: timer/tick
outputs:
- doubled
When to use operators vs nodes:
| Nodes | Operators | |
|---|---|---|
| Process model | Separate OS process | In-process (shared runtime) |
| Startup cost | Higher | Lower |
| Isolation | Full process isolation | Shared memory space |
| Best for | Long-running, heavy compute | Lightweight transforms, filters |
Async Nodes
For nodes that need async I/O (HTTP calls, database queries, etc.), use recv_async():
import asyncio
from adora import Node
async def main():
node = Node()
for _ in range(50):
event = await node.recv_async()
if event["type"] == "STOP":
break
# Do async work here
result = await fetch_data(event["value"])
node.send_output("result", result)
asyncio.run(main())
See examples/python-async for a complete example.
Logging
Use node.log() for structured logging that integrates with adora logs:
node.log("info", "Processing item", {"count": str(i)})
Or use Python’s standard logging module – adora captures stdout/stderr automatically:
import logging
logging.info("Processing item %d", i)
See examples/python-logging for logging module integration.
Timers
Built-in timer nodes generate periodic ticks without writing any code:
nodes:
- id: tick-source
path: adora/timer/millis/100 # tick every 100ms
outputs:
- tick
- id: my-node
path: my_node.py
inputs:
tick: tick-source/tick
Also available: adora/timer/hz/30 for 30 Hz.
Next Steps
- Python API Reference – full API docs for Node, Operator, DataflowBuilder, CUDA
- Communication Patterns – service (request/reply) and action (goal/feedback/result) patterns
- Examples – python-dataflow, python-async, python-drain, python-concurrent-rw, python-multiple-arrays
- Distributed Deployment – running across multiple machines with `adora up`
Adora Architecture
Comprehensive architecture reference for Adora (AI-Dora, Agentic Dataflow-Oriented Robotic Architecture) — a 100% Rust framework for real-time robotics and AI applications.
Overview and Design Philosophy
Adora is built on four core principles:
- Dataflow-oriented: Applications are directed graphs of nodes connected by typed data channels. Nodes declare inputs and outputs; the framework handles routing, scheduling, and lifecycle.
- Zero-copy performance: Messages above 4 KiB use shared memory with 128-byte aligned buffers and atomic coordination, achieving 10-17x lower latency than ROS2.
- Multi-language: First-class support for Rust, Python (PyO3), C, and C++ nodes — all sharing the same Apache Arrow data format.
- Four-layer stack: Message protocol, core libraries, daemon/runtime execution, and CLI/coordinator orchestration.
Architecture Stack
┌─────────────────────────────────────────────────┐
│ CLI (adora) Coordinator (orchestrator) │ Layer 4: Orchestration
├─────────────────────────────────────────────────┤
│ Daemon (per-machine) Runtime (operators) │ Layer 3: Execution
├─────────────────────────────────────────────────┤
│ adora-core shared-memory-server Node API │ Layer 2: Core Libraries
├─────────────────────────────────────────────────┤
│ adora-message (protocol + Arrow types) │ Layer 1: Protocol
└─────────────────────────────────────────────────┘
Workspace Structure
Rust edition 2024, MSRV 1.85.0, workspace version 0.1.0. All crates share the workspace version.
Binaries (7)
| Path | Crate | Role |
|---|---|---|
binaries/cli | adora-cli | CLI binary (adora command) — build, run, stop dataflows |
binaries/coordinator | adora-coordinator | Orchestrates distributed multi-daemon deployments; WebSocket server |
binaries/daemon | adora-daemon | Spawns nodes, manages shared-memory/TCP communication per machine |
binaries/runtime | adora-runtime | In-process operator execution (Python/C/C++ via dlopen/PyO3) |
binaries/ros2-bridge-node | adora-ros2-bridge-node | ROS2 integration node |
binaries/record-node | adora-record-node | Records dataflow messages to .adorec format |
binaries/replay-node | adora-replay-node | Replays recorded messages from .adorec files |
Core Libraries (6)
| Path | Crate | Role |
|---|---|---|
libraries/message | adora-message | All inter-component message types, protocol definitions, Arrow metadata |
libraries/core | adora-core | Dataflow descriptor parsing, build utilities, Zenoh config |
libraries/shared-memory-server | shared-memory-server | Zero-copy IPC for messages >= 4 KiB |
libraries/recording | adora-recording | Recording format (.adorec): bincode header + entries + footer |
libraries/arrow-convert | adora-arrow-convert | Arrow type conversions (numeric, datetime) |
libraries/coordinator-store | adora-coordinator-store | State persistence for coordinator (in-memory or redb backend) |
Extension Libraries (5)
| Path | Crate | Role |
|---|---|---|
libraries/extensions/telemetry/tracing | adora-tracing | OpenTelemetry distributed tracing (OTLP exporter) |
libraries/extensions/telemetry/metrics | adora-metrics | System metrics collection (CPU, memory, disk) |
libraries/extensions/download | adora-download | HTTP file download utility for operator/node binaries |
libraries/extensions/ros2-bridge | adora-ros2-bridge | ROS2 integration: topic pub/sub, services, actions |
libraries/log-utils | adora-log-utils | Log parsing, merging, filtering, formatting |
API Crates (8)
| Path | Crate | Language |
|---|---|---|
apis/rust/node | adora-node-api | Rust |
apis/rust/operator | adora-operator-api | Rust |
apis/rust/operator/macros | adora-operator-api-macros | Rust (proc-macro) |
apis/rust/operator/types | adora-operator-api-types | Rust (FFI-safe types) |
apis/python/node | adora-node-api-python | Python (PyO3) – builds the adora module |
apis/python/operator | adora-operator-api-python | Python (PyO3) – compiled into adora-node-api-python |
apis/c/node | adora-node-api-c | C |
apis/c/operator | adora-operator-api-c | C/C++ |
Component Architecture
CLI
The adora command provides three command groups:
Lifecycle (run, up, down, build, start, stop, restart):
- `adora run` executes a dataflow locally without coordinator/daemon (single-machine shortcut)
- `adora up` / `adora down` manage coordinator + daemon infrastructure
- `adora start` / `adora stop` control dataflows on a running coordinator
Monitoring (list, logs, inspect, topic, node, record, replay, trace):
- Real-time inspection with `adora inspect top`
- Topic subscription and data inspection
- Recording and replay via `.adorec` files
Setup (status, new, graph, system, completion, self):
- Project scaffolding, dataflow visualization, self-update
Coordinator
The coordinator is an Axum-based WebSocket server that orchestrates distributed deployments.
┌──────────────────┐
│ Coordinator │
WS /api/control │ ┌────────────┐ │ WS /api/daemon
CLI ◄──────────────────► │ │ State │ │ ◄──────────────────► Daemon(s)
│ │ Store │ │
│ └────────────┘ │
│ /api/artifacts │
│ /health │
└──────────────────┘
WebSocket routes:
- `/api/control` — CLI control plane (build, start, stop, list, logs, topic subscribe)
- `/api/daemon` — daemon registration and event stream
- `/api/artifacts/{build_id}/{node_id}` — binary artifact downloads
- `/health` — health check endpoint
State management: In-memory by default, optional persistent storage via redb backend.
Daemon
The daemon runs one per machine and manages the lifecycle of all nodes on that machine.
┌──────────────────────────────────────────────────────┐
│ Daemon │
│ │
│ ┌──────────┐ ┌───────────┐ ┌──────────────────┐ │
│ │ Event │ │ Spawner │ │ Node Comm │ │
│ │ Loop │──│ (nodes) │ │ ┌──────────────┐ │ │
│ │ │ └───────────┘ │ │ TCP listener │ │ │
│ │ Sources: │ ┌───────────┐ │ │ Shmem server │ │ │
│ │ • Coord │ │ Fault │ │ │ Unix socket │ │ │
│ │ • Nodes │──│ Tolerance │ │ └──────────────┘ │ │
│ │ • Zenoh │ └───────────┘ └──────────────────┘ │
│ │ • Timers │ │
│ └──────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Running Dataflows │ │
│ │ ├─ Node A (process) ◄──► TCP/Shmem │ │
│ │ ├─ Node B (process) ◄──► TCP/Shmem │ │
│ │ └─ Runtime (operators) ◄──► TCP/Shmem │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
Event loop (Daemon::run_inner()): Async Tokio event loop merging:
- Coordinator commands (WebSocket)
- Node events (TCP/shared memory)
- Inter-daemon events (Zenoh)
- Heartbeat (5s interval), metrics collection (2s), health checks (5s default)
Node spawning:
- Create working directory for the node
- Set up communication channel (TCP, shmem, or Unix domain socket)
- Serialize `NodeConfig` to an environment variable
- Spawn the process with a sanitized environment (blocks `LD_PRELOAD`, `DYLD_INSERT_LIBRARIES`, etc.)
- Monitor via `ProcessHandle`
Runtime
The runtime executes in-process operators (Python, shared library, WASM) in a dedicated process.
┌──────────────────────────────┐
│ Runtime │
│ │
│ ┌────────────────────────┐ │
│ │ Operator Runner │ │
│ │ (separate thread) │ │
│ │ │ │
│ │ SharedLibrary → dlopen │ │
│ │ Python → PyO3 │ │
│ │ Wasm → (planned) │ │
│ └──────────┬─────────────┘ │
│ │ flume(2) │
│ ┌──────────▼─────────────┐ │
│ │ Event Merge Loop │ │
│ │ ├─ OperatorEvent │ │
│ │ └─ DaemonEvent │ │
│ └────────────────────────┘ │
└──────────────────────────────┘
- Single-threaded Tokio runtime
- Operator runs in a separate thread, communicates via a `flume::bounded(2)` channel
- Input queue size per data ID is configurable (default: 10)
Nodes
Nodes are standalone processes that communicate with the daemon.
Lifecycle:
- Node starts, reads `NodeConfig` from environment
- Registers with daemon via `DaemonRequest::Register`
- Subscribes to events via `DaemonRequest::Subscribe`
- Processes events in a loop (`NextEvent` → handle → `SendMessage`)
- Reports drop tokens for shared memory cleanup
- Signals completion via `OutputsDone`
Communication Protocols
CLI to Coordinator (WebSocket)
| Property | Value |
|---|---|
| Transport | WebSocket over TCP |
| Default port | 6013 |
| Auth | Bearer token in Authorization header |
| Control messages | JSON text frames (request/response/event) |
| Topic data | Binary frames: [16-byte UUID][bincode payload] |
| Rate limit | 20 connections per IP per 60s |
| Max connections | 256 |
JSON-RPC-like message format:
// Request (client → server)
{"id": "uuid", "method": "control", "params": {...}}
// Response (server → client)
{"id": "uuid", "result": {...}}
// or
{"id": "uuid", "error": "message"}
// Event (fire-and-forget, either direction)
{"event": "log", "payload": {...}}
Key control methods: Build, Start, Stop, List, Logs, TopicSubscribe, TopicUnsubscribe, Reload, Restart, Destroy.
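A minimal sketch of how a client might build and classify these frames (illustrative only; the actual CLI is Rust, and `make_request`/`classify_frame` are names invented here):

```python
import json
import uuid

def make_request(method: str, params: dict) -> str:
    """Build a control-plane request frame per the format above."""
    return json.dumps({"id": str(uuid.uuid4()), "method": method,
                       "params": params})

def classify_frame(text: str) -> str:
    """Distinguish the frame kinds by their keys."""
    msg = json.loads(text)
    if "event" in msg:
        return "event"    # fire-and-forget, no id to correlate
    if "error" in msg:
        return "error"    # failed response, match by id
    if "result" in msg:
        return "result"   # successful response, match by id
    return "request"

assert classify_frame('{"id": "abc", "result": {"dataflows": []}}') == "result"
```

Responses carry the request's `id`, so a client multiplexing several in-flight calls over one WebSocket can dispatch each response by that field, while `event` frames (such as streamed logs) arrive unsolicited.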
Coordinator to Daemon (WebSocket)
| Property | Value |
|---|---|
| Transport | WebSocket (daemon connects to coordinator) |
| Route | /api/daemon |
| Retry | Exponential backoff 1s → 30s, max 50 attempts |
| Registration | DaemonRegisterRequest with version, machine_id, labels |
Daemon events (daemon → coordinator): BuildResult, SpawnResult, AllNodesReady, AllNodesFinished, Heartbeat, StatusReport, Log, NodeMetrics, Exit.
Coordinator commands (coordinator → daemon): Build, Spawn, AllNodesReady, StopDataflow, ReloadDataflow, Logs, Destroy, Heartbeat.
Daemon to Node (Local)
Three transport options, configured via LocalCommunicationConfig:
TCP (default):
- Binds `127.0.0.1:0` (ephemeral port), `TCP_NODELAY` enabled
- Frame format: `[8-byte u64 LE length][bincode payload]`
- Max message: 64 MiB; read timeout: 30s
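The length-prefixed framing is simple to reproduce; here is a minimal Python sketch of the `[8-byte u64 LE length][payload]` frame format, using an in-memory buffer rather than a real socket (the payload would be bincode in the actual protocol):

```python
import io
import struct

MAX_MESSAGE_BYTES = 64 * 1024 * 1024  # 64 MiB limit from the list above

def write_frame(stream, payload: bytes) -> None:
    # [8-byte u64 little-endian length][payload]
    stream.write(struct.pack("<Q", len(payload)) + payload)

def read_frame(stream) -> bytes:
    (length,) = struct.unpack("<Q", stream.read(8))
    if length > MAX_MESSAGE_BYTES:
        raise ValueError("frame exceeds 64 MiB limit")
    return stream.read(length)

buf = io.BytesIO()
write_frame(buf, b"hello")
buf.seek(0)
assert read_frame(buf) == b"hello"
```

The same framing (with the same 8-byte LE prefix) is reused over the Unix domain socket transport, so a single codec covers both.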
Shared Memory (zero-copy):
- Four 4 KiB regions per node: control, events, drop tokens, events-close
- Used for messages >= 4096 bytes (`ZERO_COPY_THRESHOLD`)
- Atomic synchronization with acquire/release ordering
Unix Domain Socket (Unix only):
- Socket at `/tmp/{dataflow_id}/{node_id}.sock`
- Permissions: `0o700`
- Same bincode frame format as TCP
Node → Daemon requests: Register, Subscribe, SendMessage, CloseOutputs, OutputsDone, NextEvent, ReportDropTokens, SubscribeDrop, NodeConfig.
Daemon → Node replies: Result, PreparedMessage, NextEvents, NextDropEvents, NodeConfig, Empty.
Node events: Stop, Reload, Input, InputClosed, InputRecovered, NodeRestarted, AllInputsClosed.
Daemon to Daemon (Zenoh)
| Property | Value |
|---|---|
| Transport | Zenoh pub-sub |
| Router port | 7447 |
| Peer port | 5456 |
| Routing | linkstate |
| Serialization | bincode |
Topic pattern:
adora/{network_id}/{dataflow_id}/output/{node_id}/{output_id}
Default network_id is "default".
InterDaemonEvent:
- `Output { dataflow_id, node_id, output_id, metadata, data }` — data message
- `OutputClosed { dataflow_id, node_id, output_id }` — stream end
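Building and parsing the topic key can be sketched in a few lines of Python (this assumes none of the IDs contain `/`, which holds for validated `NodeId`/`DataId` values):

```python
def output_topic(network_id: str, dataflow_id: str,
                 node_id: str, output_id: str) -> str:
    """Compose the Zenoh key for a node output, per the pattern above."""
    return f"adora/{network_id}/{dataflow_id}/output/{node_id}/{output_id}"

def parse_topic(key: str):
    """Split a key back into its components; raises on malformed keys."""
    prefix, network_id, dataflow_id, kind, node_id, output_id = key.split("/")
    if prefix != "adora" or kind != "output":
        raise ValueError(f"not an adora output topic: {key}")
    return network_id, dataflow_id, node_id, output_id
```

A subscriber that wants every output in a dataflow can subscribe to the Zenoh wildcard `adora/{network_id}/{dataflow_id}/output/**`.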
Message Types and Wire Formats
Timestamped Wrapper
All inter-component messages are wrapped in a timestamp:
pub struct Timestamped<T> {
    pub inner: T,
    pub timestamp: uhlc::Timestamp, // hybrid logical clock
}
DataMessage
Transport abstraction for payloads:
pub enum DataMessage {
    Vec(AVec<u8, ConstAlign<128>>), // inline, 128-byte aligned
    SharedMemory {
        shared_memory_id: String,
        len: usize,
        drop_token: DropToken, // UUIDv7, tracks lifetime
    },
}
LogMessage
pub struct LogMessage {
    pub build_id: Option<BuildId>,
    pub dataflow_id: Option<DataflowId>,
    pub node_id: Option<NodeId>,
    pub daemon_id: Option<DaemonId>,
    pub level: LogLevelOrStdout, // Stdout | LogLevel(Error/Warn/Info/Debug/Trace)
    pub target: Option<String>,
    pub module_path: Option<String>,
    pub file: Option<String>,
    pub line: Option<u32>,
    pub message: String,
    pub timestamp: DateTime<Utc>,
    pub fields: Option<BTreeMap<String, String>>,
}
NodeError
pub struct NodeError {
    pub timestamp: uhlc::Timestamp,
    pub cause: NodeErrorCause,       // GraceDuration | Cascading | FailedToSpawn | Other
    pub exit_status: NodeExitStatus, // Success | IoError | ExitCode | Signal | Unknown
}
Data Format and Metadata
Apache Arrow
All data payloads use Apache Arrow columnar format with 128-byte alignment. Arrow type information is carried in every message via ArrowTypeInfo:
pub struct ArrowTypeInfo {
    pub data_type: DataType, // Arrow DataType
    pub len: usize,
    pub null_count: usize,
    pub validity: Option<Vec<u8>>, // null bitmap
    pub offset: usize,
    pub buffer_offsets: Vec<BufferOffset>,
    pub child_data: Vec<ArrowTypeInfo>, // recursive for nested types
}
Metadata
Every message carries structured metadata:
pub struct Metadata {
    metadata_version: u16,
    timestamp: uhlc::Timestamp,
    pub type_info: ArrowTypeInfo,
    pub parameters: MetadataParameters, // BTreeMap<String, Parameter>
}
Parameter Types
pub enum Parameter {
    Bool(bool),
    Integer(i64),
    String(String),
    ListInt(Vec<i64>),
    Float(f64),
    ListFloat(Vec<f64>),
    ListString(Vec<String>),
    Timestamp(DateTime<Utc>),
}
Well-Known Metadata Keys
| Key | Purpose |
|---|---|
request_id | Service request/reply correlation |
goal_id | Action goal identifier |
goal_status | Action completion: succeeded, aborted, canceled |
session_id | Streaming session identifier |
segment_id | Streaming segment within a session |
seq | Streaming chunk sequence number |
fin | Last chunk of a streaming segment |
flush | Discard older queued messages on input |
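As an illustration of how the `request_id` key supports the service pattern, a client can keep a map of pending requests and match replies by metadata. The sketch below simulates this in plain Python (the real `node.send_output` call is shown only as a comment; `pending` and the function names are illustrative, not part of the API):

```python
import uuid

# Client side of request/reply: tag each outgoing request with a
# `request_id` metadata key and correlate the reply against it.
pending = {}

def send_request(payload):
    request_id = str(uuid.uuid4())
    pending[request_id] = payload
    # In a real node:
    # node.send_output("request", data, {"request_id": request_id})
    return request_id

def on_reply(metadata, value):
    """Called for each reply event; returns (original request, reply value)."""
    request = pending.pop(metadata["request_id"], None)
    return request, value

rid = send_request({"question": "status"})
req, ans = on_reply({"request_id": rid}, "ok")
assert req == {"question": "status"} and ans == "ok"
```

The action pattern works the same way with `goal_id`/`goal_status`, except that several feedback messages may arrive before the terminal status, so the entry is popped only when `goal_status` is one of `succeeded`, `aborted`, or `canceled`.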
Zero-Copy Shared Memory
Architecture
┌────────────────────────────────────────────────────┐
│ Shared Memory Region │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────┐ ┌────┐ ┌────┐│
│ │ Server │ │ Client │ │Discon│ │Len │ │Data││
│ │ Event │ │ Event │ │(bool)│ │(u64)│ │ ││
│ └──────────┘ └──────────┘ └──────┘ └────┘ └────┘│
│ (raw_sync_2) (raw_sync_2) AtomicBool AtomicU64 │
└────────────────────────────────────────────────────┘
ShmemChannel
pub struct ShmemChannel {
    memory: Shmem,
    server_event: Box<dyn EventImpl>,
    client_event: Box<dyn EventImpl>,
    disconnect_offset: usize,
    len_offset: usize,
    data_offset: usize,
    server: bool,
}
Synchronization Protocol
Send (write → release store length → signal event → check disconnect):
- Copy data to shared memory buffer
- Store message length with `Release` ordering (publishes data)
- Signal event to wake receiver
- Check disconnect flag with `Acquire` ordering
Receive (wait event → check disconnect → acquire load length → read data):
- Wait for event signal
- Check disconnect flag with `Acquire` ordering
- Load message length with `Acquire` ordering (ensures all writes visible)
- Read and deserialize data from buffer
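The handshake can be modeled with a toy single-slot channel. This Python sketch uses a `threading.Event` in place of the shared-memory event and an ordinary attribute for the length slot; Python offers no explicit acquire/release orderings, so this only illustrates the sequence of steps, not the memory-model guarantees:

```python
import threading

class ToyShmemChannel:
    """Toy single-slot model of the send/receive handshake (illustrative)."""
    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.length = 0
        self.event = threading.Event()
        self.disconnected = False

    def send(self, data: bytes) -> None:
        self.buf[:len(data)] = data        # 1. copy into the shared buffer
        self.length = len(data)            # 2. publish the message length
        self.event.set()                   # 3. signal to wake the receiver
        if self.disconnected:              # 4. check the disconnect flag
            raise ConnectionError("peer disconnected")

    def recv(self) -> bytes:
        self.event.wait()                  # 1. wait for the event signal
        self.event.clear()
        if self.disconnected:              # 2. check the disconnect flag
            raise ConnectionError("peer disconnected")
        return bytes(self.buf[:self.length])  # 3-4. load length, read data

ch = ToyShmemChannel(4096)
ch.send(b"ping")
assert ch.recv() == b"ping"
```

In the real implementation the length store/load pair with `Release`/`Acquire` orderings is what makes step 3-4 of `recv` safe: seeing the length guarantees the buffer writes are visible.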
Thresholds and Limits
| Parameter | Value |
|---|---|
ZERO_COPY_THRESHOLD | 4096 bytes |
| Control region size | 4 KiB per node |
| Events region size | 4 KiB per node |
| Drop region size | 4 KiB per node |
| Max cache count | 20 regions |
| Max cache bytes | 256 MiB |
DropToken Lifecycle
- Sender allocates shared memory, generates a `DropToken` (UUIDv7)
- Sender transmits `DataMessage::SharedMemory { shared_memory_id, len, drop_token }`
- Receiver processes the data, returns the `drop_token` via `ReportDropTokens`
- Sender receives the confirmed token, returns the memory to a cache for reuse
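The sender-side invariant is that a region is never reused while its token is still in flight. A toy Python model of that bookkeeping (class and method names are illustrative, not the actual API):

```python
import uuid

class ShmemCache:
    """Toy model of the sender-side DropToken lifecycle: a region becomes
    reusable only after the receiver confirms its token."""
    def __init__(self):
        self.in_flight = {}  # drop_token -> region id
        self.free = []       # regions safe to reuse

    def send(self, region: str) -> str:
        token = str(uuid.uuid4())  # stands in for the UUIDv7 DropToken
        self.in_flight[token] = region
        return token

    def report_drop_token(self, token: str) -> None:
        # Receiver confirmed it no longer reads the region.
        self.free.append(self.in_flight.pop(token))

cache = ShmemCache()
t = cache.send("region-0")
assert cache.free == []            # still owned by the receiver
cache.report_drop_token(t)
assert cache.free == ["region-0"]  # now safe to reuse
```

The cache limits from the table above (20 regions, 256 MiB) would bound how many confirmed regions are kept for reuse before being unmapped.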
Dataflow Specification
YAML Format
nodes:
# Standard node (executable)
- id: my-node
build: cargo build --release
path: target/release/my-node
inputs:
tick: adora/timer/millis/100
data: other-node/output
outputs:
- result
restart_policy: on-failure
max_restarts: 3
restart_delay: 1.0
env:
DEBUG: true
# Single operator (Python)
- id: processor
operator:
python: process.py
inputs:
image: camera/frame
outputs:
- detection
# Multi-operator runtime
- id: pipeline
operators:
- id: stage1
python: stage1.py
inputs:
data: source/output
outputs:
- intermediate
- id: stage2
shared-library: target/release/libstage2.so
inputs:
data: stage1/intermediate
outputs:
- final
# ROS2 bridge
- id: ros-input
ros2:
topic: /robot/state
message_type: sensor_msgs/JointState
direction: subscribe
qos:
reliable: true
outputs:
- joints
Descriptor Structs
pub struct Descriptor {
    pub nodes: Vec<Node>,
    pub communication: CommunicationConfig,
    pub deploy: Option<Deploy>,
    pub debug: Debug,
    pub health_check_interval: Option<f64>, // default 5.0s
}
Node types (mutually exclusive fields):
- `path` — standard executable/script
- `operator` — single in-process operator
- `operators` — multiple in-process operators
- `custom` — legacy configuration
- `ros2` — declarative ROS2 bridge
Timer Nodes
Built-in timer nodes generate periodic ticks:
- `adora/timer/millis/<N>` — every N milliseconds
- `adora/timer/secs/<N>` — every N seconds
Operator Sources
pub enum OperatorSource {
    SharedLibrary(String), // .so/.dll path
    Python(PythonSource),  // Python module
    Wasm(String),          // WebAssembly (planned)
}
Deploy Configuration
pub struct Deploy {
    pub machine: Option<String>,
    pub working_dir: Option<PathBuf>,
    pub labels: BTreeMap<String, String>,
    pub distribute: DistributeStrategy, // Local | Scp | Http
}
Fault Tolerance
Restart Policies
pub enum RestartPolicy {
    Never,     // default
    OnFailure, // restart on non-zero exit
    Always,    // restart unless user-stopped or inputs closed
}
Configuration fields per node:
- `max_restarts` — 0 = unlimited
- `restart_delay` — initial backoff in seconds (doubles each attempt)
- `max_restart_delay` — caps the exponential backoff
- `restart_window` — reset the restart counter after N seconds (enables "N restarts per M seconds")
- `health_check_timeout` — kill the node if no activity within this duration
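Under one plausible reading of these fields (first retry waits `restart_delay`, each subsequent retry doubles, capped at `max_restart_delay`), the delay schedule works out as:

```python
def restart_delay(attempt: int, initial: float, cap: float) -> float:
    """Backoff before restart number `attempt` (0-based): the initial
    delay doubles each attempt and is capped at `cap` seconds."""
    return min(initial * (2 ** attempt), cap)

# restart_delay: 1.0, max_restart_delay: 30.0
delays = [restart_delay(n, initial=1.0, cap=30.0) for n in range(6)]
assert delays == [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

With `restart_window` set, the attempt counter resets to 0 once the node has run cleanly for that many seconds, so the schedule starts over from `restart_delay`.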
Health Monitoring
- Heartbeat interval: 5 seconds (daemon → coordinator)
- Health check interval: 5 seconds (configurable per dataflow)
- Metrics collection: 2-second interval (CPU, memory, disk, pending messages)
Circuit Breaker
Per-input timeout detection with automatic recovery:
- Input configured with `input_timeout: <seconds>`
- If no data arrives within the timeout → `InputClosed` event sent to the node
- Node marks the input as degraded, can use a cached last-known value
- When the upstream recovers → `InputRecovered` event, circuit breaker closes again
- Node status transitions: `Running` → `Degraded` → `Running`
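The per-input state machine can be sketched as follows (the `InputCircuitBreaker` class and its method names are illustrative; only the `Running`/`Degraded` statuses and the timeout semantics come from the docs above):

```python
class InputCircuitBreaker:
    """Toy per-input breaker: trips to Degraded after `timeout` seconds
    of silence, returns to Running when data arrives again."""
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.status = "Running"
        self.last_seen = 0.0

    def on_data(self, now: float) -> None:
        self.last_seen = now
        if self.status == "Degraded":
            self.status = "Running"   # corresponds to InputRecovered

    def check(self, now: float) -> None:
        """Periodic sweep, as done by the daemon health checks."""
        if self.status == "Running" and now - self.last_seen > self.timeout:
            self.status = "Degraded"  # corresponds to InputClosed

cb = InputCircuitBreaker(timeout=2.0)
cb.on_data(0.0)
cb.check(1.0)
assert cb.status == "Running"
cb.check(3.5)
assert cb.status == "Degraded"
cb.on_data(4.0)
assert cb.status == "Running"
```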
Cascading Error Tracking
pub struct CascadingErrorCauses {
    pub caused_by: BTreeMap<NodeId, NodeId>,
}
Tracks which node failure caused downstream failures, enabling root-cause analysis.
Fault Tolerance Metrics
pub struct FaultToleranceSnapshot {
    pub restarts: u64,
    pub health_check_kills: u64,
    pub input_timeouts: u64,
    pub circuit_breaker_recoveries: u64,
}
Reported per daemon via heartbeat events. Visible via adora inspect top.
Distributed Deployment
Multi-Daemon Architecture
┌──────────┐ Zenoh ┌──────────┐
│ Daemon A │◄──────────────────►│ Daemon B │
│ Machine 1│ pub/sub │ Machine 2│
│ │ │ │
│ Node 1 │ │ Node 3 │
│ Node 2 │ │ Node 4 │
└────┬─────┘ └────┬─────┘
│ WS │ WS
└──────────┐ ┌────────────────┘
▼ ▼
┌──────────┐
│Coordinator│
│ :6013 │
└──────────┘
Zenoh Topic Naming
adora/{network_id}/{dataflow_id}/output/{node_id}/{output_id}
- `network_id` isolates separate Adora clusters (default: `"default"`)
- Zenoh router port: 7447, peer port: 5456
- Routing mode: `linkstate`
Build Distribution
Three strategies via DistributeStrategy:
- Local — each daemon builds from source (default)
- Scp — CLI pushes built binaries via SSH/SCP
- Http — daemons pull from the coordinator's `/api/artifacts` endpoint
Machine Labels
Nodes can target specific machines via labels:
_unstable_deploy:
labels:
gpu: "true"
arch: "arm64"
Recording and Replay
.adorec Binary Format
[HEADER]
├─ MAGIC: 8 bytes ("ADORAREC")
├─ version: u16 LE (currently 1)
├─ start_nanos: u64 LE (Unix epoch nanoseconds)
├─ dataflow_id: 16 bytes (UUID)
├─ yaml_len: u32 LE
└─ descriptor_yaml: [u8; yaml_len]
[ENTRIES] (repeated)
├─ record_len: u32 LE
├─ node_id_len: u16 LE
├─ node_id: [u8; node_id_len]
├─ output_id_len: u16 LE
├─ output_id: [u8; output_id_len]
├─ timestamp_offset_nanos: u64 LE
├─ event_bytes_len: u32 LE
└─ event_bytes: [u8; event_bytes_len] (bincode InterDaemonEvent)
[FOOTER] (optional, written on clean finish)
├─ FOOTER_MAGIC: 8 bytes ("ADORAEND")
├─ total_messages: u64 LE
└─ total_bytes: u64 LE
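A reader for the fixed part of the header follows directly from the layout, assuming fields are packed back-to-back with no padding (a sketch for illustration, not the official parser):

```python
import struct
import uuid

MAGIC = b"ADORAREC"

def parse_header(buf: bytes):
    """Parse the .adorec header per the layout above (packed, little-endian)."""
    if buf[:8] != MAGIC:
        raise ValueError("not an .adorec file")
    (version,) = struct.unpack_from("<H", buf, 8)      # u16 LE at offset 8
    (start_nanos,) = struct.unpack_from("<Q", buf, 10) # u64 LE at offset 10
    dataflow_id = uuid.UUID(bytes=buf[18:34])          # 16-byte UUID
    (yaml_len,) = struct.unpack_from("<I", buf, 34)    # u32 LE at offset 34
    descriptor_yaml = buf[38:38 + yaml_len].decode()
    return version, start_nanos, dataflow_id, descriptor_yaml

# Round-trip against a hand-built sample header:
sample = (MAGIC + struct.pack("<H", 1) + struct.pack("<Q", 123)
          + uuid.UUID(int=0).bytes + struct.pack("<I", 6) + b"nodes:")
assert parse_header(sample) == (1, 123, uuid.UUID(int=0), "nodes:")
```

Because the footer is only written on a clean finish, a robust reader should treat a missing `ADORAEND` trailer as a truncated (but still replayable) recording.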
Writer/Reader API
pub struct RecordingWriter<W: Write> { /* ... */ }

impl<W: Write> RecordingWriter<W> {
    pub fn new(inner: W, header: &RecordingHeader) -> Result<Self>;
    pub fn write_entry(&mut self, entry: &RecordEntry) -> Result<()>;
    pub fn finish(self) -> Result<RecordingFooter>;
}

pub struct RecordingReader<R: Read> { /* ... */ }

impl<R: Read> RecordingReader<R> {
    pub fn open(inner: R) -> Result<Self>;
    pub fn header(&self) -> &RecordingHeader;
    pub fn next_entry(&mut self) -> Result<Option<RecordEntry>>;
}
Extensions
Telemetry
Distributed Tracing (adora-tracing):
- OpenTelemetry with OTLP exporter (compatible with Jaeger, Zipkin, Tempo)
- Context propagation across nodes
- Setup: `set_up_tracing(name: &str)`
Metrics (adora-metrics):
- System metrics via `sysinfo` (CPU, memory, disk)
- OpenTelemetry meter with OTLP exporter
- Async process observer: `run_metrics_monitor(meter_id)`
ROS2 Bridge
Declarative YAML-based ROS2 integration supporting:
Topics — subscribe (ROS2 → Adora) or publish (Adora → ROS2):
ros2:
topic: /camera/image
message_type: sensor_msgs/Image
direction: subscribe
Services — client or server role:
ros2:
service: /add_two_ints
service_type: example_interfaces/AddTwoInts
role: client
Actions — goal/feedback/result lifecycle:
ros2:
action: /fibonacci
action_type: example_interfaces/Fibonacci
role: client
QoS configuration:
qos:
reliable: true
durability: transient_local
keep_last: 10
Download
File download utility for fetching operator/node binaries from HTTP URLs. Sanitizes filenames, sets executable permissions on Unix.
Key Constants and Defaults
| Constant | Value | Location |
|---|---|---|
ADORA_COORDINATOR_PORT_WS_DEFAULT | 6013 | Coordinator WebSocket port |
ADORA_DAEMON_LOCAL_LISTEN_PORT_DEFAULT | 53291 | Daemon TCP listener port |
ZERO_COPY_THRESHOLD | 4096 bytes | Shared memory activation |
MAX_MESSAGE_BYTES | 64 MiB | Max TCP/bincode message |
MAX_CONTROL_MESSAGE_BYTES | 1 MiB | Max control plane JSON message |
TCP_READ_TIMEOUT | 30 seconds | Socket read timeout |
WS_PING_INTERVAL | 10 seconds | WebSocket keepalive |
MAX_WS_CONNECTIONS | 256 | Concurrent WebSocket limit |
MAX_CONNECTIONS_PER_IP | 20 / 60s | Rate limiting |
MAX_TOPICS_PER_SUBSCRIBE | 64 | Topic batch limit |
MAX_SUBSCRIPTIONS_PER_CONNECTION | 16 | Per-connection limit |
MAX_BINARY_PAYLOAD_BYTES | 64 MiB | Topic data frame limit |
WATCHDOG_INTERVAL | 5 seconds | Heartbeat to coordinator |
METRICS_INTERVAL | 2 seconds | Metrics collection |
HEALTH_CHECK_INTERVAL | 5 seconds | Default node health check |
MAX_BUFFERED_LOG_MESSAGES | 10,000 | Log buffer capacity |
MAX_PENDING_REPLIES | 256 | Pending coordinator replies |
MAX_ERROR_BYTES | 4096 | Max error message size |
| Default input queue size | 10 | Per-input message buffer |
Identifiers and Data Structures
ID Types
| Type | Underlying | Validation |
|---|---|---|
DataflowId | uuid::Uuid | Assigned on dataflow start |
SessionId | uuid::Uuid (v7) | Per CLI session |
BuildId | uuid::Uuid (v7) | Per build operation |
DaemonId | { machine_id: Option<String>, uuid: Uuid (v7) } | Persisted in .daemon-id |
NodeId | String | Validated: [a-zA-Z0-9_.-], non-empty |
DataId | String | Same validation as NodeId |
OperatorId | String | No validation |
DropToken | Uuid (v7) | Per shared-memory message |
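The `NodeId` rule from the table is a simple charset check; a Python equivalent (illustrative, the real validation lives in Rust):

```python
import re

# Allowed characters per the table: letters, digits, '_', '.', '-'; non-empty.
NODE_ID_RE = re.compile(r"^[a-zA-Z0-9_.-]+$")

def valid_node_id(s: str) -> bool:
    """True if `s` is a valid NodeId (or DataId, which shares the rule)."""
    return bool(NODE_ID_RE.match(s))

assert valid_node_id("camera_node-1.v2")
assert not valid_node_id("")         # empty is rejected
assert not valid_node_id("bad/id")   # '/' is the topic path separator
```

Excluding `/` matters because node and output IDs are spliced into Zenoh topic keys of the form `adora/{network_id}/{dataflow_id}/output/{node_id}/{output_id}`.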
Authentication
pub struct AuthToken(String); // 64 hex chars (32 bytes)
- Generated via cryptographically random bytes
- Stored at <working_dir>/.adora-token
- Constant-time comparison to prevent timing attacks
- Applied to all WebSocket routes
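The generation and comparison steps can be sketched with Python's standard library (illustrative only – Adora's internals are Rust; the function names here are hypothetical):

```python
import hmac
import secrets

def generate_token() -> str:
    # 32 cryptographically random bytes -> 64 hex characters
    return secrets.token_hex(32)

def token_matches(presented: str, stored: str) -> bool:
    # compare_digest runs in time independent of where the strings
    # first differ, defeating byte-by-byte timing attacks
    return hmac.compare_digest(presented.encode(), stored.encode())
```

`secrets` (not `random`) is the important choice here: it draws from the OS CSPRNG, so tokens are unpredictable.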
Node Status
pub enum NodeStatus {
    Running,    // healthy
    Restarting, // restart in progress
    Degraded,   // circuit breaker open (input timeout)
    Failed,     // terminal failure
}
Serialization Summary
| Channel | Format | Notes |
|---|---|---|
| CLI ↔ Coordinator | JSON text frames | Preserves u128 for HLC timestamps |
| Coordinator ↔ Daemon | JSON text frames | Direct string serialization |
| Daemon ↔ Node (TCP) | bincode over length-prefixed frames | 8-byte LE length prefix |
| Daemon ↔ Node (shmem) | bincode via shared memory | Atomic synchronization |
| Daemon ↔ Daemon | bincode over Zenoh | Apache Arrow data format |
| Recording | bincode entries in .adorec | Custom binary container |
Dataflow YAML Specification
Dataflows are defined in YAML files. Each file describes a graph of nodes, their inputs/outputs, and execution parameters.
A JSON Schema is available at the repo root (adora-schema.json) for editor autocompletion and validation.
Quick Start
nodes:
- id: sender
path: sender.py
outputs:
- message
- id: receiver
path: receiver.py
inputs:
message: sender/message
Run with adora run dataflow.yml (local mode) or adora up && adora start dataflow.yml (networked mode).
Editor Setup
Add a schema comment at the top of your YAML file for VS Code autocompletion (requires the YAML extension):
# yaml-language-server: $schema=https://raw.githubusercontent.com/dora-rs/adora/main/adora-schema.json
nodes:
- id: my-node
# ... autocompletion works here
Root-Level Fields
| Field | Type | Default | Description |
|---|---|---|---|
nodes | list | required | List of node configurations |
strict_types | bool | false | Treat type warnings as errors in validate and build |
type_rules | list | [] | User-defined type compatibility rules (see Type Annotations) |
health_check_interval | float | 5.0 | Seconds between daemon health check sweeps. For each node with health_check_timeout set, the daemon checks whether the node has communicated within its timeout; if not, the node is killed and its restart_policy is evaluated |
_unstable_deploy | object | – | Root-level deployment config (see Deployment) |
_unstable_debug | object | – | Debug options (see Debug) |
Node Configuration
Every node requires an id. All other fields are optional (though most nodes need at least path or operator/operators).
Identity
| Field | Type | Description |
|---|---|---|
id | string | Required. Unique identifier. Must not contain /. Whitespace is discouraged |
name | string | Human-readable display name (metadata only, used in tooling and logs) |
description | string | Documentation string (metadata only, not used at runtime) |
Source
A node’s executable comes from a local path, a git repository, a module reference, or is implicit (operator/ROS2 nodes).
| Field | Type | Description |
|---|---|---|
path | string | Path to executable or script. Can also be a URL (legacy) |
module | string | Path to a module definition file (mutually exclusive with path). See Modules Guide |
git | string | Git repo URL. adora build clones it and uses the clone dir as working directory |
branch | string | Branch to checkout (requires git, mutually exclusive with tag/rev) |
tag | string | Tag to checkout (requires git, mutually exclusive with branch/rev) |
rev | string | Commit hash to checkout (requires git, mutually exclusive with branch/tag) |
build | string | Build commands run during adora build. Each line runs separately. pip/pip3 lines use uv when --uv is passed |
args | string | Command-line arguments (space-separated) |
Example with git source:
- id: rust-node
git: https://github.com/dora-rs/adora.git
branch: main
build: cargo build -p example-node --release
path: target/release/example-node
Data I/O
Inputs
Inputs subscribe to another node’s output using the format <node-id>/<output-id>:
inputs:
# Short form
image: camera/frames
tick: adora/timer/millis/100
# Long form with options
sensor_data:
source: sensor/frames
queue_size: 10
queue_policy: drop_oldest
input_timeout: 5.0
# Lossless input (blocks sender when full)
commands:
source: controller/cmd
queue_size: 100
queue_policy: backpressure
| Input option | Type | Default | Description |
|---|---|---|---|
source | string | required | <node-id>/<output-id> or timer path |
queue_size | integer | 10 | Input buffer size |
queue_policy | string | drop_oldest | drop_oldest: drops oldest message when full. backpressure: buffers up to 10x queue_size without dropping (drops with ERROR log at hard cap) |
input_timeout | float | – | Circuit breaker timeout in seconds. If no message arrives within this period, the daemon closes the input and the node receives an InputClosed event for graceful degradation |
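The two queue policies can be illustrated with a small simulation (a sketch of the documented semantics, not Adora's actual daemon code; the 10x hard cap comes from the table above):

```python
from collections import deque

def enqueue(queue, msg, queue_size, policy):
    """Model of the per-input queue policies. Returns False if the message
    was dropped (the real daemon also counts drops in its metrics)."""
    if policy == "drop_oldest":
        if len(queue) >= queue_size:
            queue.popleft()  # oldest message is evicted to make room
        queue.append(msg)
        return True
    # backpressure: buffer up to 10x queue_size, then drop at the hard cap
    if len(queue) >= queue_size * 10:
        return False  # dropped with an ERROR log in the real daemon
    queue.append(msg)
    return True

q = deque()
for i in range(15):
    enqueue(q, i, 10, "drop_oldest")
print(list(q))  # only the 10 newest messages survive: [5, 6, ..., 14]
```

With `drop_oldest`, a slow consumer always sees the freshest data; with `backpressure`, nothing is dropped until the hard cap, at the cost of latency.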
Built-in Timers
Timers are virtual nodes that emit ticks at fixed intervals:
inputs:
tick: adora/timer/millis/100 # every 100ms
slow: adora/timer/millis/1000 # every 1s
fast: adora/timer/hz/30 # 30 Hz (~33ms)
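Both timer forms reduce to an interval; a small parsing sketch (a hypothetical helper, not part of the Adora API):

```python
def timer_interval_seconds(path: str) -> float:
    """Parse a built-in timer path like adora/timer/millis/100
    or adora/timer/hz/30 into an interval in seconds."""
    prefix, unit, value = path.rsplit("/", 2)
    if prefix != "adora/timer":
        raise ValueError(f"not a timer path: {path}")
    if unit == "millis":
        return int(value) / 1000.0
    if unit == "hz":
        return 1.0 / float(value)
    raise ValueError(f"unknown timer unit: {unit}")

print(timer_interval_seconds("adora/timer/millis/100"))  # 0.1
print(timer_interval_seconds("adora/timer/hz/30"))       # ~0.0333
```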
Built-in Log Aggregation
Subscribe to structured log messages from all (or filtered) nodes:
inputs:
all_logs: adora/logs # all nodes, all levels
errors: adora/logs/error # error+ from all nodes
sensor: adora/logs/info/sensor # info+ from specific node
Each message arrives as a JSON-encoded LogMessage string. See Logging for details.
Outputs
A list of output identifiers the node produces:
outputs:
- processed_image
- metadata
Type Annotations
Optional type annotations for inputs and outputs. Types are never required – unannotated ports remain fully dynamic.
- id: camera
path: camera.py
outputs:
- image
- depth
output_types:
image: std/media/v1/Image
depth: std/media/v1/Image
- id: detector
path: detect.py
inputs:
image: camera/image
input_types:
image: std/media/v1/Image
outputs:
- bbox
output_types:
bbox: std/vision/v1/BoundingBox
| Field | Type | Default | Description |
|---|---|---|---|
output_types | object | {} | Maps output IDs to type URNs. Keys must match entries in outputs |
input_types | object | {} | Maps input IDs to expected type URNs. Keys must match entries in inputs |
output_metadata | object | {} | Maps output IDs to lists of required metadata keys |
pattern | string | – | Communication pattern shorthand: service-server, service-client, action-server, action-client |
Type URNs use the format std/<category>/v<version>/<TypeName> and support parameters (e.g. std/media/v1/AudioFrame[sample_type=f32]). See the Type Annotations Guide for the full standard type library, parameterized types, compatibility rules, and user-defined types.
Run adora validate <file> to check type annotations statically. For runtime checking, set ADORA_RUNTIME_TYPE_CHECK=warn or error:
adora validate dataflow.yml
ADORA_RUNTIME_TYPE_CHECK=warn adora run dataflow.yml
Types also appear on adora graph edge labels when annotated.
Module Parameters
When using module:, pass configuration values via params::
- id: fast_pipeline
module: modules/transform.module.yml
inputs:
data: sender/value
params:
speed: "2.0"
mode: turbo
Inside the module, params are available as $PARAM_<UPPERCASE_KEY> in args: and as environment variables. See the Modules Guide for full documentation.
Environment
env:
MY_VAR: "value" # string
DEBUG: true # boolean
PORT: 8080 # integer
RATE: 1.5 # float
FROM_HOST:
__adora_env: HOST_VAR # read from host environment at runtime
Environment variables apply to both build commands and node execution. Values support $VAR expansion syntax.
Logging
| Field | Type | Default | Description |
|---|---|---|---|
send_stdout_as | string | – | Route raw stdout/stderr lines as a data output. Each line is sent as a separate Arrow message |
send_logs_as | string | – | Route structured log entries as a data output. Each entry is a JSON string with fields: timestamp, level, node_id, message, target, fields |
min_log_level | string | – | Suppress logs below this level from file output, coordinator forwarding, and send_logs_as. Levels from most to least verbose: stdout (all output including raw stdout), trace, debug, info, warn, error |
max_log_size | string | – | Rotate log file at this size (e.g. "50MB", "1GB") |
max_rotated_files | integer | 5 | Number of rotated log files to keep |
Example:
- id: sensor
path: ./sensor
min_log_level: info
send_stdout_as: raw_output
send_logs_as: log_entries
max_log_size: "100MB"
max_rotated_files: 3
outputs:
- data
- raw_output
- log_entries
When using send_stdout_as or send_logs_as, include the output name in the outputs list so downstream nodes can subscribe to it.
For a complete guide to all logging features, see Logging.
Fault Tolerance
| Field | Type | Default | Description |
|---|---|---|---|
restart_policy | string | never | never, on-failure, or always |
max_restarts | integer | 0 | Max restart attempts. 0 = unlimited |
restart_delay | float | – | Initial backoff in seconds. Doubles each attempt |
max_restart_delay | float | – | Cap for exponential backoff |
restart_window | float | – | Time window for counting restarts. The counter resets after this many seconds since the first restart in the current window. Enables “N restarts per M seconds” semantics with max_restarts |
health_check_timeout | float | – | If the node does not communicate with the daemon (send outputs, subscribe, etc.) for this many seconds, the daemon kills the process and evaluates the restart_policy |
Restart policies:
- never (default): no automatic restart
- on-failure: restart only on non-zero exit code
- always: restart on any exit, except when stopped by user or all inputs closed with success
Example with exponential backoff:
- id: sensor
path: ./sensor
restart_policy: on-failure
max_restarts: 5
restart_delay: 1.0 # 1s, 2s, 4s, 8s, 16s
max_restart_delay: 30.0 # capped at 30s
restart_window: 300.0 # 5 restarts per 5 minutes
health_check_timeout: 30.0
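The delay schedule in the comments above follows doubling with a cap; a sketch of that arithmetic (illustrative, assuming the doubling starts from `restart_delay` as documented):

```python
def restart_delays(restart_delay, max_restart_delay, max_restarts):
    """Exponential backoff schedule: starts at restart_delay,
    doubles each attempt, capped at max_restart_delay."""
    delays = []
    d = restart_delay
    for _ in range(max_restarts):
        delays.append(min(d, max_restart_delay))
        d *= 2
    return delays

print(restart_delays(1.0, 30.0, 5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
print(restart_delays(1.0, 30.0, 7))  # the cap kicks in: ... 16.0, 30.0, 30.0
```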
Deployment
Assign nodes to specific machines using _unstable_deploy:
- id: camera-driver
_unstable_deploy:
machine: robot-arm
path: ./target/debug/camera
outputs:
- frames
- id: ml-inference
_unstable_deploy:
machine: gpu-server
labels:
gpu: "true"
distribute: scp
path: ./target/debug/inference
inputs:
frames: camera-driver/frames
| Deploy field | Type | Default | Description |
|---|---|---|---|
machine | string | – | Target machine/daemon ID. The coordinator routes the node to the daemon registered with this ID |
working_dir | string | – | Working directory on the target machine |
labels | object | – | Key-value labels for scheduling. The coordinator matches these against labels reported by each daemon at registration |
distribute | string | local | How built binaries reach the target daemon: local – each daemon builds from source independently; scp – CLI pushes the built binary via SSH/SCP before spawn; http – daemon pulls the binary from the coordinator’s HTTP artifact store |
When nodes are on different machines, communication automatically switches from shared memory to Zenoh pub/sub.
Operator Nodes
Operators run in-process inside a shared runtime (no separate process). Use operator for a single operator or operators for multiple.
Single Operator
The id field is optional for single operators (defaults to the node id):
- id: detector
operator:
python: detect.py
build: pip install -r requirements.txt
inputs:
image: camera/frames
outputs:
- bbox
Multiple Operators
Each operator in operators requires a unique id:
- id: runtime-node
operators:
- id: preprocessor
shared-library: ../../target/debug/libpreprocess
inputs:
raw: sensor/data
outputs:
- processed
- id: analyzer
shared-library: ../../target/debug/libanalyze
inputs:
data: runtime-node/preprocessor/processed
outputs:
- result
Operator Source Types
| Field | Description |
|---|---|
python | Python script path, or {source: "script.py", conda_env: "myenv"} |
shared-library | Path to a shared library (.so/.dylib/.dll) |
Operators also support inputs, outputs, build, send_stdout_as, send_logs_as, min_log_level, max_log_size, and max_rotated_files with the same semantics as node-level fields.
ROS2 Bridge
Declare a node as a ROS2 bridge to automatically convert between ROS2 DDS messages and Adora’s Arrow format. No custom code needed.
Single Topic
- id: camera_bridge
ros2:
topic: /camera/image_raw
message_type: sensor_msgs/Image
direction: subscribe
outputs:
- image
Multiple Topics
- id: robot_bridge
ros2:
topics:
- topic: /camera/image_raw
message_type: sensor_msgs/Image
direction: subscribe
output: image
- topic: /cmd_vel
message_type: geometry_msgs/Twist
direction: publish
input: velocity
qos:
reliable: true
inputs:
velocity: planner/cmd_vel
outputs:
- image
Service Bridge
- id: add_service
ros2:
service: /add_two_ints
service_type: example_interfaces/AddTwoInts
role: server
inputs:
request: client_node/request
outputs:
- response
Action Bridge
- id: nav_action
ros2:
action: /navigate
action_type: nav2_msgs/NavigateToPose
role: client
inputs:
goal: planner/goal
outputs:
- feedback
- result
QoS Configuration
QoS can be set at the bridge level (applies to all topics) or per-topic:
| QoS field | Type | Default | Description |
|---|---|---|---|
reliable | bool | false | Reliable vs best-effort transport |
durability | string | volatile | volatile or transient_local |
liveliness | string | automatic | automatic, manual_by_participant, manual_by_topic |
lease_duration | float | infinity | Lease duration in seconds |
max_blocking_time | float | – | Max blocking time for reliable transport |
keep_last | integer | 1 | History depth (KeepLast policy) |
keep_all | bool | false | Use KeepAll history instead of KeepLast |
Other ROS2 Fields
| Field | Type | Default | Description |
|---|---|---|---|
namespace | string | / | ROS2 namespace |
node_name | string | node id | ROS2 node name |
Debug
_unstable_debug:
publish_all_messages_to_zenoh: true
Required for adora topic echo, adora topic hz, and adora topic info commands.
Communication Patterns
Adora supports four communication patterns built on top of the dataflow:
- Topic (default): pub/sub dataflow
- Service: request/reply via request_id metadata
- Action: goal/feedback/result via goal_id/goal_status metadata, with cancellation support
- Streaming: session/segment/chunk via session_id/segment_id/seq/fin/flush metadata, with queue flush for interruption
See Communication Patterns for details and examples.
Full Example
health_check_interval: 10.0
_unstable_debug:
publish_all_messages_to_zenoh: true
nodes:
- id: webcam
operator:
python: webcam.py
inputs:
tick: adora/timer/millis/100
outputs:
- image
- id: detector
operator:
python: detect.py
build: pip install ultralytics
inputs:
image: webcam/image
outputs:
- bbox
- id: plotter
operator:
python: plot.py
inputs:
image: webcam/image
bbox: detector/bbox
- id: logger
path: ./logger
inputs:
bbox: detector/bbox
send_stdout_as: logs
min_log_level: info
restart_policy: on-failure
max_restarts: 3
outputs:
- logs
Type Annotations
Optional type annotations on dataflow inputs and outputs. Types are never required – unannotated ports remain fully dynamic. Type checking runs at build time and validate time (no runtime overhead by default).
Quick Start
nodes:
- id: camera
path: camera.py
outputs:
- image
output_types:
image: std/media/v1/Image
- id: detector
path: detect.py
inputs:
image: camera/image
input_types:
image: std/media/v1/Image
outputs:
- bbox
output_types:
bbox: std/vision/v1/BoundingBox
Validate with:
adora validate dataflow.yml
# Fail with non-zero exit code on warnings (for CI)
adora validate --strict-types dataflow.yml
# Type checks also run during build
adora build dataflow.yml --strict-types
You can also set strict_types: true at the top level of the YAML to enable strict mode without the CLI flag:
strict_types: true
nodes:
# ...
Type URN Format
Type URNs follow the pattern std/<category>/v<version>/<TypeName>:
std/core/v1/Float32
std/media/v1/Image
std/vision/v1/BoundingBox
Parameterized Types
Some struct types accept parameters to distinguish variants:
std/media/v1/AudioFrame[sample_type=f32]
std/media/v1/AudioFrame[sample_type=f32,channels=2]
Matching rules:
- Same base + same params -> compatible
- Same base + one side unparameterized -> compatible (wildcard)
- Same base + different param values -> mismatch
# These are compatible (wildcard):
output_types:
audio: std/media/v1/AudioFrame[sample_type=f32]
input_types:
audio: std/media/v1/AudioFrame
# These are a mismatch:
output_types:
audio: std/media/v1/AudioFrame[sample_type=f32]
input_types:
audio: std/media/v1/AudioFrame[sample_type=i16]
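The three matching rules can be sketched in a few lines (illustrative only – this is not the checker's actual code, and partial-parameter overlap is an assumption here):

```python
def parse_urn(urn):
    """Split a type URN into (base, params), e.g.
    std/media/v1/AudioFrame[sample_type=f32] -> (base, {"sample_type": "f32"})."""
    if "[" in urn and urn.endswith("]"):
        base, _, raw = urn.partition("[")
        params = dict(p.split("=", 1) for p in raw[:-1].split(","))
        return base, params
    return urn, {}

def compatible(out_urn, in_urn):
    out_base, out_p = parse_urn(out_urn)
    in_base, in_p = parse_urn(in_urn)
    if out_base != in_base:
        return False          # different base types -> mismatch
    if not out_p or not in_p:
        return True           # one side unparameterized -> wildcard
    # params declared on both sides must agree
    return all(out_p[k] == in_p[k] for k in out_p.keys() & in_p.keys())
```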
Standard Type Library
std/core/v1
| Type | Arrow Type | Description |
|---|---|---|
Float32 | Float32 | 32-bit float |
Float64 | Float64 | 64-bit float |
Int32 | Int32 | 32-bit signed integer |
Int64 | Int64 | 64-bit signed integer |
UInt8 | UInt8 | 8-bit unsigned integer |
UInt32 | UInt32 | 32-bit unsigned integer |
UInt64 | UInt64 | 64-bit unsigned integer |
String | Utf8 | UTF-8 string |
Bytes | LargeBinary | Raw bytes (universal sink – any type is compatible) |
Bool | Boolean | Boolean |
std/math/v1
| Type | Arrow Type | Fields | Description |
|---|---|---|---|
Vector3 | Struct | x, y, z (Float64) | 3D vector |
Quaternion | Struct | x, y, z, w (Float64) | Quaternion |
Pose | Struct | position, orientation | 6-DOF pose |
Transform | Struct | translation, rotation | Coordinate transform |
std/control/v1
| Type | Arrow Type | Description |
|---|---|---|
Twist | Struct | Linear and angular velocity |
JointState | Struct | Joint positions, velocities, efforts |
Odometry | Struct | Pose + Twist in a reference frame |
std/media/v1
| Type | Arrow Type | Parameters | Description |
|---|---|---|---|
Image | Struct | encoding | Raw image (width, height, encoding, data) |
CompressedImage | LargeBinary | format | JPEG/PNG compressed image |
PointCloud | Struct | point_type | 3D point cloud |
AudioFrame | Struct | sample_type (default: f32) | Audio samples |
std/vision/v1
| Type | Arrow Type | Description |
|---|---|---|
BoundingBox | Struct | 2D bounding box with confidence and label |
Detection | Struct | Object detection result (list of BoundingBox) |
Segmentation | Struct | Pixel-level segmentation mask |
Validation Rules
adora validate and adora build check:
- Key existence: output_types keys must appear in outputs, input_types keys must appear in inputs
- URN resolution: All type URNs must exist in the standard or user-defined type library. Typos get “did you mean?” suggestions
- Edge compatibility: Connected edges must have compatible types (exact match, implicit widening, or user-defined rules)
- Timer auto-typing: Timer inputs (adora/timer/*) are automatically typed as std/core/v1/UInt64
- Type inference: When only the upstream side annotates a type, it is inferred on the downstream input and reported
- Parameterized types: Parameter mismatches are detected (see above)
- Metadata patterns: output_metadata keys and pattern shorthands are validated (see below)
- Schema compatibility: Struct types are checked at the field level – missing fields or wrong field types are flagged
All checks produce warnings (non-fatal by default). Use --strict-types to treat warnings as errors for CI pipelines.
Type warnings:
- node "camera": output_types key "framez" not found in outputs list
- node "detector": unknown type "std/vision/v1/BoundingBx" on output "bbox"
(did you mean "std/vision/v1/BoundingBox"?)
- node "detector": type mismatch on input "image": upstream camera/image
declares "std/core/v1/Bytes", but expected "std/media/v1/Image"
Inferred types:
inferred std/core/v1/Float64 on processor/reading (from sensor/reading)
Type Compatibility Rules
Beyond exact matching, the type checker supports implicit widening conversions:
| From | To |
|---|---|
UInt8 | UInt32 |
UInt32 | UInt64 |
Int32 | Int64 |
Float32 | Float64 |
| Any type | Bytes (universal sink) |
Widening is transitive up to depth 3 (e.g. UInt8 -> UInt32 -> UInt64 works, but chains of 4+ do not).
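A sketch of the widening check under those rules (illustrative; the widening table and the depth-3 limit are from the text above, the code itself is hypothetical):

```python
# direct widening steps from the table above
WIDENING = {
    "UInt8": "UInt32",
    "UInt32": "UInt64",
    "Int32": "Int64",
    "Float32": "Float64",
}

def widens_to(src, dst, max_depth=3):
    """Follow the widening chain up to max_depth steps.
    Bytes is a universal sink: any type is accepted."""
    if dst == "Bytes":
        return True
    cur = src
    for _ in range(max_depth):
        cur = WIDENING.get(cur)
        if cur is None:
            return False      # chain ended without reaching dst
        if cur == dst:
            return True
    return False

print(widens_to("UInt8", "UInt64"))   # True: UInt8 -> UInt32 -> UInt64
print(widens_to("UInt8", "Float64"))  # False: no widening path across families
```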
User-Defined Compatibility Rules
Add custom rules in the dataflow YAML:
type_rules:
- from: myproject/SensorV1
to: myproject/SensorV2
nodes:
# ...
Metadata Patterns
Nodes that implement communication patterns (services, actions) can declare required metadata keys on their outputs.
Explicit metadata
- id: server
path: server.py
outputs:
- response
output_metadata:
response: [request_id]
Pattern shorthand
Use the pattern field to auto-imply required metadata keys:
- id: server
path: server.py
pattern: service-server
outputs:
- response
| Pattern | Required metadata keys |
|---|---|
service-server | request_id |
service-client | request_id |
action-server | goal_id, goal_status |
action-client | goal_id |
User-Defined Types
Projects can define custom types in a types/ directory next to the dataflow. The directory structure determines the URN prefix:
project/
dataflow.yml
types/
myproject/
sensors/
v1.yml # URN prefix: myproject/sensors/v1
Type YAML files use the same format as the standard library:
types:
MySensor:
arrow: Struct
description: Custom sensor reading
fields:
- name: temperature
type: Float32
- name: humidity
type: Float32
This creates the URN myproject/sensors/v1/MySensor.
The std/ prefix is reserved and cannot be used for user types.
User types are loaded automatically by adora validate and adora build when a types/ directory exists.
Runtime Type Checking
In addition to static validation, Adora supports optional runtime type checking on send_output(). When enabled, the actual Arrow data type is compared against the declared output_types at send time.
Enable via environment variable:
# Warn on mismatches (log and continue)
ADORA_RUNTIME_TYPE_CHECK=warn adora run dataflow.yml
# Error on mismatches (node returns error)
ADORA_RUNTIME_TYPE_CHECK=error adora run dataflow.yml
Valid values: 1, warn, true (warn mode), error (error mode). Unset or any other value disables checking (zero overhead).
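The value-to-mode mapping can be stated as a tiny function (a sketch of the documented rules, not Adora's code):

```python
import os

def runtime_check_mode(value: str) -> str:
    """Map an ADORA_RUNTIME_TYPE_CHECK value to a checking mode:
    1/warn/true -> warn, error -> error, anything else -> off."""
    if value in ("1", "warn", "true"):
        return "warn"
    if value == "error":
        return "error"
    return "off"

# unset variable -> "" -> checking disabled, zero overhead
mode = runtime_check_mode(os.environ.get("ADORA_RUNTIME_TYPE_CHECK", ""))
```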
Scope:
- Validates output_types on the sender side (send_output() calls). input_types are checked statically by adora validate but not enforced at runtime
- Covers all languages that send Arrow arrays (Rust, Python, C++ Arrow path)
- Raw byte sends (send_output_bytes, C nodes) are untyped and skip checking
- Complex types (Struct-based: Image, Vector3, etc.) are skipped – only primitive types, String, Bytes, and Bool are validated at runtime
Graph Visualization
When outputs have type annotations, adora graph shows the type on edge labels:
adora graph dataflow.yml --open
Edges display as output_name [TypeName] (e.g. image [Image]).
Operators
Operators support the same output_types, input_types, output_metadata, and pattern fields:
- id: runtime-node
operators:
- id: preprocessor
python: preprocess.py
inputs:
raw: sensor/data
input_types:
raw: std/core/v1/Bytes
outputs:
- processed
output_types:
processed: std/media/v1/Image
Modules (Reusable Sub-Dataflows)
Modules let you define reusable sub-graphs of nodes in separate YAML files and compose them into larger dataflows. Modules are expanded at compile time – the runtime never sees them.
Quick Start
Module file (modules/transform_module.yml):
module:
name: transform_pipeline
inputs: [raw_data]
outputs: [filtered]
nodes:
- id: doubler
path: doubler.py
inputs:
data: _mod/raw_data
outputs:
- doubled
- id: filter
path: filter_even.py
inputs:
data: doubler/doubled
outputs:
- filtered
Dataflow file (dataflow.yml):
nodes:
- id: sender
path: sender.py
outputs:
- value
- id: pipeline
module: modules/transform_module.yml
inputs:
raw_data: sender/value
- id: receiver
path: receiver.py
inputs:
filtered: pipeline/filtered
After expansion, pipeline becomes two nodes: pipeline.doubler and pipeline.filter, with all wiring resolved automatically.
Module Definition File
A module file has two sections:
module: header
| Field | Type | Required | Description |
|---|---|---|---|
name | string | yes | Module name (metadata only) |
inputs | list | no | Required input port names |
inputs_optional | list | no | Optional input ports (silently skipped if not wired) |
outputs | list | no | Output port names exposed to the parent dataflow |
nodes: list
Standard node definitions, with one special syntax: _mod/port_name references a module input port. When expanded, _mod/port_name is replaced with whatever the parent wired to that port.
module:
name: my_module
inputs: [camera_feed]
outputs: [detections]
nodes:
- id: detector
path: detect.py
inputs:
image: _mod/camera_feed # resolved to parent's wiring
outputs:
- detections
Module-level build
Modules can have a top-level build: command that runs before any inner node builds:
module:
name: ml_pipeline
inputs: [image]
outputs: [result]
build: pip install -r requirements.txt
nodes:
- id: model
path: model.py
inputs:
image: _mod/image
outputs:
- result
Using Modules
Reference a module in a dataflow node using the module: field instead of path::
- id: nav_stack
module: modules/navigation.module.yml
inputs:
goal_pose: localization/goal
The module node’s inputs: map wires parent outputs to module input ports. External nodes reference module outputs as <module_id>/<output_name> (e.g., nav_stack/cmd_vel).
Parameters
Pass configuration values to modules via params::
- id: fast_pipeline
module: modules/transform_module.yml
inputs:
raw_data: sender/value
params:
speed: "2.0"
mode: turbo
Inside the module, reference params in args: using $PARAM_<UPPERCASE_KEY>:
nodes:
- id: processor
path: processor.py
args: --speed $PARAM_SPEED --mode $PARAM_MODE
inputs:
data: _mod/raw_data
outputs:
- result
Parameters are also injected as environment variables (PARAM_SPEED, PARAM_MODE) into every node inside the module.
Expansion Rules
- Load the module YAML file and validate its header
- Prefix all internal node IDs with {module_id}. (e.g., nav_stack.planner)
- Replace _mod/port_name references with the actual sources from the parent’s input map
- Rewrite internal cross-references (e.g., planner/path becomes nav_stack.planner/path)
- Map module-declared outputs to internal node outputs, so nav_stack/cmd_vel resolves to nav_stack.controller/cmd_vel
- Replace the module node with the expanded flat nodes
- Substitute params: values in args: fields and inject as env vars
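The ID-prefixing and wiring steps can be sketched as a dictionary transformation (illustrative only – the real expansion also handles outputs, params, and nesting):

```python
def expand_module(module_id, module_nodes, input_map):
    """Prefix internal node IDs with '{module_id}.', resolve _mod/ ports
    from the parent's input map, and rewrite internal cross-references."""
    internal_ids = {n["id"] for n in module_nodes}
    expanded = []
    for n in module_nodes:
        new = {"id": f"{module_id}.{n['id']}", "inputs": {}}
        for name, source in n.get("inputs", {}).items():
            src_node, _, _out = source.partition("/")
            if src_node == "_mod":
                new["inputs"][name] = input_map[_out]           # parent wiring
            elif src_node in internal_ids:
                new["inputs"][name] = f"{module_id}.{source}"   # internal edge
            else:
                new["inputs"][name] = source                    # timers, externals
        expanded.append(new)
    return expanded
```

Applied to the transform module from the Quick Start, `pipeline.doubler` ends up wired to `sender/value` and `pipeline.filter` to `pipeline.doubler/doubled`.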
Use adora expand to see the result:
adora expand dataflow.yml
Nested Modules
Modules can reference other modules. The expansion is recursive with a depth limit of 8 levels:
# outer_module.yml
module:
name: outer
inputs: [data]
outputs: [result]
nodes:
- id: inner
module: inner_module.yml
inputs:
raw: _mod/data
- id: postprocess
path: postprocess.py
inputs:
data: inner/processed
outputs:
- result
After expansion, node IDs are fully qualified: outer.inner.some_node.
Optional Inputs
Declare inputs as optional when a module should work with or without certain connections:
module:
name: flexible_processor
inputs: [data]
inputs_optional: [config]
outputs: [result]
nodes:
- id: processor
path: processor.py
inputs:
data: _mod/data
config: _mod/config # silently dropped if not wired
outputs:
- result
When the parent doesn’t wire config, the input is simply omitted from the expanded node.
Visualization
adora graph renders module boundaries as Mermaid subgraphs, making it easy to see which nodes came from which module:
adora graph dataflow.yml --open
Validation
Validate a standalone module file without a full dataflow:
adora expand --module modules/transform_module.yml
This checks:
- Valid YAML structure
- Module header is present with name, inputs, outputs
- All _mod/ references correspond to declared inputs or optional inputs
- No duplicate node IDs
- Internal wiring is consistent
Security
- Path confinement: Module file paths must resolve within the dataflow’s base directory. Absolute paths and directory traversal (../) outside the base are rejected.
- File size limit: Module files are capped at 1 MB.
- Depth limit: Recursive nesting is capped at 8 levels.
- Param key validation: Parameter keys must be alphanumeric with underscores only.
Example
See examples/module-dataflow/ for a complete working example with a sender, transform module (doubler + filter), and receiver.
adora run examples/module-dataflow/dataflow.yml
Communication Patterns
Adora is a dataflow framework based on pub/sub message passing. On top of basic topics, the framework supports service (request/reply), action (goal/feedback/result), and streaming (session/segment/chunk) patterns using well-known metadata keys. No changes to the daemon, coordinator, or YAML syntax are required – the patterns are implemented as conventions at the node API level.
1. Topic (pub/sub)
The default pattern. A node publishes data on an output, and any node that subscribes to that output receives it.
nodes:
- id: publisher
outputs:
- data
- id: subscriber
inputs:
data: publisher/data
Use when: streaming sensor data, periodic status, fire-and-forget events.
2. Service (request/reply)
A client sends a request and expects exactly one response, correlated by a
request_id metadata key.
Well-known metadata keys
| Key | Constant | Description |
|---|---|---|
request_id | adora_node_api::REQUEST_ID | UUID v7 correlating request and response |
YAML
nodes:
- id: client
inputs:
tick: adora/timer/millis/500
response: server/response
outputs:
- request
- id: server
inputs:
request: client/request
outputs:
- response
Node API helpers
// Client: send request with auto-generated request_id
let rid = node.send_service_request("request".into(), params, data)?;
// Server: pass through metadata.parameters (includes request_id)
node.send_service_response("response".into(), metadata.parameters, result)?;
The server MUST pass through the request_id from the incoming request’s
metadata parameters into the response. The client matches responses to
requests using this key.
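The client-side bookkeeping behind that matching can be sketched as follows (illustrative; the `send_request`/`on_response` helpers are hypothetical, and uuid4 stands in for the v7 IDs the Rust API generates):

```python
import uuid

pending = {}  # request_id -> original request payload

def send_request(payload):
    rid = str(uuid.uuid4())
    pending[rid] = payload
    # in a real node: node.send_output("request", payload,
    #     metadata={"parameters": {"request_id": rid}})
    return rid

def on_response(metadata, data):
    # correlate the response with its request via request_id
    rid = metadata["parameters"]["request_id"]
    request = pending.pop(rid, None)
    return request, data
```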
Example: examples/service-example/
3. Action (goal/feedback/result)
A client sends a goal and receives periodic feedback plus a final result. Actions support cancellation.
Well-known metadata keys
| Key | Constant | Description |
|---|---|---|
goal_id | adora_node_api::GOAL_ID | UUID v7 identifying the goal |
goal_status | adora_node_api::GOAL_STATUS | Final status of the goal |
Goal status values:
| Value | Constant | Meaning |
|---|---|---|
succeeded | GOAL_STATUS_SUCCEEDED | Goal completed successfully |
aborted | GOAL_STATUS_ABORTED | Goal aborted by server |
canceled | GOAL_STATUS_CANCELED | Goal canceled by client |
YAML
nodes:
- id: client
inputs:
tick: adora/timer/millis/2000
feedback: server/feedback
result: server/result
outputs:
- goal
- cancel
- id: server
inputs:
goal: client/goal
cancel: client/cancel
outputs:
- feedback
- result
Cancel pattern
The client sends a message on the cancel output with goal_id in the
metadata. The server checks for cancel requests between processing steps and
sends a result with goal_status = "canceled".
Example: examples/action-example/
4. Streaming (session/segment/chunk)
For real-time pipelines (voice, video, sensor streams) where a user can interrupt mid-stream and queued data must be discarded.
Well-known metadata keys
| Key | Type | Constant | Description |
|---|---|---|---|
session_id | String | SESSION_ID | Identifies the conversation/session |
segment_id | Integer | SEGMENT_ID | Logical unit within a session (e.g. one utterance) |
seq | Integer | SEQ | Chunk sequence number within a segment |
fin | Bool | FIN | true on the last chunk of a segment |
flush | Bool | FLUSH | true to discard older queued messages on this input |
YAML
nodes:
- id: asr
inputs:
mic: mic-source/audio
outputs:
- text
- id: llm
inputs:
text: asr/text
outputs:
- tokens
- id: tts
inputs:
tokens: llm/tokens
outputs:
- audio
Node API
use adora_node_api::{StreamSegment, AdoraNode};
let mut seg = StreamSegment::new();
// Send chunks with auto-incrementing seq (e.g. inside an ASR node)
node.send_stream_chunk("text".into(), &mut seg, false, chunk_data)?;
// Mark final chunk of a segment
node.send_stream_chunk("text".into(), &mut seg, true, last_chunk)?;
// On user interruption: flush downstream queues and start a new segment.
// The prior segment ends without a fin=true signal -- old data is discarded.
let flush_params = seg.flush();
node.send_output("text".into(), flush_params, empty_data)?;
Queue flush behavior
When a message arrives with flush: true in its metadata, the
receiver’s input queue is cleared of all older messages before the
flush message is delivered. This enables instant interruption in
voice pipelines – when the user speaks over TTS output, the ASR node
sends a new segment with flush: true, and the TTS node immediately
discards any queued audio chunks from the previous response.
Note: flush discards all queued messages on the input regardless of
session_id. Do not multiplex independent sessions on a single input
when using flush.
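The queue behavior described above can be modeled in a few lines. This is a simulation of the documented semantics, not Adora's internal queue implementation; `enqueue` is an illustrative name.

```python
from collections import deque

# Simulation of the documented flush semantics (not Adora internals):
# a message whose metadata has flush=True clears all older queued messages
# before being enqueued itself.

def enqueue(queue: deque, message: dict) -> None:
    if message.get("metadata", {}).get("flush"):
        queue.clear()  # discard everything older than the flush message
    queue.append(message)

q = deque()
enqueue(q, {"data": "chunk-0", "metadata": {"seq": 0}})
enqueue(q, {"data": "chunk-1", "metadata": {"seq": 1}})
# User interrupts: the new segment starts with flush=True
enqueue(q, {"data": "new-chunk-0", "metadata": {"seq": 0, "flush": True}})
# Only the flush message remains; the stale chunks were discarded.
```

This is why the note above matters: the clear is unconditional on that input, so multiplexed sessions would lose each other's data.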
Python
# Streaming metadata is a plain dict
params = {
"session_id": session_id,
"segment_id": 1,
"seq": 0,
"fin": False,
"flush": True, # flush older queued messages
}
node.send_output("text", data, metadata=params)
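In pure Python, a small helper can build these dicts with the auto-incrementing `seq` and flush handling that the Rust `StreamSegment` provides. `PyStreamSegment` below is a hypothetical sketch, not part of the `adora` package; it uses `uuid4` where the Rust API would use a time-ordered UUID v7.

```python
import itertools
import uuid

# Hypothetical Python counterpart of the Rust StreamSegment helper -- not part
# of the adora package. It only builds the metadata dicts.

class PyStreamSegment:
    def __init__(self):
        self.session_id = str(uuid.uuid4())  # stands in for a UUID v7
        self.segment_id = 0
        self._seq = itertools.count()

    def chunk(self, fin=False):
        """Metadata for the next chunk, with auto-incrementing seq."""
        return {
            "session_id": self.session_id,
            "segment_id": self.segment_id,
            "seq": next(self._seq),
            "fin": fin,
        }

    def flush(self):
        """Start a new segment and ask receivers to discard queued messages."""
        self.segment_id += 1
        self._seq = itertools.count()
        return {**self.chunk(), "flush": True}

seg = PyStreamSegment()
first = seg.chunk()         # seq 0
last = seg.chunk(fin=True)  # seq 1, final chunk of the segment
restart = seg.flush()       # new segment_id, seq resets to 0, flush=True
```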
5. Choosing a pattern
| Need a response? | Long-running? | Cancelable? | Real-time stream? | Pattern |
|---|---|---|---|---|
| No | - | - | No | Topic |
| Yes | No | No | No | Service |
| Yes | Yes | Optional | No | Action |
| No | Yes | Via flush | Yes | Streaming |
6. Important details
`goal_status` matching is case-sensitive. Always use the exact lowercase values: `"succeeded"`, `"aborted"`, `"canceled"`. The ROS2 bridge defaults to `Aborted` for unrecognised values.
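This detail is easy to verify with a sketch. `classify_goal_status` is an illustrative name (not the actual bridge code) that mirrors the documented behavior: exact lowercase matching, with aborted as the fallback for anything unrecognised.

```python
# Illustrative sketch (not the actual ROS2 bridge code) of case-sensitive
# goal_status matching with the documented aborted-by-default fallback.

GOAL_STATUS_SUCCEEDED = "succeeded"
GOAL_STATUS_ABORTED = "aborted"
GOAL_STATUS_CANCELED = "canceled"

def classify_goal_status(value: str) -> str:
    # Matching is case-sensitive: "Succeeded" would NOT match.
    known = (GOAL_STATUS_SUCCEEDED, GOAL_STATUS_ABORTED, GOAL_STATUS_CANCELED)
    if value in known:
        return value
    return GOAL_STATUS_ABORTED  # fallback for unrecognised values

assert classify_goal_status("succeeded") == "succeeded"
assert classify_goal_status("Succeeded") == "aborted"  # wrong case -> fallback
```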
7. Python compatibility
Python nodes use the same metadata conventions. Parameters are plain dicts with string keys:
import uuid
# Service client (uuid7 for time-ordered IDs, matching Rust API)
params = {"request_id": str(uuid.uuid7())}
node.send_output("request", data, metadata={"parameters": params})
# Service server -- pass through parameters
node.send_output("response", result, metadata=event["metadata"])
Note: `uuid.uuid7()` requires Python 3.13+. On older versions, use the `uuid_utils` package or `uuid.uuid4()` (random v4 also works for correlation, but loses time-ordering).
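The version fallback in the note can be wrapped once and reused. `new_request_id` is an illustrative helper name, not part of the adora Python package:

```python
import uuid

# Sketch of the fallback described in the note: prefer uuid.uuid7()
# (Python 3.13+), fall back to random uuid4() on older versions.

def new_request_id() -> str:
    if hasattr(uuid, "uuid7"):     # Python 3.13+
        return str(uuid.uuid7())
    return str(uuid.uuid4())       # still unique, but not time-ordered

rid = new_request_id()
```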
Rust API Reference
This document covers the two main Rust crates for building Adora dataflow components:
- `adora-node-api` – for standalone node executables
- `adora-operator-api` – for in-process operators managed by the Adora runtime
Node API (adora-node-api)
Add to your Cargo.toml:
[dependencies]
adora-node-api = { workspace = true }
AdoraNode
The primary struct for sending outputs and retrieving node information. Obtained through one of the initialization functions below.
Initialization
#![allow(unused)]
fn main() {
// Recommended: auto-detect environment (daemon, testing, or interactive).
pub fn init_from_env() -> NodeResult<(Self, EventStream)>
// Same as init_from_env but errors instead of falling back to interactive mode.
pub fn init_from_env_force() -> NodeResult<(Self, EventStream)>
// For dynamic nodes: connect to the daemon by node ID.
pub fn init_from_node_id(node_id: NodeId) -> NodeResult<(Self, EventStream)>
// Try init_from_env first; fall back to init_from_node_id.
pub fn init_flexible(node_id: NodeId) -> NodeResult<(Self, EventStream)>
// Standalone interactive mode (prompts for inputs on the terminal).
pub fn init_interactive() -> NodeResult<(Self, EventStream)>
// Integration test mode with synthetic inputs/outputs.
pub fn init_testing(
input: TestingInput,
output: TestingOutput,
options: TestingOptions,
) -> NodeResult<(Self, EventStream)>
}
init_from_env is the recommended entry point. It checks, in order:
- Thread-local testing state set by `setup_integration_testing`
- `ADORA_NODE_CONFIG` environment variable (set by the daemon)
- `ADORA_TEST_WITH_INPUTS` environment variable (file-based integration testing)
- Interactive terminal fallback (only if stdin is a TTY)
Sending Outputs
All send methods silently ignore output IDs not declared in the dataflow YAML.
#![allow(unused)]
fn main() {
// Send an Arrow array. Copies data into shared memory when beneficial.
pub fn send_output(
&mut self,
output_id: DataId,
parameters: MetadataParameters,
data: impl Array,
) -> NodeResult<()>
// Send raw bytes. Copies into shared memory when beneficial.
pub fn send_output_bytes(
&mut self,
output_id: DataId,
parameters: MetadataParameters,
data_len: usize,
data: &[u8],
) -> NodeResult<()>
// Send raw bytes via a closure for zero-copy writing.
pub fn send_output_raw<F>(
&mut self,
output_id: DataId,
parameters: MetadataParameters,
data_len: usize,
data: F,
) -> NodeResult<()>
where
F: FnOnce(&mut [u8])
// Send raw bytes with explicit Arrow type information.
pub fn send_typed_output<F>(
&mut self,
output_id: DataId,
type_info: ArrowTypeInfo,
parameters: MetadataParameters,
data_len: usize,
data: F,
) -> NodeResult<()>
where
F: FnOnce(&mut [u8])
// Send a pre-allocated DataSample with type information.
pub fn send_output_sample(
&mut self,
output_id: DataId,
type_info: ArrowTypeInfo,
parameters: MetadataParameters,
sample: Option<DataSample>,
) -> NodeResult<()>
// Report output IDs as closed. No further sends allowed for those IDs.
pub fn close_outputs(&mut self, outputs_ids: Vec<DataId>) -> NodeResult<()>
}
Service, Action, and Streaming Helpers
Higher-level methods for the communication patterns. These use well-known metadata keys to correlate requests, goals, responses, and streaming segments.
#![allow(unused)]
fn main() {
// Generate a unique, time-ordered ID (UUID v7) for correlation.
pub fn new_request_id() -> String
pub fn new_goal_id() -> String // alias for new_request_id
// Send a service request. Injects a `request_id` into parameters and returns it.
pub fn send_service_request(
&mut self,
output_id: DataId,
parameters: MetadataParameters,
data: impl Array,
) -> NodeResult<String>
// Send a service response. Semantic alias for send_output.
// Caller must pass through the request_id from the incoming request's metadata.
pub fn send_service_response(
&mut self,
output_id: DataId,
parameters: MetadataParameters,
data: impl Array,
) -> NodeResult<()>
}
Service example (client sends request, server replies):
#![allow(unused)]
fn main() {
// Client: auto-generates and injects request_id
let rid = node.send_service_request("request".into(), params, data)?;
// Server: pass through metadata.parameters (includes request_id)
node.send_service_response("response".into(), metadata.parameters, result)?;
}
Action example (client sends goal, server streams feedback + result):
#![allow(unused)]
fn main() {
use adora_node_api::{GOAL_ID, GOAL_STATUS, GOAL_STATUS_SUCCEEDED, Parameter};
// Client: generate goal_id, attach to params
let goal_id = AdoraNode::new_goal_id();
params.insert(GOAL_ID.to_string(), Parameter::String(goal_id));
node.send_output("goal".into(), params, data)?;
// Server: extract goal_id, send feedback/result with goal_status
let gid = get_string_param(&metadata.parameters, GOAL_ID);
}
Streaming example (real-time voice/video pipeline with interruption):
#![allow(unused)]
fn main() {
use adora_node_api::StreamSegment;
// Create a streaming segment builder (auto-generates session_id)
let mut seg = StreamSegment::new();
// Send chunks with auto-incrementing seq
node.send_stream_chunk("text".into(), &mut seg, false, chunk_data)?;
// Mark final chunk of a segment
node.send_stream_chunk("text".into(), &mut seg, true, last_chunk)?;
// On user interruption: flush downstream queues and start a new segment
let flush_params = seg.flush();
node.send_output("text".into(), flush_params, empty_data)?;
}
See patterns.md for the full guide and examples/service-example and examples/action-example for working code.
Data Allocation
#![allow(unused)]
fn main() {
// Allocate a DataSample of the given size.
// Uses shared memory for data >= ZERO_COPY_THRESHOLD (4096 bytes).
pub fn allocate_data_sample(&mut self, data_len: usize) -> NodeResult<DataSample>
}
Node Information
#![allow(unused)]
fn main() {
// Node ID from the dataflow YAML.
pub fn id(&self) -> &NodeId
// Unique identifier for this dataflow run.
pub fn dataflow_id(&self) -> &DataflowId
// Input/output configuration for this node.
pub fn node_config(&self) -> &NodeRunConfig
// True if this node was restarted after a previous exit or failure.
pub fn is_restart(&self) -> bool
// Number of times this node has been restarted (0 on first run).
pub fn restart_count(&self) -> u32
// Parsed dataflow YAML descriptor.
pub fn dataflow_descriptor(&self) -> NodeResult<&Descriptor>
}
Logging
Rust nodes have two ways to emit structured logs. Both produce identical structured log entries in the daemon.
Option 1: Node API (recommended for most cases)
All log methods emit structured JSONL to stdout, which the daemon parses automatically. Works with min_log_level filtering, send_logs_as routing, and adora/logs subscribers.
#![allow(unused)]
fn main() {
// General structured log. Level: "error", "warn", "info", "debug", "trace".
pub fn log(&self, level: &str, message: &str, target: Option<&str>)
// Structured log with additional key-value fields.
pub fn log_with_fields(
&self,
level: &str,
message: &str,
target: Option<&str>,
fields: Option<&BTreeMap<String, String>>,
)
// Convenience methods (no target parameter).
pub fn log_error(&self, message: &str)
pub fn log_warn(&self, message: &str)
pub fn log_info(&self, message: &str)
pub fn log_debug(&self, message: &str)
pub fn log_trace(&self, message: &str)
}
Option 2: Rust tracing crate
When adora’s tracing subscriber is initialized (via init_tracing() or the default feature), tracing::info!() etc. output structured JSON to stdout that the daemon parses identically:
#![allow(unused)]
fn main() {
tracing::info!("Sensor started");
tracing::warn!(sensor_id = "temp-01", "High temperature");
}
Use tracing when you want ecosystem integration (spans, instrumentation, OpenTelemetry). Use node.log_*() when you want explicit control or structured fields as BTreeMap.
| Method | Structured? | Fields? | OpenTelemetry? | Best for |
|---|---|---|---|---|
| `node.log_info(msg)` | Yes | No | No | Quick one-liner |
| `node.log_with_fields(...)` | Yes | Yes (BTreeMap) | No | Structured key-value context |
| `tracing::info!(key = val, msg)` | Yes | Yes (spans) | Yes | Ecosystem integration, OTel |
| `println!()` | No (plain stdout) | No | No | Quick debugging |
EventStream
Asynchronous iterator over incoming events destined for this node. Implements the futures::Stream trait.
The event stream closes itself after a Stop event is received. Nodes should exit once the stream ends.
#![allow(unused)]
fn main() {
// Block until the next event arrives. Returns None when the stream closes.
// Uses an internal EventScheduler that may reorder events for fairness.
pub fn recv(&mut self) -> Option<Event>
// Block with a timeout. Returns an Event::Error on timeout.
pub fn recv_timeout(&mut self, dur: Duration) -> Option<Event>
// Async receive with EventScheduler reordering.
pub async fn recv_async(&mut self) -> Option<Event>
// Async receive with a timeout. Returns Event::Error on timeout.
pub async fn recv_async_timeout(&mut self, dur: Duration) -> Option<Event>
// Non-blocking receive. Returns TryRecvError::Empty if nothing is ready.
pub fn try_recv(&mut self) -> Result<Event, TryRecvError>
// Drain all buffered events without blocking.
// Returns Some(Vec::new()) if nothing is ready; None if the stream is closed.
pub fn drain(&mut self) -> Option<Vec<Event>>
// True if no events are buffered in the scheduler or receiver.
pub fn is_empty(&self) -> bool
// Returns and resets accumulated drop counts per input ID.
// For `drop_oldest` inputs, drops happen at `queue_size`.
// For `backpressure` inputs, drops happen at 10x `queue_size` (hard safety cap).
pub fn drain_drop_counts(&mut self) -> HashMap<DataId, u64>
}
EventStream also implements futures::Stream<Item = Event>, so it can be used with StreamExt::next() and other combinators. Unlike recv/recv_async, the Stream implementation does not use the EventScheduler, preserving chronological event order.
Event
Represents an incoming event. This enum is #[non_exhaustive] – ignore unknown variants to stay forward-compatible.
#![allow(unused)]
fn main() {
#[non_exhaustive]
pub enum Event {
// An input was received from another node.
Input {
id: DataId, // input ID from the YAML (not the sender's output ID)
metadata: Metadata, // timestamp and type information
data: ArrowData, // Apache Arrow data
},
// The sender mapped to this input exited; no more data will arrive.
InputClosed { id: DataId },
// A previously closed input recovered (e.g., upstream node came back after timeout).
InputRecovered { id: DataId },
// An upstream node has restarted. Useful for resetting caches or state.
NodeRestarted { id: NodeId },
// The event stream is about to close. See StopCause for the reason.
Stop(StopCause),
// Instructs the node to reload an operator (used internally by the runtime).
Reload { operator_id: Option<OperatorId> },
// An unexpected internal error. Log it for debugging.
Error(String),
}
}
StopCause
#![allow(unused)]
fn main() {
#[non_exhaustive]
pub enum StopCause {
// Explicit stop via `adora stop` or Ctrl-C. Exit promptly or be killed.
Manual,
// All inputs were closed (upstream nodes exited). Only sent if the node has inputs.
AllInputsClosed,
}
}
Supporting Types
DataSample
A data region suitable for sending as an output message. Uses shared memory for data >= ZERO_COPY_THRESHOLD to enable zero-copy transfer.
Implements Deref<Target = [u8]> and DerefMut for reading and writing the underlying bytes.
Metadata and MetadataParameters
#![allow(unused)]
fn main() {
// Full metadata attached to every input event.
pub struct Metadata {
// Contains timestamp, Arrow type info, and user-defined parameters.
}
// User-controlled metadata fields attached when sending outputs.
// Type alias for BTreeMap<String, Parameter>.
// Default is empty. Pass metadata.parameters from an input to forward metadata.
pub type MetadataParameters = BTreeMap<String, Parameter>;
// A single metadata parameter value.
pub enum Parameter {
Bool(bool), Integer(i64), Float(f64), String(String),
ListInt(Vec<i64>), ListFloat(Vec<f64>), ListString(Vec<String>),
Timestamp(DateTime<Utc>),
}
// Extract typed parameters, returning None if missing or wrong type.
pub fn get_string_param<'a>(params: &'a MetadataParameters, key: &str) -> Option<&'a str>
pub fn get_integer_param(params: &MetadataParameters, key: &str) -> Option<i64>
pub fn get_bool_param(params: &MetadataParameters, key: &str) -> Option<bool>
}
Well-known metadata keys (for communication patterns):
| Constant | Value | Used by |
|---|---|---|
| `REQUEST_ID` | `"request_id"` | Service request/response correlation |
| `GOAL_ID` | `"goal_id"` | Action goal identification |
| `GOAL_STATUS` | `"goal_status"` | Action result status |
| `GOAL_STATUS_SUCCEEDED` | `"succeeded"` | Goal completed successfully |
| `GOAL_STATUS_ABORTED` | `"aborted"` | Goal aborted by server |
| `GOAL_STATUS_CANCELED` | `"canceled"` | Goal canceled by client |
| `SESSION_ID` | `"session_id"` | Streaming session identifier |
| `SEGMENT_ID` | `"segment_id"` | Streaming segment within a session |
| `SEQ` | `"seq"` | Streaming chunk sequence number |
| `FIN` | `"fin"` | Last chunk of a streaming segment |
| `FLUSH` | `"flush"` | Discard older queued messages on input |
All constants are re-exported from adora_node_api.
Identity Types
#![allow(unused)]
fn main() {
// Unique identifier for a running dataflow instance (UUID v4).
pub struct DataflowId(/* ... */);
// Node identifier, as defined in the dataflow YAML.
pub struct NodeId(/* ... */);
// Input/output identifier, as defined in the dataflow YAML.
pub struct DataId(/* ... */);
}
Error Types
#![allow(unused)]
fn main() {
#[derive(Debug, Error)]
pub enum NodeError {
Init(String), // config parsing, env vars, daemon handshake
Connection(String), // daemon connection lost
Output(String), // send or close failure
Data(String), // allocation or descriptor parsing
Internal(eyre::Report), // catch-all for unexpected errors
}
pub type NodeResult<T> = Result<T, NodeError>;
}
TryRecvError
#![allow(unused)]
fn main() {
pub enum TryRecvError {
Empty, // no event available right now
Closed, // event stream has been closed
}
}
ZERO_COPY_THRESHOLD
#![allow(unused)]
fn main() {
pub const ZERO_COPY_THRESHOLD: usize = 4096;
}
Messages smaller than this threshold are sent via TCP. Messages at or above this size use shared memory for zero-copy transfer.
ArrowData
#![allow(unused)]
fn main() {
// Wrapper around arrow::array::ArrayRef. Implements Deref to the inner ArrayRef.
pub struct ArrowData(pub arrow::array::ArrayRef);
}
Data from Event::Input arrives as ArrowData. Use TryFrom conversions or Arrow APIs to extract typed values.
InputTracker
Helper for tracking input health and caching the last received value per input. Useful for graceful degradation when upstream nodes time out.
#![allow(unused)]
fn main() {
pub struct InputTracker { /* ... */ }
impl InputTracker {
pub fn new() -> Self
// Update state from an event. Returns true if the event was relevant.
pub fn process_event(&mut self, event: &Event) -> bool
// Current state of an input (Healthy or Closed), if tracked.
pub fn state(&self, id: &DataId) -> Option<InputState>
// True if the input is currently closed.
pub fn is_closed(&self, id: &DataId) -> bool
// Last received value for an input. Available even when closed.
pub fn last_value(&self, id: &DataId) -> Option<&ArrowData>
// All inputs currently in Closed state.
pub fn closed_inputs(&self) -> Vec<&DataId>
// True if any tracked input is closed.
pub fn any_closed(&self) -> bool
}
pub enum InputState {
Healthy, // receiving data normally
Closed, // upstream exited or timed out
}
}
Integration Testing
The integration_testing module provides tools for testing nodes without a running daemon.
setup_integration_testing
Sets up thread-local state so that the next call to AdoraNode::init_from_env on the same thread initializes in test mode.
#![allow(unused)]
fn main() {
pub fn setup_integration_testing(
input: TestingInput,
output: TestingOutput,
options: TestingOptions,
)
}
TestingInput
#![allow(unused)]
fn main() {
pub enum TestingInput {
// Load events from a JSON file (must deserialize to IntegrationTestInput).
FromJsonFile(PathBuf),
// Provide events directly.
Input(IntegrationTestInput),
}
}
TestingOutput
#![allow(unused)]
fn main() {
pub enum TestingOutput {
// Write outputs to a JSONL file (created or overwritten).
ToFile(PathBuf),
// Write outputs as JSONL to any writer.
ToWriter(Box<dyn std::io::Write + Send>),
// Send each output as a JSON object to a flume channel.
ToChannel(flume::Sender<serde_json::Map<String, serde_json::Value>>),
}
}
TestingOptions
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Default)]
pub struct TestingOptions {
// Skip time offsets in outputs for deterministic comparison.
pub skip_output_time_offsets: bool,
}
}
Environment Variable Testing
Nodes using init_from_env also support file-based testing via environment variables:
| Variable | Description |
|---|---|
| `ADORA_TEST_WITH_INPUTS` | Path to a JSON input file (`IntegrationTestInput` format) |
| `ADORA_TEST_WRITE_OUTPUTS_TO` | Path for the output JSONL file (default: `outputs.jsonl` next to inputs) |
| `ADORA_TEST_NO_OUTPUT_TIME_OFFSET` | If set, omit time offsets for deterministic outputs |
Operator API (adora-operator-api)
Operators are in-process components managed by the Adora runtime. They are compiled as shared libraries (.so/.dylib/.dll) and loaded by the runtime.
Add to your Cargo.toml:
[dependencies]
adora-operator-api = { workspace = true }
[lib]
crate-type = ["cdylib"]
AdoraOperator Trait
#![allow(unused)]
fn main() {
pub trait AdoraOperator: Default {
fn on_event(
&mut self,
event: &Event,
output_sender: &mut AdoraOutputSender,
) -> Result<AdoraStatus, String>;
}
}
Implement this trait to define your operator’s behavior. The runtime calls on_event for each incoming event. Return AdoraStatus to control execution flow.
Event (Operator)
The operator Event enum is simpler than the node Event and uses &str for IDs.
#![allow(unused)]
fn main() {
#[non_exhaustive]
pub enum Event<'a> {
// An input was received.
Input { id: &'a str, data: ArrowData },
// Failed to parse the input data as an Arrow array.
InputParseError { id: &'a str, error: String },
// An input was closed by the sender.
InputClosed { id: &'a str },
// The operator should stop.
Stop,
}
}
AdoraOutputSender
#![allow(unused)]
fn main() {
pub struct AdoraOutputSender<'a>(/* ... */);
impl AdoraOutputSender<'_> {
// Send an output. `id` is the output ID from your dataflow YAML.
pub fn send(&mut self, id: String, data: impl Array) -> Result<(), String>
}
}
AdoraStatus
Returned from on_event to control the operator lifecycle.
#![allow(unused)]
fn main() {
pub enum AdoraStatus {
Continue, // keep running, wait for the next event
Stop, // stop this operator
StopAll, // stop the entire dataflow
}
}
register_operator! Macro
Generates the FFI entry points required by the Adora runtime to load and call your operator.
#![allow(unused)]
fn main() {
use adora_operator_api::register_operator;
register_operator!(MyOperator);
}
This must be called exactly once per crate, at the top level, with the type that implements AdoraOperator.
Quick Start Example: Node
A minimal node that receives tick inputs and sends a random number as output.
use adora_node_api::{AdoraNode, Event, IntoArrow, adora_core::config::DataId};
fn main() -> eyre::Result<()> {
let (mut node, mut events) = AdoraNode::init_from_env()?;
let output = DataId::from("random".to_owned());
while let Some(event) = events.recv() {
match event {
Event::Input { id, metadata, data } => {
if id.as_str() == "tick" {
let value: u64 = fastrand::u64(..);
node.send_output(
output.clone(),
metadata.parameters,
value.into_arrow(),
)?;
}
}
Event::Stop(_) => {}
_ => {}
}
}
Ok(())
}
Corresponding dataflow YAML:
nodes:
- id: timer
path: adora/timer/millis/100
outputs:
- tick
- id: my-node
path: ./target/debug/my-node
inputs:
tick: timer/tick
outputs:
- random
- id: sink
path: ./target/debug/sink
inputs:
data: my-node/random
Quick Start Example: Operator
A minimal operator that counts ticks and forwards formatted messages.
#![allow(unused)]
#![warn(unsafe_op_in_unsafe_fn)]
fn main() {
use adora_operator_api::{
AdoraOperator, AdoraOutputSender, AdoraStatus, Event, IntoArrow, register_operator,
};
register_operator!(MyOperator);
#[derive(Debug, Default)]
struct MyOperator {
ticks: usize,
}
impl AdoraOperator for MyOperator {
fn on_event(
&mut self,
event: &Event,
output_sender: &mut AdoraOutputSender,
) -> Result<AdoraStatus, String> {
match event {
Event::Input { id, data } => match *id {
"tick" => {
self.ticks += 1;
let msg = format!("tick count: {}", self.ticks);
output_sender.send("status".into(), msg.into_arrow())?;
}
other => eprintln!("ignoring unexpected input {other}"),
},
Event::InputClosed { id } => {
if *id == "tick" {
return Ok(AdoraStatus::Stop);
}
}
Event::Stop => {}
other => {
eprintln!("received unknown event {other:?}");
}
}
Ok(AdoraStatus::Continue)
}
}
}
Corresponding dataflow YAML:
nodes:
- id: timer
path: adora/timer/millis/500
outputs:
- tick
- id: runtime-node
operator:
shared_library: ./target/debug/libmy_operator
inputs:
tick: timer/tick
outputs:
- status
Python API Reference
This document covers the Python APIs for building adora nodes, operators, and dataflows. Install with:
pip install adora-rs
Node API
from adora import Node
The Node class is the primary interface for custom nodes. It connects to a running dataflow, receives input events, and sends outputs.
Node class
__init__(node_id=None)
Create a new node and connect to the running dataflow.
# Standard: node ID is read from environment variables set by the daemon
node = Node()
# Dynamic: connect to a running dataflow by explicit node ID
node = Node(node_id="my-dynamic-node")
Parameters:
- `node_id` (str, optional) – Explicit node ID for dynamic nodes. When omitted, the node reads its identity from environment variables set by the adora daemon.
Raises: RuntimeError if the node cannot connect to the dataflow.
next(timeout=None)
Retrieve the next event from the event stream. Blocks until an event is available or the timeout expires.
event = node.next() # block indefinitely
event = node.next(timeout=2.0) # block up to 2 seconds
Parameters:
- `timeout` (float, optional) – Maximum wait time in seconds.
Returns: dict – An event dictionary, or None if all senders have been dropped or the timeout expired.
drain()
Retrieve all buffered events without blocking.
events = node.drain()
for event in events:
print(event["type"])
Returns: list[dict] – A list of event dictionaries. Returns an empty list if no events are buffered.
try_recv()
Non-blocking receive. Returns the next buffered event if one is available.
event = node.try_recv()
if event is not None:
print(event["type"])
Returns: dict | None – An event dictionary, or None if no event is buffered.
recv_async(timeout=None)
Asynchronous receive. For use with asyncio.
event = await node.recv_async()
event = await node.recv_async(timeout=5.0)
Parameters:
- `timeout` (float, optional) – Maximum wait time in seconds. Returns an error if the timeout is reached.
Returns: dict | None – An event dictionary, or None if all senders have been dropped.
Note: This method is experimental. The pyo3 async (Rust-Python FFI) integration is still in development.
is_empty()
Check whether there are any buffered events in the event stream.
if not node.is_empty():
event = node.try_recv()
Returns: bool
send_output(output_id, data, metadata=None)
Send data on an output channel.
import pyarrow as pa
# Send raw bytes
node.send_output("status", b"OK")
# Send an Apache Arrow array (zero-copy capable)
node.send_output("values", pa.array([1, 2, 3]))
# Send with metadata
node.send_output("image", pa.array(pixels), {"camera_id": "front"})
Parameters:
- `output_id` (str) – The output name as declared in the dataflow YAML.
- `data` (bytes | pyarrow.Array) – The payload. Use `bytes` for simple data or `pyarrow.Array` for zero-copy shared-memory transport.
- `metadata` (dict, optional) – Key-value pairs attached to the message. Supported value types: `bool`, `int`, `float`, `str`, `list[int]`, `list[float]`, `list[str]`, `datetime.datetime`.
Raises: RuntimeError if data is neither bytes nor a pyarrow.Array.
Service, action, and streaming patterns
Python nodes use the same metadata key conventions as Rust for communication patterns. Parameters are plain dicts with string keys.
Well-known metadata keys:
| Key | Description |
|---|---|
| `"request_id"` | Service request/response correlation (UUID v7) |
| `"goal_id"` | Action goal identification (UUID v7) |
| `"goal_status"` | Action result status: `"succeeded"`, `"aborted"`, or `"canceled"` |
| `"session_id"` | Streaming session identifier |
| `"segment_id"` | Streaming segment within a session (integer) |
| `"seq"` | Streaming chunk sequence number (integer) |
| `"fin"` | Last chunk of a streaming segment (bool) |
| `"flush"` | Discard older queued messages on input (bool) |
Service client example:
import uuid
# Send a request with a unique request_id
request_id = str(uuid.uuid7()) # Python 3.13+; use uuid_utils or uuid.uuid4() on older versions
node.send_output("request", data, {"request_id": request_id})
Service server example:
# Pass through the metadata (includes request_id) from the incoming request
node.send_output("response", result, event["metadata"])
Action client example:
goal_id = str(uuid.uuid7())
node.send_output("goal", data, {"goal_id": goal_id})
Streaming example (flush downstream queues on user interruption):
params = {
"session_id": session_id,
"segment_id": 1,
"seq": 0,
"fin": False,
"flush": True,
}
node.send_output("text", data, metadata=params)
See patterns.md for the full guide.
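On the client side, the `request_id` convention is what lets responses be matched back to their originating requests. The sketch below is not adora API; `pending`, `send_request`, and `on_response` are illustrative names for the client's bookkeeping.

```python
# Sketch of client-side correlation (not adora API): track pending request_ids
# and match each response's metadata back to its originating request.

pending = {}

def send_request(request_id, payload):
    pending[request_id] = payload         # remember what we asked for

def on_response(event):
    rid = event["metadata"].get("request_id")
    return pending.pop(rid, None)         # None => stale or unknown response

send_request("req-1", {"op": "plan"})
matched = on_response({"metadata": {"request_id": "req-1"}})   # our request
stale = on_response({"metadata": {"request_id": "req-0"}})     # unknown id
```

Popping from `pending` also makes duplicate or late responses harmless: the second delivery of the same `request_id` simply returns `None`.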
Logging
Python nodes can log using either Python’s built-in logging module (recommended) or the explicit node API.
Python logging module (auto-bridged):
When Node() is created, it automatically installs a handler that routes Python’s logging module through the adora daemon. No configuration needed:
import logging
from adora import Node
node = Node() # Installs the logging bridge
logging.info("Sensor initialized") # -> structured "info" log entry
logging.warning("High temperature") # -> structured "warn" log entry
logging.debug("Raw bytes: %s", data) # -> structured "debug" log entry
These log entries are captured with full metadata (level, message, file path, line number) and work with min_log_level filtering, send_logs_as routing, and adora/logs subscribers.
Note: Do not call `logging.basicConfig()` before creating `Node()`. The constructor sets up the bridge; calling `basicConfig()` first may install a conflicting handler.
Explicit node API:
log(level, message, target=None, fields=None)
Emit a structured log message with optional target and key-value fields.
node.log("info", "Processing frame", target="vision")
node.log("error", "Sensor timeout", fields={"sensor": "lidar", "retry": "3"})
Parameters:
- `level` (str) – Log level: `"error"`, `"warn"`, `"info"`, `"debug"`, or `"trace"`.
- `message` (str) – The log message.
- `target` (str, optional) – Target module or subsystem name.
- `fields` (dict[str, str], optional) – Structured key-value context fields.
Works with the daemon’s min_log_level filtering, send_logs_as routing, and adora/logs subscribers.
log_error(message), log_warn(message), log_info(message), log_debug(message), log_trace(message)
Convenience methods for common log levels:
node.log_error("Connection failed")
node.log_warn("Temperature elevated")
node.log_info("Sensor initialized")
node.log_debug("Raw bytes received")
node.log_trace("Entering loop iteration")
Each is equivalent to node.log(level, message).
When to use which:
| Method | Structured? | Fields? | Best for |
|---|---|---|---|
| `logging.info()` | Yes | No | General-purpose logging |
| `node.log("info", msg, fields={...})` | Yes | Yes | Structured context (sensor_id, etc.) |
| `node.log_info(msg)` | Yes | No | Quick one-liner |
| `print()` | No | No | Legacy code, quick debugging |
dataflow_descriptor()
Return the full dataflow descriptor (the parsed dataflow YAML) as a Python dictionary.
descriptor = node.dataflow_descriptor()
print(descriptor["nodes"])
Returns: dict
node_config()
Return the configuration block for this node from the dataflow descriptor.
config = node.node_config()
model_path = config.get("model", "default.pt")
Returns: dict
dataflow_id()
Return the unique identifier of the running dataflow.
print(node.dataflow_id()) # e.g. "a1b2c3d4-..."
Returns: str
is_restart()
Check whether this node was restarted after a previous exit or failure. Useful for deciding whether to restore saved state or start fresh.
if node.is_restart():
restore_checkpoint()
Returns: bool
restart_count()
Return how many times this node has been restarted. Returns 0 on the first run, 1 after the first restart, and so on.
print(f"Restart #{node.restart_count()}")
Returns: int
merge_external_events(subscription)
Merge a ROS2 subscription stream into the node’s main event loop. After calling this method, ROS2 messages arrive as events with kind set to "external".
from adora import Node, Ros2Context, Ros2Node, Ros2NodeOptions, Ros2Topic
node = Node()
ros2_context = Ros2Context()
ros2_node = ros2_context.new_node("listener", Ros2NodeOptions())
topic = Ros2Topic("/chatter", "std_msgs/String", ros2_node)
subscription = ros2_node.create_subscription(topic)
node.merge_external_events(subscription)
for event in node:
if event["kind"] == "external":
print("ROS2:", event["value"])
elif event["type"] == "INPUT":
print("Adora:", event["id"])
Parameters:
- `subscription` (`adora.Ros2Subscription`) – A ROS2 subscription created via the adora ROS2 bridge.
Iteration support
The Node class implements __iter__ and __next__, so you can iterate directly:
for event in node:
match event["type"]:
case "INPUT":
process(event["value"])
case "STOP":
break
The iterator calls `next()` with no timeout on each iteration and raises `StopIteration` when the event stream closes (i.e. when `next()` returns `None`), which terminates the loop.
Event dictionary
Events are returned as plain Python dictionaries. The structure depends on the event type.
INPUT
An input message arrived from another node.
{
"type": "INPUT",
"id": "camera_image", # input ID as declared in the dataflow YAML
"kind": "adora", # "adora" for dataflow events, "external" for ROS2
"value": <pyarrow.Array>, # the payload as an Apache Arrow array
"metadata": {
"timestamp": datetime, # UTC-aware datetime.datetime
"open_telemetry_context": "...", # tracing context (if enabled)
... # any user-supplied metadata
},
}
Access the data:
values = event["value"].to_pylist() # convert to Python list
array = event["value"].to_numpy() # convert to NumPy array
INPUT_CLOSED
An input channel was closed (the upstream node finished).
{
"type": "INPUT_CLOSED",
"id": "camera_image",
"kind": "adora",
}
STOP
The dataflow is shutting down.
{
"type": "STOP",
"id": "MANUAL" | "ALL_INPUTS_CLOSED", # stop cause
"kind": "adora",
}
ERROR
An error occurred in the runtime.
{
"type": "ERROR",
"error": "description of the error",
"kind": "adora",
}
External (ROS2)
When using merge_external_events, ROS2 messages arrive as:
{
"kind": "external",
"value": <pyarrow.Array>, # the ROS2 message as an Arrow array
}
AdoraStatus enum
Used as the return value from operator on_event methods to control the event loop.
from adora import AdoraStatus
| Value | Meaning |
|---|---|
| AdoraStatus.CONTINUE | Continue processing events (value 0) |
| AdoraStatus.STOP | Stop this operator (value 1) |
| AdoraStatus.STOP_ALL | Stop the entire dataflow (value 2) |
Operator API
Operators run inside the adora runtime process (no separate OS process). They are defined as a Python class named Operator with an on_event method.
Operator class (user-defined)
Create a Python file with an Operator class:
from adora import AdoraStatus
class Operator:
def __init__(self):
# Initialize state here
self.count = 0
def on_event(self, adora_event, send_output) -> AdoraStatus:
if adora_event["type"] == "INPUT":
self.count += 1
# Process the input and optionally send output
send_output("result", b"processed", adora_event["metadata"])
return AdoraStatus.CONTINUE
Methods:
__init__(self) – Called once when the operator is loaded. Initialize any state or models here.
on_event(self, adora_event, send_output) -> AdoraStatus – Called for every incoming event. Must return an AdoraStatus value.
Parameters of on_event:
adora_event (dict) – An event dictionary.
send_output (callable) – Callback to send output data (see below).
The runtime also sets self.dataflow_descriptor on the operator instance with the parsed dataflow YAML as a dictionary.
send_output callback
The send_output callback is passed to on_event for sending data from an operator.
send_output(output_id, data, metadata=None)
Parameters:
output_id (str) – The output name as declared in the dataflow YAML.
data (bytes | pyarrow.Array) – The payload.
metadata (dict, optional) – Metadata to attach. Pass adora_event["metadata"] to propagate tracing context.
Example:
import pyarrow as pa
from adora import AdoraStatus
class Operator:
def on_event(self, adora_event, send_output) -> AdoraStatus:
if adora_event["type"] == "INPUT":
result = pa.array([42], type=pa.int64())
send_output("output", result, adora_event["metadata"])
return AdoraStatus.CONTINUE
DataflowBuilder
from adora.builder import DataflowBuilder, Node, Operator, Output
Build dataflow YAML programmatically in Python.
DataflowBuilder class
__init__(name="adora-dataflow")
Create a new dataflow builder.
flow = DataflowBuilder("my-robot")
Parameters:
name (str, optional) – Name of the dataflow. Defaults to "adora-dataflow".
add_node(id, **kwargs) -> Node
Add a node to the dataflow. Returns a Node object for further configuration.
sender = flow.add_node("sender")
Parameters:
id (str) – Unique node identifier.
**kwargs – Additional node configuration passed through to the YAML.
Returns: Node (builder)
to_yaml(path=None) -> str | None
Generate the YAML representation of the dataflow. If path is given, writes to file and returns None. Otherwise returns the YAML string.
# Write to file
flow.to_yaml("dataflow.yml")
# Get as string
yaml_str = flow.to_yaml()
Parameters:
path(str, optional) – File path to write the YAML.
Returns: str | None
Context manager
DataflowBuilder supports the with statement:
with DataflowBuilder("my-flow") as flow:
flow.add_node("sender").path("sender.py")
flow.to_yaml("dataflow.yml")
Node class (builder)
Returned by DataflowBuilder.add_node(). All setter methods return self for chaining.
path(path) -> Node
Set the path to the node’s executable or script.
node.path("my_node.py")
args(args) -> Node
Set command-line arguments for the node.
node.args("--verbose --port 8080")
env(env) -> Node
Set environment variables for the node.
node.env({"MODEL_PATH": "/models/yolo.pt"})
build(command) -> Node
Set the build command for the node (run before starting).
node.build("pip install -r requirements.txt")
git(url, branch=None, tag=None, rev=None) -> Node
Set a Git repository as the source for the node.
node.git("https://github.com/org/repo.git", branch="main")
add_operator(operator) -> Node
Attach an Operator to this node.
op = Operator("detector", python="object_detection.py")
node.add_operator(op)
add_output(output_id) -> Output
Declare an output on this node and return an Output reference for use as an input source.
output = sender.add_output("data")
add_input(input_id, source, queue_size=None, queue_policy=None) -> Node
Subscribe this node to an output from another node.
# Using an Output object
output = sender.add_output("data")
receiver.add_input("data", output)
# Using a string reference
receiver.add_input("tick", "adora/timer/millis/100")
# With a custom queue size
receiver.add_input("images", camera_output, queue_size=2)
# Lossless input (blocks sender when full)
receiver.add_input("commands", cmd_output, queue_size=100, queue_policy="backpressure")
Parameters:
input_id (str) – Name of the input on this node.
source (str | Output) – Either a string ("node_id/output_id") or an Output object.
queue_size (int, optional) – Maximum number of buffered messages for this input.
queue_policy (str, optional) – "drop_oldest" (default) or "backpressure" (buffers up to 10x queue_size before dropping).
to_dict() -> dict
Return the dictionary representation of the node for YAML serialization.
Output class (builder)
Returned by Node.add_output(). Represents a reference to a node’s output, used as a source in add_input().
output = sender.add_output("data")
receiver.add_input("sensor_data", output)
str(output) # "sender/data"
Operator class (builder)
Defines an operator for embedding in a node’s YAML configuration.
__init__(id, name=None, description=None, build=None, python=None, shared_library=None, send_stdout_as=None)
op = Operator(
id="detector",
python="object_detection.py",
send_stdout_as="detection_text",
)
Parameters:
id (str) – Unique operator identifier.
name (str, optional) – Display name.
description (str, optional) – Human-readable description.
build (str, optional) – Build command to run before loading.
python (str, optional) – Path to the Python operator file.
shared_library (str, optional) – Path to a shared library operator.
send_stdout_as (str, optional) – Route the operator’s stdout as an output with this ID.
to_dict() -> dict
Return the dictionary representation for YAML serialization.
CUDA Module
from adora.cuda import torch_to_ipc_buffer, ipc_buffer_to_ipc_handle, open_ipc_handle
Utilities for zero-copy GPU tensor sharing between nodes via CUDA IPC. Requires PyTorch with CUDA and Numba with CUDA support.
torch_to_ipc_buffer(tensor) -> tuple[pyarrow.Array, dict]
Convert a PyTorch CUDA tensor into an Arrow array containing the CUDA IPC handle, plus a metadata dictionary. Send both through the dataflow to share GPU memory without copying.
import torch
import pyarrow as pa
from adora import Node
from adora.cuda import torch_to_ipc_buffer
node = Node()
tensor = torch.randn(1024, 768, device="cuda")
ipc_buffer, metadata = torch_to_ipc_buffer(tensor)
node.send_output("gpu_data", ipc_buffer, metadata)
Parameters:
tensor(torch.Tensor) – A CUDA tensor.
Returns: tuple[pyarrow.Array, dict] – The IPC handle as an int8 Arrow array, and metadata with shape, strides, dtype, size, offset, and source info.
ipc_buffer_to_ipc_handle(handle_buffer, metadata) -> IpcHandle
Reconstruct a CUDA IPC handle from a received Arrow buffer and metadata.
from adora.cuda import ipc_buffer_to_ipc_handle
event = node.next()
ipc_handle = ipc_buffer_to_ipc_handle(event["value"], event["metadata"])
Parameters:
handle_buffer (pyarrow.Array) – The Arrow array from event["value"].
metadata (dict) – The metadata from event["metadata"].
Returns: numba.cuda.cudadrv.driver.IpcHandle
open_ipc_handle(ipc_handle, metadata) -> ContextManager[torch.Tensor]
Open a CUDA IPC handle and yield a PyTorch tensor. Use as a context manager to ensure proper cleanup.
from adora.cuda import ipc_buffer_to_ipc_handle, open_ipc_handle
event = node.next()
ipc_handle = ipc_buffer_to_ipc_handle(event["value"], event["metadata"])
with open_ipc_handle(ipc_handle, event["metadata"]) as tensor:
result = tensor * 2 # use the GPU tensor directly
Parameters:
ipc_handle (IpcHandle) – Handle from ipc_buffer_to_ipc_handle.
metadata (dict) – The metadata dictionary with shape, strides, and dtype info.
Returns: Context manager yielding a torch.Tensor on CUDA.
Quick Start Example
A complete node that receives images, processes them, and sends results:
#!/usr/bin/env python3
"""Example node: receives messages, transforms them, and sends output."""
import logging
import pyarrow as pa
from adora import Node
def main():
node = Node()
for event in node:
if event["type"] == "INPUT":
input_id = event["id"]
if input_id == "message":
values = event["value"].to_pylist()
number = values[0]
# Create a struct array with multiple fields
result = pa.StructArray.from_arrays(
[
pa.array([number * 2]),
pa.array([f"Message #{number}"]),
],
names=["doubled", "description"],
)
node.send_output("transformed", result)
logging.info("Transformed message %d", number)
elif event["type"] == "STOP":
logging.info("Node stopping")
break
if __name__ == "__main__":
main()
Run with:
adora run dataflow.yml
DataflowBuilder Example
Build a dataflow programmatically instead of writing YAML by hand:
#!/usr/bin/env python3
"""Build a simple sender -> receiver dataflow."""
from adora.builder import DataflowBuilder, Operator
flow = DataflowBuilder("example-flow")
# Add a timer-driven sender node
sender = flow.add_node("sender")
sender.path("sender.py")
tick_output = sender.add_output("message")
# Add a receiver that subscribes to the sender
receiver = flow.add_node("receiver")
receiver.path("receiver.py")
receiver.add_input("message", tick_output)
# Add a node with a timer input
timed_node = flow.add_node("periodic")
timed_node.path("periodic.py")
timed_node.add_input("tick", "adora/timer/millis/100")
# Add a node with an operator
runtime_node = flow.add_node("runtime-node")
op = Operator("detector", python="object_detection.py")
runtime_node.add_operator(op)
runtime_node.add_input("image", "camera/image")
# Write or print the YAML
flow.to_yaml("dataflow.yml")
print(flow.to_yaml())
C API Reference
This document covers the two C APIs provided by the Adora framework: the Node API for standalone C processes and the Operator API for shared-library operators loaded by the Adora runtime.
Table of Contents
- Node API (adora-node-api-c)
- Operator API (adora-operator-api-c)
- Node Example
- Operator Example
- Building and Linking
Node API (adora-node-api-c)
Header: apis/c/node/node_api.h
Crate: adora-node-api-c (builds as staticlib)
The Node API is used by standalone C executables that participate in an Adora dataflow as external processes. The daemon spawns the process and sets environment variables that the node reads during initialization.
Initialization
init_adora_context_from_env
void *init_adora_context_from_env();
Initializes an Adora node context from environment variables set by the daemon. Returns an opaque pointer to the context on success, or NULL on failure.
The returned pointer must be passed to all subsequent Node API calls that expect a context argument. When the node is finished, free it with free_adora_context.
free_adora_context
void free_adora_context(void *adora_context);
Frees a context previously created by init_adora_context_from_env. Each context must be freed exactly once. After freeing, the pointer must not be used again.
Event Loop
adora_next_event
void *adora_next_event(void *adora_context);
Blocks until the next event is available for this node. Returns an opaque pointer to the event, or NULL when all event streams have closed (indicating the node should exit).
The returned pointer must not be dereferenced directly. Use the read_adora_* functions to extract the event type and payload. Free the event with free_adora_event when done.
free_adora_event
void free_adora_event(void *adora_event);
Frees an event previously returned by adora_next_event. Each event must be freed exactly once. After freeing, the event pointer and all derived pointers (from read_adora_input_id, read_adora_input_data) become invalid.
Event Inspection
read_adora_event_type
enum AdoraEventType read_adora_event_type(void *adora_event);
Returns the type of the given event. See AdoraEventType for possible values.
read_adora_input_id
void read_adora_input_id(void *adora_event, char **out_ptr, size_t *out_len);
Reads the input ID from an AdoraEventType_Input event. Writes the string start pointer to *out_ptr and its byte length to *out_len. The string is valid UTF-8 but not null-terminated; use out_len to determine its bounds.
If the event is not an input event, sets *out_ptr = NULL and *out_len = 0.
The returned pointer borrows from the event. It becomes invalid after free_adora_event is called.
read_adora_input_data
void read_adora_input_data(void *adora_event, char **out_ptr, size_t *out_len);
Reads the raw data bytes from an AdoraEventType_Input event. Writes the data start pointer to *out_ptr and its byte length to *out_len.
Sets *out_ptr = NULL and *out_len = 0 if the event is not an input event or the input carries no data.
Currently only UInt8 Arrow arrays are supported. Other Arrow data types will cause a runtime panic. Future versions will use the Arrow C Data Interface for full type support.
The returned pointer borrows from the event. It becomes invalid after free_adora_event is called.
read_adora_input_timestamp
unsigned long long read_adora_input_timestamp(void *adora_event);
Returns the hybrid logical clock timestamp from an input event’s metadata as a uint64 value. Returns 0 if the event is not an input event.
Output
adora_send_output
int adora_send_output(
void *adora_context,
const char *id_ptr,
size_t id_len,
const char *data_ptr,
size_t data_len
);
Sends output data to all downstream subscribers. The output ID (id_ptr/id_len) must be a valid UTF-8 string matching one of the node’s declared outputs in the dataflow YAML. The data (data_ptr/data_len) is sent as raw bytes (UInt8 Arrow array).
Returns 0 on success, -1 on error. Errors are logged via tracing.
Returns -1 immediately if any pointer argument is NULL.
Logging
adora_log
int adora_log(
void *adora_context,
const char *level_ptr,
size_t level_len,
const char *msg_ptr,
size_t msg_len
);
Sends a structured log message through the Adora logging pipeline. Both level and msg must be valid UTF-8 strings.
Valid log levels: "error", "warn", "info", "debug", "trace".
Returns 0 on success, -1 on error. Returns -1 immediately if any pointer argument is NULL.
Enums
AdoraEventType
enum AdoraEventType {
AdoraEventType_Stop, // Graceful shutdown requested
AdoraEventType_Input, // New input data available
AdoraEventType_InputClosed, // An input stream was closed
AdoraEventType_Error, // An error occurred
AdoraEventType_Unknown, // Unrecognized event type
};
Operator API (adora-operator-api-c)
Headers: apis/c/operator/operator_api.h, apis/c/operator/operator_types.h
Crate: adora-operator-api-c
The Operator API is used by shared libraries (.so/.dylib/.dll) loaded into the Adora runtime process. Unlike nodes, operators do not have their own main function. Instead, they export three functions that the runtime calls at the appropriate lifecycle points.
The operator_types.h header is auto-generated by safer-ffi and defines all C-compatible struct and enum types.
Lifecycle Functions
adora_init_operator
AdoraInitResult_t adora_init_operator(void);
Called once when the runtime loads the operator. Allocate and initialize any operator state, then return it via the operator_context field. The runtime passes this pointer back on every subsequent call.
Return an AdoraInitResult_t with .result.error = NULL on success.
adora_drop_operator
AdoraResult_t adora_drop_operator(void *operator_context);
Called once when the operator is being unloaded. Free all resources associated with operator_context.
Return an AdoraResult_t with .error = NULL on success.
Event Handling
adora_on_event
OnEventResult_t adora_on_event(
RawEvent_t *event,
const SendOutput_t *send_output,
void *operator_context
);
Called by the runtime each time an event arrives for this operator. Inspect the event fields to determine the event type:
| Field | Meaning |
|---|---|
| event->input != NULL | New input available |
| event->stop == true | Graceful shutdown requested |
| event->error.ptr != NULL | An error occurred (UTF-8 string in error.ptr/error.len) |
| event->input_closed.ptr != NULL | An input stream closed (input ID in input_closed.ptr/input_closed.len) |
Use send_output to emit data to downstream nodes (see adora_send_operator_output). Return an OnEventResult_t with the appropriate AdoraStatus_t to control the operator lifecycle.
Input Reading
adora_read_input_id
char *adora_read_input_id(const Input_t *input);
Returns a newly allocated null-terminated string containing the input ID. The caller must free it with adora_free_input_id.
adora_read_data
Vec_uint8_t adora_read_data(Input_t *input);
Reads the input data as a byte array. Consumes the underlying Arrow array from the input (the data can only be read once per event). Returns a Vec_uint8_t with .ptr = NULL if the input has no data or the data has already been consumed.
The caller must free the returned data with adora_free_data.
Output Sending
adora_send_operator_output
AdoraResult_t adora_send_operator_output(
const SendOutput_t *send_output,
const char *id,
const uint8_t *data_ptr,
size_t data_len
);
Sends output data to downstream subscribers. The id must be a null-terminated string matching one of the operator’s declared outputs. The data (data_ptr/data_len) is converted to a UInt8 Arrow array internally.
Returns an AdoraResult_t with .error = NULL on success.
Memory Management
The Operator API allocates memory that the caller must free using the corresponding functions:
| Allocation source | Free function |
|---|---|
| adora_read_input_id | adora_free_input_id |
| adora_read_data | adora_free_data |
void adora_free_input_id(char *input_id);
void adora_free_data(Vec_uint8_t data);
Failing to call these functions will leak memory. Do not use free() on these allocations – they are allocated by the Rust runtime and must be freed through the API.
Structs
Vec_uint8_t
typedef struct Vec_uint8 {
uint8_t *ptr;
size_t len;
size_t cap;
} Vec_uint8_t;
A Rust-allocated byte vector. Access len bytes starting at ptr. Do not modify cap. Free with adora_free_data.
AdoraResult_t
typedef struct AdoraResult {
Vec_uint8_t *error; // NULL on success, points to error string on failure
} AdoraResult_t;
Generic result type. A NULL error pointer indicates success. When non-NULL, the error pointer contains a UTF-8 error message.
AdoraInitResult_t
typedef struct AdoraInitResult {
AdoraResult_t result;
void *operator_context; // opaque pointer to operator state
} AdoraInitResult_t;
Returned by adora_init_operator. On success, result.error is NULL and operator_context holds the operator state pointer.
OnEventResult_t
typedef struct OnEventResult {
AdoraResult_t result;
AdoraStatus_t status;
} OnEventResult_t;
Returned by adora_on_event. Contains both an error/success result and a status code controlling the operator lifecycle.
RawEvent_t
typedef struct RawEvent {
Input_t *input; // non-NULL when this is an input event
Vec_uint8_t input_closed; // non-empty when an input stream closed
bool stop; // true when shutdown is requested
Vec_uint8_t error; // non-empty on error
} RawEvent_t;
Represents an event delivered to the operator. Multiple fields may be set simultaneously; check them in order of priority.
Input_t
typedef struct Input Input_t; // opaque
Opaque type representing an input event’s data. Use adora_read_input_id and adora_read_data to extract its contents.
Output_t
typedef struct Output Output_t; // opaque
Opaque type used internally by adora_send_operator_output. Not created directly by user code.
SendOutput_t
typedef struct SendOutput {
ArcDynFn1_AdoraResult_Output_t send_output;
} SendOutput_t;
Callback handle passed to adora_on_event. Pass it to adora_send_operator_output to emit data. Do not store it beyond the scope of the current adora_on_event call.
Metadata_t
typedef struct Metadata {
Vec_uint8_t open_telemetry_context;
} Metadata_t;
Event metadata containing an OpenTelemetry trace context string.
Operator Enums
AdoraStatus_t
enum AdoraStatus {
ADORA_STATUS_CONTINUE = 0, // Keep running
ADORA_STATUS_STOP = 1, // Stop this operator
ADORA_STATUS_STOP_ALL = 2, // Stop the entire dataflow
};
typedef uint8_t AdoraStatus_t;
Returned in OnEventResult_t to control operator lifecycle after processing an event.
Node Example
A complete C node that receives timer ticks and sends output messages:
#include <stdio.h>
#include <string.h>
#include "node_api.h"
int main() {
void *ctx = init_adora_context_from_env();
if (ctx == NULL) {
fprintf(stderr, "failed to init adora context\n");
return 1;
}
for (int i = 0; i < 100; i++) {
void *event = adora_next_event(ctx);
if (event == NULL)
break; // all streams closed
enum AdoraEventType ty = read_adora_event_type(event);
if (ty == AdoraEventType_Input) {
char *id;
size_t id_len;
read_adora_input_id(event, &id, &id_len);
// Send a response
char out_id[] = "message";
char out_data[64];
int out_len = snprintf(out_data, sizeof(out_data),
"iteration %d", i);
adora_send_output(ctx, out_id, strlen(out_id),
out_data, out_len);
} else if (ty == AdoraEventType_Stop) {
free_adora_event(event);
break;
}
free_adora_event(event);
}
free_adora_context(ctx);
return 0;
}
Dataflow YAML for the node:
nodes:
- id: c_node
path: build/c_node
inputs:
timer: adora/timer/millis/100
outputs:
- message
Operator Example
A complete C operator that reads input, maintains state, and sends output:
#include "operator_api.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
AdoraInitResult_t adora_init_operator(void) {
// Allocate operator state (a simple counter)
int *counter = (int *)calloc(1, sizeof(int));
AdoraInitResult_t result = {.operator_context = counter};
return result;
}
AdoraResult_t adora_drop_operator(void *operator_context) {
free(operator_context);
AdoraResult_t result = {.error = NULL};
return result;
}
OnEventResult_t adora_on_event(
RawEvent_t *event,
const SendOutput_t *send_output,
void *operator_context)
{
OnEventResult_t result = {.status = ADORA_STATUS_CONTINUE};
int *counter = (int *)operator_context;
if (event->input != NULL) {
char *id = adora_read_input_id(event->input);
Vec_uint8_t data = adora_read_data(event->input);
if (data.ptr != NULL) {
*counter += 1;
printf("received input '%s', counter: %d\n", id, *counter);
// Send counter value as string
char buf[64];
int len = snprintf(buf, sizeof(buf), "count=%d", *counter);
result.result = adora_send_operator_output(
send_output, "counter", (uint8_t *)buf, len);
adora_free_data(data);
}
adora_free_input_id(id);
}
if (event->stop) {
result.status = ADORA_STATUS_STOP;
}
return result;
}
Dataflow YAML for the operator:
nodes:
- id: runtime-node
operators:
- id: c_operator
shared-library: build/operator
inputs:
data: source_node/output
outputs:
- counter
Building and Linking
Node (static library)
C nodes link against adora-node-api-c, which builds as a static library.
Step 1: Build the static library
cargo build -p adora-node-api-c --release
This produces target/release/libadora_node_api_c.a (or .lib on Windows).
Step 2: Compile and link
clang node.c -ladora_node_api_c -L ../../target/release -o build/c_node <FLAGS>
Platform-specific linker flags:
| Platform | Flags |
|---|---|
| Linux | -lm -lrt -ldl -pthread |
| macOS | -framework CoreServices -framework Security -lSystem -lresolv -lpthread -lc -lm |
| Windows | -ladvapi32 -luserenv -lkernel32 -lws2_32 -lbcrypt -lncrypt -lschannel -lntdll -liphlpapi -lcfgmgr32 -lcredui -lcrypt32 -lcryptnet -lfwpuclnt -lgdi32 -lmsimg32 -lmswsock -lole32 -lopengl32 -lsecur32 -lshell32 -lsynchronization -luser32 -lwinspool -Wl,-nodefaultlib:libcmt -D_DLL -lmsvcrt |
On Windows, add the .exe extension to the output file.
Operator (shared library)
C operators are compiled into shared libraries that the Adora runtime loads at startup.
Step 1: Compile to object file
clang -c operator.c -o build/operator.o -fdeclspec -fPIC
Omit -fPIC on Windows.
Step 2: Link as shared library
# Linux
clang -shared build/operator.o -o build/liboperator.so
# macOS
clang -shared build/operator.o -o build/liboperator.dylib
# Windows
clang -shared build/operator.o -o build/operator.dll
Step 3: Reference in dataflow YAML
operators:
- id: c_operator
shared-library: build/operator # without lib prefix or extension
inputs:
data: source/output
outputs:
- result
The shared-library path omits the platform-specific prefix (lib) and extension (.so/.dylib/.dll). The runtime resolves the correct file for the current platform.
Include Paths
The Node API header is at apis/c/node/node_api.h. The Operator API headers are at apis/c/operator/operator_api.h and apis/c/operator/operator_types.h. Adjust your include paths accordingly:
# Node
clang -I path/to/adora/apis/c/node node.c ...
# Operator
clang -I path/to/adora/apis/c/operator operator.c ...
C++ Compatibility
The operator headers include extern "C" guards, and the node header uses C-compatible declarations, so both APIs can be included directly from C++ source files.
C++ API Reference
Adora provides C++ bindings for both standalone nodes and in-process operators via CXX (Rust-C++ interop). The CXX bridge generates type-safe C++ headers from Rust definitions – no raw FFI or manual extern "C" declarations are needed.
Two crates provide the C++ surface:
| Crate | Library | Use case |
|---|---|---|
| adora-node-api-cxx | libadora_node_api_cxx.a | Standalone node executable |
| adora-operator-api-cxx | libadora_operator_api_cxx.a | Shared-library operator loaded by the runtime |
Generated headers: adora-node-api.h and adora-operator-api.h.
Node API (adora-node-api-cxx)
Initialization
#include "adora-node-api.h"
// Initialize a node from environment variables set by the Adora daemon.
// Returns an AdoraNode struct containing the event stream and output sender.
// Throws on failure.
AdoraNode init_adora_node();
AdoraNode
Returned by init_adora_node(). Owns the event stream and the output sender for the lifetime of the node.
struct AdoraNode {
rust::Box<Events> events; // event stream (blocking receiver)
rust::Box<OutputSender> send_output; // output sender
};
Events
Opaque Rust type exposed to C++. Provides blocking iteration over the node’s incoming events.
// Member function -- call on the boxed object directly.
rust::Box<AdoraEvent> Events::next();
// Free function form -- equivalent to events->next().
rust::Box<AdoraEvent> next_event(rust::Box<Events>& events);
Both forms block until the next event arrives and return an owned AdoraEvent.
AdoraEvent
Opaque Rust type. Inspect its kind with event_type(), then downcast with event_as_input() or event_as_arrow_input().
// Determine the event kind.
AdoraEventType event_type(const rust::Box<AdoraEvent>& event);
// Downcast to a raw-byte input. Throws if the event is not Input.
AdoraInput event_as_input(rust::Box<AdoraEvent> event);
// Downcast to an Arrow FFI input (writes Arrow C Data Interface structs).
// out_array and out_schema must point to valid ArrowArray / ArrowSchema structs.
// Returns AdoraResult with empty error on success.
AdoraResult event_as_arrow_input(
rust::Box<AdoraEvent> event,
uint8_t* out_array,
uint8_t* out_schema);
// Same as above, but also returns the input ID and metadata.
ArrowInputInfo event_as_arrow_input_with_info(
rust::Box<AdoraEvent> event,
uint8_t* out_array,
uint8_t* out_schema);
AdoraEventType
enum class AdoraEventType : uint8_t {
Stop, // graceful shutdown requested
Input, // new data arrived on an input
InputClosed, // a single input was closed
Error, // an error occurred
Unknown, // unrecognized event variant
AllInputsClosed, // all inputs closed (stream ended)
};
AdoraInput
Returned by event_as_input(). Contains raw bytes.
struct AdoraInput {
rust::String id; // input identifier (e.g. "tick", "image")
rust::Vec<uint8_t> data; // raw payload bytes
};
ArrowInputInfo
Returned by event_as_arrow_input_with_info(). Contains the input ID, metadata, and an error string.
struct ArrowInputInfo {
rust::String id; // input identifier
rust::Box<Metadata> metadata; // attached metadata
rust::String error; // empty on success
};
AdoraResult
Returned by output-sending functions. Check the error field – empty means success.
struct AdoraResult {
rust::String error; // empty string on success
};
OutputSender
Opaque Rust type. All methods take rust::Box<OutputSender>& as the first argument (the sender from AdoraNode::send_output).
send_output
Send raw bytes on a named output.
AdoraResult send_output(
rust::Box<OutputSender>& sender,
rust::String id,
rust::Slice<const uint8_t> data);
send_output_with_metadata
Send raw bytes with attached metadata.
AdoraResult send_output_with_metadata(
rust::Box<OutputSender>& sender,
rust::String id,
rust::Slice<const uint8_t> data,
rust::Box<Metadata> metadata);
send_arrow_output
Send an Arrow array via the C Data Interface. The pointers must reference valid ArrowArray and ArrowSchema structs. Ownership of the Arrow data transfers to Rust on success.
AdoraResult send_arrow_output(
rust::Box<OutputSender>& sender,
rust::String id,
uint8_t* array_ptr,
uint8_t* schema_ptr);
// Overload with metadata (same C++ name via cxx_name attribute).
AdoraResult send_arrow_output(
rust::Box<OutputSender>& sender,
rust::String id,
uint8_t* array_ptr,
uint8_t* schema_ptr,
rust::Box<Metadata> metadata);
log_message
Send a log message through the Adora logging system.
AdoraResult log_message(
const rust::Box<OutputSender>& sender,
rust::String level, // e.g. "info", "warn", "error"
rust::String message);
Metadata
Opaque Rust type for attaching typed key-value pairs to outputs.
Construction
rust::Box<Metadata> new_metadata();
Reading
uint64_t Metadata::timestamp() const;
bool Metadata::get_bool(const rust::Str key) const; // throws on missing/wrong type
int64_t Metadata::get_int(const rust::Str key) const;
double Metadata::get_float(const rust::Str key) const;
rust::String Metadata::get_str(const rust::Str key) const;
rust::Vec<int64_t> Metadata::get_list_int(const rust::Str key) const;
rust::Vec<double> Metadata::get_list_float(const rust::Str key) const;
rust::Vec<rust::String> Metadata::get_list_string(const rust::Str key) const;
int64_t Metadata::get_timestamp(const rust::Str key) const; // nanoseconds since epoch
rust::String Metadata::get_json(const rust::Str key) const; // single value as JSON string
Writing
All setters throw on failure.
void Metadata::set_bool(const rust::Str key, bool value);
void Metadata::set_int(const rust::Str key, int64_t value);
void Metadata::set_float(const rust::Str key, double value);
void Metadata::set_string(const rust::Str key, rust::String value);
void Metadata::set_list_int(const rust::Str key, rust::Vec<int64_t> value);
void Metadata::set_list_float(const rust::Str key, rust::Vec<double> value);
void Metadata::set_list_string(const rust::Str key, rust::Vec<rust::String> value);
void Metadata::set_timestamp(const rust::Str key, int64_t nanos); // nanoseconds since epoch
Introspection
MetadataValueType Metadata::type(const rust::Str key) const; // throws if key missing
rust::String Metadata::to_json() const; // full metadata as JSON
rust::Vec<rust::String> Metadata::list_keys() const;
MetadataValueType
enum class MetadataValueType : uint8_t {
Bool,
Integer,
Float,
String,
ListInt,
ListFloat,
ListString,
Timestamp,
};
Service, Action, and Streaming Patterns
C++ nodes can implement communication patterns using the metadata API. The well-known metadata keys are:
| Key | Description |
|---|---|
"request_id" | Service request/response correlation (UUID v7) |
"goal_id" | Action goal identification (UUID v7) |
"goal_status" | Action result status: "succeeded", "aborted", or "canceled" |
"session_id" | Streaming session identifier |
"segment_id" | Streaming segment within a session (integer) |
"seq" | Streaming chunk sequence number (integer) |
"fin" | Last chunk of a streaming segment (bool) |
"flush" | Discard older queued messages on input (bool) |
// Service server: pass through request_id from input metadata
auto input_metadata = event_as_arrow_input_with_info(event);
send_output_with_metadata(sender, "response", result, std::move(input_metadata.metadata));
// Action server: set goal_id and goal_status on result
auto meta = new_metadata();
meta->set_string("goal_id", goal_id);
meta->set_string("goal_status", "succeeded");
send_output_with_metadata(sender, "result", result_data, std::move(meta));
CombinedEvents (ROS2 integration)
When using the optional ros2-bridge feature, node events and ROS2 subscription events can be merged into a single stream.
// Convert Adora events into a combined stream.
CombinedEvents adora_events_into_combined(rust::Box<Events> events);
// Create an empty combined stream (for ROS2-only nodes).
CombinedEvents empty_combined_events();
CombinedEvents struct
struct CombinedEvents {
rust::Box<MergedEvents> events;
CombinedEvent next(); // blocking -- returns the next merged event
};
CombinedEvent struct
struct CombinedEvent {
rust::Box<MergedAdoraEvent> event;
bool is_adora() const; // true if this is a standard Adora event
};
// Downcast a combined event back to an AdoraEvent. Throws if not an Adora event.
rust::Box<AdoraEvent> downcast_adora(CombinedEvent event);
ROS2 subscriptions add their own events to the merged stream. Use subscription->matches(event) and subscription->downcast(event) to handle ROS2-specific events (see the ROS2 Bridge docs).
Operator API (adora-operator-api-cxx)
Operators are shared libraries loaded by the Adora runtime. The C++ side implements two functions that the CXX bridge calls into.
Required C++ interface
You must provide a header operator.h and an implementation file. The header declares an Operator class and two free functions:
// operator.h
#pragma once
#include <memory>
#include "adora-operator-api.h"
class Operator {
public:
Operator();
// Add any state your operator needs.
};
std::unique_ptr<Operator> new_operator();
AdoraOnInputResult on_input(
Operator& op,
rust::Str id,
rust::Slice<const uint8_t> data,
OutputSender& output_sender);
- `new_operator()` – called once at startup; returns the operator instance.
- `on_input()` – called for every input event; process data and optionally send outputs.
OutputSender (operator)
Available inside on_input(). Sends data on a named output.
AdoraSendOutputResult send_output(
OutputSender& sender,
rust::Str id,
rust::Slice<const uint8_t> data);
Result types
struct AdoraOnInputResult {
rust::String error; // empty on success
bool stop; // true to request graceful shutdown
};
struct AdoraSendOutputResult {
rust::String error; // empty on success
};
Quick Start: Node Example
A minimal node that receives timer ticks and sends a counter.
#include "adora-node-api.h"
#include <iostream>
#include <vector>
int main() {
auto adora_node = init_adora_node();
unsigned char counter = 0;
for (;;) {
auto event = next_event(adora_node.events);
auto ty = event_type(event);
if (ty == AdoraEventType::AllInputsClosed) {
break;
}
if (ty == AdoraEventType::Stop) {
break;
}
if (ty == AdoraEventType::Input) {
auto input = event_as_input(std::move(event));
counter += 1;
std::cout << "Input: " << std::string(input.id)
<< " counter=" << (int)counter << std::endl;
std::vector<unsigned char> out{counter};
rust::Slice<const uint8_t> slice{out.data(), out.size()};
auto result = send_output(adora_node.send_output, "counter", slice);
if (!result.error.empty()) {
std::cerr << "Send error: " << std::string(result.error) << std::endl;
return 1;
}
}
}
return 0;
}
Dataflow YAML:
nodes:
- id: cxx-node
path: build/my_node
inputs:
tick: adora/timer/millis/300
outputs:
- counter
Quick Start: Arrow Node Example
A node that receives and sends Arrow arrays via the C Data Interface, with metadata.
#include "adora-node-api.h"
#include <arrow/api.h>
#include <arrow/c/bridge.h>
#include <iostream>
int main() {
auto adora_node = init_adora_node();
for (int i = 0; i < 10; i++) {
auto event = adora_node.events->next();
auto ty = event_type(event);
if (ty == AdoraEventType::AllInputsClosed || ty == AdoraEventType::Stop) {
break;
}
if (ty == AdoraEventType::Input) {
// Receive Arrow input with metadata
struct ArrowArray c_array;
struct ArrowSchema c_schema;
auto info = event_as_arrow_input_with_info(
std::move(event),
reinterpret_cast<uint8_t*>(&c_array),
reinterpret_cast<uint8_t*>(&c_schema));
if (!info.error.empty()) {
std::cerr << std::string(info.error) << std::endl;
continue;
}
std::cout << "Input: " << std::string(info.id)
<< " ts=" << info.metadata->timestamp() << std::endl;
auto imported = arrow::ImportArray(&c_array, &c_schema);
auto array = imported.ValueOrDie();
std::cout << "Arrow: " << array->ToString() << std::endl;
// Build an output Arrow array
arrow::Int32Builder builder;
(void)builder.Append(i * 10);      // arrow::Status ignored for brevity
std::shared_ptr<arrow::Array> out_array;
(void)builder.Finish(&out_array);  // arrow::Status ignored for brevity
// Export and send with metadata
struct ArrowArray out_c_array;
struct ArrowSchema out_c_schema;
arrow::ExportArray(*out_array, &out_c_array, &out_c_schema);
auto meta = new_metadata();
meta->set_string("source", "cpp-arrow-node");
meta->set_int("iteration", i);
auto result = send_arrow_output(
adora_node.send_output, "counter",
reinterpret_cast<uint8_t*>(&out_c_array),
reinterpret_cast<uint8_t*>(&out_c_schema),
std::move(meta));
if (!result.error.empty()) {
std::cerr << "Send error: " << std::string(result.error) << std::endl;
}
}
}
return 0;
}
Quick Start: Operator Example
A minimal operator shared library.
// operator.cc
#include "operator.h"
#include <iostream>
#include <vector>
// This example assumes operator.h adds a counter member to Operator:
//   class Operator { public: Operator(); unsigned char counter = 0; };
Operator::Operator() {}
std::unique_ptr<Operator> new_operator() {
return std::make_unique<Operator>();
}
AdoraOnInputResult on_input(
Operator& op,
rust::Str id,
rust::Slice<const uint8_t> data,
OutputSender& output_sender)
{
op.counter += 1;
std::vector<unsigned char> out{op.counter};
rust::Slice<const uint8_t> slice{out.data(), out.size()};
auto send_result = send_output(output_sender, rust::Str("status"), slice);
return AdoraOnInputResult{send_result.error, false};
}
Dataflow YAML:
nodes:
- id: runtime-node
operators:
- id: my-operator
shared-library: build/my_operator
inputs:
data: some-node/output
outputs:
- status
Build Integration (CMake)
The recommended build approach uses CMake with the DoraTargets.cmake helper (see examples/cmake-dataflow/).
Project structure
my-project/
CMakeLists.txt
DoraTargets.cmake # copied from examples/cmake-dataflow/
node/main.cc
operator/operator.h
operator/operator.cc
dataflow.yml
CMakeLists.txt
cmake_minimum_required(VERSION 3.21)
project(my-dataflow LANGUAGES C CXX)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_FLAGS "-fPIC")
include(DoraTargets.cmake)
link_directories(${adora_link_dirs})
# Standalone node (executable)
add_executable(my_node node/main.cc ${node_bridge})
add_dependencies(my_node Adora_cxx)
target_include_directories(my_node PRIVATE ${adora_cxx_include_dir})
target_link_libraries(my_node adora_node_api_cxx)
# Operator (shared library)
add_library(my_operator SHARED
operator/operator.cc ${operator_bridge})
add_dependencies(my_operator Adora_cxx)
target_include_directories(my_operator PRIVATE
${adora_cxx_include_dir} ${adora_c_include_dir}
${CMAKE_CURRENT_SOURCE_DIR}/operator)
target_link_libraries(my_operator adora_operator_api_cxx)
install(TARGETS my_node DESTINATION ${CMAKE_CURRENT_SOURCE_DIR}/bin)
install(TARGETS my_operator DESTINATION ${CMAKE_CURRENT_SOURCE_DIR}/lib)
What DoraTargets.cmake provides
| Variable | Description |
|---|---|
| `adora_cxx_include_dir` | Path to generated CXX headers (`adora-node-api.h`, `adora-operator-api.h`) |
| `adora_c_include_dir` | Path to C API headers (for mixed C/C++ projects) |
| `adora_link_dirs` | Library search path for `libadora_node_api_cxx.a` / `libadora_operator_api_cxx.a` |
| `node_bridge` | Generated CXX bridge source file for nodes (`node_bridge.cc`) |
| `operator_bridge` | Generated CXX bridge source file for operators (`operator_bridge.cc`) |
| `Adora_cxx` | CMake target dependency that builds the CXX crates |
Build steps
# Option A: Build against local Adora source
mkdir build && cd build
cmake .. -DDORA_ROOT_DIR=/path/to/adora
cmake --build .
# Option B: Build against Adora from GitHub (cloned automatically)
mkdir build && cd build
cmake ..
cmake --build .
Requirements
- C++20 compiler
- Rust toolchain (for building the Adora static libraries via Cargo)
- CMake 3.21+
- For Arrow integration: Apache Arrow C++ library
CXX Bridge Notes
- All Rust opaque types (`Events`, `OutputSender`, `AdoraEvent`, `Metadata`, `MergedEvents`, `MergedAdoraEvent`) are accessed through `rust::Box<T>`.
- `rust::String`, `rust::Vec<T>`, and `rust::Slice<const T>` are CXX bridge types that interoperate with their C++ standard library counterparts. See the CXX type reference.
- Functions that return `Result<T>` in Rust throw C++ exceptions on the error path.
- Arrow FFI functions (`event_as_arrow_input`, `send_arrow_output`) are `unsafe` on the Rust side. The caller must pass valid pointers to `ArrowArray`/`ArrowSchema` structs cast to `uint8_t*`.
- The node library is a static archive (`staticlib`). Link it into your executable with `-ladora_node_api_cxx`.
- The operator library is also a static archive. Link it into your shared library with `-ladora_operator_api_cxx`.
Adora CLI Reference
Adora (Agentic Dataflow-Oriented Robotic Architecture) is a 100% Rust framework for building real-time robotics and AI applications. This document covers the adora CLI from both an end-user and developer perspective.
Table of Contents
- Quick Start
- Installation
- Core Concepts
- Dataflow Descriptor
- Command Reference
- Environment Variables
- Architecture Guide
- Writing Nodes
- Writing Operators
- Distributed Deployments – see also Distributed Deployment Guide for cluster management, scheduling, and operations
- Troubleshooting
- Debugging and Observability – standalone guide covering record/replay, topic inspection, log analysis, and resource monitoring
- API References: Rust | Python | C | C++
Quick Start
# Create a new project
adora new my-robot --kind dataflow --lang rust
# Run locally (no coordinator/daemon needed)
adora run dataflow.yml
# Or use coordinator/daemon for production
adora up
adora start dataflow.yml --attach
# Ctrl-C to stop
adora down
Installation
From crates.io (recommended)
cargo install adora-cli
From source
cargo install --path binaries/cli --locked
Verify
adora --version
adora status
Core Concepts
Dataflow
A dataflow is a directed graph of nodes connected by typed data channels. Nodes produce outputs that other nodes consume as inputs. The framework handles data routing, serialization (Apache Arrow), and lifecycle management.
Execution Modes
| Mode | Command | Infrastructure | Use case |
|---|---|---|---|
| Local | adora run | None | Development, testing, single-machine |
| Distributed | adora up + adora start | Coordinator + Daemon(s) | Production, multi-machine |
Component Roles
CLI --> Coordinator --> Daemon(s) --> Nodes / Operators
(control plane) (per machine) (user code)
- CLI: User interface. Sends commands, displays logs.
- Coordinator: Orchestrates dataflow lifecycle across machines.
- Daemon: Spawns node processes, manages IPC, collects metrics.
- Node: A standalone process that produces and consumes Arrow data.
- Operator: In-process code running inside a shared runtime (lower latency than nodes).
Data Format
All data flows through the system as Apache Arrow columnar arrays. This enables zero-copy shared memory transfer between co-located nodes and zero-serialization overhead.
Dataflow Descriptor
Dataflows are defined in YAML files. Here is the complete schema:
Minimal Example
nodes:
- id: sender
path: sender.py
outputs:
- message
- id: receiver
path: receiver.py
inputs:
message: sender/message
Full Schema
# Dataflow-level settings
health_check_interval: 5.0 # health check sweep interval in seconds (default: 5.0)
nodes:
- id: my-node # unique identifier (required)
name: "My Node" # human-readable name (optional)
description: "..." # description (optional)
# --- Source (pick one) ---
path: ./target/debug/my-node # local executable
# path: https://example.com/node.zip # download from URL
# git: https://github.com/org/repo.git # build from git
# branch: main # git branch (mutually exclusive with tag/rev)
# tag: v1.0 # git tag
# rev: abc123 # git commit hash
# --- Build ---
build: cargo build -p my-node # shell command to build (optional)
# --- Inputs ---
inputs:
# Short form: source_node/output_id
tick: adora/timer/millis/100
data: other-node/output
# Long form with options
sensor_data:
source: sensor/frames
queue_size: 10 # input buffer size (default: 10)
queue_policy: drop_oldest # or "backpressure" (buffers up to 10x queue_size)
input_timeout: 5.0 # circuit breaker timeout in seconds
# --- Outputs ---
outputs:
- processed
- status
# --- Environment ---
env:
MY_VAR: "value"
FROM_ENV:
__adora_env: HOST_VAR # read from host environment
args: "--verbose" # command-line arguments
# --- Fault tolerance ---
restart_policy: on-failure # never (default) | on-failure | always
max_restarts: 5 # 0 = unlimited
restart_delay: 1.0 # initial backoff in seconds
max_restart_delay: 30.0 # backoff cap in seconds
restart_window: 300.0 # reset counter after N seconds
health_check_timeout: 30.0 # kill if no activity for N seconds
# --- Logging ---
min_log_level: info # source-level filter (daemon-side)
send_stdout_as: raw_output # route raw stdout as data output
send_logs_as: log_entries # route structured logs as data output
max_log_size: "50MB" # rotate log files at this size
max_rotated_files: 5 # number of rotated files to keep (1-100)
# --- Deployment ---
_unstable_deploy:
machine: A # target machine/daemon ID
# Debug settings
_unstable_debug:
publish_all_messages_to_zenoh: true # required for topic echo/hz/info
Built-in Timer Nodes
Timers are virtual nodes that emit ticks at fixed intervals:
inputs:
tick: adora/timer/millis/100 # every 100ms
slow: adora/timer/millis/1000 # every 1s
fast: adora/timer/hz/30 # 30 Hz (~33ms)
Operator Nodes
Operators run in-process inside a shared runtime (no separate process):
nodes:
# Single operator (shorthand)
- id: detector
operator:
python: detect.py
build: pip install -r requirements.txt
inputs:
image: camera/frames
outputs:
- bbox
# Multiple operators sharing a runtime
- id: runtime-node
operators:
- id: preprocessor
shared-library: ../../target/debug/libpreprocess
inputs:
raw: sensor/data
outputs:
- processed
- id: analyzer
shared-library: ../../target/debug/libanalyze
inputs:
data: runtime-node/preprocessor/processed
outputs:
- result
Distributed Deployment
Assign nodes to specific machines using _unstable_deploy:
nodes:
- id: camera-driver
_unstable_deploy:
machine: robot-arm
path: ./target/debug/camera
outputs:
- frames
- id: ml-inference
_unstable_deploy:
machine: gpu-server
path: ./target/debug/inference
inputs:
frames: camera-driver/frames
outputs:
- predictions
When nodes are on different machines, communication automatically switches from shared memory to Zenoh pub/sub.
Command Reference
Lifecycle Commands
adora run
Run a dataflow locally without coordinator or daemon. Best for development and testing.
adora run <PATH> [OPTIONS]
| Argument/Flag | Default | Description |
|---|---|---|
| `<PATH>` | required | Path to dataflow descriptor YAML |
| `--stop-after <DURATION>` | | Auto-stop after duration (e.g., `30s`, `5m`) |
| `--uv` | false | Use uv for Python node management |
| `--debug` | false | Enable debug topics (equivalent to `publish_all_messages_to_zenoh: true`) |
| `--allow-shell-nodes` | false | Enable shell-based node execution |
| `--log-level <LEVEL>` | stdout | Min display level: error\|warn\|info\|debug\|trace\|stdout |
| `--log-format <FORMAT>` | pretty | Output format: pretty\|json\|compact |
| `--log-filter <FILTER>` | | Per-node level overrides: `"node1=debug,node2=warn"` |
Examples:
# Basic run
adora run dataflow.yml
# Stop after 10 seconds, only show warnings
adora run dataflow.yml --stop-after 10s --log-level warn
# Python dataflow with uv
adora run dataflow.yml --uv
# Debug one node, silence others
adora run dataflow.yml --log-level warn --log-filter "sensor=debug"
# JSON output for CI pipelines
adora run dataflow.yml --log-format json --stop-after 30s 2>test.json
adora up
Start coordinator and daemon in local mode.
adora up
Spawns adora coordinator and adora daemon as background processes. Waits for both to be ready before returning. Idempotent: if already running, does nothing.
adora down (alias: adora destroy)
Tear down coordinator and daemon. Stops all running dataflows first.
adora down [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |
adora build
Run build commands defined in the dataflow descriptor.
adora build <PATH> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `<PATH>` | required | Dataflow descriptor path |
| `--uv` | false | Use uv for Python builds |
| `--local` | false | Force local build (skip coordinator) |
| `--strict-types` | false | Treat type warnings as errors (non-zero exit code) |
Type checking: After expanding modules, build runs the same type checks as validate. Warnings are printed by default; use --strict-types (or set strict_types: true in the YAML) to fail the build on type mismatches. User-defined types in a types/ directory next to the dataflow are loaded automatically.
Build strategy: If nodes have _unstable_deploy sections and a coordinator is reachable, builds are distributed to target machines. Otherwise, builds run locally.
Git sources: Nodes with a git: field are cloned/updated before building. The build command runs from the git repository root.
adora start
Start a dataflow on a running coordinator.
adora start <PATH> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `<PATH>` | required | Dataflow descriptor path |
| `--name <NAME>, -n` | | Assign a name to the dataflow |
| `--attach` | auto | Attach to log stream and wait for completion |
| `--detach` | auto | Return immediately after spawn |
| `--debug` | false | Enable debug topics (equivalent to `publish_all_messages_to_zenoh: true`) |
| `--hot-reload` | false | Watch Python files and reload on change |
| `--uv` | false | Use uv for Python nodes |
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |
If neither --attach nor --detach is specified: attaches if running in a TTY, detaches otherwise.
Attach mode: Streams logs, handles Ctrl-C gracefully (first = stop, second = force kill).
Hot reload: Watches Python operator source files. On change, sends a reload request to the coordinator which propagates to the daemon.
adora stop
Stop a running dataflow.
adora stop [UUID_OR_NAME] [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `[UUID_OR_NAME]` | interactive | Dataflow UUID or name |
| `--name <NAME>, -n` | | Alternative name specification |
| `--grace-duration <DURATION>` | | Graceful shutdown timeout |
| `--force, -f` | false | Immediate termination |
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |
If no identifier is given and running in a TTY, presents an interactive picker.
Stop sequence: Send Event::Stop -> wait grace duration -> SIGTERM -> hard kill.
adora restart
Restart a running dataflow (stop + re-start with stored descriptor). No YAML path needed – the coordinator retains the original descriptor.
adora restart [UUID] [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `[UUID]` | | Dataflow UUID |
| `--name <NAME>, -n` | | Restart by name instead of UUID |
| `--grace-duration <DURATION>` | | Graceful shutdown timeout for the stop phase |
| `--force, -f` | false | Force kill before restart |
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |
Examples:
# Restart by name
adora restart --name my-app
# Restart by UUID with forced stop
adora restart a1b2c3d4-... --force
adora record
Record dataflow messages to an .adorec file for offline replay. See Debugging Guide for full workflows.
adora record <DATAFLOW_YAML> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `<DATAFLOW_YAML>` | required | Path to dataflow descriptor |
| `-o, --output <PATH>` | `recording_{timestamp}.adorec` | Output file path |
| `--topics <TOPICS>` | all | Comma-separated node/output topics to record |
| `--proxy` | false | Stream via WebSocket instead of recording on target |
| `--output-yaml <PATH>` | | Write modified YAML without running (dry run) |
Default mode injects a record node into the dataflow. --proxy mode requires a running dataflow and publish_all_messages_to_zenoh: true.
adora replay
Replay a recorded .adorec file by replacing source nodes with replay nodes. See Debugging Guide for full workflows.
adora replay <FILE> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `<FILE>` | required | Path to `.adorec` recording |
| `--speed <FLOAT>` | 1.0 | Playback speed (0 = max speed) |
| `--loop` | false | Loop the recording |
| `--replace <NODE_IDS>` | all recorded | Comma-separated nodes to replace |
| `--output-yaml <PATH>` | | Write modified YAML without running (dry run) |
Monitoring Commands
adora list (alias: adora ps)
List running dataflows with metrics.
adora list [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `--format <FMT>, -f` | table | Output format: table\|json |
| `--status <STATUS>` | | Filter: running\|finished\|failed |
| `--name <PATTERN>` | | Filter by name (case-insensitive substring) |
| `--sort-by <FIELD>` | | Sort by: cpu\|memory |
| `--quiet, -q` | false | Print only UUIDs |
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |
Output columns: UUID, Name, Status, Nodes, CPU, Memory
adora logs
Show and follow logs of a dataflow and node.
adora logs [UUID_OR_NAME] [NODE] [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `[UUID_OR_NAME]` | | Dataflow UUID or name |
| `[NODE]` | | Node name (required unless `--all-nodes`) |
| `--all-nodes` | false | Merge logs from all nodes by timestamp |
| `--tail <N>` | all | Show last N lines |
| `--follow, -f` | false | Stream new log entries |
| `--local` | false | Read from local `out/` directory |
| `--since <DURATION>` | | Show logs newer than duration ago |
| `--until <DURATION>` | | Show logs older than duration ago |
| `--level <LEVEL>` | stdout | Min log level |
| `--log-format <FORMAT>` | pretty | Output format |
| `--log-filter <FILTER>` | | Per-node level overrides |
| `--grep <PATTERN>` | | Case-insensitive text search |
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |
Filter pipeline: Read/Parse -> Time filters -> Grep -> Tail -> Display
Examples:
# Follow all nodes live
adora logs my-dataflow --all-nodes --follow
# Last 50 errors from a specific node
adora logs my-dataflow sensor --level error --tail 50
# Search logs from last 5 minutes
adora logs my-dataflow --all-nodes --since 5m --grep "timeout"
# Read local files (no coordinator needed)
adora logs --local --all-nodes --tail 100
# Post-mortem analysis: errors in time window
adora logs --local sensor --since 1h --until 30m --level error
Duration formats: `30` (seconds), `30s`, `5m`, `1h`, `2d`
adora inspect top (alias: adora top)
Real-time TUI monitor for node resource usage (like top).
adora inspect top [OPTIONS]
adora top [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `--refresh-interval <SECONDS>` | 2 | Update interval (min: 1) |
| `--once` | false | Print a single JSON snapshot and exit (for scripting/CI) |
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |
Requires an interactive terminal (unless --once is used).
| Key | Action |
|---|---|
| `q` / `Esc` | Quit |
| `Up` / `k` | Select previous node |
| `Down` / `j` | Select next node |
| `n` | Sort by node name |
| `c` | Sort by CPU |
| `m` | Sort by memory |
| `r` | Force refresh |
Columns: NODE, STATUS, DATAFLOW, PID, CPU%, MEMORY (MB), RESTARTS, QUEUE, NET TX, NET RX, I/O READ (MB/s), I/O WRITE (MB/s)
- STATUS: Running, Restarting, Degraded (broken inputs), or Failed
- RESTARTS: Current restart count per node
- QUEUE: Pending messages in the node’s input queue
- NET TX/RX: Cumulative cross-daemon network bytes sent/received via Zenoh
CPU values are per-core (can exceed 100% with multiple cores). Metrics come from daemons, so this works for distributed deployments.
Scripting example:
# JSON snapshot for CI/monitoring pipelines
adora top --once | jq '.[].cpu_usage'
adora topic list
List all topics (outputs) in a running dataflow.
adora topic list [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `-d <DATAFLOW>, --dataflow` | interactive | Dataflow UUID or name |
| `--format <FMT>` | table | Output format: table\|json |
adora topic echo
Subscribe to topics and display messages in real-time.
adora topic echo [OPTIONS] [DATA...]
| Flag | Default | Description |
|---|---|---|
| `-d <DATAFLOW>, --dataflow` | required | Dataflow UUID or name |
| `[DATA...]` | all outputs | Topics to echo (e.g., `node1/output`) |
| `--format <FMT>` | table | Output format: table\|json |
Requires _unstable_debug.publish_all_messages_to_zenoh: true in the descriptor.
adora topic hz
Measure topic publish frequency with a TUI dashboard.
adora topic hz [OPTIONS] [DATA...]
| Flag | Default | Description |
|---|---|---|
| `-d <DATAFLOW>, --dataflow` | required | Dataflow UUID or name |
| `[DATA...]` | all outputs | Topics to measure |
| `--window <SECONDS>` | 10 | Sliding window (min: 1) |
Requires an interactive terminal. Displays: Avg (ms), Avg (Hz), Min (ms), Max (ms), Std (ms), plus a rate sparkline and histogram for the selected topic.
adora topic info
Show detailed metadata of a single topic.
adora topic info [OPTIONS] DATA
| Flag | Default | Description |
|---|---|---|
| `-d <DATAFLOW>, --dataflow` | required | Dataflow UUID or name |
| `DATA` | required | Single topic (e.g., `camera/image`) |
| `--duration <SECONDS>` | 5 | Collection duration (min: 1) |
Subscribes to the topic for the specified duration and reports: type (Arrow schema), publisher, subscribers, message count, bandwidth.
adora node
Manage and inspect dataflow nodes.
adora node list
adora node list [OPTIONS]
Lists nodes in a running dataflow with their status, CPU, memory, and restart count.
Columns: NODE, STATUS, PID, CPU%, MEMORY (MB), RESTARTS, DATAFLOW
adora node info
Show detailed information about a specific node including status, inputs, outputs, and metrics.
adora node info <NODE> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `<NODE>` | required | Node ID to inspect |
| `-d <DATAFLOW>, --dataflow` | interactive | Dataflow UUID or name |
| `-f <FORMAT>, --format` | table | Output format: table\|json |
adora node restart
Restart a single node within a running dataflow. The daemon stops the node process and respawns it.
adora node restart <NODE> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `<NODE>` | required | Node ID to restart |
| `-d <DATAFLOW>, --dataflow` | interactive | Dataflow UUID or name |
| `--grace <DURATION>` | | Grace period before force-killing the node |
adora node stop
Stop a single node within a running dataflow without stopping the entire dataflow.
adora node stop <NODE> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `<NODE>` | required | Node ID to stop |
| `-d <DATAFLOW>, --dataflow` | interactive | Dataflow UUID or name |
| `--grace <DURATION>` | | Grace period before force-killing the node |
adora topic pub
Publish JSON data to a topic in a running dataflow. Requires publish_all_messages_to_zenoh: true.
adora topic pub <TOPIC> [DATA] [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `<TOPIC>` | required | Topic to publish to (format: `node_id/output_id`) |
| `[DATA]` | | JSON data to publish (required unless `--file`) |
| `--file <PATH>` | | Read data from a JSON file instead of command line |
| `--count <N>` | 1 | Number of messages to publish |
| `-d <DATAFLOW>, --dataflow` | required | Dataflow UUID or name |
Examples:
# Publish a single value
adora topic pub -d my-app sensor/threshold '[42]'
# Publish from file, 10 times
adora topic pub -d my-app sensor/config --file config.json --count 10
adora param
Manage runtime parameters for nodes. Parameters are persisted in the coordinator store and optionally forwarded to running nodes.
adora param list
List all runtime parameters for a node.
adora param list <NODE> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `<NODE>` | required | Node ID |
| `-d <DATAFLOW>, --dataflow` | interactive | Dataflow UUID or name |
| `--format <FMT>` | table | Output format: table\|json |
adora param get
Get a single runtime parameter value.
adora param get <NODE> <KEY> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `<NODE>` | required | Node ID |
| `<KEY>` | required | Parameter key |
| `-d <DATAFLOW>, --dataflow` | interactive | Dataflow UUID or name |
adora param set
Set a runtime parameter. The value is JSON. The parameter is stored in the coordinator and forwarded to the node if it is running.
adora param set <NODE> <KEY> <VALUE> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `<NODE>` | required | Node ID |
| `<KEY>` | required | Parameter key (max 256 bytes) |
| `<VALUE>` | required | Parameter value as JSON (max 64KB serialized) |
| `-d <DATAFLOW>, --dataflow` | interactive | Dataflow UUID or name |
Examples:
# Set a numeric parameter
adora param set -d my-app sensor threshold 42
# Set a string parameter
adora param set -d my-app camera resolution '"1080p"'
# Set a complex parameter
adora param set -d my-app detector config '{"confidence": 0.8, "nms": 0.5}'
adora param delete
Delete a runtime parameter.
adora param delete <NODE> <KEY> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `<NODE>` | required | Node ID |
| `<KEY>` | required | Parameter key |
| `-d <DATAFLOW>, --dataflow` | interactive | Dataflow UUID or name |
adora doctor
Diagnose environment, coordinator/daemon connectivity, and optionally validate a dataflow YAML.
adora doctor [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `--dataflow <PATH>` | | Path to a dataflow YAML to validate |
Checks performed:
- Coordinator reachability
- Daemon connectivity
- Active dataflow status
- Dataflow YAML validation (if `--dataflow` provided)
Examples:
# Basic health check
adora doctor
# Check environment + validate a dataflow
adora doctor --dataflow dataflow.yml
adora trace list
List recent traces captured by the coordinator. The coordinator captures spans from adora_coordinator and adora_core crates in-memory (up to 4096 spans). No external tracing infrastructure required.
adora trace list [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `--coordinator-addr <IP>` | 127.0.0.1 | Coordinator address |
| `--coordinator-port <PORT>` | 6013 | Coordinator port |
Output columns: TRACE ID (first 12 chars), ROOT SPAN, SPANS, STARTED, DURATION
Example:
adora trace list
TRACE ID ROOT SPAN SPANS STARTED DURATION
a1b2c3d4e5f6 spawn_dataflow 12 2026-03-01 10:30:05 1.234s
f8e7d6c5b4a3 build_dataflow 5 2026-03-01 10:29:58 0.500s
adora trace view
View spans for a specific trace as an indented tree. Supports prefix matching on trace IDs.
adora trace view <TRACE_ID> [OPTIONS]
| Argument/Flag | Default | Description |
|---|---|---|
| <TRACE_ID> | required | Full trace ID or unique prefix |
| --coordinator-addr <IP> | 127.0.0.1 | Coordinator address |
| --coordinator-port <PORT> | 6013 | Coordinator port |
Example:
adora trace view a1b2c3d4
spawn_dataflow [INFO 1.234s] {build_id="abc", session_id="def"}
  build_dataflow [INFO 0.500s]
    download_node [DEBUG 0.200s] {url="..."}
  start_inner [INFO 0.734s]
    spawn_node [INFO 0.100s] {node_id="camera"}
    spawn_node [INFO 0.080s] {node_id="detector"}
Trace IDs are prefix-matched: if the prefix uniquely identifies a trace, it resolves automatically. If ambiguous, you’ll be prompted to use a longer prefix.
Setup Commands
adora status (alias: adora check)
Check system health and connectivity.
adora status [OPTIONS]
Reports coordinator connectivity, daemon status, and active dataflow count.
adora new
Generate a new project or node from templates.
adora new <NAME> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| <NAME> | required | Project or node name |
| --kind <KIND> | dataflow | dataflow \| node |
| --lang <LANG> | rust | rust \| python \| c \| cxx |
adora expand
Expand module references in a dataflow and print the resulting flat YAML. Useful for debugging module composition.
adora expand <PATH> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| <PATH> | required | Dataflow descriptor (or module file with --module) |
| --module | false | Validate a standalone module file instead of a full dataflow |
Examples:
# Expand a dataflow with modules
adora expand dataflow.yml
# Validate a module file
adora expand --module modules/navigation.module.yml
See the Modules Guide for full documentation on module composition.
adora graph
Visualize a dataflow as a graph.
adora graph <PATH> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| <PATH> | required | Dataflow descriptor path |
| --mermaid | false | Output Mermaid diagram text |
| --open | false | Open HTML in browser |
Without --mermaid, generates an interactive HTML file using mermaid.js. When outputs have type annotations, edge labels include the type name (e.g. image [Image]).
# Generate HTML
adora graph dataflow.yml --open
# Generate Mermaid for GitHub markdown
adora graph dataflow.yml --mermaid
adora validate
Validate a dataflow YAML file and check type annotations.
adora validate <PATH> [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| <PATH> | required | Dataflow descriptor path |
| --strict-types | false | Treat warnings as errors (non-zero exit code for CI) |
Checks:
- Key existence: `output_types`/`input_types` keys exist in the corresponding `outputs`/`inputs` lists
- URN resolution: all type URNs resolve in the standard or user-defined type library
- Edge compatibility: connected edges have compatible types (exact match, widening, or user-defined rules)
- Parameterized types: parameter mismatches (e.g. `AudioFrame[sample_type=f32]` vs `AudioFrame[sample_type=i16]`)
- Timer auto-typing: timer inputs are automatically typed as `std/core/v1/UInt64`
- Type inference: when only the upstream output annotates a type, it is inferred on the downstream input
- Metadata patterns: `output_metadata` keys and `pattern` shorthands are validated
- Schema compatibility: struct types are checked at the field level (missing/wrong fields)
User-defined types in a types/ directory next to the dataflow are loaded automatically.
# Validate with warnings
adora validate dataflow.yml
# Strict mode for CI (exit 1 on warnings)
adora validate --strict-types dataflow.yml
See the Type Annotations Guide for the full type library and usage details.
Utility Commands
adora completion
Generate shell completion scripts.
adora completion [SHELL]
Shell is auto-detected if omitted. Supported: bash, zsh, fish, elvish, powershell.
# Bash
eval "$(adora completion bash)"
echo 'eval "$(adora completion bash)"' >> ~/.bashrc
# Zsh
eval "$(adora completion zsh)"
echo 'eval "$(adora completion zsh)"' >> ~/.zshrc
# Fish
adora completion fish > ~/.config/fish/completions/adora.fish
adora system
System management commands.
adora system status [OPTIONS]
Currently provides status as a subcommand (equivalent to adora status).
Self-Management Commands
adora self update
Check for and install CLI updates.
adora self update [--check-only]
Downloads from GitHub releases (dora-rs/adora).
adora self uninstall
Remove the CLI from the system.
adora self uninstall [--force]
Without --force, prompts for confirmation (requires a TTY). Tries uv pip uninstall first, then pip uninstall, then binary self-delete.
Environment Variables
All environment variables serve as fallbacks. CLI flags always take precedence.
| Variable | Default | Commands | Description |
|---|---|---|---|
| ADORA_COORDINATOR_ADDR | 127.0.0.1 | All coordinator commands | Coordinator IP address |
| ADORA_COORDINATOR_PORT | 6013 | All coordinator commands | Coordinator WebSocket port |
| ADORA_LOG_LEVEL | stdout | run, logs | Default minimum log level |
| ADORA_LOG_FORMAT | pretty | run, logs | Default output format |
| ADORA_LOG_FILTER | none | run, logs | Default per-node level overrides |
| ADORA_ALLOW_SHELL_NODES | unset | run | Enable shell node execution |
| ADORA_RUNTIME_TYPE_CHECK | unset | run, start | Runtime type checking: warn (log mismatches) or error (fail on mismatch). See Type Annotations |
# Set defaults for a development session
export ADORA_COORDINATOR_ADDR=192.168.1.10
export ADORA_LOG_LEVEL=info
export ADORA_LOG_FORMAT=compact
Architecture Guide
This section is for developers who want to understand the framework internals, extend it, or debug issues.
Communication Stack
┌─────────────────────────────────────┐
│ CLI (adora) │
│ WebSocket (JSON request/reply) │
└─────────────┬───────────────────────┘
│
┌─────────────▼───────────────────────┐
│ Coordinator │
│ WebSocket control + daemon mgmt │
│ State: InMemoryStore | RedbStore │
└──┬──────────────────────────────┬───┘
│ │
┌────────────▼──────────┐ ┌─────────────▼──────────┐
│ Daemon A │ │ Daemon B │
│ (machine: robot) │ │ (machine: gpu-server) │
│ │ │ │
│ ┌─────┐ ┌─────┐ │ │ ┌──────┐ ┌───────┐ │
│ │Node1│ │Node2│ │ │ │Node3 │ │Node4 │ │
│ └──┬──┘ └──┬──┘ │ │ └──┬───┘ └───┬───┘ │
│ │shmem │shmem │ │ │shmem │shmem │
│ └────┬────┘ │ │ └─────┬─────┘ │
└──────────┼────────────┘ └───────────┼────────────┘
│ │
└──────── Zenoh pub/sub ────────┘
(cross-machine)
Protocol Layers
| Layer | Transport | Format | Use |
|---|---|---|---|
| CLI <-> Coordinator | WebSocket | JSON (ControlRequest/Reply) | Commands, log streaming |
| Coordinator <-> Daemon | WebSocket | JSON (DaemonCoordinatorEvent) | Node lifecycle, metrics |
| Daemon <-> Node (small) | TCP / Unix socket | Custom binary | Control messages, small data |
| Daemon <-> Node (large) | Shared memory | Zero-copy Arrow | Data messages > 4KB |
| Daemon <-> Daemon | Zenoh pub/sub | Arrow + metadata | Cross-machine data routing |
Coordinator Internals
The coordinator is an event-driven async server:
Event Sources:
- CLI WebSocket connections (ControlRequest)
- Daemon WebSocket connections (DaemonEvent)
- Heartbeat timer (3s interval)
- External events (for embedding)
Event Loop:
merge_all(cli_events, daemon_events, heartbeat, external)
-> handle_event()
-> update state
-> persist to store (if redb)
-> send replies
Key types:
#![allow(unused)]
fn main() {
    // State
    RunningDataflow { uuid, name, descriptor, daemons, node_metrics, ... }
    RunningBuild { build_id, errors, log_subscribers, pending_results, ... }
    DaemonConnection { sender, pending_replies, last_heartbeat }

    // Store trait
    trait CoordinatorStore: Send + Sync {
        fn put_dataflow(&self, record: &DataflowRecord) -> Result<()>;
        fn get_dataflow(&self, uuid: &Uuid) -> Result<Option<DataflowRecord>>;
        fn list_dataflows(&self) -> Result<Vec<DataflowRecord>>;
        // ... daemon and build methods
    }
}
Store backends:
- `memory` (default): in-memory, lost on restart.
- `redb`: persisted to disk (`~/.adora/coordinator.redb`). Survives crashes. Requires the `redb-backend` feature.
adora coordinator --store redb
adora coordinator --store redb:/custom/path.redb
Daemon Internals
The daemon manages node processes on a single machine:
Per Node:
1. Build (if build command specified)
2. Spawn process with ADORA_NODE_CONFIG env var
3. Node registers via TCP/shmem handshake
4. Route inputs/outputs between nodes
5. Collect metrics (CPU, memory, I/O)
6. Handle restart policy on exit
7. Forward logs to coordinator
Communication:
- Shared memory for messages > 4KB (zero-copy)
- TCP for control messages and small data
- flume channels for internal event routing
Metrics collection:
#![allow(unused)]
fn main() {
    struct NodeMetrics {
        pid: u32,
        cpu_usage: f32,                 // per-core percentage
        memory_mb: f64,
        disk_read_mb_s: Option<f64>,
        disk_write_mb_s: Option<f64>,
        status: NodeStatus,             // Running | Restarting | Degraded | Failed
        restart_count: u32,
        pending_messages: u64,
    }
}
Message Types
All inter-component messages are defined in libraries/message/:
#![allow(unused)]
fn main() {
    // Node identification
    struct NodeId(String);   // [a-zA-Z0-9_.-]
    struct DataId(String);   // same validation
    type DataflowId = uuid::Uuid;

    // Data metadata
    struct Metadata {
        timestamp: uhlc::Timestamp,     // hybrid logical clock
        type_info: ArrowTypeInfo,       // Arrow schema
        parameters: MetadataParameters, // custom key-value pairs
    }

    // Node events (daemon -> node)
    enum NodeEvent {
        Stop,
        Reload { operator_id },
        Input { id, metadata, data },
        InputClosed { id },
        InputRecovered { id },
        NodeRestarted { id },
        AllInputsClosed,
    }
}
Timestamping
Adora uses a Unified Hybrid Logical Clock (UHLC) for distributed causality. Every message carries a uhlc::Timestamp that preserves causal ordering across machines without synchronized clocks.
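A simplified model of a hybrid logical clock illustrates why this preserves causal order without synchronized physical clocks (an illustrative sketch, not the uhlc crate's actual implementation):

```python
class HybridLogicalClock:
    """Toy HLC: timestamps are (logical_time, counter) tuples.
    Comparing tuples gives a total order consistent with causality."""

    def __init__(self):
        self.l = 0  # logical time: max physical time observed so far
        self.c = 0  # counter breaking ties within one logical tick

    def now(self, physical_time: int) -> tuple[int, int]:
        """Stamp a locally produced message."""
        if physical_time > self.l:
            self.l, self.c = physical_time, 0
        else:
            self.c += 1  # physical clock hasn't advanced; bump the counter
        return (self.l, self.c)

    def observe(self, physical_time: int, remote: tuple[int, int]) -> tuple[int, int]:
        """Merge a received timestamp so our next stamp sorts after it."""
        rl, rc = remote
        m = max(self.l, rl, physical_time)
        if m == self.l == rl:
            self.c = max(self.c, rc) + 1
        elif m == self.l:
            self.c += 1
        elif m == rl:
            self.c = rc + 1
        else:
            self.c = 0
        self.l = m
        return (self.l, self.c)
```

Even if a receiver's wall clock lags the sender's, `observe` lifts its logical time to the remote value, so replies always stamp later than the messages they answer.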
Zero-Copy Shared Memory
For large messages (> 4KB), the daemon uses shared memory regions:
- Sender node requests a shared memory slot from daemon
- Daemon allocates a region and returns the ID
- Sender writes Arrow data directly into shared memory
- Daemon notifies receiver node of the region ID
- Receiver reads directly from shared memory (zero-copy)
- Receiver sends a drop token when done
This achieves 10-17x lower latency than ROS2 for large payloads.
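The six-step handoff can be illustrated with Python's standard `multiprocessing.shared_memory` module (an analogy only -- adora's daemon-mediated protocol and Arrow memory layout are not shown):

```python
from multiprocessing import shared_memory

# 1-2. "Sender" requests a region; the allocator returns its name (the region ID)
region = shared_memory.SharedMemory(create=True, size=8192)
region_id = region.name

# 3. Sender writes the payload directly into the region -- no copy into a socket
payload = b"arrow-encoded frame data"
region.buf[:len(payload)] = payload

# 4-5. "Receiver" attaches by ID and reads in place (a zero-copy view)
view = shared_memory.SharedMemory(name=region_id)
received = bytes(view.buf[:len(payload)])

# 6. Receiver signals it is done; the region is released
view.close()
region.close()
region.unlink()
```

The key property is step 5: the receiver maps the same physical pages the sender wrote, so payload size affects only the single write, not any serialize/copy/deserialize chain.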
Writing Nodes
Rust Node
use adora_node_api::{AdoraNode, Event, IntoArrow};
use adora_core::config::DataId;
fn main() -> eyre::Result<()> {
    let (mut node, mut events) = AdoraNode::init_from_env()?;
    let output = DataId::from("result".to_owned());
    while let Some(event) = events.recv() {
        match event {
            Event::Input { id, metadata, data } => {
                // Process input data (Arrow array)
                let result: u64 = 42;
                node.send_output(
                    output.clone(),
                    metadata.parameters,
                    result.into_arrow(),
                )?;
            }
            Event::Stop(_) => break,
            Event::InputClosed { id } => {
                eprintln!("input {id} closed");
            }
            Event::InputRecovered { id } => {
                eprintln!("input {id} recovered");
            }
            _ => {}
        }
    }
    Ok(())
}
Cargo.toml:
[dependencies]
adora-node-api = { workspace = true }
eyre = "0.6"
Python Node
import pyarrow as pa
from adora import Node
node = Node()
for event in node:
    if event["type"] == "INPUT":
        # event["value"] is a PyArrow array
        values = event["value"].to_pylist()
        result = pa.array([sum(values)])
        node.send_output("result", result)
    elif event["type"] == "STOP":
        break
C Node
#include "node_api.h"
int main() {
    void *ctx = init_adora_context_from_env();
    // ... event loop using adora_next_event / adora_send_output
    free_adora_context(ctx);
    return 0;
}
Node Logging
Nodes can emit structured logs:
Rust:
#![allow(unused)]
fn main() {
    // Via tracing (recommended)
    tracing::info!("processing frame {}", frame_id);

    // Via node API
    node.log_info("processing complete");
    node.log_with_fields("info", "reading", None, Some(&fields));
}
Python:
import logging
logging.info("processing frame %d", frame_id)
# Or via node API
node.log("info", "processing complete")
Writing Operators
Operators run in-process inside a shared runtime, avoiding process spawn overhead.
Rust Operator
#![allow(unused)]
fn main() {
    use adora_operator_api::{register_operator, AdoraOperator, AdoraOutputSender, AdoraStatus, Event};

    #[register_operator]
    #[derive(Default)]
    pub struct MyOperator {
        counter: u32,
    }

    impl AdoraOperator for MyOperator {
        fn on_event(
            &mut self,
            event: &Event,
            output_sender: &mut AdoraOutputSender,
        ) -> Result<AdoraStatus, String> {
            match event {
                Event::Input { id, data } => {
                    self.counter += 1;
                    output_sender.send(
                        "count".to_string(),
                        arrow::array::UInt32Array::from(vec![self.counter]),
                    )?;
                    Ok(AdoraStatus::Continue)
                }
                Event::Stop => Ok(AdoraStatus::Stop),
                _ => Ok(AdoraStatus::Continue),
            }
        }
    }
}
Cargo.toml:
[lib]
crate-type = ["cdylib"]
[dependencies]
adora-operator-api = { workspace = true }
arrow = "53"
Python Operator
nodes:
  - id: my-node
    operator:
      python: my_operator.py
      inputs:
        data: source/output
      outputs:
        - result
# my_operator.py
import pyarrow as pa

class Operator:
    def __init__(self):
        self.counter = 0

    def on_event(self, event, send_output):
        if event["type"] == "INPUT":
            self.counter += 1
            send_output("result", pa.array([self.counter]))
Distributed Deployments
Setup
# Machine A (coordinator + daemon)
adora up
# Machine B (daemon only, pointing to coordinator on Machine A)
adora daemon --interface 0.0.0.0 --coordinator-addr 192.168.1.10 --machine-id B
# Machine C (same)
adora daemon --interface 0.0.0.0 --coordinator-addr 192.168.1.10 --machine-id C
Dataflow with Machine Assignment
nodes:
  - id: camera
    _unstable_deploy:
      machine: robot
    path: ./camera-driver
    outputs:
      - frames
  - id: inference
    _unstable_deploy:
      machine: gpu-server
    path: ./ml-model
    inputs:
      frames: camera/frames
    outputs:
      - predictions
  - id: actuator
    _unstable_deploy:
      machine: robot
    path: ./actuator-driver
    inputs:
      commands: inference/predictions
Build and Start
# From any machine with coordinator access
adora build dataflow.yml # distributed build on target machines
adora start dataflow.yml --name my-robot --attach
Monitor
# Resource usage across all machines
adora top
# Logs from any node regardless of machine
adora logs my-robot inference --follow
# List all dataflows
adora list
Coordinator Persistence
For production, use the redb store backend so the coordinator survives restarts:
adora coordinator --store redb
State is persisted to ~/.adora/coordinator.redb. On restart, stale dataflows are marked as failed and the coordinator resumes normal operation.
For managed cluster deployments (cluster.yml, SSH-based lifecycle, label scheduling, systemd services, rolling upgrades), see the Distributed Deployment Guide.
Troubleshooting
For a comprehensive debugging guide covering record/replay workflows, topic inspection, resource monitoring, and end-to-end debugging scenarios, see Debugging and Observability Guide.
Common Issues
“Could not connect to adora-coordinator”
- Run `adora up` first, or check `ADORA_COORDINATOR_ADDR`/`ADORA_COORDINATOR_PORT`
- Verify with `adora status`
“publish_all_messages_to_zenoh not enabled”
- Use the `--debug` flag: `adora start dataflow.yml --debug` or `adora run dataflow.yml --debug`
- Or add to your dataflow YAML: `_unstable_debug: publish_all_messages_to_zenoh: true`
- Required for `topic echo`, `topic hz`, and `topic info`
“adora top requires an interactive terminal”
- These TUI commands need a real terminal (not piped output)
- The same applies to `topic hz`
Node not receiving inputs
- Check that output names match: `source_node/output_id`
- Verify the source node lists the output in its `outputs:` array
- Check `adora topic list` for available topics
Logs not appearing
- Check the `--log-level` setting (the default `stdout` shows everything)
- Check `min_log_level` in YAML (filters at the source)
- For distributed deployments: verify coordinator/daemon connectivity
Build fails with git source
- Verify the `git:` URL is accessible
- Check that the `branch`, `tag`, or `rev` exists
- The build command runs from the git repo root, not the dataflow directory
Debug Workflow
# 1. Full environment diagnosis
adora doctor --dataflow dataflow.yml
# 2. Start with verbose logging and debug topics
adora run dataflow.yml --log-level trace --debug
# 3. Inspect a specific node
adora node info -d my-dataflow problem-node
# 4. Monitor specific node logs
adora logs my-dataflow problem-node --follow --level debug
# 5. Check resource usage
adora top
# 6. Inspect topic data
adora topic echo -d my-dataflow problem-node/output
# 7. Publish test data to a topic
adora topic pub -d my-dataflow problem-node/input '[1, 2, 3]'
# 8. Measure frequencies
adora topic hz -d my-dataflow --window 5
# 9. View/modify runtime parameters
adora param list -d my-dataflow problem-node
adora param set -d my-dataflow problem-node threshold 42
# 10. Restart a misbehaving node without stopping the dataflow
adora node restart -d my-dataflow problem-node
# 11. View coordinator traces (no external infra needed)
adora trace list
adora trace view <trace-id-prefix>
# 12. Visualize dataflow graph
adora graph dataflow.yml --open
Log File Locations
out/
  <dataflow-uuid>/
    log_<node-id>.jsonl      # current log
    log_<node-id>.1.jsonl    # rotated (previous)
    log_<node-id>.2.jsonl    # rotated (older)
Read directly with:
adora logs --local --all-nodes
adora logs --local <node-name> --tail 50
Logging
Adora provides a structured logging system for real-time robotics and AI dataflows. Logs are captured per-node as structured JSONL files, forwarded to the coordinator for live streaming, and optionally routed through the dataflow graph as data messages.
Which Logging Approach Should I Use?
Start here if you’re unsure which approach fits your use case.
| I want to… | Approach | Config |
|---|---|---|
| Log from Python | Use Python’s logging module (auto-bridged) | Nothing – just import logging |
| Log from Rust | Use node.log_info() / node.log_error() etc. | Nothing – works out of the box |
| Log from C/C++ | Use adora_log() / log_message() | Nothing – works out of the box |
| Filter noisy nodes | Set min_log_level in YAML | Per-node YAML field |
| Watch all logs in one place | Subscribe to adora/logs virtual input | inputs: logs: adora/logs |
| Process one node’s logs as data | Use send_logs_as on that node | Per-node YAML + wire the output |
| Rotate log files | Set max_log_size in YAML | Per-node YAML field |
| Build a custom log sink | Use adora-log-utils crate | Rust dependency |
| Filter CLI display | Use --log-level / --log-filter flags | CLI flags or env vars |
Language-Specific Quick Start
Python – the simplest path is Python’s built-in logging module:
import logging
from adora import Node
node = Node() # Automatically bridges Python logging -> adora
logging.info("Sensor started") # Captured as structured "info" log
logging.warning("High temp: 42C") # Captured as structured "warn" log
print("raw debug output") # Captured as "stdout" level
When Node() is created, it installs a handler that routes all Python logging calls through Rust’s tracing system. The daemon parses these as structured log entries with level, message, file, and line number. No extra configuration needed.
You can also use the explicit API for structured fields:
node.log_info("Reading acquired")
node.log("info", "Reading acquired", fields={"sensor_id": "temp-01"})
Rust – use the node API convenience methods:
#![allow(unused)]
fn main() {
    let (node, mut events) = AdoraNode::init_from_env()?;

    // Convenience methods (recommended for most cases)
    node.log_info("Sensor started");
    node.log_warn("High temperature");

    // With structured fields
    let mut fields = BTreeMap::new();
    fields.insert("sensor_id".into(), "temp-01".into());
    node.log_with_fields("info", "Reading acquired", None, Some(&fields));
}
Alternatively, Rust nodes can use the tracing crate. When adora’s tracing subscriber is initialized (via init_tracing()), tracing::info!() etc. output structured JSON to stdout, which the daemon parses automatically:
#![allow(unused)]
fn main() {
    // Also works -- parsed as structured logs by the daemon
    tracing::info!("Sensor started");
    tracing::warn!(sensor_id = "temp-01", "High temperature");
}
Use node.log_*() when you want explicit control over the log format. Use tracing::*!() when you want ecosystem integration (spans, instrumentation, OpenTelemetry). Both produce identical structured log entries in the daemon.
C – use the adora_log() function:
adora_log(ctx, "info", 4, "Sensor started", 14);
C++ – use the log_message() function:
log_message(node.send_output, "info", "Sensor started");
Features at a Glance
| Feature | Scope | Config |
|---|---|---|
| Log level filtering | CLI display | --log-level, ADORA_LOG_LEVEL |
| Output formats | CLI display | --log-format, ADORA_LOG_FORMAT |
| Per-node level overrides | CLI display | --log-filter, ADORA_LOG_FILTER |
| Source-level filtering | Per-node YAML | min_log_level |
| Stdout-as-data routing | Per-node YAML | send_stdout_as |
| Structured log routing | Per-node YAML | send_logs_as |
| Log file rotation | Per-node YAML | max_log_size |
| Rotation file limit | Per-node YAML | max_rotated_files |
| Node log API | Rust/Python/C/C++ node | node.log(), adora_log(), etc. |
| Log utilities library | Rust crate | adora-log-utils |
| Log aggregation | Dataflow input | adora/logs virtual input |
| Time-range filtering | adora logs | --since, --until |
| Live log streaming | adora logs | --follow |
| Text search | adora logs | --grep |
| Local log reading | adora logs | --local, --all-nodes |
Log File Format
Each node produces a JSONL file (one JSON object per line) at:
<working_dir>/out/<dataflow_uuid>/log_<node_id>.jsonl
Each line has this structure:
{
"timestamp": "2024-01-15T10:30:00.123Z",
"level": "info",
"node_id": "sensor",
"message": "Starting sensor...",
"target": "sensor::module",
"fields": { "key": "value" }
}
| Field | Type | Description |
|---|---|---|
| timestamp | string | RFC3339 timestamp with millisecond precision |
| level | string | "error", "warn", "info", "debug", "trace", or "stdout" |
| node_id | string | Node ID |
| message | string | The log message text |
| target | string? | Rust module target (e.g. "sensor::module"); null if absent |
| fields | object? | Structured key-value fields from the logging framework. Trust model: fields originate from node stdout and are passed through without sanitization, so in mixed-trust environments log consumers should validate field contents before acting on them |
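A downstream consumer can process these entries with the standard json module; a minimal sketch (field names follow the table above):

```python
import json

line = ('{"timestamp": "2024-01-15T10:30:00.123Z", "level": "info", '
        '"node_id": "sensor", "message": "Starting sensor...", '
        '"target": "sensor::module", "fields": {"key": "value"}}')

entry = json.loads(line)
if entry["level"] in ("error", "warn"):
    print(f'{entry["node_id"]}: {entry["message"]}')

# target and fields are optional -- treat them defensively
target = entry.get("target")        # "sensor::module" or None
fields = entry.get("fields") or {}  # always a dict after this
```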
How Node Output Becomes Log Entries
The daemon captures each line of stdout/stderr from a node process and attempts to parse it as a structured log message (JSON with level, message, timestamp, and optional fields). If parsing succeeds, the structured fields are preserved. If parsing fails, the raw line becomes a "stdout"-level entry.
This means nodes using Rust’s tracing or log crate with JSON output get full structured logging automatically. Nodes that simply println! produce "stdout"-level entries.
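The parse-or-fallback rule can be sketched as follows (a hypothetical, simplified model of the daemon's behavior):

```python
import json

def classify_line(raw: str, node_id: str) -> dict:
    """Parse a stdout line as a structured log entry, or wrap it
    as a 'stdout'-level entry when structured parsing fails."""
    try:
        entry = json.loads(raw)
        # Only accept objects that look like log entries
        if isinstance(entry, dict) and "level" in entry and "message" in entry:
            entry.setdefault("node_id", node_id)
            return entry
    except json.JSONDecodeError:
        pass
    # Fallback: raw line becomes a "stdout"-level entry
    return {"level": "stdout", "node_id": node_id, "message": raw}

classify_line('{"level": "warn", "message": "hot"}', "sensor")  # structured
classify_line("plain println output", "sensor")                 # "stdout" entry
```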
Viewing Logs: adora run
When running a dataflow with adora run, logs from all nodes are displayed in real-time on the terminal.
Flags
adora run dataflow.yml [OPTIONS]
| Flag | Default | Env Var | Description |
|---|---|---|---|
| --log-level LEVEL | stdout | ADORA_LOG_LEVEL | Minimum level to display |
| --log-format FORMAT | pretty | ADORA_LOG_FORMAT | Output format: pretty, json, compact |
| --log-filter FILTER | none | ADORA_LOG_FILTER | Per-node level overrides |
Log Levels
From most to least verbose:
| Level | Description |
|---|---|
| stdout | Everything, including raw stdout from nodes (default) |
| trace | Fine-grained diagnostic messages |
| debug | Developer-level diagnostic messages |
| info | General informational messages |
| warn | Warning conditions |
| error | Error conditions only |
Setting --log-level info hides stdout, trace, and debug messages. The stdout level is a special catch-all that passes everything.
Level Filtering Logic
The level filter uses LogLevelOrStdout::passes():
Message level Filter level Displayed?
───────────── ──────────── ──────────
stdout stdout yes
stdout info no (stdout only passes stdout filter)
info stdout yes (any log level passes stdout filter)
debug info no (debug is more verbose than info)
error info yes (error is less verbose than info)
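The table corresponds to this ordering (a sketch of the passes() semantics, not the actual Rust code):

```python
# Severity order; "stdout" is the special catch-all at the bottom
ORDER = {"stdout": 0, "trace": 1, "debug": 2, "info": 3, "warn": 4, "error": 5}

def passes(message_level: str, filter_level: str) -> bool:
    """A 'stdout' filter shows everything; a 'stdout' message only
    passes the 'stdout' filter; otherwise compare severities."""
    if filter_level == "stdout":
        return True
    if message_level == "stdout":
        return False
    return ORDER[message_level] >= ORDER[filter_level]
```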
Per-Node Overrides
The --log-filter flag lets you set different levels for different nodes:
adora run dataflow.yml --log-level info --log-filter "sensor=debug,planner=warn"
This shows info and above for all nodes, except sensor (shows debug and above) and planner (shows warn and above).
Format: "node1=level,node2=level" (comma-separated name=level pairs).
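Parsing that format is mechanical; a hypothetical helper (not the CLI's code):

```python
def parse_log_filter(spec: str) -> dict[str, str]:
    """Parse "node1=level,node2=level" into {node: level} overrides."""
    overrides = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue
        node, _, level = pair.partition("=")
        overrides[node.strip()] = level.strip()
    return overrides

parse_log_filter("sensor=debug,planner=warn")
# -> {"sensor": "debug", "planner": "warn"}
```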
Output Formats
Pretty (default) – colored, human-readable:
10:30:00 INFO sensor: Starting sensor...
10:30:01 INFO [adora]: spawning node processor
10:30:01 stdout sensor: raw output line
- Timestamp in local timezone (`HH:MM:SS`)
- Level colored: ERROR (red), WARN (yellow), INFO (green), DEBUG (blue), TRACE (dimmed), stdout (italic dimmed blue)
- Node name in bold with a unique color based on the name
- System messages prefixed with `[adora]`
- Lifecycle messages (`spawning`, `node finished`, `stopping`) get visual separation with blank lines
Json – full LogMessage struct as JSON, one per line:
{"build_id":null,"dataflow_id":"abc-123","node_id":"sensor","level":"INFO","message":"Starting...","timestamp":"2024-01-15T10:30:00Z",...}
Useful for piping to jq or ingesting into log aggregation systems.
Compact – minimal, no color:
10:30:00 INFO sensor: Starting sensor...
Useful for CI/CD environments and log files.
Viewing Logs: adora logs
Read historical logs or stream live logs from a running dataflow.
Basic Usage
# Read logs for a specific node (via coordinator)
adora logs <dataflow_uuid> <node_name>
# Read local log files directly
adora logs --local <node_name>
adora logs --local --all-nodes
# Stream live logs
adora logs <dataflow_uuid> <node_name> --follow
adora logs --local <node_name> --follow
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
| --local | | false | Read from the local out/ directory instead of the coordinator |
| --all-nodes | | false | Merge logs from all nodes, sorted by timestamp |
| --tail N | -n | all | Show only the last N lines |
| --follow | -f | false | Stream new log entries as they arrive |
| --since DURATION | | none | Only show logs newer than this duration ago |
| --until DURATION | | none | Only show logs older than this duration ago |
| --level LEVEL | | stdout | Minimum log level (env: ADORA_LOG_LEVEL) |
| --grep PATTERN | | none | Case-insensitive text search |
| --coordinator-addr IP | | 127.0.0.1 | Coordinator address |
| --coordinator-port PORT | | 6013 | Coordinator control port |
Time Filters
--since and --until accept duration strings relative to now:
# Logs from the last 5 minutes
adora logs --local sensor --since 5m
# Logs from 1 hour ago to 30 minutes ago
adora logs --local sensor --since 1h --until 30m
# Last 10 errors from the past hour
adora logs --local sensor --since 1h --level error --tail 10
Supported duration formats: 30 (seconds), 30s, 5m, 1h, 2d.
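A sketch of a parser for this duration grammar (a hypothetical helper, not the CLI's implementation):

```python
import re

UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_duration(text: str) -> int:
    """Parse "30", "30s", "5m", "1h", "2d" into seconds.
    A bare number means seconds."""
    m = re.fullmatch(r"(\d+)([smhd]?)", text.strip())
    if not m:
        raise ValueError(f"invalid duration: {text!r}")
    value, unit = int(m.group(1)), m.group(2)
    return value * UNITS.get(unit, 1)

parse_duration("5m")  # 300
parse_duration("30")  # 30
```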
Text Search
--grep performs case-insensitive substring matching against:
- The log message text
- The node ID
- The module target
# Find all timeout-related messages
adora logs --local --all-nodes --grep "timeout"
# Find errors from a specific module
adora logs --local sensor --grep "camera::driver" --level error
Filter Pipeline
All filters are applied in this order:
Read/Parse -> Time Filters -> Grep -> Tail -> Display
When --since, --until, or --grep are used in coordinator mode, the CLI fetches all logs from the server (ignoring --tail server-side) and applies all filters client-side. This ensures correct results when combining filters.
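The client-side ordering can be sketched as follows (hypothetical; timestamps reduced to plain numbers for brevity):

```python
def apply_filters(entries, since=None, until=None, grep=None, tail=None):
    """Apply filters in the documented order: time -> grep -> tail.
    `entries` is a list of dicts with "timestamp" and "message";
    `since`/`until` are timestamp bounds, `tail` keeps the last N."""
    if since is not None:
        entries = [e for e in entries if e["timestamp"] >= since]
    if until is not None:
        entries = [e for e in entries if e["timestamp"] <= until]
    if grep is not None:
        needle = grep.lower()  # case-insensitive substring match
        entries = [e for e in entries if needle in e["message"].lower()]
    if tail is not None:
        entries = entries[-tail:]
    return entries
```

Applying tail last is what makes combined filters correct: "the last 10 matching entries" differs from "matches among the last 10 entries".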
Local vs Coordinator Mode
Local mode (--local) reads JSONL files directly from the out/ directory in the current working directory. No coordinator or daemon needs to be running. If --all-nodes is used or no node name is given, all log files are merged and sorted by timestamp.
Coordinator mode (default) connects to a running coordinator via WebSocket. The coordinator reads log files from the daemon’s working directory and streams them back. This works for both local and distributed deployments.
Follow Mode
Local follow (--local --follow): Polls log files every 200ms for new content. New lines are parsed, filtered by --grep, and printed. Time/tail filters only apply to the initial historical output.
Coordinator follow (--follow): Opens a WebSocket subscription to the coordinator. The coordinator forwards log messages from the daemon in real-time. Level filtering is applied server-side for efficiency. --grep and --since are applied client-side on the stream.
Environment Variables
All environment variables serve as fallbacks – CLI flags always take precedence.
| Variable | Used By | Values | Description |
|---|---|---|---|
| ADORA_LOG_LEVEL | adora run, adora logs | error, warn, info, debug, trace, stdout | Default minimum log level |
| ADORA_LOG_FORMAT | adora run | pretty, json, compact | Default output format |
| ADORA_LOG_FILTER | adora run | "node1=level,node2=level" | Default per-node overrides |
| ADORA_QUIET | daemon | any value | Suppress log forwarding to display (file writing continues) |
Example:
# Set defaults for a development session
export ADORA_LOG_LEVEL=info
export ADORA_LOG_FORMAT=pretty
export ADORA_LOG_FILTER="sensor=debug"
# These are equivalent:
adora run dataflow.yml
adora run dataflow.yml --log-level info --log-format pretty --log-filter "sensor=debug"
# CLI flag overrides env var:
adora run dataflow.yml --log-level debug # overrides ADORA_LOG_LEVEL=info
YAML Configuration
min_log_level
Filter logs at the source (daemon-side) before they reach log files, the coordinator, or send_logs_as routing.
nodes:
  - id: noisy-sensor
    path: ./target/debug/sensor
    min_log_level: info  # suppress debug/trace/stdout from this node
Valid values: error, warn, info, debug, trace, stdout.
When set, the daemon drops log messages below this level immediately after parsing. This reduces disk I/O, network traffic, and log file size. The filtering uses the same passes() logic as the CLI display filter.
send_stdout_as
Route raw stdout/stderr lines as dataflow output messages.
nodes:
  - id: legacy-node
    path: ./legacy-script.py
    send_stdout_as: raw_output
    outputs:
      - raw_output
      - data
  - id: log-consumer
    inputs:
      logs: legacy-node/raw_output
Each stdout/stderr line is sent as an Arrow-encoded string. This is useful for integrating legacy nodes that output data on stdout (e.g., Python scripts using print()).
Both send_stdout_as and normal log file writing happen – stdout routing does not suppress log files.
send_logs_as
Route parsed structured log entries as dataflow output messages.
nodes:
  - id: sensor
    path: ./target/debug/sensor
    send_logs_as: log_entries
    outputs:
      - data
      - log_entries
  - id: log-aggregator
    inputs:
      sensor_logs: sensor/log_entries
Unlike send_stdout_as, this only sends lines that were successfully parsed as structured logs (not raw stdout). Each entry is serialized as a full JSON LogMessage string. The min_log_level filter applies before routing – suppressed messages are not sent.
Use this to build log aggregation, alerting, or monitoring nodes within the dataflow itself.
adora/logs – Automatic Log Aggregation
Subscribe to logs from all nodes with a single input line – no manual wiring needed:
nodes:
  - id: sensor
    path: sensor.py
    inputs:
      tick: adora/timer/millis/200
    outputs:
      - reading
  - id: processor
    path: processor.py
    inputs:
      reading: sensor/reading
    outputs:
      - result
  - id: log-viewer
    path: log_viewer.py
    inputs:
      logs: adora/logs                # all nodes, all levels
      errors: adora/logs/error       # only error+ from all nodes
      sensor: adora/logs/info/sensor # info+ from one node
The adora/logs virtual input works like adora/timer – the daemon handles subscription internally. Each log message arrives as a JSON-encoded LogMessage string in an Arrow array. To prevent infinite loops, a node never receives its own log messages.
Syntax:
| Input | Description |
|---|---|
adora/logs | All logs from all nodes |
adora/logs/<level> | Logs at <level> or above from all nodes |
adora/logs/<level>/<node-id> | Logs at <level> or above from a specific node |
Levels: stdout, error, warn, info, debug, trace.
When to use adora/logs vs send_logs_as:
| | adora/logs | send_logs_as |
|---|---|---|
| Scope | All nodes at once | One node at a time |
| YAML changes | Only the consumer | Each source node |
| Adding a node | Zero wiring changes | Must update consumer |
| Use case | Dashboard, monitoring | Per-node log processing |
See examples/log-aggregator/ for a complete working example.
max_log_size
Enable size-based log file rotation.
nodes:
- id: sensor
path: ./target/debug/sensor
max_log_size: "50MB"
| Value | Bytes |
|---|---|
"1KB" or "1K" | 1,024 |
"50MB" or "50M" | 52,428,800 |
"1GB" or "1G" | 1,073,741,824 |
"1000" | 1,000 (plain number = bytes) |
When the active log file exceeds the configured size, the daemon:
- Flushes and closes the current file
- Renames existing rotated files: `.4.jsonl` -> `.5.jsonl`, `.3.jsonl` -> `.4.jsonl`, etc.
- Renames the current file: `log_sensor.jsonl` -> `log_sensor.1.jsonl`
- Creates a fresh `log_sensor.jsonl`
- Deletes any file beyond the rotation limit (default 5, configurable via `max_rotated_files`)
Naming convention:
log_sensor.jsonl # current (active)
log_sensor.1.jsonl # previous
log_sensor.2.jsonl # older
log_sensor.3.jsonl
log_sensor.4.jsonl
log_sensor.5.jsonl # oldest (deleted on next rotation)
Maximum disk usage per node: max_log_size * (1 + max_rotated_files) (1 active + N rotated).
Without max_log_size, log files grow unbounded. For long-running dataflows, always set this.
The adora logs --local command automatically reads all rotated files for a node and merges them in chronological order (oldest rotated file first, current file last).
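The rotation procedure described above can be sketched in a few lines. This is an illustration of the renaming scheme using the documented `log_<node>.jsonl` naming convention, not the daemon's implementation:

```python
# Minimal sketch of size-based rotation: delete the oldest rotated file,
# shift .4 -> .5 ... .1 -> .2, then move the active file to .1.
import os

def rotate(log_dir: str, node: str, max_rotated: int = 5) -> None:
    base = os.path.join(log_dir, f"log_{node}")
    oldest = f"{base}.{max_rotated}.jsonl"
    if os.path.exists(oldest):
        os.remove(oldest)                      # drop file beyond the limit
    for n in range(max_rotated - 1, 0, -1):    # shift rotated files up by one
        src = f"{base}.{n}.jsonl"
        if os.path.exists(src):
            os.rename(src, f"{base}.{n + 1}.jsonl")
    if os.path.exists(f"{base}.jsonl"):        # current file becomes .1
        os.rename(f"{base}.jsonl", f"{base}.1.jsonl")
    # A fresh log_<node>.jsonl is created by the next write.
```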
max_rotated_files
Control how many rotated log files to keep (default: 5, range: 1-100).
nodes:
- id: sensor
path: ./target/debug/sensor
max_log_size: "50MB"
max_rotated_files: 10 # keep 10 rotated files instead of 5
With max_rotated_files: 10 and max_log_size: "50MB", maximum disk usage is 50MB * 11 = 550MB per node. Lower values save disk space; higher values preserve more history.
Runtime Node Restrictions
For runtime nodes (operators), only one of each logging field is allowed per runtime:
# OK -- single operator
nodes:
- id: runtime-node
operator:
python: process.py
send_logs_as: logs
min_log_level: info
max_log_size: "100MB"
# ERROR -- multiple operators with conflicting configs
nodes:
- id: runtime-node
operators:
- id: op1
python: a.py
send_logs_as: logs1
- id: op2
python: b.py
send_logs_as: logs2 # Error: multiple send_logs_as
When a single operator in a runtime sets these fields, the output name is prefixed with the operator ID (e.g., op1/logs).
Node Log API
Nodes can emit structured log messages programmatically using the node API. These are equivalent to writing JSON-formatted log lines to stdout – the daemon parses them identically.
Rust
#![allow(unused)]
fn main() {
use adora_node_api::AdoraNode;
use std::collections::BTreeMap;
let (node, mut events) = AdoraNode::init_from_env()?;
// General log with level string and optional target
node.log("info", "sensor initialized", Some("sensor::init"));
// Convenience methods (no target parameter)
node.log_error("connection failed");
node.log_warn("temperature elevated");
node.log_info("reading acquired");
node.log_debug("raw bytes received");
node.log_trace("entering loop iteration");
// Structured fields (key-value context preserved through send_logs_as)
let mut fields = BTreeMap::new();
fields.insert("sensor_id".to_string(), "temp-01".to_string());
fields.insert("reading".to_string(), "42.5".to_string());
node.log_with_fields("info", "reading acquired", None, Some(&fields));
}
The level parameter accepts "error", "warn" (or "warning"), "info", "debug", "trace". Unknown levels default to "info". Fields are capped at 60 KB total to match the downstream 64 KB parse limit.
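A byte budget on fields can be illustrated as follows. This is a hypothetical sketch: the 60 KB figure comes from the text above, but the keep-until-over-budget policy and the `cap_fields` helper are assumptions, not the node API's actual behavior:

```python
# Hypothetical illustration of capping structured fields at a byte budget.
def cap_fields(fields: dict[str, str], budget: int = 60 * 1024) -> dict[str, str]:
    kept, used = {}, 0
    for key, value in fields.items():
        size = len(key.encode()) + len(value.encode())
        if used + size > budget:
            break                      # stop keeping entries once over budget
        kept[key] = value
        used += size
    return kept
```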
Python
Python nodes have three ways to log, all producing structured log entries:
from adora import Node
import logging
node = Node()
# Option 1: Python's logging module (recommended -- auto-bridged by Node())
logging.info("sensor initialized")
logging.warning("temperature elevated")
logging.debug("raw bytes: %s", data)
# Option 2: Explicit adora API with level string
node.log("info", "sensor initialized", target="sensor.init")
node.log("info", "reading acquired", fields={"sensor_id": "temp-01", "reading": "42.5"})
# Option 3: Convenience methods
node.log_error("connection failed")
node.log_warn("temperature elevated")
node.log_info("reading acquired")
node.log_debug("raw bytes received")
node.log_trace("entering loop iteration")
# This also works but produces "stdout"-level entries (no structure):
print("raw output")
How the Python logging bridge works: When Node() is created, it installs a custom logging.Handler that routes all Python logging calls through Rust’s tracing system. The daemon parses these as structured log entries with level, message, file path, and line number. This happens automatically – no configuration needed.
| Method | Structured? | Fields support? | When to use |
|---|---|---|---|
| `logging.info()` | Yes | No (use `extra=` for custom formatters) | General-purpose logging |
| `node.log("info", msg, fields={...})` | Yes | Yes | When you need structured key-value context |
| `node.log_info(msg)` | Yes | No | Quick one-liner, same as `node.log("info", msg)` |
| `print()` | No (stdout level) | No | Legacy code, quick debugging |
Common pitfall: Do not call logging.basicConfig() before creating Node(). The node constructor sets up the logging bridge; calling basicConfig() first may install a conflicting handler. If you need custom formatters, configure them after Node() creation.
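To make the bridge mechanism concrete, here is a rough stand-in for what such a handler might look like. The real handler forwards records into Rust's tracing system; this sketch collects structured entries into a list instead, and the `BridgeHandler` name and entry shape are illustrative assumptions:

```python
# Rough illustration of a logging bridge: a Handler that converts each
# LogRecord into a structured entry (level, message, file, line).
import logging

class BridgeHandler(logging.Handler):
    def __init__(self, sink):
        super().__init__()
        self.sink = sink                       # stand-in for the Rust bridge

    def emit(self, record: logging.LogRecord) -> None:
        self.sink.append({
            "level": record.levelname.lower(),
            "message": record.getMessage(),    # applies %s-style args
            "file": record.pathname,
            "line": record.lineno,
        })

entries = []
root = logging.getLogger()
root.addHandler(BridgeHandler(entries))
root.setLevel(logging.DEBUG)

logging.info("sensor initialized")
logging.debug("raw bytes: %s", b"\x01")
```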
C
#include "node_api.h"
void *ctx = init_adora_context_from_env();
const char *level = "info";
const char *msg = "sensor initialized";
adora_log(ctx, level, strlen(level), msg, strlen(msg));
C++
// Via the cxx bridge
auto node = init_adora_node();
log_message(node.send_output, "info", "sensor initialized");
Log Utilities Library (adora-log-utils)
The adora-log-utils crate provides parsing, merging, filtering, and formatting utilities for working with LogMessage entries in custom sink nodes. Use it when building nodes that consume log data via send_logs_as.
API
#![allow(unused)]
fn main() {
use adora_log_utils;
// Parse a LogMessage from JSON (as received from send_logs_as)
let log = adora_log_utils::parse_log(json_str)?;
// Parse directly from Arrow input data (convenience for event handlers)
let log = adora_log_utils::parse_log_from_arrow(&data)?;
// Merge multiple log streams into a single timeline
let merged = adora_log_utils::merge_by_timestamp(vec![stream_a, stream_b]);
// Filter by minimum level
let errors = adora_log_utils::filter_by_level(&logs, &min_level);
// Format as JSON (one line, no trailing newline)
let json = adora_log_utils::format_json(&log);
// Format as compact single-line: "<timestamp> <node> <LEVEL>: <message>"
let compact = adora_log_utils::format_compact(&log);
// Format as pretty: "[<timestamp>][<LEVEL>][<node>] <message>"
let pretty = adora_log_utils::format_pretty(&log);
}
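The two text formats can be reproduced from the format strings documented above. A Python sketch (the dict field names are assumptions based on the `LogMessage` description; only the output shapes come from the text):

```python
# Sketch of the compact and pretty formats described above.
def format_compact(log: dict) -> str:
    # "<timestamp> <node> <LEVEL>: <message>"
    return f"{log['timestamp']} {log['node_id']} {log['level'].upper()}: {log['message']}"

def format_pretty(log: dict) -> str:
    # "[<timestamp>][<LEVEL>][<node>] <message>"
    return f"[{log['timestamp']}][{log['level'].upper()}][{log['node_id']}] {log['message']}"

entry = {"timestamp": "2024-01-01T00:00:00Z", "node_id": "sensor",
         "level": "warn", "message": "temperature elevated"}
print(format_compact(entry))
# 2024-01-01T00:00:00Z sensor WARN: temperature elevated
```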
Dependency
Add to your sink node’s Cargo.toml:
[dependencies]
adora-log-utils = { workspace = true }
Log Sink Examples
Three example sink nodes demonstrate how to consume logs routed via send_logs_as and forward them to external destinations.
File Sink (examples/log-sink-file/)
Merges log streams from multiple nodes into a single JSONL file. Useful for unified log collection.
nodes:
- id: sensor
path: sensor.py
send_logs_as: log_entries
inputs:
tick: adora/timer/millis/200
outputs:
- reading
- log_entries
- id: processor
path: processor.py
send_logs_as: log_entries
inputs:
reading: sensor/reading
outputs:
- result
- log_entries
- id: file_sink
path: log-sink-file
inputs:
sensor_logs: sensor/log_entries
processor_logs: processor/log_entries
env:
LOG_FILE: "./combined.jsonl"
The file sink reads LOG_FILE from the environment (default ./combined.jsonl), parses each incoming Arrow message with adora_log_utils::parse_log_from_arrow(), formats it as JSON, and appends it to the file.
TCP Sink (examples/log-sink-tcp/)
Forwards log entries over a TCP socket to a remote log collector. Useful for embedded systems that lack local filesystems and need to stream logs off-device.
nodes:
- id: source
path: source.py
send_logs_as: log_entries
inputs:
tick: adora/timer/millis/500
outputs:
- data
- log_entries
- id: tcp_sink
path: log-sink-tcp
inputs:
logs: source/log_entries
env:
SINK_ADDR: "127.0.0.1:9876"
The TCP sink reads SINK_ADDR from the environment (default 127.0.0.1:9876), connects to the server on startup, and sends each log entry as a JSON line. It reconnects automatically on write failure.
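The connect/send/reconnect loop can be sketched as follows. The actual sink is a Rust node; this is a minimal Python illustration of the same behavior, and the `TcpLogSink` class is hypothetical:

```python
# Minimal sketch of a TCP log sink that sends each entry as a JSON line
# and reconnects on write failure.
import json
import socket

class TcpLogSink:
    def __init__(self, addr: str):
        host, port = addr.rsplit(":", 1)
        self.addr = (host, int(port))
        self.sock = None

    def _connect(self) -> None:
        self.sock = socket.create_connection(self.addr, timeout=5)

    def send(self, entry: dict) -> None:
        line = (json.dumps(entry) + "\n").encode()
        for _ in range(2):                     # one retry after reconnecting
            try:
                if self.sock is None:
                    self._connect()
                self.sock.sendall(line)
                return
            except OSError:
                self.sock = None               # drop connection, retry fresh
        raise ConnectionError(f"could not deliver log to {self.addr}")
```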
Alert Router (examples/log-sink-alert/)
Splits incoming log entries by severity. All logs are forwarded to the all_logs output; only error and warn logs are forwarded to the alerts output. This enables downstream nodes to handle alerts differently (e.g., trigger notifications, write to a dedicated file).
nodes:
- id: source
path: my_node.py
send_stdout_as: log_entries
inputs:
tick: adora/timer/millis/200
outputs:
- log_entries
- id: alert_router
path: log-sink-alert
inputs:
logs: source/log_entries
outputs:
- all_logs
- alerts
The source node uses send_stdout_as to route its stdout lines as Arrow string data. The router parses each log entry with adora_log_utils::parse_log_from_arrow(), checks the level, and uses node.send_output() to forward data to the appropriate outputs. Nodes using the node API can alternatively use send_logs_as to route structured logs from node.log().
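The severity split itself is a one-liner. A sketch of the routing decision (the entry's `level` field name is an assumption; the output names come from the example above):

```python
# Sketch of the alert-router split: every entry goes to all_logs,
# error/warn entries additionally go to alerts.
def route(entry: dict) -> list[str]:
    outputs = ["all_logs"]
    if entry.get("level") in ("error", "warn"):
        outputs.append("alerts")
    return outputs

route({"level": "error", "message": "boom"})   # ["all_logs", "alerts"]
route({"level": "info", "message": "ok"})      # ["all_logs"]
```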
Building a Custom Sink
To build your own sink node, follow this pattern:
use adora_node_api::{AdoraNode, Event};
fn main() -> eyre::Result<()> {
let (_node, mut events) = AdoraNode::init_from_env()?;
while let Some(event) = events.recv() {
match event {
Event::Input { data, .. } => {
let log = adora_log_utils::parse_log_from_arrow(&data)?;
// Process the log entry: write to file, send over network, etc.
let json = adora_log_utils::format_json(&log);
println!("{json}");
}
Event::Stop(_) => break,
_ => {}
}
}
Ok(())
}
How the Daemon Processes Logs
Understanding the internal pipeline helps with debugging and tuning. For each node, the daemon runs a dedicated async task that processes log lines in order:
Node Process (stdout/stderr)
|
v
[1] Capture: lines buffered in mpsc channel (capacity 100)
|
v
[2] send_stdout_as: raw line -> Arrow data -> dataflow output
|
v
[3] Parse: try JSON structured log, fall back to Stdout-level
|
v
[4] min_log_level filter: drop messages below threshold
|
v
[5] send_logs_as: LogMessage -> JSON -> Arrow data -> dataflow output
|
v
[6] Write JSONL: compact format to log file, track bytes written
|
v
[7] Rotation check: if bytes_written >= max_log_size, rotate files
|
v
[8] Forward: send LogMessage to display channel (unless ADORA_QUIET)
|
v
[9] Sync: fsync log file to disk
Key details:
- Step 2 happens before parsing, so `send_stdout_as` captures every line, including non-structured output
- Step 4 happens before Steps 5-8, so `min_log_level` suppresses messages from all downstream processing
- Step 5 only fires for successfully parsed structured logs (Step 3 success path)
- Step 8 sends to either a flume channel (`adora run` direct mode) or the coordinator (distributed mode)
- Step 9 calls `sync_all()` after every write, ensuring durability at the cost of some I/O overhead
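The ordering of steps 2-6 can be condensed into a short sketch. This is an illustration of the documented ordering only, with the dataflow sends and file write replaced by list appends; the severity ranks and entry shape are assumptions:

```python
# Condensed sketch of pipeline steps 2-6 for a single stdout line.
import json

SEVERITY = {"error": 0, "warn": 1, "info": 2, "debug": 3, "trace": 4, "stdout": 5}

def process_line(line, cfg, stdout_out, logs_out, jsonl_out):
    if cfg.get("send_stdout_as"):              # [2] raw routing, before parsing
        stdout_out.append(line)
    try:                                       # [3] try structured JSON parse
        entry = json.loads(line)
        level, structured = entry.get("level", "info"), True
    except ValueError:                         #     fall back to stdout level
        entry = {"level": "stdout", "message": line}
        level, structured = "stdout", False
    min_level = cfg.get("min_log_level", "stdout")
    if SEVERITY[level] > SEVERITY[min_level]:  # [4] source-level filter
        return
    if structured and cfg.get("send_logs_as"): # [5] only parsed entries routed
        logs_out.append(json.dumps(entry))
    jsonl_out.append(json.dumps(entry))        # [6] JSONL write

cfg = {"send_stdout_as": "raw", "send_logs_as": "logs", "min_log_level": "info"}
raw, logs, jsonl = [], [], []
process_line('{"level": "info", "message": "ok"}', cfg, raw, logs, jsonl)
process_line("plain stdout line", cfg, raw, logs, jsonl)
```

Note that the plain line still reaches `raw` (step 2 precedes the filter) but is dropped before steps 5 and 6 because `stdout` is below `min_log_level: info`.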
Structured Log Parsing
When a node emits JSON-formatted log output (e.g., from tracing-subscriber with JSON formatting), the daemon extracts:
- `level`: log severity
- `message`: the log text
- `target`: module path
- `timestamp`: when the log was emitted
- `fields`: arbitrary key-value pairs
- `build_id`, `dataflow_id`, `node_id`, `daemon_id`: extracted from fields as fallback
The daemon also sets dataflow_id, node_id, and daemon_id on all messages to ensure they are always present in the log file.
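A sketch of that extraction, with the fallback-to-`fields` and fill-in-from-daemon behavior described above (the exact JSON shape is an assumption for illustration):

```python
# Illustrative extraction of known fields from a JSON log line, with ID
# values falling back to the "fields" map and then to daemon-supplied defaults.
import json

def extract(line: str, defaults: dict) -> dict:
    raw = json.loads(line)
    fields = raw.get("fields", {})
    entry = {k: raw.get(k) for k in ("level", "message", "target", "timestamp")}
    entry["fields"] = fields
    for k in ("build_id", "dataflow_id", "node_id", "daemon_id"):
        entry[k] = raw.get(k) or fields.get(k) or defaults.get(k)
    return entry

line = '{"level": "warn", "message": "hot", "fields": {"build_id": "abc"}}'
e = extract(line, {"dataflow_id": "df-1", "node_id": "sensor", "daemon_id": "d-1"})
```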
Coordinator Log Streaming Protocol
When a daemon runs under a coordinator (distributed mode), log forwarding works via WebSocket:
- Daemon -> Coordinator: each `LogMessage` is wrapped in `DaemonEvent::Log(message)` and sent over the daemon's WebSocket connection
- Coordinator storage: the coordinator stores and forwards logs
- CLI subscription: the CLI sends `ControlRequest::LogSubscribe { dataflow_id, level }` over its WebSocket connection
- Server-side filtering: the coordinator only forwards messages where `msg_level <= subscription_level`, reducing network traffic for filtered subscriptions
- CLI receive: messages arrive as serialized `LogMessage` structs
The --level flag maps to log::LevelFilter:
- `stdout` -> `LevelFilter::Trace` (most permissive, receives everything)
- `info` -> `LevelFilter::Info` (receives Error, Warn, Info)
- etc.
Complete YAML Reference
nodes:
- id: sensor
path: ./target/debug/sensor
outputs:
- data
- raw_output # for send_stdout_as
- log_entries # for send_logs_as
# Source-level log filtering (daemon-side)
min_log_level: info # suppress debug/trace/stdout
# Route stdout to dataflow
send_stdout_as: raw_output # every stdout line becomes a data message
# Route structured logs to dataflow
send_logs_as: log_entries # parsed log entries become data messages
# Log file rotation
max_log_size: "50MB" # rotate when file exceeds 50MB
max_rotated_files: 5 # keep 5 rotated files (default, range 1-100)
inputs:
tick: adora/timer/millis/100
Complete Example
The examples/python-logging/ directory contains a runnable three-node pipeline that exercises every logging feature:
sensor (noisy, high-volume) --> processor (structured logs) --> monitor (log aggregator)
Dataflow configuration highlights:
nodes:
- id: sensor
path: sensor.py
min_log_level: info # suppress debug noise at source
max_log_size: "1KB" # small for demo (triggers rotation quickly)
inputs:
tick: adora/timer/millis/50
outputs:
- reading
- id: processor
path: processor.py
send_logs_as: log_entries # route structured logs as data
inputs:
reading: sensor/reading
outputs:
- result
- log_entries
- id: monitor
path: monitor.py
inputs:
logs: processor/log_entries
reading: sensor/reading
What each node demonstrates:
- sensor – mixes `print()` (raw stdout), `logging.info()`, `logging.debug()`, and `logging.warning()`. With `min_log_level: info`, debug messages are dropped by the daemon before reaching log files. With `max_log_size: "1KB"`, log rotation kicks in after a few seconds.
- processor – uses `send_logs_as: log_entries` to route its structured log entries as dataflow data. Raw `print()` output is not routed (only parsed structured entries are).
- monitor – subscribes to `processor/log_entries` and counts warnings/errors, demonstrating in-dataflow log aggregation.
Direct mode (adora run – single process, good for quick testing):
# Basic run
adora run examples/python-logging/dataflow.yml --stop-after 5s
# Only warnings and above
adora run examples/python-logging/dataflow.yml --log-level warn --stop-after 5s
# Per-node overrides
adora run examples/python-logging/dataflow.yml --log-filter "monitor=debug,sensor=warn" --stop-after 5s
# JSON output for machine parsing
adora run examples/python-logging/dataflow.yml --log-format json --stop-after 3s
# Environment variable control
ADORA_LOG_LEVEL=warn adora run examples/python-logging/dataflow.yml --stop-after 5s
Distributed mode (adora up + adora start – coordinator/daemon architecture, required for multi-machine deployments):
# Start infrastructure
adora up
# Start attached (live log stream)
adora start examples/python-logging/dataflow.yml --attach
# Or start detached and query logs separately
adora start examples/python-logging/dataflow.yml
adora logs <dataflow-id> sensor --follow # stream one node
adora logs <dataflow-id> sensor --follow --level warn # only warnings
adora logs <dataflow-id> --all-nodes --tail 20 # last 20 lines
adora logs <dataflow-id> processor --grep "error" --since 5m # targeted search
In distributed mode, logs flow Node -> Daemon -> Coordinator -> CLI over WebSocket. The coordinator buffers log messages until a subscriber connects, so you won’t miss logs even if you attach late. YAML-level settings (min_log_level, send_logs_as, max_log_size) work identically since they are applied at the daemon.
| | `adora run` | `adora start` |
|---|---|---|
| Display filtering | --log-level, --log-format, --log-filter | --level on adora logs |
| Per-node overrides | --log-filter "sensor=debug" | Separate adora logs per node |
| Remote nodes | No | Yes |
| Live streaming | Always attached | --attach or adora logs --follow |
Post-run log analysis (works the same for both modes):
# Read all local logs
adora logs --local --all-nodes --tail 20
# Search for warnings in sensor logs
adora logs --local sensor --grep "high temp"
# Check that rotation created multiple files
ls -la out/*/log_sensor*.jsonl
Use Case Scenarios
1. Debugging a Noisy Sensor Pipeline
A camera sensor node floods the logs with debug messages, making it hard to see errors from other nodes.
nodes:
- id: camera
path: ./target/debug/camera
min_log_level: warn # suppress info/debug/trace at the source
max_log_size: "10MB" # limit disk usage
- id: detector
path: ./target/debug/detector
- id: planner
path: ./target/debug/planner
# During development: see everything from detector, only warnings from camera
adora run dataflow.yml --log-level debug --log-filter "camera=warn,detector=debug"
# In production: only errors
export ADORA_LOG_LEVEL=error
adora run dataflow.yml
What happens:
- The camera node's debug/info messages are dropped by the daemon before reaching the log file (`min_log_level: warn`)
- The CLI further filters the display based on `--log-filter`
- Log files rotate at 10MB, keeping at most 60MB on disk for the camera node
2. Log Aggregation Within the Dataflow
Build an in-dataflow log monitoring node that watches for errors across multiple nodes and sends alerts.
nodes:
- id: camera
path: ./target/debug/camera
send_logs_as: logs
outputs:
- frames
- logs
- id: detector
path: ./target/debug/detector
send_logs_as: logs
outputs:
- detections
- logs
- id: log-monitor
path: ./target/debug/log-monitor
inputs:
camera_logs: camera/logs
detector_logs: detector/logs
outputs:
- alerts
Node-side handling in the log monitor (using adora-log-utils):
#![allow(unused)]
fn main() {
use adora_node_api::{AdoraNode, Event};
use adora_message::common::{LogLevel, LogLevelOrStdout};
let (mut node, mut events) = AdoraNode::init_from_env()?;
while let Some(event) = events.recv() {
match event {
Event::Input { data, .. } => {
let log = adora_log_utils::parse_log_from_arrow(&data)?;
let is_error = matches!(log.level,
LogLevelOrStdout::LogLevel(LogLevel::Error));
if is_error || log.message.contains("timeout") {
// Send alert downstream
node.send_output("alerts", /* ... */)?;
}
}
Event::Stop(_) => break,
_ => {}
}
}
}
See also the Log Sink Examples section for complete runnable examples.
3. Post-Mortem Debugging of a Crash
After a dataflow crashes, investigate what happened in the last few minutes.
# Find available dataflows
ls out/
# Read the last 50 lines from all nodes around the crash
adora logs --local --all-nodes --tail 50
# Focus on errors in the last 5 minutes
adora logs --local --all-nodes --since 5m --level error
# Search for a specific error pattern
adora logs --local --all-nodes --grep "out of memory"
# Drill into a specific node
adora logs --local detector --since 2m
# Export as JSON for external analysis
adora run dataflow.yml --log-format json 2>logs.json
4. Long-Running Production Dataflow
A dataflow runs for days or weeks. Without log rotation, disk space fills up.
nodes:
- id: ingest
path: ./target/debug/ingest
min_log_level: info # no debug noise in production
max_log_size: "100MB" # ~600MB max per node (100MB * 6)
restart_policy: always
inputs:
tick: adora/timer/millis/1000
outputs:
- data
- id: processor
path: ./target/debug/processor
min_log_level: warn # only warnings and errors
max_log_size: "50MB"
restart_policy: on-failure
inputs:
data: ingest/data
outputs:
- results
- id: writer
path: ./target/debug/writer
min_log_level: error # minimal logging
max_log_size: "20MB"
inputs:
results: processor/results
Disk budget:
- `ingest`: up to 600MB (100MB x 6 files)
- `processor`: up to 300MB (50MB x 6 files)
- `writer`: up to 120MB (20MB x 6 files)
- Total: ~1GB maximum disk usage for all logs
5. Live Monitoring of a Distributed Deployment
Multiple daemons running on different machines, monitored from a central workstation.
# Start infrastructure (coordinator + local daemon)
adora up
# On remote machines, start a daemon pointing to the coordinator:
# adora daemon --coordinator-addr 192.168.1.10
# Start the dataflow (detached)
adora start dataflow.yml
# Open targeted log streams in separate terminals:
# Terminal 1: all sensor warnings
adora logs <dataflow-id> sensor --follow --level warn
# Terminal 2: processor errors with text search
adora logs <dataflow-id> processor --follow --level error --grep "timeout"
# Terminal 3: all nodes merged
adora logs <dataflow-id> --all-nodes --follow
# Terminal 4: historical + live (errors from the last hour, then stream)
adora logs <dataflow-id> processor --since 1h --level error --follow
# Monitor a remote coordinator from another machine:
adora logs <dataflow-id> sensor --follow --coordinator-addr 192.168.1.10
How it works internally:
- The CLI connects to the coordinator (default `localhost:6013`, or `--coordinator-addr`)
- For historical logs: request-reply, with filters applied client-side (`--since`, `--grep`, `--tail`)
- For `--follow`: the CLI opens a WebSocket subscription to the coordinator
- The coordinator filters by `--level` server-side before forwarding (reduces network traffic)
- The CLI applies `--grep` and `--since` client-side on the live stream
- The coordinator buffers log messages until a subscriber connects, so late-joining subscribers see recent history
6. CI/CD Pipeline with Structured Logging
In CI, use JSON format for machine-parseable output and compact format for readable logs.
# Machine-parseable logs for CI tooling
adora run dataflow.yml --log-format json --stop-after 30s 2>test-logs.json
# Compact logs for CI console output
adora run dataflow.yml --log-format compact --log-level info --stop-after 30s
# Post-run analysis: count errors per node
adora logs --local --all-nodes --level error | wc -l
With JSON format, each line is a complete LogMessage that can be processed by jq, log aggregators, or custom scripts:
# Extract error messages with jq
cat test-logs.json | jq -r 'select(.level == "ERROR") | "\(.node_id): \(.message)"'
Performance Considerations
Logging adds I/O overhead proportional to log volume. Here’s how to tune it:
min_log_level is the most impactful setting. It filters at the daemon before any I/O: no log file write, no coordinator forwarding, no send_logs_as routing. A node emitting 1000 debug lines/sec at min_log_level: info generates zero overhead for those lines.
send_logs_as adds a dataflow message per log line. Each parsed log entry is serialized to JSON, converted to Arrow, and sent through the dataflow. For high-volume nodes, this can consume significant bandwidth. Use min_log_level to limit what gets routed.
adora/logs subscribers share a single serialization. The daemon converts each log line to Arrow once and clones the result for each subscriber. The cost scales linearly with subscriber count, not log volume x subscriber count. For most dataflows (1-3 log subscribers), this is negligible.
Log line size is capped at 1 MB. Lines longer than 1 MB from node stdout/stderr are truncated to prevent heap exhaustion. This protects against buggy nodes that dump large binary data to stdout.
Log file rotation is recommended for long-running dataflows. Without max_log_size, log files grow unbounded. A node emitting 100 lines/sec at ~200 bytes/line fills 1 GB in ~14 hours.
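The fill-rate figure above is straightforward to verify:

```python
# Back-of-the-envelope check: 100 lines/sec at ~200 bytes/line vs 1 GB.
lines_per_sec, bytes_per_line = 100, 200
seconds_to_1gb = 1_000_000_000 / (lines_per_sec * bytes_per_line)
hours = seconds_to_1gb / 3600
print(round(hours, 1))   # ~13.9, i.e. roughly 14 hours
```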
Recommended production settings:
nodes:
- id: my-node
path: ./my-node
min_log_level: info # drop debug/trace at source
max_log_size: "50MB" # rotate at 50MB
max_rotated_files: 5 # keep 5 rotated files (300MB max)
Best Practices
Set min_log_level in production. Source-level filtering at the daemon prevents debug noise from reaching log files and the network. This is the most effective way to reduce log volume since it filters before any I/O.
Always set max_log_size for long-running dataflows. Without rotation, a single noisy node can fill the disk. Start with "50MB" (300MB total per node with rotation) and adjust based on your storage budget. Use max_rotated_files to tune how much history to keep (default 5, range 1-100).
Use environment variables for team defaults. Set ADORA_LOG_LEVEL and ADORA_LOG_FORMAT in your shell profile or CI configuration. Individual developers can override with CLI flags.
Use --log-filter during development. Instead of changing YAML config, use per-node display overrides to focus on the node you’re debugging: --log-filter "my-node=debug".
Use send_logs_as for operational monitoring. Build monitoring nodes that watch for error patterns, compute error rates, or forward alerts. This keeps monitoring logic within the dataflow graph. Use adora-log-utils to parse and format log entries in custom sink nodes (see examples/log-sink-file/ and examples/log-sink-tcp/).
Prefer send_logs_as over send_stdout_as for structured data. send_stdout_as captures every stdout line (including raw prints), while send_logs_as only captures parsed structured log entries with full metadata.
Use --local for post-mortem debugging. After a crash, adora logs --local --all-nodes works without a running coordinator and merges all node logs chronologically.
Combine --since with --grep for targeted debugging. Instead of scrolling through thousands of lines, narrow the window: adora logs --local sensor --since 5m --grep "error".
Use JSON format for log pipelines. When feeding logs to external systems (ELK, Grafana Loki, Datadog), use --log-format json for structured ingestion.
Debugging and Observability Guide
This guide covers how to debug, record, replay, and monitor adora dataflows. It is written for new users who want to understand what went wrong in a dataflow, measure performance, or reproduce issues offline.
Table of Contents
- Prerequisites
- Quick Debugging Checklist
- Record and Replay
- Node Management
- Topic Inspection
- Runtime Parameters
- Environment Diagnosis
- Trace Inspection
- Resource Monitoring
- Log Analysis
- Dataflow Visualization
- Monitoring Running Dataflows
- End-to-End Debugging Workflows
Prerequisites
Before using topic inspection commands (topic echo, topic hz, topic info), enable debug message publishing using either approach:
Option 1: CLI flag (recommended)
adora start dataflow.yml --debug
adora run dataflow.yml --debug
Option 2: YAML descriptor
_unstable_debug:
publish_all_messages_to_zenoh: true
This tells the daemon to publish all inter-node messages to Zenoh, where the coordinator can proxy them to CLI clients via WebSocket. Without this flag, topic inspection commands will return an error.
The record, replay, logs, list, top, graph, node info/restart/stop, param, and doctor commands do not require this flag. The topic pub command does require it.
Quick Debugging Checklist
When something goes wrong, follow this sequence:
# 1. Run full environment diagnosis
adora doctor --dataflow dataflow.yml
# 2. What dataflows are active?
adora list
# 3. Inspect the problem node
adora node info -d my-dataflow problem-node
# 4. Check node resource usage
adora top
# 5. Stream logs from the problem node
adora logs my-dataflow problem-node --follow --level debug
# 6. Is the node producing output?
adora topic echo -d my-dataflow problem-node/output
# 7. Inject test data
adora topic pub -d my-dataflow problem-node/input '[1, 2, 3]'
# 8. Is it publishing at the expected rate?
adora topic hz -d my-dataflow --window 5
# 9. Check/modify runtime parameters
adora param list -d my-dataflow problem-node
adora param set -d my-dataflow problem-node debug_level 2
# 10. Restart a misbehaving node (without stopping the dataflow)
adora node restart -d my-dataflow problem-node
# 11. View coordinator traces (no external infra needed)
adora trace list
adora trace view <trace-id-prefix>
# 12. Visualize the dataflow graph
adora graph dataflow.yml --open
# 13. Record for offline analysis
adora record dataflow.yml -o debug-capture.adorec
Record and Replay
Record captures live dataflow messages to a file. Replay substitutes source nodes with recorded data, letting you reproduce behavior without hardware.
Recording a Dataflow
# Record all topics (default output: recording_{timestamp}.adorec)
adora record dataflow.yml
# Specify output file
adora record dataflow.yml -o my-capture.adorec
This injects a hidden __adora_record__ node into the dataflow that subscribes to all node outputs and writes them to an .adorec file. The record node binary (adora-record-node) is auto-built on first use.
The recording runs until you press Ctrl-C or the dataflow stops.
Recording Specific Topics
# Only record camera and lidar
adora record dataflow.yml --topics sensor/image,lidar/points
Topic names use the format node_id/output_id. Available topics can be discovered with adora topic list -d <dataflow>.
Proxy Recording (Remote / Diskless)
When the target machine has no local disk or you want to record on your local machine:
# Start the dataflow first (detached)
adora start dataflow.yml --detach
# Record via WebSocket proxy -- data streams through coordinator to CLI
adora record dataflow.yml --proxy -o capture.adorec
# Record specific topics via proxy
adora record dataflow.yml --proxy --topics sensor/image,lidar/points
How proxy mode works:
- The dataflow must already be running (`adora start --detach`)
- The CLI connects to the coordinator via WebSocket
- The coordinator subscribes to Zenoh on the CLI's behalf
- Message data streams through WebSocket binary frames to the CLI
- The CLI writes the `.adorec` file locally
This requires publish_all_messages_to_zenoh: true in the descriptor.
When to use --proxy:
- Embedded targets with no local disk
- Remote machines where you want the recording on your workstation
- When you only have WebSocket connectivity (no direct Zenoh access)
When to use default mode (no --proxy):
- Same machine or shared filesystem
- High-throughput scenarios (no WebSocket overhead)
- No need for `publish_all_messages_to_zenoh`
Replaying a Recording
# Replay at original speed
adora replay recording.adorec
# Replay at 2x speed
adora replay recording.adorec --speed 2.0
# Replay as fast as possible (speed 0)
adora replay recording.adorec --speed 0
Replay works by:
- Reading the `.adorec` file header to get the original dataflow descriptor
- Identifying which nodes produced the recorded data
- Replacing those source nodes with `adora-replay-node` instances
- Running the modified dataflow – downstream nodes receive replayed data identically to live data
The replay node binary (adora-replay-node) is auto-built on first use.
Replay Options
| Flag | Default | Description |
|---|---|---|
| `--speed <FLOAT>` | 1.0 | Playback speed multiplier. 2.0 = 2x, 0.5 = half speed, 0 = as fast as possible |
| `--loop` | off | Loop the recording continuously |
| `--replace <NODES>` | all recorded | Comma-separated list of nodes to replace |
| `--output-yaml <PATH>` | - | Write the modified descriptor YAML without running |
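The `--speed` semantics can be illustrated with recorded timestamp offsets. A sketch of the pacing math only (the `delays` helper is hypothetical, not the replay node's code):

```python
# Illustrative replay pacing: scale recorded timestamp offsets by --speed;
# speed 0 means "as fast as possible" (no waiting).
def delays(offsets_nanos: list[int], speed: float) -> list[float]:
    """Seconds to wait before emitting each message, relative to start."""
    if speed == 0:
        return [0.0] * len(offsets_nanos)
    return [off / 1e9 / speed for off in offsets_nanos]

delays([0, 500_000_000, 1_000_000_000], 2.0)   # [0.0, 0.25, 0.5]
```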
Selective Replay
Replace only specific source nodes while keeping others live:
# Only replace the sensor node, keep camera live
adora replay recording.adorec --replace sensor
# Replace sensor and lidar, keep everything else live
adora replay recording.adorec --replace sensor,lidar
This is useful when you want to debug a specific processing pipeline with known input data while keeping other parts of the system live.
Dry Run (Output YAML)
Both record and replay support --output-yaml to see the modified descriptor without running:
# See what the record-injected descriptor looks like
adora record dataflow.yml --output-yaml record-modified.yml
# See what the replay-modified descriptor looks like
adora replay recording.adorec --output-yaml replay-modified.yml
Recording File Format
The .adorec format is a simple binary file:
┌──────────────────────────────────┐
│ Header (bincode) │
│ version: u32 │
│ start_nanos: u64 │
│ dataflow_id: Uuid │
│ descriptor_yaml: Vec<u8> │
├──────────────────────────────────┤
│ Entry 1 (bincode) │
│ node_id: String │
│ output_id: String │
│ timestamp_offset_nanos: u64 │
│ event_bytes: Vec<u8> │
├──────────────────────────────────┤
│ Entry 2 ... │
├──────────────────────────────────┤
│ ... │
├──────────────────────────────────┤
│ Footer (bincode) │
│ total_messages: u64 │
│ total_bytes: u64 │
└──────────────────────────────────┘
The event_bytes field contains the raw Timestamped<InterDaemonEvent> bincode payload – the same format used on the wire between daemons. The descriptor_yaml in the header stores the original dataflow descriptor so replay can reconstruct the dataflow.
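As an illustration, the header's leading `version: u32` field can be peeled off a raw byte buffer by hand, assuming bincode's default fixed-width little-endian integer encoding (a sketch only; a real reader should deserialize the full header struct with the bincode crate):

```rust
/// Read the leading `version: u32` field from an `.adorec` byte buffer,
/// assuming bincode's fixed-width little-endian integer layout.
fn read_version(bytes: &[u8]) -> Option<u32> {
    let head: [u8; 4] = bytes.get(..4)?.try_into().ok()?;
    Some(u32::from_le_bytes(head))
}

fn main() {
    let buf = [1u8, 0, 0, 0, 0xAA, 0xBB]; // version = 1, then further header bytes
    println!("{:?}", read_version(&buf));
}
```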
Node Management
Node Info
Get detailed information about a specific node including its status, inputs, outputs, metrics, and restart count:
adora node info -d my-dataflow camera
# JSON output
adora node info -d my-dataflow camera --format json
Node Restart
Restart a single node without stopping the entire dataflow. Useful for recovering a misbehaving node or picking up configuration changes:
# Restart with default grace period
adora node restart -d my-dataflow camera
# Restart with custom grace period
adora node restart -d my-dataflow camera --grace 10s
The daemon sends a stop event, waits for the grace period, then respawns the node process.
Node Stop
Stop a single node without stopping the entire dataflow:
adora node stop -d my-dataflow camera
# With custom grace period
adora node stop -d my-dataflow camera --grace 5s
Topic Inspection
Topic inspection commands subscribe to live dataflow messages via the coordinator’s WebSocket proxy. They require the `--debug` flag or `publish_all_messages_to_zenoh: true`.
Listing Topics
# List all topics in a running dataflow
adora topic list -d my-dataflow
# JSON output
adora topic list -d my-dataflow --format json
Shows each output, which node publishes it, and which nodes subscribe to it. This command reads from the descriptor and does not require publish_all_messages_to_zenoh.
Echoing Topic Data
Stream live topic data to the terminal:
# Echo a single topic
adora topic echo -d my-dataflow camera_node/image
# Echo multiple topics
adora topic echo -d my-dataflow robot1/pose robot2/vel
# JSON output (useful for piping to jq or other tools)
adora topic echo -d my-dataflow robot1/pose --format json
# Echo all topics
adora topic echo -d my-dataflow
Each line shows the topic name, Arrow data content, and metadata parameters. Use --format json for machine-readable output:
{"timestamp":1709000000000,"name":"robot1/pose","data":[1.0,2.0,3.0],"metadata":null}
Measuring Frequency
Interactive TUI showing per-topic publish frequency:
# All topics with 10-second sliding window
adora topic hz -d my-dataflow --window 10
# Specific topics with 5-second window
adora topic hz -d my-dataflow robot1/pose robot2/vel --window 5
The TUI displays:
- Average frequency (Hz)
- Average, min, max interval
- Standard deviation
- Sparkline showing recent activity
Press q or Ctrl-C to exit. Requires an interactive terminal.
Publishing Test Data
Inject data into a running dataflow for testing. Requires publish_all_messages_to_zenoh: true.
# Publish a single Arrow array
adora topic pub -d my-dataflow sensor/threshold '[42]'
# Publish from a JSON file
adora topic pub -d my-dataflow sensor/config --file test-config.json
# Publish multiple messages
adora topic pub -d my-dataflow sensor/trigger '[1]' --count 10
This is useful for:
- Testing node behavior with known input data
- Triggering specific code paths in downstream nodes
- Simulating sensor inputs without hardware
Topic Metadata and Stats
One-shot statistics collection:
# Collect stats for 5 seconds (default)
adora topic info -d my-dataflow camera_node/image
# Collect for 10 seconds
adora topic info -d my-dataflow camera_node/image --duration 10
Reports:
- Arrow data type
- Publisher node
- Subscriber nodes (from descriptor)
- Message count and bandwidth
- Publishing frequency
Runtime Parameters
Runtime parameters let you read and modify node configuration while a dataflow is running, without restarting. Parameters are stored in the coordinator and optionally forwarded to running nodes.
# List all parameters for a node
adora param list -d my-dataflow detector
# Get a single parameter
adora param get -d my-dataflow detector confidence
# Set a parameter (value is JSON)
adora param set -d my-dataflow detector confidence 0.8
adora param set -d my-dataflow detector config '{"nms": 0.5, "classes": ["car", "person"]}'
# Delete a parameter
adora param delete -d my-dataflow detector confidence
Parameters are persisted in the coordinator store (in-memory or redb). When a node is running, param set also forwards the new value to the node’s daemon. Nodes can read parameters through the node event stream.
Limits: Keys max 256 bytes, values max 64KB serialized.
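A `param set` can be pre-validated client-side against these limits. A minimal sketch (limits hard-coded from the values above; the function name is illustrative):

```rust
const MAX_KEY_BYTES: usize = 256;
const MAX_VALUE_BYTES: usize = 64 * 1024;

/// Check a parameter key and its serialized JSON value against the
/// documented coordinator limits before sending a `param set`.
fn validate_param(key: &str, value_json: &str) -> Result<(), String> {
    if key.len() > MAX_KEY_BYTES {
        return Err(format!("key exceeds {MAX_KEY_BYTES} bytes"));
    }
    if value_json.len() > MAX_VALUE_BYTES {
        return Err(format!("value exceeds {MAX_VALUE_BYTES} bytes"));
    }
    Ok(())
}

fn main() {
    println!("{:?}", validate_param("confidence", "0.8"));
}
```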
Environment Diagnosis
adora doctor performs a comprehensive health check of your environment:
# Basic diagnosis
adora doctor
# Diagnosis + dataflow validation
adora doctor --dataflow dataflow.yml
Checks performed:
- Coordinator reachability
- Connected daemon status
- Active dataflow health
- Dataflow YAML validation (if `--dataflow` is provided)
Use this as a first step when debugging any issue, or in CI to validate the environment before running tests.
Trace Inspection
The coordinator captures tracing spans in-memory from adora_coordinator and adora_core crates (up to 4096 spans in a ring buffer). You can view these traces without any external tracing infrastructure (no Jaeger, Tempo, etc. required).
Listing Traces
adora trace list
Shows all captured traces with their root span name, span count, start time, and total duration:
TRACE ID ROOT SPAN SPANS STARTED DURATION
a1b2c3d4e5f6 spawn_dataflow 12 2026-03-01 10:30:05 1.234s
f8e7d6c5b4a3 build_dataflow 5 2026-03-01 10:29:58 0.500s
Viewing a Trace
# Full trace ID
adora trace view a1b2c3d4-e5f6-7890-abcd-1234567890ab
# Or use a unique prefix
adora trace view a1b2c3d4
Displays spans as an indented tree showing parent-child relationships, log levels, durations, and span fields:
spawn_dataflow [INFO 1.234s] {build_id="abc", session_id="def"}
build_dataflow [INFO 0.500s]
download_node [DEBUG 0.200s] {url="..."}
start_inner [INFO 0.734s]
spawn_node [INFO 0.100s] {node_id="camera"}
spawn_node [INFO 0.080s] {node_id="detector"}
When to Use Trace Inspection
- Quick debugging – see what the coordinator did during a `start`, `stop`, or `build` without setting up Jaeger/Tempo
- Performance analysis – identify slow spans in dataflow lifecycle operations
- Deployment troubleshooting – understand the sequence and timing of coordinator operations
For full distributed tracing across daemons and nodes, set ADORA_OTLP_ENDPOINT and use an OTLP-compatible backend.
Resource Monitoring
adora top (also adora inspect top) provides a real-time TUI showing per-node resource usage:
# Default 2-second refresh
adora top
# Custom refresh interval
adora top --refresh-interval 5
# JSON snapshot for scripting/CI
adora top --once | jq .
Displays for each node:
- CPU usage (% of a single core)
- Memory (RSS)
- Node status (Running, Restarting, Degraded, Failed)
- Restart count
- Queue depth (pending messages)
- Network TX/RX (cross-daemon bytes via Zenoh)
- Disk I/O read/write
Metrics are collected by daemons and reported to the coordinator, so this works for distributed dataflows across multiple machines. Press q or Ctrl-C to exit.
Use --once to print a single JSON snapshot and exit, useful for CI pipelines and monitoring integrations.
Note: CPU percentages are per-core, so values can exceed 100% for multi-threaded nodes. Nodes on different machines may have different CPUs, so percentages are not directly comparable across machines.
Log Analysis
Live Log Streaming
# Stream logs from a specific node
adora logs my-dataflow sensor-node --follow
# Stream logs from all nodes
adora logs my-dataflow --all-nodes --follow
# Filter by log level
adora logs my-dataflow sensor-node --follow --level debug
# Stream with grep filter
adora logs my-dataflow --all-nodes --follow --grep "error"
Without --follow, reads from local log files. With --follow, streams live from the coordinator via WebSocket.
Local Log Files
Logs are stored in the out/ directory:
out/
<dataflow-uuid>/
log_<node-id>.jsonl # current log
log_<node-id>.1.jsonl # rotated (previous)
log_<node-id>.2.jsonl # rotated (older)
Read directly:
# All nodes, local files
adora logs --local --all-nodes
# Specific node, last 50 lines
adora logs --local sensor-node --tail 50
Filtering and Searching
| Flag | Example | Description |
|---|---|---|
| `--level <LEVEL>` | `--level debug` | Minimum level: error, warn, info, debug, trace, stdout |
| `--log-filter <FILTER>` | `--log-filter "sensor=debug,processor=warn"` | Per-node level filter |
| `--grep <PATTERN>` | `--grep "timeout"` | Case-insensitive substring match |
| `--since <DURATION>` | `--since 5m` | Only logs newer than this |
| `--until <DURATION>` | `--until 1h` | Only logs older than this |
| `--tail <N>` | `--tail 100` | Show last N lines |
| `--log-format <FMT>` | `--log-format json` | Output format: pretty (default) or json |
Environment variables:
- `ADORA_LOG_LEVEL` – default log level
- `ADORA_LOG_FORMAT` – default log format
- `ADORA_LOG_FILTER` – default per-node filter
Dataflow Visualization
Generate a visual graph of your dataflow:
# Generate HTML and open in browser
adora graph dataflow.yml --open
# Generate Mermaid diagram text
adora graph dataflow.yml --mermaid
The Mermaid output can be pasted into mermaid.live or used in GitHub markdown:
```mermaid
graph TD
sensor --> processor
processor --> controller
```
The HTML mode generates a self-contained file with an interactive mermaid.js diagram.
Monitoring Running Dataflows
# Full environment diagnosis
adora doctor
# List all dataflows (active and completed)
adora list
# List nodes in a specific dataflow
adora node list -d my-dataflow
# Get detailed info on a specific node
adora node info -d my-dataflow camera
# Check coordinator/daemon status
adora status
# View/modify runtime parameters
adora param list -d my-dataflow detector
adora param set -d my-dataflow detector threshold 0.5
adora list shows each dataflow’s UUID, name, status, and node count. Use -d <name> with other commands to target a specific dataflow.
End-to-End Debugging Workflows
Workflow 1: Node Not Producing Output
# 1. Verify the node is running
adora list
adora top
# 2. Check its logs
adora logs my-dataflow problem-node --follow --level trace
# 3. Check if upstream nodes are publishing
adora topic echo -d my-dataflow upstream-node/output
# 4. Verify topic wiring
adora topic list -d my-dataflow
adora graph dataflow.yml --open
Workflow 2: Unexpected Data or Wrong Values
# 1. Echo the topic to see raw data
adora topic echo -d my-dataflow node/output --format json
# 2. Record for offline analysis
adora record dataflow.yml -o debug.adorec
# 3. Replay with known input to isolate the issue
adora replay debug.adorec --replace sensor --speed 0
Workflow 3: Performance Issues
# 1. Check CPU/memory per node
adora top
# 2. Measure publish frequencies
adora topic hz -d my-dataflow --window 10
# 3. Get bandwidth stats for suspected bottleneck
adora topic info -d my-dataflow heavy-node/output --duration 10
# 4. Record and replay at max speed to find throughput limits
adora record dataflow.yml -o perf.adorec
adora replay perf.adorec --speed 0
Workflow 4: Reproducing a Field Issue
# On the robot / target machine:
adora start dataflow.yml --detach
adora record dataflow.yml --proxy -o field-capture.adorec
# Transfer the .adorec file to your workstation, then:
adora replay field-capture.adorec
adora replay field-capture.adorec --speed 0.5 # slow motion
adora replay field-capture.adorec --loop # continuous replay
Workflow 5: Remote Debugging (No Direct Access)
When you only have WebSocket connectivity to the coordinator:
# All these commands work over WebSocket -- no Zenoh needed
adora list
adora top
adora logs my-dataflow --all-nodes --follow
adora topic echo -d my-dataflow node/output
adora topic hz -d my-dataflow
adora record dataflow.yml --proxy -o remote-capture.adorec
See Also
- CLI Reference – complete command reference
- WebSocket Control Plane – how CLI communicates with coordinator
- WebSocket Topic Data Channel – how topic data is proxied
- Testing Guide – running smoke tests
Fault Tolerance
Adora provides built-in fault tolerance for robotic and AI dataflows. Nodes can automatically restart on failure, detect stale upstream connections, gracefully degrade when inputs are unavailable, and the coordinator can persist state to disk so it survives crashes and restarts.
Features at a Glance
| Feature | Scope | Config |
|---|---|---|
| Restart policies | Per-node | restart_policy, max_restarts, restart_delay, … |
| Health monitoring | Per-node | health_check_timeout, health_check_interval (dataflow-level) |
| Input timeouts | Per-input | input_timeout |
| Circuit breaker | Automatic | Triggered by input_timeout, auto-recovers |
| NodeRestarted event | Downstream nodes | Automatic when upstream restarts |
| InputTracker API | Rust nodes | adora_node_api::InputTracker |
| Observability | Daemon-wide | Atomic counters logged periodically |
| Distributed health | Multi-daemon | Coordinator heartbeat monitoring |
| Coordinator state persistence | Coordinator | --store redb (requires redb-backend feature) |
Restart Policies
Control what happens when a node exits or crashes.
Configuration
nodes:
- id: my-node
path: ./target/debug/my-node
restart_policy: on-failure # never | on-failure | always
max_restarts: 5 # 0 = unlimited (default: 0)
restart_delay: 1.0 # initial delay in seconds
max_restart_delay: 30.0 # cap for exponential backoff
restart_window: 300.0 # reset counter after this many seconds
Policy Types
never (default) – Node is not restarted. Failure propagates normally.
on-failure – Restart only when the node exits with a non-zero exit code. Clean exits (code 0) are not restarted.
always – Restart on any exit, except:
- The dataflow was stopped by the user (`adora stop` or Ctrl-C)
- All inputs were closed and the node exited with a non-zero code
How Restarts Work Internally
When a node process exits, the daemon evaluates the restart decision in this order:
1. Policy check: does the restart policy allow it? `never` -> no restart; `on-failure` -> restart only if exit code != 0; `always` -> restart
2. Disable check: has `disable_restart` been set? (set when all inputs close or during manual stop via `stop_all`)
3. Window check: if `restart_window` is set and the window has elapsed since the first restart, reset the counter to 0
4. Limit check: if `max_restarts > 0` and the window counter exceeds it, give up permanently
5. Backoff: if `restart_delay` is set, sleep for the computed delay (re-checking `disable_restart` after waking)
6. Respawn: the node process is spawned fresh with the same configuration
The daemon tracks restart state per node instance in the spawn/prepared.rs lifecycle loop. Each node runs in its own tokio task, so restarts don’t block other nodes.
Backoff
When restart_delay is set, the daemon waits before restarting. The delay doubles on each attempt (exponential backoff) and is capped by max_restart_delay.
The backoff exponent is capped at 16 internally to prevent overflow (2^16 = 65536x multiplier).
Example with restart_delay: 1.0 and max_restart_delay: 10.0:
Attempt 1: wait 1s (1.0 * 2^0)
Attempt 2: wait 2s (1.0 * 2^1)
Attempt 3: wait 4s (1.0 * 2^2)
Attempt 4: wait 8s (1.0 * 2^3)
Attempt 5: wait 10s (capped at max_restart_delay)
Attempt 6: wait 10s (capped)
During the backoff sleep, the daemon continuously monitors the disable_restart flag. If all inputs close while the node is waiting to restart, the restart is cancelled with the log message: “restart cancelled: inputs closed during backoff wait”.
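The backoff schedule above reduces to a pure function. A simplified model of the delay computation (not the daemon's actual code):

```rust
use std::time::Duration;

/// Compute the backoff delay before restart attempt `attempt` (1-based):
/// the delay doubles each attempt, the exponent is capped at 16 to avoid
/// overflow, and the result is capped at `max_delay`.
fn backoff_delay(restart_delay: f64, max_delay: f64, attempt: u32) -> Duration {
    let exponent = attempt.saturating_sub(1).min(16);
    let delay = restart_delay * 2f64.powi(exponent as i32);
    Duration::from_secs_f64(delay.min(max_delay))
}

fn main() {
    // Reproduces the restart_delay: 1.0 / max_restart_delay: 10.0 example.
    for attempt in 1..=6 {
        println!("attempt {attempt}: {:?}", backoff_delay(1.0, 10.0, attempt));
    }
}
```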
Restart Window
When restart_window is set, the restart counter resets after the window elapses (measured from the first restart in the current window). This enables “N restarts per M seconds” semantics.
Example: max_restarts: 5, restart_window: 300.0 means “at most 5 restarts per 5 minutes”. If the window elapses without hitting the limit, the counter resets and the node gets another 5 attempts.
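The window/counter interaction can be modeled as a small decision function (illustrative names; elapsed time is passed in explicitly so the sketch stays testable):

```rust
/// Decide whether another restart is allowed, resetting the counter when
/// the window has elapsed since the first restart in the current window.
/// Returns the updated counter, or None when the limit is exhausted.
fn allow_restart(
    counter: u32,
    max_restarts: u32, // 0 = unlimited
    window_secs: Option<f64>,
    elapsed_since_first: f64,
) -> Option<u32> {
    let counter = match window_secs {
        Some(w) if elapsed_since_first >= w => 0, // window elapsed: reset
        _ => counter,
    };
    if max_restarts > 0 && counter >= max_restarts {
        None // give up permanently
    } else {
        Some(counter + 1)
    }
}

fn main() {
    // "at most 5 restarts per 5 minutes": window elapsed, counter resets.
    println!("{:?}", allow_restart(5, 5, Some(300.0), 301.0));
}
```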
Restart Disable During Shutdown
When the daemon stops a dataflow (via stop_all), it calls disable_restart() on every node before sending Stop events. This prevents the restart mechanism from fighting the shutdown process. The disable_restart flag is an Arc<AtomicBool> shared between the daemon event loop and the node’s spawn lifecycle task.
NodeRestarted Event
When a node restarts, the daemon sends a NodeRestarted event to all downstream nodes that consume its outputs. This allows downstream nodes to:
- Reset internal state or caches
- Log the upstream recovery
- Re-initialize connections or sessions
The event carries the NodeId of the restarting node. Downstream nodes receive it automatically via the event stream:
#![allow(unused)]
fn main() {
match event {
Event::NodeRestarted { id } => {
println!("upstream node {id} restarted, resetting state");
// Clear any cached state from the old node instance
}
_ => {}
}
}
The daemon finds downstream nodes via dataflow.mappings, which maps each node’s outputs to all subscribing (receiver_node, input_id) pairs. Each unique receiver gets one NodeRestarted event per restart.
Health Monitoring
Passive monitoring detects hung nodes that stop communicating with the daemon.
health_check_interval: 2.0 # seconds (default: 5.0, dataflow-level)
nodes:
- id: my-node
path: ./target/debug/my-node
health_check_timeout: 30.0 # seconds (per-node)
restart_policy: on-failure
Configurable Health Check Interval
The health_check_interval is a dataflow-level setting that controls how often the daemon checks node health. Default is 5.0 seconds. Lower values detect hung nodes faster but add more overhead. Set this at the top level of your dataflow YAML, not per-node.
How It Works Internally
The daemon runs a health check sweep at the configured health_check_interval (via a tokio interval stream emitting Event::NodeHealthCheckInterval).
Each RunningNode has a last_activity: Arc<AtomicU64> field storing the timestamp (milliseconds since epoch) of the last communication. This is updated atomically by the node’s communication handler (node_communication/mod.rs) every time the node sends any request to the daemon (event subscriptions, output sends, etc.).
The health check function (check_node_health) iterates all running nodes:
- Skip nodes without `health_check_timeout` set
- Skip nodes with `last_activity == 0` (not yet connected)
- Compute `elapsed_ms = now - last_activity`
- If `elapsed_ms > timeout_ms`, log a warning and kill the node process
After killing, the normal exit handling runs, which evaluates the restart policy. This means health_check_timeout combined with restart_policy: on-failure automatically recovers hung nodes.
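The four checks above reduce to a small predicate (a sketch; parameter and function names are illustrative):

```rust
/// Decide whether the health check should kill a node.
/// `timeout_ms == None` means no health_check_timeout is configured;
/// `last_activity_ms == 0` means the node has not connected yet.
fn should_kill(now_ms: u64, last_activity_ms: u64, timeout_ms: Option<u64>) -> bool {
    match timeout_ms {
        None => false,                             // no health_check_timeout set
        Some(_) if last_activity_ms == 0 => false, // not yet connected
        Some(t) => now_ms.saturating_sub(last_activity_ms) > t,
    }
}

fn main() {
    // 35s since last activity against a 30s timeout: kill.
    println!("{}", should_kill(40_000, 5_000, Some(30_000)));
}
```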
What Counts as “Activity”
Any message from the node to the daemon counts:
- Event subscription requests
- Output data sends (via shared memory or TCP)
- Timer tick acknowledgments
Normal input data received from other nodes does not reset the timer – the node must actively communicate with the daemon.
Input Timeouts and Circuit Breaker
Per-input timeouts detect when an upstream node stops producing data.
Configuration
nodes:
- id: downstream-node
path: ./target/debug/downstream
inputs:
sensor_data:
source: camera-node/frames
input_timeout: 5.0 # seconds
The input_timeout is set per input, not per node. Different inputs can have different timeouts.
How It Works Internally
The daemon maintains an InputDeadline for each input with a timeout:
struct InputDeadline {
timeout: Duration, // configured timeout
last_received: Instant, // last time data arrived
}
These are stored in RunningDataflow.input_deadlines keyed by (NodeId, DataId).
Timeout detection runs during the same 5-second health check interval. The check_input_timeouts function:
- Scans all `input_deadlines` entries
- If `last_received.elapsed() > timeout`, the input is “broken”
- The `(node_id, input_id)` pair is moved from `input_deadlines` to `broken_inputs`
- The daemon calls `break_input()`, which sends `InputClosed { id }` to the downstream node
- If all of a node’s inputs are now closed (and none are broken/recoverable), `AllInputsClosed` is sent and the node’s restart is disabled
Deadline reset: Every time data arrives on an input, its last_received is reset to Instant::now().
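The deadline bookkeeping can be sketched with a plain map keyed by `(node, input)` (a simplified model of `InputDeadline`, using millisecond timestamps so the sketch is testable without real clocks):

```rust
use std::collections::HashMap;

/// Simplified deadline table keyed by (node, input), times in milliseconds.
struct Deadlines {
    timeout_ms: u64,
    last_received_ms: HashMap<(String, String), u64>,
}

impl Deadlines {
    /// Record data arrival: reset the deadline for this input.
    fn on_data(&mut self, node: &str, input: &str, now_ms: u64) {
        self.last_received_ms
            .insert((node.to_string(), input.to_string()), now_ms);
    }

    /// Return the inputs whose deadline has expired at `now_ms`.
    fn broken(&self, now_ms: u64) -> Vec<&(String, String)> {
        self.last_received_ms
            .iter()
            .filter(|(_, &last)| now_ms.saturating_sub(last) > self.timeout_ms)
            .map(|(key, _)| key)
            .collect()
    }
}

fn main() {
    let mut d = Deadlines { timeout_ms: 5_000, last_received_ms: HashMap::new() };
    d.on_data("detector", "frames", 0);
    println!("{:?}", d.broken(6_000)); // frames expired after 6s of silence
}
```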
Circuit Breaker: Auto-Recovery
The circuit breaker tracks broken inputs in RunningDataflow.broken_inputs. When new data arrives on a broken input:
- The data is delivered to the node normally
- The `broken_inputs` entry is removed
- The input is re-added to `open_inputs`
- A new `InputDeadline` is created (restarting the timeout)
- An `InputRecovered { id }` event is sent to the node
- The `circuit_breaker_recoveries` counter is incremented
This means recovery is fully automatic. If the upstream node restarts (via restart policy) and begins producing data again, downstream nodes seamlessly resume receiving it.
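The trip/recover cycle is essentially two sets per node. A minimal model (illustrative types, not the daemon's actual ones):

```rust
use std::collections::HashSet;

/// Minimal circuit-breaker bookkeeping for one node's inputs.
struct Breaker {
    open_inputs: HashSet<String>,   // healthy, receiving data
    broken_inputs: HashSet<String>, // timed out
    recoveries: u64,
}

impl Breaker {
    /// Input timeout fired: move the input from open to broken.
    fn trip(&mut self, input: &str) {
        if self.open_inputs.remove(input) {
            self.broken_inputs.insert(input.to_string());
        }
    }

    /// Data arrived: if the input was broken, recover it automatically.
    fn on_data(&mut self, input: &str) {
        if self.broken_inputs.remove(input) {
            self.open_inputs.insert(input.to_string());
            self.recoveries += 1; // the daemon would also emit InputRecovered { id }
        }
    }
}

fn main() {
    let mut b = Breaker {
        open_inputs: HashSet::from(["frames".to_string()]),
        broken_inputs: HashSet::new(),
        recoveries: 0,
    };
    b.trip("frames");
    b.on_data("frames"); // upstream came back: automatic recovery
    println!("recoveries = {}", b.recoveries);
}
```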
Node-Side Handling
In Rust nodes, handle these events in your event loop:
#![allow(unused)]
fn main() {
use adora_node_api::{AdoraNode, Event};
let (mut node, mut events) = AdoraNode::init_from_env()?;
while let Some(event) = events.recv() {
match event {
Event::Input { id, data, .. } => {
// Normal processing
}
Event::InputClosed { id } => {
// Upstream stopped producing on this input.
// You can: use cached data, skip processing, alert operator, etc.
}
Event::InputRecovered { id } => {
// Upstream is back online for this input.
// Resume normal processing.
}
Event::Stop(_) => break,
_ => {}
}
}
}
InputTracker API (Rust)
The InputTracker helper tracks input health and caches the last received value per input, making graceful degradation easy.
#![allow(unused)]
fn main() {
use adora_node_api::{AdoraNode, Event, InputTracker, InputState};
let (mut node, mut events) = AdoraNode::init_from_env()?;
let mut tracker = InputTracker::new();
while let Some(event) = events.recv() {
tracker.process_event(&event);
match event {
Event::Input { id, data, .. } => {
// Fresh data available
}
Event::InputClosed { id } => {
// Input timed out -- fall back to cached data
if let Some(stale_data) = tracker.last_value(&id) {
// Use stale_data as fallback
}
}
Event::Stop(_) => break,
_ => {}
}
// Check overall health
if tracker.any_closed() {
let closed: Vec<_> = tracker.closed_inputs();
// Log or adjust behavior
}
}
}
Internal Design
InputTracker maintains two HashMaps:
- `states: HashMap<DataId, InputState>` – current state per input (Healthy or Closed)
- `cache: HashMap<DataId, ArrowData>` – last received value per input
On Event::Input, both maps are updated (state = Healthy, cache = data clone). On Event::InputClosed, only state changes (cache is preserved). On Event::InputRecovered, state is set back to Healthy. The cache is never cleared, so last_value() always returns the most recent data even after the input closes.
Note: ArrowData wraps Arc<dyn arrow::array::Array>, so the cache clone is reference-counted (cheap).
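The two-map design can be reproduced in a few lines. The sketch below mirrors the described semantics, with plain `String`/`Vec<u8>` standing in for `DataId`/`ArrowData`:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum State { Healthy, Closed }

/// Minimal stand-in for InputTracker: a state map plus a last-value cache.
struct Tracker {
    states: HashMap<String, State>,
    cache: HashMap<String, Vec<u8>>,
}

impl Tracker {
    fn new() -> Self {
        Tracker { states: HashMap::new(), cache: HashMap::new() }
    }
    fn on_input(&mut self, id: &str, data: &[u8]) {
        self.states.insert(id.to_string(), State::Healthy);
        self.cache.insert(id.to_string(), data.to_vec()); // cache updated
    }
    fn on_closed(&mut self, id: &str) {
        self.states.insert(id.to_string(), State::Closed); // cache preserved
    }
    fn last_value(&self, id: &str) -> Option<&Vec<u8>> {
        self.cache.get(id) // available even after the input closes
    }
}

fn main() {
    let mut t = Tracker::new();
    t.on_input("frames", &[1, 2, 3]);
    t.on_closed("frames");
    println!("{:?}", t.last_value("frames")); // stale value still cached
}
```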
API Reference
| Method | Returns | Description |
|---|---|---|
| `new()` | `InputTracker` | Create empty tracker |
| `process_event(&Event)` | `bool` | Update state. Returns true if the event was relevant |
| `state(&DataId)` | `Option<InputState>` | Current state (Healthy or Closed) |
| `is_closed(&DataId)` | `bool` | Check if input is closed |
| `last_value(&DataId)` | `Option<&ArrowData>` | Last received value (available even when closed) |
| `closed_inputs()` | `Vec<&DataId>` | All currently closed inputs |
| `any_closed()` | `bool` | True if any tracked input is closed |
Observability
The daemon tracks fault tolerance events with atomic counters (FaultToleranceStats) and logs a summary every 5 seconds during the health check interval.
Counters
| Counter | Type | Incremented when |
|---|---|---|
| `restarts` | AtomicU64 | A node restart is initiated (in spawn lifecycle) |
| `health_check_kills` | AtomicU64 | A node is killed by the health check (unresponsive) |
| `input_timeouts` | AtomicU64 | An input timeout fires (circuit breaker trips) |
| `circuit_breaker_recoveries` | AtomicU64 | Data arrives on a broken input (auto-recovery) |
All counters use Ordering::Relaxed since they are informational and don’t need strict ordering guarantees.
Log Output
When any counter is non-zero, the daemon emits a structured log line:
INFO fault tolerance stats restarts=3 health_kills=0 input_timeouts=1 cb_recoveries=1
These counters are cumulative for the lifetime of the daemon process. They are not reset between dataflows.
Distributed Health
In multi-daemon deployments, the coordinator monitors daemon heartbeats.
Protocol
- Heartbeat interval: 3 seconds (coordinator sends heartbeat to each daemon)
- Disconnect threshold: 30 seconds without a response
- Detection: On each heartbeat sweep, the coordinator removes daemons that haven’t responded within the threshold
- Notification: the coordinator broadcasts `PeerDaemonDisconnected { daemon_id }` to all remaining daemons
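The detection sweep amounts to filtering a heartbeat table against the threshold. An illustrative sketch (thresholds taken from the protocol above, times in milliseconds):

```rust
use std::collections::HashMap;

/// Return the daemons whose last heartbeat response is older than the
/// 30-second disconnect threshold.
fn disconnected(last_seen_ms: &HashMap<String, u64>, now_ms: u64) -> Vec<&String> {
    const THRESHOLD_MS: u64 = 30_000;
    last_seen_ms
        .iter()
        .filter(|(_, &seen)| now_ms.saturating_sub(seen) > THRESHOLD_MS)
        .map(|(id, _)| id)
        .collect()
}

fn main() {
    let mut seen = HashMap::new();
    seen.insert("machine-A".to_string(), 95_000u64); // 5s ago: healthy
    seen.insert("machine-B".to_string(), 50_000u64); // 50s ago: disconnected
    println!("{:?}", disconnected(&seen, 100_000));
}
```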
DaemonInfo
The ConnectedMachines CLI query returns Vec<DaemonInfo>:
#![allow(unused)]
fn main() {
pub struct DaemonInfo {
pub daemon_id: DaemonId,
pub last_heartbeat_ago_ms: u64, // milliseconds since last heartbeat
}
}
This allows monitoring tools to detect daemons that are alive but slow to respond.
Daemon-Side Handling
When a daemon receives PeerDaemonDisconnected, it logs a structured warning:
WARN peer daemon disconnected daemon_id=machine-B
Currently this is informational. Future work may include automatic migration of nodes from the disconnected daemon.
Coordinator State Persistence
By default the coordinator holds all state in memory. If the coordinator process crashes or is restarted, all knowledge of running dataflows is lost – daemons continue running but become orphaned, and users must manually re-run dataflows.
The redb store backend solves this by persisting coordinator state to a single file on disk using redb, a pure-Rust embedded key-value store with copy-on-write B-trees that are crash-safe by design.
Design: Stateless Coordinator with Stateful Backend
The coordinator itself remains stateless in the K8s sense – it can be stopped and restarted at any time. All durable state lives in the store backend behind the CoordinatorStore trait:
Coordinator (stateless process)
|
v
CoordinatorStore trait
|
+-- InMemoryStore (default, no persistence)
+-- RedbStore (persists to ~/.adora/coordinator.redb)
This separation means:
- The coordinator event loop never reads from the filesystem during normal operation (only at startup recovery)
- All state mutations are written to the store at well-defined persistence points
- The store can be swapped without changing coordinator logic
Enabling Persistence
# Use default path (~/.adora/coordinator.redb)
adora coordinator --store redb
# Use custom path
adora coordinator --store redb:/path/to/coordinator.redb
# Default: in-memory only (no persistence)
adora coordinator --store memory
The redb backend requires the redb-backend Cargo feature, which is enabled in the default CLI build.
What Is Persisted
The store tracks three record types:
| Record | Key | Persisted Fields |
|---|---|---|
| `DataflowRecord` | UUID (16 bytes) | uuid, name, descriptor (JSON), status, daemon IDs, generation counter, created/updated timestamps |
| `BuildRecord` | UUID (16 bytes) | build ID, status, errors, created/updated timestamps |
| `DaemonInfo` | DaemonId (bincode) | daemon ID, machine ID |
Records are serialized with bincode for compact, fast encoding.
Dataflow Status Lifecycle
The coordinator persists dataflow status at every state transition:
Start command --> Pending
All daemons ready --> Running
Stop command --> Stopping
All nodes finish --> Succeeded or Failed { error }
Spawn failure --> Failed { error: "spawn failed: ..." }
Each persist call increments the record’s generation counter, providing a monotonic version for conflict detection.
Persistence Points
The coordinator writes to the store at these moments in the event loop:
- Dataflow started (`ControlRequest::Start`) – record created with status `Pending`
- Dataflow spawned (`DataflowSpawnResult` success from all daemons) – updated to `Running`
- Spawn failed (`DataflowSpawnResult` error) – updated to `Failed` with the actual error message
- Stop requested (`ControlRequest::Stop` or `StopByName`) – updated to `Stopping`
- All nodes finished (`DataflowFinishedOnDaemon`) – updated to `Succeeded` or `Failed` with per-node error details
- Graceful shutdown (Ctrl-C or `Destroy` command) – all running dataflows marked `Stopping` before stop messages are sent
If a store write fails, the coordinator logs a warning and continues operating with in-memory state. This prevents a store failure from blocking the dataflow lifecycle.
Startup Recovery
When the coordinator starts with a redb store that contains data from a previous run, it performs recovery:
1. Read all persisted dataflow records via `store.list_dataflows()`
2. For any record with a non-terminal status (`Pending`, `Running`, `Stopping`):
   - Mark it as `Failed { error: "coordinator restarted" }`
   - Increment the generation counter
   - Write the updated record back to the store
3. Terminal records (`Succeeded`, `Failed`) are left unchanged
This ensures that stale dataflows from a crashed coordinator are not confused with actively running ones. The daemons that were running those dataflows will detect the coordinator disconnect independently.
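The recovery pass can be sketched as a single sweep over simplified records (illustrative types; the real store works on bincode-serialized `DataflowRecord`s):

```rust
#[derive(Clone, PartialEq, Debug)]
enum Status {
    Pending,
    Running,
    Stopping,
    Succeeded,
    Failed(String),
}

/// Simplified dataflow record: status plus a monotonic generation counter.
struct Record {
    status: Status,
    generation: u64,
}

/// Startup recovery: fail any record left in a non-terminal state by a
/// previous coordinator run; terminal records are left untouched.
fn recover(records: &mut [Record]) {
    for r in records.iter_mut() {
        if matches!(r.status, Status::Pending | Status::Running | Status::Stopping) {
            r.status = Status::Failed("coordinator restarted".to_string());
            r.generation += 1;
        }
    }
}

fn main() {
    let mut records = vec![
        Record { status: Status::Running, generation: 3 },   // stale: will fail
        Record { status: Status::Succeeded, generation: 7 }, // terminal: untouched
    ];
    recover(&mut records);
    println!("{:?} gen={}", records[0].status, records[0].generation);
}
```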
Error Detail Preservation
When a dataflow fails, the Failed status includes the actual per-node error messages rather than a generic string:
Failed { error: "node-1: exited with code 137; node-2: failed to spawn node: binary not found" }
Errors are collected from `DataflowDaemonResult.node_results` across all daemons, formatted as `node_id: error_message`, and joined with `"; "`.
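The formatting rule can be reproduced directly (a sketch of the described `node_id: error_message` join):

```rust
/// Join per-node errors into a single Failed-status message,
/// formatted as "node_id: error_message" entries separated by "; ".
fn join_node_errors(errors: &[(&str, &str)]) -> String {
    errors
        .iter()
        .map(|(node, err)| format!("{node}: {err}"))
        .collect::<Vec<_>>()
        .join("; ")
}

fn main() {
    let msg = join_node_errors(&[
        ("node-1", "exited with code 137"),
        ("node-2", "failed to spawn node: binary not found"),
    ]);
    println!("{msg}");
}
```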
Schema Versioning
The redb database includes a meta table with a schema_version key. On open:
- If no version exists (fresh database), the current version is written
- If the stored version matches the binary’s version, the database opens normally
- If there is a mismatch, the database is rejected with an error
This prevents silent data corruption when the serialization format of stored records changes between Adora versions. The current schema version is 1.
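The open-time check reduces to a three-way decision (a sketch; the real store reads the version from the `meta` table via redb):

```rust
const CURRENT_SCHEMA_VERSION: u32 = 1;

/// Decide what to do with a database given its stored schema version:
/// write the current version into a fresh database, open on an exact
/// match, and reject on a mismatch.
fn check_schema(stored: Option<u32>) -> Result<u32, String> {
    match stored {
        None => Ok(CURRENT_SCHEMA_VERSION), // fresh database: write current version
        Some(v) if v == CURRENT_SCHEMA_VERSION => Ok(v),
        Some(v) => Err(format!(
            "schema version mismatch: file has {v}, binary expects {CURRENT_SCHEMA_VERSION}"
        )),
    }
}

fn main() {
    println!("{:?}", check_schema(Some(1)));
}
```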
File Security
On Unix systems:
- The database file is set to `0600` (owner read/write only) after creation
- The default directory (`~/.adora/`) is set to `0700` (owner only)
- Custom paths provided via `redb:/path` are validated to reject `..` components
Internal Architecture
#![allow(unused)]
fn main() {
// Store trait (libraries/coordinator-store/src/lib.rs)
pub trait CoordinatorStore: Send + Sync {
fn put_dataflow(&self, record: &DataflowRecord) -> Result<()>;
fn get_dataflow(&self, uuid: &Uuid) -> Result<Option<DataflowRecord>>;
fn list_dataflows(&self) -> Result<Vec<DataflowRecord>>;
fn delete_dataflow(&self, uuid: &Uuid) -> Result<()>;
// ... daemon and build methods
}
}
The RedbStore implementation uses three redb tables (daemons, dataflows, builds) with UUID-based binary keys and bincode-serialized values. All operations are synchronous (redb is a synchronous library); the coordinator calls them directly from the async event loop since they are fast in-process operations.
A bincode deserialization limit of 64 MiB guards against corrupted data that could encode huge allocation sizes in length prefixes.
Complete YAML Reference
# Dataflow-level settings
health_check_interval: 2.0 # health check sweep interval (default: 5.0s)
nodes:
- id: sensor-node
path: ./target/debug/sensor
inputs:
tick: adora/timer/millis/100
outputs:
- frames
- id: processor
path: ./target/debug/processor
# Restart policy
restart_policy: on-failure # never | on-failure | always
max_restarts: 5 # 0 = unlimited
restart_delay: 1.0 # initial backoff delay (seconds)
max_restart_delay: 30.0 # max backoff cap (seconds)
restart_window: 300.0 # reset counter after N seconds
# Health monitoring
health_check_timeout: 30.0 # kill if no activity for N seconds
inputs:
frames:
source: sensor-node/frames
input_timeout: 5.0 # circuit breaker timeout (seconds)
queue_size: 10 # input buffer size (default: 10)
outputs:
- result
Use Case Scenarios
1. Camera Pipeline with Intermittent Hardware Failures
A camera driver node occasionally crashes due to USB disconnects. The processing pipeline should survive these outages and resume when the camera reconnects.
nodes:
- id: camera-driver
path: ./target/debug/camera-driver
restart_policy: on-failure
max_restarts: 0 # unlimited -- hardware failures are expected
restart_delay: 2.0 # wait for USB to re-enumerate
max_restart_delay: 30.0
inputs:
tick: adora/timer/millis/33 # ~30 FPS
outputs:
- frames
- id: object-detector
path: ./target/debug/detector
inputs:
frames:
source: camera-driver/frames
input_timeout: 5.0 # tolerate 5s camera outage
outputs:
- detections
- id: planner
path: ./target/debug/planner
inputs:
detections:
source: object-detector/detections
input_timeout: 10.0 # longer tolerance -- can plan with stale data
lidar:
source: lidar-driver/points
input_timeout: 3.0
What happens when the camera crashes:
1. `camera-driver` exits with a non-zero code
2. The daemon evaluates the `on-failure` policy -> restart after 2s backoff
3. During the outage, `object-detector` receives `InputClosed { id: "frames" }` after 5s
4. `planner` receives `InputClosed { id: "detections" }` after 10s
5. The camera restarts and begins producing frames
6. `object-detector` receives new frame data + `InputRecovered { id: "frames" }` (circuit breaker recovers)
7. `planner` receives detections + `InputRecovered { id: "detections" }`
Node-side handling in the planner:
use adora_node_api::{AdoraNode, Event, InputTracker};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (mut node, mut events) = AdoraNode::init_from_env()?;
    let mut tracker = InputTracker::new();
    while let Some(event) = events.recv() {
        tracker.process_event(&event);
        match event {
            Event::Input { id, data, .. } => match id.as_ref() {
                "detections" => plan_with_detections(&data),
                "lidar" => update_lidar_map(&data),
                _ => {}
            },
            Event::InputClosed { id } => match id.as_ref() {
                "detections" => {
                    // Camera pipeline down -- plan with lidar only
                    plan_lidar_only();
                }
                "lidar" => {
                    // LiDAR down -- use last known detection data
                    if let Some(stale) = tracker.last_value(&"detections".into()) {
                        plan_with_stale_detections(stale);
                    }
                }
                _ => {}
            },
            Event::Stop(_) => break,
            _ => {}
        }
    }
    Ok(())
}
2. ML Inference Node with OOM Crashes
An ML inference node occasionally runs out of memory on large inputs. It should restart quickly but give up after repeated failures (indicating a systemic issue).
nodes:
- id: ml-inference
path: ./target/debug/ml-inference
restart_policy: on-failure
max_restarts: 3
restart_delay: 0.5
restart_window: 60.0 # 3 restarts per minute
health_check_timeout: 60.0 # ML inference can be slow
inputs:
images:
source: preprocessor/images
outputs:
- predictions
Behavior:
- Node crashes from OOM -> restarts after 0.5s
- Crashes again on another large input -> restarts after 1.0s
- Crashes a third time -> restarts after 2.0s
- Crashes a fourth time within 60s -> `max_restarts` exceeded, node fails permanently
- If the node runs stably for 60s after the first crash, the restart window resets and it gets 3 more chances
3. Multi-Sensor Fusion with Graceful Degradation
A robot fuses data from multiple sensors. Individual sensors may fail, but the system should continue operating with reduced capability.
nodes:
- id: sensor-fusion
path: ./target/debug/sensor-fusion
inputs:
camera:
source: camera-node/frames
input_timeout: 3.0
lidar:
source: lidar-node/points
input_timeout: 3.0
imu:
source: imu-node/readings
input_timeout: 1.0 # IMU is critical, short timeout
gps:
source: gps-node/fix
input_timeout: 10.0 # GPS can be intermittent
outputs:
- fused-state
Node-side with InputTracker:
use adora_node_api::{AdoraNode, Event, InputTracker};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (mut node, mut events) = AdoraNode::init_from_env()?;
    let mut tracker = InputTracker::new();
    while let Some(event) = events.recv() {
        tracker.process_event(&event);
        match event {
            Event::Input { id, data, .. } => {
                // Process fresh data from any sensor
                update_sensor(&id, &data);
                compute_and_send_fusion(&mut node, &tracker);
            }
            Event::InputClosed { id } => {
                // Sensor went offline -- adjust fusion weights
                eprintln!("sensor {id} offline, degrading");
                compute_and_send_fusion(&mut node, &tracker);
            }
            Event::InputRecovered { id } => {
                // Sensor back online
                eprintln!("sensor {id} recovered");
            }
            Event::Stop(_) => break,
            _ => {}
        }
    }
    Ok(())
}

fn compute_and_send_fusion(node: &mut AdoraNode, tracker: &InputTracker) {
    // Use fresh data where available, stale cache for degraded sensors
    let camera = tracker.last_value(&"camera".into());
    let lidar = tracker.last_value(&"lidar".into());
    let imu = tracker.last_value(&"imu".into());
    if tracker.is_closed(&"imu".into()) {
        // IMU is critical -- switch to emergency mode
        emergency_stop(node);
        return;
    }
    // Fuse available sensors, weighting active ones higher
    let closed = tracker.closed_inputs();
    let active_count = 4 - closed.len();
    // ... fusion logic using active_count for confidence weighting
}
4. Long-Running Data Processing Pipeline
A batch processing pipeline runs continuously. The processing node occasionally hangs due to a third-party library bug. Health monitoring detects and recovers from these hangs.
nodes:
- id: data-ingest
path: ./target/debug/ingest
restart_policy: always # always restart -- this is a long-running service
max_restarts: 0 # unlimited
restart_delay: 1.0
inputs:
tick: adora/timer/millis/1000
outputs:
- records
- id: processor
path: ./target/debug/processor
restart_policy: on-failure
max_restarts: 10
restart_delay: 0.5
restart_window: 600.0 # 10 restarts per 10 minutes
health_check_timeout: 30.0 # kill if hung for 30s
inputs:
records: data-ingest/records
outputs:
- results
- id: writer
path: ./target/debug/writer
restart_policy: on-failure
max_restarts: 5
restart_delay: 2.0 # give DB time to recover
max_restart_delay: 60.0
inputs:
results:
source: processor/results
input_timeout: 60.0 # processor may be slow
What happens when the processor hangs:
1. The processor stops communicating with the daemon
2. After 30s, the health check detects the hang and kills the process
3. The `health_check_kills` counter increments
4. The daemon evaluates `on-failure` -> restart after 0.5s
5. A new processor instance starts and resumes consuming from `data-ingest`
6. `writer` may have received `InputClosed` during the 60s timeout, or may not if the restart was fast enough
7. If `writer` did receive `InputClosed`, it gets `InputRecovered` when new results arrive
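The hang-detection step reduces to tracking each node's last activity timestamp and killing it once the silence exceeds `health_check_timeout`. A minimal sketch (illustrative, with times as plain seconds; not the daemon's actual health-check code):

```rust
/// Activity-based liveness check: the daemon records the last time a node
/// communicated and flags it as hung once the silence exceeds
/// `health_check_timeout`. Illustrative sketch only.
struct HealthWatch {
    timeout: f64,
    last_activity: f64,
}

impl HealthWatch {
    /// Record node activity (message sent, heartbeat, etc.) at time `now`.
    fn touch(&mut self, now: f64) {
        self.last_activity = now;
    }

    /// True once the node has been silent longer than the timeout.
    fn is_hung(&self, now: f64) -> bool {
        now - self.last_activity > self.timeout
    }
}

fn main() {
    // health_check_timeout: 30.0 (value from the YAML above)
    let mut w = HealthWatch { timeout: 30.0, last_activity: 0.0 };
    w.touch(100.0);
    assert!(!w.is_hung(120.0)); // 20s of silence: still healthy
    assert!(w.is_hung(131.0));  // >30s of silence: kill, then apply restart policy
}
```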
5. Distributed Deployment with Daemon Failure Detection
A multi-machine deployment where the coordinator monitors daemon health.
Machine A (coordinator + daemon): camera-driver, preprocessor
Machine B (daemon): ml-inference, postprocessor
Machine C (daemon): planner, actuator-driver
What happens when Machine B loses network:
1. The coordinator's heartbeat to Machine B fails
2. After 30s without a response, the coordinator removes Machine B from the active daemons
3. The coordinator broadcasts `PeerDaemonDisconnected { daemon_id: "machine-B" }` to Machine A and Machine C
4. Daemons on A and C log: `WARN peer daemon disconnected daemon_id=machine-B`
5. Nodes on A and C with inputs from Machine B's nodes receive `InputClosed` events (via their input timeouts)
6. CLI queries to `ConnectedMachines` show only A and C with their `last_heartbeat_ago_ms`
6. Coordinator Crash Recovery with redb Persistence
A long-running multi-daemon deployment where the coordinator must survive restarts without losing track of dataflow history.
# Start coordinator with persistent store
adora coordinator --store redb
# In another terminal, start a dataflow
adora start examples/rust-dataflow/dataflow.yml --name my-pipeline --detach
# Coordinator crashes or is killed (e.g., OOM, hardware failure)
# ... time passes ...
# Restart coordinator with the same store
adora coordinator --store redb
What happens on restart:
1. The coordinator opens `~/.adora/coordinator.redb` and reads the persisted dataflow records
2. It finds `my-pipeline` with status `Running`
3. It marks it as `Failed { error: "coordinator restarted" }` and increments the generation
4. It logs: `INFO recovering stale dataflow <uuid> ("my-pipeline") -> marking as Failed`
5. `adora list` now shows `my-pipeline` with its final status and timestamps
6. Daemons detect the coordinator disconnect independently and stop their nodes
7. The user can start a fresh dataflow; the coordinator is fully operational
The key benefit: the coordinator retains a complete history of dataflow lifecycle events across restarts. Without `--store redb`, all state would be lost and the operator would have no record of what was running before the crash.
7. Periodic Batch Job with Always-Restart
A node that processes batches and exits when done. It should restart to process the next batch.
nodes:
- id: batch-processor
path: ./target/debug/batch-proc
restart_policy: always # restart even on clean exit
max_restarts: 0 # unlimited
restart_delay: 10.0 # wait 10s between batches
max_restart_delay: 10.0 # no exponential growth
inputs:
trigger: adora/timer/millis/1 # immediate first trigger
outputs:
- batch-result
The node processes one batch, exits with code 0, waits 10s, then restarts to process the next. The `always` policy ensures restarts even on a clean exit. Setting `restart_delay == max_restart_delay` gives a constant delay with no exponential growth.
Best Practices
Start with `on-failure`. Use `always` only for nodes that are expected to exit and restart (e.g., periodic batch jobs).
Set `max_restarts`. Unlimited restarts can mask bugs. Start with 3-5 and increase if needed. Use `max_restarts: 0` only for nodes where crashes are expected and unavoidable (hardware drivers, external API clients).
Use `restart_window`. Prevents permanent restart loops. A window of 60-300 seconds is typical. Without a window, a node that crashes at startup will exhaust its restart budget immediately.
Tune `restart_delay`. Start with 0.5-1.0 seconds. Too short causes thrashing; too long delays recovery. Match the delay to your node's typical startup time and the root cause of failures:
- USB/hardware reconnection: 2-5s
- Network service reconnection: 1-3s
- OOM/transient bugs: 0.5-1.0s
Set `health_check_timeout` generously. It should be at least 2-3x your node's longest expected processing time; ML inference nodes may need 60s+. If it is too short, healthy nodes get killed during normal processing.
Set `input_timeout` per input. Not all inputs need the same timeout. Use shorter timeouts for high-frequency inputs (IMU, camera) and longer timeouts for slow/bursty sources (GPS, batch results). A good starting point is 3-5x the expected publish interval.
Use `InputTracker` for critical paths. When a node must keep running even with degraded inputs, use `InputTracker` to fall back to cached data. This is essential for sensor fusion, planning, and control nodes.
Use `--store redb` for production deployments. The redb backend ensures the coordinator retains dataflow history across crashes and restarts. The in-memory default is fine for development but loses all state on exit. The redb file is small (proportional to the number of dataflow records) and adds negligible overhead.
Combine features for defense in depth:
- `restart_policy` + `restart_delay` -> recover from node crashes
- `health_check_timeout` -> recover from hung nodes
- `input_timeout` -> detect stale upstream data
- `InputTracker` -> graceful degradation in node code
- `--store redb` -> survive coordinator crashes
Distributed Deployment Guide
Adora supports deploying dataflows across multiple machines for multi-robot fleets, edge AI pipelines, and distributed robotics systems. This guide covers cluster management, node scheduling, binary distribution, auto-recovery, and operational best practices.
Table of Contents
- Overview
- Quick Start
- Features at a Glance
- Cluster Configuration Reference
- Cluster Commands Reference
- Node Scheduling
- Binary Distribution
- systemd Service Management
- Auto-Recovery
- Rolling Upgrade
- Use Cases
- Operations Runbook
- Deployment YAML Reference
- Best Practices
Overview
Adora’s distributed architecture has three tiers:
CLI --> Coordinator --> Daemon(s) --> Nodes / Operators
(one) (per machine) (user code)
- CLI sends control commands (build, start, stop) to the coordinator.
- Coordinator orchestrates daemons, resolves node placement, and manages dataflow lifecycle.
- Daemons run on each machine, spawning and supervising node processes.
- Nodes communicate via shared memory (same machine) or Zenoh pub-sub (cross-machine).
There are two paths to distributed deployment:
Ad-hoc – manually start adora daemon on each machine, then use the coordinator for control. Good for development and testing. See Distributed Deployments in the CLI reference.
Managed (cluster.yml) – define your cluster topology in a YAML file, then use adora cluster commands for SSH-based lifecycle management. This guide focuses on the managed path.
Quick Start
- Create a `cluster.yml`:
coordinator:
addr: 10.0.0.1
machines:
- id: robot
host: 10.0.0.2
user: ubuntu
- id: gpu-server
host: 10.0.0.3
user: ubuntu
- Bring up the cluster:
adora cluster up cluster.yml
- Start a dataflow:
adora start dataflow.yml --name my-app --attach
- Check cluster health:
adora cluster status
- Tear down:
adora cluster down
Features at a Glance
| Feature | Command / Config | Description |
|---|---|---|
| Cluster lifecycle | adora cluster up/status/down | SSH-based daemon management from a single machine |
| Label scheduling | _unstable_deploy.labels | Route nodes to daemons by key-value labels |
| Binary distribution | _unstable_deploy.distribute | local, scp, or http strategies |
| systemd services | adora cluster install/uninstall | Persistent daemon services that survive reboots |
| Auto-recovery | Automatic | Re-spawn nodes when a daemon reconnects |
| Rolling upgrade | adora cluster upgrade | SCP binary + restart per-machine sequentially |
| Dataflow restart | adora cluster restart | Restart a running dataflow by name or UUID |
Cluster Configuration Reference
A cluster.yml file defines the coordinator address and the set of machines in the cluster.
Full Schema
coordinator:
addr: 10.0.0.1 # IP address the coordinator binds to (required)
port: 6013 # WebSocket port (default: 6013)
machines:
- id: edge-01 # Unique machine identifier (required)
host: 10.0.0.2 # SSH-reachable hostname or IP (required)
user: ubuntu # SSH user (optional, defaults to current user)
labels: # Key-value labels for scheduling (optional)
gpu: "true"
arch: arm64
- id: edge-02
host: 10.0.0.3
labels:
arch: arm64
Fields
coordinator
| Field | Type | Default | Description |
|---|---|---|---|
| `addr` | IP address | (required) | Address the coordinator binds to |
| `port` | u16 | 6013 | WebSocket port |
machines[]
| Field | Type | Default | Description |
|---|---|---|---|
| `id` | string | (required) | Unique machine identifier, used in `_unstable_deploy.machine` |
| `host` | string | (required) | SSH-reachable hostname or IP address |
| `user` | string | current user | SSH username |
| `labels` | map | empty | Key-value pairs for label-based scheduling |
Validation Rules
- At least one machine must be defined.
- Machine IDs must be non-empty and unique.
- Machine hosts must be non-empty.
- Unknown fields are rejected (`deny_unknown_fields`).
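These rules can be sketched as a validation pass over the parsed machine list (the struct and function names below are illustrative, not Adora's actual types):

```rust
use std::collections::HashSet;

/// Minimal stand-in for a parsed `machines[]` entry. Illustrative only.
struct Machine {
    id: String,
    host: String,
}

/// Apply the validation rules listed above: non-empty list, non-empty and
/// unique IDs, non-empty hosts.
fn validate(machines: &[Machine]) -> Result<(), String> {
    if machines.is_empty() {
        return Err("at least one machine must be defined".into());
    }
    let mut seen = HashSet::new();
    for m in machines {
        if m.id.is_empty() {
            return Err("machine id must be non-empty".into());
        }
        if m.host.is_empty() {
            return Err(format!("machine {}: host must be non-empty", m.id));
        }
        if !seen.insert(m.id.as_str()) {
            return Err(format!("duplicate machine id {}", m.id));
        }
    }
    Ok(())
}

fn main() {
    let ok = vec![
        Machine { id: "edge-01".into(), host: "10.0.0.2".into() },
        Machine { id: "edge-02".into(), host: "10.0.0.3".into() },
    ];
    assert!(validate(&ok).is_ok());

    let dup = vec![
        Machine { id: "edge-01".into(), host: "10.0.0.2".into() },
        Machine { id: "edge-01".into(), host: "10.0.0.3".into() }, // duplicate id
    ];
    assert!(validate(&dup).is_err());
    assert!(validate(&[]).is_err()); // empty machine list rejected
}
```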
Example: 3-Machine GPU Cluster
coordinator:
addr: 192.168.1.1
machines:
- id: coordinator-host
host: 192.168.1.1
labels:
role: control
- id: gpu-a100
host: 192.168.1.10
user: ml
labels:
gpu: a100
arch: x86_64
- id: jetson-01
host: 192.168.1.20
user: nvidia
labels:
gpu: jetson
arch: arm64
Cluster Commands Reference
All adora cluster commands operate on a cluster.yml file and use SSH to manage remote machines.
SSH options used: BatchMode=yes, ConnectTimeout=10, StrictHostKeyChecking=accept-new.
adora cluster up
Bring up a multi-machine cluster from a cluster.yml file. Starts the coordinator locally, then SSH-es into each machine to start a daemon.
adora cluster up <PATH>
Arguments:
| Argument | Description |
|---|---|
PATH | Path to the cluster configuration file |
Behavior:
1. Loads and validates the cluster config.
2. Starts the coordinator locally on `addr:port`.
3. For each machine, SSH-es in and runs `nohup adora daemon --machine-id <id> --coordinator-addr <addr> --coordinator-port <port> [--labels k1=v1,k2=v2] --quiet`.
4. Polls until all expected daemons register with the coordinator (30s timeout).
Example:
$ adora cluster up cluster.yml
Starting coordinator on 10.0.0.1:6013...
Starting daemon on robot (ubuntu@10.0.0.2)... OK
Starting daemon on gpu-server (ubuntu@10.0.0.3)... OK
All 2 daemons connected.
adora cluster status
Show the current status of the cluster. Displays connected daemons and active dataflow count.
adora cluster status [--coordinator-addr ADDR] [--coordinator-port PORT]
Flags:
| Flag | Default | Description |
|---|---|---|
| `--coordinator-addr` | localhost | Coordinator hostname or IP |
| `--coordinator-port` | 6013 | Coordinator WebSocket port |
Example:
$ adora cluster status
DAEMON ID LAST HEARTBEAT
robot 2s ago
gpu-server 1s ago
Active dataflows: 1
adora cluster down
Tear down the cluster (coordinator and all daemons).
adora cluster down [--coordinator-addr ADDR] [--coordinator-port PORT]
Terminates all daemons and the coordinator process.
adora cluster install
Install adora-daemon as a systemd service on each machine. SSH-es into each machine, writes a systemd unit file, and enables the service.
adora cluster install <PATH>
Arguments:
| Argument | Description |
|---|---|
PATH | Path to the cluster configuration file |
Behavior:
For each machine, creates and enables a systemd service named adora-daemon-<id>. The unit file:
[Unit]
Description=Adora Daemon (<id>)
After=network-online.target
Wants=network-online.target
[Service]
ExecStart=adora daemon --machine-id <id> --coordinator-addr <addr> --coordinator-port <port> --labels k1=v1,k2=v2 --quiet
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Example:
$ adora cluster install cluster.yml
Installing adora-daemon-robot on ubuntu@10.0.0.2... OK
Installing adora-daemon-gpu-server on ubuntu@10.0.0.3... OK
2/2 succeeded.
adora cluster uninstall
Uninstall adora-daemon systemd services from each machine. Stops, disables, and removes the systemd unit.
adora cluster uninstall <PATH>
Behavior:
For each machine, runs:
sudo systemctl stop adora-daemon-<id>
sudo systemctl disable adora-daemon-<id>
sudo rm -f /etc/systemd/system/adora-daemon-<id>.service
sudo systemctl daemon-reload
adora cluster upgrade
Rolling upgrade: SCP the local adora binary to each machine and restart daemons. Processes machines sequentially to maintain availability.
adora cluster upgrade <PATH>
Behavior:
For each machine sequentially:
- SCP the local
adorabinary to/usr/local/bin/adoraon the target machine. - Restart the systemd service via
sudo systemctl restart adora-daemon-<id>. - Poll the coordinator until the daemon reconnects (30s timeout, 500ms intervals).
Nodes on other machines continue running while each machine is being upgraded.
Example:
$ adora cluster upgrade cluster.yml
Upgrading robot (ubuntu@10.0.0.2)...
SCP binary... OK
Restart service... OK
Waiting for reconnect... OK (3.2s)
Upgrading gpu-server (ubuntu@10.0.0.3)...
SCP binary... OK
Restart service... OK
Waiting for reconnect... OK (2.8s)
2/2 succeeded.
adora cluster restart
Restart a running dataflow by name or UUID. Stops the dataflow and immediately re-starts it using the stored descriptor (no YAML path needed).
adora cluster restart <DATAFLOW>
Arguments:
| Argument | Description |
|---|---|
DATAFLOW | Name or UUID of the dataflow to restart |
Example:
$ adora cluster restart my-app
Restarting dataflow `my-app`
dataflow restarted: a1b2c3d4-... -> e5f6a7b8-...
Node Scheduling
When the coordinator receives a dataflow, it decides which daemon runs each node based on the _unstable_deploy section in the dataflow YAML. Resolution priority: machine > labels > unnamed.
Machine-based scheduling
Assign a node to a specific machine by its id from cluster.yml:
nodes:
- id: camera
_unstable_deploy:
machine: robot
path: ./camera-driver
outputs:
- frames
The coordinator looks up the daemon whose machine-id matches. If no matching daemon is connected, the deployment fails with: no matching daemon for machine id "robot".
Label-based scheduling
Assign a node by requiring specific labels on the target daemon:
nodes:
- id: inference
_unstable_deploy:
labels:
gpu: "true"
path: ./ml-model
inputs:
frames: camera/frames
outputs:
- predictions
The coordinator finds the first connected daemon whose labels are a superset of the required labels. All required key-value pairs must match exactly. If no daemon satisfies the requirements, deployment fails with: no daemon matches labels {"gpu": "true"}.
Unassigned nodes
Nodes without an _unstable_deploy section (or with an empty one) are assigned to the first unnamed daemon – one that connected without a --machine-id flag.
How resolve_daemon() works internally
The coordinator resolves node placement in coordinator/run/mod.rs:
resolve_daemon(connections, deploy) -> DaemonId
1. If deploy.machine is Some(id):
-> look up daemon by machine-id
2. Else if deploy.labels is non-empty:
-> find first daemon where all required labels match
3. Else:
-> pick first unnamed daemon
The label matching function iterates over all connected daemons and checks that every required key-value pair exists in the daemon’s label set (conn.labels.get(k) == Some(v)). This is a superset check: a daemon with {gpu: "true", arch: "arm64", role: "edge"} satisfies the requirement {gpu: "true"}.
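The superset check can be expressed directly over label maps (the function name is illustrative; the text above quotes the real per-pair check as `conn.labels.get(k) == Some(v)`):

```rust
use std::collections::HashMap;

/// A daemon matches when every required key-value pair appears in its
/// label set -- a superset check, as described above. Illustrative sketch.
fn labels_match(required: &HashMap<&str, &str>, daemon: &HashMap<&str, &str>) -> bool {
    required.iter().all(|(k, v)| daemon.get(k) == Some(v))
}

fn main() {
    // A daemon with extra labels still satisfies a smaller requirement.
    let daemon = HashMap::from([("gpu", "true"), ("arch", "arm64"), ("role", "edge")]);

    let req_gpu = HashMap::from([("gpu", "true")]);
    assert!(labels_match(&req_gpu, &daemon)); // superset match succeeds

    // Values must match exactly, not just the key.
    let req_a100 = HashMap::from([("gpu", "a100")]);
    assert!(!labels_match(&req_a100, &daemon));

    // An empty requirement matches any daemon.
    assert!(labels_match(&HashMap::new(), &daemon));
}
```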
Binary Distribution
Control how node binaries are delivered to remote daemons via the distribute field.
Local (default)
Each daemon builds from source on its own machine. This is the current default behavior.
nodes:
- id: my-node
_unstable_deploy:
machine: edge-01
distribute: local
path: ./my-node
SCP mode
The CLI pushes the locally-built binary to the target machine via SSH/SCP before spawning.
nodes:
- id: my-node
_unstable_deploy:
machine: edge-01
distribute: scp
path: ./my-node
HTTP mode
The coordinator runs an artifact store. Daemons pull binaries from the coordinator via HTTP before spawning.
nodes:
- id: my-node
_unstable_deploy:
machine: edge-01
distribute: http
path: ./my-node
Artifacts are served from GET /api/artifacts/{build_id}/{node_id} on the coordinator’s WebSocket port. The endpoint requires authentication (Bearer token) and sanitizes node IDs to prevent path traversal.
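The sanitization step can be sketched as a simple predicate over node IDs (illustrative; the endpoint's actual rules may differ):

```rust
/// Reject any node id that could escape the artifact directory when joined
/// into a filesystem path. Illustrative sketch of path-traversal
/// sanitization, not the endpoint's actual implementation.
fn is_safe_node_id(id: &str) -> bool {
    !id.is_empty()
        && !id.contains("..")   // no parent-directory components
        && !id.contains('/')    // no path separators
        && !id.contains('\\')   // no Windows-style separators either
}

fn main() {
    assert!(is_safe_node_id("ml-inference"));
    assert!(!is_safe_node_id("../../etc/passwd")); // traversal rejected
    assert!(!is_safe_node_id("a/b"));              // separators rejected
    assert!(!is_safe_node_id(""));                 // empty id rejected
}
```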
When to use each strategy
| Strategy | Best for | Tradeoffs |
|---|---|---|
local | Homogeneous clusters, CI builds | Requires build toolchain on every machine |
scp | Heterogeneous clusters, cross-compiled binaries | Requires SSH access from CLI to all machines |
http | Air-gapped daemons, firewalled networks | Requires coordinator reachability from all daemons |
systemd Service Management
For production deployments, install daemons as systemd services so they survive reboots and auto-restart on failure.
Install
adora cluster install cluster.yml
Creates a systemd unit file on each machine (see adora cluster install for the full unit template). Key properties:
- Restart=on-failure with RestartSec=5: daemon auto-restarts if it crashes.
- After=network-online.target: waits for network before starting.
- WantedBy=multi-user.target: starts on boot.
Uninstall
adora cluster uninstall cluster.yml
Stops, disables, and removes the unit file from each machine, then reloads the systemd daemon.
Verifying service status
After install, check services directly:
ssh ubuntu@10.0.0.2 sudo systemctl status adora-daemon-robot
Auto-Recovery
When a daemon disconnects and reconnects (e.g., after a network blip, machine reboot, or service restart), the coordinator automatically re-spawns any missing dataflows on that daemon.
How it works
1. The daemon reconnects and sends a `StatusReport` listing its currently running dataflows.
2. The coordinator compares the report against its expected state (dataflows that should have nodes on this daemon).
3. For each running dataflow with nodes assigned to this daemon that the daemon did not report, the coordinator sends a `SpawnDataflowNodes` command to re-spawn the missing nodes.
30-second backoff
To prevent crash loops (e.g., a node that immediately crashes on spawn), recovery uses a per-daemon, per-dataflow backoff:
- After a recovery attempt, the coordinator records the timestamp.
- Subsequent recovery for the same daemon/dataflow pair is skipped until 30 seconds have elapsed.
- The backoff clears when the daemon reports the dataflow as running again.
This means a node that crashes immediately will only be re-spawned once every 30 seconds, not in a tight loop.
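The throttle reduces to a timestamp map keyed by (daemon, dataflow). A sketch with times as plain seconds (illustrative, not the coordinator's code):

```rust
use std::collections::HashMap;

/// Per-(daemon, dataflow) recovery throttle, as described above: skip a
/// recovery attempt if one happened less than 30s ago for the same pair,
/// and clear the backoff once the daemon reports the dataflow running.
/// Illustrative sketch only.
struct RecoveryThrottle {
    last_attempt: HashMap<(String, String), f64>,
}

impl RecoveryThrottle {
    fn new() -> Self {
        Self { last_attempt: HashMap::new() }
    }

    /// Returns true (and records the attempt) if recovery may proceed now.
    fn try_recover(&mut self, daemon: &str, dataflow: &str, now: f64) -> bool {
        let key = (daemon.to_string(), dataflow.to_string());
        if let Some(&t) = self.last_attempt.get(&key) {
            if now - t < 30.0 {
                return false; // attempted within the last 30s: skip
            }
        }
        self.last_attempt.insert(key, now);
        true
    }

    /// Daemon reported the dataflow running again: clear the backoff.
    fn clear(&mut self, daemon: &str, dataflow: &str) {
        self.last_attempt.remove(&(daemon.to_string(), dataflow.to_string()));
    }
}

fn main() {
    let mut t = RecoveryThrottle::new();
    assert!(t.try_recover("machine-B", "my-app", 0.0));
    assert!(!t.try_recover("machine-B", "my-app", 10.0)); // within 30s: skipped
    assert!(t.try_recover("machine-B", "my-app", 31.0));  // backoff elapsed
    t.clear("machine-B", "my-app");
    assert!(t.try_recover("machine-B", "my-app", 32.0));  // cleared on report
}
```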
Limitations
- Auto-recovery only applies to dataflows started via `adora start` (coordinator-managed). Local `adora run` dataflows are not tracked by the coordinator.
- Recovery re-spawns all nodes assigned to the reconnecting daemon, not individual nodes. For per-node restart on crash, use restart policies.
Rolling Upgrade
Upgrade the adora binary on all cluster machines with zero downtime using sequential per-machine upgrades.
Process
adora cluster upgrade cluster.yml
For each machine, sequentially:
1. SCP the local `adora` binary to `/usr/local/bin/adora` on the target.
2. Restart the systemd service (`systemctl restart adora-daemon-<id>`).
3. Poll the coordinator until the daemon reconnects (30s timeout).
Because machines are upgraded one at a time, nodes on other machines continue running. After the daemon reconnects, auto-recovery re-spawns any dataflow nodes that were running on that machine.
Prerequisites
- Daemons must be installed as systemd services (`adora cluster install`).
- The local `adora` binary must be compatible with the cluster's coordinator version.
- SSH access with `sudo` permissions on all target machines.
Use Cases
1. Edge AI Pipeline (Robot + GPU Server)
A camera node runs on the robot, sends frames to a GPU server for inference, and results flow back to an actuator on the robot.
cluster.yml:
coordinator:
addr: 192.168.1.1
machines:
- id: robot
host: 192.168.1.10
user: ubuntu
labels:
role: edge
- id: gpu-server
host: 192.168.1.20
user: ml
labels:
gpu: "true"
dataflow.yml:
nodes:
- id: camera
_unstable_deploy:
machine: robot
path: ./camera-driver
outputs:
- frames
- id: inference
_unstable_deploy:
labels:
gpu: "true"
path: ./ml-model
inputs:
frames: camera/frames
outputs:
- predictions
- id: actuator
_unstable_deploy:
machine: robot
path: ./actuator-driver
inputs:
commands: inference/predictions
2. Multi-Robot Fleet
A central coordinator manages N robots with heterogeneous hardware. Label scheduling routes nodes to the right machines without hardcoding machine IDs.
cluster.yml:
coordinator:
addr: 10.0.0.1
machines:
- id: bot-01
host: 10.0.0.11
user: robot
labels:
fleet: warehouse
lidar: "true"
- id: bot-02
host: 10.0.0.12
user: robot
labels:
fleet: warehouse
camera: rgbd
- id: bot-03
host: 10.0.0.13
user: robot
labels:
fleet: warehouse
lidar: "true"
camera: rgbd
dataflow.yml:
nodes:
- id: lidar-driver
_unstable_deploy:
labels:
lidar: "true"
path: ./lidar-driver
outputs:
- scans
- id: camera-driver
_unstable_deploy:
labels:
camera: rgbd
path: ./camera-driver
outputs:
- frames
With this configuration, lidar-driver runs on bot-01 or bot-03, and camera-driver runs on bot-02 or bot-03.
3. CI/CD Pipeline for Robotics
Automate cluster management in CI:
# Setup
adora cluster install cluster.yml
# Deploy new version
adora cluster upgrade cluster.yml
# Run integration tests
adora start test-dataflow.yml --name integration-test --attach
# Monitor
adora cluster status
adora top
# Cleanup
adora stop integration-test
4. Development to Production
| Stage | Approach | Command |
|---|---|---|
| Local dev | Single-process, no coordinator | adora run dataflow.yml |
| Staging | Ad-hoc daemons, manual setup | adora up + adora daemon on each machine |
| Production | Managed cluster, systemd services | adora cluster install cluster.yml |
Operations Runbook
Initial Setup Checklist
- SSH keys: Distribute SSH keys so the CLI machine can reach all cluster machines without a password (`BatchMode=yes`).
- Adora binary: Install the `adora` binary on all machines (same version).
- Network: Ensure the coordinator port (default 6013) is reachable from all machines, and that Zenoh ports are open between daemons for cross-machine node communication.
- cluster.yml: Create the cluster configuration with correct IPs, users, and labels.
Day-to-Day Operations
# Start a dataflow
adora start dataflow.yml --name my-app --attach
# List running dataflows
adora list
# Monitor resource usage
adora top
# View node logs
adora logs my-app <node-id> --follow
# Stop a dataflow
adora stop my-app
# Check cluster health
adora cluster status
Upgrading
1. Build or download the new `adora` binary locally.
2. Run `adora cluster upgrade cluster.yml`.
3. Verify with `adora cluster status` that all daemons reconnected.
4. Running dataflows are automatically re-spawned via auto-recovery.
Troubleshooting
Daemon not connecting
- Verify the coordinator is running and reachable: `curl http://<addr>:6013/api/health` (or check the coordinator logs).
- Check daemon logs: `journalctl -u adora-daemon-<id> -f` (systemd) or the daemon's stderr output (ad-hoc).
- Confirm that `--coordinator-addr` and `--coordinator-port` match the coordinator's actual bind address.
SSH failures during cluster commands
- Ensure `ssh -o BatchMode=yes <user>@<host> echo ok` works from the CLI machine.
- Check that `StrictHostKeyChecking=accept-new` is acceptable for your environment (the first connection auto-accepts the host key).
- Verify the `user` field in `cluster.yml` matches a valid SSH user on the target.
Label mismatch errors
- Error: `no daemon matches labels {"gpu": "true"}`.
- Check that the daemon was started with the correct `--labels` flag.
- Run `adora cluster status` to see connected daemons. Labels are set at daemon startup from `cluster.yml` and cannot be changed at runtime.
Auto-recovery not triggering
- Auto-recovery only applies to coordinator-managed dataflows (`adora start`), not `adora run`.
- Check coordinator logs for `auto-recovery: re-spawning` messages.
- If a node crashes immediately, recovery is throttled to once every 30 seconds per daemon per dataflow.
Deployment YAML Reference
The _unstable_deploy section on each node controls placement and distribution. All fields are optional.
nodes:
- id: my-node
_unstable_deploy:
machine: edge-01 # Target machine ID from cluster.yml
labels: # Label requirements (superset match)
gpu: "true"
arch: arm64
distribute: local # local | scp | http
working_dir: /opt/my-app # Working directory on the target machine
path: ./my-node
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `machine` | string | none | Target machine ID. Takes priority over labels. |
| `labels` | map | empty | Required daemon labels. All key-value pairs must match. |
| `distribute` | string | `local` | Binary distribution strategy: `local`, `scp`, or `http`. |
| `working_dir` | path | none | Working directory on the target machine. |
Resolution priority
- machine – if set, the node is assigned to the daemon with that machine ID.
- labels – if set (and machine is not), the node is assigned to the first daemon whose labels are a superset of the required labels.
- Fallback – if neither is set, the node is assigned to the first unnamed (no machine-id) daemon.
Best Practices
- Use labels over machine IDs for flexibility. Labels decouple your dataflow from specific machines, making it easier to add, remove, or replace hardware.
- Use systemd install for production. Daemon services survive reboots and auto-restart on failure with `Restart=on-failure`.
- Use coordinator persistence (`adora coordinator --store redb`) with clusters so the coordinator survives restarts. See Coordinator State Persistence.
- Set restart policies on nodes for per-node resilience. Combine with auto-recovery for defense in depth. See Restart Policies.
- Monitor with multiple tools: `adora cluster status` for daemon health, `adora top` for resource usage, `adora logs` for node output.
- Test locally first. Develop with `adora run dataflow.yml`, then deploy to a cluster. The same dataflow YAML works in both modes; `_unstable_deploy` fields are ignored in local mode.
- Use rolling upgrades instead of stopping the entire cluster. `adora cluster upgrade` processes one machine at a time to maintain availability.
- Keep cluster.yml in version control alongside your dataflow definitions.
Performance
Adora achieves 10-17x lower latency than ROS2 Python through zero-copy shared memory IPC, Apache Arrow columnar format, and 100% Rust internals. This document covers methodology, reproduction, and tuning.
Architecture Advantages
| Layer | Adora | ROS2 (rclpy) |
|---|---|---|
| Runtime | Rust async (tokio) | Python + C++ middleware |
| IPC (>4KB) | Zero-copy shared memory | DDS serialization + copy |
| IPC (<4KB) | TCP with bincode | DDS serialization + copy |
| Data format | Apache Arrow (zero-serde) | CDR serialization |
| Threading | Lock-free channels (flume) | GIL-bound callbacks |
Benchmark Suite
Internal benchmarks (examples/benchmark/)
Measures Adora’s own latency and throughput across 10 payload sizes (0B to 4MB).
cd examples/benchmark
./compare.sh # Rust vs Python sender comparison
Metrics reported: avg, p50, p95, p99, p99.9, min, max latency; msg/s throughput.
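As an illustration of how such summary statistics fall out of raw latency samples, here is a nearest-rank percentile in Python. This is a sketch; the benchmark harness itself may use a different estimator:

```python
def percentile(samples, p):
    """Nearest-rank percentile: p in [0, 100] over latency samples (ns)."""
    s = sorted(samples)
    # Rank ceil(p/100 * n), clamped to the valid index range.
    k = max(0, min(len(s) - 1, -(-p * len(s) // 100) - 1))
    return s[k]

def summarize(samples):
    """Summary statistics in the spirit of the reported metrics."""
    return {
        "avg": sum(samples) / len(samples),
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
        "min": min(samples),
        "max": max(samples),
    }
```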
ROS2 comparison (examples/ros2-comparison/)
Apples-to-apples comparison using identical Python workloads on both frameworks.
cd examples/ros2-comparison
./run_comparison.sh # Requires ROS2 Humble+
Both sides embed `time.perf_counter_ns()` timestamps in the first 8 bytes of the payload. The same message count, sizes, and sleep intervals ensure comparable results.
Criterion micro-benchmarks
Isolated benchmarks for internal hot paths:
# Daemon message routing (fan-out x payload size matrix)
cargo bench -p adora-daemon
# Message serialization/deserialization
cargo bench -p adora-message
CI tracks these via `benchmark-action/github-action-benchmark` with a 120% alert threshold.
Reproducing Results
Requirements
- Linux or macOS (shared memory IPC)
- Rust 1.85+ with release profile
- Python 3.10+ with `numpy`, `pyarrow`
- ROS2 Humble+ (for comparison only)
Steps
1. Build Adora: `cargo install --path binaries/cli --locked`
2. Run the internal benchmark: `cd examples/benchmark && BENCH_CSV=results/rust.csv adora run dataflow.yml`
3. Run the ROS2 comparison: `cd examples/ros2-comparison && ./run_comparison.sh`
Environment Notes
- Close background applications to reduce variance
- Use `taskset` or `cpuset` to pin processes for consistent results
- Run at least 3 iterations and report the median
- Shared memory benefits appear at payloads >4KB
Performance Tuning
Queue sizes
Default queue size is 10. For high-throughput outputs, increase it:
inputs:
data:
source: producer/output
queue_size: 1000
Payload size
Adora automatically uses shared memory for messages >4KB, avoiding copies. Structure data to exceed this threshold when low latency matters.
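The selection rule can be sketched as follows. The real decision is made inside the daemon; the exact behavior at the 4KB boundary is an assumption here:

```python
SHM_THRESHOLD_BYTES = 4 * 1024  # documented >4KB shared-memory cutoff

def ipc_path(payload_len: int) -> str:
    """Which local transport the docs describe for a given payload size."""
    # Messages above the threshold go through zero-copy shared memory;
    # smaller ones use TCP with bincode serialization.
    return "shared_memory" if payload_len > SHM_THRESHOLD_BYTES else "tcp_bincode"
```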
Arrow format
Use Arrow arrays directly instead of converting to/from Python lists:
# Fast: pass Arrow array directly
node.send_output("out", pa.array(data, type=pa.uint8()))
# Slow: convert through Python list
node.send_output("out", pa.array(list(data), type=pa.uint8()))
Operator vs Node
Operators run in-process with the runtime (zero IPC overhead) but share the GIL in Python. Use Rust operators for compute-heavy work, Python operators for glue logic.
Distributed deployment
For cross-machine communication, Adora uses Zenoh pub-sub. Latency depends on network quality. Use local deployment (single-machine) when sub-millisecond latency is required.
CSV Output Format
All benchmarks honor the `BENCH_CSV` environment variable for machine-readable output:
latency,<bytes>,<label>,<n>,<avg_ns>,<p50_ns>,<p95_ns>,<p99_ns>,<p999_ns>,<min_ns>,<max_ns>
throughput,<bytes>,<label>,<n>,<msg_per_sec>,<elapsed_ns>,0,0,0,0,0
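A small parser for these rows might look like this. Field names follow the format above; treating the trailing zero columns of throughput rows as ignorable padding is an assumption:

```python
def parse_bench_row(line):
    """Parse one BENCH_CSV row into a dict keyed by the documented fields."""
    parts = line.strip().split(",")
    kind = parts[0]
    if kind == "latency":
        keys = ["kind", "bytes", "label", "n",
                "avg_ns", "p50_ns", "p95_ns", "p99_ns", "p999_ns", "min_ns", "max_ns"]
    elif kind == "throughput":
        keys = ["kind", "bytes", "label", "n", "msg_per_sec", "elapsed_ns"]
        parts = parts[:6]  # trailing zero-padding columns are ignored
    else:
        raise ValueError(f"unknown row kind: {kind}")
    row = dict(zip(keys, parts))
    for k, v in row.items():
        if k not in ("kind", "label"):
            row[k] = float(v) if "." in v else int(v)
    return row
```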
ROS2 Bridge
Adora provides a declarative YAML-based ROS2 bridge that lets any Adora node communicate with ROS2 topics, services, and actions without importing ROS2 libraries. You define the bridge in your dataflow YAML using the ros2: key, and the framework automatically spawns a bridge binary that converts between Apache Arrow (Adora’s native format) and ROS2 CDR/DDS. Your user nodes stay ROS2-free – they send and receive pure Arrow StructArray data.
Features at a Glance
| Feature | Config | Description |
|---|---|---|
| Topic subscribe | topic + direction: subscribe | Receive from ROS2, forward as Arrow |
| Topic publish | topic + direction: publish | Receive Arrow, publish to ROS2 |
| Multi-topic | topics | Multiple topics on a single ROS2 node |
| Service client | service + role: client | Send requests, receive responses |
| Service server | service + role: server | Receive requests, send responses |
| Action client | action + role: client | Send goals, receive feedback + result |
| Action server | action + role: server | Receive goals, send feedback + result |
| QoS policies | qos | Reliability, durability, history, liveliness |
| Auto-spawn | Automatic | Bridge binary spawned by daemon as a Custom node |
Architecture
When the Adora descriptor resolver encounters a ros2: key on a node, it converts it into a Custom node pointing to the adora-ros2-bridge-node binary. The bridge config is serialized as JSON into the ADORA_ROS2_BRIDGE_CONFIG environment variable.
User Node <--(Arrow/SharedMem)--> Bridge Binary <--(CDR/DDS)--> ROS2
The bridge binary:
- Reads `AMENT_PREFIX_PATH` to locate installed ROS2 message packages
- Parses message/service/action definitions at startup
- Creates a `ros2_client` node and the appropriate publishers, subscribers, clients, or servers
- Converts incoming ROS2 CDR messages to Arrow `StructArray` (subscribe/response/feedback)
- Converts incoming Arrow `StructArray` to ROS2 CDR messages (publish/request/goal)
Your user nodes never link against ROS2 – all ROS2 communication is isolated in the bridge binary.
Prerequisites
- ROS2 environment sourced: `AMENT_PREFIX_PATH` must be set and point to a workspace containing the required message packages
- Message packages installed: e.g., `turtlesim`, `geometry_msgs`, `example_interfaces`
- For service client: a ROS2 service server must be running (or use a companion server dataflow)
- For action client: a ROS2 action server must be running before starting the dataflow (no `wait_for_action_server` mechanism)
- For action server: a ROS2 action client sends goals to the bridge (e.g., `ros2 action send_goal`)
Topic Bridge
Single Topic (Subscribe)
Subscribe to a ROS2 topic and forward messages as Arrow data to downstream Adora nodes.
nodes:
- id: pose_bridge
ros2:
topic: /turtle1/pose
message_type: turtlesim/Pose
direction: subscribe # default, can be omitted
outputs:
- pose
The bridge creates a ROS2 subscription on /turtle1/pose, deserializes each incoming turtlesim/Pose message into an Arrow StructArray, and sends it on the pose output.
Single Topic (Publish)
Receive Arrow data from Adora nodes and publish to a ROS2 topic.
nodes:
- id: cmd_bridge
ros2:
topic: /turtle1/cmd_vel
message_type: geometry_msgs/Twist
direction: publish
inputs:
cmd_vel: planner/cmd_vel
The bridge receives Arrow data on the cmd_vel input, serializes it to geometry_msgs/Twist CDR, and publishes to /turtle1/cmd_vel.
Multi-Topic
Bridge multiple topics on a single ROS2 node context, mixing subscribe and publish directions.
nodes:
- id: turtle_bridge
ros2:
topics:
- topic: /turtle1/pose
message_type: turtlesim/Pose
direction: subscribe
output: pose
- topic: /turtle1/cmd_vel
message_type: geometry_msgs/Twist
direction: publish
input: velocity
qos:
reliable: true
keep_last: 10
inputs:
velocity: planner/cmd_vel
outputs:
- pose
Multi-topic mode supports up to 64 topics per bridge node.
Input/Output ID Mapping
By default, topic names are converted to Adora IDs by stripping the leading / and replacing remaining / with _:
| ROS2 Topic | Default Adora ID |
|---|---|
| `/turtle1/pose` | `turtle1_pose` |
| `/camera/image_raw` | `camera_image_raw` |
In multi-topic mode, you can override this with explicit output (for subscribe) or input (for publish) fields. In single-topic mode, the node’s declared outputs or inputs are used directly.
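The default mapping is easy to reproduce in a sketch (assuming only the leading `/` is stripped, per the rule above):

```python
def default_adora_id(topic: str) -> str:
    """Strip the leading '/' and replace remaining '/' with '_'."""
    return topic.removeprefix("/").replace("/", "_")
```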
Service Bridge
Service Client
Send requests from Adora to an external ROS2 service and receive responses.
nodes:
- id: add_client
ros2:
service: /add_two_ints
service_type: example_interfaces/AddTwoInts
role: client
inputs:
request: requester/data
outputs:
- response
The bridge waits for the service to become available (up to 10 retries, 2 seconds each), then for each Arrow input it receives:
1. Serializes the Arrow data as an `AddTwoInts_Request` CDR message
2. Sends the request to the ROS2 service
3. Waits for a response (30-second timeout)
4. Deserializes the response into Arrow and sends it on the `response` output
Service Server
Expose an Adora handler node as a ROS2 service that external ROS2 clients can call.
nodes:
- id: add_server
ros2:
service: /adora_add_two_ints
service_type: example_interfaces/AddTwoInts
role: server
inputs:
response: handler/result
outputs:
- request
- id: handler
path: path/to/handler-node
inputs:
request: add_server/request
outputs:
- result
The bridge receives ROS2 service requests, assigns each a unique request_id (UUID v7), forwards the request data as Arrow on the request output with request_id in metadata, and waits for the handler node to send a response back on the response input with the same request_id. The response is then returned to the correct ROS2 client.
See examples/ros2-bridge/yaml-bridge-service/ for a working example.
Request ID Correlation
Each incoming ROS2 request is assigned a request_id metadata parameter. The handler node must include the same request_id in metadata when sending the response. The simplest approach is to pass through metadata.parameters:
#![allow(unused)]
fn main() {
Event::Input { id, metadata, data } => {
// metadata.parameters contains request_id
let result = compute(data);
node.send_service_response("response".into(), metadata.parameters, result)?;
}
}
Responses can arrive in any order – the bridge correlates them by request_id, not by arrival order. Stale pending requests are evicted after 30 seconds. The maximum pending request queue is 64 – additional requests are dropped when full.
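The correlation scheme can be sketched as a small pending-request table. This is illustrative only – the bridge's real implementation is in Rust and uses UUID v7; stdlib `uuid4` stands in here:

```python
import time
import uuid

MAX_PENDING = 64      # documented pending-request limit
EVICT_AFTER_S = 30.0  # documented staleness window

class PendingRequests:
    """Sketch of request_id correlation as described above."""

    def __init__(self):
        self.pending = {}  # request_id -> (deadline, reply_fn)

    def add(self, reply_fn, now=None):
        """Register a new request; returns None when the queue is full."""
        now = time.monotonic() if now is None else now
        self.evict(now)
        if len(self.pending) >= MAX_PENDING:
            return None  # additional requests are dropped when full
        request_id = str(uuid.uuid4())  # bridge uses UUID v7
        self.pending[request_id] = (now + EVICT_AFTER_S, reply_fn)
        return request_id

    def resolve(self, request_id, response):
        """Correlate by request_id, not arrival order."""
        entry = self.pending.pop(request_id, None)
        if entry:
            entry[1](response)  # reply to the correct ROS2 client
            return True
        return False  # stale or unknown request_id

    def evict(self, now):
        """Drop pending requests past their deadline."""
        self.pending = {k: v for k, v in self.pending.items() if v[0] > now}
```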
Service Wait and Timeouts
| Behavior | Value |
|---|---|
| Service client: wait for availability | 10 retries, 2s each (20s total) |
| Service client: response timeout | 30 seconds |
| Service server: pending request limit | 64 |
Action Bridge
Action Client
Send goals from Adora to an external ROS2 action server, receiving feedback and results.
nodes:
- id: fib_client
ros2:
action: /fibonacci
action_type: example_interfaces/Fibonacci
role: client
inputs:
goal: goal_sender/goal
outputs:
- feedback
- result
For each Arrow goal input:
1. Serializes the Arrow data as a `Fibonacci_Goal` CDR message
2. Sends the goal to the action server (30-second timeout)
3. If accepted, spawns background threads for feedback and result
4. Feedback messages arrive on the `feedback` output as they stream in
5. The final result arrives on the `result` output (5-minute timeout)
Feedback and Result Streams
The action bridge sends feedback and results on separate outputs:
- `feedback`: Streamed as each feedback message arrives from the action server. Contains the action’s feedback message as Arrow (e.g., `{partial_sequence: int32[]}` for Fibonacci)
- `result`: Sent once when the action completes. Contains the action’s result message as Arrow (e.g., `{sequence: int32[]}` for Fibonacci)
Concurrent Goals
The bridge supports up to 8 concurrent in-flight goals (MAX_CONCURRENT_GOALS). Additional goals are dropped with a warning. Each goal spawns dedicated feedback and result reader threads.
Timeouts
| Behavior | Value |
|---|---|
| Goal send timeout | 30 seconds |
| Result retrieval timeout | 5 minutes |
| Feedback | No timeout (streams until action completes) |
Action Server
Expose an Adora handler node as a ROS2 action server that external ROS2 clients can call.
nodes:
- id: fib_server
ros2:
action: /fibonacci
action_type: example_interfaces/Fibonacci
role: server
inputs:
feedback: handler/feedback
result: handler/result
outputs:
- goal
- id: handler
path: path/to/handler-node
inputs:
goal: fib_server/goal
outputs:
- feedback
- result
The bridge receives goals from ROS2 clients, auto-accepts them, and forwards the goal data on the goal output. The handler computes feedback and results and sends them back on the feedback and result inputs.
See examples/ros2-bridge/yaml-bridge-action-server/ for a working Fibonacci example.
Goal ID Metadata
Each goal is identified by a UUID string passed as a goal_id metadata parameter. The bridge sets goal_id on every goal output. The handler must include the same goal_id in metadata when sending feedback and result so the bridge can correlate them to the correct goal.
The simplest approach is to pass through metadata.parameters from the goal event:
#![allow(unused)]
fn main() {
Event::Input { id, metadata, data } => match id.as_str() {
"goal" => {
let params = metadata.parameters; // contains goal_id
// ... compute ...
node.send_output("feedback".into(), params.clone(), feedback)?;
node.send_output("result".into(), params, result)?;
}
// ...
}
}
Action Server Lifecycle
- ROS2 client sends a goal request
- Bridge auto-accepts the goal and starts executing
- Bridge sends goal data on the `goal` output with `goal_id` in metadata
- Handler sends `feedback` (zero or more times) with the same `goal_id`
- Handler sends `result` (once) with the same `goal_id`; the bridge returns it to the ROS2 client
- Result send times out after 5 minutes if the client never requests it
Goals that contain no data or cannot be forwarded to the handler are automatically aborted – the bridge sends Aborted status back to the ROS2 client so it does not hang indefinitely.
Goal Status
By default, results are returned with Succeeded status. The handler can override this by setting a goal_status metadata parameter on the result output:
| `goal_status` value | ROS2 Status | Use case |
|---|---|---|
| `"succeeded"` (or omitted) | Succeeded | Goal completed successfully |
| `"aborted"` | Aborted | Goal failed during execution |
| `"canceled"` | Canceled | Goal was canceled by the handler |
Unrecognized goal_status values default to Aborted with a warning logged. Omitting goal_status entirely defaults to Succeeded.
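This behavior reduces to a small lookup. A Python mirror for illustration (the `GOAL_STATUS*` constants in the Rust example that follows are the real API; this sketch is not):

```python
# Documented mapping: unrecognized values fall back to Aborted (with a warning).
GOAL_STATUS_MAP = {
    "succeeded": "Succeeded",
    "aborted": "Aborted",
    "canceled": "Canceled",
}

def resolve_goal_status(params):
    """params: metadata parameters dict; goal_status may be absent."""
    value = params.get("goal_status")
    if value is None:
        return "Succeeded"  # omitted defaults to Succeeded
    return GOAL_STATUS_MAP.get(value, "Aborted")  # unrecognized -> Aborted
```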
Rust example:
#![allow(unused)]
fn main() {
use adora_node_api::{GOAL_STATUS, GOAL_STATUS_ABORTED, Parameter};
let mut params = metadata.parameters; // contains goal_id
params.insert(GOAL_STATUS.to_string(), Parameter::String(GOAL_STATUS_ABORTED.to_string()));
node.send_output("result".into(), params, error_result)?;
}
Action Server Limits
| Behavior | Value |
|---|---|
| Max concurrent goals | 8 (additional goals receive Aborted status) |
| Auto-accept | All goals are auto-accepted |
| Result send timeout | 5 minutes |
Python Action Server Handler
Python nodes receive goal data as PyArrow arrays with goal_id in the metadata dictionary. Pass it through on feedback/result outputs:
for event in node:
if event["type"] == "INPUT" and event["id"] == "goal":
goal_id = event["metadata"]["goal_id"]
order = event["value"]["order"][0].as_py()
# Send feedback
node.send_output("feedback", feedback_array, {"goal_id": goal_id})
# Send result (with optional status)
node.send_output("result", result_array, {
"goal_id": goal_id,
"goal_status": "succeeded", # or "aborted", "canceled"
})
C++ Action Server Handler
C++ nodes access goal_id via type-safe metadata accessors:
auto goal_id = metadata->get_str("goal_id");
// Send feedback with goal_id
auto fb_metadata = new_metadata();
fb_metadata->set_string("goal_id", goal_id);
send_arrow_output_with_metadata("feedback", feedback_data, fb_metadata);
// Send result with goal_id
auto res_metadata = new_metadata();
res_metadata->set_string("goal_id", goal_id);
send_arrow_output_with_metadata("result", result_data, res_metadata);
Quality of Service (QoS)
Configuration
Set QoS at the bridge level (applies to all topics/channels) or per-topic in multi-topic mode.
nodes:
- id: my_bridge
ros2:
topic: /sensor/data
message_type: sensor_msgs/LaserScan
qos:
reliable: true
durability: transient_local
keep_last: 10
liveliness: automatic
lease_duration: 5.0
max_blocking_time: 0.5
Defaults
| Field | Default |
|---|---|
| `reliable` | `false` (best effort) |
| `durability` | `volatile` |
| `liveliness` | `automatic` |
| `lease_duration` | infinity |
| `max_blocking_time` | 100 ms (only applies when `reliable: true`) |
| `keep_last` | 1 |
| `keep_all` | `false` |
Per-Topic QoS Override
In multi-topic mode, each topic can override the bridge-level QoS:
ros2:
topics:
- topic: /fast_sensor
message_type: sensor_msgs/Imu
direction: subscribe
qos:
reliable: false # override: best effort for this topic
keep_last: 1
- topic: /cmd
message_type: geometry_msgs/Twist
direction: publish
# inherits bridge-level QoS (reliable: true)
qos:
reliable: true # default for all topics
keep_last: 10
Validation Rules
| Field | Valid Values |
|---|---|
| `reliable` | `true`, `false` |
| `durability` | `"volatile"`, `"transient_local"` |
| `liveliness` | `"automatic"`, `"manual_by_participant"`, `"manual_by_topic"` |
| `keep_last` | 1 to 10000 |
| `keep_all` | `true`, `false` (mutually exclusive intent with `keep_last`) |
| `lease_duration` | Finite non-negative float (seconds) |
| `max_blocking_time` | Finite non-negative float (seconds) |
Data Format: Arrow Structs
All data exchanged between your nodes and the bridge uses Arrow StructArray with a single row. Each field in the ROS2 message becomes a column in the struct.
How to Build Arrow Messages
Rust example: building an AddTwoInts_Request ({a: i64, b: i64}):
#![allow(unused)]
fn main() {
use std::sync::Arc;
use arrow::array::{Array, Int64Array, StructArray};
use arrow::datatypes::{DataType, Field};
fn make_add_request(a: i64, b: i64) -> StructArray {
let fields = vec![
Arc::new(Field::new("a", DataType::Int64, false)),
Arc::new(Field::new("b", DataType::Int64, false)),
];
let arrays: Vec<Arc<dyn Array>> = vec![
Arc::new(Int64Array::from(vec![a])),
Arc::new(Int64Array::from(vec![b])),
];
StructArray::try_new(fields.into(), arrays, None)
.expect("failed to create struct array")
}
}
Reading a response ({sum: i64}):
#![allow(unused)]
fn main() {
use arrow::array::{Int64Array, StructArray};
fn read_response(data: &dyn arrow::array::Array) -> i64 {
let struct_array = data
.as_any()
.downcast_ref::<StructArray>()
.expect("expected struct array");
struct_array
.column_by_name("sum")
.expect("missing 'sum' field")
.as_any()
.downcast_ref::<Int64Array>()
.expect("expected Int64Array")
.value(0)
}
}
Mapping ROS2 Types to Arrow Types
| ROS2 Type | Arrow Type | Rust Arrow Array |
|---|---|---|
| `bool` | Boolean | `BooleanArray` |
| `int8` | Int8 | `Int8Array` |
| `int16` | Int16 | `Int16Array` |
| `int32` | Int32 | `Int32Array` |
| `int64` | Int64 | `Int64Array` |
| `uint8` / `byte` / `char` | UInt8 | `UInt8Array` |
| `uint16` | UInt16 | `UInt16Array` |
| `uint32` | UInt32 | `UInt32Array` |
| `uint64` | UInt64 | `UInt64Array` |
| `float32` | Float32 | `Float32Array` |
| `float64` | Float64 | `Float64Array` |
| `string` | Utf8 | `StringArray` |
| `wstring` | Utf8 (encoded as UTF-16 on the CDR side) | `StringArray` |
| Nested message | Struct | `StructArray` |
Sequences and Arrays
| ROS2 Type | Arrow Type | Rust Arrow Array |
|---|---|---|
| Variable-length sequence (`int32[]`) | List | `ListArray` |
| Bounded sequence (`int32[<=10]`) | List (length validated) | `ListArray` |
| Fixed-size array (`int32[3]`) | FixedSizeList | `FixedSizeListArray` |
Example: reading a ListArray from Fibonacci feedback ({partial_sequence: int32[]}):
#![allow(unused)]
fn main() {
use arrow::array::{Int32Array, ListArray, StructArray};
let struct_array = data.as_any().downcast_ref::<StructArray>().unwrap();
let list = struct_array
.column_by_name("partial_sequence")
.unwrap()
.as_any()
.downcast_ref::<ListArray>()
.unwrap();
let values = list
.value(0)
.as_any()
.downcast_ref::<Int32Array>()
.unwrap()
.values()
.to_vec();
}
Complete YAML Reference
nodes:
- id: my_bridge
ros2:
# --- Mode (exactly one required) ---
# Single topic mode
topic: /topic_name # ROS2 topic name
message_type: package/TypeName # ROS2 message type
direction: subscribe # subscribe (default) | publish
# Multi-topic mode (mutually exclusive with topic)
topics:
- topic: /topic_a
message_type: package/TypeA
direction: subscribe
output: custom_output_id # override default ID mapping
qos: # per-topic QoS override
reliable: true
- topic: /topic_b
message_type: package/TypeB
direction: publish
input: custom_input_id # override default ID mapping
# Service mode (mutually exclusive with topic/topics/action)
service: /service_name # ROS2 service name
service_type: package/TypeName # ROS2 service type
role: client # client | server
# Action mode (mutually exclusive with topic/topics/service)
action: /action_name # ROS2 action name
action_type: package/TypeName # ROS2 action type
role: client # client | server
# --- QoS (optional, applies to all channels) ---
qos:
reliable: false # true | false (default: false = best effort)
durability: volatile # volatile (default) | transient_local
liveliness: automatic # automatic | manual_by_participant | manual_by_topic
lease_duration: 5.0 # seconds (default: infinity)
max_blocking_time: 0.1 # seconds (default: 0.1, reliable only)
keep_last: 1 # 1-10000 (default: 1)
keep_all: false # true | false (default: false)
# --- Optional ROS2 node config ---
namespace: / # ROS2 namespace (default: "/")
node_name: my_ros_node # ROS2 node name (default: adora node id)
# --- Standard Adora node fields ---
inputs:
input_id: source_node/output_id
outputs:
- output_id
Use Case Scenarios
1. Subscribe to Sensor Data (turtlesim pose)
nodes:
- id: pose_bridge
ros2:
topic: /turtle1/pose
message_type: turtlesim/Pose
outputs:
- pose
- id: my_processor
path: ./target/debug/my-processor
inputs:
pose: pose_bridge/pose
#![allow(unused)]
fn main() {
// In my_processor: receive turtlesim/Pose as Arrow
Event::Input { id, data, .. } if id.as_str() == "pose" => {
let s = data.as_any().downcast_ref::<StructArray>().unwrap();
let x = s.column_by_name("x").unwrap()
.as_any().downcast_ref::<Float32Array>().unwrap().value(0);
let y = s.column_by_name("y").unwrap()
.as_any().downcast_ref::<Float32Array>().unwrap().value(0);
println!("Turtle at ({x}, {y})");
}
}
2. Publish Velocity Commands
nodes:
- id: planner
path: ./target/debug/planner
inputs:
tick: adora/timer/millis/100
outputs:
- cmd_vel
- id: cmd_bridge
ros2:
topic: /turtle1/cmd_vel
message_type: geometry_msgs/Twist
direction: publish
inputs:
cmd_vel: planner/cmd_vel
#![allow(unused)]
fn main() {
// In planner: send geometry_msgs/Twist as Arrow
// Twist has nested Vector3 fields: linear {x,y,z} and angular {x,y,z}
fn make_twist(linear_x: f64, angular_z: f64) -> StructArray {
let vec3_fields = vec![
Arc::new(Field::new("x", DataType::Float64, false)),
Arc::new(Field::new("y", DataType::Float64, false)),
Arc::new(Field::new("z", DataType::Float64, false)),
];
let linear = StructArray::try_new(
vec3_fields.clone().into(),
vec![
Arc::new(Float64Array::from(vec![linear_x])) as _,
Arc::new(Float64Array::from(vec![0.0])) as _,
Arc::new(Float64Array::from(vec![0.0])) as _,
],
None,
).unwrap();
let angular = StructArray::try_new(
vec3_fields.into(),
vec![
Arc::new(Float64Array::from(vec![0.0])) as _,
Arc::new(Float64Array::from(vec![0.0])) as _,
Arc::new(Float64Array::from(vec![angular_z])) as _,
],
None,
).unwrap();
let fields = vec![
Arc::new(Field::new("linear", linear.data_type().clone(), false)),
Arc::new(Field::new("angular", angular.data_type().clone(), false)),
];
StructArray::try_new(
fields.into(),
vec![Arc::new(linear) as _, Arc::new(angular) as _],
None,
).unwrap()
}
}
3. Multi-Topic Bidirectional Bridge
Subscribe to pose and publish velocity on a single ROS2 node.
nodes:
- id: turtle_bridge
ros2:
topics:
- topic: /turtle1/pose
message_type: turtlesim/Pose
direction: subscribe
output: pose
- topic: /turtle1/cmd_vel
message_type: geometry_msgs/Twist
direction: publish
input: velocity
qos:
reliable: true
keep_last: 10
inputs:
velocity: planner/cmd_vel
outputs:
- pose
- id: planner
path: ./target/debug/planner
inputs:
pose: turtle_bridge/pose
tick: adora/timer/millis/100
outputs:
- cmd_vel
4. Service Client: Call an External ROS2 Service
nodes:
- id: requester
path: ./target/debug/requester
inputs:
tick: adora/timer/millis/1000
response: add_client/response
outputs:
- request
- id: add_client
ros2:
service: /add_two_ints
service_type: example_interfaces/AddTwoInts
role: client
inputs:
request: requester/request
outputs:
- response
Prerequisites: run a ROS2 service first:
ros2 run examples_rclcpp_minimal_service service_main
5. Service Server: Expose an Adora Handler as ROS2 Service
nodes:
- id: add_server
ros2:
service: /add_two_ints
service_type: example_interfaces/AddTwoInts
role: server
inputs:
response: handler/response
outputs:
- request
- id: handler
path: ./target/debug/handler
inputs:
request: add_server/request
outputs:
- response
The handler receives {a: i64, b: i64} as Arrow, computes the result, and sends {sum: i64} back. External ROS2 clients can call this service:
ros2 service call /add_two_ints example_interfaces/srv/AddTwoInts "{a: 3, b: 5}"
6. Action Client: Long-Running Fibonacci Goal
nodes:
- id: goal_sender
path: ./target/debug/goal-sender
inputs:
tick: adora/timer/millis/5000
feedback: fib_client/feedback
result: fib_client/result
outputs:
- goal
- id: fib_client
ros2:
action: /fibonacci
action_type: example_interfaces/Fibonacci
role: client
inputs:
goal: goal_sender/goal
outputs:
- feedback
- result
Prerequisites: start the action server before the dataflow:
ros2 run examples_rclcpp_action_server fibonacci_action_server
The goal node sends {order: int32}, receives streamed {partial_sequence: int32[]} feedback, and a final {sequence: int32[]} result.
7. Action Server: Expose an Adora Handler as ROS2 Action
nodes:
- id: fib_server
ros2:
action: /fibonacci
action_type: example_interfaces/Fibonacci
role: server
inputs:
feedback: handler/feedback
result: handler/result
outputs:
- goal
- id: handler
path: ./target/debug/handler
inputs:
goal: fib_server/goal
outputs:
- feedback
- result
The handler receives {order: int32} goals with a goal_id in metadata, sends {partial_sequence: int32[]} feedback, and a final {sequence: int32[]} result – all with the same goal_id in metadata. External ROS2 clients can send goals:
ros2 action send_goal /fibonacci example_interfaces/action/Fibonacci "{order: 10}"
Limitations and Known Constraints
- Action server auto-accept: All incoming goals are automatically accepted. The handler cannot reject goals before execution starts.
- No action cancel support: Neither client nor server handles ROS2 cancel requests.
- No `wait_for_action_server`: The `ros2_client` library does not provide this API. Start the action server before the dataflow; the first goal will time out (30s) if the server is unavailable.
- Single-flight service client: The service client processes requests sequentially – each request blocks until the response arrives (or times out at 30s).
- QoS uniform for service/action channels: The `qos` config applies to all service/action sub-channels (goal, result, cancel, feedback, status). Per-channel QoS is not configurable.
- `AMENT_PREFIX_PATH` required: The bridge fails at startup if no ROS2 message definitions are found.
- Max 64 topics: Multi-topic mode supports at most 64 topics per bridge node.
- Max 8 concurrent action goals: Additional goals receive `Aborted` status when the limit is reached.
- Max 64 pending service requests (server): Requests are dropped when the queue is full.
Best Practices
Source your ROS2 environment before running. Ensure AMENT_PREFIX_PATH is set and includes all required message packages. The bridge logs an error if no definitions are found.
Start action servers before the dataflow. There is no wait mechanism for action servers. If the server is not ready, the first goal send will time out after 30 seconds.
Use multi-topic mode for related topics. Bridging /turtle1/pose (subscribe) and /turtle1/cmd_vel (publish) on the same bridge node reduces resource usage compared to two separate bridge nodes.
Match Arrow field names exactly. The bridge validates that Arrow struct field names match the ROS2 message definition. Missing fields use default values (zero for numbers, empty string). Extra fields cause an error.
Use explicit output/input in multi-topic mode. The default ID mapping (stripping the leading `/`, replacing remaining `/` with `_`) can be confusing for deep topic names. Explicit IDs make the dataflow YAML self-documenting.
Set QoS to match the ROS2 publisher/subscriber. QoS mismatches (e.g., reliable subscriber with best-effort publisher) cause silent communication failures. Check with ros2 topic info -v /topic_name to see the existing QoS settings.
Pass through request_id in service responses. The bridge correlates responses to requests using the request_id metadata parameter. If the handler does not include request_id in the response metadata, the bridge cannot match the response to the original ROS2 request.
WebSocket Control Plane
Adora’s control plane uses WebSocket connections for all communication between the CLI, coordinator, and daemons. A single Axum server exposes three routes on one port, replacing the previous multi-port TCP design. JSON text frames carry a UUID-correlated request-reply protocol with fire-and-forget events for log streaming.
Features at a Glance
| Feature | Detail |
|---|---|
| Routes | /api/control (CLI), /api/daemon (daemons), /health |
| Wire format | JSON text frames + binary frames for topic data |
| Protocol | UUID-correlated request-reply + fire-and-forget events |
| Message size limit | 1 MiB (MAX_CONTROL_MESSAGE_BYTES) |
| Concurrency limit | 256 connections (MAX_WS_CONNECTIONS) |
| Server framework | Axum + Tower middleware |
| Client library | tokio-tungstenite (integration tests, daemon), custom WsSession (CLI) |
| Security | Re-register guard, daemon ID verification, machine ID length limit |
Architecture
Single Axum server (one port)
┌────────────────────────────┐
│ /api/control (CLI) │
CLI ──── WS ────────>│ /api/daemon (Daemons) │
│ /health (HTTP GET) │
Daemon ── WS ───────>│ │
└──────────┬─────────────────┘
│ mpsc::Sender<Event>
v
Coordinator
(event loop)
The coordinator binds a single TcpListener and serves an Axum router. Each WebSocket upgrade spawns a handler task that communicates with the coordinator’s main event loop through an mpsc::Sender<Event> channel.
Key source files
| File | Role |
|---|---|
| `binaries/coordinator/src/ws_server.rs` | Router, `serve()`, constants, `ShutdownTrigger` |
| `binaries/coordinator/src/ws_control.rs` | `/api/control` handler |
| `binaries/coordinator/src/ws_daemon.rs` | `/api/daemon` handler, security, event translation |
| `binaries/cli/src/ws_client.rs` | `WsSession` synchronous client wrapper |
| `libraries/message/src/ws_protocol.rs` | `WsRequest`, `WsResponse`, `WsEvent`, `WsMessage` types |
Wire Protocol
All messages are JSON text frames. Three message shapes exist:
WsRequest (client -> server)
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"method": "control",
"params": { "List": null }
}
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Unique request identifier for reply correlation |
| `method` | string | `"control"` for CLI requests, `"daemon_event"` / `"daemon_command"` for daemons |
| `params` | object | Serialized `ControlRequest` or `Timestamped<CoordinatorRequest>` |
WsResponse (server -> client)
Success:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"result": { "DataflowList": [] }
}
Error:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"error": "no running dataflow with id ..."
}
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Matches the originating request id |
| `result` | object? | Present on success (serialized `ControlRequestReply`) |
| `error` | string? | Present on failure |
WsEvent (either direction)
{
"event": "log",
"payload": { "message": "sensor started", "level": "info" }
}
Used for log streaming after a LogSubscribe/BuildLogSubscribe is acknowledged.
Dispatch
Each handler parses incoming frames with its own strategy to preserve u128 fidelity (see u128 serialization):
- CLI (`ws_client.rs`): Uses a flat `IncomingFrame` struct with `serde_json::value::RawValue` for the `result`/`payload` fields, avoiding `serde_json::Value` entirely. Discriminates by presence of `event` (log push) or `id` (response).
- Coordinator control handler (`ws_control.rs`): Parses as `WsRequest` (always a request from the CLI).
- Coordinator daemon handler (`ws_daemon.rs`): Checks for the `"method"` key to distinguish requests from responses. Uses the `DaemonWsRequestRaw` helper for requests.
- Daemon (`coordinator.rs`): Uses `CoordinatorCommandRaw`/`RegisterReplyRaw` helper structs to parse directly from raw JSON text.
A `WsMessage` untagged enum is defined in `ws_protocol.rs` for generic dispatch but is not used by the production handlers:

```rust
#[serde(untagged)]
pub enum WsMessage {
    Request(WsRequest),
    Response(WsResponse),
    Event(WsEvent),
}
```
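The decision order the handlers apply can be sketched std-only. This is an illustrative simplification: real handlers inspect parsed serde structs (with `RawValue` fields), not raw substrings, but the event -> request -> response ordering is the same:

```rust
// Simplified sketch of key-presence dispatch over incoming text frames.
// NOTE: contains() is a stand-in for real JSON field inspection.
#[derive(Debug, PartialEq)]
enum FrameKind {
    Event,    // has an "event" key: pushed log frame
    Request,  // has a "method" key: WsRequest
    Response, // has an "id" but no "method": WsResponse
    Unknown,
}

fn classify(frame: &str) -> FrameKind {
    if frame.contains("\"event\"") {
        FrameKind::Event
    } else if frame.contains("\"method\"") {
        FrameKind::Request
    } else if frame.contains("\"id\"") {
        FrameKind::Response
    } else {
        FrameKind::Unknown
    }
}

fn main() {
    assert_eq!(classify(r#"{"event":"log","payload":{}}"#), FrameKind::Event);
    assert_eq!(classify(r#"{"id":"abc","method":"control","params":null}"#), FrameKind::Request);
    assert_eq!(classify(r#"{"id":"abc","result":{}}"#), FrameKind::Response);
    println!("ok");
}
```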
CLI Control Plane (/api/control)
The CLI connects to /api/control to send ControlRequest commands and receive ControlRequestReply responses.
Connection lifecycle
- Connect – HTTP upgrade to WebSocket
- Request-reply – CLI sends `WsRequest`, coordinator processes the `ControlRequest`, sends `WsResponse`
- Log subscribe (optional) – CLI sends `LogSubscribe`/`BuildLogSubscribe`, coordinator acks with `WsResponse`, then pushes `WsEvent{event:"log"}` frames
- Close – CLI sends a `Close` frame or drops the connection
Supported ControlRequest variants
| Variant | Description |
|---|---|
| `List` | List all running dataflows |
| `Build` | Trigger a dataflow build |
| `WaitForBuild` | Block until build completes |
| `Start` | Start a dataflow |
| `WaitForSpawn` | Block until nodes are spawned |
| `Stop` / `StopByName` | Stop a running dataflow |
| `Reload` | Hot-reload a node/operator |
| `Check` | Check dataflow status |
| `Destroy` | Tear down all daemons |
| `Logs` | Retrieve historical logs |
| `Info` | Get dataflow details |
| `DaemonConnected` | Check if any daemon is connected |
| `ConnectedMachines` | List connected daemons |
| `LogSubscribe` | Subscribe to live dataflow logs |
| `BuildLogSubscribe` | Subscribe to live build logs |
| `CliAndDefaultDaemonOnSameMachine` | Check co-location |
| `GetNodeInfo` | Get node metadata |
| `TopicSubscribe` | Subscribe to live topic data via binary WS frames (see WebSocket Topic Data Channel) |
| `TopicUnsubscribe` | Cancel a topic subscription |
Log subscription flow
```text
CLI                             Coordinator
 │                                  │
 │─── WsRequest{LogSubscribe} ─────>│
 │                                  │ (check dataflow exists)
 │<── WsResponse{subscribed} ───────│
 │                                  │
 │<── WsEvent{event:"log"} ─────────│ (repeated)
 │<── WsEvent{event:"log"} ─────────│
 │                                  │
 │─── Close ───────────────────────>│ (log_subscribers dropped)
```
If the dataflow is not found, the coordinator returns WsResponse with an error and no events are sent.
WsSession (CLI client)
WsSession is a synchronous wrapper that bridges blocking CLI code to the async WebSocket connection. It creates an internal tokio::runtime::Runtime (current-thread) and spawns an async session_loop task.
```text
CLI thread (sync)                       session_loop (async)
 │                                        │
 │── SessionCommand::Request ───────────>│── WsRequest ──> server
 │                                        │<── WsResponse ──
 │<── oneshot reply ─────────────────────│
 │                                        │
 │── SessionCommand::SubscribeLogs ─────>│── WsRequest ──> server
 │                                        │<── WsResponse (ack)
 │<── oneshot ack ───────────────────────│
 │<── std_mpsc log events ───────────────│<── WsEvent ──
```
The session loop maintains:
- `pending_requests: HashMap<Uuid, oneshot::Sender>` – for request-reply correlation
- `pending_subscribes: HashMap<Uuid, (ack_tx, log_tx)>` – for subscribe ack routing
- `log_subscribers: Vec<std_mpsc::Sender>` – for broadcasting log events
- `pending_topic_subscribes: HashMap<Uuid, (ack_tx, data_tx)>` – for topic subscribe ack routing
- `topic_subscribers: HashMap<Uuid, std_mpsc::Sender>` – for binary frame dispatch by subscription UUID
Binary WS frames (topic data) are dispatched separately from text frames. See WebSocket Topic Data Channel for details.
On disconnect, all pending requests receive an error via their oneshot channels.
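The sync-to-async bridge can be illustrated with a std-only sketch: a background thread plays the role of `session_loop`, a command channel carries commands, and a per-request channel stands in for the tokio oneshot. The `SessionCommand::Request` shape and the echo "server" here are illustrative, not the real API:

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Commands the blocking caller sends into the session loop.
enum SessionCommand {
    // (request id, request body, one-shot reply channel)
    Request(u64, String, mpsc::Sender<String>),
    Shutdown,
}

// The "session loop": owns the connection (here: an echo stub) and a
// pending-request map keyed by request id, like WsSession's pending_requests.
fn session_loop(rx: mpsc::Receiver<SessionCommand>) {
    let mut pending: HashMap<u64, mpsc::Sender<String>> = HashMap::new();
    for cmd in rx {
        match cmd {
            SessionCommand::Request(id, body, reply_tx) => {
                pending.insert(id, reply_tx);
                // Pretend the server replied; route the reply by id,
                // exactly as the real loop routes WsResponse frames.
                let response = format!("reply to {body}");
                if let Some(tx) = pending.remove(&id) {
                    let _ = tx.send(response);
                }
            }
            SessionCommand::Shutdown => break,
        }
    }
    // On disconnect, dropping `pending` closes every reply channel,
    // which surfaces as an error to every blocked caller.
}

fn main() {
    let (cmd_tx, cmd_rx) = mpsc::channel();
    let handle = thread::spawn(move || session_loop(cmd_rx));

    // Blocking request-reply from the sync side.
    let (reply_tx, reply_rx) = mpsc::channel();
    cmd_tx.send(SessionCommand::Request(1, "List".into(), reply_tx)).unwrap();
    let reply = reply_rx.recv().unwrap();
    assert_eq!(reply, "reply to List");

    cmd_tx.send(SessionCommand::Shutdown).unwrap();
    handle.join().unwrap();
    println!("{reply}");
}
```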
Daemon Plane (/api/daemon)
Daemons connect to /api/daemon for registration, event reporting, and receiving coordinator commands.
Registration flow
```text
Daemon                          Coordinator
 │                                  │
 │── WsRequest{Register} ──────────>│
 │                                  │ (validate, assign daemon_id)
 │                                  │ (track connection + cmd channel)
 │                                  │
 │── WsRequest{Event{...}} ────────>│ (subsequent events)
```
- Daemon sends a `Register` request containing `DaemonRegisterRequest` (version + machine ID)
- Coordinator validates version compatibility and machine ID length
- Coordinator assigns a `DaemonId` and stores the `DaemonConnection` (includes a `cmd_tx` channel for sending commands back to the daemon)
- The connection is tracked via `tracked_daemon_id` for cleanup on disconnect
Event translation
Daemon events are translated into coordinator-internal Event variants:
| DaemonEvent | Coordinator Event |
|---|---|
| `AllNodesReady` | `Event::Dataflow { ReadyOnDaemon }` |
| `AllNodesFinished` | `Event::Dataflow { DataflowFinishedOnDaemon }` |
| `Heartbeat` | `Event::DaemonHeartbeat` |
| `Log(message)` | `Event::Log(message)` |
| `Exit` | `Event::DaemonExit` |
| `NodeMetrics` | `Event::NodeMetrics` |
| `BuildResult` | `Event::DataflowBuildResult` |
| `SpawnResult` | `Event::DataflowSpawnResult` |
Bidirectional communication
The coordinator can send commands back to daemons via the cmd_tx channel stored in DaemonConnection. The daemon handler maintains a pending_replies: HashMap<Uuid, oneshot::Sender> to correlate daemon responses to coordinator-initiated requests.
Message routing on the daemon handler:
- Frame has a `"method"` key -> daemon request (registration or event)
- Frame lacks a `"method"` key -> daemon response to a coordinator command
u128 serialization workaround
uhlc::ID contains a NonZeroU128 which exceeds serde_json::Value::Number range (i64/u64/f64 only). Using serde_json::to_value() errors with “number out of range”, and serde_json::from_slice::<Value>() silently loses precision by storing as f64.
All production code bypasses serde_json::Value for data containing uhlc::Timestamp:
| Component | Serialization | Deserialization |
|---|---|---|
| Daemon (`coordinator.rs`) | `to_string` + `format!` | Helper structs (`RegisterReplyRaw`, `CoordinatorCommandRaw`) + `from_str` |
| Coordinator control (`ws_control.rs`) | `to_string` + `format!` for replies | N/A (CLI requests don’t contain u128) |
| Coordinator daemon (`ws_daemon.rs`) | N/A | `DaemonWsRequestRaw` + `from_str` |
| Coordinator state (`state.rs`) | `str::from_utf8` + `format!` (raw bytes embedding) | N/A |
| CLI (`ws_client.rs`) | N/A (requests don’t contain u128) | `IncomingFrame` with `serde_json::value::RawValue` |
Integration tests similarly construct WsRequest JSON strings manually via format!() + serde_json::to_string() (not to_value()) to match the real wire format.
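A small std-only demonstration of both the problem and the workaround. The value here is synthetic (not a real `uhlc` timestamp): routing a 128-bit number through `f64` silently truncates it, while embedding its decimal digits directly into the JSON text preserves it exactly:

```rust
fn main() {
    // A 128-bit HLC-timestamp-like value that exceeds u64/f64 range.
    let ts: u128 = (1u128 << 80) + 12345;

    // Round-tripping through f64 (what a Value-based parse does for big
    // numbers) silently loses the low bits.
    let through_f64 = ts as f64 as u128;
    assert_ne!(through_f64, ts);

    // Embedding the full decimal digits into the JSON text directly
    // (the to_string + format! approach) preserves every bit.
    let json = format!("{{\"timestamp\":{}}}", ts);
    assert!(json.contains(&ts.to_string()));

    // Parsing the digits back out of the raw text recovers the exact value.
    let digits: String = json.chars().filter(|c| c.is_ascii_digit()).collect();
    assert_eq!(digits.parse::<u128>().unwrap(), ts);
    println!("{json}");
}
```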
Security
Re-register guard
Each daemon WebSocket connection allows exactly one Register request. If a connection attempts a second registration, the coordinator logs a warning and closes the connection:
daemon attempted re-register on same connection, rejecting
Daemon ID verification
After registration, every Event message must include a daemon_id matching the one assigned during registration. Mismatched IDs cause connection termination:
daemon sent event with mismatched id: expected `X`, got `Y` -- closing connection
Machine ID length validation
The machine_id field in DaemonRegisterRequest is limited to 256 bytes. Oversized values cause connection termination.
Connection and message limits
| Limit | Value | Enforced by |
|---|---|---|
| Max message size | 1 MiB | WebSocketUpgrade::max_message_size |
| Max concurrent connections | 256 | Tower ConcurrencyLimitLayer |
Connection Lifecycle & Keepalive
Establishment
Both /api/control and /api/daemon use standard HTTP/1.1 WebSocket upgrade. The Axum WebSocketUpgrade extractor handles the handshake.
Ping/pong
Both handlers respond to Ping frames with Pong frames containing the same payload:
```rust
Ok(Message::Ping(data)) => {
    let _ = ws_tx.send(Message::Pong(data)).await;
    continue;
}
```
Graceful close
When a Close frame is received:
- Control handler: breaks the handler loop, dropping log subscriber channels
- Daemon handler: breaks the loop, then emits `Event::DaemonExit { daemon_id }` for immediate cleanup
Cleanup on disconnect
Control connections:
- `log_tx` channel is dropped, stopping log forwarding to that client
- No coordinator state to clean up (control connections are stateless)
Daemon connections:
- `DaemonExit` event is emitted if a `daemon_id` was tracked
- `cmd_tx` and `pending_replies` are dropped
- Coordinator removes the daemon from its connection map
WsSession (CLI client):
- All entries in `pending_requests` receive `Err("WS connection closed")`
- All entries in `pending_subscribes` receive `Err("WS connection closed")`
Message Flow Examples
CLI lists dataflows
```text
CLI                WsSession                  Coordinator
 │                     │                           │
 │── request(&List) ──>│                           │
 │                     │── WsRequest ─────────────>│
 │                     │    id: "abc-123"          │
 │                     │    method: "control"      │
 │                     │    params: "List"         │
 │                     │                           │
 │                     │          ControlEvent::IncomingRequest
 │                     │          reply via oneshot
 │                     │                           │
 │                     │<── WsResponse ────────────│
 │                     │    id: "abc-123"          │
 │                     │    result: {DataflowList:[]}
 │                     │                           │
 │<── ControlRequestReply ─│                       │
```
Daemon registration
```text
Daemon                                     Coordinator
 │                                              │
 │── WsRequest ────────────────────────────────>│
 │     method: "daemon_event"                   │
 │     params: {inner: Register{...},           │
 │              timestamp: ...}                 │
 │                                              │ validate version
 │                                              │ validate machine_id
 │                                              │ assign daemon_id
 │                                              │ store DaemonConnection
 │                                              │
 │── WsRequest{Event{Heartbeat}} ──────────────>│
 │                                              │ Event::DaemonHeartbeat
 │                                              │
 │  (on WS close) ─────────────────────────────>│ Event::DaemonExit
```
Log subscription lifecycle
```text
CLI                WsSession                  Coordinator
 │                     │                           │
 │── subscribe_logs() >│                           │
 │                     │── WsRequest ─────────────>│
 │                     │    params: LogSubscribe   │
 │                     │                           │ find dataflow
 │                     │<── WsResponse ────────────│ {subscribed: true}
 │<── ack (Ok) ────────│                           │
 │                     │                           │
 │                     │<── WsEvent{log} ──────────│ (node produces log)
 │<── log_rx.recv() ───│                           │
 │                     │<── WsEvent{log} ──────────│
 │<── log_rx.recv() ───│                           │
 │                     │                           │
 │  (drop session) ───>│── Close ─────────────────>│ (log_subscribers dropped)
```
Test Coverage
Test tiers
| Tier | Location | Tests | What’s covered |
|---|---|---|---|
| Unit (protocol) | libraries/message/src/ws_protocol.rs | 10 | Roundtrip serialization, untagged dispatch, error cases |
| Unit (client) | binaries/cli/src/ws_client.rs | 6 | Response routing, subscribe ack, topic subscribe ack, orphan handling, disconnect |
| Integration (control) | binaries/coordinator/tests/ws_control_tests.rs | 11 | Health check, List, invalid JSON/params, Destroy, DaemonConnected, ping/pong, concurrent requests, connection close, log subscribe |
| Integration (daemon) | binaries/coordinator/tests/ws_daemon_tests.rs | 4 | Register, register-then-status, disconnect cleanup, ping/pong |
| E2E (WsSession) | tests/ws-cli-e2e.rs | 4 | WsSession + coordinator: list, status, stop, multi-request |
| **Total** | | 35 | |
Key test patterns
Poll-with-timeout: Integration tests poll coordinator state (e.g., DaemonConnected) with a 2-second deadline and 20ms sleep intervals, avoiding flaky timing assumptions.
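The pattern, sketched generically with std timers (the `poll_until` helper name is made up; the real tests inline this loop):

```rust
use std::thread;
use std::time::{Duration, Instant};

// Poll a condition until it holds or a deadline passes, instead of a single
// fixed sleep. Mirrors the tests' 2 s deadline / 20 ms interval.
fn poll_until(deadline: Duration, interval: Duration, mut check: impl FnMut() -> bool) -> bool {
    let start = Instant::now();
    loop {
        if check() {
            return true;
        }
        if start.elapsed() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}

fn main() {
    // Simulated condition that becomes true after a few polls.
    let mut attempts = 0;
    let ok = poll_until(Duration::from_secs(2), Duration::from_millis(20), || {
        attempts += 1;
        attempts >= 3
    });
    assert!(ok);
    println!("condition met after {attempts} polls");
}
```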
No nested runtimes: E2E tests run the coordinator on a background std::thread with its own tokio runtime, while WsSession (which creates its own current-thread runtime) runs on the test’s main thread. This avoids the “cannot start a runtime from within a runtime” panic.
u128 workaround in tests: Daemon test helpers construct WsRequest JSON strings manually via format!() + serde_json::to_string() (not serde_json::to_value()) to preserve uhlc::ID u128 values on the wire.
Test coordinator setup: Both integration and E2E tests use adora_coordinator::start_testing() which binds to port 0 (OS-assigned) and accepts an empty external event stream.
Configuration Reference
Constants
| Constant | Value | File | Purpose |
|---|---|---|---|
| `MAX_CONTROL_MESSAGE_BYTES` | 1 MiB (1,048,576) | `ws_server.rs` | Max WebSocket frame size |
| `MAX_WS_CONNECTIONS` | 256 | `ws_server.rs` | Tower concurrency limit |
Server setup
```rust
// Production: called by the coordinator's main startup
let (port, shutdown, future) = ws_server::serve(bind_addr, event_tx, clock).await?;
tokio::spawn(future);
// ...
shutdown.shutdown(); // graceful stop
```
Test setup
```rust
// Binds to port 0, returns (port, future)
let (port, future) = adora_coordinator::start_testing(
    "127.0.0.1:0".parse().unwrap(),
    futures::stream::empty(),
).await?;
```
Shutdown
ShutdownTrigger wraps a oneshot::Sender<()>. Calling .shutdown() sends the signal, which the Axum server receives via with_graceful_shutdown. In-flight requests complete; new connections are rejected.
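A std-only sketch of the idea, with a plain `mpsc` channel standing in for the tokio oneshot and a polling loop standing in for Axum's graceful shutdown (all names here are illustrative):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// One-shot signal whose send (or drop) tells the server loop to stop.
struct ShutdownTrigger(mpsc::Sender<()>);

impl ShutdownTrigger {
    fn shutdown(self) {
        // Ignore the error if the server already exited.
        let _ = self.0.send(());
    }
}

fn main() {
    let (tx, rx) = mpsc::channel::<()>();
    let trigger = ShutdownTrigger(tx);

    let server = thread::spawn(move || {
        let mut served = 0u32;
        loop {
            // Stop when the signal arrives (or the trigger was dropped);
            // otherwise handle one "request".
            match rx.try_recv() {
                Ok(()) | Err(mpsc::TryRecvError::Disconnected) => break,
                Err(mpsc::TryRecvError::Empty) => {
                    served += 1;
                    thread::sleep(Duration::from_millis(1));
                }
            }
        }
        served
    });

    thread::sleep(Duration::from_millis(10));
    trigger.shutdown();
    let served = server.join().unwrap();
    println!("served {served} requests before shutdown");
}
```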
WebSocket Topic Data Channel
The topic data channel extends the WebSocket control plane to proxy live dataflow messages from the coordinator to CLI clients. Instead of requiring direct Zenoh network access, CLI commands like topic echo, topic hz, and topic info receive message data over the existing WebSocket connection as binary frames.
Motivation
| Scenario | Before (Zenoh direct) | After (WS proxy) |
|---|---|---|
| CLI on same machine as daemon | Works | Works |
| CLI remote, Zenoh reachable | Works | Works |
| CLI remote, no Zenoh access | Fails | Works |
| Browser-based web UI | Impossible | Possible |
| Embedded target, no local disk | Cannot record locally | --proxy streams to CLI |
The key insight: CLI and future web UIs connect to the coordinator via WebSocket. By having the coordinator subscribe to Zenoh on their behalf and forward messages as binary frames, topic inspection works anywhere the WebSocket connection reaches.
Architecture
```text
CLI ──── WS (binary frames) ────> Coordinator ──── Zenoh sub ────> Daemon
                                  (Zenoh proxy)               (debug publish)
```
The coordinator acts as a Zenoh proxy:
- CLI sends a `TopicSubscribe` request over the existing text-frame WS protocol
- Coordinator validates the dataflow and opens Zenoh subscribers
- Coordinator forwards each Zenoh sample as a binary WS frame back to the CLI
- CLI dispatches binary frames by subscription UUID to the appropriate consumer
Key source files
| File | Role |
|---|---|
| `libraries/message/src/cli_to_coordinator.rs` | `TopicSubscribe`, `TopicUnsubscribe` request variants |
| `libraries/message/src/coordinator_to_cli.rs` | `TopicSubscribed` reply variant |
| `binaries/coordinator/src/ws_control.rs` | Zenoh proxy: subscribe, forward binary frames |
| `binaries/coordinator/src/control.rs` | `ControlEvent::TopicSubscribe` for validation |
| `binaries/cli/src/ws_client.rs` | `WsSession::subscribe_topics()`, binary frame dispatch |
| `binaries/cli/src/command/topic/echo.rs` | Topic echo via WS |
| `binaries/cli/src/command/topic/hz.rs` | Topic frequency measurement via WS |
| `binaries/cli/src/command/topic/info.rs` | Topic metadata/stats via WS |
| `binaries/cli/src/command/record.rs` | `--proxy` flag for WS-based recording |
Wire Protocol
Subscription handshake (JSON text frames)
The subscription uses the existing UUID-correlated request-reply protocol:
Request (CLI -> Coordinator):

```json
{
  "id": "abc-123",
  "method": "control",
  "params": {
    "TopicSubscribe": {
      "dataflow_id": "550e8400-...",
      "topics": [["camera_node", "image"], ["lidar_node", "points"]]
    }
  }
}
```

Response (Coordinator -> CLI):

```json
{
  "id": "abc-123",
  "result": {
    "TopicSubscribed": {
      "subscription_id": "7f1b3a00-..."
    }
  }
}
```

Unsubscribe (CLI -> Coordinator):

```json
{
  "id": "def-456",
  "method": "control",
  "params": {
    "TopicUnsubscribe": {
      "subscription_id": "7f1b3a00-..."
    }
  }
}
```
Binary data frames
After the handshake, the coordinator pushes binary WS frames. Each frame has a fixed-size header:
```text
0                  16                             N
├──────────────────┼──────────────────────────────┤
│ subscription_id  │ Timestamped<InterDaemonEvent>│
│ (16 bytes UUID)  │     (bincode serialized)     │
└──────────────────┴──────────────────────────────┘
```
| Field | Size | Description |
|---|---|---|
| `subscription_id` | 16 bytes | UUID matching the `TopicSubscribed` ack, for multiplexing |
| payload | variable | Raw `Timestamped<InterDaemonEvent>` bincode bytes from Zenoh |
The 16-byte UUID prefix allows multiplexing multiple subscriptions on a single WS connection without additional framing overhead.
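A hypothetical receiver-side helper showing the demultiplexing step (using `u128` in place of a `Uuid` type to keep the sketch dependency-free):

```rust
// Split a binary WS frame into (subscription id, payload).
// The first 16 bytes are the subscription UUID; the rest is opaque payload.
fn split_frame(frame: &[u8]) -> Option<(u128, &[u8])> {
    if frame.len() < 16 {
        return None; // too short: dropped with a warning in the real client
    }
    let mut id_bytes = [0u8; 16];
    id_bytes.copy_from_slice(&frame[..16]);
    // u128 stands in for a Uuid here to stay std-only.
    Some((u128::from_be_bytes(id_bytes), &frame[16..]))
}

fn main() {
    // Build a frame: 16-byte id followed by an opaque payload.
    let sub_id: u128 = 0x7f1b_3a00_0000_0000_0000_0000_0000_0001;
    let mut frame = sub_id.to_be_bytes().to_vec();
    frame.extend_from_slice(b"bincode payload");

    let (id, payload) = split_frame(&frame).unwrap();
    assert_eq!(id, sub_id);
    assert_eq!(payload, &b"bincode payload"[..]);

    // Undersized frames are rejected rather than panicking.
    assert!(split_frame(&[0u8; 8]).is_none());
    println!("demultiplexed subscription {id:#x}");
}
```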
Data Flow
```text
CLI                WsSession                    Coordinator
 │                     │                             │
 │── subscribe_topics() ──>│                         │
 │                     │── WsRequest{TopicSubscribe} ──>│
 │                     │                             │ validate dataflow
 │                     │                             │ open Zenoh session (lazy)
 │                     │                             │ spawn subscriber tasks
 │                     │<── WsResponse{TopicSubscribed} ──│
 │<── (sub_id, data_rx) ──│                          │
 │                     │                             │
 │                     │      ┌── Zenoh sample ──────│ Daemon publishes
 │                     │<─────│   Binary frame       │
 │<── data_rx.recv() ──│      │  (sub_id + payload)  │
 │                     │      │                      │
 │                     │<─────│   Binary frame       │
 │<── data_rx.recv() ──│      │                      │
 │                     │      └                      │
 │                     │                             │
 │  (drop session) ───>│── Close ───────────────────>│ abort subscriber tasks
```
Coordinator internals
- Validation: `ControlEvent::TopicSubscribe` is sent to the coordinator event loop, which checks that the dataflow exists and has `publish_all_messages_to_zenoh: true` enabled
- Lazy Zenoh: the coordinator’s Zenoh session is opened on the first `TopicSubscribe` request and reused for subsequent subscriptions on the same WS connection
- Per-topic tasks: each `(node_id, data_id)` pair spawns a tokio task that subscribes to the corresponding Zenoh topic and forwards samples to the binary frame channel
- Backpressure: the binary frame channel has capacity 64. `try_send` is used – if the channel is full (slow consumer), samples are silently dropped rather than blocking the Zenoh subscriber
- Cleanup: when the WS connection closes, all subscriber tasks are aborted
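The drop-on-full policy can be demonstrated with a std bounded channel (capacity shrunk from 64 to 4 to make the overflow visible; the real code uses an async channel):

```rust
use std::sync::mpsc;

fn main() {
    // Bounded channel standing in for the capacity-64 binary frame channel.
    let (tx, rx) = mpsc::sync_channel::<u32>(4);

    let mut dropped = 0;
    for sample in 0..10 {
        // try_send never blocks: a full buffer (slow consumer) drops the
        // sample instead of stalling the producer, as the Zenoh forwarder does.
        if tx.try_send(sample).is_err() {
            dropped += 1;
        }
    }

    let delivered: Vec<u32> = rx.try_iter().collect();
    assert_eq!(delivered, vec![0, 1, 2, 3]); // only the buffered samples
    assert_eq!(dropped, 6);
    println!("delivered {} samples, dropped {dropped}", delivered.len());
}
```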
WsSession (CLI side)
The WsSession::subscribe_topics() method:
- Serializes a `TopicSubscribe` request
- Sends `SessionCommand::SubscribeTopics` through the internal command channel
- The async `session_loop` wraps it as a `WsRequest` and sends it
- On receiving the `TopicSubscribed` ack, registers the `data_tx` sender in `topic_subscribers` keyed by `subscription_id`
- Binary frames are dispatched by extracting the first 16 bytes as a UUID and sending the remainder to the matching `data_tx`
State maintained in session_loop:
- `pending_topic_subscribes: HashMap<Uuid, (ack_tx, data_tx)>` – awaiting ack
- `topic_subscribers: HashMap<Uuid, Sender>` – active subscriptions receiving binary data
Prerequisites
The dataflow descriptor must enable debug message publishing:
```yaml
_unstable_debug:
  publish_all_messages_to_zenoh: true
```
Without this, the coordinator rejects the TopicSubscribe with:
```text
dataflow {id} not found or publish_all_messages_to_zenoh not enabled
```
CLI Commands
adora topic echo
Stream topic data to the terminal in real-time.
```bash
# Echo a single topic
adora topic echo -d my-dataflow camera_node/image

# Echo multiple topics
adora topic echo -d my-dataflow robot1/pose robot2/vel

# JSON output for piping
adora topic echo -d my-dataflow robot1/pose --format json
```
Internally: calls session.subscribe_topics(), receives Timestamped<InterDaemonEvent> from the data_rx channel, deserializes Arrow data, and renders as table or JSON.
adora topic hz
Interactive TUI displaying per-topic publish frequency statistics.
```bash
# All topics
adora topic hz -d my-dataflow --window 10

# Specific topics
adora topic hz -d my-dataflow robot1/pose robot2/vel --window 5
```
Uses ratatui for the TUI. A background std::thread receives events from data_rx and dispatches to per-topic HzStats trackers via a BTreeMap<(node_id, data_id), index> lookup.
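An illustrative sliding-window frequency tracker in the spirit of the per-topic `HzStats` (field names and the timestamp representation are assumptions):

```rust
use std::collections::VecDeque;

// Tracks message arrivals over a fixed time window and reports frequency.
struct HzStats {
    window_secs: f64,
    arrivals: VecDeque<f64>, // arrival timestamps in seconds
}

impl HzStats {
    fn new(window_secs: f64) -> Self {
        Self { window_secs, arrivals: VecDeque::new() }
    }

    fn record(&mut self, t: f64) {
        self.arrivals.push_back(t);
        // Evict arrivals older than the window.
        while let Some(&oldest) = self.arrivals.front() {
            if t - oldest > self.window_secs {
                self.arrivals.pop_front();
            } else {
                break;
            }
        }
    }

    fn hz(&self) -> f64 {
        self.arrivals.len() as f64 / self.window_secs
    }
}

fn main() {
    let mut stats = HzStats::new(10.0);
    // A topic publishing at 5 Hz for 10 seconds.
    for i in 0..50 {
        stats.record(i as f64 * 0.2);
    }
    assert!((stats.hz() - 5.0).abs() < 0.1);
    println!("measured {:.1} Hz", stats.hz());
}
```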
adora topic info
One-shot topic metadata and statistics.
```bash
adora topic info -d my-dataflow camera_node/image --duration 5
```
Collects messages for --duration seconds, then displays type information, publisher, subscribers (from descriptor), message count, and bandwidth.
adora record --proxy
Stream dataflow data through WebSocket for local recording.
```bash
# Start dataflow first
adora start dataflow.yml --detach

# Record via proxy (data streams through coordinator to CLI)
adora record dataflow.yml --proxy -o capture.adorec

# Record specific topics
adora record dataflow.yml --proxy --topics sensor/image,lidar/points
```
Use case: the target machine (running the daemon) has no local disk or limited storage. The --proxy flag routes data through the coordinator WebSocket to the CLI machine, where the .adorec file is written locally.
Without --proxy (default), a record node is injected into the dataflow and records directly on the daemon’s machine.
Zenoh Topic Format
The coordinator subscribes to Zenoh topics using the format from adora_core::topics::zenoh_output_publish_topic():
```text
adora/{dataflow_id}/{node_id}/{data_id}
```
Each topic carries Timestamped<InterDaemonEvent> as its payload, serialized with bincode. The coordinator forwards these bytes as-is (prepended with subscription UUID) – no re-serialization.
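A sketch of the key-expression construction, re-implementing the format stated above (the real helper lives in `adora_core::topics`; this standalone copy is for illustration only):

```rust
// Build the Zenoh key expression a daemon publishes debug messages on.
fn zenoh_output_publish_topic(dataflow_id: &str, node_id: &str, data_id: &str) -> String {
    format!("adora/{dataflow_id}/{node_id}/{data_id}")
}

fn main() {
    let topic = zenoh_output_publish_topic("550e8400", "camera_node", "image");
    assert_eq!(topic, "adora/550e8400/camera_node/image");
    println!("{topic}");
}
```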
Backpressure and Performance
| Parameter | Value | Rationale |
|---|---|---|
| Binary frame channel capacity | 64 | Balance between latency and memory |
| Drop policy | Drop on full | Prefer freshness over completeness |
| Binary format | Raw bincode (no base64) | Avoid 33% overhead for large payloads |
For high-throughput topics (camera images, point clouds), the binary frame channel may fill up if the WS connection is slow. Dropped samples are silent – the CLI will show reduced frequency in topic hz but won’t stall.
Error Handling
| Error | Source | Response |
|---|---|---|
| Dataflow not found | Coordinator validation | WsResponse with error message |
| `publish_all_messages_to_zenoh` not enabled | Coordinator validation | WsResponse with error message |
| Zenoh session open failure | Coordinator | WsResponse with error message |
| Zenoh subscriber failure | Per-topic task | Warning log, task exits |
| Binary frame too short (<16 bytes) | CLI session_loop | Warning log, frame dropped |
| Unknown subscription UUID | CLI session_loop | Frame dropped silently |
| WS connection closed | Either side | All tasks aborted, pending acks get error |
Test Coverage
| Tier | Location | What’s covered |
|---|---|---|
| Unit (client) | binaries/cli/src/ws_client.rs | handle_response_topic_subscribe_ack – verifies ack routing and subscriber registration |
| Unit (all existing) | binaries/cli/src/ws_client.rs | Updated to pass topic subscribe state through handle_response |
The TopicSubscribe / binary frame path is primarily validated via integration testing with a running coordinator and Zenoh session. See Testing Guide for smoke test instructions.
Adora Testing Guide
This guide covers how to run, write, and troubleshoot tests across the Adora workspace.
Quick Start (5-minute validation)
Run these three commands to validate that the workspace is healthy:
```bash
# 1. Format check (~5s)
cargo fmt --all -- --check

# 2. Lint (~60s first run, cached after)
cargo clippy --all \
  --exclude adora-node-api-python \
  --exclude adora-operator-api-python \
  --exclude adora-ros2-bridge-python \
  -- -D warnings

# 3. Unit + integration tests (~90s first run)
cargo test --all \
  --exclude adora-node-api-python \
  --exclude adora-operator-api-python \
  --exclude adora-ros2-bridge-python
```
All three must pass before opening a PR. Python packages are excluded because they require maturin.
Test Tiers
| Tier | What it covers | Command | Speed |
|---|---|---|---|
| Format | Code style | cargo fmt --all -- --check | ~5s |
| Lint | Warnings, correctness | cargo clippy --all ... | ~60s |
| Unit | Individual functions | cargo test --all ... | ~90s |
| CLI | Command parsing, validation | cargo test -p adora-cli | ~5s |
| Integration | Node I/O via env vars | cargo test --test example-tests | ~30s |
| Smoke | Full CLI lifecycle | cargo test --test example-smoke -- --test-threads=1 | ~3min |
| E2E | Multi-dataflow scenarios | cargo test --test ws-cli-e2e -- --ignored --test-threads=1 | ~2min |
| Fault tolerance | Restart policies, timeouts | cargo test --test fault-tolerance-e2e | ~45s |
| Typos | Spelling | Install typos-cli, then typos | ~2s |
Tier Details
Unit Tests
Unit tests live alongside the code they test using #[cfg(test)] modules. Key crates with tests:
| Crate | Test count | What’s tested |
|---|---|---|
| adora-arrow-convert | ~26 | Round-trip Arrow type conversions |
| adora-cli | ~96 | Command parsing, value parsers, log grep/filtering, JSON parsing, WebSocket client, cluster config |
| adora-coordinator | ~24 | WS control/daemon plane, health check, concurrent requests, artifact store, rate limiter, error sanitization |
| adora-coordinator-store | ~10 | In-memory and redb CRUD, schema versioning, persistence |
| adora-core | ~8 | Dataflow descriptor validation |
| adora-daemon | ~2 | Shlex argument parsing |
| adora-node-api | ~10 | Input tracking, service/action helpers (ID generation, send_service_request/response) |
| adora-log-utils | ~11 | Log parsing utilities |
| adora-message | ~36 | Common types, WS protocol, node/data IDs, metadata, auth tokens |
| ros2-bridge | ~30 | ROS2 message/service/action parsing |
Run a single crate’s tests:
```bash
cargo test -p adora-cli
cargo test -p adora-core
cargo test -p adora-arrow-convert
```
CLI Tests
CLI tests verify command parsing, argument validation, and value parsers without running any commands. They live in #[cfg(test)] modules inside the CLI crate.
What’s tested:
- Clap schema validation (`Args::command().debug_assert()`)
- Parsing of every subcommand (`run`, `up`, `down`, `start`, `stop`, `list`, `logs`, `build`, `graph`, `new`, `status`, `inspect`, `top`, `topic list/hz/echo`, `node list`)
- Rejection of unknown subcommands
- `--help` and `--version` exit codes
- Value parsers: `parse_store_spec` (coordinator store backend), `parse_window` (topic hz window)
- Utility functions: `parse_version_from_pip_show`
How to run:
```bash
cargo test -p adora-cli
```
How to add new tests:
When adding a new CLI subcommand or value parser, add a corresponding test in the #[cfg(test)] module of the same file. For subcommand parsing, add a parse_ok call in binaries/cli/src/command/mod.rs. For value parsers, add tests in the file that defines the parser function.
Integration Tests (Node I/O)
File: tests/example-tests.rs
These tests run compiled node executables with pre-recorded inputs and compare outputs against expected baselines. No coordinator or daemon is needed.
```bash
cargo test --test example-tests
```
How it works:
- Builds and runs a node crate (e.g., `rust-dataflow-example-node`)
- Sets `ADORA_TEST_WITH_INPUTS` to a JSON file with timed events
- Sets `ADORA_TEST_NO_OUTPUT_TIME_OFFSET=1` for deterministic output
- Compares JSONL output against `tests/sample-inputs/expected-outputs-*.jsonl`
Sample input/output files live in tests/sample-inputs/.
Smoke Tests
File: tests/example-smoke.rs
Two execution modes are tested for each applicable example:
- Networked (`adora up` + `adora start --detach` + poll + `adora stop` + `adora down`): exercises the full coordinator/daemon WS control plane.
- Local (`adora run --stop-after`): runs everything in-process, testing the single-process dataflow path.
```bash
# Must run single-threaded (shared coordinator port)
cargo test --test example-smoke -- --test-threads=1

# Run only networked or local tests
cargo test --test example-smoke smoke_rust -- --test-threads=1
cargo test --test example-smoke smoke_local -- --test-threads=1
```
A bash script is also available for quick local validation:
```bash
./scripts/smoke-all.sh                # all examples
./scripts/smoke-all.sh --rust-only    # Rust examples only
./scripts/smoke-all.sh --python-only  # Python examples only
```
Networked tests (17):
| Test | Example | Timeout |
|---|---|---|
| `smoke_rust_dataflow` | rust-dataflow/dataflow.yml | 30s |
| `smoke_rust_dataflow_dynamic` | rust-dataflow/dataflow_dynamic.yml | 30s |
| `smoke_rust_dataflow_socket` | rust-dataflow/dataflow_socket.yml | 30s |
| `smoke_rust_dataflow_url` | rust-dataflow-url/dataflow.yml | 30s |
| `smoke_benchmark` | benchmark/dataflow.yml | 30s |
| `smoke_log_sink_file` | log-sink-file/dataflow.yml | 30s |
| `smoke_log_sink_alert` | log-sink-alert/dataflow.yml | 30s |
| `smoke_log_sink_tcp` | log-sink-tcp/dataflow.yml | 30s |
| `smoke_python_dataflow` | python-dataflow/dataflow.yml | 30s |
| `smoke_python_async` | python-async/dataflow.yaml | 15s |
| `smoke_python_drain` | python-drain/dataflow.yaml | 15s |
| `smoke_python_log` | python-log/dataflow.yaml | 15s |
| `smoke_python_logging` | python-logging/dataflow.yml | 15s |
| `smoke_python_multiple_arrays` | python-multiple-arrays/dataflow.yml | 15s |
| `smoke_python_concurrent_rw` | python-concurrent-rw/dataflow.yml | 15s |
| `smoke_service_example` | service-example/dataflow.yml | 30s |
| `smoke_action_example` | action-example/dataflow.yml | 30s |
Local tests (9):
| Test | Example | stop-after |
|---|---|---|
| `smoke_local_python_dataflow` | python-dataflow/dataflow.yml | 30s |
| `smoke_local_python_async` | python-async/dataflow.yaml | 10s |
| `smoke_local_python_drain` | python-drain/dataflow.yaml | 10s |
| `smoke_local_python_log` | python-log/dataflow.yaml | 10s |
| `smoke_local_python_logging` | python-logging/dataflow.yml | 10s |
| `smoke_local_python_multiple_arrays` | python-multiple-arrays/dataflow.yml | 10s |
| `smoke_local_python_concurrent_rw` | python-concurrent-rw/dataflow.yml | 10s |
| `smoke_local_service_example` | service-example/dataflow.yml | 10s |
| `smoke_local_action_example` | action-example/dataflow.yml | 10s |
Examples requiring special dependencies (webcam, CUDA, ROS2, C/C++ toolchain, multi-machine deploy) are not included in smoke tests.
E2E Tests (WebSocket CLI)
File: tests/ws-cli-e2e.rs
Two groups:
Non-ignored (fast): Start an in-process coordinator and test WsSession directly:
```bash
cargo test --test ws-cli-e2e
```

- `cli_list_empty` – empty dataflow listing
- `cli_status_no_daemon` – daemon connectivity check
- `cli_stop_nonexistent` – error for missing dataflows
- `cli_multiple_requests_same_session` – session reuse
Ignored (full stack): Use adora up with real nodes:
```bash
cargo test --test ws-cli-e2e -- --ignored --test-threads=1
```

- `e2e_start_list_stop` – start, list, stop lifecycle
- `e2e_sequential_dataflows` – two dataflows in sequence
Fault Tolerance Tests
File: tests/fault-tolerance-e2e.rs
These test restart policies and input timeouts using Daemon::run_dataflow directly (no CLI needed).
```bash
cargo test --test fault-tolerance-e2e
```

Tests:

- `restart_recovers_from_failure` – node with `restart_policy: on-failure` survives panics (15s)
- `max_restarts_limit_reached` – node exhausts `max_restarts: 2` budget (15s)
- `input_timeout_closes_stale_input` – `input_timeout: 2.0s` fires when upstream stops (10s)
Dataflow YAMLs for these tests live in tests/dataflows/.
Coordinator Integration Tests
Files: binaries/coordinator/tests/ws_control_tests.rs, binaries/coordinator/tests/ws_daemon_tests.rs
These start an in-process coordinator and test the WebSocket control/daemon planes.
```bash
cargo test -p adora-coordinator
```
Topics covered: health check, list/stop/destroy requests, invalid JSON/params, concurrent requests, ping/pong, daemon registration, disconnect cleanup, error sanitization (no internal chain leaks), artifact store cleanup on drop.
CI Pipeline
CI runs on push/PR to main. See .github/workflows/ci.yml.
```text
fmt ───────────────┐
clippy ────────────┤  (all run in parallel)
test ──────────────┤
typos ─────────────┘
        │
        e2e (depends on test)
```
| Job | Runner | What runs |
|---|---|---|
| fmt | ubuntu-latest | cargo fmt --all -- --check |
| clippy | ubuntu-latest | cargo clippy --all ... -- -D warnings |
| test | ubuntu-latest | cargo test --all ... (excl. Python + adora-examples) |
| e2e | ubuntu-latest | example-tests, fault-tolerance, smoke tests, WS E2E |
| typos | ubuntu-latest | crate-ci/typos@master |
The e2e job only runs after test passes. All other jobs run in parallel.
Writing New Tests
Unit tests
Add a #[cfg(test)] module in the same file as the code under test:
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_valid_input() {
        let result = parse("valid");
        assert_eq!(result, expected);
    }
}
```
Integration tests for nodes
Use the integration testing framework in adora-node-api. Three approaches:
1. setup_integration_testing (recommended)
Call before the node’s main function to inject inputs and capture outputs:
```rust
#[test]
fn test_main_function() -> eyre::Result<()> {
    let events = vec![
        TimedIncomingEvent {
            time_offset_secs: 0.01,
            event: IncomingEvent::Input {
                id: "tick".into(),
                metadata: None,
                data: None,
            },
        },
        TimedIncomingEvent {
            time_offset_secs: 0.055,
            event: IncomingEvent::Stop,
        },
    ];
    let inputs = TestingInput::Input(
        IntegrationTestInput::new("node_id".parse().unwrap(), events),
    );
    let (tx, rx) = flume::unbounded();
    let outputs = TestingOutput::ToChannel(tx);
    let options = TestingOptions { skip_output_time_offsets: true };
    integration_testing::setup_integration_testing(inputs, outputs, options);

    crate::main()?;

    let outputs = rx.try_iter().collect::<Vec<_>>();
    assert_eq!(outputs, expected_outputs);
    Ok(())
}
```
2. Environment variable mode
Test the compiled executable directly, closest to production behavior:
```bash
ADORA_TEST_WITH_INPUTS=path/to/inputs.json \
ADORA_TEST_NO_OUTPUT_TIME_OFFSET=1 \
ADORA_TEST_WRITE_OUTPUTS_TO=/tmp/out.jsonl \
cargo run -p my-node
```
3. AdoraNode::init_testing
For testing node logic without going through main:
```rust
let (node, events) = AdoraNode::init_testing(inputs, outputs, Default::default())?;
```
Generating test input files
Record real dataflow events by setting `ADORA_WRITE_EVENTS_TO`:

```sh
ADORA_WRITE_EVENTS_TO=/tmp/recorded-events adora run examples/rust-dataflow/dataflow.yml
```
This writes `inputs-{node_id}.json` files that can be used directly with `ADORA_TEST_WITH_INPUTS`.
Workspace-level integration tests
Add new test files in the `tests/` directory. For tests that need the full CLI stack, follow the patterns in `tests/example-smoke.rs`:
Networked pattern (exercises coordinator + daemon):

- Build nodes with `Once` guards (avoid rebuilding per test)
- Clean up stale processes with `adora down`
- Start the cluster with `adora up`
- Run the dataflow with `adora start --detach`
- Poll `adora list --json` for completion
- Clean up with `adora stop --all` and `adora down`
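The `Once`-guard step can be sketched with only the standard library: the guard guarantees the expensive build runs at most once per test binary even when several `#[test]` functions call it. This is an illustrative sketch, not code from `tests/example-smoke.rs`; the counter exists only to make the guard's semantics observable, where a real smoke test would shell out to `cargo build` inside `call_once`.

```rust
use std::sync::Once;
use std::sync::atomic::{AtomicUsize, Ordering};

static BUILD_NODES: Once = Once::new();
static BUILD_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Every test calls this; the closure body executes at most once per process.
fn ensure_nodes_built() {
    BUILD_NODES.call_once(|| {
        // A real smoke test would run something like:
        //   std::process::Command::new("cargo")
        //       .args(["build", "-p", "rust-dataflow-example-node"])
        //       .status()
        // Here we only bump a counter to demonstrate the once-only guarantee.
        BUILD_RUNS.fetch_add(1, Ordering::SeqCst);
    });
}
```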
Local pattern (single-process, in-process coordinator):

- Build the CLI with a `Once` guard
- Run `adora run <yaml> --stop-after <duration>`
- Assert that the exit code is success
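The local pattern boils down to spawning the CLI as a child process and asserting on its exit status. A minimal standard-library sketch; `run_and_check` is a hypothetical helper, and a real test would pass the `adora` binary path plus `run <yaml> --stop-after <duration>` as arguments:

```rust
use std::process::Command;

/// Run a program to completion and report whether it exited successfully.
/// Spawn failures (e.g. binary not found) count as failure.
fn run_and_check(program: &str, args: &[&str]) -> bool {
    Command::new(program)
        .args(args)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```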
Conventions

- Use `assert2::assert!` for better error messages (available as a dev-dependency)
- Use `tempfile::NamedTempFile` for temporary output files
- E2E tests that need exclusive port access should be `#[ignore]` and run with `--test-threads=1`
- Async tests use `#[tokio::test(flavor = "multi_thread")]`
- Fault-tolerance test dataflows go in `tests/dataflows/`
- Sample input/output baselines go in `tests/sample-inputs/`
Troubleshooting
`cargo test` fails to compile Python packages
Always exclude the Python packages:

```sh
cargo test --all \
  --exclude adora-node-api-python \
  --exclude adora-operator-api-python \
  --exclude adora-ros2-bridge-python
```
Smoke/E2E tests fail with “address already in use”
A stale coordinator or daemon is still running. Clean up:
```sh
adora down
# or kill processes manually:
pkill -f adora-coordinator
pkill -f adora-daemon
```
Smoke tests hang or time out

- Increase the timeout in the test if your machine is slow (look for `Duration::from_secs(...)`)
- Check that the example nodes build successfully:

  ```sh
  cargo build -p rust-dataflow-example-node -p rust-dataflow-example-status-node \
    -p rust-dataflow-example-sink -p rust-dataflow-example-sink-dynamic
  cargo build -p log-sink-file -p log-sink-alert -p log-sink-tcp
  cargo build --release -p benchmark-example-node -p benchmark-example-sink
  ```

- For Python smoke tests, ensure `pyarrow` and `numpy` are installed
E2E tests fail when run in parallel
Smoke and ignored E2E tests must run single-threaded:
```sh
cargo test --test example-smoke -- --test-threads=1
cargo test --test ws-cli-e2e -- --ignored --test-threads=1
```
Integration test output doesn’t match expected
- Check that `ADORA_TEST_NO_OUTPUT_TIME_OFFSET=1` is set (time offsets vary per machine)
- Regenerate the baselines if the node’s behavior intentionally changed:

  ```sh
  ADORA_TEST_WITH_INPUTS=tests/sample-inputs/inputs-rust-node.json \
  ADORA_TEST_NO_OUTPUT_TIME_OFFSET=1 \
  ADORA_TEST_WRITE_OUTPUTS_TO=tests/sample-inputs/expected-outputs-rust-node.jsonl \
  cargo run -p rust-dataflow-example-node
  ```
Typos check fails
The typos config is in `_typos.toml`. To add a false-positive exclusion:

```toml
[default.extend-identifiers]
MyCustomIdent = "MyCustomIdent"
```
Tests pass locally but fail in CI
- CI runs on Ubuntu; check for platform-specific assumptions (paths, process signals)
- CI uses `rust-cache`, so dependency versions may differ from your local lockfile
- Ensure `cargo fmt --all -- --check` passes (CI enforces this)