Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Dora CLI Reference

Dora (AI-Dora, Dataflow-Oriented Robotic Architecture) is a 100% Rust framework for building real-time robotics and AI applications. This document covers the dora CLI from both an end-user and developer perspective.

Table of Contents


Quick Start

# Create a new project
dora new my-robot --kind dataflow --lang rust

# Run locally (no coordinator/daemon needed)
dora run dataflow.yml

# Or use coordinator/daemon for production
dora up
dora start dataflow.yml --attach
# Ctrl-C to stop
dora down

Installation

cargo install dora-cli

From source

cargo install --path binaries/cli --locked

Verify

dora --version
dora status

Core Concepts

Dataflow

A dataflow is a directed graph of nodes connected by typed data channels. Nodes produce outputs that other nodes consume as inputs. The framework handles data routing, serialization (Apache Arrow), and lifecycle management.

Execution Modes

ModeCommandInfrastructureUse case
Localdora runNoneDevelopment, testing, single-machine
Distributeddora up + dora startCoordinator + Daemon(s)Production, multi-machine

Component Roles

CLI  -->  Coordinator  -->  Daemon(s)  -->  Nodes / Operators
              (control plane)  (per machine)    (user code)
  • CLI: User interface. Sends commands, displays logs.
  • Coordinator: Orchestrates dataflow lifecycle across machines.
  • Daemon: Spawns node processes, manages IPC, collects metrics.
  • Node: A standalone process that produces and consumes Arrow data.
  • Operator: In-process code running inside a shared runtime (lower latency than nodes).

Data Format

All data flows through the system as Apache Arrow columnar arrays. This enables zero-copy shared memory transfer between co-located nodes and zero-serialization overhead.


Dataflow Descriptor

Dataflows are defined in YAML files. Here is the complete schema:

Minimal Example

nodes:
  - id: sender
    path: sender.py
    outputs:
      - message

  - id: receiver
    path: receiver.py
    inputs:
      message: sender/message

Full Schema

# Dataflow-level settings
health_check_interval: 5.0    # health check sweep interval in seconds (default: 5.0)

nodes:
  - id: my-node                 # unique identifier (required)
    name: "My Node"             # human-readable name (optional)
    description: "..."          # description (optional)

    # --- Source (pick one) ---
    path: ./target/debug/my-node          # local executable
    # path: https://example.com/node.zip  # download from URL
    # git: https://github.com/org/repo.git  # build from git
    #   branch: main            # git branch (mutually exclusive with tag/rev)
    #   tag: v1.0               # git tag
    #   rev: abc123             # git commit hash

    # --- Build ---
    build: cargo build -p my-node   # shell command to build (optional)

    # --- Inputs ---
    inputs:
      # Short form: source_node/output_id
      tick: dora/timer/millis/100
      data: other-node/output

      # Long form with options
      sensor_data:
        source: sensor/frames
        queue_size: 10            # input buffer size (default: 10)
        queue_policy: drop_oldest # or "backpressure" (buffers up to 10x queue_size)
        input_timeout: 5.0        # circuit breaker timeout in seconds

    # --- Outputs ---
    outputs:
      - processed
      - status

    # --- Environment ---
    env:
      MY_VAR: "value"
      FROM_ENV:
        __dora_env: HOST_VAR     # read from host environment
    args: "--verbose"             # command-line arguments

    # --- Fault tolerance ---
    restart_policy: on-failure    # never (default) | on-failure | always
    max_restarts: 5               # 0 = unlimited
    restart_delay: 1.0            # initial backoff in seconds
    max_restart_delay: 30.0       # backoff cap in seconds
    restart_window: 300.0         # reset counter after N seconds
    health_check_timeout: 30.0    # kill if no activity for N seconds

    # --- Logging ---
    min_log_level: info           # source-level filter (daemon-side)
    send_stdout_as: raw_output    # route raw stdout as data output
    send_logs_as: log_entries     # route structured logs as data output
    max_log_size: "50MB"          # rotate log files at this size
    max_rotated_files: 5          # number of rotated files to keep (1-100)

    # --- Deployment ---
    _unstable_deploy:
      machine: A                  # target machine/daemon ID

# Debug settings
_unstable_debug:
  publish_all_messages_to_zenoh: true   # required for topic echo/hz/info

Built-in Timer Nodes

Timers are virtual nodes that emit ticks at fixed intervals:

inputs:
  tick: dora/timer/millis/100   # every 100ms
  slow: dora/timer/millis/1000  # every 1s
  fast: dora/timer/hz/30        # 30 Hz (~33ms)

Operator Nodes

Operators run in-process inside a shared runtime (no separate process):

nodes:
  # Single operator (shorthand)
  - id: detector
    operator:
      python: detect.py
      build: pip install -r requirements.txt
      inputs:
        image: camera/frames
      outputs:
        - bbox

  # Multiple operators sharing a runtime
  - id: runtime-node
    operators:
      - id: preprocessor
        shared-library: ../../target/debug/libpreprocess
        inputs:
          raw: sensor/data
        outputs:
          - processed
      - id: analyzer
        shared-library: ../../target/debug/libanalyze
        inputs:
          data: runtime-node/preprocessor/processed
        outputs:
          - result

Distributed Deployment

Assign nodes to specific machines using _unstable_deploy:

nodes:
  - id: camera-driver
    _unstable_deploy:
      machine: robot-arm
    path: ./target/debug/camera
    outputs:
      - frames

  - id: ml-inference
    _unstable_deploy:
      machine: gpu-server
    path: ./target/debug/inference
    inputs:
      frames: camera-driver/frames
    outputs:
      - predictions

When nodes are on different machines, communication automatically switches from shared memory to Zenoh pub/sub.


Command Reference

Lifecycle Commands

dora run

Run a dataflow locally without coordinator or daemon. Best for development and testing.

dora run <PATH> [OPTIONS]
Argument/FlagDefaultDescription
<PATH>requiredPath to dataflow descriptor YAML
--stop-after <DURATION>Auto-stop after duration (e.g., 30s, 5m)
--uvfalseUse uv for Python node management
--debugfalseEnable debug topics (equivalent to publish_all_messages_to_zenoh: true)
--allow-shell-nodesfalseEnable shell-based node execution
--log-level <LEVEL>stdoutMin display level: error|warn|info|debug|trace|stdout
--log-format <FORMAT>prettyOutput format: pretty|json|compact
--log-filter <FILTER>Per-node level overrides: "node1=debug,node2=warn"

Examples:

# Basic run
dora run dataflow.yml

# Stop after 10 seconds, only show warnings
dora run dataflow.yml --stop-after 10s --log-level warn

# Python dataflow with uv
dora run dataflow.yml --uv

# Debug one node, silence others
dora run dataflow.yml --log-level warn --log-filter "sensor=debug"

# JSON output for CI pipelines
dora run dataflow.yml --log-format json --stop-after 30s 2>test.json

dora up

Start coordinator and daemon in local mode.

dora up

Spawns dora coordinator and dora daemon as background processes. Waits for both to be ready before returning. Idempotent: if already running, does nothing.

dora down (alias: dora destroy)

Tear down coordinator and daemon. Stops all running dataflows first.

dora down [OPTIONS]
FlagDefaultDescription
--coordinator-addr <IP>127.0.0.1Coordinator address
--coordinator-port <PORT>6013Coordinator port

dora build

Run build commands defined in the dataflow descriptor.

dora build <PATH> [OPTIONS]
FlagDefaultDescription
<PATH>requiredDataflow descriptor path
--uvfalseUse uv for Python builds
--localfalseForce local build (skip coordinator)
--strict-typesfalseTreat type warnings as errors (non-zero exit code)

Type checking: After expanding modules, build runs the same type checks as validate. Warnings are printed by default; use --strict-types (or set strict_types: true in the YAML) to fail the build on type mismatches. User-defined types in a types/ directory next to the dataflow are loaded automatically.

Build strategy: If nodes have _unstable_deploy sections and a coordinator is reachable, builds are distributed to target machines. Otherwise, builds run locally.

Git sources: Nodes with a git: field are cloned/updated before building. The build command runs from the git repository root.

dora start

Start a dataflow on a running coordinator.

dora start <PATH> [OPTIONS]
FlagDefaultDescription
<PATH>requiredDataflow descriptor path
--name <NAME>, -nAssign a name to the dataflow
--attachautoAttach to log stream and wait for completion
--detachautoReturn immediately after spawn
--debugfalseEnable debug topics (equivalent to publish_all_messages_to_zenoh: true)
--hot-reloadfalseWatch Python files and reload on change
--uvfalseUse uv for Python nodes
--coordinator-addr <IP>127.0.0.1Coordinator address
--coordinator-port <PORT>6013Coordinator port

If neither --attach nor --detach is specified: attaches if running in a TTY, detaches otherwise.

Attach mode: Streams logs, handles Ctrl-C gracefully (first = stop, second = force kill).

Hot reload: Watches Python operator source files. On change, sends a reload request to the coordinator which propagates to the daemon.

dora stop

Stop a running dataflow.

dora stop [UUID_OR_NAME] [OPTIONS]
FlagDefaultDescription
[UUID_OR_NAME]interactiveDataflow UUID or name
--name <NAME>, -nAlternative name specification
--grace-duration <DURATION>Graceful shutdown timeout
--force, -ffalseImmediate termination
--coordinator-addr <IP>127.0.0.1Coordinator address
--coordinator-port <PORT>6013Coordinator port

If no identifier is given and running in a TTY, presents an interactive picker.

Stop sequence: Send Event::Stop -> wait grace duration -> SIGTERM -> hard kill.

dora restart

Restart a running dataflow (stop + re-start with stored descriptor). No YAML path needed – the coordinator retains the original descriptor.

dora restart [UUID] [OPTIONS]
FlagDefaultDescription
[UUID]Dataflow UUID
--name <NAME>, -nRestart by name instead of UUID
--grace-duration <DURATION>Graceful shutdown timeout for the stop phase
--force, -ffalseForce kill before restart
--coordinator-addr <IP>127.0.0.1Coordinator address
--coordinator-port <PORT>6013Coordinator port

Examples:

# Restart by name
dora restart --name my-app

# Restart by UUID with forced stop
dora restart a1b2c3d4-... --force

dora record

Record dataflow messages to an .adorec file for offline replay. See Debugging Guide for full workflows.

dora record <DATAFLOW_YAML> [OPTIONS]
FlagDefaultDescription
<DATAFLOW_YAML>requiredPath to dataflow descriptor
-o, --output <PATH>recording_{timestamp}.adorecOutput file path
--topics <TOPICS>allComma-separated node/output topics to record
--proxyfalseStream via WebSocket instead of recording on target
--output-yaml <PATH>Write modified YAML without running (dry run)

Default mode injects a record node into the dataflow. --proxy mode requires a running dataflow and publish_all_messages_to_zenoh: true.

dora replay

Replay a recorded .adorec file by replacing source nodes with replay nodes. See Debugging Guide for full workflows.

dora replay <FILE> [OPTIONS]
FlagDefaultDescription
<FILE>requiredPath to .adorec recording
--speed <FLOAT>1.0Playback speed (0 = max speed)
--loopfalseLoop the recording
--replace <NODE_IDS>all recordedComma-separated nodes to replace
--output-yaml <PATH>Write modified YAML without running (dry run)

Monitoring Commands

dora list (alias: dora ps)

List running dataflows with metrics.

dora list [OPTIONS]
FlagDefaultDescription
--format <FMT>, -ftableOutput format: table|json
--status <STATUS>Filter: running|finished|failed
--name <PATTERN>Filter by name (case-insensitive substring)
--sort-by <FIELD>Sort by: cpu|memory
--quiet, -qfalsePrint only UUIDs
--coordinator-addr <IP>127.0.0.1Coordinator address
--coordinator-port <PORT>6013Coordinator port

Output columns: UUID, Name, Status, Nodes, CPU, Memory

dora logs

Show and follow logs of a dataflow and node.

dora logs [UUID_OR_NAME] [NODE] [OPTIONS]
FlagDefaultDescription
[UUID_OR_NAME]Dataflow UUID or name
[NODE]Node name (required unless --all-nodes)
--all-nodesfalseMerge logs from all nodes by timestamp
--tail <N>allShow last N lines
--follow, -ffalseStream new log entries
--localfalseRead from local out/ directory
--since <DURATION>Show logs newer than duration ago
--until <DURATION>Show logs older than duration ago
--level <LEVEL>stdoutMin log level
--log-format <FORMAT>prettyOutput format
--log-filter <FILTER>Per-node level overrides
--grep <PATTERN>Case-insensitive text search
--coordinator-addr <IP>127.0.0.1Coordinator address
--coordinator-port <PORT>6013Coordinator port

Filter pipeline: Read/Parse -> Time filters -> Grep -> Tail -> Display

Examples:

# Follow all nodes live
dora logs my-dataflow --all-nodes --follow

# Last 50 errors from a specific node
dora logs my-dataflow sensor --level error --tail 50

# Search logs from last 5 minutes
dora logs my-dataflow --all-nodes --since 5m --grep "timeout"

# Read local files (no coordinator needed)
dora logs --local --all-nodes --tail 100

# Post-mortem analysis: errors in time window
dora logs --local sensor --since 1h --until 30m --level error

Duration formats: 30 (seconds), 30s, 5m, 1h, 2d

dora inspect top (alias: dora top)

Real-time TUI monitor for node resource usage (like top).

dora inspect top [OPTIONS]
dora top [OPTIONS]
FlagDefaultDescription
--refresh-interval <SECONDS>2Update interval (min: 1)
--oncefalsePrint a single JSON snapshot and exit (for scripting/CI)
--coordinator-addr <IP>127.0.0.1Coordinator address
--coordinator-port <PORT>6013Coordinator port

Requires an interactive terminal (unless --once is used).

KeyAction
q / EscQuit
Up / kSelect previous node
Down / jSelect next node
nSort by node name
cSort by CPU
mSort by memory
rForce refresh

Columns: NODE, STATUS, DATAFLOW, PID, CPU%, MEMORY (MB), RESTARTS, QUEUE, NET TX, NET RX, I/O READ (MB/s), I/O WRITE (MB/s)

  • STATUS: Running, Restarting, Degraded (broken inputs), or Failed
  • RESTARTS: Current restart count per node
  • QUEUE: Pending messages in the node’s input queue
  • NET TX/RX: Cumulative cross-daemon network bytes sent/received via Zenoh

CPU values are per-core (can exceed 100% with multiple cores). Metrics come from daemons, so this works for distributed deployments.

Scripting example:

# JSON snapshot for CI/monitoring pipelines
dora top --once | jq '.[].cpu_usage'

dora topic list

List all topics (outputs) in a running dataflow.

dora topic list [OPTIONS]
FlagDefaultDescription
-d <DATAFLOW>, --dataflowinteractiveDataflow UUID or name
--format <FMT>tableOutput format: table|json

dora topic echo

Subscribe to topics and display messages in real-time.

dora topic echo [OPTIONS] [DATA...]
FlagDefaultDescription
-d <DATAFLOW>, --dataflowrequiredDataflow UUID or name
[DATA...]all outputsTopics to echo (e.g., node1/output)
--format <FMT>tableOutput format: table|json

Requires _unstable_debug.publish_all_messages_to_zenoh: true in the descriptor.

dora topic hz

Measure topic publish frequency with a TUI dashboard.

dora topic hz [OPTIONS] [DATA...]
FlagDefaultDescription
-d <DATAFLOW>, --dataflowrequiredDataflow UUID or name
[DATA...]all outputsTopics to measure
--window <SECONDS>10Sliding window (min: 1)

Requires an interactive terminal. Displays: Avg (ms), Avg (Hz), Min (ms), Max (ms), Std (ms), plus a rate sparkline and histogram for the selected topic.

dora topic info

Show detailed metadata of a single topic.

dora topic info [OPTIONS] DATA
FlagDefaultDescription
-d <DATAFLOW>, --dataflowrequiredDataflow UUID or name
DATArequiredSingle topic (e.g., camera/image)
--duration <SECONDS>5Collection duration (min: 1)

Subscribes to the topic for the specified duration and reports: type (Arrow schema), publisher, subscribers, message count, bandwidth.

dora node

Manage and inspect dataflow nodes.

dora node list
dora node list [OPTIONS]

Lists nodes in a running dataflow with their status, CPU, memory, and restart count.

Columns: NODE, STATUS, PID, CPU%, MEMORY (MB), RESTARTS, DATAFLOW

dora node info

Show detailed information about a specific node including status, inputs, outputs, and metrics.

dora node info <NODE> [OPTIONS]
FlagDefaultDescription
<NODE>requiredNode ID to inspect
-d <DATAFLOW>, --dataflowinteractiveDataflow UUID or name
-f <FORMAT>, --formattableOutput format: table|json
dora node restart

Restart a single node within a running dataflow. The daemon stops the node process and respawns it.

dora node restart <NODE> [OPTIONS]
FlagDefaultDescription
<NODE>requiredNode ID to restart
-d <DATAFLOW>, --dataflowinteractiveDataflow UUID or name
--grace <DURATION>Grace period before force-killing the node
dora node stop

Stop a single node within a running dataflow without stopping the entire dataflow.

dora node stop <NODE> [OPTIONS]
FlagDefaultDescription
<NODE>requiredNode ID to stop
-d <DATAFLOW>, --dataflowinteractiveDataflow UUID or name
--grace <DURATION>Grace period before force-killing the node

dora topic pub

Publish JSON data to a topic in a running dataflow. Requires publish_all_messages_to_zenoh: true.

dora topic pub <TOPIC> [DATA] [OPTIONS]
FlagDefaultDescription
<TOPIC>requiredTopic to publish to (format: node_id/output_id)
[DATA]JSON data to publish (required unless --file)
--file <PATH>Read data from a JSON file instead of command line
--count <N>1Number of messages to publish
-d <DATAFLOW>, --dataflowrequiredDataflow UUID or name

Examples:

# Publish a single value
dora topic pub -d my-app sensor/threshold '[42]'

# Publish from file, 10 times
dora topic pub -d my-app sensor/config --file config.json --count 10

dora param

Manage runtime parameters for nodes. Parameters are persisted in the coordinator store and optionally forwarded to running nodes.

dora param list

List all runtime parameters for a node.

dora param list <NODE> [OPTIONS]
FlagDefaultDescription
<NODE>requiredNode ID
-d <DATAFLOW>, --dataflowinteractiveDataflow UUID or name
--format <FMT>tableOutput format: table|json
dora param get

Get a single runtime parameter value.

dora param get <NODE> <KEY> [OPTIONS]
FlagDefaultDescription
<NODE>requiredNode ID
<KEY>requiredParameter key
-d <DATAFLOW>, --dataflowinteractiveDataflow UUID or name
dora param set

Set a runtime parameter. The value is JSON. The parameter is stored in the coordinator and forwarded to the node if it is running.

dora param set <NODE> <KEY> <VALUE> [OPTIONS]
FlagDefaultDescription
<NODE>requiredNode ID
<KEY>requiredParameter key (max 256 bytes)
<VALUE>requiredParameter value as JSON (max 64KB serialized)
-d <DATAFLOW>, --dataflowinteractiveDataflow UUID or name

Examples:

# Set a numeric parameter
dora param set -d my-app sensor threshold 42

# Set a string parameter
dora param set -d my-app camera resolution '"1080p"'

# Set a complex parameter
dora param set -d my-app detector config '{"confidence": 0.8, "nms": 0.5}'
dora param delete

Delete a runtime parameter.

dora param delete <NODE> <KEY> [OPTIONS]
FlagDefaultDescription
<NODE>requiredNode ID
<KEY>requiredParameter key
-d <DATAFLOW>, --dataflowinteractiveDataflow UUID or name

dora doctor

Diagnose environment, coordinator/daemon connectivity, and optionally validate a dataflow YAML.

dora doctor [OPTIONS]
FlagDefaultDescription
--dataflow <PATH>Path to a dataflow YAML to validate

Checks performed:

  1. Coordinator reachability
  2. Daemon connectivity
  3. Active dataflow status
  4. Dataflow YAML validation (if --dataflow provided)

Examples:

# Basic health check
dora doctor

# Check environment + validate a dataflow
dora doctor --dataflow dataflow.yml

dora trace list

List recent traces captured by the coordinator. The coordinator captures spans from dora_coordinator and dora_core crates in-memory (up to 4096 spans). No external tracing infrastructure required.

dora trace list [OPTIONS]
FlagDefaultDescription
--coordinator-addr <IP>127.0.0.1Coordinator address
--coordinator-port <PORT>6013Coordinator port

Output columns: TRACE ID (first 12 chars), ROOT SPAN, SPANS, STARTED, DURATION

Example:

dora trace list
TRACE ID      ROOT SPAN          SPANS  STARTED              DURATION
a1b2c3d4e5f6  spawn_dataflow     12     2026-03-01 10:30:05  1.234s
f8e7d6c5b4a3  build_dataflow     5      2026-03-01 10:29:58  0.500s

dora trace view

View spans for a specific trace as an indented tree. Supports prefix matching on trace IDs.

dora trace view <TRACE_ID> [OPTIONS]
Argument/FlagDefaultDescription
<TRACE_ID>requiredFull trace ID or unique prefix
--coordinator-addr <IP>127.0.0.1Coordinator address
--coordinator-port <PORT>6013Coordinator port

Example:

dora trace view a1b2c3d4
spawn_dataflow [INFO 1.234s] {build_id="abc", session_id="def"}
  build_dataflow [INFO 0.500s]
    download_node [DEBUG 0.200s] {url="..."}
  start_inner [INFO 0.734s]
    spawn_node [INFO 0.100s] {node_id="camera"}
    spawn_node [INFO 0.080s] {node_id="detector"}

Trace IDs are prefix-matched: if the prefix uniquely identifies a trace, it resolves automatically. If ambiguous, you’ll be prompted to use a longer prefix.


Setup Commands

dora status (alias: dora check)

Check system health and connectivity.

dora status [OPTIONS]

Reports coordinator connectivity, daemon status, and active dataflow count.

dora new

Generate a new project or node from templates.

dora new <NAME> [OPTIONS]
FlagDefaultDescription
<NAME>requiredProject or node name
--kind <KIND>dataflowdataflow|node
--lang <LANG>rustrust|python|c|cxx

dora expand

Expand module references in a dataflow and print the resulting flat YAML. Useful for debugging module composition.

dora expand <PATH> [OPTIONS]
FlagDefaultDescription
<PATH>requiredDataflow descriptor (or module file with --module)
--modulefalseValidate a standalone module file instead of a full dataflow

Examples:

# Expand a dataflow with modules
dora expand dataflow.yml

# Validate a module file
dora expand --module modules/navigation.module.yml

See the Modules Guide for full documentation on module composition.

dora graph

Visualize a dataflow as a graph.

dora graph <PATH> [OPTIONS]
FlagDefaultDescription
<PATH>requiredDataflow descriptor path
--mermaidfalseOutput Mermaid diagram text
--openfalseOpen HTML in browser

Without --mermaid, generates an interactive HTML file using mermaid.js. When outputs have type annotations, edge labels include the type name (e.g. image [Image]).

# Generate HTML
dora graph dataflow.yml --open

# Generate Mermaid for GitHub markdown
dora graph dataflow.yml --mermaid

dora validate

Validate a dataflow YAML file and check type annotations.

dora validate <PATH> [OPTIONS]
FlagDefaultDescription
<PATH>requiredDataflow descriptor path
--strict-typesfalseTreat warnings as errors (non-zero exit code for CI)

Checks:

  1. Key existence: output_types/input_types keys exist in the corresponding outputs/inputs lists
  2. URN resolution: All type URNs resolve in the standard or user-defined type library
  3. Edge compatibility: Connected edges have compatible types (exact match, widening, or user-defined rules)
  4. Parameterized types: Parameter mismatches (e.g. AudioFrame[sample_type=f32] vs AudioFrame[sample_type=i16])
  5. Timer auto-typing: Timer inputs are automatically typed as std/core/v1/UInt64
  6. Type inference: When only upstream annotates a type, it is inferred on the downstream input
  7. Metadata patterns: output_metadata keys and pattern shorthands are validated
  8. Schema compatibility: Struct types are checked at the field level (missing/wrong fields)

User-defined types in a types/ directory next to the dataflow are loaded automatically.

# Validate with warnings
dora validate dataflow.yml

# Strict mode for CI (exit 1 on warnings)
dora validate --strict-types dataflow.yml

See the Type Annotations Guide for the full type library and usage details.


Utility Commands

dora completion

Generate shell completion scripts.

dora completion [SHELL]

Shell is auto-detected if omitted. Supported: bash, zsh, fish, elvish, powershell.

# Bash
eval "$(dora completion bash)"
echo 'eval "$(dora completion bash)"' >> ~/.bashrc

# Zsh
eval "$(dora completion zsh)"
echo 'eval "$(dora completion zsh)"' >> ~/.zshrc

# Fish
dora completion fish > ~/.config/fish/completions/dora.fish

dora system

System management commands.

dora system status [OPTIONS]

Currently provides status as a subcommand (equivalent to dora status).


Self-Management Commands

dora self update

Check for and install CLI updates.

dora self update [--check-only]

Downloads from GitHub releases (dora-rs/dora).

dora self uninstall

Remove the CLI from the system.

dora self uninstall [--force]

Without --force, prompts for confirmation (requires a TTY). Tries uv pip uninstall first, then pip uninstall, then binary self-delete.


Environment Variables

All environment variables serve as fallbacks. CLI flags always take precedence.

VariableDefaultCommandsDescription
DORA_COORDINATOR_ADDR127.0.0.1All coordinator commandsCoordinator IP address
DORA_COORDINATOR_PORT6013All coordinator commandsCoordinator WebSocket port
DORA_LOG_LEVELstdoutrun, logsDefault minimum log level
DORA_LOG_FORMATprettyrun, logsDefault output format
DORA_LOG_FILTERrun, logsDefault per-node level overrides
DORA_ALLOW_SHELL_NODESrunEnable shell node execution
DORA_RUNTIME_TYPE_CHECKrun, startRuntime type checking: warn (log mismatches) or error (fail on mismatch). See Type Annotations
# Set defaults for a development session
export DORA_COORDINATOR_ADDR=192.168.1.10
export DORA_LOG_LEVEL=info
export DORA_LOG_FORMAT=compact

Architecture Guide

This section is for developers who want to understand the framework internals, extend it, or debug issues.

Communication Stack

                    ┌─────────────────────────────────────┐
                    │           CLI (dora)                │
                    │   WebSocket (JSON request/reply)     │
                    └─────────────┬───────────────────────┘
                                  │
                    ┌─────────────▼───────────────────────┐
                    │        Coordinator                   │
                    │   WebSocket control + daemon mgmt    │
                    │   State: InMemoryStore | RedbStore   │
                    └──┬──────────────────────────────┬───┘
                       │                              │
          ┌────────────▼──────────┐     ┌─────────────▼──────────┐
          │     Daemon A          │     │     Daemon B           │
          │  (machine: robot)     │     │  (machine: gpu-server) │
          │                       │     │                        │
          │  ┌─────┐  ┌─────┐    │     │  ┌──────┐  ┌───────┐  │
          │  │Node1│  │Node2│    │     │  │Node3 │  │Node4  │  │
          │  └──┬──┘  └──┬──┘    │     │  └──┬───┘  └───┬───┘  │
          │     │shmem    │shmem  │     │     │shmem      │shmem │
          │     └────┬────┘       │     │     └─────┬─────┘      │
          └──────────┼────────────┘     └───────────┼────────────┘
                     │                              │
                     └──────── Zenoh pub/sub ────────┘
                              (cross-machine)

Protocol Layers

LayerTransportFormatUse
CLI <-> CoordinatorWebSocketJSON (ControlRequest/Reply)Commands, log streaming
Coordinator <-> DaemonWebSocketJSON (DaemonCoordinatorEvent)Node lifecycle, metrics
Daemon <-> Node (small)TCP / Unix socketCustom binaryControl messages, small data
Daemon <-> Node (large)Shared memoryZero-copy ArrowData messages > 4KB
Daemon <-> DaemonZenoh pub/subArrow + metadataCross-machine data routing

Coordinator Internals

The coordinator is an event-driven async server:

Event Sources:
  - CLI WebSocket connections (ControlRequest)
  - Daemon WebSocket connections (DaemonEvent)
  - Heartbeat timer (3s interval)
  - External events (for embedding)

Event Loop:
  merge_all(cli_events, daemon_events, heartbeat, external)
    -> handle_event()
    -> update state
    -> persist to store (if redb)
    -> send replies

Key types:

#![allow(unused)]
fn main() {
// State
RunningDataflow { uuid, name, descriptor, daemons, node_metrics, ... }
RunningBuild    { build_id, errors, log_subscribers, pending_results, ... }
DaemonConnection { sender, pending_replies, last_heartbeat }

// Store trait
trait CoordinatorStore: Send + Sync {
    fn put_dataflow(&self, record: &DataflowRecord) -> Result<()>;
    fn get_dataflow(&self, uuid: &Uuid) -> Result<Option<DataflowRecord>>;
    fn list_dataflows(&self) -> Result<Vec<DataflowRecord>>;
    // ... daemon and build methods
}
}

Store backends:

  • memory (default): In-memory, lost on restart.
  • redb: Persistent to disk (~/.dora/coordinator.redb). Survives crashes. Requires redb-backend feature.
dora coordinator --store redb
dora coordinator --store redb:/custom/path.redb

Daemon Internals

The daemon manages node processes on a single machine:

Per Node:
  1. Build (if build command specified)
  2. Spawn process with DORA_NODE_CONFIG env var
  3. Node registers via TCP/shmem handshake
  4. Route inputs/outputs between nodes
  5. Collect metrics (CPU, memory, I/O)
  6. Handle restart policy on exit
  7. Forward logs to coordinator

Communication:
  - Shared memory for messages > 4KB (zero-copy)
  - TCP for control messages and small data
  - flume channels for internal event routing

Metrics collection:

#![allow(unused)]
fn main() {
struct NodeMetrics {
    pid: u32,
    cpu_usage: f32,      // per-core percentage
    memory_mb: f64,
    disk_read_mb_s: Option<f64>,
    disk_write_mb_s: Option<f64>,
    status: NodeStatus,  // Running | Restarting | Degraded | Failed
    restart_count: u32,
    pending_messages: u64,
}
}

Message Types

All inter-component messages are defined in libraries/message/:

#![allow(unused)]
fn main() {
// Node identification
struct NodeId(String);      // [a-zA-Z0-9_.-]
struct DataId(String);      // same validation
type DataflowId = uuid::Uuid;

// Data metadata
struct Metadata {
    timestamp: uhlc::Timestamp,    // hybrid logical clock
    type_info: ArrowTypeInfo,      // Arrow schema
    parameters: MetadataParameters, // custom key-value pairs
}

// Node events (daemon -> node)
enum NodeEvent {
    Stop,
    Reload { operator_id },
    Input { id, metadata, data },
    InputClosed { id },
    InputRecovered { id },
    NodeRestarted { id },
    AllInputsClosed,
}
}

Timestamping

Dora uses a Unified Hybrid Logical Clock (UHLC) for distributed causality. Every message carries a uhlc::Timestamp that preserves causal ordering across machines without synchronized clocks.

Zero-Copy Shared Memory

For large messages (> 4KB), the daemon uses shared memory regions:

  1. Sender node requests a shared memory slot from daemon
  2. Daemon allocates a region and returns the ID
  3. Sender writes Arrow data directly into shared memory
  4. Daemon notifies receiver node of the region ID
  5. Receiver reads directly from shared memory (zero-copy)
  6. Receiver sends a drop token when done

This achieves 10-17x lower latency than ROS2 for large payloads.


Writing Nodes

Rust Node

use dora_node_api::{DoraNode, Event, IntoArrow};
use dora_core::config::DataId;

fn main() -> eyre::Result<()> {
    let (mut node, mut events) = DoraNode::init_from_env()?;

    let output = DataId::from("result".to_owned());

    while let Some(event) = events.recv() {
        match event {
            Event::Input { id, metadata, data } => {
                // Process input data (Arrow array)
                let result: u64 = 42;
                node.send_output(
                    output.clone(),
                    metadata.parameters,
                    result.into_arrow(),
                )?;
            }
            Event::Stop(_) => break,
            Event::InputClosed { id } => {
                eprintln!("input {id} closed");
            }
            Event::InputRecovered { id } => {
                eprintln!("input {id} recovered");
            }
            _ => {}
        }
    }
    Ok(())
}

Cargo.toml:

[dependencies]
dora-node-api = { workspace = true }
eyre = "0.6"

Python Node

import pyarrow as pa
from dora import Node

node = Node()

for event in node:
    if event["type"] == "INPUT":
        # event["value"] is a PyArrow array
        values = event["value"].to_pylist()
        result = pa.array([sum(values)])
        node.send_output("result", result)
    elif event["type"] == "STOP":
        break

C Node

#include "node_api.h"

int main() {
    void *ctx = init_dora_context_from_env();
    // ... event loop using dora_next_event / dora_send_output
    free_dora_context(ctx);
    return 0;
}

Node Logging

Nodes can emit structured logs:

Rust:

#![allow(unused)]
fn main() {
// Via tracing (recommended)
tracing::info!("processing frame {}", frame_id);

// Via node API
node.log_info("processing complete");
node.log_with_fields("info", "reading", None, Some(&fields));
}

Python:

import logging
logging.info("processing frame %d", frame_id)

# Or via node API
node.log("info", "processing complete")

Writing Operators

Operators run in-process inside a shared runtime, avoiding process spawn overhead.

Rust Operator

#![allow(unused)]
fn main() {
use dora_operator_api::{register_operator, DoraOperator, DoraOutputSender, DoraStatus, Event};

#[register_operator]
#[derive(Default)]
pub struct MyOperator {
    counter: u32,
}

impl DoraOperator for MyOperator {
    fn on_event(
        &mut self,
        event: &Event,
        output_sender: &mut DoraOutputSender,
    ) -> Result<DoraStatus, String> {
        match event {
            Event::Input { id, data } => {
                self.counter += 1;
                output_sender.send(
                    "count".to_string(),
                    arrow::array::UInt32Array::from(vec![self.counter]),
                )?;
                Ok(DoraStatus::Continue)
            }
            Event::Stop => Ok(DoraStatus::Stop),
            _ => Ok(DoraStatus::Continue),
        }
    }
}
}

Cargo.toml:

[lib]
crate-type = ["cdylib"]

[dependencies]
dora-operator-api = { workspace = true }
arrow = "53"

Python Operator

nodes:
  - id: my-node
    operator:
      python: my_operator.py
      inputs:
        data: source/output
      outputs:
        - result
# my_operator.py
class Operator:
    def __init__(self):
        self.counter = 0

    def on_event(self, event, send_output):
        if event["type"] == "INPUT":
            self.counter += 1
            send_output("result", pa.array([self.counter]))

Distributed Deployments

Setup

# Machine A (coordinator + daemon)
dora up

# Machine B (daemon only, pointing to coordinator on Machine A)
dora daemon --interface 0.0.0.0 --coordinator-addr 192.168.1.10 --machine-id B

# Machine C (same)
dora daemon --interface 0.0.0.0 --coordinator-addr 192.168.1.10 --machine-id C

Dataflow with Machine Assignment

nodes:
  - id: camera
    _unstable_deploy:
      machine: robot
    path: ./camera-driver
    outputs:
      - frames

  - id: inference
    _unstable_deploy:
      machine: gpu-server
    path: ./ml-model
    inputs:
      frames: camera/frames
    outputs:
      - predictions

  - id: actuator
    _unstable_deploy:
      machine: robot
    path: ./actuator-driver
    inputs:
      commands: inference/predictions

Build and Start

# From any machine with coordinator access
dora build dataflow.yml       # distributed build on target machines
dora start dataflow.yml --name my-robot --attach

Monitor

# Resource usage across all machines
dora top

# Logs from any node regardless of machine
dora logs my-robot inference --follow

# List all dataflows
dora list

Coordinator Persistence

For production, use the redb store backend so the coordinator survives restarts:

dora coordinator --store redb

State is persisted to ~/.dora/coordinator.redb. On restart, stale dataflows are marked as failed and the coordinator resumes normal operation.

For managed cluster deployments (cluster.yml, SSH-based lifecycle, label scheduling, systemd services, rolling upgrades), see the Distributed Deployment Guide.


Troubleshooting

For a comprehensive debugging guide covering record/replay workflows, topic inspection, resource monitoring, and end-to-end debugging scenarios, see Debugging and Observability Guide.

Common Issues

“Could not connect to dora-coordinator”

  • Run dora up first, or check DORA_COORDINATOR_ADDR/DORA_COORDINATOR_PORT
  • Verify with dora status

“publish_all_messages_to_zenoh not enabled”

  • Use --debug flag: dora start dataflow.yml --debug or dora run dataflow.yml --debug
  • Or add to your dataflow YAML:
    _unstable_debug:
      publish_all_messages_to_zenoh: true
    
  • Required for topic echo, topic hz, topic info

dora top requires an interactive terminal”

  • These TUI commands need a real terminal (not piped output)
  • Same applies to topic hz

Node not receiving inputs

  • Check that output names match: source_node/output_id
  • Verify the source node lists the output in its outputs: array
  • Check dora topic list for available topics

Logs not appearing

  • Check --log-level setting (default stdout shows everything)
  • Check min_log_level in YAML (filters at source)
  • For distributed: verify coordinator/daemon connectivity

Build fails with git source

  • Verify git: URL is accessible
  • Check that branch, tag, or rev exists
  • Build command runs from the git repo root, not the dataflow directory

Debug Workflow

# 1. Full environment diagnosis
dora doctor --dataflow dataflow.yml

# 2. Start with verbose logging and debug topics
dora run dataflow.yml --log-level trace --debug

# 3. Inspect a specific node
dora node info -d my-dataflow problem-node

# 4. Monitor specific node logs
dora logs my-dataflow problem-node --follow --level debug

# 5. Check resource usage
dora top

# 6. Inspect topic data
dora topic echo -d my-dataflow problem-node/output

# 7. Publish test data to a topic
dora topic pub -d my-dataflow problem-node/input '[1, 2, 3]'

# 8. Measure frequencies
dora topic hz -d my-dataflow --window 5

# 9. View/modify runtime parameters
dora param list -d my-dataflow problem-node
dora param set -d my-dataflow problem-node threshold 42

# 10. Restart a misbehaving node without stopping the dataflow
dora node restart -d my-dataflow problem-node

# 11. View coordinator traces (no external infra needed)
dora trace list
dora trace view <trace-id-prefix>

# 12. Visualize dataflow graph
dora graph dataflow.yml --open

Log File Locations

out/
  <dataflow-uuid>/
    log_<node-id>.jsonl          # current log
    log_<node-id>.1.jsonl        # rotated (previous)
    log_<node-id>.2.jsonl        # rotated (older)

Read directly with:

dora logs --local --all-nodes
dora logs --local <node-name> --tail 50