Comparison

What's new in dora 1.0

dora 1.0 is a major release that adds production-grade features on top of dora's zero-copy core: services, actions, streaming, fault tolerance, cluster management, record/replay, dynamic topology, and comprehensive observability. Here's everything that's new.

Architecture

TCP to WebSocket, single to distributed

dora 0.x used a plain TCP protocol for coordinator-daemon communication and supported single-machine deployment. dora 1.0 moves coordinator-daemon communication to WebSocket, adds a multi-machine Zenoh data plane, and introduces persistent coordinator state for high availability.

Communication

From 1 pattern to 4

dora 0.x supported only topic-based pub/sub. dora 1.0 adds Service, Action, and Streaming patterns. They are expressed through well-known metadata keys, so the patterns themselves need no protocol or daemon changes.

Topic (pub/sub)

nodes:
  - id: sensor
    outputs: [data]
  - id: processor
    inputs:
      data: sensor/data
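
The inputs mapping above wires the processor's data input to the data output of sensor through a node/output reference. As a rough illustration of what a wiring check over such a dataflow could look like, the sketch below mirrors the YAML as a plain Python dictionary and reports inputs that point at a missing node or output. The function name check_wiring is hypothetical and not part of dora, and the sketch makes no claim about how dora validate is implemented.

```python
# Hypothetical wiring check over a dataflow shaped like the YAML above.
# A plain dict stands in for parsed YAML so no third-party package is needed.
dataflow = {
    "nodes": [
        {"id": "sensor", "outputs": ["data"]},
        {"id": "processor", "inputs": {"data": "sensor/data"}},
    ]
}


def check_wiring(dataflow):
    """Return a list of error strings for inputs with no matching output."""
    outputs = {
        node["id"]: set(node.get("outputs", [])) for node in dataflow["nodes"]
    }
    errors = []
    for node in dataflow["nodes"]:
        for input_name, source in node.get("inputs", {}).items():
            # A source reference has the form "<node id>/<output name>".
            source_node, _, source_output = source.partition("/")
            if source_node not in outputs:
                errors.append(
                    f"{node['id']}.{input_name}: unknown node '{source_node}'"
                )
            elif source_output not in outputs[source_node]:
                errors.append(
                    f"{node['id']}.{input_name}: "
                    f"node '{source_node}' has no output '{source_output}'"
                )
    return errors


print(check_wiring(dataflow))
```

For the two-node dataflow above the list of errors is empty. Renaming the sensor output without updating the processor input would produce one error.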

Service (request/reply)

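# dora 1.0: requests and responses are correlated through message metadata
# Client: send a request on the "request" output
node.send_service_request("request", params, data)
# Server: reply on the "response" output, passing the request's metadata
# parameters (including request_id) through unchanged
node.send_service_response("response", metadata.parameters, result)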
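
The correlation idea can be sketched without any dora API: the client attaches a unique request_id to the request metadata, the server copies that metadata onto its response, and the client uses the echoed id to match the response to the pending call. Everything below (ServiceClient, handle_request, the dictionary layout) is a hypothetical stand-in written in plain Python, not dora's implementation. Only the request_id key comes from the snippet above.

```python
import uuid


class ServiceClient:
    """Hypothetical client that tracks in-flight requests by request_id."""

    def __init__(self):
        self.pending = {}  # request_id -> original payload

    def make_request(self, payload):
        request_id = str(uuid.uuid4())
        self.pending[request_id] = payload
        # Metadata travels next to the payload.
        return {"metadata": {"request_id": request_id}, "data": payload}

    def on_response(self, response):
        request_id = response["metadata"]["request_id"]
        # pop() raises KeyError for a response that matches no pending request.
        original = self.pending.pop(request_id)
        return original, response["data"]


def handle_request(request):
    """Hypothetical server: doubles the payload and echoes the metadata."""
    return {"metadata": request["metadata"], "data": request["data"] * 2}


client = ServiceClient()
first = client.make_request(10)
second = client.make_request(20)

# Responses may arrive out of order; the echoed request_id still pairs them.
print(client.on_response(handle_request(second)))
print(client.on_response(handle_request(first)))
```

Because the server only passes the metadata through, it needs no knowledge of how the client tracks its pending calls.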

Action (goal/feedback/result)

# Goal with periodic feedback + final result
# Supports cancellation via cancel output
# Metadata keys: goal_id, goal_status
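
To make the goal lifecycle concrete, here is a minimal sketch of the messages one goal might produce: periodic feedback, then either a final result or a cancellation. The metadata keys goal_id and goal_status are taken from the comments above. The status values ("executing", "succeeded", "canceled"), the run_goal function, and the message layout are illustrative assumptions, not dora's actual values or API.

```python
import uuid


def run_goal(steps, cancel_after=None):
    """Yield hypothetical action messages for one goal.

    steps: number of work units to perform.
    cancel_after: optional step count after which a cancel request arrives.
    """
    goal_id = str(uuid.uuid4())
    for step in range(1, steps + 1):
        if cancel_after is not None and step > cancel_after:
            # A cancel request ends the goal without a result.
            yield {"goal_id": goal_id, "goal_status": "canceled", "data": None}
            return
        # Periodic feedback while the goal is still running.
        yield {"goal_id": goal_id, "goal_status": "executing", "data": step / steps}
    # Final result once every step has completed.
    yield {"goal_id": goal_id, "goal_status": "succeeded", "data": "done"}


for message in run_goal(steps=3):
    print(message["goal_status"], message["data"])

for message in run_goal(steps=3, cancel_after=1):
    print(message["goal_status"], message["data"])
```

Every message of a goal carries the same goal_id, which lets a client follow several goals at once.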

Streaming (session/segment/chunk)

# Real-time pipelines with interruption
# flush: true clears downstream queues
# Enables voice/video barge-in
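
The effect of flush: true on a downstream queue can be sketched in a few lines. The DownstreamQueue class and its push method are hypothetical and only model the behavior described in the comments above: a chunk marked with flush discards whatever is still queued, which is what makes barge-in possible.

```python
from collections import deque


class DownstreamQueue:
    """Hypothetical input queue of a downstream node."""

    def __init__(self):
        self.items = deque()

    def push(self, chunk, metadata=None):
        metadata = metadata or {}
        if metadata.get("flush"):
            # Barge-in: discard everything queued before this chunk.
            self.items.clear()
        self.items.append(chunk)


queue = DownstreamQueue()
for chunk in ["Hello,", "how", "are", "you"]:
    queue.push(chunk)

# The user interrupts: the first chunk of the new segment carries flush: true.
queue.push("Sorry,", {"flush": True})
queue.push("go ahead")

print(list(queue.items))
```

After the interruption only the chunks of the new segment remain, so stale audio or video is never played out.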

Features

Complete feature matrix

| Dimension | dora 1.0 | dora 0.x | ROS2 |
|---|---|---|---|
| Coordinator protocol | WebSocket (port 6013) | TCP | DDS |
| Data plane | Zenoh SHM + shared memory | Shared memory only | DDS serialization |
| Communication patterns | 4 (Topic, Service, Action, Streaming) | 1 (Topic only) | 4 (Topic, Service, Action, Timer) |
| Latency (SHM path) | ~500 µs p50 | ~500 µs p50 | 1-10 ms |
| Copy overhead | Zero (Apache Arrow) | Zero (Apache Arrow) | CDR serialization |
| Fault tolerance | Built-in (restart policies, health checks, circuit breakers) | None | Basic lifecycle |
| Record / Replay | Built-in (.dora files) | None | rosbag2 |
| Dynamic topology | Yes (add/remove/connect/disconnect at runtime) | No | Partial (lifecycle nodes) |
| Cluster management | Built-in SSH clusters with label scheduling | None | Manual or external tools |
| Observability | OpenTelemetry (logs, metrics, traces) | Basic logging | ROS2 logging + third-party |
| Real-time support | SCHED_FIFO + mlockall + CPU affinity | None | rclcpp Executor (partial) |
| Type annotations | Static validation via dora validate | None | IDL / .msg files |
| Reusable modules | YAML module composition with typed ports | None | ROS2 packages |
| Log management | Structured + rotation + routing + aggregation | Basic stdout | rosout |
| Resource monitoring | dora top TUI (CPU, mem, queues, network) | None | External tools |
| Topic inspection | topic echo / hz / info | None | ros2 topic echo / hz / info |
| Schema validation | dora validate with wiring checks | None | colcon build (compile-time) |
| WebSocket control | Full control + topic data channels | None | rosbridge (third-party) |
| Python CLI API | PyO3 native bindings | None | rclpy |
| Coordinator HA | Persistent redb state + daemon reconnect | None | N/A (no coordinator) |
| Source files | 772 | 488 | ~50,000+ |
| Examples | 37 | 20 | Hundreds (community) |
| CI templates | Reusable downstream CI workflows | None | N/A |

CLI

CLI commands side by side

dora 1.0 extends the dora CLI with record/replay, topic inspection, resource monitoring, dynamic topology, cluster management, parameter management, and more.

| Operation | dora 1.0 | dora 0.x |
|---|---|---|
| Start dataflow | dora start dataflow.yml | dora start dataflow.yml |
| Stop dataflow | dora stop | dora destroy |
| View logs | dora logs --node webcam | (stdout only) |
| Record messages | dora record | (not available) |
| Replay recording | dora replay recording.dora | (not available) |
| Live topic data | dora topic echo webcam/image | (not available) |
| Frequency monitor | dora topic hz webcam/image | (not available) |
| Resource monitor | dora top | (not available) |
| Add node at runtime | dora node add --path ./detector | (not available) |
| Cluster deploy | dora cluster up | (not available) |
| Validate dataflow | dora validate dataflow.yml | (not available) |
| View traces | dora trace list | (not available) |
| Parameter management | dora param get/set/delete | (not available) |
| Dataflow visualization | dora graph dataflow.yml | dora graph dataflow.yml |

Production

Production readiness

Fault Tolerance

Per-node restart policies (never/on-failure/always), exponential backoff, health monitoring, circuit breakers with configurable input timeouts.
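
As a rough sketch of how the three restart policies and exponential backoff fit together, the two helpers below decide whether a node should restart and how long to wait before each attempt. The function names and the base, factor, and cap parameters are assumptions chosen for illustration. dora's actual defaults and configuration keys may differ.

```python
def should_restart(policy, exit_code):
    """Decide whether a node restarts under never / on-failure / always."""
    if policy == "always":
        return True
    if policy == "on-failure":
        return exit_code != 0
    if policy == "never":
        return False
    raise ValueError(f"unknown restart policy: {policy}")


def backoff_delay(attempt, base=1.0, factor=2.0, cap=30.0):
    """Delay in seconds before restart number `attempt` (0-based), capped."""
    return min(cap, base * factor ** attempt)


print(should_restart("on-failure", exit_code=1))
print([backoff_delay(n) for n in range(7)])
```

With these illustrative parameters the delay doubles from 1 second up to the 30-second cap, which keeps a crash-looping node from consuming resources while still recovering quickly from a one-off failure.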

Coordinator HA

Persistent redb-backed state store. Daemons auto-reconnect with exponential backoff. Dataflow state reconstructed on coordinator restart.

Observability

OpenTelemetry integration for structured logging, metrics, and distributed tracing. Zero-setup trace viewing via CLI.

Record & Replay

Capture all messages to .dora files. Replay at any speed with node substitution for regression testing.
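
Replaying "at any speed" comes down to scaling the gaps between recorded timestamps. The sketch below assumes a recording is a list of (timestamp in seconds, message) pairs. That layout and the replay_schedule function are hypothetical and do not describe the .dora file format or the dora replay implementation.

```python
def replay_schedule(recorded, speed=1.0):
    """Return (delay_seconds, message) pairs for replaying at `speed`.

    recorded: list of (timestamp_seconds, message), sorted by timestamp.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    schedule = []
    previous = None
    for timestamp, message in recorded:
        # The delay before each message is the recorded gap scaled by speed.
        gap = 0.0 if previous is None else timestamp - previous
        schedule.append((gap / speed, message))
        previous = timestamp
    return schedule


recorded = [(0.0, "frame-0"), (0.5, "frame-1"), (1.5, "frame-2")]
print(replay_schedule(recorded, speed=2.0))
```

At speed 2.0 every gap is halved. A replay loop would sleep for each delay before publishing the message, and a regression test could skip the sleeps entirely.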

Cluster Management

SSH-based deployment with label scheduling, rolling upgrades, auto-recovery, and centralized management from the CLI.

Real-Time Support

Optional --rt flag for mlockall + SCHED_FIFO. Per-node CPU affinity pinning. Comprehensive tuning guide.

Upgrade

Upgrading from dora 0.x

Backward compatible for pub/sub

Existing topic-based dataflows work with minimal changes. The core pub/sub model is unchanged, YAML dataflow definitions follow the same structure, and your existing dora CLI commands keep working.

Incremental adoption

New features are additive. Add service patterns to specific nodes, enable fault tolerance per-node with restart_policy, or start recording with dora record. There is no need to rewrite your entire pipeline.

Same crates, new capabilities

Your existing dora-* crate dependencies keep working: the existing APIs of dora-node-api, dora-operator-api, and the rest are unchanged. dora 1.0 adds new APIs for services, actions, and streaming without breaking existing ones.