Comparison

What's new in dora 1.0

dora 1.0 is a major release that adds production-grade features on top of dora's zero-copy core: services, actions, streaming, fault tolerance, cluster management, record/replay, dynamic topology, and comprehensive observability. Here's everything that's new.

Architecture

TCP to WebSocket, single to distributed

dora 0.x used a raw TCP protocol for coordinator-daemon communication and supported only single-machine deployment. dora 1.0 moves that control channel to WebSocket, adds a Zenoh data plane for multi-machine deployment, and introduces persistent coordinator state for high availability.

Communication

From 1 pattern to 4

dora 0.x supported only topic-based pub/sub. dora 1.0 adds Service, Action, and Streaming patterns, all expressed through well-known metadata keys on ordinary messages, so no protocol or daemon changes are needed.

Topic (pub/sub)

nodes:
  - id: sensor
    outputs: [data]
  - id: processor
    inputs:
      data: sensor/data
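
The `inputs` mapping above wires the `data` input of `processor` to the `data` output of `sensor` through a `node/output` reference. As a rough illustration of what a wiring check on such references might involve, here is a minimal Python sketch with the same dataflow written as a dict. The function name `find_unwired_inputs` is hypothetical and not part of dora, and the sketch does not handle any built-in sources.

```python
# Hypothetical wiring check over a dataflow description; not dora's code.
dataflow = {
    "nodes": [
        {"id": "sensor", "outputs": ["data"]},
        {"id": "processor", "inputs": {"data": "sensor/data"}},
    ]
}


def find_unwired_inputs(flow):
    """Return (node_id, input_name, source) for inputs with no matching output."""
    declared = {
        (node["id"], output)
        for node in flow["nodes"]
        for output in node.get("outputs", [])
    }
    problems = []
    for node in flow["nodes"]:
        for input_name, source in node.get("inputs", {}).items():
            # "sensor/data" splits into node id "sensor" and output "data".
            source_node, _, source_output = source.partition("/")
            if (source_node, source_output) not in declared:
                problems.append((node["id"], input_name, source))
    return problems


print(find_unwired_inputs(dataflow))  # []
```

An empty list means every input points at an output that some node declares.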

Service (request/reply)

# dora 1.0 — metadata-based correlation
node.send_service_request("request", params, data)
# Server passes through request_id
node.send_service_response("response", metadata.parameters, result)
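
To make the correlation idea concrete, the following is a self-contained Python sketch that does not use the dora API. It assumes only what the snippet above states: the request carries a `request_id` in its metadata and the server passes that metadata through. The helper names (`make_request`, `handle_request`, `match_response`) and the `req-N` id format are hypothetical.

```python
import itertools

_ids = itertools.count(1)
pending = {}  # request_id -> payload of the request still awaiting a reply


def make_request(payload):
    """Client side: attach a fresh request_id to the outgoing metadata."""
    request_id = f"req-{next(_ids)}"
    pending[request_id] = payload
    return {"request_id": request_id}, payload


def handle_request(metadata, payload):
    """Server side: compute a result and pass the metadata through unchanged."""
    return metadata, payload * 2


def match_response(metadata, result):
    """Client side: pair the response with the request that caused it."""
    original = pending.pop(metadata["request_id"])
    return original, result


meta_a, data_a = make_request(10)
meta_b, data_b = make_request(20)
# Responses may arrive out of order; request_id still pairs them correctly.
reply_b = handle_request(meta_b, data_b)
reply_a = handle_request(meta_a, data_a)
print(match_response(*reply_b))  # (20, 40)
print(match_response(*reply_a))  # (10, 20)
```

Because the id travels in metadata rather than in the payload, the server needs no knowledge of the client's bookkeeping.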

Action (goal/feedback/result)

# Goal with periodic feedback + final result
# Supports cancellation via cancel output
# Metadata keys: goal_id, goal_status
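
The goal/feedback/result flow can be pictured as a stream of messages that all carry the same `goal_id` and a changing `goal_status`. The Python sketch below is an illustration only: the key names come from the comment above, but the status strings (`running`, `succeeded`, `canceled`) and the `run_goal` function are assumptions, not dora's actual values or API.

```python
def run_goal(goal_id, steps, cancel_after=None):
    """Yield (metadata, payload) messages for a single goal.

    Status strings are illustrative assumptions, not dora's values.
    """
    for step in range(1, steps + 1):
        if cancel_after is not None and step > cancel_after:
            # A cancel request arrived: stop early and report it.
            yield {"goal_id": goal_id, "goal_status": "canceled"}, None
            return
        # Periodic feedback: fraction of the work completed so far.
        yield {"goal_id": goal_id, "goal_status": "running"}, step / steps
    # Final result once every step has completed.
    yield {"goal_id": goal_id, "goal_status": "succeeded"}, "done"


for metadata, payload in run_goal("goal-1", steps=3):
    print(metadata["goal_status"], payload)
```

Passing `cancel_after=1` ends the same goal after one feedback message with a `canceled` status instead of a result.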

Streaming (session/segment/chunk)

# Real-time pipelines with interruption
# flush: true clears downstream queues
# Enables voice/video barge-in
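
A toy model of the flush behavior may help: when a message marked `flush: true` arrives, whatever is still queued downstream is dropped so that stale audio or video is never played. The Python sketch below is not dora code. The `InputQueue` class, the `session` key, and the choice to keep the flushing message itself in the queue are all assumptions made for illustration.

```python
from collections import deque


class InputQueue:
    """Toy downstream queue that honors a flush marker in message metadata."""

    def __init__(self):
        self.items = deque()

    def push(self, metadata, chunk):
        # Assumption: a message carrying flush=True discards everything that
        # is still waiting, then is queued itself as the first fresh item.
        if metadata.get("flush"):
            self.items.clear()
        self.items.append((metadata, chunk))


queue = InputQueue()
queue.push({"session": 1}, "Hello, how can")
queue.push({"session": 1}, "I help you today?")
# The user interrupts (barge-in): the new session starts with a flush.
queue.push({"session": 2, "flush": True}, "Sure, stopping.")
print([chunk for _, chunk in queue.items])  # ['Sure, stopping.']
```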

Features

Complete feature matrix

| Dimension | dora 1.0 | ROS2 |
| --- | --- | --- |
| Coordinator protocol | WebSocket (port 6013) | DDS |
| Data plane | Zenoh SHM + shared memory | DDS serialization |
| Communication patterns | 4 (Topic, Service, Action, Streaming) | 4 (Topic, Service, Action, Timer) |
| Latency (SHM path) | ~500 µs p50 | 1-10 ms |
| Copy overhead | Zero (Apache Arrow) | CDR serialization |
| Fault tolerance | Built-in (restart policies, health checks, circuit breakers) | Basic lifecycle |
| Record / Replay | Built-in (.drec files) | rosbag2 |
| Dynamic topology | Yes (add/remove/connect/disconnect at runtime) | Partial (lifecycle nodes) |
| Cluster management | Built-in SSH clusters with label scheduling | Manual or external tools |
| Observability | OpenTelemetry (logs, metrics, traces) | ROS2 logging + third-party |
| Real-time support | SCHED_FIFO + mlockall + CPU affinity | rclcpp Executor (partial) |
| Type annotations | Static validation via dora validate | IDL / .msg files |
| Reusable modules | YAML module composition with typed ports | ROS2 packages |
| Log management | Structured + rotation + routing + aggregation | rosout |
| Resource monitoring | dora top TUI (CPU, mem, queues, network) | External tools |
| Topic inspection | topic echo / hz / info | ros2 topic echo / hz / info |
| Schema validation | dora validate with wiring checks | colcon build (compile-time) |
| WebSocket control | Full control + topic data channels | rosbridge (third-party) |
| Python CLI API | PyO3 native bindings | rclpy |
| Coordinator HA | Persistent redb state + daemon reconnect | N/A (no coordinator) |
| Source files | 772 | ~50,000+ |
| Examples | 45 | Hundreds (community) |
| CI templates | Reusable downstream CI workflows | N/A |

CLI

CLI commands side by side

dora 1.0 extends the dora CLI with record/replay, topic inspection, resource monitoring, dynamic topology, cluster management, parameter management, and more.

| Operation | dora 1.0 |
| --- | --- |
| Start dataflow | dora start dataflow.yml |
| Stop dataflow | dora stop |
| View logs | dora logs --node webcam |
| Record messages | dora record |
| Replay recording | dora replay recording.drec |
| Live topic data | dora topic echo webcam/image |
| Frequency monitor | dora topic hz webcam/image |
| Resource monitor | dora top |
| Add node at runtime | dora node add --from-yaml node.yml |
| Cluster deploy | dora cluster up |
| Validate dataflow | dora validate dataflow.yml |
| View traces | dora trace list |
| Parameter management | dora param get/set/delete |
| Dataflow visualization | dora graph dataflow.yml |

Production

Production readiness

Fault Tolerance

Per-node restart policies (never/on-failure/always), exponential backoff, health monitoring, circuit breakers with configurable input timeouts.
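
The three restart policies differ only in how they treat a node's exit status. A minimal Python sketch of that decision follows; it assumes that "failure" means a non-zero exit code, and `should_restart` is a hypothetical helper rather than dora's implementation.

```python
def should_restart(policy, exit_code):
    """Decide whether a node that exited with exit_code should be restarted."""
    if policy == "never":
        return False
    if policy == "on-failure":
        # Assumption: any non-zero exit code counts as a failure.
        return exit_code != 0
    if policy == "always":
        return True
    raise ValueError(f"unknown restart policy: {policy!r}")


for policy in ("never", "on-failure", "always"):
    print(policy, should_restart(policy, 0), should_restart(policy, 1))
```

In practice such a decision would be combined with a growing delay between attempts so that a crashing node does not restart in a tight loop.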

Coordinator HA

Persistent redb-backed state store. Daemons auto-reconnect with exponential backoff. Dataflow state reconstructed on coordinator restart.
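
Exponential backoff means each failed reconnect attempt waits longer than the previous one, usually up to a ceiling. The Python sketch below shows the general technique; the base delay, growth factor, cap, and attempt count are illustrative assumptions, not dora's actual settings.

```python
def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=8):
    """Yield the wait time in seconds before each reconnect attempt."""
    delay = base
    for _ in range(attempts):
        # Never wait longer than the cap, however many attempts have failed.
        yield min(delay, cap)
        delay *= factor


print(list(backoff_delays()))
# [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Real implementations often add random jitter to each delay so that many daemons do not reconnect at the same instant.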

Observability

OpenTelemetry integration for structured logging, metrics, and distributed tracing. Zero-setup trace viewing via CLI.

Record & Replay

Capture all messages to .drec files. Replay at any speed with node substitution for regression testing.
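
Replaying "at any speed" amounts to scaling the gaps between recorded timestamps. The Python sketch below illustrates the arithmetic under the assumption that each recorded message has a timestamp in seconds; `replay_schedule` is a hypothetical helper and says nothing about the actual .drec format.

```python
def replay_schedule(timestamps, speed=1.0):
    """Return the delay in seconds to wait before emitting each message."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    delays = []
    previous = timestamps[0] if timestamps else 0.0
    for ts in timestamps:
        # The original gap between messages shrinks or grows with the speed.
        delays.append((ts - previous) / speed)
        previous = ts
    return delays


recorded = [0.0, 0.5, 1.5, 2.0]
print(replay_schedule(recorded, speed=2.0))  # [0.0, 0.25, 0.5, 0.25]
```

At `speed=2.0` a two-second recording replays in one second while preserving the relative spacing of messages.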

Cluster Management

SSH-based deployment with label scheduling, rolling upgrades, auto-recovery, and centralized management from the CLI.
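
Label scheduling generally means that a node states the labels it needs and is placed only on machines that carry them. The Python sketch below shows that matching step. The machine names, label keys, and the `eligible_machines` function are hypothetical, and dora's own matching rules may differ.

```python
# Hypothetical cluster inventory: machine name -> labels.
machines = {
    "robot-arm": {"arch": "arm64", "gpu": "no"},
    "workstation": {"arch": "x86_64", "gpu": "yes"},
}


def eligible_machines(required, inventory):
    """Return the machines whose labels satisfy every required key/value."""
    return sorted(
        name
        for name, labels in inventory.items()
        if all(labels.get(key) == value for key, value in required.items())
    )


print(eligible_machines({"gpu": "yes"}, machines))  # ['workstation']
```

A node with no label requirements matches every machine, since there is nothing to fail.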

Real-Time Support

Optional --rt flag for mlockall + SCHED_FIFO. Per-node CPU affinity pinning. Comprehensive tuning guide.

Upgrade

Upgrading from dora 0.x

Backward compatible for pub/sub

Existing topic-based dataflows work with minimal changes. The core pub/sub model is unchanged, and YAML dataflow definitions follow the same structure — your existing dora CLI commands keep working.

Incremental adoption

New features are additive. Add service patterns to specific nodes, enable fault tolerance per-node with restart_policy, start recording with dora record. No need to rewrite your entire pipeline.

Same crates, new capabilities

Your existing dora-* crate dependencies keep working: the existing APIs in dora-node-api, dora-operator-api, and the rest are unchanged. dora 1.0 adds new APIs for services, actions, and streaming without breaking them.