Dora Testing Guide

This guide explains how to run, write, and troubleshoot tests in the Dora workspace.

Quick Start (5-minute check)

Run these three commands to verify that the workspace is healthy:

# 1. Format check (~5s)
cargo fmt --all -- --check

# 2. Lint (~60s first run, cached after)
cargo clippy --all \
  --exclude dora-node-api-python \
  --exclude dora-operator-api-python \
  --exclude dora-ros2-bridge-python \
  -- -D warnings

# 3. Unit + integration tests (~90s first run)
cargo test --all \
  --exclude dora-node-api-python \
  --exclude dora-operator-api-python \
  --exclude dora-ros2-bridge-python

All three must pass before opening a PR. Python packages are excluded because they require maturin.
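
If you prefer to drive the same three checks from a Rust helper (for example a small `xtask`-style binary), the sketch below builds the argument lists shown above and runs them in order. The function names `workspace_args`, `quick_checks`, and `run_quick_checks` are hypothetical and not part of the Dora repository; only the cargo flags come from this guide.

```rust
use std::process::Command;

/// Python binding crates excluded in the commands above (they require maturin).
const PYTHON_CRATES: [&str; 3] = [
    "dora-node-api-python",
    "dora-operator-api-python",
    "dora-ros2-bridge-python",
];

/// Builds `<subcommand> --all --exclude ... [-- trailing...]` (hypothetical helper).
fn workspace_args(subcommand: &str, trailing: &[&str]) -> Vec<String> {
    let mut args = vec![subcommand.to_string(), "--all".to_string()];
    for krate in PYTHON_CRATES {
        args.push("--exclude".to_string());
        args.push(krate.to_string());
    }
    if !trailing.is_empty() {
        args.push("--".to_string());
        args.extend(trailing.iter().map(|s| s.to_string()));
    }
    args
}

/// The three quick-start checks, in the order listed above.
fn quick_checks() -> Vec<Vec<String>> {
    vec![
        ["fmt", "--all", "--", "--check"].iter().map(|s| s.to_string()).collect(),
        workspace_args("clippy", &["-D", "warnings"]),
        workspace_args("test", &[]),
    ]
}

/// Runs each check with `cargo`; returns Ok(false) at the first failing check.
fn run_quick_checks() -> std::io::Result<bool> {
    for args in quick_checks() {
        let status = Command::new("cargo").args(&args).status()?;
        if !status.success() {
            return Ok(false);
        }
    }
    Ok(true)
}
```

Keeping the exclude list in one constant avoids the clippy and test invocations drifting apart.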

Test Tiers

Tier	What it covers	Command	Speed
Format	Code style	`cargo fmt --all -- --check`	~5s
Lint	Warnings, correctness	`cargo clippy --all ...`	~60s
Unit	Individual functions	`cargo test --all ...`	~90s
CLI	Command parsing, validation	`cargo test -p dora-cli`	~5s
Integration	Node I/O via env vars	`cargo test --test example-tests`	~30s
Smoke	Full CLI lifecycle	`cargo test --test example-smoke -- --test-threads=1`	~3min
E2E	Multi-dataflow scenarios	`cargo test --test ws-cli-e2e -- --ignored --test-threads=1`	~2min
Fault tolerance	Restart policies, timeouts	`cargo test --test fault-tolerance-e2e`	~45s
Typos	Spelling	Install typos-cli, then `typos`	~2s

Crate	Test count	What’s tested
dora-arrow-convert	~26	Round-trip Arrow type conversions
dora-cli	~96	Command parsing, value parsers, log grep/filtering, JSON parsing, WebSocket client, cluster config
dora-coordinator	~24	WS control/daemon plane, health check, concurrent requests, artifact store, rate limiter, error sanitization
dora-coordinator-store	~10	In-memory and redb CRUD, schema versioning, persistence
dora-core	~8	Dataflow descriptor validation
dora-daemon	~2	Shlex argument parsing
dora-node-api	~10	Input tracking, service/action helpers (ID generation, send_service_request/response)
dora-log-utils	~11	Log parsing utilities
dora-message	~36	Common types, WS protocol, node/data IDs, metadata, auth tokens
ros2-bridge	~30	ROS2 message/service/action parsing

When adding a new CLI subcommand or value parser, add a corresponding test in the #[cfg(test)] module of the same file. For subcommand parsing, add a parse_ok call in binaries/cli/src/command/mod.rs. For value parsers, add tests in the file that defines the parser function.

Integration Tests (Node I/O)

File: tests/example-tests.rs

These tests run compiled node executables with pre-recorded inputs and compare outputs against expected baselines. No coordinator or daemon is needed.

cargo test --test example-tests

How it works:

Builds and runs a node crate (e.g., rust-dataflow-example-node)
Sets DORA_TEST_WITH_INPUTS to a JSON file with timed events
Sets DORA_TEST_NO_OUTPUT_TIME_OFFSET=1 for deterministic output
Compares JSONL output against tests/sample-inputs/expected-outputs-*.jsonl

Sample input/output files live in tests/sample-inputs/.
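
The environment-variable setup described above can be sketched with `std::process::Command`. This is an illustration, not the code in tests/example-tests.rs: the helper name `node_test_command` and its path parameters are hypothetical, and `DORA_TEST_WRITE_OUTPUTS_TO` is the output variable used elsewhere in this guide.

```rust
use std::path::Path;
use std::process::Command;

/// Prepares (but does not spawn) a node executable in test mode.
/// Hypothetical helper; the paths are supplied by the caller.
fn node_test_command(node_exe: &Path, inputs_json: &Path, outputs_jsonl: &Path) -> Command {
    let mut cmd = Command::new(node_exe);
    // Replay pre-recorded, timed events instead of connecting to a daemon.
    cmd.env("DORA_TEST_WITH_INPUTS", inputs_json)
        // Omit time offsets from the output so it can be compared across machines.
        .env("DORA_TEST_NO_OUTPUT_TIME_OFFSET", "1")
        // Write the JSONL output to a file that can be compared with a baseline.
        .env("DORA_TEST_WRITE_OUTPUTS_TO", outputs_jsonl);
    cmd
}
```

A test would then call `.status()` on the returned command and compare the output file with the matching baseline in tests/sample-inputs/.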

Smoke Tests

File: tests/example-smoke.rs

Two execution modes are tested for each applicable example:

Networked (dora up + dora start --detach + poll + dora stop + dora down): exercises the full coordinator/daemon WS control plane.
Local (dora run --stop-after): runs everything in-process, testing the single-process dataflow path.

# Must run single-threaded (shared coordinator port)
cargo test --test example-smoke -- --test-threads=1

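# Run a subset by name filter (substring match):
# smoke_rust selects the networked Rust dataflow tests, smoke_local selects all local tests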
cargo test --test example-smoke smoke_rust -- --test-threads=1
cargo test --test example-smoke smoke_local -- --test-threads=1
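
Cargo treats the trailing name as a substring filter on test names, so `smoke_rust` and `smoke_local` select different subsets of this file. The sketch below mimics that matching for a few test names taken from the tables in this section; the `selected` function is a hypothetical illustration, not part of cargo or Dora.

```rust
/// Returns the tests whose name contains `filter`, mirroring how
/// `cargo test <filter>` picks tests by substring.
fn selected<'a>(tests: &[&'a str], filter: &str) -> Vec<&'a str> {
    tests
        .iter()
        .copied()
        .filter(|name| name.contains(filter))
        .collect()
}

/// A few test names from tests/example-smoke.rs as listed in this guide.
const SAMPLE_TESTS: [&str; 4] = [
    "smoke_rust_dataflow",
    "smoke_rust_dataflow_url",
    "smoke_python_async",
    "smoke_local_python_async",
];
```

Note that `smoke_rust` does not match networked tests such as `smoke_python_async`; use a broader filter like `smoke_python` to run those.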

A bash script is also available for quick local validation:

./scripts/smoke-all.sh              # all examples
./scripts/smoke-all.sh --rust-only  # Rust examples only
./scripts/smoke-all.sh --python-only # Python examples only

Networked tests (16):

Test	Example	Timeout
`smoke_rust_dataflow`	rust-dataflow/dataflow.yml	30s
`smoke_rust_dataflow_dynamic`	rust-dataflow/dataflow_dynamic.yml	30s
`smoke_rust_dataflow_url`	rust-dataflow-url/dataflow.yml	30s
`smoke_benchmark`	benchmark/dataflow.yml	30s
`smoke_log_sink_file`	log-sink-file/dataflow.yml	30s
`smoke_log_sink_alert`	log-sink-alert/dataflow.yml	30s
`smoke_log_sink_tcp`	log-sink-tcp/dataflow.yml	30s
`smoke_python_dataflow`	python-dataflow/dataflow.yml	30s
`smoke_python_async`	python-async/dataflow.yaml	15s
`smoke_python_drain`	python-drain/dataflow.yaml	15s
`smoke_python_log`	python-log/dataflow.yaml	15s
`smoke_python_logging`	python-logging/dataflow.yml	15s
`smoke_python_multiple_arrays`	python-multiple-arrays/dataflow.yml	15s
`smoke_python_concurrent_rw`	python-concurrent-rw/dataflow.yml	15s
`smoke_service_example`	service-example/dataflow.yml	30s
`smoke_action_example`	action-example/dataflow.yml	30s

Local tests (9):

Test	Example	stop-after
`smoke_local_python_dataflow`	python-dataflow/dataflow.yml	30s
`smoke_local_python_async`	python-async/dataflow.yaml	10s
`smoke_local_python_drain`	python-drain/dataflow.yaml	10s
`smoke_local_python_log`	python-log/dataflow.yaml	10s
`smoke_local_python_logging`	python-logging/dataflow.yml	10s
`smoke_local_python_multiple_arrays`	python-multiple-arrays/dataflow.yml	10s
`smoke_local_python_concurrent_rw`	python-concurrent-rw/dataflow.yml	10s
`smoke_local_service_example`	service-example/dataflow.yml	10s
`smoke_local_action_example`	action-example/dataflow.yml	10s

Examples requiring special dependencies (webcam, CUDA, ROS2, C/C++ toolchain, multi-machine deploy) are not included in smoke tests.

E2E Tests (WebSocket CLI)

File: tests/ws-cli-e2e.rs

Two groups:

Non-ignored (fast): Start an in-process coordinator and test WsSession directly:

cargo test --test ws-cli-e2e

cli_list_empty – empty dataflow listing
cli_status_no_daemon – daemon connectivity check
cli_stop_nonexistent – error for missing dataflows
cli_multiple_requests_same_session – session reuse

Ignored (full stack): Use dora up with real nodes:

cargo test --test ws-cli-e2e -- --ignored --test-threads=1

e2e_start_list_stop – start, list, stop lifecycle
e2e_sequential_dataflows – two dataflows in sequence

Fault Tolerance Tests

File: tests/fault-tolerance-e2e.rs

These test restart policies and input timeouts using Daemon::run_dataflow directly (no CLI needed).

cargo test --test fault-tolerance-e2e

Tests:

restart_recovers_from_failure – node with restart_policy: on-failure survives panics (15s)
max_restarts_limit_reached – node exhausts max_restarts: 2 budget (15s)
input_timeout_closes_stale_input – input_timeout: 2.0s fires when upstream stops (10s)

Dataflow YAMLs for these tests live in tests/dataflows/.

Coordinator Integration Tests

Files: binaries/coordinator/tests/ws_control_tests.rs, binaries/coordinator/tests/ws_daemon_tests.rs

These start an in-process coordinator and test the WebSocket control/daemon planes.

cargo test -p dora-coordinator

Topics covered: health check, list/stop/destroy requests, invalid JSON/params, concurrent requests, ping/pong, daemon registration, disconnect cleanup, error sanitization (no internal chain leaks), artifact store cleanup on drop.

CI Pipeline

CI runs on push/PR to main. See .github/workflows/ci.yml.

fmt  ──────────────┐
clippy ────────────┤ (all run in parallel)
test ──────────────┤
typos ─────────────┘
                   │
              e2e (depends on test)

Job	Runner	What runs
fmt	ubuntu-latest	`cargo fmt --all -- --check`
clippy	ubuntu-latest	`cargo clippy --all ... -- -D warnings`
test	ubuntu-latest	`cargo test --all ...` (excl. Python + dora-examples)
e2e	ubuntu-latest	example-tests, fault-tolerance, smoke tests, WS E2E
typos	ubuntu-latest	`crate-ci/typos@master`

The e2e job only runs after test passes. All other jobs run in parallel.

Writing New Tests

Unit tests

Add a #[cfg(test)] module in the same file as the code under test:

#[cfg(test)]
mod tests {
    use super::*;

    // `parse` and `expected` are placeholders for the function under test
    // and the value it should return.
    #[test]
    fn parses_valid_input() {
        let result = parse("valid");
        assert_eq!(result, expected);
    }
}

Integration tests for nodes

Use the integration testing framework in dora-node-api. Three approaches:

1. setup_integration_testing (recommended)

Call before the node’s main function to inject inputs and capture outputs:

#[test]
fn test_main_function() -> eyre::Result<()> {
    let events = vec![
        TimedIncomingEvent {
            time_offset_secs: 0.01,
            event: IncomingEvent::Input {
                id: "tick".into(),
                metadata: None,
                data: None,
            },
        },
        TimedIncomingEvent {
            time_offset_secs: 0.055,
            event: IncomingEvent::Stop,
        },
    ];
    let inputs = TestingInput::Input(
        IntegrationTestInput::new("node_id".parse().unwrap(), events),
    );
    let (tx, rx) = flume::unbounded();
    let outputs = TestingOutput::ToChannel(tx);
    let options = TestingOptions { skip_output_time_offsets: true };

    integration_testing::setup_integration_testing(inputs, outputs, options);
    crate::main()?;

    let outputs = rx.try_iter().collect::<Vec<_>>();
    assert_eq!(outputs, expected_outputs);
    Ok(())
}

2. Environment variable mode

Test the compiled executable directly, closest to production behavior:

DORA_TEST_WITH_INPUTS=path/to/inputs.json \
DORA_TEST_NO_OUTPUT_TIME_OFFSET=1 \
DORA_TEST_WRITE_OUTPUTS_TO=/tmp/out.jsonl \
cargo run -p my-node

3. DoraNode::init_testing

For testing node logic without going through main:

let (node, events) = DoraNode::init_testing(inputs, outputs, Default::default())?;

Generating test input files

Record real dataflow events by setting DORA_WRITE_EVENTS_TO:

DORA_WRITE_EVENTS_TO=/tmp/recorded-events dora run examples/rust-dataflow/dataflow.yml

This writes inputs-{node_id}.json files that can be used directly with DORA_TEST_WITH_INPUTS.
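
When several nodes are recorded at once, it can help to map each recorded file back to its node. The sketch below relies only on the `inputs-{node_id}.json` naming stated above; the helper name `node_id_from_inputs_file` is hypothetical.

```rust
/// Extracts the node ID from a recorded file name of the form
/// `inputs-{node_id}.json`. Returns None for any other name.
fn node_id_from_inputs_file(file_name: &str) -> Option<&str> {
    let id = file_name.strip_prefix("inputs-")?.strip_suffix(".json")?;
    // An empty ID means the name was just "inputs-.json", which is not a recording.
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}
```

For example, a file named `inputs-rust-node.json` maps to the node ID `rust-node`, and that file's path is what you would pass to `DORA_TEST_WITH_INPUTS`.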

Workspace-level integration tests

Add new test files in the tests/ directory. For tests that need the full CLI stack, follow the patterns in tests/example-smoke.rs:

Networked pattern (exercises coordinator + daemon):

Build nodes with Once guards (avoid rebuilding per test)
Clean up stale processes with dora down
Start cluster with dora up
Run dataflow with dora start --detach
Poll dora list --json for completion
Clean up with dora stop --all and dora down
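
The polling step in the networked pattern is essentially a loop with a deadline. The following is a minimal sketch of that idea, not the helper used in tests/example-smoke.rs: `poll_until` is a hypothetical name, and the completion check is left as a closure because the shape of the `dora list --json` output is not described in this guide.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `is_done` every `interval` until it returns true or `timeout` elapses.
/// Returns true if the condition was met before the deadline.
fn poll_until(timeout: Duration, interval: Duration, mut is_done: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if is_done() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

In a real test, the closure would run `dora list --json` and report whether the dataflow has finished; the timeout would correspond to the per-test values in the networked table (15s or 30s).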

Local pattern (single-process, in-process coordinator):

Build CLI with Once guard
Run dora run <yaml> --stop-after <duration>
Assert exit code is success
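
The local pattern reduces to a single process invocation. The sketch below prepares that command; `local_run_command` is a hypothetical helper, and passing the duration as a string such as `10s` is an assumption based on the stop-after column in the local tests table.

```rust
use std::process::Command;

/// Prepares `dora run <yaml> --stop-after <duration>` without spawning it.
/// `dora_bin` is the path to the CLI binary built earlier in the test.
fn local_run_command(dora_bin: &str, dataflow_yaml: &str, stop_after: &str) -> Command {
    let mut cmd = Command::new(dora_bin);
    cmd.args(["run", dataflow_yaml, "--stop-after", stop_after]);
    cmd
}
```

A test would finish with something like `assert!(cmd.status()?.success())` to cover the final step of the pattern.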

Conventions

Use assert2::assert! for better error messages (available as dev-dependency)
Use tempfile::NamedTempFile for temporary output files
E2E tests that need exclusive port access should be #[ignore] and run with --test-threads=1
Async tests use #[tokio::test(flavor = "multi_thread")]
Fault tolerance test dataflows go in tests/dataflows/
Sample input/output baselines go in tests/sample-inputs/

Troubleshooting

`cargo test` fails to compile Python packages

Always exclude Python packages:

cargo test --all \
  --exclude dora-node-api-python \
  --exclude dora-operator-api-python \
  --exclude dora-ros2-bridge-python

Smoke/E2E tests fail with “address already in use”

A stale coordinator or daemon is still running. Clean up:

dora down
# or kill processes manually:
pkill -f dora-coordinator
pkill -f dora-daemon
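
To confirm that the port really is free again after cleanup, a test can try to bind it. This is a minimal sketch using only the standard library; `port_is_free` is a hypothetical helper, and the port number is left as a parameter because this guide does not state which port the coordinator uses.

```rust
use std::net::TcpListener;

/// Returns true if a listener can be bound on `127.0.0.1:port` right now.
/// The probe listener is dropped immediately, releasing the port again.
fn port_is_free(port: u16) -> bool {
    TcpListener::bind(("127.0.0.1", port)).is_ok()
}
```

The check is inherently racy (another process can take the port right after it returns), so treat it as a diagnostic aid rather than a guarantee.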

Smoke tests hang or time out

Increase the timeout in the test if your machine is slow (look for Duration::from_secs(...))

Check that example nodes build successfully:

cargo build -p rust-dataflow-example-node -p rust-dataflow-example-status-node \
  -p rust-dataflow-example-sink -p rust-dataflow-example-sink-dynamic
cargo build -p log-sink-file -p log-sink-alert -p log-sink-tcp
cargo build --release -p benchmark-example-node -p benchmark-example-sink

For Python smoke tests, ensure pyarrow and numpy are installed

E2E tests fail when run in parallel

Smoke and ignored E2E tests must run single-threaded:

cargo test --test example-smoke -- --test-threads=1
cargo test --test ws-cli-e2e -- --ignored --test-threads=1

Integration test output doesn’t match expected

Check that DORA_TEST_NO_OUTPUT_TIME_OFFSET=1 is set (time offsets vary per machine)

Regenerate baselines if the node’s behavior intentionally changed:

DORA_TEST_WITH_INPUTS=tests/sample-inputs/inputs-rust-node.json \
DORA_TEST_NO_OUTPUT_TIME_OFFSET=1 \
DORA_TEST_WRITE_OUTPUTS_TO=tests/sample-inputs/expected-outputs-rust-node.jsonl \
cargo run -p rust-dataflow-example-node
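
Before regenerating a baseline, it is often useful to see where the outputs first diverge. The sketch below compares two JSONL documents line by line; it is an illustration with a hypothetical function name (`first_mismatch`), and it compares lines as plain text, which may be stricter than the comparison the test itself performs.

```rust
/// Returns the 1-based number of the first line that differs between
/// `actual` and `expected`, or None when they match line for line.
/// A missing line on either side counts as a difference.
fn first_mismatch(actual: &str, expected: &str) -> Option<usize> {
    let mut actual_lines = actual.lines();
    let mut expected_lines = expected.lines();
    let mut line_no = 0;
    loop {
        line_no += 1;
        match (actual_lines.next(), expected_lines.next()) {
            (None, None) => return None,
            (Some(a), Some(e)) if a == e => continue,
            _ => return Some(line_no),
        }
    }
}
```

Reading both files with `std::fs::read_to_string` and passing the contents to this function points at the first event whose output changed.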

Typos check fails

The typos config is in _typos.toml. To add a false-positive exclusion:

[default.extend-identifiers]
MyCustomIdent = "MyCustomIdent"

Tests pass locally but fail in CI

CI runs on Ubuntu; check for platform-specific assumptions (paths, process signals)
CI uses rust-cache so dependency versions may differ from your local lockfile
Ensure cargo fmt --all -- --check passes (CI enforces this)
