fix(cluster): sign DAG actions on cluster nodes + deterministic genesis (#99) #100

Merged: ApiliumDevTeam merged 1 commit into main from fix/cluster-dag-signing on Jun 22, 2026
@ApiliumDevTeam

Contributor

Summary

Fixes #99 — cluster DAG writes were rejected with 500 — "DagAction rejected: missing Ed25519 signature", failing the cluster integration tests.

Two root causes, both fixed:

  1. Signing setup was binary-only. The DAG enable + author + Ed25519 signing-key initialization lived only in main(). Library/test callers that go through cluster_init::init_cluster (bypassing main) got an un-enabled, unsigned DAG, so cluster writes were emitted unsigned and rejected by the Raft state machine. → Extracted into cluster_init::ensure_dag_ready, called from init_cluster and from main (standalone mode only, to avoid double init).

  2. Genesis hash diverged across nodes (exposed once root cause 1 was fixed). Each node built its DAG genesis with a wall-clock Utc::now() timestamp, so genesis hashes differed per node. A follower then failed to validate the parent of a replicated action (the leader's genesis), because DagStore::put requires every parent to exist locally, so the triple never materialized on followers. → Made the genesis deterministic (fixed epoch timestamp) so every fresh node computes the same genesis hash and cross-node replication validates. Existing persistent DAGs are unaffected (init_or_migrate is a no-op once actions exist).
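
To illustrate the second root cause, here is a minimal, std-only sketch of why a wall-clock genesis breaks parent validation on a follower and why a fixed timestamp repairs it. Everything in it is a hypothetical stand-in: `Genesis`, `MiniDag`, `genesis_hash`, and the FNV-1a hash are assumptions made for illustration and are not the aingle_graph types or its real content hash.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a DAG genesis action (not the aingle_graph type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Genesis {
    pub timestamp_secs: i64,
    pub payload: &'static str,
}

/// Fixed timestamp (Unix epoch) shared by every fresh node.
pub const GENESIS_EPOCH_SECS: i64 = 0;

/// FNV-1a (64-bit): a simple deterministic hash, standing in for the real content hash.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Hash of the genesis content: timestamp bytes followed by the payload.
pub fn genesis_hash(g: &Genesis) -> u64 {
    let mut buf = g.timestamp_secs.to_be_bytes().to_vec();
    buf.extend_from_slice(g.payload.as_bytes());
    fnv1a64(&buf)
}

/// New behaviour: identical content, and therefore an identical hash, on every node.
pub fn deterministic_genesis() -> Genesis {
    Genesis { timestamp_secs: GENESIS_EPOCH_SECS, payload: "genesis" }
}

/// Old behaviour: the timestamp comes from each node's clock
/// (passed in as a parameter so the sketch stays reproducible).
pub fn wall_clock_genesis(now_secs: i64) -> Genesis {
    Genesis { timestamp_secs: now_secs, payload: "genesis" }
}

/// Minimal store that rejects any action whose parents are not present locally,
/// mirroring the rule the PR describes for DagStore::put.
pub struct MiniDag {
    known: HashSet<u64>,
}

impl MiniDag {
    pub fn with_genesis(g: &Genesis) -> Self {
        let mut known = HashSet::new();
        known.insert(genesis_hash(g));
        MiniDag { known }
    }

    pub fn put(&mut self, hash: u64, parents: &[u64]) -> Result<(), String> {
        if let Some(missing) = parents.iter().find(|p| !self.known.contains(*p)) {
            return Err(format!("missing parent {missing:016x}"));
        }
        self.known.insert(hash);
        Ok(())
    }
}
```

In this sketch, a follower created with `MiniDag::with_genesis(&deterministic_genesis())` accepts an action whose parent is the leader's genesis hash, while a follower created from `wall_clock_genesis` with a different clock reading rejects the same action.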

Also hardened test_three_node_cluster_replication to poll for replication (up to 15s) instead of assuming a fixed 2s delay.
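
The polling approach can be sketched as a small helper. This is an assumption about the general shape only: `wait_until` is a hypothetical name, and the sketch is synchronous and std-only, whereas the actual test may well be async.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `check` repeatedly until it returns true or `timeout` elapses.
/// Returns true if the condition was met, false on timeout.
pub fn wait_until<F>(timeout: Duration, interval: Duration, mut check: F) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

A test would then assert on something like `wait_until(Duration::from_secs(15), Duration::from_millis(100), || follower_has_triple())`, where `follower_has_triple` is a placeholder for whatever query the test runs against a follower. Unlike a fixed 2s sleep, this returns as soon as replication is visible and only fails after the full budget is spent.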

Test plan

  • cargo test -p aingle_cortex --features dag --test cluster_integration_test: 3/3 pass (single_node_bootstrap, three_node_replication, wal_stats); previously 1/3.
  • cargo test -p aingle_graph --features dag dag: 76 pass (deterministic genesis breaks no existing DAG test).
  • cargo build -p aingle_cortex (default) and --features dag — compile cleanly.

Notes

Fixes #99.

Cluster nodes rejected DAG writes with "missing Ed25519 signature": the
DAG enable + author + signing-key setup lived only in `main()`, so library
and integration-test callers that go through `init_cluster` got an
un-enabled, unsigned DAG. Extract that setup into
`cluster_init::ensure_dag_ready` and call it from `init_cluster` (and from
`main` only in standalone mode, to avoid double initialization).
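
A rough sketch of that shape follows, under stated assumptions: `DagState`, `Mode`, `start_node`, and `submit_write` are hypothetical names, the signing key is a placeholder byte array rather than a real Ed25519 key, and the early return that makes the call idempotent is an illustrative choice rather than something this commit confirms.

```rust
/// Hypothetical stand-in for a node's DAG state.
#[derive(Debug, Default)]
pub struct DagState {
    pub enabled: bool,
    pub author: Option<String>,
    /// Placeholder for an Ed25519 signing key; a real node would hold an actual key type.
    pub signing_key: Option<[u8; 32]>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Standalone,
    Cluster,
}

/// Shared setup: enable the DAG, set the author, install a signing key.
/// Returns true if it performed the setup, false if the DAG was already ready.
pub fn ensure_dag_ready(dag: &mut DagState, author: &str) -> bool {
    if dag.enabled && dag.author.is_some() && dag.signing_key.is_some() {
        return false;
    }
    dag.enabled = true;
    dag.author = Some(author.to_string());
    dag.signing_key = Some([7u8; 32]); // dummy bytes, not a real key
    true
}

/// Library entry point: callers that bypass `main` still get a signed DAG.
pub fn init_cluster(dag: &mut DagState, author: &str) {
    ensure_dag_ready(dag, author);
    // Raft and networking setup elided.
}

/// Binary entry point: sets up the DAG directly only in standalone mode,
/// because cluster mode already does so inside `init_cluster`.
pub fn start_node(mode: Mode, dag: &mut DagState, author: &str) {
    match mode {
        Mode::Standalone => {
            ensure_dag_ready(dag, author);
        }
        Mode::Cluster => init_cluster(dag, author),
    }
}

/// Mimics the state machine's check: writes from an unsigned DAG are rejected.
pub fn submit_write(dag: &DagState) -> Result<(), String> {
    if !dag.enabled || dag.signing_key.is_none() {
        return Err("DagAction rejected: missing Ed25519 signature".to_string());
    }
    Ok(())
}
```

With this layout, a test that calls `init_cluster` directly gets the same enabled, keyed DAG as the binary does, so `submit_write` succeeds on both paths.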

That exposed a second, deeper bug: each node created its DAG genesis with a
wall-clock timestamp, so genesis hashes diverged across nodes and
replicated actions failed parent validation on followers (`DagStore::put`
requires every parent to exist locally). Make the genesis deterministic
(fixed epoch timestamp) so every fresh node computes the same genesis hash
and cross-node replication validates. Existing persistent DAGs are
unaffected (`init_or_migrate` is a no-op once actions exist).

Also harden the 3-node integration test to poll for replication instead of
assuming a fixed 2s delay.

Verified: all 3 cluster integration tests pass; aingle_graph DAG tests (76)
pass; default and `dag` builds compile cleanly.
@ApiliumDevTeam merged commit 3bcb1c8 into main on Jun 22, 2026
12 of 13 checks passed

Development

Successfully merging this pull request may close these issues.

Cluster DAG writes rejected: 'missing Ed25519 signature' (cluster integration tests fail)