Skip to content

feat(dkg): Port node signature exchange from Charon#321

Open
mskrzypkows wants to merge 16 commits intomainfrom
dkg_nodesigs
Open

feat(dkg): Port node signature exchange from Charon#321
mskrzypkows wants to merge 16 commits intomainfrom
dkg_nodesigs

Conversation

@mskrzypkows
Copy link
Copy Markdown
Contributor

  • Port dkg/nodesigs.go from Charon — implements K1 signature exchange over the lock hash using the bcast protocol
  • NodeSigBcast registers a typed broadcast handler for MsgNodeSig, then exchange() signs the lock hash, broadcasts it to all peers, and polls until every peer's verified signature is collected

@iamquang95
Copy link
Copy Markdown
Collaborator

@claude

@claude
Copy link
Copy Markdown

claude bot commented Apr 13, 2026

Claude finished @iamquang95's task in 4m 5s —— View job


Code Review

  • Gather context and read changed files
  • Read Go source for functional equivalence check
  • Review nodesigs.rs implementation
  • Review supporting changes (errors, lib.rs, Cargo.toml)
  • Post findings

Summary

The port is solid and largely functionally equivalent to charon/dkg/nodesigs.go. The core exchange logic, peer validation, and signature verification steps all match the Go implementation. The Rust version also makes some sensible improvements (typed Option vs nil slices, proper u32 range check, stronger test assertions). Two issues are worth discussing before merge.


Findings

  • [Medium] Lock-hash delivery: wait vs. immediate error

    Go's broadcastCallback blocks waiting for the lock hash to arrive via a goroutine-written channel, and only unblocks on context cancellation:

    // nodesigs.go:77-83
    select {
    case <-ctx.Done():
        return nil, ctx.Err()
    case n.lockHashData = <-n.lockHashCh:
        return n.lockHashData, nil
    }

    Rust's receive instead reads from a watch::Receiver and returns an error immediately if the value is None:

    // nodesigs.rs:159-162
    let lock_hash_guard = lock_hash_rx.borrow();
    let lock_hash = lock_hash_guard
        .as_deref()
        .ok_or(bcast::Error::MissingField("lock_hash"))?;

    In the normal path this is fine — exchange() always calls lock_hash_tx.send(Some(...)) before bcast.broadcast(). But if a remote peer starts and broadcasts before the local node has called exchange() at all (plausible timing race in a real cluster), Go will wait gracefully while Rust will drop the message with an error and the signature will never be stored. This could cause the local node to stall in the polling loop indefinitely.

    Recommendation: Either document this as an accepted deviation (with the argument that the bcast protocol's signature-collection phase should protect against early delivery), or mirror Go more closely by spawning a task that waits on the watch channel and retries the receive callback. Fix this →

  • [Low] lock_hash_guard (watch Ref) held while acquiring sigs mutex

    receive borrows the watch receiver, then locks sigs while the Ref guard is still live:

    // nodesigs.rs:159-169
    let lock_hash_guard = lock_hash_rx.borrow(); // holds watch read-lock
    let lock_hash = lock_hash_guard.as_deref()...?;
    // ... verify_65 ...
    sigs.lock()...[peer_idx] = Some(msg.signature.to_vec()); // acquires Mutex while Ref alive

    lock_hash_tx.send() must take the watch's write-lock, so holding a Ref here blocks any concurrent sender. There is only one send() call (in exchange(), before broadcast), so deadlock is not possible in practice. Still, holding the guard longer than necessary is fragile. Cloning the hash before the mutex acquisition would be cleaner:

    let lock_hash: Vec<u8> = {
        let guard = lock_hash_rx.borrow();
        guard.as_deref().ok_or(bcast::Error::MissingField("lock_hash"))?.to_vec()
    }; // guard dropped here
    pluto_k1util::verify_65(&pubkey, &lock_hash, msg.signature.as_ref())?;
    sigs.lock()...[peer_idx] = Some(msg.signature.to_vec());

    Fix this →

  • [Low] NODE_SIG_MSG_ID is pub where Go has it unexported

    Go declares const nodeSigMsgID = "..." (package-private). Rust exposes it as pub const NODE_SIG_MSG_ID. If nothing outside crates/dkg needs this, pub(crate) would be more conservative and match Go's visibility.

    Evidence: crates/dkg/src/nodesigs.rs:21
    Go reference: charon/dkg/nodesigs.go:25


Parity Matrix

Component Go Rust Match Notes
Message ID constant nodeSigMsgID (unexported) NODE_SIG_MSG_ID (pub) partial Rust is more visible
Lock hash delivery goroutine + blocking channel watch::channel, non-blocking borrow no See Medium finding
Peer index validation == nodeIdx || < 0 || >= len == node_idx || >= peers.len() yes < 0 impossible from u32; Rust is equivalent
Local sig storage setSig(localSig, nodeIdx.PeerIdx) after broadcast sigs[self.node_idx] = Some(local_sig) after broadcast yes
Signature verification k1util.Verify65 pluto_k1util::verify_65 yes
Polling interval 100ms ticker 100ms interval yes
Missing lock hash test N/A (blocks instead) test_case(1, None, ..., "missing protobuf field") N/A Rust adds explicit test for new behaviour
Cross-node sig consistency not checked in Go test assert_eq!(result, first, ...) improvement Rust test is stricter
"Invalid message type" test present (Go) absent (Rust) acceptable Rust's type system makes it impossible

Tests

Tests were not executed in this review (no runner available). The unit tests cover all error paths from TestSigsCallbacks and the integration test ports TestSigsExchange with an equivalent timeout (45 s). The Rust test adds a cross-node signature consistency assertion not present in Go.


Open Questions

  1. Is the non-blocking lock-hash read in receive an accepted deviation, or should messages arriving before exchange() is called be handled? A comment in the code would clarify intent.
  2. Should NODE_SIG_MSG_ID be pub(crate) to match Go's unexported visibility?

@mskrzypkows
Copy link
Copy Markdown
Contributor Author

Fixing claude findings, both seems relevant.

@mskrzypkows
Copy link
Copy Markdown
Contributor Author

@claude

@claude
Copy link
Copy Markdown

claude bot commented Apr 13, 2026

Claude finished @mskrzypkows's task in 2m 17s —— View job


Re-review after fixes

  • Gather context
  • Read updated nodesigs.rs
  • Check fix for Medium: lock-hash delivery (async wait vs. immediate error)
  • Check fix for Low: lock_hash_guard held while sigs mutex acquired
  • Check fix for Low: NODE_SIG_MSG_ID visibility
  • Post findings

Results

[Medium] Lock-hash delivery — Fixed ✅

receive() now uses wait_for(|v| v.is_some()) (nodesigs.rs:168-177), which parks the future until the lock hash becomes Some — exactly matching Go's goroutine + blocking channel select. The map_err path (sender dropped while still None) also cleanly covers the missing_lock_hash test case, which relies on the immediate drop of the _ sender.

let lock_hash = {
    let mut rx = lock_hash_rx.clone();
    let guard = rx
        .wait_for(|v| v.is_some())
        .await
        .map_err(|_| bcast::Error::MissingField("lock_hash"))?;
    guard
        .clone()
        .ok_or(bcast::Error::MissingField("lock_hash"))?
};

[Low] Watch Ref held while sigs mutex acquired — Fixed ✅

The guard is now scoped inside its own block and dropped before verify_65 and sigs.lock() are reached. The owned Vec<u8> clone escapes cleanly.

[Low] NODE_SIG_MSG_ID visibility — Not addressed

nodesigs.rs:21 still has pub const NODE_SIG_MSG_ID. A grep across the workspace confirms nothing outside crates/dkg references it, so pub(crate) would match Go's unexported nodeSigMsgID. Minor point — either is acceptable, but worth a conscious decision.


Overall the two substantive fixes are correct and clean. The PR is in good shape.

@mskrzypkows
Copy link
Copy Markdown
Contributor Author

Hey @iamquang95 I have modified the CallbackFn interface. Could you review it?

Comment thread crates/dkg/src/nodesigs.rs
Comment thread crates/dkg/src/nodesigs.rs Outdated
Copy link
Copy Markdown
Collaborator

@varex83agent varex83agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR ports dkg/nodesigs.go to Rust with solid functional parity. The core K1 signature-exchange logic is correctly translated: watch::channel replacing Go's unbuffered channel, CancellationToken replacing ctx.Done(), and the CallbackFn upgrade to async with proper 'static ownership semantics. No std::sync::Mutex is held across .await points; the struct is intentionally single-use via exchange(self, …). No bugs or major findings were identified.

Minor findings (non-blocking but worth addressing):

  1. lock_hash_rx parameter in receive() is borrowed then immediately cloned — should be passed by value.
  2. local_sig.to_vec() is called twice, creating an unnecessary extra clone.
  3. usize::try_from(u32) in receive() is infallible on all supported platforms, making the error path dead code.
  4. receive() awaits wait_for with no timeout, holding the bcast substream open if a message arrives before exchange() sets the lock hash.

Nits: Tokio interval fires immediately at t=0 (unlike Go's NewTicker), sigs_cb naming, all_sigs double-pass iteration.

Comment thread crates/dkg/src/nodesigs.rs Outdated
Comment thread crates/dkg/src/nodesigs.rs Outdated
Comment thread crates/dkg/src/nodesigs.rs Outdated
Comment thread crates/dkg/src/nodesigs.rs Outdated
Comment thread crates/dkg/src/nodesigs.rs
Comment thread crates/dkg/src/nodesigs.rs
Comment thread crates/dkg/src/nodesigs.rs
Comment thread crates/dkg/src/nodesigs.rs
@mskrzypkows
Copy link
Copy Markdown
Contributor Author

@varex83 @varex83agent fixed all comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants