feat(dkg): Port node signature exchange from Charon by mskrzypkows · Pull Request #321 · NethermindEth/pluto

mskrzypkows · 2026-04-13T11:11:13Z

Port dkg/nodesigs.go from Charon — implements K1 signature exchange over the lock hash using the bcast protocol
NodeSigBcast registers a typed broadcast handler for MsgNodeSig, then exchange() signs the lock hash, broadcasts it to all peers, and polls until every peer's verified signature is collected

iamquang95 · 2026-04-13T11:12:46Z

@claude

claude · 2026-04-13T11:13:00Z

Claude finished @iamquang95's task in 4m 5s —— View job

Code Review

Gather context and read changed files
Read Go source for functional equivalence check
Review nodesigs.rs implementation
Review supporting changes (errors, lib.rs, Cargo.toml)
Post findings

Summary

The port is solid and largely functionally equivalent to charon/dkg/nodesigs.go. The core exchange logic, peer validation, and signature verification steps all match the Go implementation. The Rust version also makes some sensible improvements (typed Option vs nil slices, proper u32 range check, stronger test assertions). Two issues are worth discussing before merge.

Findings

[Medium] Lock-hash delivery: wait vs. immediate error

Go's broadcastCallback blocks waiting for the lock hash to arrive via a goroutine-written channel, and only unblocks on context cancellation:
```
// nodesigs.go:77-83
select {
case <-ctx.Done():
    return nil, ctx.Err()
case n.lockHashData = <-n.lockHashCh:
    return n.lockHashData, nil
}
```
Rust's receive instead reads from a watch::Receiver and returns an error immediately if the value is None:
```
// nodesigs.rs:159-162
let lock_hash_guard = lock_hash_rx.borrow();
let lock_hash = lock_hash_guard
    .as_deref()
    .ok_or(bcast::Error::MissingField("lock_hash"))?;
```
In the normal path this is fine — exchange() always calls lock_hash_tx.send(Some(...)) before bcast.broadcast(). But if a remote peer starts and broadcasts before the local node has called exchange() at all (plausible timing race in a real cluster), Go will wait gracefully while Rust will drop the message with an error and the signature will never be stored. This could cause the local node to stall in the polling loop indefinitely.

Recommendation: Either document this as an accepted deviation (with the argument that the bcast protocol's signature-collection phase should protect against early delivery), or mirror Go more closely by spawning a task that waits on the watch channel and retries the receive callback. Fix this →

[Low] lock_hash_guard (watch Ref) held while acquiring sigs mutex

receive borrows the watch receiver, then locks sigs while the Ref guard is still live:

// nodesigs.rs:159-169
let lock_hash_guard = lock_hash_rx.borrow(); // holds watch read-lock
let lock_hash = lock_hash_guard.as_deref()...?;
// ... verify_65 ...
sigs.lock()...[peer_idx] = Some(msg.signature.to_vec()); // acquires Mutex while Ref alive

lock_hash_tx.send() must take the watch's write-lock, so holding a Ref here blocks any concurrent sender. There is only one send() call (in exchange(), before broadcast), so deadlock is not possible in practice. Still, holding the guard longer than necessary is fragile. Cloning the hash before the mutex acquisition would be cleaner:

let lock_hash: Vec<u8> = {
    let guard = lock_hash_rx.borrow();
    guard.as_deref().ok_or(bcast::Error::MissingField("lock_hash"))?.to_vec()
}; // guard dropped here
pluto_k1util::verify_65(&pubkey, &lock_hash, msg.signature.as_ref())?;
sigs.lock()...[peer_idx] = Some(msg.signature.to_vec());

Fix this →

[Low] NODE_SIG_MSG_ID is pub where Go has it unexported

Go declares const nodeSigMsgID = "..." (package-private). Rust exposes it as pub const NODE_SIG_MSG_ID. If nothing outside crates/dkg needs this, pub(crate) would be more conservative and match Go's visibility.

Evidence: crates/dkg/src/nodesigs.rs:21
Go reference: charon/dkg/nodesigs.go:25

Parity Matrix

Component	Go	Rust	Match	Notes
Message ID constant	`nodeSigMsgID` (unexported)	`NODE_SIG_MSG_ID` (pub)	partial	Rust is more visible
Lock hash delivery	goroutine + blocking channel	`watch::channel`, non-blocking borrow	no	See Medium finding
Peer index validation	`== nodeIdx \|\| < 0 \|\| >= len`	`== node_idx \|\| >= peers.len()`	yes	`< 0` impossible from u32; Rust is equivalent
Local sig storage	`setSig(localSig, nodeIdx.PeerIdx)` after broadcast	`sigs[self.node_idx] = Some(local_sig)` after broadcast	yes
Signature verification	`k1util.Verify65`	`pluto_k1util::verify_65`	yes
Polling interval	`100ms` ticker	`100ms` interval	yes
Missing lock hash test	N/A (blocks instead)	`test_case(1, None, ..., "missing protobuf field")`	N/A	Rust adds explicit test for new behaviour
Cross-node sig consistency	not checked in Go test	`assert_eq!(result, first, ...)`	improvement	Rust test is stricter
"Invalid message type" test	present (Go)	absent (Rust)	acceptable	Rust's type system makes it impossible

Tests

Tests were not executed in this review (no runner available). The unit tests cover all error paths from TestSigsCallbacks and the integration test ports TestSigsExchange with an equivalent timeout (45 s). The Rust test adds a cross-node signature consistency assertion not present in Go.

Open Questions

Is the non-blocking lock-hash read in receive an accepted deviation, or should messages arriving before exchange() is called be handled? A comment in the code would clarify intent.
Should NODE_SIG_MSG_ID be pub(crate) to match Go's unexported visibility?

mskrzypkows · 2026-04-13T13:15:57Z

Fixing claude findings, both seems relevant.

channel.

mskrzypkows · 2026-04-13T15:00:01Z

@claude

claude · 2026-04-13T15:00:22Z

Claude finished @mskrzypkows's task in 2m 17s —— View job

Re-review after fixes

Gather context
Read updated nodesigs.rs
Check fix for Medium: lock-hash delivery (async wait vs. immediate error)
Check fix for Low: lock_hash_guard held while sigs mutex acquired
Check fix for Low: NODE_SIG_MSG_ID visibility
Post findings

Results

[Medium] Lock-hash delivery — Fixed ✅

receive() now uses wait_for(|v| v.is_some()) (nodesigs.rs:168-177), which parks the future until the lock hash becomes Some — exactly matching Go's goroutine + blocking channel select. The map_err path (sender dropped while still None) also cleanly covers the missing_lock_hash test case, which relies on the immediate drop of the _ sender.

let lock_hash = {
    let mut rx = lock_hash_rx.clone();
    let guard = rx
        .wait_for(|v| v.is_some())
        .await
        .map_err(|_| bcast::Error::MissingField("lock_hash"))?;
    guard
        .clone()
        .ok_or(bcast::Error::MissingField("lock_hash"))?
};

[Low] Watch Ref held while sigs mutex acquired — Fixed ✅

The guard is now scoped inside its own block and dropped before verify_65 and sigs.lock() are reached. The owned Vec<u8> clone escapes cleanly.

[Low] NODE_SIG_MSG_ID visibility — Not addressed

nodesigs.rs:21 still has pub const NODE_SIG_MSG_ID. A grep across the workspace confirms nothing outside crates/dkg references it, so pub(crate) would match Go's unexported nodeSigMsgID. Minor point — either is acceptable, but worth a conscious decision.

Overall the two substantive fixes are correct and clean. The PR is in good shape.

mskrzypkows · 2026-04-14T07:21:36Z

Hey @iamquang95 I have modified the CallbackFn interface. Could you review it?

varex83agent

This PR ports dkg/nodesigs.go to Rust with solid functional parity. The core K1 signature-exchange logic is correctly translated: watch::channel replacing Go's unbuffered channel, CancellationToken replacing ctx.Done(), and the CallbackFn upgrade to async with proper 'static ownership semantics. No std::sync::Mutex is held across .await points; the struct is intentionally single-use via exchange(self, …). No bugs or major findings were identified.

Minor findings (non-blocking but worth addressing):

lock_hash_rx parameter in receive() is borrowed then immediately cloned — should be passed by value.
local_sig.to_vec() is called twice, creating an unnecessary extra clone.
usize::try_from(u32) in receive() is infallible on all supported platforms, making the error path dead code.
receive() awaits wait_for with no timeout, holding the bcast substream open if a message arrives before exchange() sets the lock hash.

Nits: Tokio interval fires immediately at t=0 (unlike Go's NewTicker), sigs_cb naming, all_sigs double-pass iteration.

…g_nodesigs

mskrzypkows · 2026-04-15T13:41:26Z

@varex83 @varex83agent fixed all comments

Node signature module

2f8546c

mskrzypkows requested review from emlautarom1 and iamquang95 April 13, 2026 11:11

Merge branch 'main' of github.com:NethermindEth/pluto into dkg_nodesigs

fafb04a

mskrzypkows added 2 commits April 13, 2026 13:22

clippy

2e54694

fixed too long locking the channel

128fd2b

Changed CallbackFn to be async, so I can wait for the watched lock hash

dab0eb3

channel.

mskrzypkows added 2 commits April 14, 2026 09:05

unneeded pub mod

6cbabbd

Merge branch 'main' of github.com:NethermindEth/pluto into dkg_nodesigs

ff87e98

iamquang95 reviewed Apr 14, 2026

View reviewed changes

Comment thread crates/dkg/src/nodesigs.rs

Comment thread crates/dkg/src/nodesigs.rs Outdated

mskrzypkows added 3 commits April 14, 2026 15:43

review corrections

074d370

clippy

154798c

Merge branch 'main' into dkg_nodesigs

6afcb49

varex83agent reviewed Apr 15, 2026

View reviewed changes

mskrzypkows added 4 commits April 15, 2026 15:19

review corrections

3d87424

cancelation token forwarded to the receive function

ec5a684

Merge branch 'dkg_nodesigs' of github.com:NethermindEth/pluto into dk…

55c3193

…g_nodesigs

Merge branch 'main' of github.com:NethermindEth/pluto into dkg_nodesigs

ab22f96

mskrzypkows added 2 commits April 16, 2026 09:15

Merge branch 'main' into dkg_nodesigs

1f24e96

Merge branch 'main' into dkg_nodesigs

152f981

Conversation

mskrzypkows commented Apr 13, 2026

Uh oh!

iamquang95 commented Apr 13, 2026

Uh oh!

claude bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Summary

Findings

Parity Matrix

Tests

Open Questions

Uh oh!

mskrzypkows commented Apr 13, 2026

Uh oh!

mskrzypkows commented Apr 13, 2026

Uh oh!

claude bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review after fixes

Results

Uh oh!

mskrzypkows commented Apr 14, 2026

Uh oh!

Uh oh!

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mskrzypkows commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

claude bot commented Apr 13, 2026 •

edited

Loading

claude bot commented Apr 13, 2026 •

edited

Loading