feat(genesis): canonicalize hash so per-node connection_url doesn't diverge #855

Merged
tcsenpai merged 1 commit into stabilisation from feat/genesis-hash-canonicalize
May 22, 2026

@tcsenpai tcsenpai commented May 22, 2026

Problem

Two peers running the same consensus configuration but with their own identity listed as localhost in data/genesis.json validators[] produced different genesisDataHash values. peerBootstrap rejected the pairing as a hash mismatch even though no consensus-significant field differed.

Empirically reproduced on devnet between node2 and node3:

  • node2's genesis lists its own entry as http://localhost:53552, node3's URL as http://dev.node3.demos.sh:53550
  • node3's genesis lists its own entry as http://localhost:53552, node2's URL as http://dev.node2.demos.sh:53552
  • Same set of validators, same stakes, same forks — only the per-node connection_url differs.
  • Hashes diverged, peering refused.

connection_url is network topology metadata, not consensus state. The active validator set, stakes, statuses, and activation windows ARE consensus-significant; how a peer is reached on the wire is not.

Fix

New module src/libs/blockchain/genesis/normalizeGenesisForHash.ts exposing hashGenesisData(...). It:

  1. Deep-clones the input (no mutation of caller's object).
  2. Strips connection_url from every validators[] entry.
  3. Sorts validators by address (insertion-order independent).
  4. Stably stringifies with lex-sorted keys at every depth.
  5. SHA-256s the result.

Two genesis files that differ ONLY in validators[*].connection_url, validator array order, or object-key authoring order now produce the same hash.
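The five steps above can be sketched roughly as follows. This is a minimal illustration, not the code merged in the PR: the JSON round-trip clone, the `addressOf` helper, and the sample addresses and stakes are assumptions made for the example, and Node's `node:crypto` stands in for the project's own `Hashing.sha256`.

```typescript
import { createHash } from "node:crypto"

// Step 4: serialize with object keys sorted lexicographically at every depth.
function stableStringify(value: unknown): string {
    if (value === null || typeof value !== "object") {
        return JSON.stringify(value)
    }
    if (Array.isArray(value)) {
        return "[" + value.map(v => stableStringify(v)).join(",") + "]"
    }
    const obj = value as Record<string, unknown>
    const keys = Object.keys(obj).sort()
    return "{" + keys.map(k => JSON.stringify(k) + ":" + stableStringify(obj[k])).join(",") + "}"
}

// Hypothetical helper: read a validator's address; malformed entries sort as "".
function addressOf(v: unknown): string {
    const address = (v as { address?: unknown } | null)?.address
    return typeof address === "string" ? address : ""
}

// Steps 1-3: clone, strip connection_url, sort validators by address.
function canonicalGenesisForHashing(genesis: unknown): unknown {
    // Assumption: a JSON round-trip is an adequate deep clone because genesis
    // data comes from JSON.parse. `?? null` keeps undefined input from throwing.
    const clone = JSON.parse(JSON.stringify(genesis ?? null))
    if (clone === null || typeof clone !== "object" || !Array.isArray(clone.validators)) {
        return clone
    }
    const stripped: unknown[] = clone.validators.map((v: unknown) => {
        if (v === null || typeof v !== "object" || Array.isArray(v)) return v
        const rest = { ...(v as Record<string, unknown>) }
        delete rest.connection_url
        return rest
    })
    stripped.sort((a, b) => {
        const x = addressOf(a)
        const y = addressOf(b)
        return x < y ? -1 : x > y ? 1 : 0
    })
    clone.validators = stripped
    return clone
}

// Step 5: SHA-256 over the canonical string, as lowercase hex.
function hashGenesisData(genesis: unknown): string {
    return createHash("sha256").update(stableStringify(canonicalGenesisForHashing(genesis))).digest("hex")
}

// Two per-node views of the same validator set (addresses and stakes are made up).
const node2View = {
    validators: [
        { address: "0xaaa", staked_amount: "100", connection_url: "http://localhost:53552" },
        { address: "0xbbb", staked_amount: "100", connection_url: "http://dev.node3.demos.sh:53550" },
    ],
}
const node3View = {
    validators: [
        { connection_url: "http://localhost:53552", staked_amount: "100", address: "0xbbb" },
        { connection_url: "http://dev.node2.demos.sh:53552", staked_amount: "100", address: "0xaaa" },
    ],
}

console.log(hashGenesisData(node2View) === hashGenesisData(node3View)) // true
```

The two views differ in URL, array order, and key order, yet hash identically; changing a `staked_amount` in either view would produce a different digest.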

Call sites converted

  • src/libs/peer/routines/peerBootstrap.ts — baseline ourGenesisDataHash + post-fetch re-hash after auto-heal.
  • src/libs/network/handlers/blockHandlers.ts:getGenesisDataHash — RPC response to peers.

What is NOT changed

  • connection_url is still kept in the JSON file as an authoring/seeding aid; operators may still pre-populate per-node URLs.
  • seedValidators continues to write connection_url into the DB verbatim.
  • The canonical form is ONLY used for cross-node hashing.

Tests

testing/genesis/normalizeGenesisForHash.test.ts — 16 cases covering: stripping, sorting, deep-clone safety, missing/non-array/null/primitive tolerance, top-level preservation, lex-sort at every depth, hash equivalence across divergent URLs, consensus-significant changes still detected, determinism, output shape.

Full suite: 120 pass / 9 skip / 0 fail (was 104, +16 new).

Why this approach over removing connection_url from data/genesis.json entirely

Alternative considered: set all connection_url to null in every node's genesis file. Equally fixes the hash divergence with zero code. Drawback: operators lose the convenient genesis-driven seeding of validators.connection_url in the DB (currently the only DB-write path for that field at boot before any stake tx lands). Pushes URL maintenance entirely into demos_peerlist.json, which is a separate file with separate sync ergonomics.

The normalizer approach keeps the convenience for operators (URLs in genesis are still useful for validators table seeding + later RPC introspection via /peerlist) while making consensus hashing robust to the per-node-localhost reality.

Test plan

  • bun test testing/genesis/ reports 16 pass.
  • Two nodes with validators[*].connection_url differing only in localhost-vs-public-host produce identical hashes from hashGenesisData.
  • A change to validators[*].staked_amount, address, or status DOES change the hash.
  • bun test testing/state-snapshot/ testing/forks/restore/ testing/consensus/ testing/genesis/ = 120 pass / 9 skip / 0 fail.
  • Multi-node manual smoke (node2 + node3 on devnet): both boot with their own localhost URL in data/genesis.json validators[], peerBootstrap pairs cleanly, no "Genesis data hash does not match" ERROR in logs.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added deterministic genesis data hashing utility to ensure consistent cross-node consensus validation and network agreement.
  • Improvements

    • Updated genesis hash computation in block handlers and peer bootstrap to use canonical hashing instead of basic serialization.
  • Tests

    • Added comprehensive test suite covering genesis canonicalization, deterministic serialization, and hash consistency validation.

…iverge

Problem
=======
Two peers running the same consensus configuration but with their own
identity listed as `localhost` in `data/genesis.json validators[]`
produced different `genesisDataHash` values. `peerBootstrap` rejected
the pairing as a hash mismatch even though no consensus-significant
field differed.

`connection_url` is network topology, not consensus state. The active
validator set, stakes, statuses, and activation windows are consensus-
significant; how a peer is reached on the wire is not.

Fix
===
New module `src/libs/blockchain/genesis/normalizeGenesisForHash.ts`
exposing `hashGenesisData(...)`. It:

1. Deep-clones the input (no mutation of caller's object).
2. Strips `connection_url` from every `validators[]` entry.
3. Sorts validators by `address` (insertion-order independent).
4. Stably stringifies with lex-sorted keys at every depth.
5. SHA-256s the result.

Two genesis files that differ ONLY in `validators[*].connection_url`
(or in the array order, or in object-key authoring order) now produce
the same hash.

Call sites converted:
- `src/libs/peer/routines/peerBootstrap.ts` (baseline + post-fetch
  re-hash)
- `src/libs/network/handlers/blockHandlers.ts:getGenesisDataHash` (RPC
  response to peers)

Note: `connection_url` is still kept as an authoring/seeding aid
(operators may pre-populate per-node URLs) and is still readable from
the DB — `seedValidators` writes whatever the file says. The
canonical form is only used for cross-node hashing.

Tests
=====
`testing/genesis/normalizeGenesisForHash.test.ts` — 16 cases:
- connection_url stripped
- validators sorted by address regardless of input order
- input object never mutated
- missing/non-array/null/primitive input tolerated
- non-validators top-level fields preserved verbatim
- stableStringify lex-sort at every depth + array element-wise
- two divergent connection_url sets produce same hash
- consensus-significant changes (stake, currency) DO change hash
- determinism across repeated calls
- output is 64 lowercase hex chars

Full suite: 120 pass / 9 skip / 0 fail (was 104, +16 new tests).

coderabbitai Bot commented May 22, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This PR introduces deterministic genesis-data hashing by extracting consensus-relevant fields into a canonical form, applying stable JSON serialization with lexicographically sorted keys, then computing SHA-256 hashes. The new utilities are integrated into existing block handler and peer bootstrap code paths and are covered by comprehensive tests.

Changes

Deterministic Genesis Hashing and Integration

| Layer / File(s) | Summary |
| --- | --- |
| Canonical hashing utilities and deterministic serialization (src/libs/blockchain/genesis/normalizeGenesisForHash.ts) | canonicalGenesisForHashing removes validator connection URLs and sorts validators by address; stableStringify produces deterministic JSON by sorting object keys lexicographically; hashGenesisData combines both and computes the SHA-256 hash. |
| Block handler genesis hash computation (src/libs/network/handlers/blockHandlers.ts) | getGenesisDataHash replaces direct SHA-256 of JSON-stringified input with a canonical hashGenesisData call. |
| Peer bootstrap genesis hashing (src/libs/peer/routines/peerBootstrap.ts) | Bootstrap re-hashing of downloaded genesis data and startup hash computation both switch to hashGenesisData instead of ad-hoc SHA-256 serialization. |
| Test suite for hashing utilities (testing/genesis/normalizeGenesisForHash.test.ts) | Validates canonicalization (URL stripping, address sorting, immutability), deterministic serialization (key ordering, primitive handling, equivalence), and hash correctness (determinism, consensus-field sensitivity, hex format). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

  • kynesyslabs/node#663: Both PRs evolve genesis-hash computation; this PR adds hashGenesisData canonicalization and integrates it into bootstrap and handlers, while PR #663 extends those code paths to sync and verify genesis hashes.

Poem

🐰 A hash that's true across the node,
Connection URLs stripped from the road,
Validators sorted, keys all aligned,
SHA-256 leaves chaos behind—
One canonical form, deterministic and kind! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: introducing canonical genesis hashing to prevent divergence caused by per-node connection_url differences. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@tcsenpai tcsenpai merged commit 50ff97f into stabilisation May 22, 2026
2 of 3 checks passed
@tcsenpai tcsenpai deleted the feat/genesis-hash-canonicalize branch May 22, 2026 15:26

greptile-apps Bot commented May 22, 2026

Greptile Summary

Introduces a canonical genesis-data normalizer (normalizeGenesisForHash.ts) that strips connection_url from each validator entry, sorts validators by address, and applies a stable lexicographic JSON serializer before SHA-256 hashing. Both peerBootstrap.ts and the getGenesisDataHash RPC handler are updated to use this normalizer, fixing the devnet peering failure caused by per-node localhost URLs producing divergent hashes.

  • New hashGenesisData helper centralizes all genesis-hash computation; it is well-documented, deeply-clones input, and is backed by 16 unit tests.
  • peerBootstrap.ts replaces two inline Hashing.sha256(JSON.stringify(...)) calls with hashGenesisData; the auto-heal path still unnecessarily re-reads the just-written genesis file from disk before hashing.
  • Hash scheme change is a hard cut-over: old nodes emit the unnormalized SHA-256 and new nodes emit the normalized one; mixed-version deployments will fail to peer, requiring all nodes to be updated simultaneously.

Confidence Score: 3/5

The genesis normalization logic is correct and well-tested, but the hash scheme change is a hard cut-over that breaks peering between nodes on different versions of this code.

The new hashGenesisData function correctly solves the localhost divergence problem and is well-tested. However, the switch from raw JSON.stringify+SHA-256 to the normalized hash is not backward-compatible: any node still running the old binary will compute a different digest and be rejected by nodes running the new code. A partial rollout or a single lagging node would reproduce the exact peering failure the PR is fixing, just for a different reason.

src/libs/peer/routines/peerBootstrap.ts — contains the deployment-incompatible hash cut-over and the unnecessary file re-read in the auto-heal path.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/libs/blockchain/genesis/normalizeGenesisForHash.ts | New canonical normalizer: strips connection_url, sorts validators by address, and stably stringifies before SHA-256. Well-structured with good test coverage; minor edge case with undefined-valued keys in stableStringify. |
| src/libs/peer/routines/peerBootstrap.ts | Switches both hash computation sites to hashGenesisData; auto-heal path retains an unnecessary write-then-reread pattern. The hash scheme change is a hard cut-over requiring all nodes to upgrade simultaneously. |
| src/libs/network/handlers/blockHandlers.ts | Clean one-line swap of the RPC genesis-hash response to use hashGenesisData; logic unchanged otherwise. |
| testing/genesis/normalizeGenesisForHash.test.ts | 16 well-targeted unit tests covering stripping, sorting, deep-clone safety, edge inputs, and hash equivalence; missing coverage for undefined-valued validator fields in stableStringify. |

Sequence Diagram

sequenceDiagram
    participant N2 as Node2 (new code)
    participant N3 as Node3 (new code)

    Note over N2: peerBootstrap()<br/>hashGenesisData(genesisData)<br/>strips connection_url<br/>sorts validators<br/>stableStringify + SHA-256

    N2->>N3: RPC getGenesisDataHash
    N3-->>N2: hashGenesisData(genesisData) normalized

    Note over N2,N3: Hashes match even though<br/>each node has localhost in its own entry

    N2->>N3: sayHelloToPeer peers paired

    Note over N2: Auto-heal path if hashes differ<br/>downloads /genesis from N3<br/>hashGenesisData(res.data) compare

Reviews (1): Last reviewed commit: "feat(genesis): canonicalize hash so per-..."

  const genesisFile = "data/genesis.json"
  const genesisData = JSON.parse(fs.readFileSync(genesisFile, "utf8"))
- ourGenesisDataHash = Hashing.sha256(JSON.stringify(genesisData))
+ ourGenesisDataHash = hashGenesisData(genesisData)

P1 Breaking change for mixed-version deployments

Nodes running the old code compute ourGenesisDataHash as Hashing.sha256(JSON.stringify(genesisData)). Nodes running this PR compute it via hashGenesisData(genesisData). The two schemes produce different hex digests for the same genesis file, so any peer running the old binary will fail the hash equality check at line 62 against any peer running the new binary. During a rolling upgrade, every node that hasn't been updated yet will be rejected with "Genesis data hash does not match" — reintroducing the exact peering failure this PR is meant to fix. All nodes in the cluster must be stopped and updated together; a partial rollout leaves the network non-functional.
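To make the cut-over concrete, the sketch below hashes one genesis object under both schemes. `legacyHash` mirrors the old `Hashing.sha256(JSON.stringify(...))` call using Node's `node:crypto`; `canonicalHash` is a deliberately simplified stand-in for `hashGenesisData` that assumes a fixed validator shape (`address`, `staked_amount`), and the sample values are hypothetical.

```typescript
import { createHash } from "node:crypto"

type Validator = { address: string; staked_amount: string; connection_url?: string }
type Genesis = { validators: Validator[] }

const sha256Hex = (s: string): string => createHash("sha256").update(s).digest("hex")

// Old scheme: hash the raw serialization, connection_url included.
const legacyHash = (genesis: Genesis): string => sha256Hex(JSON.stringify(genesis))

// Simplified stand-in for the new scheme: drop connection_url, sort by address,
// and emit keys in a fixed order (assumes validators only carry these fields).
function canonicalHash(genesis: Genesis): string {
    const validators = genesis.validators
        .map(({ address, staked_amount }) => ({ address, staked_amount }))
        .sort((a, b) => (a.address < b.address ? -1 : a.address > b.address ? 1 : 0))
    return sha256Hex(JSON.stringify({ validators }))
}

const genesis: Genesis = {
    validators: [{ address: "0xaaa", staked_amount: "100", connection_url: "http://localhost:53552" }],
}

const oldNodeHash = legacyHash(genesis)    // what a not-yet-upgraded node reports
const newNodeHash = canonicalHash(genesis) // what an upgraded node computes locally

console.log(oldNodeHash === newNodeHash) // false
```

The digests differ here because the old input string still contains `connection_url`. For a file with no `connection_url` entries, already-sorted keys, and already-sorted validators, the two serializations could coincide, but that is not the situation the PR describes.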

Comment on lines +85 to 87
const ourNewGenesisDataHash = hashGenesisData(
    JSON.parse(fs.readFileSync(genesisFile, "utf8")),
)

P2 Unnecessary disk round-trip in auto-heal: the file is written on line 81 and immediately re-read on line 86 purely to compute a hash. If a write error or a concurrent file access occurs between the two syscalls, the hash will reflect a stale or corrupt file. Passing res.data (which is already the parsed object) directly to hashGenesisData removes the extra I/O and the TOCTOU window.

Suggested change
- const ourNewGenesisDataHash = hashGenesisData(
-     JSON.parse(fs.readFileSync(genesisFile, "utf8")),
- )
+ const ourNewGenesisDataHash = hashGenesisData(res.data)

Comment on lines +125 to +138
export function stableStringify(value: unknown): string {
    if (value === null || typeof value !== "object") {
        return JSON.stringify(value)
    }

    if (Array.isArray(value)) {
        const parts = value.map(v => stableStringify(v))
        return "[" + parts.join(",") + "]"
    }

    const obj = value as Record<string, unknown>
    const keys = Object.keys(obj).sort()
    const parts = keys.map(k => JSON.stringify(k) + ":" + stableStringify(obj[k]))
    return "{" + parts.join(",") + "}"

P2 stableStringify emits invalid JSON for undefined-valued keys

JSON.stringify(undefined) returns the JS value undefined (not the string "undefined"). When an object key holds undefined, the keys.map line concatenates it into "\"key\":undefined", which is not valid JSON, whereas standard JSON.stringify silently omits such keys. Genesis data from JSON.parse will never carry undefined, so this is dormant today, but any TypeScript caller passing an object with optional fields set to undefined will get a digest computed over a non-parseable string. Skipping keys whose value is undefined before building parts (the loop is a keys.map, so a filter fits better than continue) aligns behaviour with JSON.stringify.
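The edge case can be reproduced in isolation. The first function below is stableStringify as quoted in this review; `stableStringifySkippingUndefined` is a hypothetical variant, not code from the PR, that filters out undefined-valued keys before serializing. Undefined array elements are left out of scope here.

```typescript
// stableStringify as quoted in the review comment above.
function stableStringify(value: unknown): string {
    if (value === null || typeof value !== "object") {
        return JSON.stringify(value)
    }
    if (Array.isArray(value)) {
        const parts = value.map(v => stableStringify(v))
        return "[" + parts.join(",") + "]"
    }
    const obj = value as Record<string, unknown>
    const keys = Object.keys(obj).sort()
    const parts = keys.map(k => JSON.stringify(k) + ":" + stableStringify(obj[k]))
    return "{" + parts.join(",") + "}"
}

// Hypothetical variant: drop keys whose value is undefined, as JSON.stringify does.
function stableStringifySkippingUndefined(value: unknown): string {
    if (value === null || typeof value !== "object") {
        return JSON.stringify(value)
    }
    if (Array.isArray(value)) {
        const parts = value.map(v => stableStringifySkippingUndefined(v))
        return "[" + parts.join(",") + "]"
    }
    const obj = value as Record<string, unknown>
    const keys = Object.keys(obj)
        .filter(k => obj[k] !== undefined) // the only behavioural change
        .sort()
    const parts = keys.map(k => JSON.stringify(k) + ":" + stableStringifySkippingUndefined(obj[k]))
    return "{" + parts.join(",") + "}"
}

const input = { b: undefined, a: 1 }

console.log(stableStringify(input))                  // {"a":1,"b":undefined}
console.log(stableStringifySkippingUndefined(input)) // {"a":1}
console.log(JSON.stringify(input))                   // {"a":1}
```

With the filter in place, an object carrying an explicit `undefined` field serializes the same way as one that omits the field, so both would hash to the same digest.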
