fix: autobahn InitChain, GetValidators, and mempool TTL (CON-249) #3243

Merged: wen-coding merged 9 commits into main from wen/fix-autobahn-initchain-height (Apr 16, 2026)

wen-coding commented Apr 14, 2026

Summary

Fixes autobahn runExecute so the docker cluster (AUTOBAHN=true) starts and produces blocks, and fixes a test hang in TestGigaRouter_FinalizeBlocks.

Root cause 1 — double InitChain: The CometBFT handshaker (consensus/replay.go) always calls InitChain when appHeight==0. The autobahn runExecute was also calling InitChain — this double-init corrupted app state. Additionally, GetValidators panicked reading staking params from the empty committed store before the first Commit.

Root cause 2 — mempool TTL eviction: The mempool starts at height=-1. Txs submitted before the first block inherit height=-1. With TTLNumBlocks=10, purgeExpiredTxs instantly evicts all txs on the first Update(blockHeight=InitialHeight) because -1 < InitialHeight-10 when InitialHeight is large (random up to 100k in the test). This is a race: the producer must drain the mempool before the first block executes, otherwise txs are silently lost.
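
The eviction arithmetic in root cause 2 can be checked with a small sketch. `expiredByHeight` is a hypothetical stand-in for the height-based part of `purgeExpiredTxs`, written only from the condition quoted above; the real function may differ in detail.

```go
package main

import "fmt"

// expiredByHeight is a hypothetical helper, not the real mempool code.
// It models the condition described above: a tx is purged when its recorded
// height is lower than blockHeight - ttlNumBlocks. A ttlNumBlocks of 0
// disables the check, which is what TestConfig() relies on after this PR.
func expiredByHeight(txHeight, blockHeight, ttlNumBlocks int64) bool {
	if ttlNumBlocks <= 0 {
		return false
	}
	return txHeight < blockHeight-ttlNumBlocks
}

func main() {
	const txHeight = -1 // tx submitted before the first block

	// With TTLNumBlocks=10, any InitialHeight of 10 or more evicts the tx.
	for _, initialHeight := range []int64{1, 9, 10, 50000} {
		fmt.Printf("InitialHeight=%d TTL=10 expired=%v\n",
			initialHeight, expiredByHeight(txHeight, initialHeight, 10))
	}

	// With TTL disabled the tx survives regardless of InitialHeight.
	fmt.Printf("InitialHeight=50000 TTL=0 expired=%v\n",
		expiredByHeight(txHeight, 50000, 0))
}
```

Under this model the tx is safe only while InitialHeight is below 10, which is consistent with the hang depending on the random InitialHeight chosen by the test.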

What changed

  1. giga_router.go — remove InitChain/Commit from runExecute: The CometBFT handshaker owns InitChain. runExecute now just sets next = InitialHeight when last == 0, relying on the deliverState the handshaker already set up. Includes a WARNING comment about state sync incompatibility.

  2. app/app.go: GetValidators uses DeliverContext at height 0. After the handshaker's InitChain but before the first Commit, the committed store is empty (staking params not persisted). Reading from it panics in MaxValidators with "UnmarshalJSON cannot decode empty bytes". At LastBlockHeight() == 0, we read from DeliverContext instead, which has the uncommitted staking state.

  3. baseapp/test_helpers.go — add DeliverContext() accessor: Returns the current deliverState context, or nil if not set.

  4. giga_router_test.go — align testApp with real SDK:

    • Adapt Committed field: InitChain now sets Committed=true so FinalizeBlock can follow without an intermediate Commit (matching the CometBFT handshaker flow). Double-commit is still caught.
    • Info() returns height=0 before first Commit (matches real SDK).
    • TestGigaRouter_FinalizeBlocks calls InitChain before router.Run() to simulate the CometBFT handshaker.
    • Add TestInitChainCommitThenFinalize contract test for testApp.
  5. mempool/testonly.go — disable TTL in TestConfig(): Sets TTLNumBlocks=0 and TTLDuration=0 to prevent spurious tx eviction in all mempool tests.
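
Item 1 above amounts to a small height-selection rule. The following is a minimal sketch of that rule, not the actual runExecute code: `nextHeight` is a hypothetical helper, and the `last + 1` restart branch is an assumption, since the description only states the `last == 0` case.

```go
package main

import "fmt"

// nextHeight is a hypothetical helper illustrating how the first block to
// execute could be chosen. last is the app's last committed height as
// reported by Info().
func nextHeight(last, initialHeight int64) int64 {
	if last == 0 {
		// Fresh chain: the handshaker has already called InitChain and set
		// up deliverState, so execution starts at InitialHeight without a
		// second InitChain.
		return initialHeight
	}
	// Assumption: on restart, execution resumes right after the last
	// committed block.
	return last + 1
}

func main() {
	const initialHeight = 50000 // arbitrary example value

	fmt.Println(nextHeight(0, initialHeight))     // fresh chain
	fmt.Println(nextHeight(50007, initialHeight)) // restart after block 50007
}
```

This also shows why Info() must report height 0 before the first Commit: any other value sends a fresh chain down the restart branch.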

Test plan

  • TestInitChainCommitThenFinalize passes
  • TestGigaRouter_FinalizeBlocks passes (was hanging in CI)
  • Docker AUTOBAHN=true cluster: all 4 nodes produce blocks in sync
  • Bank transfer finalizes
  • Fault tolerance: 1 node down → chain continues (3/4 quorum); 2 nodes down → chain halts
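
The fault-tolerance result matches the usual BFT rule that strictly more than two thirds of validators must be online. A minimal sketch, assuming equal voting power per node and that threshold (the PR itself only states the observed 3/4 and 2/4 outcomes):

```go
package main

import "fmt"

// hasQuorum reports whether `up` out of `total` equally weighted validators
// exceed the two-thirds threshold. Integer math avoids float rounding.
// The equal-weight and >2/3 assumptions are not taken from the PR.
func hasQuorum(up, total int) bool {
	return 3*up > 2*total
}

func main() {
	const total = 4
	for down := 0; down <= 2; down++ {
		fmt.Printf("%d of %d nodes down: chain continues = %v\n",
			down, total, hasQuorum(total-down, total))
	}
}
```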

🤖 Generated with Claude Code

github-actions Bot commented Apr 14, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Apr 16, 2026, 2:31 PM |

codecov Bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 22.22222% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.26%. Comparing base (c5cad2b) to head (de4b7f5).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sei-cosmos/baseapp/test_helpers.go | 0.00% | 4 Missing ⚠️ |
| app/app.go | 0.00% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3243      +/-   ##
==========================================
- Coverage   59.29%   59.26%   -0.03%     
==========================================
  Files        2070     2070              
  Lines      169775   169701      -74     
==========================================
- Hits       100663   100578      -85     
- Misses      60321    60329       +8     
- Partials     8791     8794       +3     
| Flag | Coverage Δ |
| --- | --- |
| sei-chain-pr | 62.81% <22.22%> (?) |
| sei-db | 70.41% <ø> (ø) |

| Files with missing lines | Coverage Δ |
| --- | --- |
| sei-tendermint/internal/mempool/testonly.go | 100.00% <100.00%> (ø) |
| sei-tendermint/internal/p2p/giga_router.go | 68.75% <ø> (+1.67%) ⬆️ |
| app/app.go | 68.90% <0.00%> (-0.14%) ⬇️ |
| sei-cosmos/baseapp/test_helpers.go | 66.66% <0.00%> (-10.26%) ⬇️ |

... and 25 files with indirect coverage changes

pompon0 commented Apr 14, 2026

this pr seems to be a noop, no?

@wen-coding (Contributor Author) replied:

> this pr seems to be a noop, no?

Sorry, still working on it; Claude pushed before it was ready. I'll ping you when it's ready for review.

@wen-coding wen-coding force-pushed the wen/fix-autobahn-initchain-height branch 2 times, most recently from d272836 to 85e690c Compare April 14, 2026 18:14
@wen-coding wen-coding changed the title from "fix: add Commit after InitChain in autobahn runExecute (CON-249)" to "fix: autobahn InitChain height mismatch and testApp height tracking (CON-249)" Apr 14, 2026
Without Commit after InitChain, the staking params are not persisted
to the committed store. When executeBlock calls app.GetValidators(),
it reads from the committed store and panics with
"UnmarshalJSON cannot decode empty bytes".

Also removes the testApp Committed check in FinalizeBlock since the
extra Commit after InitChain changes the Committed state flow.

Note: TestGigaRouter_FinalizeBlocks is pre-broken on main (times out
at 120s) — this is not caused by this change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wen-coding wen-coding force-pushed the wen/fix-autobahn-initchain-height branch from 85e690c to 8be74ab Compare April 14, 2026 19:05
@wen-coding wen-coding changed the title from "fix: autobahn InitChain height mismatch and testApp height tracking (CON-249)" to "fix: autobahn InitChain height and GetValidators panic (CON-249)" Apr 14, 2026
wen-coding and others added 3 commits April 14, 2026 20:43
GetValidators now reads from the committed store first and only falls
back to DeliverContext when the committed store is empty (autobahn-only
path after InitChain before first Commit). This avoids changing the
code path for CometBFT consensus.

Remove the Committed field from testApp. The original boolean conflated
two states: "post-InitChain" and "post-Commit". It blocked autobahn's
valid flow (InitChain → FinalizeBlock without intermediate Commit)
because InitChain set Committed=true, causing the next Commit after
FinalizeBlock to fail with "double commit". The real app guards against
actual double commit structurally (deliverState becomes nil after Commit,
so a second Commit panics on nil pointer) — not via a boolean flag.
The testApp is too simple to replicate that mechanism and doesn't need to.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CometBFT handshaker (consensus/replay.go) always calls InitChain
when appHeight==0. Having runExecute also call InitChain caused a
double-init that corrupted state. Now runExecute skips InitChain and
relies on the handshaker having already set up deliverState.

GetValidators: at height 0, read from DeliverContext first since the
committed store is empty (staking params panic on MaxValidators).
After the first Commit, the committed store has the params and the
normal path is used.
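
A toy model of that context selection is sketched below. The `app` and `store` types and the key name are hypothetical stand-ins for illustration; they are not the SDK's BaseApp, sdk.Context, or staking keeper APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical key-value view of app state.
type store map[string]string

// app is a toy model of the two state views discussed above.
type app struct {
	lastBlockHeight int64
	committed       store // persisted state; empty until the first Commit
	deliver         store // uncommitted state written by InitChain; nil if unset
}

// validatorStore picks the state view to read staking data from.
func (a *app) validatorStore() (store, error) {
	if a.lastBlockHeight == 0 {
		// After InitChain but before the first Commit, only the deliver
		// state holds the staking params.
		if a.deliver == nil {
			return nil, errors.New("height 0 and no deliver state: InitChain has not run")
		}
		return a.deliver, nil
	}
	return a.committed, nil
}

func main() {
	// Key and value are illustrative placeholders.
	a := &app{committed: store{}, deliver: store{"staking/params": "from-genesis"}}
	s, err := a.validatorStore()
	fmt.Println(s["staking/params"], err)

	// Simulate the first Commit: deliver state is persisted and cleared.
	a.committed, a.deliver, a.lastBlockHeight = a.deliver, nil, 100
	s, err = a.validatorStore()
	fmt.Println(s["staking/params"], err)
}
```

Returning an error when the deliver state is missing is a choice made for this sketch; the PR only says the DeliverContext() accessor returns nil in that case.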

Test: simulate the handshaker by calling InitChain before router.Run().

Verified with docker AUTOBAHN=true cluster: all 4 nodes produce blocks,
bank transfer works, fault tolerance (1-down continues, 2-down halts).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Match the real SDK behavior: InitChain without Commit leaves
LastBlockHeight=0. Previously the testApp returned InitialHeight-1
after InitChain with 0 blocks, causing runExecute to enter the
restart path and call PushAppHash with a block number below
the data layer's starting point.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wen-coding wen-coding force-pushed the wen/fix-autobahn-initchain-height branch from 5982d5f to 469a247 Compare April 15, 2026 17:20
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wen-coding wen-coding changed the title from "fix: autobahn InitChain height and GetValidators panic (CON-249)" to "fix: autobahn InitChain and GetValidators for docker cluster (CON-249)" Apr 15, 2026
The mempool starts at height=-1, so txs submitted before the first
block get height=-1. With TTLNumBlocks=10, purgeExpiredTxs instantly
evicts them on the first Update(blockHeight=InitialHeight) because
-1 < InitialHeight-10 when InitialHeight is large (random up to 100k).

This is a pre-existing race from #3224: the producer must drain the
mempool before the first block is executed, otherwise txs are lost.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wen-coding wen-coding changed the title from "fix: autobahn InitChain and GetValidators for docker cluster (CON-249)" to "fix: autobahn InitChain, GetValidators, and mempool TTL (CON-249)" Apr 15, 2026
Comment thread sei-tendermint/internal/p2p/giga_router_test.go Outdated
Comment thread sei-tendermint/internal/p2p/giga_router_test.go Outdated
Comment thread sei-tendermint/internal/p2p/giga_router_test.go
- Move TTL disable (TTLNumBlocks=0, TTLDuration=0) into TestConfig()
  so all mempool tests benefit, per reviewer request.
- Restore Committed field in testApp to verify GigaRouter calls Commit
  at the right moments. InitChain now sets Committed=true (matching the
  handshaker flow: InitChain → FinalizeBlock → Commit).
- Document TestInitChainCommitThenFinalize as a contract test for testApp.
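
The Committed flag described in these bullets behaves like a two-state machine. The sketch below is a guess at its shape from the commit message alone; the method signatures and error strings are assumptions and the real testApp in giga_router_test.go may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// testApp is a hypothetical, stripped-down model of the test double.
type testApp struct {
	committed bool
}

// InitChain marks the app as committed so that FinalizeBlock may follow
// directly, matching the handshaker flow InitChain -> FinalizeBlock -> Commit.
func (a *testApp) InitChain() { a.committed = true }

// FinalizeBlock requires the previous block (or InitChain) to be committed.
func (a *testApp) FinalizeBlock() error {
	if !a.committed {
		return errors.New("FinalizeBlock without preceding InitChain or Commit")
	}
	a.committed = false
	return nil
}

// Commit rejects a second Commit with no FinalizeBlock in between.
func (a *testApp) Commit() error {
	if a.committed {
		return errors.New("double commit")
	}
	a.committed = true
	return nil
}

func main() {
	app := &testApp{}
	app.InitChain()
	fmt.Println("FinalizeBlock:", app.FinalizeBlock())
	fmt.Println("Commit:", app.Commit())
	fmt.Println("second Commit:", app.Commit())
}
```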

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cfg := DefaultConfig()
cfg.CacheSize = 1000
cfg.DropUtilisationThreshold = 0.0
// Disable TTL purging in tests: the mempool starts at height=-1, so txs

Contributor:

nit: this test config is used in various contexts, so the comment doesn't seem relevant.

Contributor Author:

Changed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wen-coding wen-coding enabled auto-merge April 16, 2026 14:31
@wen-coding wen-coding added this pull request to the merge queue Apr 16, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 16, 2026
@wen-coding wen-coding added this pull request to the merge queue Apr 16, 2026
Merged via the queue into main with commit 7206ae5 Apr 16, 2026
81 of 125 checks passed
@wen-coding wen-coding deleted the wen/fix-autobahn-initchain-height branch April 16, 2026 19:17
yzang2019 added a commit that referenced this pull request Apr 20, 2026
* main: (51 commits)
  Giga storage integration test (#3268)
  test(flatkv): add flatkv integration testings (#3262)
  perf(app): reuse decoded transactions across ProcessProposalHandler hot path (#3257)
  Fix of the proto conv testing (#3261)
  FlatKV refactor for state sync import + export (#3250)
  Validate block part index matches proof index (CON-20) (#3256)
  fix: add retry to apt-get update in Docker CI (#3264)
  fix: autobahn InitChain, GetValidators, and mempool TTL (CON-249) (#3243)
  Fix buffer offset in ProposerPriorityHash (CON-200) (#3255)
  Handle error case in light client divergence detector (#3254)
  perf(evmrpc): eliminate redundant block fetches in simulate backend (#3208)
  fix(evmrpc): omit notifications from legacy JSON-RPC batch responses per spec (#3246)
  fix: deduplicate block fetch in getTransactionReceipt (#3244)
  Made autobahn producer use TxMempool (#3224)
  Skip signature event building during Cosmos CheckTx/ReCheckTx (#3230)
  Regenerate changelog in prep to tag v6.4.2 (#3240)
  Fix receipt default retention (#3237)
  feat(flatkv): introduce module-prefix physical keys across all FlatKV DBs (#3229)
  added a ProposerAddress check to setProposal CON-250 (#3232)
  feat: add AUTOBAHN option to local docker cluster (CON-247) (#3220)
  ...