[CCIP-11717..11727] Fix CCIPReader test 12m-timeout hang: dedicated DB per parallel test #22970
KodeyThomas wants to merge 1 commit into
…B per parallel test

The parallel CCIPReader tests obtained their LogPoller DB via pgtest.NewSqlxDB, which is backed by the txdb driver. Under txdb every test in the package shares one physical Postgres database and tables: each caller only gets its own uncommitted transaction, not its own tables. The tests run with t.Parallel() and reuse the same evm_chain_id (chainD) while their simulated backends mint blocks 1,2,3..., so parallel tests insert the same (block_number, evm_chain_id) primary key into the shared evm.log_poller_blocks table (PRIMARY KEY (block_number, evm_chain_id), 0115_log_poller.sql).

The duplicate INSERT ... ON CONFLICT DO NOTHING takes a speculative-insert lock that blocks on another test's never-committed txdb transaction. txdb's conn.ExecContext/QueryContext discard the caller context, so the blocked LogPoller insert never times out, holds conn.Mutex, serializes the other LogPoller goroutines, and logPoller.Close() blocks on its WaitGroup forever -> panic: test timed out after 12m0s (whole package fails).

Replace pgtest.NewSqlxDB with heavyweight.FullTestDBV2 (via a newHeavyTestDB helper) at every site so each test gets its own migrated database (no cross-test PK contention) and a real connection that honors query timeouts. Mirrors the existing usage in ccip_reader_bench_test.go.

Validated with the diagnose harness (sandbox disabled): scoped to the CI test set, TestCCIPReader* 10/10 iterations green (p50 35s) and Test_Get* 5/5 green (p50 45s), with zero hangs vs the prior 12m timeout.

Fixes: CCIP-11717, CCIP-11718, CCIP-11719, CCIP-11720, CCIP-11721, CCIP-11723, CCIP-11724, CCIP-11727
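The primary-key collision described in the commit message can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the `blockKey`, `table`, and `insertBlocks` names are hypothetical, an in-memory map stands in for `evm.log_poller_blocks`, and the chain ID value 4 is a placeholder. In the real setup a colliding insert waits on the other test's uncommitted transaction instead of being counted and skipped as it is here.

```go
package main

import "fmt"

// blockKey mirrors the composite primary key (block_number, evm_chain_id)
// of evm.log_poller_blocks. Hypothetical type, for illustration only.
type blockKey struct {
	blockNumber int64
	evmChainID  int64
}

// table is an in-memory stand-in for one physical table: a set of primary keys.
type table map[blockKey]struct{}

// insertBlocks simulates one test's LogPoller saving blocks 1..n for chainID.
// It returns how many inserts hit a key that is already present. In Postgres,
// a hit against another transaction's uncommitted row would wait, not skip.
func insertBlocks(t table, chainID, n int64) (conflicts int) {
	for b := int64(1); b <= n; b++ {
		k := blockKey{blockNumber: b, evmChainID: chainID}
		if _, exists := t[k]; exists {
			conflicts++
			continue
		}
		t[k] = struct{}{}
	}
	return conflicts
}

// sharedTableConflicts models two tests writing into the same table.
func sharedTableConflicts(chainID, n int64) int {
	shared := table{}
	c := insertBlocks(shared, chainID, n) // first test
	c += insertBlocks(shared, chainID, n) // second test, same chain ID and block numbers
	return c
}

// dedicatedTableConflicts models each test owning its own database.
func dedicatedTableConflicts(chainID, n int64) int {
	c := insertBlocks(table{}, chainID, n) // first test, own table
	c += insertBlocks(table{}, chainID, n) // second test, own table
	return c
}

func main() {
	const chainID, blocks = 4, 3 // placeholder values
	fmt.Println("shared table conflicts:", sharedTableConflicts(chainID, blocks))
	fmt.Println("dedicated table conflicts:", dedicatedTableConflicts(chainID, blocks))
}
```

With one shared table every block written by the second test collides with the first test's key; with a table per test there is nothing to collide with, which is the property the dedicated-database change relies on.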
sorry, I am not getting "The duplicate INSERT ... ON CONFLICT takes a
Problem
8 sibling flaky-test tickets (CCIP-11717/11718/11719/11720/11721/11723/11724/11727), all t.Parallel() tests in integration-tests/smoke/ccip/ccip_reader_test.go, share one root cause: the package intermittently hangs with panic: test timed out after 12m0s.

A CI goroutine dump showed logPoller.pollAndSaveLogs → InsertBlocks stuck in a pgx network read for 11 minutes holding conn.Mutex, 6 other LogPoller DB goroutines blocked behind it, and logPoller.Close() (in t.Cleanup) blocked forever on its WaitGroup.

Root cause
These tests get their LogPoller DB from pgtest.NewSqlxDB, backed by the txdb driver. Under txdb every test in the package shares one physical Postgres database and tables: each caller only gets its own uncommitted transaction, not its own tables. The tests run in parallel and reuse the same evm_chain_id (chainD) while their simulated backends mint blocks 1,2,3…, so parallel tests insert the same (block_number, evm_chain_id) primary key into the shared evm.log_poller_blocks table (PRIMARY KEY (block_number, evm_chain_id), 0115_log_poller.sql).

The duplicate INSERT … ON CONFLICT DO NOTHING takes a speculative-insert lock that waits on another test's never-committed txdb transaction. txdb's conn.ExecContext/QueryContext discard the caller's context, so the blocked insert never times out, holds conn.Mutex, serializes the other LogPoller goroutines, and Close() hangs → the 12m package timeout.

Fix
Replace pgtest.NewSqlxDB with heavyweight.FullTestDBV2 (via a small newHeavyTestDB helper) at every DB-acquisition site. Each test now gets its own migrated database (no cross-test PK contention) and a real connection that honors query timeouts. This mirrors the existing usage in ccip_reader_bench_test.go.

Validation
Local diagnose harness (scoped to the CI test set to exclude unrelated Aptos/E2E tests):

- TestCCIPReader* (10 parallel tests = max chain-4 collision pressure): 10/10 iterations green, 0 broken/timeout/slow, p50 35s.
- Test_Get* (MemoryEnvironment group): 5/5 iterations green, 0 broken/timeout/slow, p50 45s.

Zero hangs across 15 collision-forcing iterations, versus the prior 12-minute timeout.
Fixes CCIP-11717, CCIP-11718, CCIP-11719, CCIP-11720, CCIP-11721, CCIP-11723, CCIP-11724, CCIP-11727.