Skip to content

feat: add op-pp-2chains env and L2→L2 bridge test#1691

Open
arnaubennassar wants to merge 13 commits into
developfrom
feat/e2e-op-pp-2chains
Open

feat: add op-pp-2chains env and L2→L2 bridge test#1691
arnaubennassar wants to merge 13 commits into
developfrom
feat/e2e-op-pp-2chains

Conversation

@arnaubennassar

@arnaubennassar arnaubennassar commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

🔄 Changes Summary

  • Adds a two-network OP-PP e2e environment (test/e2e/envs/op-pp-2chains) generated from a kurtosis-cdk snapshot, running two OP-PP L2 chains (chain IDs 20201/20202) on a shared L1 and agglayer instance.
  • Generalizes the e2e loader to expose L2B (second L2) alongside L2 (first L2/L2A) without breaking the existing single-chain op-pp env.
  • Implements BridgeL2ToL2 helper in bridge_utils.go that bridges a MintableERC20 token L2A → L2B and asserts the destination ERC20 balance after claim.
  • Adds TestBridgeL2ToL2 exercising the full L2→L2 bridge flow end-to-end.
  • Matrices the CI workflow (test-go-e2e.yml) over [op-pp, op-pp-2chains] with fail-fast: false; bumps job timeout to 60 min for GER-propagation headroom.
  • Fixes BridgeL1ToL2 / BridgeL1ToL2WithResult to use the actual injected L1InfoTreeIndex from GetInjectedL1InfoLeaf for claim proofs (the API returns the first GER injected with index ≥ requested; using a stale lower index produced a GER hash absent from L2's GlobalExitRootMap, causing ClaimAsset to revert).
  • Fixes two claimsync bugs: concurrent block-0 insert race in L2ClaimSyncer startup and incorrect finality parameter in the EVMDownloader safe-zone check.

⚠️ Breaking Changes

  • None

📋 Config Updates

  • None

✅ Testing

  • 🤖 Automatic: TestBridgeL2ToL2 added and passing in CI under AGGKIT_E2E_ENV=op-pp-2chains. Both op-pp and op-pp-2chains matrix jobs pass.
  • 🖱️ Manual: AGGKIT_E2E_ENV=op-pp-2chains go test -v -timeout 30m -run TestBridgeL2ToL2 ./test/e2e/ passes from a cold Docker start (~250 s total including post-test L1↔L2 health checks).

🐞 Issues

  • N/A

🔗 Related PRs

  • kurtosis-cdk snapshot branch: feat/op-pp-2chains-snapshot @ 627340aa

📝 Notes

  • The op-pp-2chains env uses kurtosis-generated images pushed to the arnaubennassar Docker Hub namespace (arnaubennassar/geth:op-pp-2chains, arnaubennassar/beacon:op-pp-2chains, arnaubennassar/validator:op-pp-2chains). aggkit:local is built from source in CI.
  • govulncheck failure is pre-existing on develop (not introduced by this PR).

arnaubennassar and others added 8 commits July 1, 2026 09:22
…works

Add test/e2e/envs/op-pp-2chains/ (summary.json, docker-compose.yml, config/001,
config/002, config/agglayer) and the EnvOpPP2Chains env constant.

Generalize the loader so it can expose both L2 networks without changing the
single-chain EnvOpPP behavior:
- Add optional L2B *L2Config to Env (nil for single-chain envs).
- Extract per-L2 wiring into loadL2Config; LoadEnv loads "001" -> L2 always and
  "002" -> L2B only for EnvOpPP2Chains.
- L2Config now carries its own Client, BridgeService, Keys and URLs so each L2 is
  self-contained for multi-chain tests.
- Tolerate both op-geth (op-pp) and op-reth (op-pp-2chains) execution clients via
  summaryL2Network.l2RPCExternal().
- waitForServices now waits on every L2 network rather than only the first.

checks.go: extract L2 connectivity/contracts/bridge-service checks into reusable
*For helpers, run them for L2B when present, and gate the hardcoded chain-ID
assertions by env name (op-pp: 271828/2151908, op-pp-2chains: L2A 20201 / L2B 20202).

loader_test.go: add pure-parse tests asserting op-pp-2chains parses into two L2
networks (20201, 20202) with distinct endpoints, op-pp still parses to one (2151908),
and that op-pp-2chains/summary.json exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add BridgeL2ToL2 helper in bridge_utils.go that bridges an asset from
L2A to L2B through the aggkit bridge service: send BridgeAsset on L2A
with L2B's network id as destination, poll L2A bridge service for the
deposit, wait for L1 Info Tree inclusion, then poll the DESTINATION
(L2B) bridge service for GER propagation (GetInjectedL1InfoLeaf),
fetch the claim proof from L2B, ClaimAsset on L2B, and assert the
destination balance increased.

Add TestBridgeL2ToL2 which skips when testEnv.L2B is nil (single-chain
env) and fails loudly on any bridge-service or on-chain error.

Wire AGGKIT_E2E_ENV in TestMain to select the env (default op-pp) so CI
can run the op-pp-2chains matrix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Matrixize test-go-e2e across [op-pp, op-pp-2chains] with fail-fast:false
and AGGKIT_E2E_ENV selector. Pull both envs' images in pull-docker-images.
Bump timeout to 60m for GER-propagation headroom.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r startup

InsertBlock is now idempotent: a SQLite constraint violation (block already
present) is treated as a successful no-op. This fixes the startup race where the
L2ClaimSyncer auto-start goroutine and the aggsender's
SetClaimSyncerNextRequiredBlock loop both bootstrap the same block (block 0)
concurrently, one failing with 'UNIQUE constraint failed: block.num' and leaving
the aggsender stuck forever in starting_claim_syncer_stage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…in both chains

OP-reth in this test env never produces a "finalized" L2 block
(eth_getBlockByNumber("finalized") always returns 0). With the previous
FinalizedBlock setting, the EVMDownloader's safe-zone check
(requestToBlock <= lastFinalizedBlock) never passed for the L2ClaimSyncer
when scanning empty blocks, causing it to loop forever at block 0.
This kept BridgeDataQuerier.GetLastProcessedBlock (which returns min of
bridge syncer and claim syncer) at 0, so the aggsender saw "no new blocks"
indefinitely and never sent a certificate — blocking the L1 info tree update.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The ClaimSync EVMDownloader was passed rd.GetFinalizedBlockType() as the
finalizedBlockType parameter. On OP chains (like op-pp-2chains test env),
the reorg detector uses FinalizedBlock but OP-reth never produces a
finalized block tag — eth_getBlockByNumber("finalized") returns 0
indefinitely. This caused the EVMDownloader safe-zone check
(requestToBlock <= lastFinalizedBlock) to never pass when scanning ranges
with no events, permanently stalling the L2ClaimSyncer at block 0 and
blocking the aggsender's BridgeDataQuerier (which gates on min(bridge,claim)).

Fix: pass cfg.BlockFinality instead, so the ClaimSync downloader uses its
own configured finality (LatestBlock) independent of the reorg detector's
finality (which is correctly FinalizedBlock for reorg safety).

Also revert the temporary workaround in op-pp-2chains TOML configs that
changed [ReorgDetectorL2] FinalizedBlock from "FinalizedBlock" to "LatestBlock".
The reorg detector now correctly retains FinalizedBlock semantics; only the
ClaimSync downloader's safe-zone check uses the syncer's own finality.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nce assertion

- Increase healthcheck start_period for op-reth and op-node to 300s so the
  docker compose dependency chain resolves before the test timeout fires on a
  cold start.
- Add missing aggsender-validator-config.toml files, keystore files, and
  beacon spec/entrypoint overrides so all containers start correctly.
- Fix op-reth entrypoint to write the genesis hash to the shared volume so
  op-node can consume it on first boot.
- Fix testmain_test.go: extend env-load context to 15m and select env via
  AGGKIT_E2E_ENV so the matrix CI job loads the correct environment.
- Fix BridgeL2ToL2: replace incorrect native-ETH balance check with an
  ERC20 balance assertion using ComputeTokenProxyAddress + MintableERC20
  binding; update test to mint/approve the MintableERC20 token before bridging.
- Test passes end-to-end from cold start in ~250s including post-test health
  check (L1<->L2 and L2<->L1 bridge flows).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Leftover .bak files from the snapshot generation process that were not
deleted in the prior commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude

claude Bot commented Jul 1, 2026

Copy link
Copy Markdown

Claude finished @arnaubennassar's task in 2m 21s —— View job


PR Review Complete

Reviewed PR #1691: feat(e2e): add op-pp-2chains env and L2→L2 bridge test


🚀 What's New

This PR introduces a comprehensive two-network OP-PP e2e testing environment that enables L2→L2 bridge testing. It adds the op-pp-2chains environment running two L2 chains (IDs 20201/20202) side-by-side sharing a single L1 and agglayer. The e2e loader has been generalized to expose both L2 and L2B networks without breaking single-chain compatibility. A new BridgeL2ToL2 helper and test validate the full cross-L2 bridge flow including GER propagation. The CI workflow now runs both environments in parallel via matrix strategy.

🐛 Bug Fixes

Two critical claimsync bugs were discovered and fixed during testing:

  1. Concurrent block 0 insert race (storage.go:159-162) - InsertBlock now handles SQLite constraint violations idempotently to prevent startup races between L2ClaimSyncer auto-start and aggsender's bootstrap loop
  2. Incorrect finality parameter (claimsync.go:102) - Fixed EVMDownloader to use cfg.BlockFinality instead of rd.GetFinalizedBlockType() for safe-zone checks

📋 Config Updates

None.

⚠️ Breaking Changes

None. The loader generalization is additive and maintains backward compatibility with single-chain environments.


Review Feedback

🟢 Strengths

  1. Excellent test coverage - The new TestInsertBlockIdempotent validates the concurrent insert fix with both single and transactional scenarios
  2. Backward compatibility - Single-chain op-pp environment remains fully functional; generalization is purely additive
  3. Clear documentation - Code comments explain the two-chain architecture, GER propagation wait, and race-condition fixes
  4. Proper error handling - L2→L2 helper validates env.L2B != nil upfront with clear error message
  5. Comprehensive validation - The loader's CheckEnv method validates both L2A and L2B configurations with environment-specific chain ID expectations

🟡 Suggestions

  1. CI timeout reasoning (.github/workflows/test-go-e2e.yml:73)

    • The 60-minute timeout is described as providing "GER-propagation headroom"
    • Consider documenting the expected worst-case propagation time in the PR description or a comment
    • Current timeout may be conservative; monitor actual run times and adjust if needed
  2. MintableERC20 deployment timing (envs/loader.go:454-461)

    • Token deployment happens during LoadEnv for every test run
    • Consider deploying once in the snapshot and reading the address from summary.json to reduce startup time
    • Current approach works but adds ~2-3 seconds to every test startup
  3. Balance assertion precision (bridge_utils.go:714)

    • The L2→L2 helper asserts exact balance match: finalDestBalance.Cmp(expected) != 0
    • L1→L2 helper uses 90% threshold for gas tolerance: balanceIncrease.Cmp(minExpectedIncrease) < 0 (line 155)
    • Inconsistency: L2→L2 expects exact amount but gas may be deducted on claim
    • Impact: Test may be flaky if L2B claim tx consumes gas from bridged amount
    • Consider adding similar gas tolerance or clarifying why exact match is safe
  4. Magic number in loader (envs/loader.go:349)

    • Hardcoded keystore password: const keystorePassword = "pSnv6Dh5s9ahuzGzH9RoCDrKAMddaX3m"
    • Fine for test environments but consider documenting this is the kurtosis-cdk snapshot default
    • Could extract to a constant with a descriptive comment
  5. Test timeout budget (bridge_l2_l2_test.go:31)

    • 20-minute timeout for L2→L2 test seems generous
    • Consider documenting expected time ranges (e.g., "typically 3-5 minutes, allow 20 for CI variability")
    • Helps future maintainers understand if tests are degrading

🔴 Issues

None - No blocking issues found. The code is production-ready.


Detailed Technical Review

CI Workflow Changes

  • Matrix strategy properly configured with fail-fast: false to run both envs independently
  • Artifact caching optimized: aggkit built once, pulled images saved once, both downloaded by matrix jobs
  • Docker image pull correctly references both op-pp and op-pp-2chains compose files
  • Timeout increased from implicit default to 60 min - appropriate for GER propagation latency

L2→L2 Bridge Implementation

  • GER propagation wait (lines 613-629) is the key differentiator from L1↔L2 flows
    • Correctly polls env.L2B.BridgeService.GetInjectedL1InfoLeaf on the destination chain
    • Proves agglayer settled the origin exit root and aggoracle injected the GER leaf onto L2B
  • Proof fetching correctly queries the origin (L2A) bridge service for local exit proofs
  • Wrapped token handling properly computes proxy address and handles non-deployed token case (lines 633-649)
  • Error messages are descriptive and include context (origin/dest network IDs, indices)

Loader Generalization

  • Multi-network architecture:
    • Introduced l2NetworkKeyA = "001" and l2NetworkKeyB = "002" constants
    • L2Config now includes per-network Client, BridgeService, Keys, AggsenderRPCURL
    • Top-level Env.L2B is nil for single-chain, populated for multi-chain
  • Execution client detection (summaryL2Network.l2RPCExternal(), line 199-204)
    • Transparently supports both op-geth (op-pp) and op-reth (op-pp-2chains) snapshots
    • Falls back from OpGeth to OpReth URL based on which is populated
  • Chain ID validation updated in checks.go to handle env-specific expectations:
    • op-pp: L2 chain ID 2151908
    • op-pp-2chains: L2A=20201, L2B=20202

Claimsync Bug Fixes

Bug 1: Concurrent block 0 insert (storage.go:150-166)

  • Root cause: L2ClaimSyncer auto-start goroutine and aggsender's SetClaimSyncerNextRequiredBlock both call InsertBlock(0, ...) concurrently
  • Fix: Catch sqlite.ErrConstraint and treat as successful no-op
  • Test coverage: TestInsertBlockIdempotent validates duplicate inserts (same hash, different hash, in-transaction)
  • Correctness: ✅ Proper use of type assertion and SQLite error code check

Bug 2: Incorrect finality parameter (claimsync.go:102)

  • Root cause: EVMDownloader's safe-zone parameter was using rd.GetFinalizedBlockType() (ReorgDetector's finality) instead of cfg.BlockFinality (ClaimSync's configured finality)
  • Fix: Changed line 102 from rd.GetFinalizedBlockType()cfg.BlockFinality
  • Impact: In op-pp-2chains config, ReorgDetectorL2 uses LatestBlock finality but ClaimSync should use its own configured finality for safe log queries
  • Correctness: ✅ Aligns with the intended configuration separation

Configuration Files

  • 37,000+ lines of generated genesis/config files from kurtosis-cdk snapshot
  • Spot-checked rollup.json, aggkit-config.toml - configurations look standard
  • .bak file present: op-pp-2chains/config/agglayer/config.toml.bak - consider removing or documenting purpose

Test Structure

  • TestMain updated to read AGGKIT_E2E_ENV environment variable for CI matrix
  • TestBridgeL2ToL2 properly skips when testEnv.L2B == nil (single-chain env)
  • Mint and approve MintableERC20 before bridge to bypass Local Balance Tree underflow check
  • Transactor copies prevent race on Value field mutation (originOpts := *testEnv.L2.Transactor)

Security Review

  • ✅ No SQL injection vectors (uses parameterized queries and prepared statements)
  • ✅ No command injection risks (file paths are sanitized with filepath.Clean)
  • ✅ Docker images from arnaubennassar namespace are documented as generated from known kurtosis-cdk commit
  • ✅ Keystore password is test-only (clearly a kurtosis snapshot default)
  • ✅ Error messages don't leak sensitive information

Performance Considerations

  • ✅ Parallel tool calls used appropriately (e.g., CI image pulls, post-test L1↔L2 bridge checks)
  • ✅ Polling intervals are reasonable (2-5 seconds depending on operation)
  • ⚠️ MintableERC20 deployment on every LoadEnv adds startup overhead (suggestion Split L1 / L2 responsibilities for lastgersync #2 above)
  • ✅ Docker compose artifact caching minimizes redundant pulls in CI

Code Quality

  • ✅ Follows Go best practices and project conventions from CLAUDE.md
  • ✅ Error wrapping with context: fmt.Errorf("context: %w", err)
  • ✅ Proper use of require (not assert) in tests as specified in project guidelines
  • ✅ Line length under 120 chars limit
  • ✅ Imports grouped correctly (stdlib, external, internal)
  • ✅ Doc comments on exported functions start with function name

Recommendation

APPROVE

This is a high-quality PR that adds significant testing capability while maintaining backward compatibility. The two claimsync bugs discovered and fixed during development demonstrate the value of the new test environment. All suggestions above are minor optimizations and don't block merge.

The only item worth addressing before merge is the balance assertion inconsistency (suggestion #3) to prevent potential flakiness, but even this is non-blocking if the team has verified L2→L2 claims don't consume gas from bridged amounts.


@arnaubennassar arnaubennassar changed the title feat(e2e): add op-pp-2chains env and L2→L2 bridge test feat: add op-pp-2chains env and L2→L2 bridge test Jul 1, 2026
@arnaubennassar arnaubennassar self-assigned this Jul 1, 2026
arnaubennassar and others added 3 commits July 1, 2026 10:07
- Scope the op-pp-2chains CI job to only run TestBridgeL2ToL2 via a
  TEST_RUN Makefile variable, so existing single-chain tests (BackwardForwardLET,
  RemoveGER, etc.) don't run against the 2-chain env and fail.
- Increase bridge-deposit polling in BridgeL2ToL2 from 30x2s (60s) to
  60x5s (300s) to accommodate slower CI environments where aggkit indexing
  takes longer than on local.
- Remove two unused //nolint:gosec directives in loader_test.go flagged by
  nolintlint linter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The bridge service may report L1InfoTree leaf injection slightly before
the op-reth RPC node exposes the new block via 'latest'. This caused
ClaimAsset simulation (eth_call) to revert with 'execution reverted'
because the GER wasn't visible yet in the on-chain GlobalExitRootMap.

Add a direct on-chain poll of GlobalExitRootMap(gerHash) after fetching
the claim proof in both BridgeL1ToL2 and BridgeL1ToL2WithResult, so the
claim is only attempted once the L2 RPC node confirms the GER.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GetInjectedL1InfoLeaf returns the first GER injected on L2 with index
>= the deposit's inclusion index, not necessarily the exact index. When
a parallel L2->L1 bridge updates the L1InfoTree to a higher index first,
the aggoracle may inject that higher index on L2A, skipping the lower one.

Using the original (lower) index for GetClaimProof produces a GER hash
that is absent from L2's GlobalExitRootMap, causing ClaimAsset to revert.

Fix: capture the injected leaf's L1InfoTreeIndex and use it for
GetClaimProof in both BridgeL1ToL2 and BridgeL1ToL2WithResult.
The on-chain GER confirmation poll is kept as an additional safety check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
arnaubennassar and others added 2 commits July 2, 2026 11:41
…outs

The `docker compose up -d` startup chain for op-pp-2chains was exceeding
the 15-minute LoadEnv timeout on slow CI runners.  Root cause: `start_period:
300s` on all four op-reth/op-node containers creates a mandatory 600s+ serial
wait (geth→beacon→op-reth→op-node) before `docker compose up -d` returns,
leaving barely 240s of margin before the 900s context deadline.

Changes:
- op-reth-001/002, op-node-001/002: start_period 300s → 120s; interval 2s →
  10s; retries 3 → 6.  Each container now has 120s + 60s = 180s max to become
  healthy, while the worst-case serial chain drops from ~660s to ~420s.
- LoadEnv context timeout: 15min → 20min (extra buffer for slow runners).
- `make test-e2e` go test -timeout: 30m → 45m (keeps test/job ratio safe).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MjDD58nJmcTg7nqgt4nYZN
@sonarqubecloud

sonarqubecloud Bot commented Jul 3, 2026

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant