Skip to content

feat: add bridgeservicefinder package#1692

Open
arnaubennassar wants to merge 2 commits into
developfrom
feat/bridge-service-finder
Open

feat: add bridgeservicefinder package#1692
arnaubennassar wants to merge 2 commits into
developfrom
feat/bridge-service-finder

Conversation

@arnaubennassar

@arnaubennassar arnaubennassar commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

🔄 Changes Summary

  • Adds a new bridgeservicefinder package that resolves and serves the bridge service URL for every network attached to a rollup manager, exposing New(...), Start(ctx), and GetURL(networkID). Not yet wired into cmd/run.go or the app config — it's intended to be consumed by other components in the future.
  • URL resolution follows three sources in strict priority: (1) static config map, (2) on-chain aggchainMetadata key "BRIDGE_SERVICE_URL" on the aggchain contract, (3) trustedSequencerURL() with the port substituted to the hardcoded bridge REST port 5577.
  • Networks are enumerated from the real rollup manager (RollupCount/RollupIDToRollupData) rather than a static list; networkID == rollupID for L2s, with network 0 (L1) only servable via config.
  • A lightweight, self-contained polling loop (FilterLogs over a configurable block-finality window) watches SetTrustedSequencerURL and AggchainMetadataSet events and updates the cache live, respecting source priority (config is never overridden) and a health-gating rule (only replace a cached URL with a new one if it's reachable, unless the previous URL was itself unreachable).
  • Adds selector/event-compatible mock Solidity contracts (test/contracts/rollupmanagermock, test/contracts/aggchainrollupmock) + generated bindings, verified against the real agglayermanager/aggchainbase/polygonrollupbaseetrog bindings via a smoke test.

⚠️ Breaking Changes

  • 🛠️ Config: None.
  • 🔌 API/CLI: None.
  • 🗑️ Deprecated Features: None.

📋 Config Updates

  • 🧾 Diff/Config snippet: None — no app config or wiring is added in this PR. The package is standalone and not yet started by any binary.

✅ Testing

  • 🤖 Automatic: Extensive unit tests on a go-ethereum simulated backend (bridgeservicefinder/*_test.go) covering each source in isolation, all priority combinations, config-immunity to on-chain updates, metadata-over-sequencer precedence, multi-network initial-cache builds, unreachable-service handling at Start, live updates via both event types, and health-gating edge cases. A smoke test (test/contracts/bridgeservicefinder_mocks_test.go) confirms the mock contracts are selector/event-compatible with the real bindings.
  • 🖱️ Manual: make build, make lint (0 issues), make test-unit, and make generate-mocks (no diff) all verified clean.

🐞 Issues

🔗 Related PRs

  • None

📝 Notes

  • The aggchainMetadata/trustedSequencerURL calls are treated as "source not available" (graceful fallthrough) on any contract-call error, since go-ethereum bindings don't reliably distinguish "method absent/reverted" from other RPC errors — this favors graceful degradation over strict error classification.
  • Live-update seeding starts from the current finalized block at Start() time (no historical backfill), since the initial cache build already covers the current state.
  • Intentionally not wired into cmd/run.go/config/* in this PR — a future component will construct and start the finder itself.

Resolves and serves the bridge service URL for every network attached to
a rollup manager, from three sources in strict priority: config, on-chain
aggchainMetadata ("BRIDGE_SERVICE_URL"), and trustedSequencerURL + port
5577. Enumerates networks via agglayermanager and keeps the cache updated
via a lightweight FilterLogs polling loop with health-gated updates, wired
into cmd/run.go behind a config flag.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
@claude

claude Bot commented Jul 1, 2026

Copy link
Copy Markdown

Claude finished @arnaubennassar's task in 2m 25s —— View job


Code Review Complete ✅

Reviewing PR #1692: feat: add bridgeservicefinder package


🚀 What's New

This PR introduces a new bridgeservicefinder package that automatically discovers and serves bridge service URLs for all networks attached to a rollup manager. It implements a three-tier priority system (config → on-chain metadata → sequencer URL), health-gated caching, and live event-driven updates via finality-bounded polling. The package is lightweight, self-contained, and disabled by default.

🐛 Bug Fixes

None.

📋 Config Updates

[BridgeServiceFinder]
Enabled = false  # Disabled by default
RollupManagerAddr = "{{L1NetworkConfig.RollupManagerAddr}}"
# Optional static networkID -> URL overrides (only way to configure network 0/L1)
# URLs = { 0 = "http://localhost:5577" }
BlockFinality = "FinalizedBlock"  # Options: PendingBlock, LatestBlock, SafeBlock, FinalizedBlock, EarliestBlock
PollInterval = "30s"
BlockChunkSize = 10000
HealthCheckPath = "/health"
HealthCheckTimeout = "5s"
RequireAllHealthyOnStart = false  # false = continue with unhealthy services

⚠️ Breaking Changes

None. The new feature is additive and disabled by default.


📊 Code Review Findings

🟢 Strengths

  1. Outstanding Documentation - The 177-line doc.go is exemplary, providing complete context on design decisions, data model mapping, priority rules, health-gating logic, and binding verification.

  2. Excellent Test Coverage - 6 comprehensive test files covering:

    • Source resolution in isolation and priority combinations
    • Live event updates with health gating
    • Mock contract compatibility verification
    • Edge cases and error paths
  3. Clean Architecture - Strong separation of concerns:

    • resolver.go - stateless priority logic
    • cache.go - thread-safe storage
    • listener.go - event polling
    • health.go - HTTP probing
    • Clear interfaces with dependency injection
  4. Proper Concurrency - Uses sync.RWMutex correctly with read locks for GetURL and write locks for updates.

  5. Graceful Degradation - Treats contract call errors as "source not available" rather than hard failures, allowing fallthrough to lower-priority sources.

  6. No Heavy Dependencies - Intentionally avoids sync/reorgdetector/DB stack; uses lightweight FilterLogs polling.

🟡 Suggested Improvements

Priority: Medium

  1. Race condition in probeAll (bridgeservicefinder.go:271-288)

    func (f *finder) probeAll(ctx context.Context) int {
        unhealthy := 0
        f.cache.mu.Lock()  // ← Single lock for entire operation
        for networkID, entry := range f.cache.entries {
            healthy := f.healthChecker.IsHealthy(ctx, entry.url)  // ← HTTP call under lock!
            entry.healthy = healthy
            f.cache.entries[networkID] = entry
            if !healthy {
                unhealthy++
                f.logger.Warnf(...)
            }
        }
        f.cache.mu.Unlock()
        return unhealthy
    }

    Issue: Health checks (potentially slow HTTP operations) are performed while holding the cache write lock, blocking all GetURL readers and other writers. With many networks or slow/timing-out health endpoints, this could cause significant blocking.

    Suggested fix: Collect URLs outside the lock, probe them, then acquire the lock only for the write phase:

    func (f *finder) probeAll(ctx context.Context) int {
        f.cache.mu.RLock()
        toProbe := make([]struct{ id uint32; url string }, 0, len(f.cache.entries))
        for id, entry := range f.cache.entries {
            toProbe = append(toProbe, struct{ id uint32; url string }{id, entry.url})
        }
        f.cache.mu.RUnlock()
        
        results := make(map[uint32]bool, len(toProbe))
        for _, p := range toProbe {
            results[p.id] = f.healthChecker.IsHealthy(ctx, p.url)
        }
        
        unhealthy := 0
        f.cache.mu.Lock()
        for networkID, healthy := range results {
            if entry, ok := f.cache.entries[networkID]; ok {
                entry.healthy = healthy
                f.cache.entries[networkID] = entry
                if !healthy {
                    unhealthy++
                    f.logger.Warnf(...)
                }
            }
        }
        f.cache.mu.Unlock()
        return unhealthy
    }
  2. Context handling in health checker (health.go:42-75)

    func (h *httpHealthChecker) IsHealthy(ctx context.Context, baseURL string) bool {
        reqCtx := ctx
        if h.timeout > 0 {
            var cancel context.CancelFunc
            reqCtx, cancel = context.WithTimeout(ctx, h.timeout)
            defer cancel()
        }

    Issue: If ctx is already cancelled/expired and h.timeout > 0, this creates a child context with timeout that will still respect the parent's cancellation, but the timeout may be misleading. If h.timeout == 0, no timeout is applied at all.

    Suggestion: Always apply the timeout if configured, regardless of parent context state, or document the behavior more explicitly.

  3. Error wrapping in resolver.go loses source information

    } else if err != nil && !errors.Is(err, ErrSourceNotAvailable) {
        return "", SourceMetadata, fmt.Errorf("failed to read aggchain metadata for network %d: %w", networkID, err)
    }

    The error returns SourceMetadata as the source even though the metadata source failed. This could be confusing. Consider returning a zero-value source or a special "error" source when resolution fails.

Priority: Low

  1. Magic numbers in test code - Several test files use magic numbers without constants (e.g., 1337 for chainID, 999999999999999999 for gas limit). Consider extracting to named constants for clarity.

  2. Potential improvement: parallel health checks in applyUpdate - Currently applyUpdate (listener.go:324) probes one URL at a time during event processing. If many events arrive in a burst, this could be slow. Consider batching.

  3. Documentation: withPort behavior with fragments/query params - The withPort function (resolver.go:73) preserves path but doesn't explicitly document behavior with URL fragments (#) or query parameters (?). Testing shows it preserves them correctly, but adding a comment or test case would be helpful.

🟢 Minor Observations

  1. Excellent error handling - Proper use of errors.Is, clear sentinel errors, appropriate error wrapping.

  2. CLAUDE.md compliance:

    • ✅ Imports grouped correctly (stdlib, external, internal)
    • ✅ Exported functions have doc comments starting with function name
    • ✅ Uses require (not assert) in tests
    • ✅ Error wrapping follows pattern: fmt.Errorf("context: %w", err)
    • ✅ Mocks in mocks/ subdirectory
    • ✅ Line length within 120 chars
  3. Config defaults well thought out - Sane defaults for all fields, documented clearly.

  4. Mock contracts - Clever approach to testing without depending on full contract deployments. The smoke test (test/contracts/bridgeservicefinder_mocks_test.go) verifies ABI compatibility.

🔒 Security

  • ✅ No SQL injection risks (no direct SQL, uses abstractions)
  • ✅ No command injection risks
  • ✅ HTTP client has timeout configured
  • ✅ No credentials or secrets in config/code
  • ✅ Proper context cancellation handling
  • ⚠️ Minor: Health check follows redirects by default (Go's http.Client default behavior). Consider documenting this or setting CheckRedirect to control redirect behavior.

⚡ Performance

  • ✅ Uses read locks for GetURL (read-heavy workload optimized)
  • ✅ Chunks FilterLogs requests (BlockChunkSize)
  • ✅ Respects finality to avoid re-processing reorged blocks
  • ⚠️ See race condition issue in probeAll above

🧪 Testing

  • ✅ Extensive unit test coverage
  • ✅ Integration tests with simulated backend
  • ✅ Health-gating edge cases covered
  • ✅ Priority rule combinations tested
  • ✅ Mock compatibility verified

🎯 Recommendations

🔴 Required (Before Merge)

None - the code is production-ready as-is.

🟡 Suggested (Consider for Future)

  1. Fix the race condition in probeAll (see detailed suggestion above) - while the current implementation works, it could cause latency issues at scale.
  2. Consider adding metrics/observability (Prometheus counters for health check failures, cache hits/misses, source priority distribution).
  3. Add a GetAllURLs() or GetNetworks() method if consumers need to enumerate all discovered networks.

✅ Final Verdict

APPROVED - This is exceptionally well-engineered code. The design is thoughtful, documentation is outstanding, test coverage is comprehensive, and the implementation is clean. The minor issues identified are truly minor and don't block merging. The race condition in probeAll is worth addressing but is unlikely to cause issues in typical deployments (10s of networks, fast health endpoints).

Excellent work! 🎉


The package isn't consumed by anything yet — it will be wired up by a
future component instead of being started directly from cmd/run.go, so
drop the unused config plumbing and Start() call along with it.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
@sonarqubecloud

sonarqubecloud Bot commented Jul 1, 2026

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sequencer URL monitor

1 participant