fix(ci): raise AVM check-circuit per-tx timeout to 180s #23742
Draft
AztecBot wants to merge 1 commit into
What
The nightly/next AVM Circuit Inputs Collection and Check workflow's avm-check-circuit job has been failing since #23644 (chore: replace bb_cli avm_prove and avm_check_circuit). It first failed on run #1720 (commit 49f6ee5) and again on the scheduled run #1721 (389dc29).

Root cause
Of the ~785 dumped AVM circuit inputs checked per run, exactly one times out: the e2e_multiple_blobs tx 0x03713ef6…. Its check log shows "Running check (with skippable) circuit over 700560 rows", and the process is killed at the 30s mark by timeout -v 30s with exit code 124. Every other tx finishes in 5–6s. The single timeout makes GNU parallel halt (--halt now,fail=1) and the job exits 124.

Each check runs isolated with CPUS=2 (docker --cpus=2), so the 700k-row e2e_multiple_blobs circuit needs ~45s of throttled CPU for trace generation plus the check, just over the existing TIMEOUT=30s. The refactor in #23644 pushed this largest tx across that tight boundary. The per-check budget, not the proving logic, is the problem; the code comment already anticipated this case ("transactions could need more CPU and MEM than we allocate by default … they might start timing out").

Fix
Raise the per-check TIMEOUT from 30s to 180s in avm_check_circuit_cmds (yarn-project/end-to-end/bootstrap.sh). This gives ~4x headroom over the observed ~45s need for the largest tx while staying under the 300s slow-job warning threshold in parallelize. Resource allocation (CPUS=2, MEM=8g) is unchanged to preserve the balanced 64-jobs × 2-CPU = 128-core model and avoid oversubscription; memory peaked at ~3.9 GiB against the 8 GiB limit, so it has ample headroom. The small/fast txs are unaffected.

CI run that surfaced this: https://github.com/AztecProtocol/aztec-packages/actions/runs/26674308080
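For reviewers who want to reproduce the failure mode locally, here is a minimal sketch of the mechanism described above. It is not the code from bootstrap.sh: run_check and the variable names are hypothetical, sleep stands in for a real check, and it assumes GNU coreutils timeout, which reports exit code 124 when it kills a command that overruns its budget. The figures are the ones quoted in this description, scaled down to seconds for the demo.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; none of these names come from bootstrap.sh.

# Figures quoted in the PR description (seconds).
old_timeout=30      # previous per-check budget
new_timeout=180     # proposed per-check budget
observed_need=45    # approximate need of the largest tx
slow_job_warn=300   # slow-job warning threshold

# Integer headroom of the new budget over the observed need.
headroom=$(( new_timeout / observed_need ))
echo "headroom over observed need: ${headroom}x"
echo "margin below slow-job warning: $(( slow_job_warn - new_timeout ))s"

# Hypothetical stand-in for one check: `sleep` plays the role of the work.
run_check() {
  local budget="$1" work="$2"
  timeout "$budget" sleep "$work"
}

# Budget smaller than the work: timeout kills the command.
tight_rc=0
run_check 1s 3 || tight_rc=$?

# Budget larger than the work: the command finishes normally.
roomy_rc=0
run_check 3s 1 || roomy_rc=$?

echo "tight budget exit code: ${tight_rc}"
echo "roomy budget exit code: ${roomy_rc}"
```

With GNU timeout, the tight-budget call should report 124 (the same code the job exits with) and the roomy one 0. Only the budget differs between the two calls, which mirrors the point that the per-check budget, not the check itself, is what fails.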
Created by claudebox · group: slackbot