fix(ci): give avm-check-circuit more CPU and timeout for large traces #23747
Draft
AztecBot wants to merge 1 commit into
Problem
The AVM Circuit Inputs Collection and Check workflow has been failing consistently on `next` (runs 1716–1723, since ~2026-05-29 14:00 UTC). The `avm-check-circuit` job exits with code 124 (timeout).

Root cause
`avm_check_circuit_cmds` in `yarn-project/end-to-end/bootstrap.sh` runs each dumped AVM input through `bb-avm avm_check_circuit` with the per-test prefix `ISOLATE=1:TIMEOUT=30s` and the default `CPUS=2`.

Every input passes in 3–5s except the `e2e_multiple_blobs` tx (`0x2f5d642b…`), which produces a much larger circuit. According to the failing input's log, with only 2 CPUs trace generation alone consumed ~23s, so the circuit check never finished before the 30s `timeout` fired. `parallelize` runs with `--halt now,fail=1`, so this single timed-out input failed the entire job. The input was killed mid-check: there is no circuit-correctness error, only an insufficient CPU/time budget. This is exactly the case the file's existing `WARNING` comment anticipated.
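To make the failure mode concrete, here is a minimal sketch, assuming bash and GNU coreutils `timeout`. It does not use the real `parallelize` helper or `bb-avm`: `sleep` stands in for a circuit check, durations are scaled down, and the hypothetical `check_input`/`run_all` functions run inputs sequentially while mimicking the fail-fast behaviour of `--halt now,fail=1`. When `timeout` has to kill a command it exits with status 124, the code the job reported.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: `sleep` stands in for `bb-avm avm_check_circuit`,
# and the loop below is a sequential simplification of the real parallel run.

# Hypothetical helper: run one "check" under a per-input time budget.
# GNU `timeout` exits with status 124 if it had to kill the command.
check_input() {
  local budget=$1 duration=$2
  timeout "$budget" sleep "$duration"
}

# Hypothetical helper: check every input and abort on the first failure,
# in the spirit of `--halt now,fail=1`.
run_all() {
  local budget=$1 duration status
  shift
  for duration in "$@"; do
    check_input "$budget" "$duration"
    status=$?
    if [ "$status" -ne 0 ]; then
      echo "input failed with status $status; aborting"
      return "$status"
    fi
  done
  echo "all inputs passed"
}

# Two quick inputs and one slow straggler (durations in seconds).
run_all 1s 0 0 2   # prints "input failed with status 124; aborting"
run_all 3s 0 0 2   # prints "all inputs passed"
```

A single straggler that exceeds the budget is enough to fail the whole run, even though every other input passed; raising the per-input budget is what lets the same inputs complete.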
Fix

Bump the per-input allocation from `CPUS=2 / TIMEOUT=30s` to `CPUS=8 / TIMEOUT=300s`. The higher `CPUS` value raises the container's `--cpus` quota, so it speeds up the large stragglers that run nearly alone at the tail of the parallel run, while staying neutral during the parallel burst of small txs (which finish before CPU contention matters, and where CFS throttles all containers to the physical core count regardless).

Memory follows the default of `CPUS*4`, i.e. 32g; the heavy tx peaked at ~4 GiB, so memory was never the constraint.
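The budget a prefix encodes can be sketched as follows. This is a hypothetical parser, not the actual `bootstrap.sh` or test-runner code: it assumes the prefix is a colon-separated list of `KEY=VALUE` fields (as in `ISOLATE=1:TIMEOUT=30s`), that an omitted `CPUS` defaults to 2, and that memory is derived as `CPUS*4` GiB, matching the defaults described above. The spelling `ISOLATE=1:TIMEOUT=300s:CPUS=8` for the new prefix is likewise an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: resolve the resource budget encoded in a per-test
# prefix. Field names come from the PR text; the parsing itself is assumed.
resolve_budget() {
  local prefix=$1
  local isolate=0 time_limit="" cpus=2   # assumed defaults (CPUS=2 per the PR)
  local -a fields
  local field key value

  # Split "KEY=VALUE:KEY=VALUE:..." on colons.
  IFS=':' read -r -a fields <<< "$prefix"
  for field in "${fields[@]}"; do
    key=${field%%=*}
    value=${field#*=}
    case $key in
      ISOLATE) isolate=$value ;;
      TIMEOUT) time_limit=$value ;;
      CPUS) cpus=$value ;;
    esac
  done

  # Memory is derived from the CPU count: 4 GiB per CPU.
  local mem="$((cpus * 4))g"
  echo "isolate=$isolate cpus=$cpus mem=$mem timeout=${time_limit:-none}"
}

resolve_budget 'ISOLATE=1:TIMEOUT=30s'          # isolate=1 cpus=2 mem=8g timeout=30s
resolve_budget 'ISOLATE=1:TIMEOUT=300s:CPUS=8'  # isolate=1 cpus=8 mem=32g timeout=300s
```

Under these assumptions the straggler goes from 2 CPUs and 30s to 8 CPUs and 300s, and the derived 32g leaves ample headroom over the ~4 GiB the heavy tx actually used.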
Testing

Not reproduced locally: the check runs against a cached, freshly built `bb-avm` plus the S3 inputs tarball on a CI EC2 host, which isn't feasible to stand up in this session. The change is confined to the CI resource prefix, and the diagnosis is backed by the failing input's own log (700k-row trace, killed mid-check at the 30s boundary with no assertion failure). The fix will be exercised by the next push run of this workflow on `next`.

Created by claudebox · group: slackbot