Split Full Suite YAML batches so each heavy folder runs isolated#8093
Merged
Split Full Suite YAML batches so each heavy folder runs isolated#8093
Conversation
Previous grouping (3 folders per contrib batch, usda+hhs paired in baseline-other) pushed peak memory to ~8-9 GB per subprocess on the 16 GB ubuntu-latest runner. Once policyengine-core 3.24+ overhead landed this exceeded the cap and surfaced as 'The runner has received a shutdown signal' mid-batch, intermittently failing Full Suite - Baseline States / Baseline (excl States) / Structural (Other). Every heavy folder now gets its own batch (~3-5 GB peak each). The remaining small folders and root YAML files split across two deterministic catch-all groups so new unknown folders have somewhere safe to land without pushing either group past ~5 GB. Batch counts: Structural (Other) policy/contrib: 7 -> 15 batches Baseline (excl States): 5 -> 6 batches Trade-off: ~3-5 min extra wall time from subprocess startup, in exchange for CI stability. Each subprocess starts fresh so holder memory is fully freed between batches regardless of PE-core version. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Full Suite jobs on ubuntu-latest have been intermittently failing with
The runner has received a shutdown signalmid-batch (see #8069 / #8077 / #8078 across the last two days). The signal is a runner OOM kill — our grouped batches peak at ~8-9 GB per subprocess, which is borderline on 16 GB runners and tips over once the policyengine-core 3.24+ per-simulation overhead is added.Fix
Give every heavy folder its own batch. Each subprocess now peaks around ~3-5 GB instead of ~8-9 GB, so the runner never runs out of memory regardless of the PE-core version.
Batch count changes
policy/contrib)policy/baselinegov/)Small folders and root YAML files split across two deterministic catch-all groups so new additions to the repo have somewhere safe to land.
Trade-off
Extra 3-5 minutes of wall time per job from the additional subprocess startups. In exchange, Full Suite stops getting killed mid-batch and we stop needing
--adminmerges to land dependency bumps.Test plan
split_into_batchesreturns the expected lists locallyGenerated with Claude Code