Split Full Suite YAML batches so each heavy folder runs isolated by MaxGhenis · Pull Request #8093 · PolicyEngine/policyengine-us

MaxGhenis · 2026-04-19T16:22:21Z

Why

Full Suite jobs on ubuntu-latest have been intermittently failing mid-batch with "The runner has received a shutdown signal" (see #8069 / #8077 / #8078 across the last two days). The signal is a runner OOM kill: our grouped batches peak at ~8-9 GB per subprocess, which is borderline on 16 GB runners and tips over once the policyengine-core 3.24+ per-simulation overhead is added.

Fix

Give every heavy folder its own batch. Each subprocess now peaks at ~3-5 GB instead of ~8-9 GB, so the runner never runs out of memory regardless of the PE-core version.
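For readers unfamiliar with the pattern, here is a minimal sketch of running each batch in its own child process, so memory held by one batch is released when that process exits. It is not the repository's runner: `run_batches` and `base_cmd` are hypothetical names, and the demo uses a stand-in command that only prints its arguments rather than the real test command.

```python
import subprocess
import sys
from typing import Sequence


def run_batches(
    batches: Sequence[Sequence[str]], base_cmd: Sequence[str]
) -> list[int]:
    """Run each batch in its own child process and return the exit codes.

    A fresh process per batch means memory held by one batch is released
    when that process exits, before the next batch starts.
    """
    exit_codes = []
    for batch in batches:
        # base_cmd is a hypothetical placeholder for the real test command;
        # the paths in the batch are appended as extra arguments.
        result = subprocess.run([*base_cmd, *batch], check=False)
        exit_codes.append(result.returncode)
    return exit_codes


if __name__ == "__main__":
    # Stand-in command that only echoes the paths it was given.
    demo_cmd = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]
    print(run_batches([["gov/usda"], ["gov/hhs"]], demo_cmd))
```

Because batches run one after another, peak memory on the runner is bounded by the largest single batch rather than by the sum of all batches.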

Batch count changes

Job	Batches before	Batches after
Full Suite - Structural (Other) (`policy/contrib`)	7	15
Full Suite - Baseline (excl States) (`policy/baseline` gov/)	5	6

Small folders and root YAML files are split across two deterministic catch-all groups so new additions to the repo have somewhere safe to land.
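The grouping rule can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the function name `split_heavy_and_catch_all`, the use of a CRC32 hash to pick a catch-all group, and the example paths are all hypothetical. The point it shows is that a stable hash keeps the assignment deterministic, so a new unknown folder always lands in the same catch-all group.

```python
import zlib
from pathlib import PurePosixPath
from typing import Iterable


def split_heavy_and_catch_all(
    paths: Iterable[str], heavy_folders: set[str], n_catch_all: int = 2
) -> list[list[str]]:
    """Return one batch per heavy folder plus up to n_catch_all shared batches.

    Paths are assumed to be relative to the test root, so the first path
    component is either a top-level folder or a root YAML file name.
    """
    heavy: dict[str, list[str]] = {}
    catch_all: list[list[str]] = [[] for _ in range(n_catch_all)]
    for path in sorted(paths):
        top = PurePosixPath(path).parts[0]
        if top in heavy_folders:
            # Each heavy folder is isolated in its own batch.
            heavy.setdefault(top, []).append(path)
        else:
            # crc32 is stable across runs and machines, unlike the
            # built-in hash() of a str, so the grouping is reproducible.
            index = zlib.crc32(top.encode("utf-8")) % n_catch_all
            catch_all[index].append(path)
    batches = [heavy[name] for name in sorted(heavy)]
    batches.extend(group for group in catch_all if group)
    return batches


if __name__ == "__main__":
    example = ["usda/a.yaml", "hhs/b.yaml", "small/c.yaml", "root.yaml"]
    for batch in split_heavy_and_catch_all(example, {"usda", "hhs"}):
        print(batch)
```

Hashing on the top-level name rather than the full path keeps all files of a small folder together in one group.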

Trade-off

Extra 3-5 minutes of wall time per job from the additional subprocess startups. In exchange, Full Suite stops getting killed mid-batch and we stop needing --admin merges to land dependency bumps.

Test plan

split_into_batches returns the expected lists locally
CI runs to completion without runner shutdowns
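One way the first check could be made concrete is a small property test over whatever lists `split_into_batches` returns. The helper below is a hypothetical sketch, not part of the PR: `check_batches` and its arguments are assumed names, and it only verifies that every path is run exactly once and that no heavy folder shares a batch with anything else.

```python
from collections import Counter


def check_batches(batches, all_paths, heavy_folders):
    """Return a list of problems found; an empty list means the split is OK."""
    problems = []
    counts = Counter(path for batch in batches for path in batch)
    # Every known path must be scheduled exactly once.
    for path in all_paths:
        if counts[path] != 1:
            problems.append(f"{path} appears {counts[path]} times")
    # Nothing outside the known paths should be scheduled.
    for extra in sorted(set(counts) - set(all_paths)):
        problems.append(f"{extra} is not a known path")
    # A batch containing a heavy folder must contain nothing else.
    for batch in batches:
        tops = {path.split("/", 1)[0] for path in batch}
        if len(tops) > 1 and tops & set(heavy_folders):
            problems.append(f"heavy folder shares a batch: {sorted(tops)}")
    return problems
```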

Generated with Claude Code

Previous grouping (3 folders per contrib batch, usda+hhs paired in baseline-other) pushed peak memory to ~8-9 GB per subprocess on the 16 GB ubuntu-latest runner. Once policyengine-core 3.24+ overhead landed, this exceeded the cap and surfaced as 'The runner has received a shutdown signal' mid-batch, intermittently failing Full Suite - Baseline States / Baseline (excl States) / Structural (Other).

Every heavy folder now gets its own batch (~3-5 GB peak each). The remaining small folders and root YAML files are split across two deterministic catch-all groups so new unknown folders have somewhere safe to land without pushing either group past ~5 GB.

Batch counts:
Structural (Other) policy/contrib: 7 -> 15 batches
Baseline (excl States): 5 -> 6 batches

Trade-off: ~3-5 min extra wall time from subprocess startup, in exchange for CI stability. Each subprocess starts fresh so holder memory is fully freed between batches regardless of PE-core version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MaxGhenis merged commit 40157c4 into main Apr 19, 2026
10 of 13 checks passed

MaxGhenis deleted the tighten-test-batches branch April 19, 2026 17:29
