Anchor host memory-baseline hard gate to the recent ceiling by habdelra · Pull Request #5352 · cardstack/boxel

habdelra · 2026-06-28T00:30:34Z

Background

The Host Memory Baseline check (packages/host/scripts/check-memory-baseline.mjs) compares each test module's post-GC heap boundary delta, usedJSHeapSize(moduleEnd) − usedJSHeapSize(moduleStart) with each reading taken after a 3-cycle settle-GC, against a rolling-window baseline. It hard-fails (blocking the build) when the current delta exceeds 2× the window mean or exceeds the mean by more than 50 MB.
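As a rough illustration of the pre-change check, here is a minimal sketch. It is not the code in check-memory-baseline.mjs: the function names, the 1024 × 1024 bytes-per-MB conversion, the guard on a non-positive mean, and the reading of "+50 MB" as "50 MB above the mean" are all assumptions made for the example.

```javascript
// Hypothetical sketch of the mean-anchored hard gate (not the real script).
const BYTES_PER_MB = 1024 * 1024; // assumption: MB means MiB here

// Heap boundary delta in MB: usedJSHeapSize(moduleEnd) - usedJSHeapSize(moduleStart).
function boundaryDeltaMb(startBytes, endBytes) {
  return (endBytes - startBytes) / BYTES_PER_MB;
}

function mean(samples) {
  return samples.reduce((sum, s) => sum + s, 0) / samples.length;
}

// Fail when the current delta is more than `ratio` times the window mean,
// or more than `marginMb` above it. The ratio test is skipped when the mean
// is not positive, because "2x a negative mean" is not a meaningful limit.
function meanAnchoredHardFail(currentMb, windowMb, { ratio = 2, marginMb = 50 } = {}) {
  const m = mean(windowMb);
  const overRatio = m > 0 && currentMb > ratio * m;
  const overMargin = currentMb - m > marginMb;
  return overRatio || overMargin;
}

// A high-variance window trips the gate on a value it has already produced.
console.log(meanAnchoredHardFail(129.4, [106.7, -1.4, -8.5, -8.6, 131.0]));
```

Under these assumptions the window mean is about 43.8 MB, so a 129.4 MB run fails even though the window already contains a 131.0 MB sample.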

For a memory-heavy module that metric is dominated by whether the settle-GC happens to drain a large collectable transient before the boundary measurement. That drain timing is non-deterministic, so the delta swings run-to-run by 100 MB+ with no underlying retention. The clearest tell is a baseline window whose samples straddle zero — e.g. Integration | search-entries resource: [106.7, −1.4, −8.5, −8.6, 131.0]. A negative sample means the heap was smaller at module end than at module start; a module that actually retained memory could never produce one.
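To make the "straddles zero" tell concrete, a small sketch that summarizes a baseline window is shown here. The helper name `windowStats` and its return shape are hypothetical; only the sample values come from the window quoted above.

```javascript
// Hypothetical helper: summarize a rolling window of boundary deltas (MB).
function windowStats(samples) {
  const min = Math.min(...samples);
  const max = Math.max(...samples);
  const mean = samples.reduce((sum, s) => sum + s, 0) / samples.length;
  return {
    min,
    max,
    mean,
    spread: max - min,
    // Samples on both sides of zero suggest GC-timing noise rather than retention.
    straddlesZero: min < 0 && max > 0,
  };
}

const searchEntries = windowStats([106.7, -1.4, -8.5, -8.6, 131.0]);
console.log(searchEntries.straddlesZero, searchEntries.max, searchEntries.min);
```

For this window the spread between the smallest and largest sample is roughly 140 MB, while the mean sits near 44 MB, which is why a mean-anchored limit lands well below values the module routinely produces.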

Why this module swings without leaking

The flagged module stands up two in-browser realms plus the base realm per test and runs real searches, so each test allocates a large transient graph that GC reclaims. Heap analysis over its run confirms nothing survives:

Post-GC used-heap is flat across every test in the module (end-to-end drift < 1 MB, app_instances=0 throughout) — no per-test growth.
A heap-snapshot diff from early in the run to module end shows the total node count decreasing, with the instance counts of every candidate retainer flat or down: SearchEntriesResource 4→4, Realm 5→5, Loader 9→9, ApplicationInstance 4→3, StoreService 5→5. A genuine per-test leak would climb each of these by one per test.
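The instance-count comparison can be sketched as follows. The counts are the ones listed above; the `growingClasses` helper and the plain-object representation of a snapshot summary are assumptions for illustration, not the tooling actually used for the analysis.

```javascript
// Instance counts per class, early in the run and at module end (from the diff above).
const earlyCounts = { SearchEntriesResource: 4, Realm: 5, Loader: 9, ApplicationInstance: 4, StoreService: 5 };
const endCounts = { SearchEntriesResource: 4, Realm: 5, Loader: 9, ApplicationInstance: 3, StoreService: 5 };

// Hypothetical helper: names of classes whose instance count went up.
function growingClasses(before, after) {
  return Object.keys(after).filter((name) => after[name] > (before[name] ?? 0));
}

console.log(growingClasses(earlyCounts, endCounts)); // prints []
```

A per-test leak would show up as a non-empty result, with the affected class growing by roughly the number of tests run between the two snapshots.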

So the recurring red is the gate misreading boundary GC-timing noise as a regression, not a heap regression in the module under test.

The change

Anchor the hard (build-blocking) gate to the recent ceiling (the max of the rolling samples) instead of the mean: a run can't hard-fail on a delta the module has already produced in its window, while a value that clears the ceiling by the hard threshold still fails. When a module's variance is low (ceiling ≈ mean) the gate is unchanged. The soft, non-blocking warning stays anchored to the mean, so a genuine upward trend still surfaces early.

Behavior against the current baseline

Module	Current delta	Ceiling (max recent sample)	Result
`Integration | search-entries resource`	129.4 MB	131.0 MB	passes (non-blocking warning)
`Integration | search-entries resource`	205 MB	131.0 MB	hard-fails (clears ceiling +74)
low-variance module (samples ≈ 17.6 MB)	70 MB	17.7 MB	hard-fails (unchanged)
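The rows above can be reproduced with a minimal sketch of the two-tier decision. It assumes the hard gate is a 50 MB margin over the ceiling and leaves out the 2× ratio test for brevity. The 20 MB soft margin is a made-up placeholder, since the soft threshold is not stated here. The 43.84 MB mean is computed from the search-entries window quoted in the Background section.

```javascript
// Hypothetical sketch: hard gate anchored to the ceiling, soft warning anchored to the mean.
function classify({ currentMb, meanMb, ceilingMb }, { hardMarginMb = 50, softMarginMb = 20 } = {}) {
  // Blocking: only when the run clears the recent ceiling by the hard margin.
  if (currentMb - ceilingMb > hardMarginMb) return 'hard-fail';
  // Non-blocking: an upward move relative to the mean still surfaces.
  // softMarginMb is an assumed value, not taken from the PR.
  if (currentMb - meanMb > softMarginMb) return 'warn';
  return 'pass';
}

const rows = [
  { currentMb: 129.4, meanMb: 43.84, ceilingMb: 131.0 },
  { currentMb: 205, meanMb: 43.84, ceilingMb: 131.0 },
  { currentMb: 70, meanMb: 17.6, ceilingMb: 17.7 },
];
for (const row of rows) {
  console.log(row.currentMb, classify(row));
}
```

With a low-variance module the ceiling and the mean nearly coincide, so both tiers measure from almost the same point and the outcome matches the mean-anchored gate.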

🤖 Generated with Claude Code

The hard (build-blocking) gate compared a module's current post-GC heap boundary delta against the rolling-mean baseline. For memory-heavy modules that delta is dominated by non-deterministic settle-GC drain timing and swings 100MB+ run-to-run with no retention, so the gate fired on values the module had already produced. Measure the hard regression from the recent ceiling (max sample) instead; low-variance modules (ceiling ~= mean) are unchanged, and the soft warning stays mean-based.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adjusts the host memory-baseline gating logic so the hard (build-blocking) failure check is anchored to the recent observed ceiling (max of rolling samples) rather than the rolling mean, reducing false CI failures for high-variance modules while keeping the soft warning anchored to the mean.

Changes:

Add a baselineCeiling() helper that derives a module’s recent-window max delta (with backward-compatible fallback to legacy delta_mb).
Change the hard-failure condition to compare current delta against the computed ceiling (by hardThreshold), while leaving the soft-warning comparison against the rolling mean.
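One way such a helper might look is sketched here. This is not the PR's implementation: the `samples_mb` field name and the `null` return for an empty entry are assumptions. Only `delta_mb` and the function name `baselineCeiling` come from the summary above.

```javascript
// Hypothetical sketch of a ceiling helper with a legacy fallback.
function baselineCeiling(entry) {
  // Assumed field: `samples_mb` holds the rolling window of deltas in MB.
  const samples = Array.isArray(entry.samples_mb)
    ? entry.samples_mb.filter(Number.isFinite)
    : [];
  if (samples.length > 0) return Math.max(...samples);
  // Legacy baselines carry a single delta_mb value instead of a window.
  return Number.isFinite(entry.delta_mb) ? entry.delta_mb : null;
}

console.log(baselineCeiling({ samples_mb: [106.7, -1.4, -8.5, -8.6, 131.0] })); // prints 131
console.log(baselineCeiling({ delta_mb: 17.6 })); // prints 17.6
```

Falling back to the single legacy value keeps older baseline files usable: with only one number available, the ceiling and the mean are the same, so the gate behaves as it did before.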

github-actions · 2026-06-28T00:35:14Z

Preview deployments

Host staging preview

Host production preview

Host Test Results

1 files 1 suites 2h 26m 29s ⏱️
3 272 tests 3 257 ✅ 15 💤 0 ❌
3 291 runs 3 276 ✅ 15 💤 0 ❌

Results for commit f864e11.

Realm Server Test Results

1 files 1 suites 10m 13s ⏱️
1 661 tests 1 661 ✅ 0 💤 0 ❌
1 740 runs 1 740 ✅ 0 💤 0 ❌

Results for commit f864e11.

habdelra requested a review from Copilot June 28, 2026 00:31

Copilot started reviewing on behalf of habdelra June 28, 2026 00:31

Copilot AI reviewed Jun 28, 2026

habdelra requested a review from a team June 28, 2026 01:13

habdelra wants to merge 1 commit into main from host-memory-baseline-ceiling-gate

2 participants
