fix(scripts): trend annotations fall back to nearest non-null prior release#1120
fix(scripts): trend annotations fall back to nearest non-null prior release#1120carlos-alm wants to merge 7 commits into
Conversation
…nnotations scripts/update-incremental-report.ts compared each release's metrics against the immediately prior release only. When that prior release had null build/rebuild metrics (e.g. 3.9.5 had null after both engine workers were SIGKILL'd on timeout), the trend cell was silently emitted blank, hiding real regressions: 3.9.6 wasm full-build was 14.0s vs 3.9.4's 7.6s (~86% slower) and no-op was 131ms vs 19ms (~589% slower), but the table showed no arrow at all. Replace the per-release prev lookup with findPrevMetric — a per-metric walk back through history that skips releases without a value for that specific metric. Apply consistently in engineRow (the table) and the regression-warning block. Regenerated the report so existing trend annotations now reflect the fallback (3.9.6, 3.8.0 native, 3.6.0 wasm). Closes #1043
|
Claude finished @carlos-alm's task —— View job PR Review CompleteFinal AssessmentRoot Cause AnalysisProblem Being Solved: The incremental benchmark script was showing blank trend arrows when comparing against releases with null metrics (e.g., when workers timed out and stored null values). For example, version 3.9.6 had 86% slower WASM builds and 589% slower no-ops compared to 3.9.4, but showed no trend arrows because it only looked at the immediate prior release (3.9.5) which had null metrics. Why This Approach: The PR introduces Risk Assessment: Low risk. This is a reporting/analytics fix with comprehensive test coverage. No runtime or CLI behavior changes. Backlog Compliance
Code Quality ReviewImplementation Quality ⭐⭐⭐⭐☆The
Test Coverage ⭐⭐⭐⭐⭐Excellent test implementation (
Code Changes Analysis
|
Greptile SummaryThis PR fixes trend annotations in the incremental benchmark report so that null-metric releases (where engine workers were killed) no longer silently break the comparison chain. A new
Confidence Score: 5/5Safe to merge — the change is additive (a fallback that only activates on null-metric gaps), the zero-division guard is a pure defensive tightening, and the new test exercises both the fallback path and the empty-history path end-to-end against a real script invocation. All three paths that consume findPrevMetric (table rows, regression detection, resolve-metric trends) behave correctly and consistently. The previous two review concerns — the previous === 0 guard and the duplicate findPrevMetric calls per release — are both addressed in this version. No correctness issues remain. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[For each release i in history] --> B[Pre-compute prevResolveNative\nprevResolveJs via findPrevMetric]
B --> C[engineRow for native]
B --> D[engineRow for wasm]
C --> E{engine key present?}
D --> E
E -->|No| F[return null — skip row]
E -->|Yes| G[findPrevMetric for\nfullBuildMs / noopRebuildMs\noneFileRebuildMs]
G --> H{Walk history fromIdx+1}
H -->|version === 'dev'| H
H -->|getter returns null| H
H -->|getter returns non-null v| I[return v]
H -->|end of history| J[return null]
I --> K[trend current vs v]
J --> K
K -->|v == null or v === 0| L[return '' blank cell]
K -->|abs pct lt 2| M[return ' ~']
K -->|pct significant| N[return ↑/↓ pct%]
Reviews (8): Last reviewed commit: "Merge branch 'main' into fix/incremental..." | Re-trigger Greptile |
| for (let i = 0; i < history.length; i++) { | ||
| const h = history[i]; | ||
| const prev = findPrevRelease(history, i); | ||
|
|
||
| const nativeRow = engineRow(h, prev, 'native'); | ||
| const wasmRow = engineRow(h, prev, 'wasm'); | ||
| const nativeRow = engineRow(history, i, 'native'); | ||
| const wasmRow = engineRow(history, i, 'wasm'); | ||
| if (nativeRow) md += nativeRow + '\n'; | ||
| if (wasmRow) md += wasmRow + '\n'; | ||
| } |
There was a problem hiding this comment.
The
natT/jsT resolve trends are computed inside engineRow, which is called once for each engine type per release. Since h.resolve is release-level (not engine-specific), findPrevMetric for nativeBatchMs and jsFallbackMs is invoked twice per release — returning identical results both times. This is harmless but wasteful for long history arrays. Consider hoisting the resolve trend calls to the call site, alongside the nativeRow/wasmRow pair, so they are computed once and passed in.
| for (let i = 0; i < history.length; i++) { | |
| const h = history[i]; | |
| const prev = findPrevRelease(history, i); | |
| const nativeRow = engineRow(h, prev, 'native'); | |
| const wasmRow = engineRow(h, prev, 'wasm'); | |
| const nativeRow = engineRow(history, i, 'native'); | |
| const wasmRow = engineRow(history, i, 'wasm'); | |
| if (nativeRow) md += nativeRow + '\n'; | |
| if (wasmRow) md += wasmRow + '\n'; | |
| } | |
| for (let i = 0; i < history.length; i++) { | |
| // Resolve metrics are release-level; pre-compute their trends once per | |
| // release so engineRow doesn't redundantly walk history twice. | |
| const prevNative = findPrevMetric(history, i, (x) => x.resolve?.nativeBatchMs); | |
| const prevJs = findPrevMetric(history, i, (x) => x.resolve?.jsFallbackMs); | |
| const nativeRow = engineRow(history, i, 'native', prevNative, prevJs); | |
| const wasmRow = engineRow(history, i, 'wasm', prevNative, prevJs); | |
| if (nativeRow) md += nativeRow + '\n'; | |
| if (wasmRow) md += wasmRow + '\n'; | |
| } |
There was a problem hiding this comment.
Fixed in 628d37d — hoisted findPrevMetric for the release-level resolve metrics (nativeBatchMs, jsFallbackMs) to the call site and pass them into engineRow, so they are computed once per release instead of twice.
Codegraph Impact Analysis3 functions changed → 2 callers affected across 1 files
|
…lookup (#1120) - trend(): treat previous === 0 the same as null to avoid emitting '↑Infinity%' if a historical entry stores 0 for a metric. Matches the existing previous === 0 guard in checkRegression. - engineRow(): accept pre-computed prevResolveNative/prevResolveJs from the caller instead of recomputing them inside the function. Resolve metrics are release-level (not engine-specific), so the previous code walked history twice per release (once for native, once for wasm) returning identical results both times.
|
Addressed Greptile review feedback (commit 628d37d):
|
Summary
scripts/update-incremental-report.tscompared each release against the immediately prior release only. When that prior release had null build/rebuild metrics (e.g. 3.9.5, where both engine workers were SIGKILL'd on timeout), the trend cell was silently emitted blank — hiding real regressions.findPrevMetric— a per-metric walk back through history that skips releases without a value for that specific metric.engineRow(the version-history table) and the regression-warning block.generated/benchmarks/INCREMENTAL-BENCHMARKS.mdso the existing rows reflect the fallback. Notable previously-blank rows now annotated: 3.9.6 (vs 3.9.4), 3.8.0 native (vs 3.6.0 native), 3.6.0 wasm (vs 3.4.1 wasm).tests/unit/incremental-report-trend-fallback.test.tsgated on theCODEGRAPH_INCREMENTAL_REPORT_PATHenv override so the script can target a temp file from tests.Closes #1043
Test plan
npx vitest run tests/unit/incremental-report-trend-fallback.test.ts— both cases passINCREMENTAL-BENCHMARKS.mdconfirms previously-blank trend cells now show the expected fallback annotationsnpm run lint— clean