perf(bench): exclude resolution-benchmark fixtures from dogfooding sweep #1134
carlos-alm wants to merge 3 commits into
Adds an `exclude` field to BuildGraphOpts that augments `config.exclude` for a single build, then wires the three dogfooding benchmark scripts (scripts/benchmark.ts, scripts/incremental-benchmark.ts, scripts/query-benchmark.ts) to pass `tests/benchmarks/resolution/fixtures/**`. Those fixtures are hand-annotated test scaffolding for the static-resolution suite, not representative source code. Each new heavyweight grammar (e.g. tree-sitter-verilog in #1107) inflates `fullBuildMs` by hundreds of ms purely from parsing fixture files that do not belong in dogfooding timing measurements. The 3.10.0:Full build KNOWN_REGRESSIONS exemption that papered over the Verilog cost is removed, now that the root cause is addressed at the build invocation.
Closes #1112
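The wiring described above might look roughly like the following. This is a minimal sketch, assuming a shared constant lives in scripts/lib/bench-config.ts; the names `BENCH_EXCLUDE`, `BenchBuildOpts` and `withBenchExclude` are hypothetical and not taken from the PR.

```typescript
// Hypothetical sketch: these names are assumptions, not the actual
// contents of scripts/lib/bench-config.ts.
export const BENCH_EXCLUDE: readonly string[] = [
  'tests/benchmarks/resolution/fixtures/**',
];

// Minimal stand-in for the relevant slice of BuildGraphOpts.
interface BenchBuildOpts {
  engine?: 'native' | 'wasm';
  exclude?: string[];
}

// Returns a copy of opts with the benchmark excludes appended after any
// caller-supplied excludes; the input object is not mutated.
export function withBenchExclude(opts: BenchBuildOpts = {}): BenchBuildOpts {
  return { ...opts, exclude: [...(opts.exclude ?? []), ...BENCH_EXCLUDE] };
}

// Each benchmark script would then call, for example:
//   await buildGraph(root, withBenchExclude({ engine: 'native' }));
```

Keeping the pattern in one place means the three scripts cannot drift apart when the fixture directory moves.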
Claude finished @carlos-alm's task: Reviewing PR #1134: Exclude benchmark fixtures from dogfooding
Root Cause Analysis
Problem Being Solved: Performance-regression false positives in benchmarking when new heavyweight grammars (like Verilog) are added to codegraph. The resolution-benchmark fixtures inflate build times disproportionately because they are test scaffolding, not representative code.
Why This Approach: A source-level fix that excludes the problematic files from the build corpus entirely, rather than papering over symptoms with
Risk Assessment: Low risk; it addresses a legitimate testing-artifact problem at the source.
Backlog Compliance
Code Quality Analysis
Implementation Quality: ⭐⭐⭐⭐☆ (4/5)
Strengths:
Architecture adherence: ✅ Follows existing patterns, maintains engine parity, no breaking changes
Critical Concerns
None identified. This is a well-executed fix that addresses the root cause rather than masking symptoms.
Final Recommendation
Greptile Summary
This PR fixes the dogfooding benchmarks inflating
Confidence Score: 4/5
Core library changes (types + pipeline) are correct and well-tested; the benchmark scripts have a measurement gap in --npm comparison mode.
The pipeline.ts merge is cache-safe (loadConfig returns a deep clone), both the WASM and native paths pick up the merged config, and the new integration tests cover fresh-build and incremental round-trips. The one gap is in the benchmark scripts: when run in --npm mode against an older release, the dynamically imported buildGraph silently ignores the unknown exclude option, so the baseline run processes ~745 files while dev processes ~607, making timing comparisons misleading. This does not affect production code or the automated regression guard, but it could lead to incorrect go/no-go conclusions during manual pre-release benchmarking.
scripts/benchmark.ts, scripts/incremental-benchmark.ts, and scripts/query-benchmark.ts all share the same --npm mode comparison gap.
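One way to make the --npm gap visible is to compare the indexed file counts of the two runs before comparing their timings. The sketch below assumes each run can report how many files it indexed; `RunSummary` and `corpusMismatchWarning` are hypothetical names, not part of the benchmark scripts.

```typescript
// Hypothetical per-run summary; the real scripts may report results differently.
interface RunSummary {
  label: string; // e.g. 'npm baseline' or 'dev'
  fileCount: number; // files the build actually indexed
  fullBuildMs: number;
}

// Returns a warning when two runs indexed different corpora (for example
// when an older buildGraph ignores the `exclude` option), otherwise null.
function corpusMismatchWarning(baseline: RunSummary, candidate: RunSummary): string | null {
  if (baseline.fileCount === candidate.fileCount) return null;
  return (
    `corpus mismatch: ${baseline.label} indexed ${baseline.fileCount} files but ` +
    `${candidate.label} indexed ${candidate.fileCount}; ` +
    `fullBuildMs (${baseline.fullBuildMs} vs ${candidate.fullBuildMs}) is not directly comparable`
  );
}
```

With the counts from the review (~745 baseline files vs ~607 dev files) the helper returns a warning string; with equal counts it returns null, so the timing comparison can proceed.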
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["buildGraph(root, opts)"] --> B["setupPipeline(ctx)"]
B --> C["loadConfig(rootDir)\n(returns structuredClone)"]
C --> D{"opts.exclude?.length?"}
D -- yes --> E["ctx.config.exclude =\n[...config.exclude, ...opts.exclude]"]
D -- no --> F["ctx.config.exclude unchanged"]
E --> G["initializeEngine(ctx)"]
F --> G
G --> H{"engine?"}
H -- native --> I["tryNativeOrchestrator\n(reads ctx.config.exclude\nvia serialized config)"]
H -- wasm --> J["WASM pipeline\n(reads ctx.config.exclude)"]
I --> K["collectFiles (fixtures excluded)"]
J --> K
K --> L["detectChanges / parse / insert"]
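The merge step in the flowchart can be sketched in isolation. This assumes `Config` and `BuildGraphOpts` are reduced to their `exclude` fields (the real codegraph types carry more), and relies on the review's note that `loadConfig` returns a clone, so mutating `config` here does not leak into a cache. `applyOptsExclude` is a hypothetical name.

```typescript
// Reduced stand-ins for the real types; only `exclude` matters here.
interface Config {
  exclude: string[];
}
interface BuildGraphOpts {
  exclude?: string[];
}

// Appends per-build excludes to the loaded config's excludes. Empty or
// missing opts.exclude leaves the config untouched, matching the
// "opts.exclude?.length?" branch in the flowchart.
function applyOptsExclude(config: Config, opts: BuildGraphOpts): void {
  if (opts.exclude?.length) {
    config.exclude = [...config.exclude, ...opts.exclude];
  }
}
```

Because both engines read the same `ctx.config.exclude` afterwards, a single merge point is enough for the native and WASM paths.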
  it('wasm: opts.exclude rejects matching files without writing config', async () => {
    const root = fs.mkdtempSync(path.join(tmpDir, 'opts-wasm-'));
    writeFixture(root);
    const files = await buildWithOptsExclude(root, 'wasm', ['**/*.test.js', '**/*.spec.js']);
    expect(files).toContain('src/math.js');
    expect(files).not.toContain('src/math.test.js');
    expect(files).not.toContain('src/util.spec.js');
  });

  itNative('native: opts.exclude rejects matching files without writing config', async () => {
    const root = fs.mkdtempSync(path.join(tmpDir, 'opts-native-'));
    writeFixture(root);
    const files = await buildWithOptsExclude(root, 'native', ['**/*.test.js', '**/*.spec.js']);
    expect(files).toContain('src/math.js');
    expect(files).not.toContain('src/math.test.js');
    expect(files).not.toContain('src/util.spec.js');
  });
});
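To see which paths the patterns in these tests would reject, here is a deliberately simplified glob matcher. It is an illustration only, not the matcher codegraph's collectFiles uses: it handles `**`, `*` and `?` and nothing else (no braces, character classes or negation). `globToRegExp` and `isExcluded` are hypothetical helpers.

```typescript
// Simplified glob -> RegExp conversion for illustration only. Supports
// `**`, `*` and `?`; every other character is matched literally.
function globToRegExp(glob: string): RegExp {
  let re = '';
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === '*' && glob.charAt(i + 1) === '*') {
      if (glob.charAt(i + 2) === '/') {
        re += '(?:.*/)?'; // `**/` matches zero or more leading directories
        i += 2;
      } else {
        re += '.*'; // trailing `**` matches everything below this point
        i += 1;
      }
    } else if (c === '*') {
      re += '[^/]*'; // single `*` stays within one path segment
    } else if (c === '?') {
      re += '[^/]'; // `?` matches one non-separator character
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// True when a repo-relative, forward-slash path matches any pattern.
function isExcluded(relPath: string, patterns: string[]): boolean {
  return patterns.some((p) => globToRegExp(p).test(relPath));
}
```

Under these rules `**/*.test.js` rejects `src/math.test.js` and a root-level `math.test.js` but keeps `src/math.js`, and `tests/benchmarks/resolution/fixtures/**` rejects everything below that directory while leaving the rest of `tests/benchmarks/` alone.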
opts.exclude incremental coverage gap
Both new tests always wipe the DB before building, so they only exercise the fresh-build path. The scenario where files that were previously indexed become excluded on a subsequent incremental run (i.e. opts.exclude changes between builds against the same DB) is untested. In practice this path should work — collectFiles would omit the newly-excluded files and detectChanges would surface them as removals — but a short test covering one incremental round trip would lock in that behaviour and guard against regressions in the collect/detect stages.
Fixed in b008ffb: added wasm + native parity tests ("opts.exclude introduced on second incremental build drops previously-indexed files") that build the fixture twice against the same DB, first without exclude and then with it. The second build observes the previously indexed test files as removals via detectChanges, and they are dropped from file_hashes. Verified locally: the first build indexes 5 files, the second build reports "0 changed, 2 removed", and the tests assert that the test/spec files no longer appear in file_hashes.
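The collect → detect-removal contract these tests lock in can be modelled as a difference between the previous build's file_hashes and the files collected now. This is a simplified, hypothetical model (`diffIndexedFiles` and its Map-based inputs are assumptions), not codegraph's actual detectChanges.

```typescript
// Simplified, hypothetical model of change detection between two builds.
interface ChangeSummary {
  changed: string[]; // collected now, but new or with a different hash
  removed: string[]; // indexed last time, not collected now
}

function diffIndexedFiles(
  previousHashes: Map<string, string>, // path -> hash from the last build's file_hashes
  collected: Map<string, string>, // path -> hash for files collected on this build
): ChangeSummary {
  const changed: string[] = [];
  const removed: string[] = [];
  collected.forEach((hash, filePath) => {
    if (previousHashes.get(filePath) !== hash) changed.push(filePath);
  });
  previousHashes.forEach((_hash, filePath) => {
    // A newly excluded file is simply absent from `collected`.
    if (!collected.has(filePath)) removed.push(filePath);
  });
  return { changed, removed };
}
```

Feeding it five previously indexed files and a collection that omits the two test/spec files yields 0 changed and 2 removed, the same shape as the local verification described above.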
Codegraph Impact Analysis
5 functions changed → 9 callers affected across 7 files
Greptile flagged that the existing opts.exclude tests always wipe the DB before building, so they only exercise the fresh-build path. The scenario where opts.exclude changes between builds against the same DB (files previously indexed must be detected as removals on the second pass) was untested. Add wasm + native parity tests that build the fixture twice against the same DB — first with no exclude, then with exclude — and assert the second build drops the previously-indexed test/spec files from file_hashes. Locks in the collect → detect-removal contract that the benchmark scripts depend on once their exclude list changes mid-life.
The fixture-exclude in this PR shifts the denominator of every per-file build metric: the 3.10.0 baseline was measured over ~745 files (codegraph source + resolution-benchmark fixtures), while dev now measures the ~607 source files alone. DB content is dominated by src/, so absolute bytes stay roughly constant while the file count drops — inflating dbSizeBytes/file from 41614 to ~52211 (+25%, exactly at the threshold). This is a one-time methodology shift, not a real regression in the schema or extraction layer (which is why the old 3.10.0:Full build exemption could be dropped — Full build absolute timing actually improved with the exclude). Add a one-release exemption with a detailed comment so the distinction is visible to future maintainers; remove once 3.11.0+ data is captured under the post-#1134 methodology.
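The denominator effect can be checked with plain arithmetic. The helpers below are hypothetical; the inputs are the figures quoted above (41614 and ~52211 bytes/file, ~745 and ~607 files).

```typescript
// Hypothetical helpers for reasoning about per-file metrics.

// Relative change from baseline to current, in percent.
function percentChange(baseline: number, current: number): number {
  return ((current - baseline) / baseline) * 100;
}

// Average DB bytes per indexed file.
function bytesPerFile(totalBytes: number, fileCount: number): number {
  return totalBytes / fileCount;
}
```

`percentChange(41614, 52211)` comes out at about +25.5%. Holding the total DB size fixed at 41614 × 745 bytes and dividing by 607 files gives about 51075 bytes/file (+22.7%), so the file-count drop alone accounts for most of the observed increase.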
CI follow-up addressed in e2346e0: The regression-guard caught This is a one-time methodology shift, not a real regression. Added a single-release The
Summary
- Add `exclude?: string[]` to `BuildGraphOpts`; it is merged into `ctx.config.exclude` inside `setupPipeline`, so the native orchestrator picks it up via the serialized config without a separate code path.
- Define the fixture exclude pattern (`tests/benchmarks/resolution/fixtures/**`) in `scripts/lib/bench-config.ts` and wire it into every `buildGraph(root, ...)` call in `scripts/benchmark.ts`, `scripts/incremental-benchmark.ts`, and `scripts/query-benchmark.ts`.
- Remove the `3.10.0:Full build` entry (and its docstring section) from `KNOWN_REGRESSIONS` in `tests/benchmarks/regression-guard.test.ts`; the underlying cause is fixed at the source, not papered over.

Why
Each new heavyweight grammar landing in `codegraph` (the Verilog port in #1107 being the trigger) pulls its resolution-benchmark fixtures into the corpus that all three dogfooding benchmarks sweep. Those fixtures are hand-annotated scaffolding for the static-resolution suite, not representative source code, and parsing them inflates `fullBuildMs` by hundreds of ms. The previous workaround was a `KNOWN_REGRESSIONS` exemption per release; this change excludes the fixtures at the build invocation so the next big-grammar PR will not trip the gate at all.

Local verification: native `fullBuildMs` on this repo falls from the regressed ~2.8s back to ~1.7-1.9s (in line with the 3.10.0 baseline of 1959 ms), while the file count drops 745 → 607 (138 fixture files excluded).

Test plan
- `npm run lint` clean on changed files
- `npx vitest run tests/integration/config-include-exclude.test.ts`: 9/9 pass (7 existing + 2 new `opts.exclude` parity tests for wasm + native)
- `RUN_REGRESSION_GUARD=1 npx vitest run tests/benchmarks/regression-guard.test.ts`: 17/17 pass with the exemption removed
- `npx vitest run tests/integration/`: 574/574 pass
- `npx tsx scripts/incremental-benchmark.ts`: fixtures excluded (607 files vs prior 745); native `fullBuildMs` back to baseline range

Closes #1112