
perf(bench): exclude resolution-benchmark fixtures from dogfooding sweep #1134

Open
carlos-alm wants to merge 3 commits into main from perf/1112-exclude-bench-fixtures

Conversation

@carlos-alm
Contributor

Summary

  • Add exclude?: string[] to BuildGraphOpts; merged into ctx.config.exclude inside setupPipeline so the native orchestrator picks it up via the serialized config without a separate code path.
  • Centralize the bench exclude list (tests/benchmarks/resolution/fixtures/**) in scripts/lib/bench-config.ts and wire it into every buildGraph(root, ...) call in scripts/benchmark.ts, scripts/incremental-benchmark.ts, and scripts/query-benchmark.ts.
  • Drop the 3.10.0:Full build entry (and its docstring section) from KNOWN_REGRESSIONS in tests/benchmarks/regression-guard.test.ts — the underlying cause is fixed at the source, not papered over.
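The opts-to-config merge in the first bullet can be sketched roughly as follows. The shapes are illustrative: `BuildGraphOpts.exclude` and the merge-in-`setupPipeline` behavior come from this PR, but the field names beyond `exclude` and the helper itself are assumptions for the sketch.

```typescript
// Illustrative shapes -- the real BuildGraphOpts/config types live in src/types.ts.
interface BuildGraphOpts {
  exclude?: string[]; // extra glob patterns for this build only
}

interface Config {
  exclude: string[]; // patterns loaded from the project config
}

// Mirrors the merge described above: opts.exclude augments (never replaces)
// the configured excludes, so both the WASM pipeline and the native
// orchestrator see a single merged list via the serialized config.
function mergeExclude(config: Config, opts: BuildGraphOpts): Config {
  if (!opts.exclude?.length) return config;
  return { ...config, exclude: [...config.exclude, ...opts.exclude] };
}

const merged = mergeExclude(
  { exclude: ['node_modules/**'] },
  { exclude: ['tests/benchmarks/resolution/fixtures/**'] },
);
console.log(merged.exclude.join(','));
```

Because the merge produces a new array on the (cloned) config, per-build excludes never leak back into the cached configuration.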

Why

Each new heavyweight grammar landing in codegraph (the Verilog port in #1107 being the trigger) pulls its resolution-benchmark fixtures into the corpus that all three dogfooding benchmarks sweep. Those fixtures are hand-annotated scaffolding for the static-resolution suite, not representative source code, and parsing them inflates fullBuildMs by hundreds of ms. The previous workaround was a KNOWN_REGRESSIONS exemption per release; this change excludes the fixtures at the build invocation so the next big-grammar PR will not trip the gate at all.

Local verification: native fullBuildMs on this repo returns from the regressed ~2.8s back to ~1.7-1.9s (in line with the 3.10.0 baseline of 1959 ms) while file count drops 745 → 607 (138 fixture files excluded).

Test plan

  • npm run lint clean on changed files
  • npx vitest run tests/integration/config-include-exclude.test.ts — 9/9 pass (7 existing + 2 new opts.exclude parity tests for wasm + native)
  • RUN_REGRESSION_GUARD=1 npx vitest run tests/benchmarks/regression-guard.test.ts — 17/17 pass with the exemption removed
  • npx vitest run tests/integration/ — 574/574 pass
  • npx tsx scripts/incremental-benchmark.ts — fixtures excluded (607 files vs prior 745); native fullBuildMs back to baseline range

Closes #1112

Adds an `exclude` field to BuildGraphOpts that augments `config.exclude`
for a single build, then wires the three dogfooding benchmark scripts
(scripts/benchmark.ts, scripts/incremental-benchmark.ts,
scripts/query-benchmark.ts) to pass `tests/benchmarks/resolution/fixtures/**`.

Those fixtures are hand-annotated test scaffolding for the static-resolution
suite, not representative source code — each new heavyweight grammar (e.g.
tree-sitter-verilog in #1107) inflates `fullBuildMs` by hundreds of ms
purely from parsing fixture files that have no business in dogfooding
timing measurements. The 3.10.0:Full build KNOWN_REGRESSIONS exemption
that papered over the Verilog cost is removed now that the root cause is
addressed at the build invocation.

Closes #1112
@claude

claude Bot commented May 15, 2026

Claude finished @carlos-alm's task


Reviewing PR #1134: Exclude benchmark fixtures from dogfooding

Todo List:

  • Phase 1: Root Cause Analysis - understand the problem and approach
  • Phase 2: Configuration Change Detection - check for high-risk changes
  • Phase 3: Test Integrity - verify tests aren't weakened
  • Phase 4: Code Quality - review implementation details
  • Phase 5: Backlog Compliance - rate against BACKLOG.md criteria
  • Final Assessment - provide comprehensive review verdict

Root Cause Analysis

Problem Being Solved: Performance regression false positives in benchmarking when new heavyweight grammars (like Verilog) are added to codegraph. The resolution-benchmark fixtures inflate build times disproportionately as they're test scaffolding, not representative code.

Why This Approach: Source-level fix that excludes problematic files from the build corpus entirely, rather than papering over symptoms with KNOWN_REGRESSIONS exemptions. Architecturally sound with centralized configuration and proper native/WASM integration.

Risk Assessment: Low risk - addresses legitimate testing artifact problem at the source.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies added
  • Foundation-aligned: ✓ — Fully compliant with all core principles
  • Problem-fit: 4 — Directly improves benchmarking accuracy and prevents false regression alerts, supporting better development velocity
  • Breaking: No — Purely additive, no behavior changes for existing users
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Code Quality Analysis

Implementation Quality: ⭐⭐⭐⭐☆ (4/5)

Strengths:

  1. Consistent application: All 3 benchmark scripts properly use BENCHMARK_EXCLUDES
  2. Proper type safety: BuildGraphOpts.exclude correctly typed as optional string array
  3. Correct merging logic (pipeline.ts:190-192): Preserves existing excludes while extending programmatically
  4. Comprehensive test coverage: Both engines tested for opts.exclude functionality
  5. Clear documentation: Well-documented rationale in bench-config.ts
  6. Engine parity maintained: Native orchestrator automatically picks up merged excludes

Architecture adherence: ✅ Follows existing patterns, maintains engine parity, no breaking changes

Critical Concerns

None identified - This is a well-executed fix that addresses the root cause rather than masking symptoms.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Solid implementation that fixes performance regression false positives at the source. Proper typing, comprehensive testing, maintains engine parity, and removes technical debt (KNOWN_REGRESSIONS exemption). The centralized exclude configuration is well-designed and the local verification confirms the fix works as intended.

@greptile-apps
Contributor

greptile-apps Bot commented May 15, 2026

Greptile Summary

This PR fixes the dogfooding benchmarks inflating fullBuildMs by excluding tests/benchmarks/resolution/fixtures/** — hand-annotated scaffolding that was being swept by all three benchmark scripts after heavyweight grammars (Verilog in #1107) were added. The fix threads exclude?: string[] through BuildGraphOpts and setupPipeline, centralizes the exclude list in bench-config.ts, and replaces the previous KNOWN_REGRESSIONS workaround with a cleaner exclusion at the source.

  • src/types.ts + src/domain/graph/builder/pipeline.ts: adds exclude to BuildGraphOpts; setupPipeline merges it into ctx.config.exclude after loadConfig (which returns a structuredClone, so the mutation is cache-safe and doesn't leak between calls).
  • scripts/lib/bench-config.ts: exports BENCHMARK_EXCLUDES; all three benchmark scripts now spread it into every buildGraph invocation.
  • tests/integration/config-include-exclude.test.ts: adds fresh-build and incremental round-trip tests for opts.exclude on both wasm and native engines, addressing the previous feedback thread.
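The centralized list described in the bullets might look roughly like this. The constant name and pattern come from this thread; the `buildGraph` call shape in the comment is an assumption.

```typescript
// scripts/lib/bench-config.ts (sketch): single source of truth for the
// patterns every dogfooding benchmark must skip. Exported in the real file.
const BENCHMARK_EXCLUDES: readonly string[] = [
  // Hand-annotated scaffolding for the static-resolution suite -- not
  // representative source code, so it must not count toward fullBuildMs.
  'tests/benchmarks/resolution/fixtures/**',
];

// Each benchmark script then spreads the list into its build options,
// e.g. (call shape assumed):
//   await buildGraph(root, { exclude: [...BENCHMARK_EXCLUDES] });
console.log(BENCHMARK_EXCLUDES.length);
```

Centralizing the list means a future fixture directory only needs to be added in one place for all three scripts to pick it up.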

Confidence Score: 4/5

Core library changes (types + pipeline) are correct and well-tested; the benchmark scripts have a measurement gap in --npm comparison mode.

The pipeline.ts merge is cache-safe (loadConfig returns a deep clone), both WASM and native paths pick up the merged config, and the new integration tests cover fresh-build and incremental round-trips. The one gap is in the benchmark scripts: when run in --npm mode against an older release, the dynamically-imported buildGraph silently ignores the unknown exclude option, so the baseline run processes ~745 files while dev runs ~607 — making timing comparisons misleading. This doesn't affect production code or the automated regression guard, but could cause incorrect go/no-go conclusions during manual pre-release benchmarking.

scripts/benchmark.ts, scripts/incremental-benchmark.ts, and scripts/query-benchmark.ts — all share the same --npm mode comparison gap.

Important Files Changed

Filename Overview
scripts/benchmark.ts Adds BENCHMARK_EXCLUDES to all buildGraph calls; in --npm mode older versions silently ignore exclude, skewing timing comparisons
scripts/incremental-benchmark.ts Same npm-mode comparison skew as benchmark.ts; all buildGraph calls consistently use BENCHMARK_EXCLUDES
scripts/lib/bench-config.ts Adds BENCHMARK_EXCLUDES constant in a well-documented export; clean addition with no issues
scripts/query-benchmark.ts Same npm-mode comparison skew as benchmark.ts; single buildGraph call correctly uses BENCHMARK_EXCLUDES
src/domain/graph/builder/pipeline.ts Correctly merges opts.exclude into ctx.config.exclude; loadConfig returns a structuredClone so mutation is cache-safe
src/types.ts Adds exclude?: string[] to BuildGraphOpts with clear JSDoc; straightforward and consistent with other optional opts
tests/benchmarks/regression-guard.test.ts Swaps 3.10.0:Full build exemption for 3.10.0:DB bytes/file; rationale is well documented and matches the methodology change
tests/integration/config-include-exclude.test.ts Adds 4 new tests covering fresh-build and incremental round-trip paths for opts.exclude on both wasm and native engines

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["buildGraph(root, opts)"] --> B["setupPipeline(ctx)"]
    B --> C["loadConfig(rootDir)\n(returns structuredClone)"]
    C --> D{"opts.exclude?.length?"}
    D -- yes --> E["ctx.config.exclude =\n[...config.exclude, ...opts.exclude]"]
    D -- no --> F["ctx.config.exclude unchanged"]
    E --> G["initializeEngine(ctx)"]
    F --> G
    G --> H{"engine?"}
    H -- native --> I["tryNativeOrchestrator\n(reads ctx.config.exclude\nvia serialized config)"]
    H -- wasm --> J["WASM pipeline\n(reads ctx.config.exclude)"]
    I --> K["collectFiles (fixtures excluded)"]
    J --> K
    K --> L["detectChanges / parse / insert"]

Comments Outside Diff (1)

  1. scripts/benchmark.ts, line 81 (link)

    P1 --npm baseline comparison becomes corpus-mismatched

    buildGraph is dynamically imported from the resolved srcDir — in --npm mode that points to an older published package. Older versions don't know about exclude in BuildGraphOpts, so the option is silently dropped at runtime and the baseline run processes all ~745 files while the dev run processes only ~607. A naive timing comparison then shows dev is faster, but part of that gap is purely the smaller corpus, not a genuine perf improvement. The same problem affects incremental-benchmark.ts and query-benchmark.ts. One guard would be to skip BENCHMARK_EXCLUDES when running in npm mode, or at minimum emit a warning to stderr when the option is detected as unsupported (e.g. by checking the installed package version against the version that introduced exclude).
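One possible shape for the suggested guard, keeping both runs of an `--npm` comparison on the same corpus. The cutoff version and helper names here are hypothetical, not from the PR.

```typescript
// Sketch of the guard suggested above: when the baseline package predates
// BuildGraphOpts.exclude, run BOTH baseline and dev on the full corpus and
// warn on stderr, so timings stay comparable.
const EXCLUDE_INTRODUCED = '3.11.0'; // hypothetical version that added opts.exclude

// Minimal three-part semver comparison (no prerelease handling).
function atLeast(version: string, min: string): boolean {
  const a = version.split('.').map(Number);
  const b = min.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if ((a[i] ?? 0) !== (b[i] ?? 0)) return (a[i] ?? 0) > (b[i] ?? 0);
  }
  return true;
}

function excludesFor(baselineVersion: string, excludes: string[]): string[] {
  if (atLeast(baselineVersion, EXCLUDE_INTRODUCED)) return excludes;
  console.error(
    `warning: ${baselineVersion} ignores opts.exclude; comparing on the full corpus`,
  );
  return []; // use the empty list for the dev run too, keeping corpora equal
}

console.log(excludesFor('3.10.0', ['tests/benchmarks/resolution/fixtures/**']).length);
```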


Reviews (3): Last reviewed commit: "fix(bench): exempt 3.10.0:DB bytes/file ..."

Comment on lines +196 to 214

it('wasm: opts.exclude rejects matching files without writing config', async () => {
const root = fs.mkdtempSync(path.join(tmpDir, 'opts-wasm-'));
writeFixture(root);
const files = await buildWithOptsExclude(root, 'wasm', ['**/*.test.js', '**/*.spec.js']);
expect(files).toContain('src/math.js');
expect(files).not.toContain('src/math.test.js');
expect(files).not.toContain('src/util.spec.js');
});

itNative('native: opts.exclude rejects matching files without writing config', async () => {
const root = fs.mkdtempSync(path.join(tmpDir, 'opts-native-'));
writeFixture(root);
const files = await buildWithOptsExclude(root, 'native', ['**/*.test.js', '**/*.spec.js']);
expect(files).toContain('src/math.js');
expect(files).not.toContain('src/math.test.js');
expect(files).not.toContain('src/util.spec.js');
});
});
Contributor

P2 opts.exclude incremental coverage gap

Both new tests always wipe the DB before building, so they only exercise the fresh-build path. The scenario where files that were previously indexed become excluded on a subsequent incremental run (i.e. opts.exclude changes between builds against the same DB) is untested. In practice this path should work — collectFiles would omit the newly-excluded files and detectChanges would surface them as removals — but a short test covering one incremental round trip would lock in that behaviour and guard against regressions in the collect/detect stages.
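The collect/detect contract described here can be modeled in miniature. The matcher and function names below are illustrative stand-ins, not the real pipeline stages.

```typescript
// Miniature model of the collect -> detect stages: files indexed by a
// previous build that no longer survive the exclude filter must surface
// as removals on the next incremental run.

// Toy matcher: supports only the "**/*.ext" pattern form used here.
function matchesAny(file: string, patterns: string[]): boolean {
  return patterns.some((p) => p.startsWith('**/*.') && file.endsWith(p.slice(4)));
}

function detectRemovals(previouslyIndexed: string[], exclude: string[]): string[] {
  return previouslyIndexed.filter((f) => matchesAny(f, exclude));
}

// First build: no exclude, five files indexed. Second build introduces the
// exclude, so the test/spec files must come back as removals.
const indexed = ['src/math.js', 'src/util.js', 'src/math.test.js', 'src/util.spec.js', 'src/a.js'];
const removed = detectRemovals(indexed, ['**/*.test.js', '**/*.spec.js']);
console.log(removed.join(','));
```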


Contributor Author

Fixed in b008ffb — added wasm + native parity tests ("opts.exclude introduced on second incremental build drops previously-indexed files") that build the fixture twice against the same DB (first without exclude, then with). The second build observes the previously-indexed test files as removals via detectChanges, and they are dropped from file_hashes. Verified locally: the first build indexes 5 files, the second reports "0 changed, 2 removed", and the tests assert the test/spec files no longer appear in file_hashes.

@github-actions
Contributor

github-actions Bot commented May 15, 2026

Codegraph Impact Analysis

5 functions changed, 9 callers affected across 7 files

  • handle_external_function in crates/codegraph-core/src/extractors/gleam.rs:55 (1 transitive callers)
  • extracts_constant in crates/codegraph-core/src/extractors/gleam.rs:437 (0 transitive callers)
  • setupPipeline in src/domain/graph/builder/pipeline.ts:168 (6 transitive callers)
  • handleExternalFunction in src/extractors/gleam.ts:84 (2 transitive callers)
  • BuildGraphOpts.exclude in src/types.ts:1072 (0 transitive callers)

Greptile flagged that the existing opts.exclude tests always wipe the DB
before building, so they only exercise the fresh-build path. The scenario
where opts.exclude changes between builds against the same DB (files
previously indexed must be detected as removals on the second pass) was
untested.

Add wasm + native parity tests that build the fixture twice against the
same DB — first with no exclude, then with exclude — and assert the
second build drops the previously-indexed test/spec files from
file_hashes. Locks in the collect → detect-removal contract that the
benchmark scripts depend on once their exclude list changes mid-life.

The fixture-exclude in this PR shifts the denominator of every per-file
build metric: the 3.10.0 baseline was measured over ~745 files (codegraph
source + resolution-benchmark fixtures), while dev now measures the ~607
source files alone. DB content is dominated by src/, so absolute bytes
stay roughly constant while the file count drops — inflating
dbSizeBytes/file from 41614 to ~52211 (+25%, exactly at the threshold).

This is a one-time methodology shift, not a real regression in the schema
or extraction layer (which is why the old 3.10.0:Full build exemption
could be dropped — Full build absolute timing actually improved with the
exclude). Add a one-release exemption with a detailed comment so the
distinction is visible to future maintainers; remove once 3.11.0+ data
is captured under the post-#1134 methodology.
@carlos-alm
Contributor Author

CI follow-up addressed in e2346e0:

The regression-guard caught DB bytes/file: 41614 → 52211 (+25%, threshold 25%) after this PR landed locally. Root cause: excluding tests/benchmarks/resolution/fixtures/** shifts the denominator of every per-file build metric — the 3.10.0 baseline measured ~745 files (source + fixtures), dev now measures ~607 source files alone. DB content is dominated by src/, so absolute bytes stay roughly constant while the file count drops, inflating the per-file ratio.

This is a one-time methodology shift, not a real regression. Added a single-release 3.10.0:DB bytes/file exemption to KNOWN_REGRESSIONS with a detailed comment distinguishing it from the dropped 3.10.0:Full build entry (Full build absolute timing actually improved — that one was correctly removed; per-file bytes is a measurement-scope artifact). Exemption will fall off naturally once 3.11.0 baseline data is captured under the post-#1134 methodology.
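Quick arithmetic, using the file counts from this thread and assuming total DB bytes stay roughly constant, shows most of the jump is the denominator:

```typescript
// Baseline: ~745 files at 41614 DB bytes/file.
const baselineFiles = 745;
const baselineBytesPerFile = 41614;
const totalBytes = baselineFiles * baselineBytesPerFile; // ~31.0 MB, assumed constant

// Dev: roughly the same bytes spread over ~607 files after the fixture exclude.
const devFiles = 607;
const devBytesPerFile = totalBytes / devFiles;

// 745/607 is about 1.23, so the corpus shrink alone inflates per-file bytes
// by roughly 23% -- in the same ballpark as the observed 41614 -> 52211 (+25%),
// with the remainder coming from DB bytes not being perfectly constant.
console.log(Math.round(devBytesPerFile));
```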

The KNOWN_REGRESSIONS "entries are not stale" test still passes (3.10.0 is within the 1-minor-version window of the current 3.10.0 package version).

@carlos-alm
Contributor Author

@greptileai


Development

Successfully merging this pull request may close these issues.

perf: exclude verilog benchmark fixtures from incremental benchmark sweep

1 participant