
perf(bench): exclude resolution-benchmark fixtures from dogfooding sweep #1134

Open
carlos-alm wants to merge 3 commits into main from perf/1112-exclude-bench-fixtures

Conversation

@carlos-alm
Contributor

Summary

  • Add exclude?: string[] to BuildGraphOpts; merged into ctx.config.exclude inside setupPipeline so the native orchestrator picks it up via the serialized config without a separate code path.
  • Centralize the bench exclude list (tests/benchmarks/resolution/fixtures/**) in scripts/lib/bench-config.ts and wire it into every buildGraph(root, ...) call in scripts/benchmark.ts, scripts/incremental-benchmark.ts, and scripts/query-benchmark.ts.
  • Drop the 3.10.0:Full build entry (and its docstring section) from KNOWN_REGRESSIONS in tests/benchmarks/regression-guard.test.ts — the underlying cause is fixed at the source, not papered over.
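The opts-to-config merge in the first bullet can be sketched roughly as follows. The shapes are illustrative: `BuildGraphOpts.exclude` and the merge-in-`setupPipeline` behavior come from this PR, but the field names beyond `exclude` and the helper itself are assumptions for the sketch.

```typescript
// Illustrative shapes -- the real BuildGraphOpts/config types live in src/types.ts.
interface BuildGraphOpts {
  exclude?: string[]; // extra glob patterns for this build only
}

interface Config {
  exclude: string[]; // patterns loaded from the project config
}

// Mirrors the merge described above: opts.exclude augments (never replaces)
// the configured excludes, so both the WASM pipeline and the native
// orchestrator see a single merged list via the serialized config.
function mergeExclude(config: Config, opts: BuildGraphOpts): Config {
  if (!opts.exclude?.length) return config;
  return { ...config, exclude: [...config.exclude, ...opts.exclude] };
}

const merged = mergeExclude(
  { exclude: ['node_modules/**'] },
  { exclude: ['tests/benchmarks/resolution/fixtures/**'] },
);
console.log(merged.exclude.join(','));
```

Because the merge produces a new array on the (cloned) config, per-build excludes never leak back into the cached configuration.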

Why

Each new heavyweight grammar landing in codegraph (the Verilog port in #1107 being the trigger) pulls its resolution-benchmark fixtures into the corpus that all three dogfooding benchmarks sweep. Those fixtures are hand-annotated scaffolding for the static-resolution suite, not representative source code, and parsing them inflates fullBuildMs by hundreds of ms. The previous workaround was a KNOWN_REGRESSIONS exemption per release; this change excludes the fixtures at the build invocation so the next big-grammar PR will not trip the gate at all.

Local verification: native fullBuildMs on this repo returns from the regressed ~2.8s back to ~1.7-1.9s (in line with the 3.10.0 baseline of 1959 ms) while file count drops 745 → 607 (138 fixture files excluded).

Test plan

  • npm run lint clean on changed files
  • npx vitest run tests/integration/config-include-exclude.test.ts — 9/9 pass (7 existing + 2 new opts.exclude parity tests for wasm + native)
  • RUN_REGRESSION_GUARD=1 npx vitest run tests/benchmarks/regression-guard.test.ts — 17/17 pass with the exemption removed
  • npx vitest run tests/integration/ — 574/574 pass
  • npx tsx scripts/incremental-benchmark.ts — fixtures excluded (607 files vs prior 745); native fullBuildMs back to baseline range

Closes #1112

Adds an `exclude` field to BuildGraphOpts that augments `config.exclude`
for a single build, then wires the three dogfooding benchmark scripts
(scripts/benchmark.ts, scripts/incremental-benchmark.ts,
scripts/query-benchmark.ts) to pass `tests/benchmarks/resolution/fixtures/**`.

Those fixtures are hand-annotated test scaffolding for the static-resolution
suite, not representative source code — each new heavyweight grammar (e.g.
tree-sitter-verilog in #1107) inflates `fullBuildMs` by hundreds of ms
purely from parsing fixture files that have no business in dogfooding
timing measurements. The 3.10.0:Full build KNOWN_REGRESSIONS exemption
that papered over the Verilog cost is removed now that the root cause is
addressed at the build invocation.

Closes #1112
@claude

claude Bot commented May 15, 2026

Claude finished @carlos-alm's task


Reviewing PR #1134: Exclude benchmark fixtures from dogfooding

Todo List:

  • Phase 1: Root Cause Analysis - understand the problem and approach
  • Phase 2: Configuration Change Detection - check for high-risk changes
  • Phase 3: Test Integrity - verify tests aren't weakened
  • Phase 4: Code Quality - review implementation details
  • Phase 5: Backlog Compliance - rate against BACKLOG.md criteria
  • Final Assessment - provide comprehensive review verdict

Root Cause Analysis

Problem Being Solved: Performance regression false positives in benchmarking when new heavyweight grammars (like Verilog) are added to codegraph. The resolution-benchmark fixtures inflate build times disproportionately as they're test scaffolding, not representative code.

Why This Approach: Source-level fix that excludes problematic files from the build corpus entirely, rather than papering over symptoms with KNOWN_REGRESSIONS exemptions. Architecturally sound with centralized configuration and proper native/WASM integration.

Risk Assessment: Low risk - addresses legitimate testing artifact problem at the source.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies added
  • Foundation-aligned: ✓ — Fully compliant with all core principles
  • Problem-fit: 4 — Directly improves benchmarking accuracy and prevents false regression alerts, supporting better development velocity
  • Breaking: No — Purely additive, no behavior changes for existing users
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Code Quality Analysis

Implementation Quality: ⭐⭐⭐⭐☆ (4/5)

Strengths:

  1. Consistent application: All 3 benchmark scripts properly use BENCHMARK_EXCLUDES
  2. Proper type safety: BuildGraphOpts.exclude correctly typed as optional string array
  3. Correct merging logic (pipeline.ts:190-192): Preserves existing excludes while extending programmatically
  4. Comprehensive test coverage: Both engines tested for opts.exclude functionality
  5. Clear documentation: Well-documented rationale in bench-config.ts
  6. Engine parity maintained: Native orchestrator automatically picks up merged excludes

Architecture adherence: ✅ Follows existing patterns, maintains engine parity, no breaking changes

Critical Concerns

None identified - This is a well-executed fix that addresses the root cause rather than masking symptoms.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Solid implementation that fixes performance regression false positives at the source. Proper typing, comprehensive testing, maintains engine parity, and removes technical debt (KNOWN_REGRESSIONS exemption). The centralized exclude configuration is well-designed and the local verification confirms the fix works as intended.

@greptile-apps
Contributor

greptile-apps Bot commented May 15, 2026

Greptile Summary

This PR fixes the dogfooding benchmarks inflating fullBuildMs by excluding tests/benchmarks/resolution/fixtures/** — hand-annotated scaffolding that was being swept by all three benchmark scripts after heavyweight grammars (Verilog in #1107) were added. The fix threads exclude?: string[] through BuildGraphOpts and setupPipeline, centralizes the exclude list in bench-config.ts, and replaces the previous KNOWN_REGRESSIONS workaround with a cleaner exclusion at the source.

  • src/types.ts + src/domain/graph/builder/pipeline.ts: adds exclude to BuildGraphOpts; setupPipeline merges it into ctx.config.exclude after loadConfig (which returns a structuredClone, so the mutation is cache-safe and doesn't leak between calls).
  • scripts/lib/bench-config.ts: exports BENCHMARK_EXCLUDES; all three benchmark scripts now spread it into every buildGraph invocation.
  • tests/integration/config-include-exclude.test.ts: adds fresh-build and incremental round-trip tests for opts.exclude on both wasm and native engines, addressing the previous feedback thread.
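The centralized list described in the bullets might look roughly like this. The constant name and pattern come from this thread; the `buildGraph` call shape in the comment is an assumption.

```typescript
// scripts/lib/bench-config.ts (sketch): single source of truth for the
// patterns every dogfooding benchmark must skip. Exported in the real file.
const BENCHMARK_EXCLUDES: readonly string[] = [
  // Hand-annotated scaffolding for the static-resolution suite -- not
  // representative source code, so it must not count toward fullBuildMs.
  'tests/benchmarks/resolution/fixtures/**',
];

// Each benchmark script then spreads the list into its build options,
// e.g. (call shape assumed):
//   await buildGraph(root, { exclude: [...BENCHMARK_EXCLUDES] });
console.log(BENCHMARK_EXCLUDES.length);
```

Centralizing the list means a future fixture directory only needs to be added in one place for all three scripts to pick it up.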

Confidence Score: 4/5

Core library changes (types + pipeline) are correct and well-tested; the benchmark scripts have a measurement gap in --npm comparison mode.

The pipeline.ts merge is cache-safe (loadConfig returns a deep clone), both WASM and native paths pick up the merged config, and the new integration tests cover fresh-build and incremental round-trips. The one gap is in the benchmark scripts: when run in --npm mode against an older release, the dynamically-imported buildGraph silently ignores the unknown exclude option, so the baseline run processes ~745 files while dev runs ~607 — making timing comparisons misleading. This doesn't affect production code or the automated regression guard, but could cause incorrect go/no-go conclusions during manual pre-release benchmarking.

scripts/benchmark.ts, scripts/incremental-benchmark.ts, and scripts/query-benchmark.ts — all share the same --npm mode comparison gap.

Important Files Changed

Filename Overview
scripts/benchmark.ts Adds BENCHMARK_EXCLUDES to all buildGraph calls; in --npm mode older versions silently ignore exclude, skewing timing comparisons
scripts/incremental-benchmark.ts Same npm-mode comparison skew as benchmark.ts; all buildGraph calls consistently use BENCHMARK_EXCLUDES
scripts/lib/bench-config.ts Adds BENCHMARK_EXCLUDES constant in a well-documented export; clean addition with no issues
scripts/query-benchmark.ts Same npm-mode comparison skew as benchmark.ts; single buildGraph call correctly uses BENCHMARK_EXCLUDES
src/domain/graph/builder/pipeline.ts Correctly merges opts.exclude into ctx.config.exclude; loadConfig returns a structuredClone so mutation is cache-safe
src/types.ts Adds exclude?: string[] to BuildGraphOpts with clear JSDoc; straightforward and consistent with other optional opts
tests/benchmarks/regression-guard.test.ts Swaps 3.10.0:Full build exemption for 3.10.0:DB bytes/file; rationale is well documented and matches the methodology change
tests/integration/config-include-exclude.test.ts Adds 4 new tests covering fresh-build and incremental round-trip paths for opts.exclude on both wasm and native engines

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["buildGraph(root, opts)"] --> B["setupPipeline(ctx)"]
    B --> C["loadConfig(rootDir)\n(returns structuredClone)"]
    C --> D{"opts.exclude?.length?"}
    D -- yes --> E["ctx.config.exclude =\n[...config.exclude, ...opts.exclude]"]
    D -- no --> F["ctx.config.exclude unchanged"]
    E --> G["initializeEngine(ctx)"]
    F --> G
    G --> H{"engine?"}
    H -- native --> I["tryNativeOrchestrator\n(reads ctx.config.exclude\nvia serialized config)"]
    H -- wasm --> J["WASM pipeline\n(reads ctx.config.exclude)"]
    I --> K["collectFiles (fixtures excluded)"]
    J --> K
    K --> L["detectChanges / parse / insert"]

Comments Outside Diff (1)

  1. scripts/benchmark.ts, line 81 (link)

    P1 --npm baseline comparison becomes corpus-mismatched

    buildGraph is dynamically imported from the resolved srcDir — in --npm mode that points to an older published package. Older versions don't know about exclude in BuildGraphOpts, so the option is silently dropped at runtime and the baseline run processes all ~745 files while the dev run processes only ~607. A naive timing comparison then shows dev is faster, but part of that gap is purely the smaller corpus, not a genuine perf improvement. The same problem affects incremental-benchmark.ts and query-benchmark.ts. One guard would be to skip BENCHMARK_EXCLUDES when running in npm mode, or at minimum emit a warning to stderr when the option is detected as unsupported (e.g. by checking the installed package version against the version that introduced exclude).
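One possible shape for the suggested guard, keeping both runs of an `--npm` comparison on the same corpus. The cutoff version and helper names here are hypothetical, not from the PR.

```typescript
// Sketch of the guard suggested above: when the baseline package predates
// BuildGraphOpts.exclude, run BOTH baseline and dev on the full corpus and
// warn on stderr, so timings stay comparable.
const EXCLUDE_INTRODUCED = '3.11.0'; // hypothetical version that added opts.exclude

// Minimal three-part semver comparison (no prerelease handling).
function atLeast(version: string, min: string): boolean {
  const a = version.split('.').map(Number);
  const b = min.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if ((a[i] ?? 0) !== (b[i] ?? 0)) return (a[i] ?? 0) > (b[i] ?? 0);
  }
  return true;
}

function excludesFor(baselineVersion: string, excludes: string[]): string[] {
  if (atLeast(baselineVersion, EXCLUDE_INTRODUCED)) return excludes;
  console.error(
    `warning: ${baselineVersion} ignores opts.exclude; comparing on the full corpus`,
  );
  return []; // use the empty list for the dev run too, keeping corpora equal
}

console.log(excludesFor('3.10.0', ['tests/benchmarks/resolution/fixtures/**']).length);
```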


Reviews (3): Last reviewed commit: "fix(bench): exempt 3.10.0:DB bytes/file ..."

Comment on lines +196 to 214

it('wasm: opts.exclude rejects matching files without writing config', async () => {
const root = fs.mkdtempSync(path.join(tmpDir, 'opts-wasm-'));
writeFixture(root);
const files = await buildWithOptsExclude(root, 'wasm', ['**/*.test.js', '**/*.spec.js']);
expect(files).toContain('src/math.js');
expect(files).not.toContain('src/math.test.js');
expect(files).not.toContain('src/util.spec.js');
});

itNative('native: opts.exclude rejects matching files without writing config', async () => {
const root = fs.mkdtempSync(path.join(tmpDir, 'opts-native-'));
writeFixture(root);
const files = await buildWithOptsExclude(root, 'native', ['**/*.test.js', '**/*.spec.js']);
expect(files).toContain('src/math.js');
expect(files).not.toContain('src/math.test.js');
expect(files).not.toContain('src/util.spec.js');
});
});
Contributor

P2 opts.exclude incremental coverage gap

Both new tests always wipe the DB before building, so they only exercise the fresh-build path. The scenario where files that were previously indexed become excluded on a subsequent incremental run (i.e. opts.exclude changes between builds against the same DB) is untested. In practice this path should work — collectFiles would omit the newly-excluded files and detectChanges would surface them as removals — but a short test covering one incremental round trip would lock in that behaviour and guard against regressions in the collect/detect stages.
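The collect/detect contract described here can be modeled in miniature. The matcher and function names below are illustrative stand-ins, not the real pipeline stages.

```typescript
// Miniature model of the collect -> detect stages: files indexed by a
// previous build that no longer survive the exclude filter must surface
// as removals on the next incremental run.

// Toy matcher: supports only the "**/*.ext" pattern form used here.
function matchesAny(file: string, patterns: string[]): boolean {
  return patterns.some((p) => p.startsWith('**/*.') && file.endsWith(p.slice(4)));
}

function detectRemovals(previouslyIndexed: string[], exclude: string[]): string[] {
  return previouslyIndexed.filter((f) => matchesAny(f, exclude));
}

// First build: no exclude, five files indexed. Second build introduces the
// exclude, so the test/spec files must come back as removals.
const indexed = ['src/math.js', 'src/util.js', 'src/math.test.js', 'src/util.spec.js', 'src/a.js'];
const removed = detectRemovals(indexed, ['**/*.test.js', '**/*.spec.js']);
console.log(removed.join(','));
```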


Contributor Author

Fixed in b008ffb — added wasm + native parity tests ("opts.exclude introduced on second incremental build drops previously-indexed files") that build the fixture twice against the same DB (first without exclude, then with). The second build observes the previously-indexed test files as removals via detectChanges, and they are dropped from file_hashes. Verified locally: the first build indexes 5 files, the second reports "0 changed, 2 removed", and the tests assert the test/spec files no longer appear in file_hashes.

@github-actions
Contributor

github-actions Bot commented May 15, 2026

Codegraph Impact Analysis

5 functions changed, 9 callers affected across 7 files

  • handle_external_function in crates/codegraph-core/src/extractors/gleam.rs:55 (1 transitive callers)
  • extracts_constant in crates/codegraph-core/src/extractors/gleam.rs:437 (0 transitive callers)
  • setupPipeline in src/domain/graph/builder/pipeline.ts:168 (6 transitive callers)
  • handleExternalFunction in src/extractors/gleam.ts:84 (2 transitive callers)
  • BuildGraphOpts.exclude in src/types.ts:1072 (0 transitive callers)

Greptile flagged that the existing opts.exclude tests always wipe the DB
before building, so they only exercise the fresh-build path. The scenario
where opts.exclude changes between builds against the same DB (files
previously indexed must be detected as removals on the second pass) was
untested.

Add wasm + native parity tests that build the fixture twice against the
same DB — first with no exclude, then with exclude — and assert the
second build drops the previously-indexed test/spec files from
file_hashes. Locks in the collect → detect-removal contract that the
benchmark scripts depend on once their exclude list changes mid-life.

The fixture-exclude in this PR shifts the denominator of every per-file
build metric: the 3.10.0 baseline was measured over ~745 files (codegraph
source + resolution-benchmark fixtures), while dev now measures the ~607
source files alone. DB content is dominated by src/, so absolute bytes
stay roughly constant while the file count drops — inflating
dbSizeBytes/file from 41614 to ~52211 (+25%, exactly at the threshold).

This is a one-time methodology shift, not a real regression in the schema
or extraction layer (which is why the old 3.10.0:Full build exemption
could be dropped — Full build absolute timing actually improved with the
exclude). Add a one-release exemption with a detailed comment so the
distinction is visible to future maintainers; remove once 3.11.0+ data
is captured under the post-#1134 methodology.
@carlos-alm
Contributor Author

CI follow-up addressed in e2346e0:

The regression-guard caught DB bytes/file: 41614 → 52211 (+25%, threshold 25%) after this PR landed locally. Root cause: excluding tests/benchmarks/resolution/fixtures/** shifts the denominator of every per-file build metric — the 3.10.0 baseline measured ~745 files (source + fixtures), dev now measures ~607 source files alone. DB content is dominated by src/, so absolute bytes stay roughly constant while the file count drops, inflating the per-file ratio.

This is a one-time methodology shift, not a real regression. Added a single-release 3.10.0:DB bytes/file exemption to KNOWN_REGRESSIONS with a detailed comment distinguishing it from the dropped 3.10.0:Full build entry (Full build absolute timing actually improved — that one was correctly removed; per-file bytes is a measurement-scope artifact). Exemption will fall off naturally once 3.11.0 baseline data is captured under the post-#1134 methodology.
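Quick arithmetic, using the file counts from this thread and assuming total DB bytes stay roughly constant, shows most of the jump is the denominator:

```typescript
// Baseline: ~745 files at 41614 DB bytes/file.
const baselineFiles = 745;
const baselineBytesPerFile = 41614;
const totalBytes = baselineFiles * baselineBytesPerFile; // ~31.0 MB, assumed constant

// Dev: roughly the same bytes spread over ~607 files after the fixture exclude.
const devFiles = 607;
const devBytesPerFile = totalBytes / devFiles;

// 745/607 is about 1.23, so the corpus shrink alone inflates per-file bytes
// by roughly 23% -- in the same ballpark as the observed 41614 -> 52211 (+25%),
// with the remainder coming from DB bytes not being perfectly constant.
console.log(Math.round(devBytesPerFile));
```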

The KNOWN_REGRESSIONS "entries are not stale" test still passes (3.10.0 is within the 1-minor-version window of the current 3.10.0 package version).

@carlos-alm
Contributor Author

@greptileai


Development

Successfully merging this pull request may close these issues.

perf: exclude verilog benchmark fixtures from incremental benchmark sweep

1 participant