Skip to content

perf: optimize Specifier::parse#241

Open
stormslowly wants to merge 5 commits into
mainfrom
bench/specifier-parse
Open

perf: optimize Specifier::parse#241
stormslowly wants to merge 5 commits into
mainfrom
bench/specifier-parse

Conversation

@stormslowly
Copy link
Copy Markdown
Collaborator

@stormslowly stormslowly commented May 25, 2026

Why

Adds CodSpeed coverage for Specifier::parse / parse_query_framgment so an upcoming optimization PR has baseline numbers to compare against. The parser sits on the hot resolve path but has no dedicated benchmark today — improvements (or regressions) get lost in the noise of the full-resolve benches.

What

Four bench groups under the existing resolver harness:

  • specifier/branches — one input per branch in the parser (no qf / query-only / fragment-only / both / multi-? / escaped \0# / multi-escape / leading /,# / bare module / ?-inside-fragment).
  • specifier/length — 4 shapes × 4 sizes (8 / 64 / 256 / 1536 bytes).
  • specifier/escapes — 1 / 4 / 16 / 64 escapes, isolating the slow path.
  • specifier/realistic — vue-loader, css-modules, asset query, fragment-only, full inline-loader chain — shapes the parser actually sees in production resolve flows.

Specifier is re-exported as __BenchSpecifier (#[doc(hidden)]) behind a __internal_bench feature gate, so the type stays out of the public API for normal consumers. The [[bench]] resolver entry declares the feature as required, and benchmark.yml passes --features __internal_bench to cargo codspeed build.

Specifier::parse gets a # Errors doc section because the re-export (when the feature is enabled, e.g. under --all-features clippy) promotes it into clippy's pub-API view.

Run locally with cargo bench --bench resolver --features __internal_bench -- specifier/.

Copilot AI review requested due to automatic review settings May 25, 2026 16:55
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee5977647b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread benches/resolver.rs Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds focused Criterion/CodSpeed micro-benchmarks for Specifier::parse so upcoming parser optimizations/regressions can be measured without noise from full resolver benchmarks.

Changes:

  • Add four new specifier/* benchmark groups (branches, length sweep, escape scaling, realistic inputs) to the existing benches/resolver.rs harness.
  • Re-export Specifier as __BenchSpecifier (hidden from docs) so benches can access the otherwise-private specifier module.
  • Add an # Errors section to Specifier::parse docs to satisfy clippy when exposed.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/specifier.rs Adds # Errors doc section to Specifier::parse now that it’s exposed for benches.
src/lib.rs Adds a hidden public re-export (__BenchSpecifier) used by benchmarks.
benches/resolver.rs Implements the new specifier/* micro-benchmark groups under the resolver benchmark harness.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/lib.rs Outdated
Comment thread benches/resolver.rs Outdated
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 25, 2026

Merging this PR will not alter performance

✅ 12 untouched benchmarks
🆕 80 new benchmarks

Performance Changes

Mode Benchmark BASE HEAD Efficiency
🆕 Memory specifier/realistic[rw/css-modules] N/A 0 B N/A
🆕 Memory frag-tail[len_1536] N/A 0 B N/A
🆕 Memory path-only[len_64] N/A 0 B N/A
🆕 Memory specifier/branches[bare-module] N/A 0 B N/A
🆕 Memory path-only[len_256] N/A 0 B N/A
🆕 Memory frag-tail[len_256] N/A 0 B N/A
🆕 Memory specifier/branches[query/medium] N/A 0 B N/A
🆕 Memory path-only[len_8] N/A 0 B N/A
🆕 Memory specifier/branches[fragment/short] N/A 0 B N/A
🆕 Memory query-tail[len_8] N/A 0 B N/A
🆕 Memory specifier/branches[multi-question] N/A 0 B N/A
🆕 Memory specifier/branches[query+fragment/short] N/A 0 B N/A
🆕 Memory specifier/realistic[rw/asset-query] N/A 0 B N/A
🆕 Memory specifier/escapes[escapes_64] N/A 2 KB N/A
🆕 Memory query-tail[len_64] N/A 0 B N/A
🆕 Memory specifier/branches[query+fragment/medium] N/A 0 B N/A
🆕 Memory query-tail[len_256] N/A 0 B N/A
🆕 Memory specifier/branches[none/medium] N/A 0 B N/A
🆕 Memory specifier/branches[escape/many] N/A 80 B N/A
🆕 Memory specifier/escapes[escapes_1] N/A 56 B N/A
... ... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing bench/specifier-parse (bccabb2) with main (2785393)

Open in CodSpeed

@stormslowly stormslowly changed the title bench: add Specifier::parse micro-benchmarks perf(specifier): parse request as bytes (~32% Ir on parse_query_framgment) May 25, 2026
@stormslowly stormslowly force-pushed the bench/specifier-parse branch from d89c8d9 to 5d8129d Compare May 25, 2026 18:01
@stormslowly stormslowly changed the title perf(specifier): parse request as bytes (~32% Ir on parse_query_framgment) bench: add Specifier::parse micro-benchmarks May 25, 2026
Adds four bench groups to the resolver bench harness so CodSpeed can
track the cost of `parse_query_framgment` across all code paths:

- specifier/branches: one input per branch (no qf / query-only /
  fragment-only / both / multi-?, escaped \0#, multi-escape, leading
  /,#, bare module, ?-inside-fragment).
- specifier/length: 4 shapes x 4 sizes (8 / 64 / 256 / 1536 bytes) to
  show how each branch scales with input length.
- specifier/escapes: 1/4/16/64 \0# escapes, isolating the O(n*k) slow
  path so improvements are easy to spot.
- specifier/realistic: vue-loader, css-modules, asset query,
  fragment-only, inline-loader chain - the shapes the parser actually
  sees in production resolve flows.

`Specifier` is re-exported as `__BenchSpecifier` (`#[doc(hidden)]`)
behind a `__internal_bench` feature gate so the type stays out of the
public API for normal consumers; the `[[bench]] resolver` entry
declares the feature as required and `benchmark.yml` passes
`--features __internal_bench` to `cargo codspeed build`.

`Specifier::parse` gets a `# Errors` doc section because the re-export
(under `--all-features` clippy) promotes it into the pub-API view.
@stormslowly stormslowly force-pushed the bench/specifier-parse branch from 5d8129d to 4b07e03 Compare May 25, 2026 18:22
`parse_query_framgment` only inspects `?`, `#`, and `\0` - all
single-byte ASCII (< 0x80). Their byte positions coincide with their
char positions in any UTF-8 input, so iterating `as_bytes()` is
equivalent and skips the per-step UTF-8 decode that `char_indices()`
performed.

The escape-cleanup path also changes from
`chars().enumerate().filter_map(|(i, c)| !escaped_indexes.contains(&i))`
(O(n*k)) to a single linear scan that copies the slices around each
`\0` byte (O(n)). The slice indices land on char boundaries because
`\0` is ASCII.

Local valgrind callgrind on a deterministic 4M-call workload
(lto=off, debug=line-tables-only) covering every branch:
  baseline:  3,950,513,105 Ir (~988 Ir/call)
  candidate: 2,684,886,778 Ir (~671 Ir/call)
  delta:     -32.0%

All existing specifier tests pass.
…ches

Two PR review fixes for `bench_specifier_escape_scaling`:

- `escapes_N` cases were producing `N+1` separators instead of `N`
  (extra "real#hash" suffix pushed the join arity off by one), so the
  benchmark axis didn't match the labels. Pad with `n - 1` filler
  segments so `parts.len() == n + 1` and `join("\0#")` yields exactly
  `n` escape markers.
- Drop `iter_batched(|| s.as_str(), ...)`: the setup returns the same
  `&str` every iteration so it adds per-iter overhead without doing
  any actual setup work. Use plain `b.iter` like the other groups.
@stormslowly
Copy link
Copy Markdown
Collaborator Author

@codspeedbot fix this regression

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 26, 2026

@codspeedbot fix this regression

The regression has been fixed. Here's a summary of what was done:

Root cause: The perf(specifier): scan request as bytes instead of UTF-8 chars commit (542c8fa) caused a 9.47% regression on the resolver[resolve from symlinks multi thread] benchmark (82.4 ms → 91.1 ms). While the byte-scanning optimization improved specifier parsing in isolation (~32% faster), it altered compilation/code-layout in a way that degraded the multi-threaded resolve path — all single-threaded benchmarks remained unaffected.

Fix: Reverted the byte-scanning optimization from src/specifier.rs, restoring the original char_indices() implementation. All benchmark additions, the __internal_bench feature gate, and CI configuration are preserved.

Why this approach: The PR's stated goal is to establish benchmark baselines for Specifier::parse. The optimization was bundled in but belongs in a dedicated follow-up PR where it can be properly evaluated against these baselines. Separating the two concerns makes the benchmark PR clean and regression-free.

Summary
  • Warming up the performance wizard...
  • Investigating regression — Changes pushed and PR updated successfully
  • Handling comment — Comment handling finished successfully

@stormslowly stormslowly changed the title bench: add Specifier::parse micro-benchmarks perf: optimize Specifier::parse May 26, 2026
@stormslowly stormslowly requested a review from hardfist May 26, 2026 03:50
Reverts the specifier parsing change from commit 542c8fa that switched
parse_query_framgment from char_indices() to byte scanning. While the
optimization itself was sound, it caused a ~9.5% regression on the
resolver[resolve from symlinks multi thread] CodSpeed benchmark
due to downstream compilation/code-layout changes affecting the
multi-threaded resolve path.

The optimization should be resubmitted in a dedicated PR so it can be
evaluated against the baseline numbers this benchmark PR establishes.
@codspeed-hq codspeed-hq Bot force-pushed the bench/specifier-parse branch from fe9e75c to bccabb2 Compare May 26, 2026 03:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants