
perf: byte-level path handling on Unix to bypass Components iterator#245

Draft
stormslowly wants to merge 1 commit into main from perf/ascii-fast-path


stormslowly (Collaborator) commented May 26, 2026

Why

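std::path::Components runs a full state machine on every iteration: Windows-prefix detection, Component enum construction, and double-ended-iterator bookkeeping. On Unix the resolver needs none of that. An OsStr is raw bytes, and the separator / and the special segments . and .. consist only of single-byte ASCII characters. Because every byte of a multi-byte UTF-8 character is 0x80 or above, a 0x2F (/) or 0x2E (.) byte can never occur inside a non-ASCII character in a segment.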
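A minimal sketch of the byte-level property this relies on, assuming a Unix target. The helper byte_segments is hypothetical and not code from this PR; it only shows that splitting an OsStr's bytes on b'/' is safe even when segments contain non-ASCII UTF-8. Unlike Components, it keeps interior "." segments, which a normalizer would then drop.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: as_bytes / from_bytes

/// Hypothetical helper (not from the PR): split a Unix path into its
/// non-empty segments by scanning raw bytes for the separator b'/'.
fn byte_segments(path: &OsStr) -> Vec<&OsStr> {
    path.as_bytes()
        .split(|&b| b == b'/')
        .filter(|seg| !seg.is_empty()) // skip empties from a leading '/' or "//"
        .map(OsStr::from_bytes)
        .collect()
}

fn main() {
    // Every byte of a multi-byte UTF-8 character is >= 0x80, so it can never
    // be confused with b'/' (0x2F) or b'.' (0x2E).
    assert!("é日本".bytes().all(|b| b >= 0x80));

    let segs = byte_segments(OsStr::new("/srv/日本/./pkg"));
    // Interior "." is kept here; Components would normalize it away.
    assert_eq!(
        segs,
        [OsStr::new("srv"), OsStr::new("日本"), OsStr::new("."), OsStr::new("pkg")]
    );
    println!("{:?}", segs);
}
```

The split never allocates per segment: each item borrows directly from the input bytes.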

What

  • src/path.rs: unix_normalize and unix_normalize_with replace the Components-based PathUtil::normalize / normalize_with on Unix. Non-Unix builds keep the existing implementation.
  • src/cache.rs: a byte-level parent_path replaces Path::parent in Cache::value (the recursive parent walk that accounted for most of the parse_next_component_back cost). realpath_uncached uses a new join_last_segment instead of Path::strip_prefix + normalize_with, because when parent.path is produced by parent_path it is already a strict byte prefix of self.path.
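To make the src/path.rs change concrete, here is a minimal sketch of a byte-level lexical normalizer for Unix. The name byte_normalize and its exact semantics are assumptions, not the PR's unix_normalize: empty and "." segments are dropped, and ".." removes the previous segment if there is one and is otherwise discarded, which is what a Components loop that calls PathBuf::pop on ".." produces. It never touches the filesystem, so symlinks are not resolved, and normalize_with is not modeled.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};
use std::path::{Path, PathBuf};

/// Hypothetical byte-level lexical normalizer for Unix (not the PR's code).
/// Works on raw bytes: the separator is always the single byte b'/'.
fn byte_normalize(path: &Path) -> PathBuf {
    let bytes = path.as_os_str().as_bytes();
    let absolute = bytes.first() == Some(&b'/');

    // Retained segments, borrowed directly from the input (no per-segment allocation).
    let mut segs: Vec<&[u8]> = Vec::new();
    for seg in bytes.split(|&b| b == b'/') {
        match seg {
            b"" | b"." => {} // empty (from "//" or a trailing '/') or current dir
            b".." => {
                segs.pop(); // parent dir: drop the previous segment, if any
            }
            _ => segs.push(seg),
        }
    }

    // Reassemble into one buffer; the output is never longer than the input.
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    if absolute {
        out.push(b'/');
    }
    for (i, seg) in segs.iter().enumerate() {
        if i > 0 {
            out.push(b'/');
        }
        out.extend_from_slice(seg);
    }
    PathBuf::from(OsString::from_vec(out))
}

fn main() {
    let p = byte_normalize(Path::new("/srv/./app/../日本//pkg/"));
    assert_eq!(p.as_os_str(), OsStr::new("/srv/日本/pkg"));
    println!("{}", p.display());
}
```

The whole pass is a single forward scan plus one output buffer, with no Component values constructed along the way.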

Scope change

An earlier revision of this PR also rewrote require_without_parse to dispatch on the specifier's first byte via a hand-rolled classify_specifier_head table. That hunk was extracted to #246, which was then closed: the maintenance cost of keeping a parallel implementation of Path::components in sync with std outweighed the small isolated win.

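The previous PR body cited a −7.28% change in Ir (instructions executed, as counted by callgrind) on resolver[single-thread]; that number included the now-dropped opt3 (the require_without_parse rewrite). A fresh callgrind diff for the two remaining changes (opt1 + opt2) alone is pending and will be posted before this PR is taken out of draft.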

Test plan

  • cargo test --lib -- --skip 'tests::pnp::' — 138 / 138 pass (1 environment-only symlink fixture failure also present on main)
  • cargo clippy --all-features -- -D warnings — clean
  • Fresh callgrind diff on resolver[single-thread] for opt1+opt2 alone
  • CI bench job to confirm the same delta in the codspeed simulator
  • Windows CI for the #[cfg(not(unix))] branches

codspeed-hq (Bot) commented May 26, 2026

Merging this PR will improve performance by 5.13%

⚡ 5 improved benchmarks
✅ 5 untouched benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Memory | resolver[pnp resolve] | 8.4 KB | 8.2 KB | +3.1% |
| Simulation | resolver[resolve from symlinks] | 146.4 ms | 139.1 ms | +5.21% |
| Simulation | resolver[single-thread] | 50.3 ms | 46.1 ms | +9.24% |
| Simulation | resolver[[multi-threaded]resolve] | 57.9 ms | 55.7 ms | +3.98% |
| Simulation | resolver[[multi-threaded]resolve from symlinks] | 75.6 ms | 72.6 ms | +4.2% |



Comparing perf/ascii-fast-path (8cefe78) with main (080188f)


On Unix, an OsStr is raw bytes and '/', '.' and '..' are made only of
single-byte ASCII characters, so the heavy std::path::Components state
machine (Windows-prefix detection, Component enum construction,
double-ended-iter bookkeeping) is unnecessary for the resolver's hot paths.

Changes:
- src/path.rs: unix_normalize and unix_normalize_with replace the
  Components-based PathUtil::normalize / normalize_with on Unix.
- src/cache.rs: byte-level parent_path replaces Path::parent in
  Cache::value; join_last_segment replaces Path::strip_prefix +
  normalize_with in realpath_uncached, since parent.path is already a
  strict byte prefix of self.path.
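The src/cache.rs idea can be sketched the same way. Both functions below are hypothetical (parent_bytes and join_last_segment_sketch are illustrative names; the PR's real parent_path and join_last_segment signatures are not shown here), and they assume the input path is already absolute and normalized. Because the parent is found by slicing, it is by construction a strict byte prefix of the child, so the child's last segment is simply the bytes after that prefix and no strip_prefix or re-normalization is needed.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};
use std::path::{Path, PathBuf};

/// Hypothetical byte-level parent lookup. Assumes `path` is absolute and
/// normalized (no "." or ".." segments, no trailing or doubled '/').
fn parent_bytes(path: &Path) -> Option<&Path> {
    let bytes = path.as_os_str().as_bytes();
    // Everything before the last separator is the parent.
    let idx = bytes.iter().rposition(|&b| b == b'/')?;
    if idx == 0 {
        // "/foo" -> "/", while "/" itself has no parent.
        return if bytes.len() > 1 { Some(Path::new("/")) } else { None };
    }
    Some(Path::new(OsStr::from_bytes(&bytes[..idx])))
}

/// Hypothetical: append the last segment of `child` (whose byte-level parent
/// is `parent`) to `resolved_parent`, e.g. the parent's canonical path.
fn join_last_segment_sketch(resolved_parent: &Path, parent: &Path, child: &Path) -> PathBuf {
    let p = parent.as_os_str().as_bytes();
    let c = child.as_os_str().as_bytes();
    debug_assert!(c.len() > p.len() && c.starts_with(p));
    // Drop the separator after the parent (absent when the parent is "/").
    let rest = &c[p.len()..];
    let tail = rest.strip_prefix(b"/").unwrap_or(rest);
    let mut out = resolved_parent.as_os_str().as_bytes().to_vec();
    if out.last() != Some(&b'/') {
        out.push(b'/');
    }
    out.extend_from_slice(tail);
    PathBuf::from(OsString::from_vec(out))
}

fn main() {
    let child = Path::new("/srv/app/node_modules/pkg");
    let parent = parent_bytes(child).unwrap();
    assert_eq!(parent.as_os_str(), OsStr::new("/srv/app/node_modules"));

    // Pretend the parent directory canonicalized to a different location.
    let real = join_last_segment_sketch(Path::new("/store/node_modules"), parent, child);
    assert_eq!(real.as_os_str(), OsStr::new("/store/node_modules/pkg"));
    println!("{}", real.display());
}
```

Both operations are a reverse byte scan or a slice plus one copy, in place of Path::parent's next_back on Components and the strip_prefix + normalize_with round trip.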

Note: an earlier draft of this PR also rewrote require_without_parse to
dispatch on the specifier head via a byte table. That hunk was dropped
(see #246) because the maintenance cost of keeping a parallel impl of
Path::components in sync with std outweighed the small isolated win.
The remaining changes here cover the dominant Ir cost paths.

138/138 non-PNP unit tests pass. The 6 PNP tests already fail on main
without these changes.
stormslowly force-pushed the perf/ascii-fast-path branch from bc97d1c to 8cefe78 on May 28, 2026 06:09