perf: use ASCII bitsets for strip chars#789
Draft
He-Pin wants to merge 1 commit intodatabricks:masterfrom
Draft
Conversation
Port the useful part of He-Pin/sjsonnet jit commit dd90b11 onto current master. The new path avoids Set[Int] allocation for ASCII-only stripChars, lstripChars, and rstripChars calls, preserves the existing codepoint-aware fallback for non-ASCII strip sets, and uses platform-specific masks selected from local measurements. Upstream-source: dd90b11
9004941 to
7f64a1d
Compare
Contributor
Author
|
Closing as low priority after rebase: current master already has the main strip-char fast path, and this PR shows no stable standout delta on the go_suite strip docs cases or Native hyperfine. |
Contributor
Author
|
Reopened. This is not negative; keep it as draft as a small GC-friendly strip delimiter cleanup. Current docs strip benchmarks do not show standout gains, so it should not be moved to ready yet. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation:
std.stripChars,std.lstripChars, andstd.rstripCharsnow already avoid boxedSet[Int]for single/BMP delimiter sets onmaster. This PR keeps that Unicode-aware path, and narrows the ASCII-only multi-character delimiter case further by replacing the per-calljava.util.BitSetallocation with primitive masks.Key Design:
Intmasks; Scala Native uses twoLongmasks.BitSetfast path.charsis forced beforestr, preserving existing error evaluation order.Benchmark Results:
Rebased onto
upstream/masterat8b67cb1e(Add workaround to make CI more reliable (#824)).JMH full regression command:
Full JMH ran all 45 regression cases on
masterand this PR. There is no stable standout runtime delta after rebase. The strip-char cases are within normal local JMH noise:lstripChars.jsonnet:0.111 -> 0.112 ms/oprstripChars.jsonnet:0.113 -> 0.114 ms/opstripChars.jsonnet:0.111 -> 0.110 ms/opNative hyperfine command shape:
Native CLI also has no standout delta vs current
master:lstripChars.jsonnet:3.81 -> 3.80 ms; jrsonnet2.00 msrstripChars.jsonnet:3.74 -> 3.81 ms; jrsonnet2.02 msstripChars.jsonnet:3.82 -> 3.87 ms; jrsonnet2.13 msAnalysis:
The earlier positive JMH signal disappeared after rebasing because current
masteralready has the BMPBitSetstrip fast path. This PR should be treated as a small GC-friendly allocation cleanup for ASCII delimiter sets, not as a visible improvement on the go_suite strip benchmarks or the jrsonnet gap.Validation:
./mill -i __.checkFormat: passgit diff --check: pass./mill -i 'sjsonnet.jvm[3.3.7].test': pass./mill -i 'sjsonnet.js[3.3.7].test': pass./mill -i 'sjsonnet.native[3.3.7].test': pass./mill -i bench.runRegressions: pass onmasterand this PRhyperfine --shell=none --warmup 5 --runs 50: pass