
Add lazy array views for large stdlib arrays#814

Merged
stephenamar-db merged 4 commits into databricks:master from He-Pin:perf/lazy-array-architecture
May 4, 2026

Conversation

@He-Pin
Contributor

@He-Pin He-Pin commented Apr 30, 2026

Summary

This PR adds array-level lazy views for large std.map, std.mapWithIndex, and non-constant std.makeArray results.

The old representation eagerly allocated one LazyApply1/LazyApply2 thunk per element. For large arrays this wastes allocations even when callers only force a small prefix or suffix. The new representation keeps the callback and evaluation context once per array, computes elements on demand, and caches computed values.

Important JVM/JIT/GC choices:

  • Keep the old flat Array[LazyApply*] representation below LAZY_VIEW_THRESHOLD = 4096, where the JVM is usually better off with the simple monomorphic layout.
  • Materialize back to Array[Eval] from asLazyArray so existing full-array stdlib consumers remain stack-safe and predictable.
  • std.reverse(lazyView) materializes before reversing so the original and reversed arrays share the same element thunks; this preserves trace/native callback semantics and avoids duplicate evaluation.
  • Release captured evaluator/function state after full materialization.
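The per-array shape described above can be sketched roughly as follows. This is a hedged illustration, not sjsonnet's actual code: `LazyViewArr`, its fields, and the threshold handling are invented names, and elements are assumed non-null.

```scala
// Illustrative sketch of an array-level lazy view: the callback is held
// once per array, elements are computed on demand and cached, and the
// captured state is released after full materialization.
// `LazyViewArr` is a hypothetical name, not a real sjsonnet class.
final class LazyViewArr[A <: AnyRef](val length: Int, compute: Int => A) {
  private[this] val cache = new Array[AnyRef](length)
  private[this] var remaining = length
  private[this] var f: Int => A = compute // dropped once fully forced

  def apply(i: Int): A = {
    var v = cache(i)
    if (v == null) { // elements assumed non-null, as with Eval thunks
      v = f(i)
      cache(i) = v
      remaining -= 1
      if (remaining == 0) f = null // release captured evaluator/function state
    }
    v.asInstanceOf[A]
  }

  // Force everything; existing full-array consumers would use this path.
  def materialize(): Array[AnyRef] = {
    var i = 0
    while (i < length) { apply(i); i += 1 }
    cache
  }
}
```

Forcing a single index allocates only that element's value; the cache means repeated forcing of the same index calls the callback once.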

Correctness Coverage

Added directional tests for selective forcing, plus a trace regression test covering the reverse-cache behavior, for all three lazy-view subclasses:

  • MakeArrayArr
  • MappedArr
  • MappedWithIndexArr

Also added bench/resources/sjsonnet_suite/lazy_array_sparse_indexing.jsonnet, a reproducible sparse-indexing workload for the benchmark harness.

JMH

Environment: JMH 1.37, Mill JVM zulu:21, -wi 3 -i 5 -w 1s -r 2s -f 1 -tu ms. Baseline is upstream/master at 3a9a4928; benchmark files introduced by this PR were passed to master by absolute path for the baseline run. This sweep was rerun after the lazy-array hot-path fix in 65840c5c.

| Benchmark path | master | this PR | Result |
| --- | ---: | ---: | ---: |
| `bench/resources/sjsonnet_suite/lazy_array_comprehension.jsonnet` | 27.772 ± 9.548 ms/op | 19.945 ± 2.908 ms/op | 1.39x faster |
| `bench/resources/cpp_suite/realistic2.jsonnet` | 44.530 ± 0.685 ms/op | 39.766 ± 0.661 ms/op | 1.12x faster |
| `bench/resources/go_suite/foldl.jsonnet` | 0.070 ± 0.001 ms/op | 0.072 ± 0.001 ms/op | neutral |
| `bench/resources/sjsonnet_suite/lazy_array_sparse_indexing.jsonnet` | 20.023 ± 4.123 ms/op | 3.488 ± 0.135 ms/op | 5.74x faster |
| `bench/resources/go_suite/reverse.jsonnet` | 6.434 ± 0.224 ms/op | 5.001 ± 0.150 ms/op | 1.29x faster |
| `bench/resources/go_suite/base64_byte_array.jsonnet` | 0.749 ± 0.037 ms/op | 0.748 ± 0.042 ms/op | neutral |

Earlier GC profiler on the directional lazy-view benchmark:

| Benchmark path | master alloc | this PR alloc | Result |
| --- | ---: | ---: | ---: |
| `sjsonnet/test/resources/new_test_suite/lazy_array_views.jsonnet` | 12,918,912.631 B/op | 1,728,840.113 B/op | 86.6% less allocation |

Hyperfine vs jrsonnet

This is a Scala Native CLI comparison against the local jrsonnet binary at measurement time: jrsonnet 0.5.0-pre98, commit `80cd36a` ("docs: abandon wide logo version"). It is CLI wall-clock evidence, not JVM/JIT/GC evidence; JVM claims should be read from the JMH section above.

Command template:

```
hyperfine -N --warmup 20 --runs 50 \
  -n 'sjsonnet native' \
  'out/sjsonnet/native/3.3.7/nativeWorkdir.dest/native/sjsonnet.SjsonnetMain <benchmark.jsonnet>' \
  -n 'jrsonnet' \
  '/Users/hepin/IdeaProjects/sjsonnet/jrsonnet/target/release/jrsonnet <benchmark.jsonnet>'
```

| Benchmark path | sjsonnet native | jrsonnet | Result |
| --- | ---: | ---: | ---: |
| `bench/resources/sjsonnet_suite/lazy_array_comprehension.jsonnet` | 73.1 ± 1.1 ms | 125.1 ± 0.8 ms | 1.71x faster |
| `bench/resources/cpp_suite/realistic2.jsonnet` | 80.5 ± 0.9 ms | 93.9 ± 0.8 ms | 1.17x faster |
| `bench/resources/go_suite/foldl.jsonnet` | 3.3 ± 0.1 ms | 4.2 ± 0.1 ms | 1.25x faster |
| `bench/resources/sjsonnet_suite/lazy_array_sparse_indexing.jsonnet` | 14.0 ± 0.5 ms | 24.6 ± 0.7 ms | 1.76x faster |
| `bench/resources/go_suite/reverse.jsonnet` | 12.5 ± 0.5 ms | 18.7 ± 0.2 ms | 1.49x faster |
| `bench/resources/go_suite/base64_byte_array.jsonnet` | 5.8 ± 0.1 ms | 14.1 ± 0.2 ms | 2.42x faster |

Validation

  • ./mill -i 'sjsonnet.jvm[3.3.7].test'
  • ./mill -i '__.checkFormat'
  • ./mill -i 'sjsonnet.js[3.3.7].compile'
  • ./mill -i 'sjsonnet.native[3.3.7].nativeLink'
  • git diff --check
  • JMH regression sweep on this PR and upstream/master: lazy_array_comprehension, realistic2, foldl, lazy_array_sparse_indexing, reverse, base64_byte_array
  • Hyperfine: sjsonnet native vs local jrsonnet on the same 6 regression workloads

@He-Pin He-Pin force-pushed the perf/lazy-array-architecture branch from 6569dd0 to be42d80 Compare April 30, 2026 16:56
@He-Pin He-Pin marked this pull request as draft April 30, 2026 17:43
@He-Pin He-Pin marked this pull request as ready for review April 30, 2026 19:36
@He-Pin
Contributor Author

He-Pin commented Apr 30, 2026

I think a SliceArr and a RepeatedArr can be added in a later PR once this one is merged, to further reduce allocations.

@stephenamar-db
Collaborator

please rebase

He-Pin and others added 4 commits May 3, 2026 01:10
Motivation:
Reduce per-element thunk allocations and add cheap array paths to improve performance on hot lazy-array workloads.

Modification:
- Replace per-element LazyApply materialization with array-level lazy views
- Add ReversedLazyViewArr and index-based evaluator/comprehension iteration
- Add RangeArr/ByteArr fast-paths for sum/avg
- Update stdlib call-sites to use arr.eval(i)/arr.value(i)
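The RangeArr/ByteArr fast-path idea for sum/avg amounts to closed-form arithmetic instead of forcing each element. A minimal sketch, assuming a standalone `RangeArr(start, endExclusive)` shape rather than the actual class inside sjsonnet's value hierarchy:

```scala
// Hypothetical range fast-path: sum and average in O(1) via the
// arithmetic-series formula, never materializing the elements.
final case class RangeArr(start: Long, endExclusive: Long) {
  def length: Long = math.max(0L, endExclusive - start)

  // Sum of start, start+1, ..., endExclusive-1. The product
  // n * (first + last) is always even, so integer division is exact.
  def sum: Long = {
    val n = length
    if (n == 0) 0L else n * (start + (endExclusive - 1)) / 2
  }

  def avg: Double = if (length == 0) 0.0 else sum.toDouble / length
}
```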

Result:
Significant wins on reverse-sparse and range_sum_avg benchmarks; minor, acceptable regressions on some composite workloads.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@He-Pin He-Pin force-pushed the perf/lazy-array-architecture branch from 65840c5 to a7574ce Compare May 2, 2026 17:31
@stephenamar-db stephenamar-db merged commit b4c667d into databricks:master May 4, 2026
5 checks passed
@He-Pin He-Pin mentioned this pull request May 5, 2026
stephenamar-db pushed a commit that referenced this pull request May 7, 2026
Rebased onto current `master` at `b4c667d5` (`Add lazy array views for large stdlib arrays (#814)`).

## Motivation

`error value` evaluates `value` immediately before throwing. That value
is a tail position for auto-TCO and explicit `tailstrict`, but resolving
a `TailCall` inside the `error` expression would re-enter the trampoline
from the current function body and can bring back recursive stack growth
for deep `error f(n - 1)` chains.

`error value` also must not be treated as an unconditional non-recursive
exit. A function like `local f() = error f()` has no base case;
auto-TCOing it would turn the existing max-stack diagnostic into an
infinite trampoline loop.

## Implementation

- Evaluate tail-position `error value` with tail-call support and attach a delayed error-value continuation to the returned `TailCall`.
- Store delayed result continuations directly on `TailCall` as primitive/ref fields: optional `&&`/`||` validation plus optional `error value` throw.
- Avoid linked continuation nodes, wrapper values, and per-wrapper check objects on deep recursive paths.
- Keep the trampoline hot path simple: `TailCall.resolve` carries pending checks in local parameters and returns immediately when no final check is pending.
- Preserve continuation order:
  - inner `error value` preempts any outer boolean check or outer error throw
  - inner boolean validation runs before an outer error throw
  - redundant outer boolean validations collapse after the innermost boolean check
- Update auto-TCO exit analysis so `Expr.Error` recursively inspects its value expression instead of always counting as a non-recursive exit.
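The trampoline shape this describes can be sketched as follows. Every name here (`Result`, `Done`, `TailCall.resolve`'s exact logic) is illustrative, not sjsonnet's real implementation, and the continuation-ordering rules are simplified to the two cases the sketch models:

```scala
// Illustrative trampoline: delayed checks are plain mutable fields on the
// tail-call object, so deep tail-recursive chains allocate no linked
// continuation nodes. Hypothetical names, not sjsonnet's actual classes.
sealed trait Result
final case class Done(value: Any) extends Result
final class TailCall(val step: () => Result) extends Result {
  var requireBoolean: Boolean = false // delayed `&&`/`||` result validation
  var pendingErrorPos: String = null  // delayed `error value` throw site, if any
}

def resolve(first: Result): Any = {
  var cur = first
  // Pending checks ride in locals; nothing is allocated per step.
  var needBool = false
  var errorPos: String = null
  while (true) {
    cur match {
      case Done(v) =>
        if (errorPos != null) throw new RuntimeException(s"error at $errorPos: $v")
        if (needBool && !v.isInstanceOf[Boolean])
          throw new RuntimeException("condition must evaluate to a boolean")
        return v
      case tc: TailCall =>
        // An inner delayed error preempts outer pending checks; redundant
        // boolean validations collapse into a single flag.
        if (tc.pendingErrorPos != null) { errorPos = tc.pendingErrorPos; needBool = false }
        needBool = needBool || tc.requireBoolean
        cur = tc.step()
    }
  }
  throw new IllegalStateException("unreachable")
}
```

Because `resolve` loops instead of recursing, a 100000-deep tail-call chain runs in constant stack.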

## Performance notes

- No allocation is introduced for delayed boolean/error continuations;
the only state is stored on the existing `TailCall`.
- The resolve loop is tail-recursive, monomorphic in the common path,
and uses null/reference checks rather than allocating small continuation
objects.
- Final result validation is skipped entirely for tail calls without
delayed checks.

## Tests

New golden tests cover:

- auto-TCO and explicit `tailstrict` through `error value`
- lazy unused arguments through a 10000-deep recursive `error f(n - 1,
unused)` chain
- boolean-check-before-outer-error ordering
- inner-error-before-outer-bool ordering
- nested boolean continuation ordering
- a non-tail lhs negative case
- an `error value` expression whose non-recursive exit is inside the
error value
- a no-exit `error f()` case that must keep the max-stack diagnostic

## Verification

- `./mill --no-server 'sjsonnet.jvm[3.3.7].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.13.18].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.12.21].compile'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.FileTests sjsonnet.TailCallOptimizationTests`
- `./mill --no-server 'sjsonnet.js[2.13.18].compile' 'sjsonnet.wasm[2.13.18].compile'`
- `./mill --no-server 'sjsonnet.native[2.13.18].compile'`
- `./mill --no-server '__.checkFormat'`
- `./mill --no-server '_.jvm[_].__.test'`
- `./mill --no-server '_.js[_].__.test'`
- `./mill --no-server '_.wasm[_].__.test'`
- `./mill --no-server '_.native[_].__.test'`
- `git diff --check`

## Rebase note

The previous CI failures were caused by the stale base:
`EncodingModule.scala` had an unused `UTF_8` import on that revision.
Current `master` uses that import, so rebasing fixes those compile
failures.
stephenamar-db pushed a commit that referenced this pull request May 7, 2026
## Motivation

After #814, sjsonnet has lazy array views for large stdlib arrays, but
array slicing and `std.remove` / `std.removeAt` still had paths that
eagerly allocated new `Array[Eval]` backing arrays.

This PR adds a focused slice view so large slices avoid copying unless
fully materialized, while keeping JVM/Scala Native behavior
conservative.

Constraints:

- keep Jsonnet indexed laziness: `length`, `eval(i)`, and `value(i)`
stay O(1)
- do not force elements while slicing
- keep small slices eager to avoid retaining large sources
- prevent deep concat trees
- keep hot paths JIT/GC friendly
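An index-translating view satisfies the first three constraints. This is a hedged sketch with a made-up name (`SliceView`), not the real `SliceArr`:

```scala
// Hypothetical slice view: O(1) length and element access over a source
// addressed by index, without copying a backing array or forcing elements.
final class SliceView[A](source: Int => A, val length: Int, offset: Int) {

  def apply(i: Int): A = {
    require(i >= 0 && i < length, "index out of bounds")
    source(offset + i) // O(1) index translation; nothing is forced eagerly
  }

  // Re-slicing composes offsets, so a slice of a slice stays one view deep.
  def slice(from: Int, until: Int): SliceView[A] =
    new SliceView(source, math.max(0, math.min(until, length) - from), offset + from)
}
```

The trade-off the PR calls out still applies to this sketch: the view retains `source`, which is why small slices should copy instead.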

## Modification

Add a lazy `SliceArr` view and route `Arr.sliced(...)` through it when
the slice is large enough or the source is already compact/view-backed.

Changed behavior:

- array slicing can return `SliceArr` instead of eagerly copying
`Array[Eval]`
- `std.remove` and `std.removeAt` reuse slice + concat views
- compact sources (`RangeArr`, `ByteArr`, lazy indexed arrays, repeat,
slice) can slice as O(1) views
- flat/reversed/concat arrays only use a view when the slice is large
enough to justify source retention
- concat still caps tree depth by flattening when either side is already
a concat view
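The slice + concat reuse for `std.removeAt` can be sketched like this (illustrative names; the real code routes through sjsonnet's `Arr` views and applies size thresholds):

```scala
// Sketch: removeAt as two O(1) slice views joined by a concat view,
// instead of copying a new backing array. Names are hypothetical.
final class ConcatView[A](left: Int => A, leftLen: Int, right: Int => A, rightLen: Int) {
  val length: Int = leftLen + rightLen
  def apply(i: Int): A =
    if (i < leftLen) left(i) else right(i - leftLen) // O(1) dispatch
}

// Drop index `idx` from a source of `len` elements: [0, idx) ++ [idx+1, len).
def removeAt[A](src: Int => A, len: Int, idx: Int): ConcatView[A] =
  new ConcatView(src, idx, j => src(idx + 1 + j), len - idx - 1)
```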

## Result

Verification passed:

- `./mill --no-server 'sjsonnet.jvm[3.3.7].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.13.18].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.12.21].compile'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'`
- `./mill --no-server bench.checkFormat`
- `./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'`

JMH, full JVM regression sweep, compared with `upstream/master` (lower
is better, notable result only):

- Baseline: `upstream/master` at `b4c667d5`
- PR head: `3bde215b`
- Command: `./mill --no-server bench.runRegressions`
- Full sweep covered 45 regression inputs; non-target movement was
noise-level, so only the targeted slice/remove case is listed.

| Benchmark | Before | After | Result |
| --- | ---: | ---: | ---: |
| `lazy_array_slice_remove` | 5.805 ms/op | 1.088 ms/op | 5.34x faster |

Scala Native hyperfine, full regression-input sweep, compared with
`upstream/master` (lower is better, notable result only):

- Binaries: `./mill --no-server show 'sjsonnet.native[3.3.7].nativeLink'`
- Command shape: `hyperfine --warmup 5 --min-runs 20 -N --output=null ...`
- Full sweep covered the same 45 regression inputs; `bench.07` was run with `ulimit -s 65520` for both sides because the native binary needs a larger process stack for that input.

| Benchmark | Before | After | Result |
| --- | ---: | ---: | ---: |
| `lazy_array_slice_remove` | 13.2 ± 0.4 ms | 5.89 ± 0.32 ms | 2.24x faster |

External performance diff, against jrsonnet built from source at
`80cd36a` with `cargo build --release -p jrsonnet` (`jrsonnet
0.5.0-pre98`):

| Benchmark | sjsonnet Scala Native (#821) | source-built jrsonnet | Result |
| --- | ---: | ---: | --- |
| `lazy_array_slice_remove` | 5.8 ± 0.2 ms | 7.0 ± 0.2 ms | sjsonnet 1.21 ± 0.05x faster |

JIT / GC review:

- `SliceArr` preserves indexed laziness: `eval(i)` returns an `Eval`;
`value(i)` forces only the requested element.
- Materializing a slice releases the source reference, so long-lived
fully-consumed slices do not keep the original array alive.
- Large slices avoid allocating and copying `Array[Eval]`; small slices
still copy to avoid source-retention overhead.
- `std.remove` / `std.removeAt` reuse slice and concat views, avoiding
large intermediate arrays.
- Concat depth remains bounded.

Rollback boundary:

- This PR only changes slice/remove array representation.
- If a retained-source workload regresses, the slice threshold is the
rollback lever without changing the public API.

## References

- Builds on #814 lazy array architecture.
- Follow-up stack: #822 adds shared `Arr.copyEvalTo`; #823 presizes
selected copy consumers.
