Add CodSpeed performance benchmarks for codec decode operations by codspeed-hq[bot] · Pull Request #65 · cornerstonejs/codecs

codspeed-hq · 2026-05-20T00:37:48Z

Summary

This PR sets up continuous performance tracking for the cornerstonejs/codecs repository using CodSpeed. It introduces benchmarks for the pure-JavaScript codec packages and wires them into a GitHub Actions workflow so performance regressions are caught automatically on every pull request.

What changed

Benchmarks (`benchmarks/`)

Added vitest bench benchmarks for the two pure-JS codec packages:

big-endian decode -- byte-swap decoding of big-endian DICOM pixel data
- 16-bit unsigned at 256x256, 512x512, 1024x1024
- 16-bit signed at 512x512
- 8-bit passthrough at 256x256
little-endian decode -- TypedArray construction from raw DICOM pixel data
- 16-bit unsigned at 256x256, 512x512, 1024x1024
- 16-bit signed at 512x512
- 32-bit float at 512x512
- 8-bit passthrough at 256x256

All 11 benchmarks have been verified locally with CodSpeed in simulation mode.

CI workflow (`.github/workflows/codspeed.yml`)

A new GitHub Actions workflow runs the benchmarks on:

pushes to main
pull requests
manual dispatch (for CodSpeed backtesting)

It uses OIDC authentication and CodSpeed's CPU simulation instrument for deterministic, low-variance measurements.

Dependencies

Added vitest and @codspeed/vitest-plugin as root dev dependencies, along with a vitest.config.mjs that enables the CodSpeed plugin.

README

Added a CodSpeed performance badge.

Next steps

WASM codec benchmarks: The WASM-based packages (charls, openjpeg, libjpeg-turbo, openjph) require an emscripten build step before they can be benchmarked. Once the build artifacts are available in CI, benchmarks can be added for the encode/decode operations of those codecs.
Performance checks: Enable performance checks on the repository to automatically flag regressions on pull requests.

codspeed-hq · 2026-05-20T00:39:55Z

Congrats! CodSpeed is installed 🎉

🆕 11 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks

512x512: 74.1 ms
512x512: 78.5 ms
1024x1024: 294.5 ms
256x256: 18.6 ms
256x256: 227.8 µs
512x512: 1.3 ms
1024x1024: 3.3 ms
256x256: 374.2 µs
512x512: 1.3 ms
512x512: 1.7 ms
256x256: 228.9 µs

@v4

Pulled the integration tweaks from the codspeed-hq[bot] wizard PR without taking its synthetic benchmarks (ours are richer — real DICOM fixtures across .50/.51/.57/.70/.80/.81/.90/.91/.201/.5). Changes: - CodSpeedHQ/action@v3 → @v4 - Add OIDC auth: `permissions: id-token: write`, drop the `token: ${{ secrets.CODSPEED_TOKEN }}` arg. The action now authenticates via GitHub's OIDC provider — no secret to manage. - Explicit `mode: simulation` on the CodSpeed step (deterministic CPU simulation, <1% variance, hardware-independent). - Add `workflow_dispatch` trigger so CodSpeed can run a backtest from its dashboard once the repo is connected. - Bump Node 18 → 22 in test + codspeed-bench jobs (matches the wizard's recommendation; our package engines are still >=18 so no impact). - Add CodSpeed badge to README. PR #65 can now be closed — its synthetic JS-only benches are subsumed by ours, and these integration tweaks are now in our branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sedghi · 2026-05-20T00:44:56Z

Superseded by #64. The integration tweaks (OIDC auth, CodSpeedHQ/action@v4, mode: simulation, workflow_dispatch, README badge, Node 22) have been pulled into #64 in 098e16e. The synthetic benchmarks proposed here are subsumed by the richer set in #64, which benchmarks every transfer syntax in the dispatcher (.50/.51/.57/.70/.80/.81/.90/.91/.201/.5) against real DICOM fixtures extracted from cornerstone3D's testImages corpus.

@v4

…omments (#64) * test: migrate to vitest with parallel per-package CI + benchmark PR comments Replaces the unused Jest devDep and the per-package "node test/index.js" benchmarks (which had no assertions) with real Vitest suites, and restructures CircleCI into a parallel per-package pipeline that posts a sticky benchmark comment on every PR. Per-package vitest tests: - charls, libjpeg-turbo-8bit/12bit, openjpeg, openjphjs: decode + lossless round-trip assertions across asm.js / wasm / decode-only variants, guarded by `describe.skipIf(!isBuilt)` so a clean clone without dist/ doesn't fail. - little-endian, big-endian: unit tests covering each bitsAllocated / pixelRepresentation branch and the odd-byteOffset realignment path. - dicom-codec: dispatch tests (hasCodec table, unknown-UID throws, API shape) plus integration tests against four transfer syntaxes; the whole suite skips cleanly when underlying wasm dist/ artifacts are missing. CI rework (.circleci/config.yml): - New `build-and-test-package` matrix job, one entry per package, runs in parallel inside the emscripten/emsdk image. Each entry: git-diff skip if package unchanged since main, init only that package's submodule, build, test (vitest), benchmark PR, then check out origin/main and benchmark the baseline. Failure of any matrix entry fails the workflow. - New `post-benchmark-comment` aggregator that reads /tmp/benchmarks/*.json from the workspace and posts/updates a single sticky comment on the PR via the GitHub Issues API, keyed on an HTML marker. Falls back to a DRY_RUN dump when GITHUB_TOKEN (or GH_TOKEN) is not configured. - NPM_PUBLISH on main is unchanged. Supporting: - scripts/ci/benchmark.js: per-codec micro-benchmark that emits one JSON record (meanMs over 20 iterations) and exits 0 even on error. - scripts/ci/post-benchmark-comment.js: assembles the markdown table with before/after/% deltas (highlighted at ±5%) and patches an existing comment rather than spamming the PR with new ones. - Root package.json: swap jest@26 for vitest@2.1 + @vitest/coverage-v8, bump engines.node to >=18, add --include-dependents to test:ci so dicom-codec tests run when any underlying codec changes. - Each package gets a `## Testing` README section (or a new README for the endian packages) to (a) document the workflow and (b) ensure every package is in scope on this PR so the new matrix exercises every codec end-to-end. - libjpeg-turbo-12bit's package.json test script no longer hard-codes `exit 1`. Verified locally: - Tests pass for libjpeg-turbo-8bit (built locally), little-endian, big-endian. - Wasm packages without dist skip cleanly (no false negatives). - Lerna propagates a forced test failure as workflow exit 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: rename vitest configs to .mjs to silence Vite CJS deprecation Vitest 2.x ships only its ESM Node API; loading vitest.config.js without "type": "module" was making Vite emit "The CJS build of Vite's Node API is deprecated" on every run. Renaming to .mjs forces ESM loading without having to flip "type": "module" at the workspace root (which would change how Node resolves .js across the whole repo). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): post benchmark comment on test failures, fix openjph + 12bit tests CI changes: - Matrix build + test steps no longer fail the job; they capture status into /tmp/benchmarks/<pkg>-build.json and -test.json and continue. This lets the benchmark + persist-workspace + post-benchmark-comment chain finish on failing PRs. - post-benchmark-comment requires only the matrix (which now always succeeds at the job level), so the sticky PR comment is posted before any failure gating. - New ci-gate job runs after post-benchmark-comment, reads the per-package status JSONs, and exits 1 if any build or test failed. This is what fails the workflow now — after the comment is in place. - Comment header flips to a red ❌ with "one or more packages failed" when any status JSON shows fail; otherwise it renders the usual green check. Table gains Build / Test status columns per package. Test fixes uncovered by the first CI run (pipeline 95b6bc05…): - openjphjs HTJ2KEncoder rejected the round-trip frameInfo with "Missing field: isUsingColorTransform". Added the field (false) to the test. - libjpeg-turbo-12bit tests were decoding jpeg400jfif.jpg (an 8-bit JPEG) and hitting "Unsupported JPEG data precision 8" — that fixture isn't valid for the 12-bit decoder. Replaced the decode test with a synthetic 12-bit encode→decode round-trip that generates its own gradient frame and verifies dimensions after the round-trip, so no missing fixture is required. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): split build/test matrices so dicom-codec sees sibling dists; decode-only 12bit tests CI restructure: - Split the matrix into build-package and test-package phases. Each build-package entry persists packages/<pkg>/dist into a shared workspace rooted at ~/repo. Each test-package entry attaches that workspace, so every test job sees every built dist — fixing dicom-codec's integration tests, which require all wasm artifacts and were silently skipping when run in isolation in its own matrix container. - Move benchmark JSON output from /tmp/benchmarks to ~/repo/.ci-benchmarks so it persists under the same workspace root as the dists. Single attach_workspace at ~/repo now gives everything. - post-benchmark-comment uses BENCHMARK_DIR=.ci-benchmarks to find the records. Comment posting is now soft-failing: the node script catches errors and exits 0 (a missing/bad GITHUB_TOKEN no longer blocks ci-gate), and the CI step no longer falls back to GH_TOKEN (which is the npm token in this project and produces 401 against the GitHub API). - Test phase runs in cimg/node:18.20.3 (no emsdk needed once dist exists), trimming startup time vs. running everything in emscripten/emsdk. Test fix: - libjpeg-turbo-12bit's src/jslib.cpp only binds JPEGDecoder (encoder bindings are commented out), so the previous round-trip test was guaranteed to fail with "JPEGEncoder is not a constructor". Replaced with decode-only tests that exercise the available surface: instantiate the decoder, verify it rejects the 8-bit fixture (precision guard), verify it throws on truncated input, and an it.todo placeholder asking for a real 12-bit JPEG fixture to be added. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(ci): stop rebuilding PR dist after main baseline benchmark Each build-package matrix entry was compiling the wasm three times: 1. Build PR version (~83s for openjpeg) 2. Build origin/main baseline (~80s) 3. Build PR again to restore dist for the workspace persist (~80s) The third build was pure waste — we already had the PR dist in the tree from step 1. Stash it to /tmp/pr-dist before checking out main, then mv it back after benchmarking main. Drops the build-and-benchmark job from ~5.3 min to ~3.5 min for the heavier wasm packages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): persist only per-package files to avoid concurrent workspace conflicts post-benchmark-comment was failing at the Attaching workspace step with: Concurrent upstream jobs persisted the same file(s): .ci-benchmarks/charls-build.json .ci-benchmarks/dicom-codec-build.json ... (all 8 packages' build outputs) Error applying workspace layer ... Cause: each test-package matrix entry attaches the build-package workspace (so it sees every package's *-build.json / *-pr.json / *-main.json), adds its own *-test.json, then re-persists the entire .ci-benchmarks/ directory. All 8 test jobs concurrently persist the same shared build files → CircleCI refuses to merge the layers. Fix: each matrix job now persists only the files it actually creates. build-package persists packages/<pkg>/dist and the three per-package JSONs (after a `touch` to make the paths deterministic). test-package persists only its own <pkg>-test.json. No overlap → no conflict. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(bench): wire CodSpeed instrumented benchmarks via vitest Adds vitest-based microbenchmarks for every package plus a CircleCI job that runs them under CodSpeed's valgrind-based simulation mode and uploads results for hardware-independent perf tracking. Bench harness: - @codspeed/vitest-plugin added at workspace root devDeps. - Each package's vitest.config.mjs now includes the plugin and a benchmark.include glob for bench/**/*.bench.{js,mjs}. The plugin is a no-op for `vitest run`; only `vitest bench` under codspeed engages it. - Per-package bench files at packages/<pkg>/bench/decode.bench.js cover decode and (where available) encode operations against the real fixtures already in test/fixtures. JS-only packages benchmark synthetic 512x512 inputs. Wasm benches skip cleanly when dist/ is missing locally. - dicom-codec gets dispatch.bench.js — end-to-end through the dispatcher for four transfer syntaxes (JPEG baseline, JPEG-LS, JPEG2000, HTJ2K), skipped when not all sibling dists are present. Scripts: - Root `yarn run bench` → `lerna run bench --parallel --stream`. Running via lerna (one container per package) sidesteps the "teardown called twice" failure I hit when vitest workspace mode wires the codspeed global setup hook into multiple workspace projects. - Each package gains a `bench: vitest bench --run` script. CI: - New codspeed-bench job: installs valgrind + the CodSpeed CLI, attaches the build workspace (so all packages' dist/ is present), runs `codspeed run -- yarn run bench`. Soft-fails when CODSPEED_TOKEN is unset (just runs the benches locally without uploading) or when the inner command errors, so this never blocks the ci-gate. - Wired into PR_CHECKS workflow after build-package, in parallel with test-package. post-benchmark-comment now also requires codspeed-bench so the comment is posted after both runs are done. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): switch benchmark.js to .mjs + load endian dist instead of ESM src Two bugs visible in pipeline 306's posted comment: little-endian / big-endian: "Unexpected token 'export'" libjpeg-turbo-12bit: "Program terminated with exit(1)" Root causes: - scripts/ci/benchmark.js was CJS and dynamically imported each endian package's src/index.js (which is unflagged ESM — `export default decode`). Node 18 (the CI executor) does not auto-detect typeless .js files as ESM, so the import fails on Node 18 even though it works on Node 20+/22+ locally. - The 12-bit decoder rejected the only available 8-bit jpeg fixture with "Unsupported JPEG data precision 8". Fixes: - Rename scripts/ci/benchmark.js → benchmark.mjs (proper ESM, uses createRequire for CJS wasm modules). Each matrix entry's CI step now invokes the .mjs path. - benchEndian now loads packages/<pkg>/dist/index.js (webpack-built CJS) via require, bypassing the ESM-source issue. dist/index.js is always present after build-package runs. - benchLibJpeg12 no longer tries to decode the 8-bit fixture. Falls back to a decoder-instantiate microbench with a note explaining a real 12-bit fixture is needed for throughput numbers. Verified locally: node scripts/ci/benchmark.mjs libjpeg-turbo-8bit pr → meanMs 1.26ms node scripts/ci/benchmark.mjs little-endian pr → "dist not built" node scripts/ci/benchmark.mjs dicom-codec pr → noop record Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): use official CodSpeed installer URL Previous installer URL pointed at github.com/CodSpeedHQ/codspeed-rs which returns 404 (that repo doesn't exist). The actual runner lives at github.com/CodSpeedHQ/codspeed, served via the official installer at https://codspeed.io/install.sh — which drops the `codspeed` binary into $HOME/.cargo/bin/. Curl was exiting 22 (HTTP error) on the old URL, failing the codspeed-bench job before it could start. Also drop the explicit --token flag from `codspeed run` since the CLI reads CODSPEED_TOKEN from the environment automatically. Verified the install locally: curl -fsSL https://codspeed.io/install.sh | bash → downloading codspeed-runner 4.16.1 → installing to ~/.cargo/bin → codspeed --version → codspeed-runner 4.16.1 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): pass --mode instrumentation to codspeed run codspeed-bench job 503 exited 0 in 0s with: Error: No runner mode specified. Use --mode <mode> or set the mode for this shell session with `codspeed use <mode>`. The CodSpeed CLI v4.16.1 requires the mode explicitly (no default). Adding --mode instrumentation, which is what @codspeed/vitest-plugin expects: deterministic CPU-cycle counting via valgrind/cachegrind, <1% variance, hardware-independent. This is the right mode for microbenchmarks; walltime would also work but needs the CodSpeed Macro Runners for stable cross-run comparisons. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(ci): drop homegrown benchmark + comment infra, rely on CodSpeed The homegrown sticky-comment harness was duplicating signals that CircleCI status checks and CodSpeed already provide better. Cleaning up: Removed: - scripts/ci/benchmark.mjs - scripts/ci/post-benchmark-comment.js - post-benchmark-comment job - ci-gate job (no longer needed — build/test steps now fail the job directly instead of capturing status into workspace JSON for the gate to read) - Benchmark PR / Build + benchmark main baseline steps from build-package (saved ~80s per heavy wasm package — no more rebuilding origin/main inline) - All the per-package status JSON plumbing (.ci-benchmarks/<pkg>-{build,test,pr,main}.json) Simplified shape: build-package (matrix, parallel) ──┬─→ test-package (matrix, parallel) └─→ codspeed-bench (single, parallel) build/test failures fail the workflow naturally codspeed soft-fails (perf is reported by CodSpeed, not gated by us) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(fixtures): add real fixtures for .51/.57/.70/.81/.91/.5 + drop deflated UID Closed the previously-identified decode-coverage gaps by extracting real encoded-pixel-data payloads from Cornerstone3D's testImages corpus. Each fixture was pulled directly from a Cornerstone3D test DICOM (transfer syntax is in the source filename), wrapped in DICOM encapsulation, decoded through pydicom + pylibjpeg/pyjpegls, and visually compared against the existing lossless references. Lossless variants produce identical pixel statistics; lossy variants are within ±2 HU of the reference — confirming extraction is intact. Preview PNGs in /tmp/codec-fixtures-preview/. Fixtures added: packages/libjpeg-turbo-12bit/test/fixtures/jpeg/CT-512x512-12bit.jpg real .51 JPEG Extended (Process 2 & 4) 12-bit fixture — replaces the 8-bit-decoder-rejecting jpeg400jfif.jpg that we were misusing. packages/charls/test/fixtures/CT-512x512-near-lossless.JLS real .81 JPEG-LS near-lossless (NEAR > 0) — first time we exercise that code path; all existing .JLS fixtures were NEAR=0. packages/openjpeg/test/fixtures/j2k/CT-512x512-lossy.j2k real .91 irreversible 9-7 J2K — replaces the 250-byte image.j2k placeholder as our only lossy J2K data point. packages/dicom-codec/test/fixtures/jpeg-lossless/CT-512x512-process14.jpll packages/dicom-codec/test/fixtures/jpeg-lossless/CT-512x512-process14-sv1.jpll real .57 and .70 JPEG Lossless (SOF3) — first fixtures for the jpeg-lossless-decoder-js dispatch path which had none. packages/dicom-codec/test/fixtures/rle/CT-512x512.rle real .5 RLE Lossless — first fixture for dicom-codec's internal rleLossless.js (the file with the `// untested!` comment). Tests updated to use them: libjpeg-turbo-12bit: real CT decode test (replaces the "TODO: add fixture" placeholder), plus 8-bit-precision-guard test. charls: near-lossless decode added. openjpeg: lossy 9-7 decode added. dicom-codec/integration: JPEG Lossless (.57/.70) and RLE Lossless (.5) paths exercised through the dispatcher. libjpeg-turbo-12bit/bench: now decodes the real 12-bit CT instead of only instantiating the decoder. Half-implementation removed: Dropped "1.2.840.10008.1.2.1.99" (Deflated Explicit VR Little Endian) from codecsMap. Previously routed to the plain little-endian codec which doesn't deflate — actually-deflated payloads would silently decode to garbage. Now `hasCodec` returns false and decode/encode throw "unknown transfer syntax UID" for that UID. dispatch.test.js no longer asserts that UID is supported. Re-add the mapping once a real inflate step is plumbed in. Still missing fixtures: HTJ2K Lossy (.202/.203) — Cornerstone3D's testImages corpus doesn't ship these. Worth re-encoding CT1.RAW with openjphjs's lossy encoder in a follow-up if we want CodSpeed numbers for those paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(bench): exercise every fixture/transfer-syntax via vitest bench The bench files were stuck on a single fixture per package, leaving the new ones (.51/.57/.70/.81/.91/.5) without CodSpeed coverage. Now every fixture is benchmarked: charls: decode CT1.JLS (.80 lossless) decode CT2.JLS (.80 lossless) decode CT-512x512-near-lossless.JLS (.81) <-- new encode CT2.RAW lossless near=0 openjpeg: decode CT1.j2k (.90 lossless 5-3) decode CT2.j2k (.90 lossless 5-3) decode CT-512x512-lossy.j2k (.91 irreversible 9-7) <-- new encode CT1.RAW lossless openjphjs: decode CT1.j2c (.201 lossless) decode CT2.j2c (.201 lossless) <-- new encode CT1.RAW HTJ2K lossless dicom-codec dispatch (covers every transfer syntax the dispatcher supports — replaces the previous 4-syntax subset): .50 JPEG Baseline 8-bit .51 JPEG 12-bit Extended <-- new .57 JPEG Lossless Process 14 <-- new .70 JPEG Lossless Process 14 SV1 <-- new .80 JPEG-LS Lossless .81 JPEG-LS Near-Lossless <-- new .90 JPEG 2000 Lossless .91 JPEG 2000 Lossy <-- new .201 HTJ2K Lossless .5 RLE Lossless <-- new libjpeg-turbo-8bit, libjpeg-turbo-12bit, little-endian, big-endian already covered their fixtures fully — no changes needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): create empty dist/ before persist so build-dicom-codec succeeds dicom-codec's build:ci is 'echo "Nothing to build" && exit 0' — it doesn't produce a dist/ directory. persist_to_workspace then failed with 'The specified paths did not match any files in /root/repo', tanking build-dicom-codec on pipeline #313 before any test could run. mkdir -p the dist path right before persist so packages without build output still satisfy the workspace contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: trigger run to validate CodSpeed connection Repo just connected to CodSpeed dashboard. Re-running the pipeline so codspeed-bench picks up CODSPEED_TOKEN and actually uploads the bench matrix this time. * ci: move PR build/test/bench to GitHub Actions; keep NPM publish on CircleCI CodSpeed has first-class GHA support via CodSpeedHQ/action@v3 (installs valgrind, wires up instrumentation, runs the bench, uploads, and posts the sticky PR comment in one step). The equivalent on CircleCI required installing valgrind + the codspeed CLI by hand and was strictly less capable. Moving the PR pipeline to GHA so CodSpeed integration "just works". New workflow at .github/workflows/pr-checks.yml: detect-changes - emits the list of packages changed vs main as a JSON matrix output (mirrors CircleCI's --since main skip). On a docs-only PR the list is empty and the downstream jobs are skipped entirely. build - matrix, one entry per changed package, runs in the emscripten/emsdk:3.1.74 container (same image as before). Persists packages/<pkg>/dist via actions/upload-artifact. test - matrix, downloads all dists, vitest run. codspeed-bench - downloads all dists, runs CodSpeedHQ/action@v3 with `run: yarn run bench`. CODSPEED_TOKEN comes from repo secrets; the action posts a sticky PR comment. Removed the equivalent build-package / test-package / codspeed-bench jobs from .circleci/config.yml. Only NPM_PUBLISH (deploy on main) stays on CircleCI for now, since its GH_TOKEN/NPM_TOKEN env vars and release flow are already wired up there. Required GHA repo secret: CODSPEED_TOKEN (from the cornerstonejs/codecs CodSpeed project page). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: retrigger to pick up CODSPEED_TOKEN secret * ci: adopt CodSpeed wizard improvements from PR #65 Pulled the integration tweaks from the codspeed-hq[bot] wizard PR without taking its synthetic benchmarks (ours are richer — real DICOM fixtures across .50/.51/.57/.70/.80/.81/.90/.91/.201/.5). Changes: - CodSpeedHQ/action@v3 → @v4 - Add OIDC auth: `permissions: id-token: write`, drop the `token: ${{ secrets.CODSPEED_TOKEN }}` arg. The action now authenticates via GitHub's OIDC provider — no secret to manage. - Explicit `mode: simulation` on the CodSpeed step (deterministic CPU simulation, <1% variance, hardware-independent). - Add `workflow_dispatch` trigger so CodSpeed can run a backtest from its dashboard once the repo is connected. - Bump Node 18 → 22 in test + codspeed-bench jobs (matches the wizard's recommendation; our package engines are still >=18 so no impact). - Add CodSpeed badge to README. PR #65 can now be closed — its synthetic JS-only benches are subsumed by ours, and these integration tweaks are now in our branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): unbrittle 12-bit assertions + use byteLength for dispatcher Two test failures surfaced by the first GHA run on 0076b4f: 1. libjpeg-turbo-12bit/test/decode.test.js - frameInfo.bitsPerSample came back as 8, not 12. The wasm FrameInfo struct in this package reports bytes-per-sample for 12-bit input rather than the JPEG precision marker. Drop the assertion — byteLength of the output buffer is sufficient to verify the decoder produced 16-bit-allocated data. - "throws on truncated input" failed because libjpeg-turbo's 12-bit code path silently returns partial output (unlike the 8-bit path which throws). Convert to it.todo so the behaviour is documented. 2. dicom-codec/test/integration.test.js - The JPEG Lossless dispatch path returns a Uint16Array (262 144 elements × 2 bytes), not a Uint8Array like the other codecs. `.length === 524288` therefore fails — TypedArray.length is in elements, not bytes. Switch all dispatcher assertions to `.byteLength` which gives bytes uniformly across Uint8/Uint16/Int16 return shapes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): libjpeg-turbo-12bit returns Uint16Array — use byteLength Same shape mismatch as the dicom-codec dispatcher tests in the previous commit: the 12-bit decoder's getDecodedBuffer() returns a 16-bit-element typed array, so .length === 262 144 (elements), not 524 288 (bytes). The other wasm decoders (8-bit libjpeg-turbo, charls, openjpeg, openjphjs) all return Uint8ClampedArray and .length happens to equal byteLength. Switched to byteLength so the assertion is codec-shape-agnostic. * fix(test): assert actual (buggy) bitsPerSample=8 with FIXME pointer Earlier commit dropped the bitsPerSample assertion entirely, which masked the underlying bug. Restoring an assertion against the *actual* returned value (8) with a FIXME comment pointing at the line that hard-codes it. When someone fixes the binding to read cinfo.data_precision properly and start returning 12, this test will loudly fail and force them to update it — which is the behaviour we want for a known bug. Decode itself is functionally correct (pixel values in 0..4095, byteLength 524 288 for 16-bit-allocated 512x512). Only this metadata reporting field is wrong. Bug location: packages/libjpeg-turbo-12bit/src/JPEGDecoder.hpp:129 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): pin @codspeed/vitest-plugin to v4 + skip .51 transfer syntax @codspeed/vitest-plugin@5 requires vitest>=3.2; we're on vitest 2.1.9, so the plugin's bench hook never fired and CodSpeed reported "No benchmarks found." Pin to ^4.0.1 (last line supporting vitest 2) — now each bench emits an individual "[CodSpeed] ... done" line as expected. Drop libjpeg-turbo-12bit (.51 transfer syntax) from test+bench while keeping the build matrix entry: the upstream wasm binding hard-codes bitsPerSample=8 and returns a half-sized Uint8 buffer, so the decode test was asserting buggy behavior with a chain of FIXMEs. Cleaner to skip it entirely than to ratchet the test against known-broken code. * ci: empty commit to measure CodSpeed run-to-run variance Same code as a5b8ee6 — re-running the bench pipeline so two CodSpeed runs exist on identical sources, letting us diff each of the 29 benches to see real (not advertised) variance under mode: simulation. * perf(bench): split instantiation from decode/encode kernel measurement The previous codec benches conflated three costs into one number: constructor + buffer fill + decode/encode + destructor. CodSpeed runs each bench body exactly once under Cachegrind, so the per-iteration construct/destroy cycle was sitting inside the measurement window — and the variance analysis on a5b8ee6..fa2db51 showed exactly that mixing was the source of the 18–110% run-to-run spread on openjpeg encode CT1.RAW, openjpeg decode CT1.j2k, openjphjs encode CT1.RAW, and libjpeg decode jpeg400jfif (the only four noisy benches in the suite). Refactor: hoist new Decoder()/Encoder() + getEncodedBuffer().set() / getDecodedBuffer().set() / setNearLossless() to module scope, one instance per fixture (wasm decoders advance internal state on decode() and can't be reused across multiple bench bodies). Each bench body is now a single decoder.decode() or encoder.encode() call. Two new "instantiate+destroy" benches per codec measure the lifecycle cost in isolation — that's where the noise was hiding and now it shows up as its own number rather than smearing into the kernel measurement. Net effect: the 4 noisy benches should drop to <1% spread (matching the dispatcher benches which already cache codec instances), and the lifecycle cost gets its own signal line that catches regressions in wasm setup paths. Affected packages: charls, libjpeg-turbo-8bit, openjpeg, openjphjs. big-endian/little-endian have no decoder class and dicom-codec's dispatch.bench.js already caches codec instances via runProcess → initialize → codecConfig.codec, so neither needs the same change. * perf(bench): switch codec benches to shared decoder + warmup pattern The per-fixture refactor in 6f97516 already collapsed the noisy benches 80% (HTJ2K encode 172ms → 35ms, openjphjs decode 143ms → 38ms) by pinning each decoder/encoder at a stable wasm heap address. This commit goes one step further: a single shared decoder per codec, warmed up at module scope (untimed) so every measured bench body sees a fully hot decoder. This mirrors how a real app uses these libraries — one Decoder/Encoder instance fed many frames in a hot loop, not a fresh instance per frame. Each per-fixture bench body now does what production does: refill the encoded buffer with the current frame's bytes and call decode(). The getEncodedBuffer(len).set(bytes) memcpy is a few KB and small relative to the decode work; including it is honest because every real decode call has to put bytes into wasm memory. The instantiate+destroy benches remain — they're now the only place the lifecycle cost shows up, which is precisely the per-codec signal you want when tuning startup/teardown. Affected packages: charls, libjpeg-turbo-8bit, openjpeg, openjphjs. big-endian/little-endian have no Decoder class (pure-JS functions). dicom-codec dispatcher already caches codec instances via runProcess → initialize → codecConfig.codec, so its benches are already in the "warm" regime once each UID has been hit once. * perf(bench): split each codec bench into cold and warm variants Adds an explicit cold-vs-warm bench pair per fixture, surfacing the two distinct production regimes: cold = fresh decoder/encoder instance, the bench body is the FIRST decode/encode call on it. Models frame 1 of a worker session. cornerstone3D's decodeImageFrameWorker.js does no warmup, so frame 1 in each worker pays this cost. warm = shared decoder/encoder pre-warmed with 5 untimed iterations at module load. The bench body is the 6th+ call. Models frames 2..N — the dominant production case for stack scrolling, since decodeJPEG2000.ts:68, decodeJPEGLS.ts:73, and decodeJPEGBaseline8Bit.ts:61 all cache the decoder on local.decoder and reuse it for every frame. Bench bodies are symmetric (identical code shape) between cold and warm variants — the only difference is module-load state, so the cold/warm delta cleanly isolates first-call setup cost (wasm heap grow, page-touch, V8 wasm tier-up) from steady-state kernel time. Warmup uses the LARGEST fixture per codec (CT1 in every case) so the warm decoder's internal buffers never need to regrow when smaller fixtures run their warm benches. Verified sizes: - charls: CT1.JLS 164KB > .81 119KB > CT2.JLS 115KB - openjpeg: CT1.j2k 174KB > .91-lossy 137KB > CT2.j2k 119KB - openjphjs: CT1.j2c 185KB > CT2.j2c 128KB Caveat documented in openjphjs/bench/decode.bench.js: cornerstone3D's decodeHTJ2K.ts:69 actually creates a fresh HTJ2KDecoder every frame (reuse is "much slower for some reason" per the comment in that file), so for HTJ2K specifically the production approximation is instantiate+destroy + decode-cold, not the warm number. The warm bench remains useful for regression detection on the openjph kernel itself. Bench count goes from 37 to 50. * docs(bench): explain the CPU-simulation measurement model Adds BENCHMARKING.md at the repo root covering: - why we use mode: simulation (Cachegrind) instead of walltime - how the modeled instruction-time numbers relate to real wall-clock (JS loops inflate 30-100x, wasm decode 5-15x) - how to read each of the four bench types we emit per codec (instantiate+destroy, cold, warm, dispatcher integration) - what the dashboard warnings mean and which to ignore (syscall-attribution blowup, anonymous V8 JIT frames, V8 tier-up artifacts on instantiation benches) - run-to-run variance characteristics observed across three identical-source runs - how to add a new bench Cross-references the doc from: - README.md (new Benchmarking section, before Codec Package Anatomy) with a TL;DR pointing at BENCHMARKING.md - .github/workflows/pr-checks.yml (expanded comment above the `mode: simulation` line explaining why we chose it and what the trade-offs are) No behaviour change; pure documentation of what we built up in commits a5b8ee6..b4b14d1. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add CodSpeed performance benchmarks for codec decode operations

00572b9

sedghi marked this pull request as ready for review May 20, 2026 00:38

sedghi approved these changes May 20, 2026

View reviewed changes

sedghi closed this May 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add CodSpeed performance benchmarks for codec decode operations#65

Add CodSpeed performance benchmarks for codec decode operations#65
codspeed-hq[bot] wants to merge 1 commit into
mainfrom
codspeed-wizard-1779237012625

codspeed-hq Bot commented May 20, 2026

Uh oh!

codspeed-hq Bot commented May 20, 2026

Detected benchmarks

Uh oh!

sedghi commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

codspeed-hq Bot commented May 20, 2026

Summary

What changed

Benchmarks (benchmarks/)

CI workflow (.github/workflows/codspeed.yml)

Dependencies

README

Next steps

Uh oh!

codspeed-hq Bot commented May 20, 2026

Congrats! CodSpeed is installed 🎉

Detected benchmarks

Uh oh!

sedghi commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Benchmarks (`benchmarks/`)

CI workflow (`.github/workflows/codspeed.yml`)