Skip to content

Add CodSpeed performance benchmarks for codec decode operations#65

Closed
codspeed-hq[bot] wants to merge 1 commit into
mainfrom
codspeed-wizard-1779237012625
Closed

Add CodSpeed performance benchmarks for codec decode operations#65
codspeed-hq[bot] wants to merge 1 commit into
mainfrom
codspeed-wizard-1779237012625

Conversation

@codspeed-hq
Copy link
Copy Markdown

@codspeed-hq codspeed-hq Bot commented May 20, 2026

Summary

This PR sets up continuous performance tracking for the cornerstonejs/codecs repository using CodSpeed. It introduces benchmarks for the pure-JavaScript codec packages and wires them into a GitHub Actions workflow so performance regressions are caught automatically on every pull request.

What changed

Benchmarks (benchmarks/)

Added vitest bench benchmarks for the two pure-JS codec packages:

  • big-endian decode -- byte-swap decoding of big-endian DICOM pixel data

    • 16-bit unsigned at 256x256, 512x512, 1024x1024
    • 16-bit signed at 512x512
    • 8-bit passthrough at 256x256
  • little-endian decode -- TypedArray construction from raw DICOM pixel data

    • 16-bit unsigned at 256x256, 512x512, 1024x1024
    • 16-bit signed at 512x512
    • 32-bit float at 512x512
    • 8-bit passthrough at 256x256

All 11 benchmarks have been verified locally with CodSpeed in simulation mode.

CI workflow (.github/workflows/codspeed.yml)

A new GitHub Actions workflow runs the benchmarks on:

  • pushes to main
  • pull requests
  • manual dispatch (for CodSpeed backtesting)

It uses OIDC authentication and CodSpeed's CPU simulation instrument for deterministic, low-variance measurements.

Dependencies

Added vitest and @codspeed/vitest-plugin as root dev dependencies, along with a vitest.config.mjs that enables the CodSpeed plugin.

README

Added a CodSpeed performance badge.

Next steps

  • WASM codec benchmarks: The WASM-based packages (charls, openjpeg, libjpeg-turbo, openjph) require an emscripten build step before they can be benchmarked. Once the build artifacts are available in CI, benchmarks can be added for the encode/decode operations of those codecs.
  • Performance checks: Enable performance checks on the repository to automatically flag regressions on pull requests.

@sedghi sedghi marked this pull request as ready for review May 20, 2026 00:38
@codspeed-hq
Copy link
Copy Markdown
Author

codspeed-hq Bot commented May 20, 2026

Congrats! CodSpeed is installed 🎉

🆕 11 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks


Open in CodSpeed

sedghi added a commit that referenced this pull request May 20, 2026
Pulled the integration tweaks from the codspeed-hq[bot] wizard PR
without taking its synthetic benchmarks (ours are richer — real DICOM
fixtures across .50/.51/.57/.70/.80/.81/.90/.91/.201/.5).

Changes:
- CodSpeedHQ/action@v3 → @v4
- Add OIDC auth: `permissions: id-token: write`, drop the
  `token: ${{ secrets.CODSPEED_TOKEN }}` arg. The action now
  authenticates via GitHub's OIDC provider — no secret to manage.
- Explicit `mode: simulation` on the CodSpeed step (deterministic CPU
  simulation, <1% variance, hardware-independent).
- Add `workflow_dispatch` trigger so CodSpeed can run a backtest from
  its dashboard once the repo is connected.
- Bump Node 18 → 22 in test + codspeed-bench jobs (matches the wizard's
  recommendation; our package engines are still >=18 so no impact).
- Add CodSpeed badge to README.

PR #65 can now be closed — its synthetic JS-only benches are subsumed
by ours, and these integration tweaks are now in our branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sedghi
Copy link
Copy Markdown
Member

sedghi commented May 20, 2026

Superseded by #64. The integration tweaks (OIDC auth, CodSpeedHQ/action@v4, mode: simulation, workflow_dispatch, README badge, Node 22) have been pulled into #64 in 098e16e. The synthetic benchmarks proposed here are subsumed by the richer set in #64, which benchmarks every transfer syntax in the dispatcher (.50/.51/.57/.70/.80/.81/.90/.91/.201/.5) against real DICOM fixtures extracted from cornerstone3D's testImages corpus.

@sedghi sedghi closed this May 20, 2026
wayfarer3130 pushed a commit that referenced this pull request May 20, 2026
…omments (#64)

* test: migrate to vitest with parallel per-package CI + benchmark PR comments

Replaces the unused Jest devDep and the per-package "node test/index.js"
benchmarks (which had no assertions) with real Vitest suites, and restructures
CircleCI into a parallel per-package pipeline that posts a sticky benchmark
comment on every PR.

Per-package vitest tests:
- charls, libjpeg-turbo-8bit/12bit, openjpeg, openjphjs: decode + lossless
  round-trip assertions across asm.js / wasm / decode-only variants, guarded
  by `describe.skipIf(!isBuilt)` so a clean clone without dist/ doesn't fail.
- little-endian, big-endian: unit tests covering each bitsAllocated /
  pixelRepresentation branch and the odd-byteOffset realignment path.
- dicom-codec: dispatch tests (hasCodec table, unknown-UID throws, API shape)
  plus integration tests against four transfer syntaxes; the whole suite skips
  cleanly when underlying wasm dist/ artifacts are missing.

CI rework (.circleci/config.yml):
- New `build-and-test-package` matrix job, one entry per package, runs in
  parallel inside the emscripten/emsdk image. Each entry: git-diff skip if
  package unchanged since main, init only that package's submodule, build,
  test (vitest), benchmark PR, then check out origin/main and benchmark the
  baseline. Failure of any matrix entry fails the workflow.
- New `post-benchmark-comment` aggregator that reads /tmp/benchmarks/*.json
  from the workspace and posts/updates a single sticky comment on the PR via
  the GitHub Issues API, keyed on an HTML marker. Falls back to a DRY_RUN
  dump when GITHUB_TOKEN (or GH_TOKEN) is not configured.
- NPM_PUBLISH on main is unchanged.

Supporting:
- scripts/ci/benchmark.js: per-codec micro-benchmark that emits one JSON
  record (meanMs over 20 iterations) and exits 0 even on error.
- scripts/ci/post-benchmark-comment.js: assembles the markdown table with
  before/after/% deltas (highlighted at ±5%) and patches an existing comment
  rather than spamming the PR with new ones.
- Root package.json: swap jest@26 for vitest@2.1 + @vitest/coverage-v8, bump
  engines.node to >=18, add --include-dependents to test:ci so dicom-codec
  tests run when any underlying codec changes.
- Each package gets a `## Testing` README section (or a new README for the
  endian packages) to (a) document the workflow and (b) ensure every package
  is in scope on this PR so the new matrix exercises every codec end-to-end.
- libjpeg-turbo-12bit's package.json test script no longer hard-codes
  `exit 1`.

Verified locally:
- Tests pass for libjpeg-turbo-8bit (built locally), little-endian, big-endian.
- Wasm packages without dist skip cleanly (no false negatives).
- Lerna propagates a forced test failure as workflow exit 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rename vitest configs to .mjs to silence Vite CJS deprecation

Vitest 2.x ships only its ESM Node API; loading vitest.config.js without
"type": "module" was making Vite emit "The CJS build of Vite's Node API
is deprecated" on every run. Renaming to .mjs forces ESM loading without
having to flip "type": "module" at the workspace root (which would change
how Node resolves .js across the whole repo).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): post benchmark comment on test failures, fix openjph + 12bit tests

CI changes:
- Matrix build + test steps no longer fail the job; they capture status into
  /tmp/benchmarks/<pkg>-build.json and -test.json and continue. This lets the
  benchmark + persist-workspace + post-benchmark-comment chain finish on
  failing PRs.
- post-benchmark-comment requires only the matrix (which now always succeeds
  at the job level), so the sticky PR comment is posted before any failure
  gating.
- New ci-gate job runs after post-benchmark-comment, reads the per-package
  status JSONs, and exits 1 if any build or test failed. This is what fails
  the workflow now — after the comment is in place.
- Comment header flips to a red ❌ with "one or more packages failed" when
  any status JSON shows fail; otherwise it renders the usual green check.
  Table gains Build / Test status columns per package.

Test fixes uncovered by the first CI run (pipeline 95b6bc05…):
- openjphjs HTJ2KEncoder rejected the round-trip frameInfo with
  "Missing field: isUsingColorTransform". Added the field (false) to the test.
- libjpeg-turbo-12bit tests were decoding jpeg400jfif.jpg (an 8-bit JPEG) and
  hitting "Unsupported JPEG data precision 8" — that fixture isn't valid for
  the 12-bit decoder. Replaced the decode test with a synthetic 12-bit
  encode→decode round-trip that generates its own gradient frame and verifies
  dimensions after the round-trip, so no missing fixture is required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): split build/test matrices so dicom-codec sees sibling dists; decode-only 12bit tests

CI restructure:
- Split the matrix into build-package and test-package phases. Each
  build-package entry persists packages/<pkg>/dist into a shared workspace
  rooted at ~/repo. Each test-package entry attaches that workspace, so
  every test job sees every built dist — fixing dicom-codec's integration
  tests, which require all wasm artifacts and were silently skipping when
  run in isolation in its own matrix container.
- Move benchmark JSON output from /tmp/benchmarks to ~/repo/.ci-benchmarks
  so it persists under the same workspace root as the dists. Single
  attach_workspace at ~/repo now gives everything.
- post-benchmark-comment uses BENCHMARK_DIR=.ci-benchmarks to find the
  records. Comment posting is now soft-failing: the node script catches
  errors and exits 0 (a missing/bad GITHUB_TOKEN no longer blocks ci-gate),
  and the CI step no longer falls back to GH_TOKEN (which is the npm token
  in this project and produces 401 against the GitHub API).
- Test phase runs in cimg/node:18.20.3 (no emsdk needed once dist exists),
  trimming startup time vs. running everything in emscripten/emsdk.

Test fix:
- libjpeg-turbo-12bit's src/jslib.cpp only binds JPEGDecoder (encoder
  bindings are commented out), so the previous round-trip test was
  guaranteed to fail with "JPEGEncoder is not a constructor". Replaced
  with decode-only tests that exercise the available surface: instantiate
  the decoder, verify it rejects the 8-bit fixture (precision guard),
  verify it throws on truncated input, and an it.todo placeholder asking
  for a real 12-bit JPEG fixture to be added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(ci): stop rebuilding PR dist after main baseline benchmark

Each build-package matrix entry was compiling the wasm three times:
  1. Build PR version            (~83s for openjpeg)
  2. Build origin/main baseline  (~80s)
  3. Build PR again to restore dist for the workspace persist (~80s)

The third build was pure waste — we already had the PR dist in the tree
from step 1. Stash it to /tmp/pr-dist before checking out main, then
mv it back after benchmarking main. Drops the build-and-benchmark job
from ~5.3 min to ~3.5 min for the heavier wasm packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): persist only per-package files to avoid concurrent workspace conflicts

post-benchmark-comment was failing at the Attaching workspace step with:
  Concurrent upstream jobs persisted the same file(s):
    .ci-benchmarks/charls-build.json
    .ci-benchmarks/dicom-codec-build.json
    ... (all 8 packages' build outputs)
  Error applying workspace layer ...

Cause: each test-package matrix entry attaches the build-package workspace
(so it sees every package's *-build.json / *-pr.json / *-main.json), adds
its own *-test.json, then re-persists the entire .ci-benchmarks/ directory.
All 8 test jobs concurrently persist the same shared build files →
CircleCI refuses to merge the layers.

Fix: each matrix job now persists only the files it actually creates.
build-package persists packages/<pkg>/dist and the three per-package JSONs
(after a `touch` to make the paths deterministic). test-package persists
only its own <pkg>-test.json. No overlap → no conflict.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(bench): wire CodSpeed instrumented benchmarks via vitest

Adds vitest-based microbenchmarks for every package plus a CircleCI job
that runs them under CodSpeed's valgrind-based simulation mode and uploads
results for hardware-independent perf tracking.

Bench harness:
- @codspeed/vitest-plugin added at workspace root devDeps.
- Each package's vitest.config.mjs now includes the plugin and a
  benchmark.include glob for bench/**/*.bench.{js,mjs}. The plugin is a
  no-op for `vitest run`; only `vitest bench` under codspeed engages it.
- Per-package bench files at packages/<pkg>/bench/decode.bench.js cover
  decode and (where available) encode operations against the real fixtures
  already in test/fixtures. JS-only packages benchmark synthetic 512x512
  inputs. Wasm benches skip cleanly when dist/ is missing locally.
- dicom-codec gets dispatch.bench.js — end-to-end through the dispatcher
  for four transfer syntaxes (JPEG baseline, JPEG-LS, JPEG2000, HTJ2K),
  skipped when not all sibling dists are present.

Scripts:
- Root `yarn run bench` → `lerna run bench --parallel --stream`. Running
  via lerna (one container per package) sidesteps the
  "teardown called twice" failure I hit when vitest workspace mode wires
  the codspeed global setup hook into multiple workspace projects.
- Each package gains a `bench: vitest bench --run` script.

CI:
- New codspeed-bench job: installs valgrind + the CodSpeed CLI, attaches
  the build workspace (so all packages' dist/ is present), runs
  `codspeed run -- yarn run bench`. Soft-fails when CODSPEED_TOKEN is
  unset (just runs the benches locally without uploading) or when the
  inner command errors, so this never blocks the ci-gate.
- Wired into PR_CHECKS workflow after build-package, in parallel with
  test-package. post-benchmark-comment now also requires codspeed-bench
  so the comment is posted after both runs are done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): switch benchmark.js to .mjs + load endian dist instead of ESM src

Two bugs visible in pipeline 306's posted comment:
  little-endian / big-endian: "Unexpected token 'export'"
  libjpeg-turbo-12bit:        "Program terminated with exit(1)"

Root causes:
- scripts/ci/benchmark.js was CJS and dynamically imported each endian
  package's src/index.js (which is unflagged ESM — `export default
  decode`). Node 18 (the CI executor) does not auto-detect typeless .js
  files as ESM, so the import fails on Node 18 even though it works on
  Node 20+/22+ locally.
- The 12-bit decoder rejected the only available 8-bit jpeg fixture
  with "Unsupported JPEG data precision 8".

Fixes:
- Rename scripts/ci/benchmark.js → benchmark.mjs (proper ESM, uses
  createRequire for CJS wasm modules). Each matrix entry's CI step now
  invokes the .mjs path.
- benchEndian now loads packages/<pkg>/dist/index.js (webpack-built CJS)
  via require, bypassing the ESM-source issue. dist/index.js is always
  present after build-package runs.
- benchLibJpeg12 no longer tries to decode the 8-bit fixture. Falls back
  to a decoder-instantiate microbench with a note explaining a real
  12-bit fixture is needed for throughput numbers.

Verified locally:
  node scripts/ci/benchmark.mjs libjpeg-turbo-8bit pr  → meanMs 1.26ms
  node scripts/ci/benchmark.mjs little-endian pr       → "dist not built"
  node scripts/ci/benchmark.mjs dicom-codec pr         → noop record

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): use official CodSpeed installer URL

Previous installer URL pointed at github.com/CodSpeedHQ/codspeed-rs which
returns 404 (that repo doesn't exist). The actual runner lives at
github.com/CodSpeedHQ/codspeed, served via the official installer at
https://codspeed.io/install.sh — which drops the `codspeed` binary into
$HOME/.cargo/bin/. Curl was exiting 22 (HTTP error) on the old URL,
failing the codspeed-bench job before it could start.

Also drop the explicit --token flag from `codspeed run` since the CLI
reads CODSPEED_TOKEN from the environment automatically.

Verified the install locally:
  curl -fsSL https://codspeed.io/install.sh | bash
  → downloading codspeed-runner 4.16.1
  → installing to ~/.cargo/bin
  → codspeed --version → codspeed-runner 4.16.1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): pass --mode instrumentation to codspeed run

codspeed-bench job 503 exited 0 in 0s with:
  Error: No runner mode specified. Use --mode <mode> or set the mode for
  this shell session with `codspeed use <mode>`.

The CodSpeed CLI v4.16.1 requires the mode explicitly (no default).
Adding --mode instrumentation, which is what @codspeed/vitest-plugin
expects: deterministic CPU-cycle counting via valgrind/cachegrind,
<1% variance, hardware-independent. This is the right mode for
microbenchmarks; walltime would also work but needs the CodSpeed
Macro Runners for stable cross-run comparisons.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ci): drop homegrown benchmark + comment infra, rely on CodSpeed

The homegrown sticky-comment harness was duplicating signals that
CircleCI status checks and CodSpeed already provide better. Cleaning up:

Removed:
- scripts/ci/benchmark.mjs
- scripts/ci/post-benchmark-comment.js
- post-benchmark-comment job
- ci-gate job (no longer needed — build/test steps now fail the job
  directly instead of capturing status into workspace JSON for the gate
  to read)
- Benchmark PR / Build + benchmark main baseline steps from
  build-package (saved ~80s per heavy wasm package — no more rebuilding
  origin/main inline)
- All the per-package status JSON plumbing
  (.ci-benchmarks/<pkg>-{build,test,pr,main}.json)

Simplified shape:
  build-package (matrix, parallel) ──┬─→ test-package (matrix, parallel)
                                     └─→ codspeed-bench (single, parallel)

  build/test failures fail the workflow naturally
  codspeed soft-fails (perf is reported by CodSpeed, not gated by us)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(fixtures): add real fixtures for .51/.57/.70/.81/.91/.5 + drop deflated UID

Closed the previously-identified decode-coverage gaps by extracting real
encoded-pixel-data payloads from Cornerstone3D's testImages corpus. Each
fixture was pulled directly from a Cornerstone3D test DICOM (transfer
syntax is in the source filename), wrapped in DICOM encapsulation, decoded
through pydicom + pylibjpeg/pyjpegls, and visually compared against the
existing lossless references. Lossless variants produce identical pixel
statistics; lossy variants are within ±2 HU of the reference — confirming
extraction is intact. Preview PNGs in /tmp/codec-fixtures-preview/.

Fixtures added:
  packages/libjpeg-turbo-12bit/test/fixtures/jpeg/CT-512x512-12bit.jpg
    real .51 JPEG Extended (Process 2 & 4) 12-bit fixture — replaces the
    8-bit-decoder-rejecting jpeg400jfif.jpg that we were misusing.
  packages/charls/test/fixtures/CT-512x512-near-lossless.JLS
    real .81 JPEG-LS near-lossless (NEAR > 0) — first time we exercise
    that code path; all existing .JLS fixtures were NEAR=0.
  packages/openjpeg/test/fixtures/j2k/CT-512x512-lossy.j2k
    real .91 irreversible 9-7 J2K — replaces the 250-byte image.j2k
    placeholder as our only lossy J2K data point.
  packages/dicom-codec/test/fixtures/jpeg-lossless/CT-512x512-process14.jpll
  packages/dicom-codec/test/fixtures/jpeg-lossless/CT-512x512-process14-sv1.jpll
    real .57 and .70 JPEG Lossless (SOF3) — first fixtures for the
    jpeg-lossless-decoder-js dispatch path which had none.
  packages/dicom-codec/test/fixtures/rle/CT-512x512.rle
    real .5 RLE Lossless — first fixture for dicom-codec's internal
    rleLossless.js (the file with the `// untested!` comment).

Tests updated to use them:
  libjpeg-turbo-12bit: real CT decode test (replaces the
    "TODO: add fixture" placeholder), plus 8-bit-precision-guard test.
  charls:  near-lossless decode added.
  openjpeg: lossy 9-7 decode added.
  dicom-codec/integration: JPEG Lossless (.57/.70) and RLE Lossless (.5)
    paths exercised through the dispatcher.
  libjpeg-turbo-12bit/bench: now decodes the real 12-bit CT instead of
    only instantiating the decoder.

Half-implementation removed:
  Dropped "1.2.840.10008.1.2.1.99" (Deflated Explicit VR Little Endian)
  from codecsMap. Previously routed to the plain little-endian codec
  which doesn't deflate — actually-deflated payloads would silently
  decode to garbage. Now `hasCodec` returns false and decode/encode
  throw "unknown transfer syntax UID" for that UID. dispatch.test.js
  no longer asserts that UID is supported. Re-add the mapping once a
  real inflate step is plumbed in.

Still missing fixtures: HTJ2K Lossy (.202/.203) — Cornerstone3D's
testImages corpus doesn't ship these. Worth re-encoding CT1.RAW with
openjphjs's lossy encoder in a follow-up if we want CodSpeed numbers
for those paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bench): exercise every fixture/transfer-syntax via vitest bench

The bench files were stuck on a single fixture per package, leaving the
new ones (.51/.57/.70/.81/.91/.5) without CodSpeed coverage. Now every
fixture is benchmarked:

charls:
  decode CT1.JLS (.80 lossless)
  decode CT2.JLS (.80 lossless)
  decode CT-512x512-near-lossless.JLS (.81)            <-- new
  encode CT2.RAW lossless near=0

openjpeg:
  decode CT1.j2k (.90 lossless 5-3)
  decode CT2.j2k (.90 lossless 5-3)
  decode CT-512x512-lossy.j2k (.91 irreversible 9-7)   <-- new
  encode CT1.RAW lossless

openjphjs:
  decode CT1.j2c (.201 lossless)
  decode CT2.j2c (.201 lossless)                       <-- new
  encode CT1.RAW HTJ2K lossless

dicom-codec dispatch (covers every transfer syntax the dispatcher
supports — replaces the previous 4-syntax subset):
  .50  JPEG Baseline 8-bit
  .51  JPEG 12-bit Extended                            <-- new
  .57  JPEG Lossless Process 14                        <-- new
  .70  JPEG Lossless Process 14 SV1                    <-- new
  .80  JPEG-LS Lossless
  .81  JPEG-LS Near-Lossless                           <-- new
  .90  JPEG 2000 Lossless
  .91  JPEG 2000 Lossy                                 <-- new
  .201 HTJ2K Lossless
  .5   RLE Lossless                                    <-- new

libjpeg-turbo-8bit, libjpeg-turbo-12bit, little-endian, big-endian
already covered their fixtures fully — no changes needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): create empty dist/ before persist so build-dicom-codec succeeds

dicom-codec's build:ci is 'echo "Nothing to build" && exit 0' — it
doesn't produce a dist/ directory. persist_to_workspace then failed with
'The specified paths did not match any files in /root/repo', tanking
build-dicom-codec on pipeline #313 before any test could run.

mkdir -p the dist path right before persist so packages without build
output still satisfy the workspace contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: trigger run to validate CodSpeed connection

Repo just connected to CodSpeed dashboard. Re-running the pipeline so
codspeed-bench picks up CODSPEED_TOKEN and actually uploads the bench
matrix this time.

* ci: move PR build/test/bench to GitHub Actions; keep NPM publish on CircleCI

CodSpeed has first-class GHA support via CodSpeedHQ/action@v3 (installs
valgrind, wires up instrumentation, runs the bench, uploads, and posts
the sticky PR comment in one step). The equivalent on CircleCI required
installing valgrind + the codspeed CLI by hand and was strictly less
capable. Moving the PR pipeline to GHA so CodSpeed integration "just
works".

New workflow at .github/workflows/pr-checks.yml:
  detect-changes  - emits the list of packages changed vs main as a JSON
                    matrix output (mirrors CircleCI's --since main skip).
                    On a docs-only PR the list is empty and the
                    downstream jobs are skipped entirely.
  build           - matrix, one entry per changed package, runs in the
                    emscripten/emsdk:3.1.74 container (same image as
                    before). Persists packages/<pkg>/dist via
                    actions/upload-artifact.
  test            - matrix, downloads all dists, vitest run.
  codspeed-bench  - downloads all dists, runs CodSpeedHQ/action@v3 with
                    `run: yarn run bench`. CODSPEED_TOKEN comes from
                    repo secrets; the action posts a sticky PR comment.

Removed the equivalent build-package / test-package / codspeed-bench
jobs from .circleci/config.yml. Only NPM_PUBLISH (deploy on main)
stays on CircleCI for now, since its GH_TOKEN/NPM_TOKEN env vars and
release flow are already wired up there.

Required GHA repo secret: CODSPEED_TOKEN (from the cornerstonejs/codecs
CodSpeed project page).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: retrigger to pick up CODSPEED_TOKEN secret

* ci: adopt CodSpeed wizard improvements from PR #65

Pulled the integration tweaks from the codspeed-hq[bot] wizard PR
without taking its synthetic benchmarks (ours are richer — real DICOM
fixtures across .50/.51/.57/.70/.80/.81/.90/.91/.201/.5).

Changes:
- CodSpeedHQ/action@v3 → @v4
- Add OIDC auth: `permissions: id-token: write`, drop the
  `token: ${{ secrets.CODSPEED_TOKEN }}` arg. The action now
  authenticates via GitHub's OIDC provider — no secret to manage.
- Explicit `mode: simulation` on the CodSpeed step (deterministic CPU
  simulation, <1% variance, hardware-independent).
- Add `workflow_dispatch` trigger so CodSpeed can run a backtest from
  its dashboard once the repo is connected.
- Bump Node 18 → 22 in test + codspeed-bench jobs (matches the wizard's
  recommendation; our package engines are still >=18 so no impact).
- Add CodSpeed badge to README.

PR #65 can now be closed — its synthetic JS-only benches are subsumed
by ours, and these integration tweaks are now in our branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): unbrittle 12-bit assertions + use byteLength for dispatcher

Two test failures surfaced by the first GHA run on 0076b4f:

1. libjpeg-turbo-12bit/test/decode.test.js
   - frameInfo.bitsPerSample came back as 8, not 12. The wasm
     FrameInfo struct in this package reports bytes-per-sample for
     12-bit input rather than the JPEG precision marker. Drop the
     assertion — byteLength of the output buffer is sufficient to
     verify the decoder produced 16-bit-allocated data.
   - "throws on truncated input" failed because libjpeg-turbo's 12-bit
     code path silently returns partial output (unlike the 8-bit path
     which throws). Convert to it.todo so the behaviour is documented.

2. dicom-codec/test/integration.test.js
   - The JPEG Lossless dispatch path returns a Uint16Array (262 144
     elements × 2 bytes), not a Uint8Array like the other codecs.
     `.length === 524288` therefore fails — TypedArray.length is in
     elements, not bytes. Switch all dispatcher assertions to
     `.byteLength` which gives bytes uniformly across Uint8/Uint16/Int16
     return shapes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): libjpeg-turbo-12bit returns Uint16Array — use byteLength

Same shape mismatch as the dicom-codec dispatcher tests in the previous
commit: the 12-bit decoder's getDecodedBuffer() returns a 16-bit-element
typed array, so .length === 262 144 (elements), not 524 288 (bytes).
The other wasm decoders (8-bit libjpeg-turbo, charls, openjpeg,
openjphjs) all return Uint8ClampedArray and .length happens to equal
byteLength.

Switched to byteLength so the assertion is codec-shape-agnostic.

* fix(test): assert actual (buggy) bitsPerSample=8 with FIXME pointer

Earlier commit dropped the bitsPerSample assertion entirely, which
masked the underlying bug. Restoring an assertion against the *actual*
returned value (8) with a FIXME comment pointing at the line that
hard-codes it. When someone fixes the binding to read
cinfo.data_precision properly and start returning 12, this test will
loudly fail and force them to update it — which is the behaviour we
want for a known bug.

Decode itself is functionally correct (pixel values in 0..4095,
byteLength 524 288 for 16-bit-allocated 512x512). Only this metadata
reporting field is wrong.

Bug location: packages/libjpeg-turbo-12bit/src/JPEGDecoder.hpp:129

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): pin @codspeed/vitest-plugin to v4 + skip .51 transfer syntax

@codspeed/vitest-plugin@5 requires vitest>=3.2; we're on vitest 2.1.9,
so the plugin's bench hook never fired and CodSpeed reported "No
benchmarks found." Pin to ^4.0.1 (last line supporting vitest 2) — now
each bench emits an individual "[CodSpeed] ... done" line as expected.

Drop libjpeg-turbo-12bit (.51 transfer syntax) from test+bench while
keeping the build matrix entry: the upstream wasm binding hard-codes
bitsPerSample=8 and returns a half-sized Uint8 buffer, so the decode
test was asserting buggy behavior with a chain of FIXMEs. Cleaner to
skip it entirely than to ratchet the test against known-broken code.

* ci: empty commit to measure CodSpeed run-to-run variance

Same code as a5b8ee6 — re-running the bench pipeline so two CodSpeed
runs exist on identical sources, letting us diff each of the 29 benches
to see real (not advertised) variance under mode: simulation.

* perf(bench): split instantiation from decode/encode kernel measurement

The previous codec benches conflated three costs into one number:
constructor + buffer fill + decode/encode + destructor. CodSpeed runs
each bench body exactly once under Cachegrind, so the per-iteration
construct/destroy cycle was sitting inside the measurement window —
and the variance analysis on a5b8ee6..fa2db51 showed exactly that
mixing was the source of the 18–110% run-to-run spread on
openjpeg encode CT1.RAW, openjpeg decode CT1.j2k, openjphjs encode
CT1.RAW, and libjpeg decode jpeg400jfif (the only four noisy benches
in the suite).

Refactor: hoist new Decoder()/Encoder() + getEncodedBuffer().set() /
getDecodedBuffer().set() / setNearLossless() to module scope, one
instance per fixture (wasm decoders advance internal state on
decode() and can't be reused across multiple bench bodies). Each
bench body is now a single decoder.decode() or encoder.encode() call.

Two new "instantiate+destroy" benches per codec measure the lifecycle
cost in isolation — that's where the noise was hiding and now it
shows up as its own number rather than smearing into the kernel
measurement.

Net effect: the 4 noisy benches should drop to <1% spread (matching
the dispatcher benches which already cache codec instances), and the
lifecycle cost gets its own signal line that catches regressions in
wasm setup paths.

Affected packages: charls, libjpeg-turbo-8bit, openjpeg, openjphjs.
big-endian/little-endian have no decoder class and dicom-codec's
dispatch.bench.js already caches codec instances via runProcess →
initialize → codecConfig.codec, so neither needs the same change.

* perf(bench): switch codec benches to shared decoder + warmup pattern

The per-fixture refactor in 6f97516 already collapsed the noisy benches
80% (HTJ2K encode 172ms → 35ms, openjphjs decode 143ms → 38ms) by
pinning each decoder/encoder at a stable wasm heap address. This commit
goes one step further: a single shared decoder per codec, warmed up at
module scope (untimed) so every measured bench body sees a fully hot
decoder.

This mirrors how a real app uses these libraries — one Decoder/Encoder
instance fed many frames in a hot loop, not a fresh instance per frame.
Each per-fixture bench body now does what production does: refill the
encoded buffer with the current frame's bytes and call decode(). The
getEncodedBuffer(len).set(bytes) memcpy is a few KB and small relative
to the decode work; including it is honest because every real decode
call has to put bytes into wasm memory.

The instantiate+destroy benches remain — they're now the only place
the lifecycle cost shows up, which is precisely the per-codec signal
you want when tuning startup/teardown.

Affected packages: charls, libjpeg-turbo-8bit, openjpeg, openjphjs.
big-endian/little-endian have no Decoder class (pure-JS functions).
dicom-codec dispatcher already caches codec instances via
runProcess → initialize → codecConfig.codec, so its benches are
already in the "warm" regime once each UID has been hit once.

* perf(bench): split each codec bench into cold and warm variants

Adds an explicit cold-vs-warm bench pair per fixture, surfacing the
two distinct production regimes:

  cold = fresh decoder/encoder instance, the bench body is the FIRST
         decode/encode call on it. Models frame 1 of a worker session.
         cornerstone3D's decodeImageFrameWorker.js does no warmup, so
         frame 1 in each worker pays this cost.

  warm = shared decoder/encoder pre-warmed with 5 untimed iterations
         at module load. The bench body is the 6th+ call. Models
         frames 2..N — the dominant production case for stack
         scrolling, since decodeJPEG2000.ts:68, decodeJPEGLS.ts:73,
         and decodeJPEGBaseline8Bit.ts:61 all cache the decoder on
         local.decoder and reuse it for every frame.

Bench bodies are symmetric (identical code shape) between cold and
warm variants — the only difference is module-load state, so the
cold/warm delta cleanly isolates first-call setup cost (wasm heap
grow, page-touch, V8 wasm tier-up) from steady-state kernel time.

Warmup uses the LARGEST fixture per codec (CT1 in every case) so the
warm decoder's internal buffers never need to regrow when smaller
fixtures run their warm benches. Verified sizes:
- charls: CT1.JLS 164KB > .81 119KB > CT2.JLS 115KB
- openjpeg: CT1.j2k 174KB > .91-lossy 137KB > CT2.j2k 119KB
- openjphjs: CT1.j2c 185KB > CT2.j2c 128KB

Caveat documented in openjphjs/bench/decode.bench.js: cornerstone3D's
decodeHTJ2K.ts:69 actually creates a fresh HTJ2KDecoder every frame
(reuse is "much slower for some reason" per the comment in that file),
so for HTJ2K specifically the production approximation is
instantiate+destroy + decode-cold, not the warm number. The warm
bench remains useful for regression detection on the openjph kernel
itself.

Bench count goes from 37 to 50.

* docs(bench): explain the CPU-simulation measurement model

Adds BENCHMARKING.md at the repo root covering:
  - why we use mode: simulation (Cachegrind) instead of walltime
  - how the modeled instruction-time numbers relate to real
    wall-clock (JS loops inflate 30-100x, wasm decode 5-15x)
  - how to read each of the four bench types we emit per codec
    (instantiate+destroy, cold, warm, dispatcher integration)
  - what the dashboard warnings mean and which to ignore
    (syscall-attribution blowup, anonymous V8 JIT frames, V8
    tier-up artifacts on instantiation benches)
  - run-to-run variance characteristics observed across three
    identical-source runs
  - how to add a new bench

Cross-references the doc from:
  - README.md (new Benchmarking section, before Codec Package
    Anatomy) with a TL;DR pointing at BENCHMARKING.md
  - .github/workflows/pr-checks.yml (expanded comment above the
    `mode: simulation` line explaining why we chose it and what
    the trade-offs are)

No behaviour change; pure documentation of what we built up in
commits a5b8ee6..b4b14d1.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant