A narrow, reproducible experiment: take one realistic blocking Spring Boot 4 service, flip
it from platform threads to virtual threads, and measure exactly what changes — throughput, tail
latency, and where the bottleneck moves (the Tomcat thread pool → the HikariCP connection
pool). The rig also carries a synchronized request path and JFR pinning capture to let you
verify the JEP 491 change on Java 25 yourself (the pre-Java-24 "avoid synchronized" advice
is now out of date).
This repo backs the Tucanoo article "Virtual Threads in Spring Boot 4: I Rewrote a Blocking
Service and Measured Everything". The raw results in results/ and the chart script
in charts/ are the reproducibility evidence: clone it, run it on your own hardware,
and tell us what you get.
Honest framing up front: single host, loopback networking. Absolute latencies are optimistic versus production. The claims here are the relative platform-vs-virtual delta and the bottleneck shift — both hold on loopback. See Caveats.
| Concurrency | Platform threads | Virtual threads |
|---|---|---|
| 500 VUs | ~1,996 req/s · p99 ~1.3 s | ~5,262 req/s · p99 ~127 ms |
| 2,500 VUs | ~2,226 req/s · p99 ~2.8 s | ~13,673 req/s · p99 ~690 ms |
| 10,000 VUs | ~2,607 req/s · p99 ~6.6 s | ~10,091 req/s · p99 ~3.4 s* |
Platform threads plateau at the ~2,000–2,600 req/s Tomcat 200-thread wall; virtual threads scale to ~6.1× that at 2,500 concurrent users, from a single configuration flag. (*10,000 VUs is a closed-model overload point, not steady state — see Caveats.)
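
The plateau can be reproduced in miniature without Spring or Tomcat. The sketch below is not the repo's code: it runs the same batch of blocking tasks (a `Thread.sleep` standing in for the ~100 ms downstream call) first on a small fixed pool of platform threads, then on a virtual-thread-per-task executor. The class name `ThreadWallDemo` and the sizes are illustrative assumptions, and it needs JDK 21 or later.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadWallDemo {

    /** Submits `tasks` blocking sleeps to the executor and returns the wall-clock time in ms. */
    static long runBatch(ExecutorService executor, int tasks, Duration blockFor) {
        long start = System.nanoTime();
        try (executor) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(blockFor); // stands in for the blocking downstream call
                    return null;
                });
            }
        } // close() blocks until every submitted task has finished
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int tasks = 200;                          // illustrative, not the benchmark's load
        Duration block = Duration.ofMillis(100);
        long platformMs = runBatch(Executors.newFixedThreadPool(20), tasks, block);
        long virtualMs = runBatch(Executors.newVirtualThreadPerTaskExecutor(), tasks, block);
        System.out.printf("fixed pool of 20: %d ms, virtual threads: %d ms%n", platformMs, virtualMs);
    }
}
```

The fixed pool has to work through the batch in waves of 20, so its elapsed time grows with the number of tasks, while the virtual-thread run stays close to a single sleep. That is the same shape as the table above, at toy scale.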
And the bottleneck-shift experiment: with the connection held across the downstream call, the HikariCP pool becomes a hard throughput ceiling. Platform threads flatline at their 200-thread wall; virtual threads ride the pool.
| Component | Details |
|---|---|
| Host | Intel Core i9-9900K @ 3.6 GHz |
| Cores | 8 physical / 16 logical (Hyper-Threading on), Coffee Lake — homogeneous (no P/E-core split) |
| RAM | 32 GB |
| OS | Linux (WSL2, Ubuntu 24.04) on Windows 10 — .wslconfig: processors=16, memory=16GB |
| JDK | Temurin 25.0.3 LTS (/usr/lib/jvm/temurin-25-jdk-amd64) — includes JEP 491 |
| Spring Boot | 4.0.7 (Tomcat 11 MVC, not WebFlux) · Servlet 6.1 |
| DB | PostgreSQL 16, dedicated standalone instance on port 5544, through HikariCP |
| Downstream stub | WebFlux/Netty stub (stub-app/), non-blocking Mono.delayElement(100ms) |
| Load generator | k6 v2.0.0 (tools/k6/k6), closed-model ramping-vus |
Each tier is pinned to its own CPUs with taskset. On Linux availableProcessors() auto-tracks
the affinity mask, so the carrier (ForkJoinPool) size follows the App tier's cores automatically —
no -XX:ActiveProcessorCount hack is needed. HT siblings are adjacent pairs on this box
(verified via lscpu -e: physical core N = logical CPUs 2N, 2N+1), so every tier owns whole
physical cores and no two tiers share one:
| Tier | Logical CPUs (taskset) | Physical cores | Notes |
|---|---|---|---|
| App (SUT) | 0-7 | 0–3 (4 cores) | kept generous so it stays HikariCP-limited, not CPU-pegged (peaks ~80%) |
| PostgreSQL | 8-9 | 4 (1 core) | light load (~0.5 core at peak) |
| Load gen + stub | 10-15 | 5–7 (3 cores) | k6 + WebFlux stub; peaks ~74%, never the limiter |
The core map lives at the top of bench/lib.sh (CPUS_APP / CPUS_PG / CPUS_LOAD).
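
To see what the JVM actually picks up from the affinity mask, a tiny check like the following can be run under `taskset`. It is a sketch, not part of the harness: the class name `CarrierSizeCheck` is made up here, and `jdk.virtualThreadScheduler.parallelism` is the JDK system property that overrides the default carrier parallelism when it is set.

```java
final class CarrierSizeCheck {

    /** CPU count visible to the JVM; on Linux this reflects the process affinity mask. */
    static int visibleCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Explicit scheduler override, or null when the JDK default applies. */
    static String parallelismOverride() {
        return System.getProperty("jdk.virtualThreadScheduler.parallelism");
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors = " + visibleCpus());
        String override = parallelismOverride();
        System.out.println("jdk.virtualThreadScheduler.parallelism = "
                + (override == null ? "(not set, default follows availableProcessors)" : override));
    }
}
```

Launched as `taskset -c 0-7 java CarrierSizeCheck.java`, it should report the eight logical CPUs of the App tier on the box described above; container or cgroup limits, if present, can lower the number further.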
The application code is identical across every run. Only environment variables change, all set
by the harness and reported live at GET /api/runtime-info:
| Env var | Values | What it controls |
|---|---|---|
| VT_ENABLED | false / true | platform vs virtual threads (spring.threads.virtual.enabled) |
| HIKARI_MAX_POOL | 10, 50, 100, … | HikariCP maximum-pool-size (fixed-size: min = max) |
| CONN_MODE | released / held | when the DB connection is returned (see below) |
| PATH_MODE | normal / synchronized | striped-synchronized variant for the JEP 491 pinning check |
The system under test (app/) is one endpoint, GET /api/order-summary/{id}: a
HikariCP/JDBC primary-key lookup (orders joined to customers, 1M rows seeded) plus a
blocking RestClient call to the stub (~100 ms). It combines both into JSON.
Two request paths drive the two experiments:
- released (default, the hero path) returns the connection after the sub-millisecond query, before the downstream call → the pool governs latency, not throughput.
- held wraps the query + downstream call in one TransactionTemplate so the connection stays checked out across the 100 ms call (the real-world "transaction spans an external HTTP call" anti-pattern) → the pool is a hard throughput ceiling. This is the "your bottleneck just moved" demo.
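
The difference between the two paths can be sketched without a database. In the toy model below, which is not the repo's code, a `Semaphore` stands in for the HikariCP pool and a sleep stands in for the downstream call; `PoolCeilingDemo` and its parameters are hypothetical names. A rough upper bound follows from Little's law: if each connection is held for about 100 ms, a pool of 50 can complete at most about 50 / 0.1 s = 500 requests per second, however many virtual threads are waiting. That figure is a back-of-envelope estimate, not a measured result from this repo.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

final class PoolCeilingDemo {

    /** Runs `requests` simulated requests on virtual threads; returns wall-clock ms. */
    static long run(int requests, int poolSize, boolean held, Duration downstream) {
        Semaphore pool = new Semaphore(poolSize); // stand-in for the connection pool
        long start = System.nanoTime();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < requests; i++) {
                executor.submit(() -> {
                    pool.acquire();                          // "check out a connection"
                    try {
                        // the sub-millisecond query would run here
                        if (held) Thread.sleep(downstream);  // connection held across the call
                    } finally {
                        pool.release();                      // "return the connection"
                    }
                    if (!held) Thread.sleep(downstream);     // call made after the release
                    return null;
                });
            }
        } // close() waits for every task
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Duration call = Duration.ofMillis(100);
        System.out.println("released: " + run(200, 10, false, call) + " ms");
        System.out.println("held:     " + run(200, 10, true, call) + " ms");
    }
}
```

In the held case the 200 requests pass through the 10 permits in about 20 waves, so the pool size sets the pace; in the released case the permits are returned almost immediately and the downstream sleeps overlap.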
On Linux (or WSL2). Prerequisites, with the paths the harness expects (override at the top of
bench/lib.sh if yours differ):
- Temurin 25 (or any JDK 25 with JEP 491) at /usr/lib/jvm/temurin-25-jdk-amd64
- PostgreSQL 16 binaries at /usr/lib/postgresql/16/bin (the harness runs its own standalone instance on port 5544 with .pgdata/; it does not touch any system cluster)
- k6 v2.0.0 at tools/k6/k6 (fetch the static Linux binary into that path)
# 0. Clone
git clone https://github.com/tucanoo/springboot-virtualthreads.git
cd springboot-virtualthreads
# 1. Build the two jars (Spring Boot SUT + the WebFlux stub).
# The Maven wrapper lives in app/; build the stub with it via -f.
export JAVA_HOME=/usr/lib/jvm/temurin-25-jdk-amd64
( cd app && ./mvnw -B -DskipTests package )
( cd app && ./mvnw -B -DskipTests -f ../stub-app/pom.xml package )
# 2. Run the article dataset (~50 min): hero (released) + bottleneck (held), 3 reps each.
# Starts Postgres (+seed on first run), the stub and the app; pins every tier; samples
# per-tier CPU; appends results/matrix.csv; tears everything down at the end.
bash bench/run-lean.sh
# 3. Render the figures from the CSV
python -m venv charts/.venv && source charts/.venv/bin/activate
pip install -r charts/requirements.txt
python charts/make_charts.py results/matrix.csv # -> charts/out/figA_*, figB_*
run-lean.sh is just two run-matrix.sh invocations. To drive the matrix directly:
# Hero (released): pool fixed, concurrency swept
bash bench/run-matrix.sh --conn-mode released --pools 50 \
--conc 100,500,1000,2500,5000,10000 --reps 3 --ramp 8 --warmup 10 --hold 20
# Bottleneck (held): pool swept at fixed concurrency
bash bench/run-matrix.sh --conn-mode held \
--pools 50,100,200,400,800 --conc 2000 --reps 3 --ramp 8 --warmup 10 --hold 20
Flags: --modes platform,virtual · --pools · --conc · --reps · --ramp/--warmup/--hold
(seconds) · --conn-mode released|held · --path-mode normal|synchronized · --keep. The
5-rep full matrix is bench/run-full.sh (several hours). The seeded DB
persists in .pgdata/ across runs; force a reseed with RESEED=1.
Note: the author develops on Windows and executes the rig inside WSL2; that host-specific sync workflow is not required to reproduce — on a native Linux box the three steps above are the whole story.
results/matrix.csv is the article dataset: 66 rows, both thread modes,
3 reps, collected in one clean run. Each row is one completed scenario (23 columns):
ts_iso, thread_mode, hikari_pool, path_mode, conn_mode, concurrency, rep,
throughput_rps, p50_ms, p95_ms, p99_ms, p999_ms, max_ms, error_rate, http_reqs,
app_cpu_pct, pg_cpu_pct, load_cpu_pct, hikari_active_max, hikari_pending_max,
jvm_threads_max, rss_mb_max, pinned_events
The conn_mode column separates the two experiments (released = hero, held = bottleneck).
Per-tier CPU is logged on every run precisely so you can confirm the load generator and Postgres
were never saturated — a saturated generator masquerading as the app topping out is the single
biggest benchmark failure mode, and the data lets you rule it out.
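
For a quick look at the CSV without the Python chart script, the rows can be grouped by hand. The following is a minimal sketch under stated assumptions: the file has a header row with the column names listed above, fields contain no quoted commas, and the class name `MatrixSummary` and the `thread_mode@concurrency` key format are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MatrixSummary {

    /** Median throughput_rps per "thread_mode@concurrency" for one conn_mode value. */
    static Map<String, Double> medianThroughput(List<String> csvLines, String connMode) {
        List<String> header = Arrays.asList(csvLines.get(0).split(","));
        int mode = header.indexOf("thread_mode");
        int conn = header.indexOf("conn_mode");
        int conc = header.indexOf("concurrency");
        int rps = header.indexOf("throughput_rps");

        // Collect the per-rep throughput values of every (thread_mode, concurrency) cell.
        Map<String, List<Double>> groups = new TreeMap<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            if (line.isBlank()) continue;
            String[] f = line.split(",");
            if (!f[conn].equals(connMode)) continue;
            groups.computeIfAbsent(f[mode] + "@" + f[conc], k -> new ArrayList<>())
                  .add(Double.parseDouble(f[rps]));
        }

        Map<String, Double> medians = new TreeMap<>();
        groups.forEach((key, values) -> {
            Collections.sort(values);
            int n = values.size();
            double median = (n % 2 == 1)
                    ? values.get(n / 2)
                    : (values.get(n / 2 - 1) + values.get(n / 2)) / 2.0;
            medians.put(key, median);
        });
        return medians;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = java.nio.file.Files.readAllLines(java.nio.file.Path.of(args[0]));
        medianThroughput(lines, "released")
                .forEach((k, v) -> System.out.printf("%-16s %.0f req/s%n", k, v));
    }
}
```

Run it as `java MatrixSummary.java results/matrix.csv`. The same grouping works for any other column, for example `load_cpu_pct`, when checking that the load generator was never the limiter.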
We'd rather you trust the numbers than be impressed by them, so:
- Single-host loopback: absolute latencies are optimistic; the claims are the relative platform-vs-virtual delta and the bottleneck shift, both of which hold on loopback.
- WSL2, not bare metal: the cpuset pinning is real within the VM, but the VM shares the physical machine with the Windows host. For bare-metal isolation, run the same harness on native Linux; the core map is unchanged.
- 3 reps, not 5: the shipped lean dataset trades reps for turnaround. Fine for effects this large, but stated honestly; run-full.sh does 5 reps if you want tighter variance bars.
- The 10,000-VU rows are a closed-model overload cliff (throughput drops, p99 ≈ 3–4 s, a fraction of a percent of errors): a legitimate data point, but label it overloaded, not steady state.
- Low-concurrency p99 spikes: a recurring ~1.2 s p99 blip at low load in some reps (JIT/GC/cold-connection warm-up) inflates low-concurrency p99; read median/p95 there.
- The held experiment stays at c2000: at small pools, much higher concurrency blows past HikariCP's 10 s connection-timeout and the rows become timeout-dominated rather than clean throughput.
- Pinning: JEP 491 makes the expected jdk.VirtualThreadPinned count on a synchronized hot path on Java 25 ≈ 0. The rig wires the synchronized path and JFR capture so you can confirm this; the shipped dataset focuses on throughput/latency/bottleneck.
- Numbers are i9-9900K-specific. Different CPU counts change the carrier-pool size and move these curves, which is exactly why reproducing on your own hardware is interesting.
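
For the pinning check, a captured recording can be inspected with the JDK's own JFR consumer API. The sketch below is independent of the rig's JFR capture: it simply counts `jdk.VirtualThreadPinned` events in whatever `.jfr` file it is given. The class name `PinnedEventCounter` is hypothetical, and the file path is supplied by the caller.

```java
import java.io.IOException;
import java.nio.file.Path;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

final class PinnedEventCounter {

    /** Counts jdk.VirtualThreadPinned events in a JFR recording file. */
    static long countPinned(Path jfrFile) throws IOException {
        long count = 0;
        try (RecordingFile recording = new RecordingFile(jfrFile)) {
            while (recording.hasMoreEvents()) {
                RecordedEvent event = recording.readEvent();
                if ("jdk.VirtualThreadPinned".equals(event.getEventType().getName())) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("jdk.VirtualThreadPinned events: " + countPinned(Path.of(args[0])));
    }
}
```

On a JDK that includes JEP 491, a run over the synchronized path is expected to print a count at or near zero; the same workload on JDK 21 to 23 would be the comparison point if you want to see the older behaviour.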
app/ Spring Boot 4 SUT — one blocking endpoint (RestClient -> stub + JDBC -> Postgres)
stub-app/ WebFlux/Netty downstream stub (non-blocking, fixed 100ms delay)
load/ k6 scenario (closed-model ramping-vus) + profiles
bench/ Bash harness: lib.sh (config + helpers), run-matrix.sh, run-lean.sh, run-full.sh
sql/ schema.sql + seed.sql (100k customers / 1M orders via generate_series)
results/ raw matrix.csv (committed = reproducibility)
charts/ make_charts.py -> hero + bottleneck figures (charts/out/)
tools/ portable k6 binary
Legacy artifacts: the bench/*.ps1 scripts, bench/CpuProbe.java / bench/OutboundProbe.java, and the stub/ (WireMock) directory are from an earlier Windows-native attempt that hit a loopback ceiling. The Linux/WSL2 bash harness above is the rig that produced the published data; the PowerShell files are kept for history and are not required.
Run it, fork it, and re-plot the raw CSVs however you like. If you reproduce the benchmark on different hardware (a cloud instance, a bigger box, native Linux, ARM), we'd genuinely like to hear what you get. Read the full write-up and methodology in the companion article, "Virtual Threads in Spring Boot 4: I Rewrote a Blocking Service and Measured Everything".

