perf(vanilla-io_uring): converge onto the epoll twin: zero-alloc rendering, /pipeline skip-decode, crud slab (impl vanilla#84, #85) #965
Open
enghitalo wants to merge 1 commit into
…ering, /pipeline skip-decode, crud slab)

Port every backend-agnostic optimization from the vanilla-epoll entry so the two share one audited set of response builders and diff cleanly. The io_uring backend supports only a stateless request_handler (no async_handler / make_state / TLS, enghitalo/vanilla#83), so DB access stays on the blocking db.pg client; everything else now matches epoll byte-for-byte.

Implements enghitalo/vanilla#84 (zero-alloc int parse/format) and enghitalo/vanilla#85 (crud: 1-query list, byte-rendered GET, fast body parse):

- wi: negative-aware (fixes a latent wrong body for a negative /baseline11 sum)
- emit / emit_int (stack scratch) / emit_xcache: zero-alloc response framing; /baseline11 and /upload no longer allocate an int->string per request
- /pipeline: skip-decode fast path (blit the const before parsing) + decode_into (no Result boxing) on the main parse path
- render_item_pg: byte-level JSON straight from db.pg text rows; removes the per-request json.encode reflection on /async-db, /crud list, /crud GET
- crud cache: id-indexed slab (replaces map[int]string) with in-place buffer reuse and cache-aside invalidation, shared across ring workers under RwMutex
- crud_list: single windowed query (count(*) OVER()) instead of page + count(*)
- parse_crud_body_fast + borrowed json field parsers (json.decode fallback kept)
- parse_i64_slice / dechunk_into / parse_hex_slice: allocation-free parsing
- static: sendfile_min_bytes=16KiB, matching epoll (bounds per-conn RSS at high conns)

DB profiles remain capped by the blocking db.pg on the single ring worker (enghitalo/vanilla#83); unchanged here by design.

Validated: both images build; every route (pipeline, baseline +/-, upload, json, json-comp, async-db, fortunes, static, crud list/get/create/update, 404, json-tls) is byte-for-byte identical to vanilla-epoll against a pristine seeded Postgres, and the X-Cache MISS->HIT->re-MISS-after-PUT sequence holds.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor, Author:
/benchmark -f vanilla-io_uring

Contributor:
👋

Contributor:
Benchmark Results
Framework:
Full log
What & why

vanilla-io_uring was an under-optimized copy of its vanilla-epoll twin: same handlers, but allocating throwaway strings per request, using json.encode / json.decode reflection on the DB paths, and parsing the full request even for the fixed /pipeline blit. This PR ports every backend-agnostic optimization from the epoll entry so the two share one audited set of response builders and diff cleanly, which is the stated goal of keeping the two entries easy to maintain in lock-step.

The io_uring backend of the vanilla library supports only a stateless request_handler (no async_handler / make_state / TLS; see enghitalo/vanilla#83, enghitalo/vanilla#93), so DB access necessarily stays on the blocking db.pg client. Everything that does not require the async runtime or per-worker state now matches epoll byte-for-byte.

Implements enghitalo/vanilla#84 (zero-alloc int parse/format) and enghitalo/vanilla#85 (crud: 1-query list, byte-rendered GET, fast crud-body parse).
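To make "zero-alloc int parse/format" concrete, the following is a rough sketch in Go rather than the entry's V source. It assumes the technique is a fixed-size stack scratch buffer for formatting and digit-by-digit parsing straight from the request bytes; `appendInt` and `parseInt64` are hypothetical names, and overflow checking is left out of the parser.

```go
package main

import "fmt"

// appendInt writes the decimal form of n onto dst using a fixed-size
// scratch array, so no intermediate string is created.
func appendInt(dst []byte, n int64) []byte {
	var scratch [20]byte // enough for "-9223372036854775808"
	i := len(scratch)
	// Work on the unsigned magnitude so the minimum int64 does not overflow.
	u := uint64(n)
	if n < 0 {
		u = -u
	}
	for {
		i--
		scratch[i] = byte('0' + u%10)
		u /= 10
		if u == 0 {
			break
		}
	}
	if n < 0 {
		i--
		scratch[i] = '-'
	}
	return append(dst, scratch[i:]...)
}

// parseInt64 reads an optionally signed decimal integer directly from b.
// It reports false on empty or non-digit input; overflow is not checked
// in this sketch.
func parseInt64(b []byte) (int64, bool) {
	neg := false
	if len(b) > 0 && b[0] == '-' {
		neg = true
		b = b[1:]
	}
	if len(b) == 0 {
		return 0, false
	}
	var v int64
	for _, c := range b {
		if c < '0' || c > '9' {
			return 0, false
		}
		v = v*10 + int64(c-'0')
	}
	if neg {
		v = -v
	}
	return v, true
}

func main() {
	a, _ := parseInt64([]byte("-10"))
	b, _ := parseInt64([]byte("3"))
	out := appendInt(make([]byte, 0, 32), a+b)
	fmt.Println(string(out)) // prints "-7"
}
```

The sign is handled once, outside the digit loop, so a negative sum is formatted the same way as a positive one with a leading minus.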
Changes (all entry-only, no lib change)

- wi is now negative-aware: fixes a latent wrong body for a negative /baseline11 sum (a=-10&b=3 returned garbage; now -7).
- emit / emit_int (stack scratch) / emit_xcache: zero-alloc response framing. /baseline11 and /upload no longer allocate an int -> string per request (the highest-RPS non-DB profiles).
- /pipeline skip-decode fast path: blit the constant before any parsing; decode_into (no !HttpRequest Result boxing) on the main parse path.
- render_item_pg: byte-level JSON straight from db.pg text rows, removing the per-request json.encode reflection on /async-db, /crud list and /crud GET.
- crud cache: id-indexed slab (replaces map[int]string) with in-place buffer reuse and cache-aside invalidation, shared across ring workers under RwMutex; identical structure to the epoll twin.
- crud_list uses a single windowed query (count(*) OVER()) instead of a page SELECT + a separate count(*).
- parse_crud_body_fast + borrowed JSON field parsers (with the json.decode fallback kept for escaped bodies).
- parse_i64_slice / dechunk_into / parse_hex_slice: allocation-free query/body parsing (qint no longer materializes a string per param).
- static: sendfile_min_bytes = 16 KiB, matching epoll (bounds per-connection RSS at high conn counts).

Net: the non-DB hot paths (pipeline / baseline / upload / json / json-comp / static) are now zero-alloc under the default GC, which should also pull io_uring's steady-state memory down toward epoll's (baseline was ~1.5 GiB vs epoll's ~78 MiB, dominated by the per-request sum.str() / json.encode churn this PR removes).

What is intentionally NOT changed
DB profiles (fortunes, async-db, api-*, crud) stay capped by the blocking db.pg on the single ring worker: the io_uring backend has no async runtime to await DB readiness on the ring. Tracked in enghitalo/vanilla#83 (async runtime) and enghitalo/vanilla#93 (per-worker state). Those are the only remaining divergences from the epoll twin.

Validation

- Both images build (v -prod -d vanilla_tls).
- Routes checked against a pristine seeded Postgres: pipeline, baseline11 (positive and negative), upload, json, json-comp, async-db, fortunes, static (br negotiation), crud list, crud GET (MISS then HIT), crud create, crud update, 404, and json-tls; all 17 byte-for-byte identical to vanilla-epoll.
- X-Cache: GET MISS → HIT, then re-MISS after a PUT (slab invalidation).
- POST → 201.
- json-comp → Content-Encoding: gzip.
- json-tls → 200 over TLS 1.3.

🤖 Generated with Claude Code
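The MISS, HIT, re-MISS sequence checked during validation is what a cache-aside slab would be expected to produce. Below is a minimal sketch of that structure in Go, not the entry's V code: it assumes item ids are small non-negative integers used directly as slice indices, and `Slab`, `slot`, `NewSlab`, and `lookup` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// slot holds one cached body; buf keeps its capacity across refills.
type slot struct {
	buf   []byte
	valid bool
}

// Slab is an id-indexed cache: the item id is the index into slots.
type Slab struct {
	mu    sync.RWMutex
	slots []slot
}

// NewSlab sizes the slab once; it is never resized afterwards.
func NewSlab(maxID int) *Slab {
	return &Slab{slots: make([]slot, maxID+1)}
}

// Get appends the cached body for id to dst and reports whether it was a hit.
func (s *Slab) Get(id int, dst []byte) ([]byte, bool) {
	if id < 0 || id >= len(s.slots) {
		return dst, false
	}
	s.mu.RLock()
	defer s.mu.RUnlock()
	if !s.slots[id].valid {
		return dst, false
	}
	return append(dst, s.slots[id].buf...), true
}

// Put stores body for id, overwriting the slot's existing buffer in place.
func (s *Slab) Put(id int, body []byte) {
	if id < 0 || id >= len(s.slots) {
		return
	}
	s.mu.Lock()
	s.slots[id].buf = append(s.slots[id].buf[:0], body...)
	s.slots[id].valid = true
	s.mu.Unlock()
}

// Invalidate marks id stale, as a write to that item would.
func (s *Slab) Invalidate(id int) {
	if id < 0 || id >= len(s.slots) {
		return
	}
	s.mu.Lock()
	s.slots[id].valid = false
	s.mu.Unlock()
}

// lookup is the cache-aside read: try the slab, else load and fill it.
func lookup(s *Slab, id int, load func(int) []byte) string {
	if _, hit := s.Get(id, nil); hit {
		return "HIT"
	}
	s.Put(id, load(id))
	return "MISS"
}

func main() {
	s := NewSlab(16)
	load := func(id int) []byte { return []byte(fmt.Sprintf(`{"id":%d}`, id)) }
	fmt.Println(lookup(s, 7, load)) // prints MISS
	fmt.Println(lookup(s, 7, load)) // prints HIT
	s.Invalidate(7)                 // what an update to item 7 would trigger
	fmt.Println(lookup(s, 7, load)) // prints MISS
}
```

Indexing by id avoids hashing, and resetting the slice length to zero before refilling lets a slot reuse its earlier allocation when the new body fits.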