pyronova: bump to v2.3.1, unlock sub-interp DB bridge + size GIL pool #623
Open
ddxd wants to merge 4 commits into MDA2AV:main
Changes:
- PYRONOVA_REF: v2.0.2 -> v2.1.5
- app.py: add warning-level logging on benchmark-path errors; fix CRUD
endpoint paths/response shape to match aspnet-minimal reference;
restore gil=True on async-db (PG_POOL lives on main interp); widen
upload limit to 25MB
- launcher.py: NUMA-aware io_workers cap (avoid oversubscription on
multi-socket boxes)
- meta.json: subscribe to crud + unary-grpc{,-tls} + api-{4,16}
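
A minimal sketch of what a NUMA-aware `io_workers` cap could look like. The actual `launcher.py` logic is not shown in this PR text, so both function names and the "cap at one node's CPU count" policy are assumptions; only the cpulist string format (as exposed by Linux under `/sys/devices/system/node/node*/cpulist`) is standard.

```python
def parse_cpulist(cpulist: str) -> int:
    """Count the CPUs in a Linux cpulist string such as "0-15,32-47"."""
    count = 0
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            count += int(last) - int(first) + 1
        else:
            count += 1
    return count


def cap_io_workers(requested: int, node_cpulist: str) -> int:
    """Hypothetical policy: never run more io workers than one node has CPUs."""
    return max(1, min(requested, parse_cpulist(node_cpulist)))


if __name__ == "__main__":
    # A launcher would read the string from sysfs; here it is passed in directly.
    print(cap_io_workers(64, "0-15,32-47"))
```

Keeping the parsing separate from the policy makes it easy to swap in a different cap (for example, a fraction of the node) without touching the sysfs handling.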
v2.1.5 highlights (see moomoo-tech/pyronova CHANGELOG):
- Per-worker sharded channels replacing the single MPMC crossbeam queue
(eliminates cross-CCD cache-line bouncing on AMD multi-CCD boxes)
- TCP_DEFER_ACCEPT, slowloris/HPP hardening, Py_SETREF correctness fix
- In-flight-aware P2C load balancing
- HOL body streaming + bounded WS channel
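
For readers unfamiliar with P2C ("power of two choices"), the idea behind the in-flight-aware variant listed above can be sketched as follows. This is a generic illustration, not Pyronova's Rust implementation; `pick_worker` and the `in_flight` list are hypothetical names.

```python
import random
from typing import Optional, Sequence


def pick_worker(in_flight: Sequence[int], rng: Optional[random.Random] = None) -> int:
    """Sample two distinct workers at random and return the less-loaded one."""
    if not in_flight:
        raise ValueError("no workers available")
    if len(in_flight) == 1:
        return 0
    source = rng if rng is not None else random
    a, b = source.sample(range(len(in_flight)), 2)
    # Ties go to the first sampled worker.
    return a if in_flight[a] <= in_flight[b] else b


if __name__ == "__main__":
    counters = [0, 0, 0, 0]
    for _ in range(1000):
        worker = pick_worker(counters)
        counters[worker] += 1  # a real server would decrement on completion
    print(counters)
```

Comparing only two candidates keeps the choice O(1) per request while still steering traffic away from workers that are already busy.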
Arena's validate.sh reads `X-Cache` (MISS/HIT) via
`curl | grep ^x-cache:` under `set -o pipefail`. Without the header
the pipeline fails silently and terminates the whole script before
fail_with_link has a chance to print anything — exactly what we saw on
the last CI run ("PASS [GET /crud/items/1]" → cleanup → exit 1, no
diagnostic in between).
Fix: wrap every crud_get_one return path in Response with the
appropriate X-Cache header (MISS on first fetch / error / 404, HIT on
cache-aside return). Cache now stores a pre-serialized JSON string so
the HIT path skips json.dumps on every hit.
No behavior change for any other profile.
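
The shape of that fix can be sketched in plain Python. This is an illustration only: the real handler returns Pyronova `Response` objects, whereas here a `(status, headers, body)` tuple and a `FAKE_DB` dict stand in as hypothetical placeholders.

```python
import json
from typing import Dict, Tuple

# Hypothetical stand-ins for the database and the framework's response type.
FAKE_DB: Dict[int, dict] = {1: {"id": 1, "name": "widget"}}
Reply = Tuple[int, Dict[str, str], str]  # (status, headers, body)

_cache: Dict[int, str] = {}  # item id -> pre-serialized JSON body


def crud_get_one(item_id: int) -> Reply:
    cached = _cache.get(item_id)
    if cached is not None:
        # HIT: the body was serialized when first cached, so no json.dumps here.
        return 200, {"X-Cache": "HIT"}, cached
    row = FAKE_DB.get(item_id)
    if row is None:
        # The 404 path still carries the header so a grep for it never comes up empty.
        return 404, {"X-Cache": "MISS"}, json.dumps({"error": "not found"})
    body = json.dumps(row)
    _cache[item_id] = body
    return 200, {"X-Cache": "MISS"}, body
```

Every return path sets `X-Cache`, which is the property the validator's `curl | grep` pipeline depends on.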
v2.2.0 adds a C-FFI DB bridge (4 functions injected into every sub-interp's globals) that forwards sqlx calls onto the shared process-global pool while releasing the calling interp's GIL. This removes the single-GIL ceiling that was capping /async-db at 3.7k rps on the previous v2.1.5 run.

/crud/* endpoints keep gil=True for now: their in-process dict cache relies on main-interp GIL serialization for the MISS→HIT semantics the validator checks. Moving that cache to a SharedState-backed DashMap unblocks /crud too, tracked as a v2.3 follow-up.

Expected impact this run (64-core TR 3995WX):
- async-db: 3.7k → 30-50k rps (bridge ceiling ≈ min(cores, PG max_conn))
- api-4 / api-16: partial improvement (CRUD sub-profile still gil=True)
- Other profiles: unchanged

See docs/arena-async-db-and-static.md in pyronova repo for the full design doc.
Updates Pyronova from v2.2.0 → v2.3.1 via PYRONOVA_REF. Two changes in the Pyronova engine itself that affect Arena numbers:

1. Sub-interpreter DB bridge now works under TPC (src/bridge/db_bridge.rs). The bridge existed in v2.2 but panicked under TPC mode with "Cannot start a runtime from within a runtime": `rt.block_on(fut)` inside each sub-interp worker's tokio current_thread runtime is forbidden by tokio. Fixed by channel-dispatching to the DB runtime (`rt.spawn` plus a blocking `recv` on a `std::sync::mpsc` channel) instead of nested block_on. The existing /async-db handler (no gil=True) now scales across all sub-interps with independent GILs instead of 503-ing on the single-thread main bridge.

2. Main-interp gil=True bridge defaults to N-worker pool (src/bridge/main_bridge.rs). Used by crud routes (their cache-aside dict semantics require a single interpreter). The launcher sets PYRONOVA_GIL_BRIDGE_WORKERS=16 + PYRONOVA_GIL_BRIDGE_CAPACITY=8192 so the 1024+ concurrency profiles don't overflow the default 64-deep channel with a 503 storm. Verified locally at c=4096: 15k req/s steady, 0 drops.

Both fixes preserve Pyronova's gil=True contract: pydantic-core / numpy / any other main-interp-only extension still works unchanged.

Measured locally (Linux 7840HS 16-thread, PG sidecar, wrk -t8):
- /async-db @ c=4096: 3.7k (v2.2.0) → 34k req/s (v2.3.1), ≈9×
- /async-db @ c=64: 3.7k → 30k req/s, ≈8×
- All profiles: 0 non-2xx responses across c=64..4096.

TPC also becomes the default dispatch mode in v2.3.x (flipped in 0ae579c upstream). The Arena leaderboard's current v2.2.0 numbers were hybrid-mode; TPC's per-core pinning + leaked route tables should give a proportional lift to baseline / short-lived / json profiles too, not just async-db.

validate.sh pyronova locally: 49 passed, 0 failed.
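
The nested-`block_on` problem and its channel-dispatch fix have a close analogue in Python's asyncio, which may help readers who do not write Rust. The sketch below is not Pyronova code: `db_loop`, `fake_query`, and `query_blocking` are hypothetical names, and `asyncio.run_coroutine_threadsafe(...).result()` plays the role of `rt.spawn` followed by a blocking `recv`.

```python
import asyncio
import threading

# Dedicated "DB" event loop on its own thread (stand-in for the DB runtime).
db_loop = asyncio.new_event_loop()
threading.Thread(target=db_loop.run_forever, daemon=True).start()


async def fake_query(n: int) -> int:
    await asyncio.sleep(0.01)  # pretend to wait on Postgres
    return n * 2


def query_blocking(n: int) -> int:
    """Queue the coroutine on the DB loop, then block this thread on the result."""
    future = asyncio.run_coroutine_threadsafe(fake_query(n), db_loop)
    return future.result(timeout=5)


async def handler(n: int) -> int:
    # This coroutine runs inside its own event loop, like a worker's
    # current_thread runtime. Calling run_until_complete() from here would
    # raise RuntimeError, so the work is handed to db_loop instead.
    return query_blocking(n)


if __name__ == "__main__":
    print(asyncio.run(handler(21)))
```

As in the Rust fix, the calling thread simply waits for a result produced elsewhere, so no second loop is ever driven from inside a running one.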
Owner
|
/benchmark -f pyronova --save |
Contributor
|
👋 |
Updates Pyronova from v2.2.0 → v2.3.1 via `PYRONOVA_REF` + a small launcher tweak. No `app.py` changes.

What changed in Pyronova between v2.2.0 and v2.3.1

1. Sub-interpreter DB bridge now works under TPC (`src/bridge/db_bridge.rs`)

The bridge existed in v2.2 but panicked the moment a route without `gil=True` hit it under TPC mode: "Cannot start a runtime from within a runtime".

Cause: the bridge's C-FFI entry points called `rt.block_on(fut)` on the dedicated DB runtime, from inside the TPC worker thread's own `current_thread` tokio runtime. tokio forbids nested `block_on`.

Fix: `rt.spawn(fut)` + `std::sync::mpsc::sync_channel` + `rx.recv()`. `spawn` has no runtime-context check; it just queues the task onto the DB runtime's worker pool. The sub-interp worker blocks on the channel with the GIL released (`py.detach`), so peer sub-interpreters keep running during the query. The parallelism ceiling is `min(sub_interp_workers, DATABASE_MAX_CONN)` instead of the single main-interp thread.

2. Main-interp `gil=True` bridge defaults to N-worker pool (`src/bridge/main_bridge.rs`)

Used by the crud routes. Their cache-aside dict semantics require a single interpreter: sub-interp workers have independent heaps, so SO_REUSEPORT would route consecutive `GET /crud/items/42` hits to different workers and the HttpArena cache-aside validator would never see `MISS→HIT`.

Previously a single `std::thread`; v2.3 is a `crossbeam::bounded` MPMC queue served by N threads. This PR's launcher change sets `PYRONOVA_GIL_BRIDGE_WORKERS=16` + `PYRONOVA_GIL_BRIDGE_CAPACITY=8192` so the 1024–4096-conn profiles don't 503-storm on a 64-deep default channel.

3. TPC becomes the default dispatch mode

Flipped in `0ae579c` upstream. The Arena leaderboard's current v2.2.0 numbers are from hybrid mode (N sub-interp pool + N io threads, with per-request `crossbeam_channel` dispatch across workers). TPC replaces that with per-thread sub-interpreter + same-thread handler call: zero cross-thread wake, zero cross-CCD atomic contention on the hot path. On the Arena 32-physical-core EPYC, this should lift baseline / short-lived / json proportionally, not just async-db.

Measured impact (local)

Linux 7840HS 16-thread, Postgres sidecar, `wrk -t8`, measured on `/async-db` at c=64, c=1024, and c=4096.

All profiles: `0` non-2xx across `c=64..4096`.

`validate.sh pyronova`: 49 passed, 0 failed (verified locally against the current `scripts/validate.sh`).

Compatibility

- No `app.py` changes. The previously-submitted `/async-db` handler (without `gil=True`) and crud handlers (with `gil=True`) both work unchanged with the v2.3.1 engine.
- `gil=True` contract is preserved: pydantic-core, numpy, and any other main-interp-only extension still work.
- https://github.com/moomoo-tech/pyronova at tag `v2.3.1` (existing Dockerfile pattern, just bumping the ref).

🤖 Generated with Claude Code