perf: buffer accumulation in BatchMessage.send_body() (1.6-1.8x speedup, us improvement, depends on PR #790) by mykaul · Pull Request #791 · scylladb/python-driver

mykaul · 2026-04-04T15:50:45Z

Summary

Replace per-call write_value()/write_byte()/write_short() in BatchMessage.send_body() with buffer accumulation (list.append + b"".join + single f.write()), reducing f.write() calls from Q*(4 + 2*P) + footer to 1 for Q queries with P params each.
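
To illustrate the idea, here is a minimal sketch of the two write strategies. It is not the driver's code: the framing is simplified (a query count, then per-query kind, id and parameters, with no trailer), and `CountingWriter`, `send_per_call` and `send_buffered` are hypothetical names used only for this example.

```python
import io
import struct

_int32 = struct.Struct(">i")
_uint16 = struct.Struct(">H")


class CountingWriter(io.BytesIO):
    """BytesIO that records how many times write() is called."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, data):
        self.calls += 1
        return super().write(data)


def send_per_call(f, queries):
    """Baseline: one f.write() per field (simplified framing)."""
    f.write(_uint16.pack(len(queries)))
    for query_id, params in queries:
        f.write(b"\x01")                       # query kind: prepared
        f.write(_uint16.pack(len(query_id)))   # id length
        f.write(query_id)                      # id bytes
        f.write(_uint16.pack(len(params)))     # parameter count
        for p in params:
            f.write(_int32.pack(len(p)))       # value length prefix
            f.write(p)                         # value bytes


def send_buffered(f, queries):
    """Buffer accumulation: list.append, b"".join, single f.write()."""
    parts = [_uint16.pack(len(queries))]
    append = parts.append                      # avoid repeated attribute lookup
    for query_id, params in queries:
        append(b"\x01")
        append(_uint16.pack(len(query_id)))
        append(query_id)
        append(_uint16.pack(len(params)))
        for p in params:
            append(_int32.pack(len(p)))
            append(p)
    f.write(b"".join(parts))
```

With this simplified framing, 10 queries with 2 params each cost 1 + 10 * (4 + 2 * 2) = 81 writes in the baseline and 1 write in the buffered version, while both produce identical bytes.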

Depends on PR #790 (perf/buffer-accum-write-params).

What changed

`cassandra/protocol.py`

BatchMessage.send_body() -- Full buffer accumulation for the entire message: batch header, per-query framing (prepared/unprepared), all parameters (with NULL/UNSET/str handling), and trailer.

Pre-computed constants -- _INT32_NULL and _INT32_UNSET as module-level constants to avoid repeated int32_pack() calls in the hot loop.

`tests/unit/test_protocol.py`

Added 7 new batch-specific test methods: prepared queries, unprepared queries, mixed, empty batch, many queries (50), NULL/UNSET params, and vector params.

Benchmark

Measured with min() of timeit.repeat(repeat=7, number=50_000) on a quiet machine (load <3), Cython .so compiled, before/after rebuild on same machine.
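
The measurement approach can be sketched roughly as below. `best_ns_per_call` is a hypothetical helper, not the PR's benchmark script. It takes the minimum over repeats because the fastest run is the one least disturbed by other activity on the machine.

```python
import timeit


def best_ns_per_call(func, repeat=7, number=50_000):
    """Return the best (minimum) time per call in nanoseconds."""
    # Each element of timings is the total seconds for `number` calls.
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number * 1e9
```

A call such as `best_ns_per_call(lambda: msg.send_body(f, protocol_version))` would then give one ns/call figure, assuming `msg`, `f` and `protocol_version` are prepared by the caller.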

| Scenario | Baseline (ns/call) | Buffer accum (ns/call) | Speedup |
|---|---|---|---|
| 10 queries x 2 params (128D vec) | 7699 | 4866 | 1.58x |
| 10 queries x 10 params (text) | 18976 | 10490 | 1.81x |
| 50 queries x 2 params (128D vec) | 34492 | 20608 | 1.67x |
| 50 queries x 10 params (text) | 83815 | 48338 | 1.73x |

Consistent 1.6-1.8x speedup across all batch scenarios. Larger batches with more params see the greatest absolute savings (35+ us saved for 50q x 10p).
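
As a quick check of the arithmetic, the speedup and absolute savings can be recomputed from the ns/call figures in the table above. The dictionary keys are shorthand labels chosen for this sketch.

```python
# (baseline ns/call, buffer-accum ns/call) copied from the benchmark table
results = {
    "10q x 2p (128D vec)": (7699, 4866),
    "10q x 10p (text)": (18976, 10490),
    "50q x 2p (128D vec)": (34492, 20608),
    "50q x 10p (text)": (83815, 48338),
}

# Speedup is baseline / after; saving is baseline - after, in nanoseconds.
speedups = {k: round(b / a, 2) for k, (b, a) in results.items()}
savings_ns = {k: b - a for k, (b, a) in results.items()}

for name in results:
    print(f"{name}: {speedups[name]}x, {savings_ns[name]} ns saved")
```

The largest saving, 35477 ns for 50 queries x 10 params, matches the "35+ us" figure quoted above.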

Tests

7 new batch-specific tests, plus 14 from PR #790 (perf: buffer accumulation in _write_query_params() reduces f.write() calls, ~100ns improvement, 1.1-1.3x speedup)
Full unit test suite passes (666 passed, 43 skipped)

Replace the per-parameter write_value(f, param) loop in _QueryMessage._write_query_params() with a buffer accumulation approach: list.append + b"".join + single f.write(). This reduces the number of f.write() calls from 2*N+1 to 1, which is significant for vector workloads with large parameters. Also removes the redundant ExecuteMessage._write_query_params() pass-through override to avoid extra MRO lookup per call. Includes 14 unit tests covering normal, NULL, UNSET, empty, large vector, and mixed parameter scenarios for both ExecuteMessage and QueryMessage. Includes a benchmark script (benchmarks/bench_execute_write_params.py).

Replace per-call write_value()/write_byte()/write_short() in BatchMessage.send_body() with buffer accumulation (list.append + b"".join + single f.write()), reducing f.write() calls from Q*(4 + 2*P) + footer to 1 for Q queries with P params each.

Benchmark results (Python 3.14, Cython .so, 50K iters, best of 3, quiet machine):

| Scenario | Before | After | Speedup |
|---|---|---|---|
| 10 queries x 2 params (128D vec) | 8364 ns | 4475 ns | 1.87x |
| 10 queries x 2 params (768D vec) | 8081 ns | 5516 ns | 1.47x |
| 50 queries x 2 params (128D vec) | 32368 ns | 16271 ns | 1.99x |
| 10 queries x 10 text params | 19138 ns | 9051 ns | 2.11x |
| 50 queries x 10 text params | 86845 ns | 40020 ns | 2.17x |
| 10 unprepared x 2 params | 8666 ns | 4252 ns | 2.04x |

Also updates test_batch_message_with_keyspace to use BytesIO for byte-level verification (compatible with single-write output). Adds 7 batch-specific unit tests covering prepared, unprepared, mixed, empty, many-query, NULL/UNSET, and vector parameter scenarios. Includes benchmark script benchmarks/bench_batch_send_body.py.

Replace per-call int32_pack(-1) and int32_pack(-2) with module-level _INT32_NEG1 and _INT32_NEG2 constants. Avoids redundant struct packing on every null or unset parameter in the inner write_value loop. Benchmark: ~11% speedup on the parameter serialization loop for a typical 12-param mix of values, nulls, and unsets.

mykaul marked this pull request as draft April 4, 2026 17:01

mykaul force-pushed the perf/buffer-accum-batch-message branch 2 times, most recently from 5eec7ed to b1d1cd0, April 5, 2026 17:29

mykaul changed the title ~~perf: buffer accumulation in BatchMessage.send_body()~~ perf: buffer accumulation in BatchMessage.send_body() (2x speedup, us improvement, depends on PR #790) Apr 7, 2026

mykaul added 3 commits April 7, 2026 11:28

mykaul force-pushed the perf/buffer-accum-batch-message branch from b1d1cd0 to 62f91eb, April 7, 2026 08:29

mykaul mentioned this pull request Apr 7, 2026

perf: buffer accumulation in _write_query_params() reduces f.write() calls (~100ns improvement, 1.1-1.3x speedup) #790

Draft

mykaul changed the title ~~perf: buffer accumulation in BatchMessage.send_body() (2x speedup, us improvement, depends on PR #790)~~ perf: buffer accumulation in BatchMessage.send_body() (1.6-1.8x speedup, us improvement, depends on PR #790) Apr 12, 2026

mykaul wants to merge 3 commits into scylladb:master from mykaul:perf/buffer-accum-batch-message