(improvement) Optimize _key_parts_packed routing key computation (100-300ns savings - 27-33% savings) by mykaul · Pull Request #799 · scylladb/python-driver

mykaul · 2026-04-06T18:45:09Z

Summary

Replace per-call struct.pack(">H%dsB" % l, l, p, 0) with uint16_pack(len(p)) + p + b'\x00' using the pre-compiled uint16_pack (struct.Struct('>H').pack) from cassandra.marshal. Eliminates format string interpolation and dynamic struct format creation on every call.

Motivation

The routing key computation (_key_parts_packed) is called for every query when TokenAwarePolicy is in use, making it a hot path. The original code creates a new format string (">H%dsB" % l) on every invocation, which triggers a new struct.pack format parse each time. Using the pre-compiled uint16_pack avoids this overhead.
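To make the byte layout concrete, here is a minimal sketch of both forms for a single key component. The helper names pack_part_original and pack_part_optimized are hypothetical and exist only for this illustration; uint16_pack is defined locally the same way the description says cassandra.marshal defines it, so the snippet runs without the driver installed.

```python
import struct

# Local stand-in with the definition the description gives for
# cassandra.marshal.uint16_pack.
uint16_pack = struct.Struct(">H").pack


def pack_part_original(p: bytes) -> bytes:
    """Hypothetical helper: builds a fresh format string on every call."""
    l = len(p)
    return struct.pack(">H%dsB" % l, l, p, 0)


def pack_part_optimized(p: bytes) -> bytes:
    """Hypothetical helper: 2-byte big-endian length + payload + 0x00 terminator."""
    return uint16_pack(len(p)) + p + b"\x00"


part = (42).to_bytes(4, "big")           # a 4-byte big-endian int key component
print(pack_part_original(part).hex())    # 00040000002a00
print(pack_part_optimized(part).hex())   # 00040000002a00
```

Both forms emit the same three fields: a two-byte big-endian length, the raw component bytes, and a single zero byte.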

Benchmark (CPython 3.14, per-call)

Key type	Original	Optimized	Savings per call
Single int key	402ns	292ns	110ns (27%)
Composite (3 parts)	974ns	652ns	322ns (33%)
Long key (200B)	662ns	474ns	188ns (28%)

The routing key computation runs on every query when TokenAwarePolicy is active (the default).
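For anyone who wants to reproduce per-call figures of this kind, the following is a rough timeit sketch. It is not the harness used for the numbers above: the key payloads, the function names, and the default iteration count are assumptions chosen for illustration, and absolute timings will differ by machine and interpreter.

```python
import struct
import timeit

# Local stand-in for cassandra.marshal.uint16_pack, as described in this PR.
uint16_pack = struct.Struct(">H").pack


def pack_original(parts):
    out = []
    for p in parts:
        l = len(p)
        out.append(struct.pack(">H%dsB" % l, l, p, 0))
    return out


def pack_optimized(parts):
    out = []
    for p in parts:
        out.append(uint16_pack(len(p)) + p + b"\x00")
    return out


# Assumed payloads that mirror the three key shapes in the table.
KEY_SHAPES = {
    "single int key": [(42).to_bytes(4, "big")],
    "composite (3 parts)": [b"\x00\x00\x00\x01", b"tenant-a", b"2026-04-06"],
    "long key (200B)": [b"x" * 200],
}


def ns_per_call(fn, parts, number):
    """Average wall-clock nanoseconds per call over `number` iterations."""
    return timeit.timeit(lambda: fn(parts), number=number) / number * 1e9


def run_benchmark(number=100_000):
    """Return {key shape: (original ns, optimized ns)}."""
    return {
        name: (ns_per_call(pack_original, parts, number),
               ns_per_call(pack_optimized, parts, number))
        for name, parts in KEY_SHAPES.items()
    }


if __name__ == "__main__":
    for name, (old, new) in run_benchmark().items():
        print(f"{name:22s} original={old:7.0f}ns optimized={new:7.0f}ns")
```

The lambda wrapper adds the same fixed overhead to both variants, so the difference between the two columns is more meaningful than either absolute value.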

Changes

cassandra/query.py: Import uint16_pack from cassandra.marshal, replace struct.pack(">H%dsB" % l, l, p, 0) with uint16_pack(len(p)) + p + b'\x00'

Testing

Unit tests pass (43/43 in test_query.py and test_parameter_binding.py). Output verified to match the original format byte-for-byte.
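A byte-for-byte check of this kind can be sketched as below. This is an illustrative stand-alone script, not the test code from the PR: the two generators are assumed stand-ins for the old and new loop bodies, and the chosen lengths (including the uint16 boundary at 65535) are assumptions.

```python
import os
import struct

# Local stand-in for cassandra.marshal.uint16_pack, as described in this PR.
uint16_pack = struct.Struct(">H").pack


def key_parts_packed_original(parts):
    # Assumed shape of the pre-change loop body.
    for p in parts:
        l = len(p)
        yield struct.pack(">H%dsB" % l, l, p, 0)


def key_parts_packed_optimized(parts):
    # Assumed shape of the post-change loop body.
    for p in parts:
        yield uint16_pack(len(p)) + p + b"\x00"


def check_equivalence(lengths=(0, 1, 4, 8, 200, 65535)):
    """Compare both packers on random single and composite keys."""
    # Single-component keys of various sizes.
    for n in lengths:
        parts = [os.urandom(n)]
        old = b"".join(key_parts_packed_original(parts))
        new = b"".join(key_parts_packed_optimized(parts))
        assert old == new, f"mismatch for length {n}"
    # One composite key built from all the lengths at once.
    parts = [os.urandom(n) for n in lengths]
    old = b"".join(key_parts_packed_original(parts))
    new = b"".join(key_parts_packed_optimized(parts))
    assert old == new, "mismatch for composite key"
    return True


if __name__ == "__main__":
    print(check_equivalence())
```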

Replace per-call struct.pack(">H%dsB" % l, l, p, 0) with uint16_pack(len(p)) + p + b'\x00', using the pre-compiled uint16_pack (struct.Struct('>H').pack). This eliminates the format string interpolation and dynamic struct format creation on every call. The routing key computation is called for every query when TokenAwarePolicy is in use, making this a hot path.

mykaul · 2026-04-06T19:36:45Z

Benchmark results (CPython 3.14, 500k iterations)

Key type	Original	Optimized	Δ per call
single int key	402ns	292ns	-110ns
composite (3 parts)	974ns	652ns	-322ns
long key (200B)	662ns	474ns	-188ns

The routing key computation runs on every query when TokenAwarePolicy is active (the default). The saving comes from eliminating the per-call format string interpolation (">H%dsB" % l) and dynamic struct.pack format parsing, replacing it with a pre-compiled struct.Struct('>H').pack.

mykaul marked this pull request as draft April 6, 2026 19:26

mykaul changed the title ~~(improvement) Optimize _key_parts_packed routing key computation~~ (improvement) Optimize _key_parts_packed routing key computation (100-300ns savings - 27-33% savings) Apr 7, 2026

mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/key-parts-packed
