(improvement) Optimize _key_parts_packed routing key computation (100-300ns savings - 27-33% savings) #799

Draft
mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/key-parts-packed

mykaul commented Apr 6, 2026

Summary

Replace per-call struct.pack(">H%dsB" % l, l, p, 0) with uint16_pack(len(p)) + p + b'\x00' using the pre-compiled uint16_pack (struct.Struct('>H').pack) from cassandra.marshal. Eliminates format string interpolation and dynamic struct format creation on every call.
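To make the byte layout concrete, here is a minimal sketch of the two forms for a single key component. It defines `uint16_pack` locally as `struct.Struct('>H').pack` (which is how the summary describes the helper in `cassandra.marshal`) so that it runs without the driver installed; the function names `pack_component_original` and `pack_component_optimized` are illustrative and not part of the driver.

```python
import struct

# Assumption: stands in for uint16_pack from cassandra.marshal, which the PR
# describes as struct.Struct('>H').pack.
uint16_pack = struct.Struct(">H").pack


def pack_component_original(p: bytes) -> bytes:
    l = len(p)
    # ">H%dsB": big-endian uint16 length, l raw bytes, one trailing zero byte.
    return struct.pack(">H%dsB" % l, l, p, 0)


def pack_component_optimized(p: bytes) -> bytes:
    # Same layout built by concatenation; only the fixed-width prefix uses struct.
    return uint16_pack(len(p)) + p + b"\x00"


component = (42).to_bytes(4, "big")  # hypothetical 4-byte int key
packed = pack_component_optimized(component)
print(packed.hex())  # 00040000002a00
```

The printed hex reads as a 2-byte length (`0004`), the 4 payload bytes (`0000002a`), and the terminating `00`.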

Motivation

The routing key computation (_key_parts_packed) is called for every query when TokenAwarePolicy is in use, making it a hot path. The original code creates a new format string (">H%dsB" % l) on every invocation, which triggers a new struct.pack format parse each time. Using the pre-compiled uint16_pack avoids this overhead.

Benchmark (CPython 3.14, per-call)

| Key type | Original | Optimized | Savings per call |
| --- | --- | --- | --- |
| Single int key | 402ns | 292ns | 110ns (27%) |
| Composite (3 parts) | 974ns | 652ns | 322ns (33%) |
| Long key (200B) | 662ns | 474ns | 188ns (28%) |
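A rough way to reproduce a comparison of this kind is sketched below with `timeit`. This is not the harness used for the numbers above: the key shapes in `KEY_SHAPES`, the helper names, and the iteration count are assumptions chosen to resemble the table rows, and the `lambda` wrapper adds a small constant overhead to both columns, so absolute figures will differ by machine and interpreter.

```python
import struct
import timeit

# Assumption: equivalent to the uint16_pack exported by cassandra.marshal.
uint16_pack = struct.Struct(">H").pack


def join_original(parts):
    # Builds a fresh format string for every component.
    return b"".join(struct.pack(">H%dsB" % len(p), len(p), p, 0) for p in parts)


def join_optimized(parts):
    # Uses the pre-compiled packer for the length prefix only.
    return b"".join(uint16_pack(len(p)) + p + b"\x00" for p in parts)


# Hypothetical key shapes, picked to resemble the rows of the table.
KEY_SHAPES = {
    "single int key": [(42).to_bytes(4, "big")],
    "composite (3 parts)": [(42).to_bytes(4, "big"), b"user-1234", (7).to_bytes(8, "big")],
    "long key (200B)": [b"x" * 200],
}


def run_benchmark(number=100_000):
    """Return {shape name: (original ns/call, optimized ns/call)}."""
    results = {}
    for name, parts in KEY_SHAPES.items():
        before = timeit.timeit(lambda: join_original(parts), number=number)
        after = timeit.timeit(lambda: join_optimized(parts), number=number)
        results[name] = (before / number * 1e9, after / number * 1e9)
    return results


if __name__ == "__main__":
    for name, (before_ns, after_ns) in run_benchmark().items():
        print(f"{name:22s} {before_ns:7.0f}ns -> {after_ns:7.0f}ns")
```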

The routing key computation runs on every query when TokenAwarePolicy is active (the default).

Changes

  • cassandra/query.py: Import uint16_pack from cassandra.marshal, replace struct.pack(">H%dsB" % l, l, p, 0) with uint16_pack(len(p)) + p + b'\x00'
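As a sketch of what this change looks like in context, the generator below approximates the shape of `_key_parts_packed` before and after. It is a standalone reconstruction from the description in this PR, not a copy of `cassandra/query.py`: the `_before`/`_after` suffixes are illustrative, `uint16_pack` is defined locally instead of imported, and the final `b"".join(...)` is an assumption about how a composite routing key is assembled from the packed parts.

```python
import struct

# Assumption: in the driver this would be `from cassandra.marshal import uint16_pack`.
uint16_pack = struct.Struct(">H").pack


def key_parts_packed_before(parts):
    for p in parts:
        l = len(p)
        # Format string is rebuilt for every component on every call.
        yield struct.pack(">H%dsB" % l, l, p, 0)


def key_parts_packed_after(parts):
    for p in parts:
        # Pre-compiled length prefix, raw payload, single zero terminator.
        yield uint16_pack(len(p)) + p + b"\x00"


# Hypothetical two-component partition key: a text value and a 4-byte int.
parts = [b"tenant-7", (2026).to_bytes(4, "big")]
routing_key = b"".join(key_parts_packed_after(parts))
print(routing_key)
```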

Testing

Unit tests pass (43/43 in test_query.py and test_parameter_binding.py). Output verified to match the original format byte-for-byte.
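A byte-for-byte check of this kind can be sketched as follows. This is an illustrative harness, not the test code in the PR: the helper names and the sample sizes are assumptions, with the samples chosen to cover the empty component, embedded zero bytes, and the largest length a uint16 prefix can carry (65535). It also checks that both forms reject a component one byte over that limit with `struct.error`.

```python
import os
import struct

# Assumption: equivalent to uint16_pack from cassandra.marshal.
uint16_pack = struct.Struct(">H").pack


def pack_original(p):
    l = len(p)
    return struct.pack(">H%dsB" % l, l, p, 0)


def pack_optimized(p):
    return uint16_pack(len(p)) + p + b"\x00"


def check_equivalence(samples):
    """Raise AssertionError on the first sample whose encodings differ."""
    for p in samples:
        assert pack_original(p) == pack_optimized(p), "mismatch at length %d" % len(p)
    return len(samples)


def both_reject(p):
    """True if both forms raise struct.error for the given component."""
    for fn in (pack_original, pack_optimized):
        try:
            fn(p)
        except struct.error:
            continue
        return False
    return True


samples = [b"", b"\x00", b"\x00\x00", os.urandom(1), os.urandom(200), os.urandom(65535)]
checked = check_equivalence(samples)
oversize_rejected = both_reject(b"x" * 65536)
```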

Replace per-call struct.pack(">H%dsB" % l, l, p, 0) with pre-compiled
uint16_pack(len(p)) + p + b'\x00'. This eliminates the format string
interpolation and dynamic struct format creation on every call, using
the pre-compiled uint16_pack (struct.Struct('>H').pack) instead.

The routing key computation is called for every query when
TokenAwarePolicy is in use, making this a hot path.

mykaul marked this pull request as draft April 6, 2026 19:26

mykaul commented Apr 6, 2026

Benchmark results (CPython 3.14, 500k iterations)

| Key type | Original | Optimized | Δ per call |
| --- | --- | --- | --- |
| single int key | 402ns | 292ns | -110ns |
| composite (3 parts) | 974ns | 652ns | -322ns |
| long key (200B) | 662ns | 474ns | -188ns |

The routing key computation runs on every query when TokenAwarePolicy is active (the default). The saving comes from eliminating the per-call format string interpolation (">H%dsB" % l) and dynamic struct.pack format parsing, replacing it with a pre-compiled struct.Struct('>H').pack.

mykaul changed the title from "(improvement) Optimize _key_parts_packed routing key computation" to "(improvement) Optimize _key_parts_packed routing key computation (100-300ns savings - 27-33% savings)" Apr 7, 2026