
Draft

perf: replace RLock with Lock where re-entrant locking is not needed (~11ns saving, -14%) #796

mykaul wants to merge 2 commits into scylladb:master from mykaul:perf/rlock-to-lock

Conversation

@mykaul mykaul commented Apr 5, 2026

Summary

Convert 7 of 8 RLock instances to plain Lock. All verified to use only flat (non-recursive) acquisition patterns:

| Lock | File | Hot path? |
|---|---|---|
| `Connection.lock` | connection.py | Yes — every message send/receive |
| `Cluster._lock` | cluster.py | No — connect/shutdown only |
| `ControlConnection._lock` | cluster.py | No — schema/topology refresh |
| `ControlConnection._reconnection_lock` | cluster.py | No — reconnection only |
| `Metadata._hosts_lock` | metadata.py | No — host add/remove |
| `TokenMap._rebuild_lock` | metadata.py | No — keyspace rebuild |
| `Host.lock` | pool.py | No — reconnection handler |
| `cqlengine.Connection.lazy_connect_lock` | cqlengine/connection.py | No — lazy connect |

Session._lock is kept as RLock because run_add_or_renew_pool() uses manual release()/acquire() inside a with block, which requires re-entrant semantics.
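The distinction matters because a plain `threading.Lock` is not re-entrant: the same thread cannot acquire it a second time while already holding it. A generic illustration of the difference (not the driver's code):

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

with plain:
    # A second blocking acquire of a plain Lock by the thread that
    # already holds it would deadlock; a non-blocking attempt simply
    # reports failure instead.
    nested_ok_plain = plain.acquire(blocking=False)  # False

with reentrant:
    # An RLock tracks owner and recursion depth, so the owning thread
    # may re-acquire freely, as long as it releases the same number of
    # times.
    nested_ok_rlock = reentrant.acquire(blocking=False)  # True
    if nested_ok_rlock:
        reentrant.release()
```

Any code path that can reach a second acquisition on the same thread, such as the manual release()/acquire() dance in run_add_or_renew_pool(), therefore has to keep the RLock.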

Benchmark

| Lock type | Per-cycle (`with` stmt) | Overhead |
|---|---|---|
| RLock | 92.9 ns | baseline |
| Lock | 81.1 ns | -14% |
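The gap can be reproduced with a quick `timeit` sketch; absolute numbers vary by machine and Python version (the table above is the author's run), but on CPython an uncontended RLock `with` cycle is consistently costlier than a plain Lock:

```python
import threading
import timeit

def bench(lock, n=1_000_000):
    """Return average ns per uncontended 'with lock: pass' cycle."""
    def cycle():
        with lock:
            pass
    return timeit.timeit(cycle, number=n) / n * 1e9

lock_ns = bench(threading.Lock())
rlock_ns = bench(threading.RLock())
print(f"Lock:  {lock_ns:.1f} ns/cycle")
print(f"RLock: {rlock_ns:.1f} ns/cycle")
```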

Tests

  • 9 focused unit tests verifying lock types and operations (metadata add/update/remove, host reconnection handler)
  • test_update_host_sequential_lock specifically validates that Metadata.update_host() works with plain Lock (sequential, not nested acquisition)
  • Full unit test suite passes (654 passed)

@mykaul mykaul force-pushed the perf/rlock-to-lock branch 2 times, most recently from fbb04b2 to 5f8a314 Compare April 6, 2026 16:10

mykaul commented Apr 6, 2026

V2 Changes

Fixed: deadlock in Cluster.connect() failure path

Cluster.connect() acquires self._lock, and on control connection failure the original code called self.shutdown() inside the with self._lock: block. Since shutdown() also acquires self._lock, this would deadlock with a plain Lock (any network failure during initial connection would trigger it).

Fix: Restructured connect() to save the exception and call shutdown() after the with self._lock: block exits, avoiding the re-entrant acquisition.
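A minimal sketch of the restructured control flow, with illustrative names (`MiniCluster` is not the driver's class): the exception is captured inside the critical section and cleanup runs only after the lock has been released, so `shutdown()` can safely take the same plain Lock.

```python
import threading

class MiniCluster:
    def __init__(self):
        self._lock = threading.Lock()
        self.is_shutdown = False

    def shutdown(self):
        # Takes the same lock as connect(); would deadlock if invoked
        # while connect() still held it.
        with self._lock:
            self.is_shutdown = True

    def connect(self, start_control_connection):
        error = None
        with self._lock:
            try:
                start_control_connection()
            except Exception as exc:
                error = exc  # defer cleanup past the 'with' block
        if error is not None:
            self.shutdown()  # lock is free here: no deadlock
            raise error
```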

Additional cleanup:

  • Removed unused RLock import from cassandra/connection.py
  • Added TestClusterConnectFailureNoDeadlock test that verifies connect() calls shutdown() and re-raises without deadlocking when using a plain Lock

Tests: 683 unit tests pass (660 core + 23 IO), 0 failures.

@mykaul mykaul changed the title perf: replace RLock with Lock where re-entrant locking is not needed perf: replace RLock with Lock where re-entrant locking is not needed (~11ns saving, -14%) Apr 7, 2026

mykaul commented Apr 10, 2026

Follow-up: Skip lock acquisition when no orphaned requests in process_msg

Commit: 2e5a6c6

What changed

In process_msg(), the orphaned request check acquires a lock on every response to check self.orphaned_request_ids. Since orphaned requests only exist after timeouts (a rare event), the set is almost always empty.

Now we check if self.orphaned_request_ids: (set truthiness) before acquiring the lock. Empty set → skip the lock entirely.
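A minimal sketch of that fast path (illustrative class, not the driver's actual `process_msg`): the unlocked truthiness check skips the lock in the common empty case, and the membership test is repeated under the lock before mutating.

```python
import threading

class MiniConnection:
    def __init__(self):
        self.lock = threading.Lock()
        self.orphaned_request_ids = set()

    def process_msg(self, stream_id):
        """Return True if stream_id was an orphaned request."""
        was_orphaned = False
        if self.orphaned_request_ids:  # unlocked fast path: empty set skips the lock
            with self.lock:
                # Re-check under the lock; the set may have changed
                # between the unlocked check and acquisition.
                if stream_id in self.orphaned_request_ids:
                    self.orphaned_request_ids.discard(stream_id)
                    was_orphaned = True
        return was_orphaned
```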

Thread safety

The unlocked truthiness check on a set is safe under the GIL (atomic read of internal size field). Worst case:

  • False positive (set non-empty but stream_id not in it): enters lock block, re-checks, safe
  • False negative (set just got an entry): orphaned response processed normally — acceptable

Benchmark results (Python 3.14, 2M iterations)

| Scenario | Before | After | Change |
|---|---|---|---|
| Empty orphaned set (common) | 80.6 ns | 23.2 ns | -57.4 ns (3.47x) |
| Non-empty set (rare) | 79.7 ns | 87.8 ns | +8.1 ns overhead |

Testing

  • 617 unit tests passed (10.4s)
  • No regressions

@mykaul mykaul force-pushed the perf/rlock-to-lock branch 2 times, most recently from c8f4db2 to 2e5a6c6 Compare April 11, 2026 16:07
mykaul added 2 commits April 11, 2026 19:27
Convert 7 of 8 RLock instances to plain Lock. All verified to use
only flat (non-recursive) acquisition patterns:
- Connection.lock (hot path: every message send/receive)
- Cluster._lock (connect/shutdown)
- ControlConnection._lock and _reconnection_lock
- Metadata._hosts_lock and TokenMap._rebuild_lock
- Host.lock and cqlengine Connection.lazy_connect_lock

Session._lock is kept as RLock because run_add_or_renew_pool() uses
manual release/acquire inside a 'with' block.

Benchmark: RLock 'with' stmt is ~14% slower than plain Lock.

Check orphaned_request_ids truthiness before acquiring the lock.  Since
orphaned requests are rare (only on timeouts), the set is almost always
empty.  Skipping the lock in the common case saves ~57 ns per response.

The unlocked truthiness check on a set is thread-safe under the GIL.
Worst case (false positive): we enter the lock block and re-check, which
is correct.  Worst case (false negative): an orphaned response is
processed normally — acceptable behavior.

Benchmark (2M iters, Python 3.14):
  Empty set (common):     80.6 -> 23.2 ns (3.47x, -57.4 ns/response)
  Non-empty set (rare):   79.7 -> 87.8 ns (+8.1 ns overhead)
@mykaul mykaul force-pushed the perf/rlock-to-lock branch from 2e5a6c6 to 014e82e Compare April 11, 2026 16:28