
Draft

perf: replace RLock with Lock where re-entrant locking is not needed (~11ns saving, -14%) #796

mykaul wants to merge 2 commits into scylladb:master from mykaul:perf/rlock-to-lock

Conversation

@mykaul mykaul commented Apr 5, 2026

Summary

Convert 7 of 8 RLock instances to plain Lock. All verified to use only flat (non-recursive) acquisition patterns:

| Lock | File | Hot path? |
|---|---|---|
| `Connection.lock` | connection.py | Yes — every message send/receive |
| `Cluster._lock` | cluster.py | No — connect/shutdown only |
| `ControlConnection._lock` | cluster.py | No — schema/topology refresh |
| `ControlConnection._reconnection_lock` | cluster.py | No — reconnection only |
| `Metadata._hosts_lock` | metadata.py | No — host add/remove |
| `TokenMap._rebuild_lock` | metadata.py | No — keyspace rebuild |
| `Host.lock` | pool.py | No — reconnection handler |
| `cqlengine.Connection.lazy_connect_lock` | cqlengine/connection.py | No — lazy connect |

Session._lock is kept as RLock because run_add_or_renew_pool() uses manual release()/acquire() inside a with block, which requires re-entrant semantics.
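The distinction matters because a plain `threading.Lock` is not re-entrant: the same thread cannot acquire it a second time while already holding it. A generic illustration of the difference (not the driver's code):

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

with plain:
    # A second blocking acquire of a plain Lock by the thread that
    # already holds it would deadlock; a non-blocking attempt simply
    # reports failure instead.
    nested_ok_plain = plain.acquire(blocking=False)  # False

with reentrant:
    # An RLock tracks owner and recursion depth, so the owning thread
    # may re-acquire freely, as long as it releases the same number of
    # times.
    nested_ok_rlock = reentrant.acquire(blocking=False)  # True
    if nested_ok_rlock:
        reentrant.release()
```

Any code path that can reach a second acquisition on the same thread, such as the manual release()/acquire() dance in run_add_or_renew_pool(), therefore has to keep the RLock.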

Benchmark

| Lock type | Per-cycle (`with` stmt) | Overhead |
|---|---|---|
| RLock | 92.9 ns | baseline |
| Lock | 81.1 ns | -14% |
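The gap can be reproduced with a quick `timeit` sketch; absolute numbers vary by machine and Python version (the table above is the author's run), but on CPython an uncontended RLock `with` cycle is consistently costlier than a plain Lock:

```python
import threading
import timeit

def bench(lock, n=1_000_000):
    """Return average ns per uncontended 'with lock: pass' cycle."""
    def cycle():
        with lock:
            pass
    return timeit.timeit(cycle, number=n) / n * 1e9

lock_ns = bench(threading.Lock())
rlock_ns = bench(threading.RLock())
print(f"Lock:  {lock_ns:.1f} ns/cycle")
print(f"RLock: {rlock_ns:.1f} ns/cycle")
```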

Tests

  • 9 focused unit tests verifying lock types and operations (metadata add/update/remove, host reconnection handler)
  • test_update_host_sequential_lock specifically validates that Metadata.update_host() works with plain Lock (sequential, not nested acquisition)
  • Full unit test suite passes (654 passed)

@mykaul mykaul force-pushed the perf/rlock-to-lock branch 2 times, most recently from fbb04b2 to 5f8a314 Compare April 6, 2026 16:10

mykaul commented Apr 6, 2026

V2 Changes

Fixed: deadlock in Cluster.connect() failure path

Cluster.connect() acquires self._lock, and on control connection failure the original code called self.shutdown() inside the with self._lock: block. Since shutdown() also acquires self._lock, this would deadlock with a plain Lock (any network failure during initial connection would trigger it).

Fix: Restructured connect() to save the exception and call shutdown() after the with self._lock: block exits, avoiding the re-entrant acquisition.
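A minimal sketch of the restructured control flow, with illustrative names (`MiniCluster` is not the driver's class): the exception is captured inside the critical section and cleanup runs only after the lock has been released, so `shutdown()` can safely take the same plain Lock.

```python
import threading

class MiniCluster:
    def __init__(self):
        self._lock = threading.Lock()
        self.is_shutdown = False

    def shutdown(self):
        # Takes the same lock as connect(); would deadlock if invoked
        # while connect() still held it.
        with self._lock:
            self.is_shutdown = True

    def connect(self, start_control_connection):
        error = None
        with self._lock:
            try:
                start_control_connection()
            except Exception as exc:
                error = exc  # defer cleanup past the 'with' block
        if error is not None:
            self.shutdown()  # lock is free here: no deadlock
            raise error
```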

Additional cleanup:

  • Removed unused RLock import from cassandra/connection.py
  • Added TestClusterConnectFailureNoDeadlock test that verifies connect() calls shutdown() and re-raises without deadlocking when using a plain Lock

Tests: 683 unit tests pass (660 core + 23 IO), 0 failures.

@mykaul mykaul changed the title perf: replace RLock with Lock where re-entrant locking is not needed perf: replace RLock with Lock where re-entrant locking is not needed (~11ns saving, -14%) Apr 7, 2026

mykaul commented Apr 10, 2026

Follow-up: Skip lock acquisition when no orphaned requests in process_msg

Commit: 2e5a6c6

What changed

In process_msg(), the orphaned request check acquires a lock on every response to check self.orphaned_request_ids. Since orphaned requests only exist after timeouts (a rare event), the set is almost always empty.

Now we check if self.orphaned_request_ids: (set truthiness) before acquiring the lock. Empty set → skip the lock entirely.
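A minimal sketch of that fast path (illustrative class, not the driver's actual `process_msg`): the unlocked truthiness check skips the lock in the common empty case, and the membership test is repeated under the lock before mutating.

```python
import threading

class MiniConnection:
    def __init__(self):
        self.lock = threading.Lock()
        self.orphaned_request_ids = set()

    def process_msg(self, stream_id):
        """Return True if stream_id was an orphaned request."""
        was_orphaned = False
        if self.orphaned_request_ids:  # unlocked fast path: empty set skips the lock
            with self.lock:
                # Re-check under the lock; the set may have changed
                # between the unlocked check and acquisition.
                if stream_id in self.orphaned_request_ids:
                    self.orphaned_request_ids.discard(stream_id)
                    was_orphaned = True
        return was_orphaned
```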

Thread safety

The unlocked truthiness check on a set is safe under the GIL (atomic read of internal size field). Worst case:

  • False positive (set non-empty but stream_id not in it): enters lock block, re-checks, safe
  • False negative (set just got an entry): orphaned response processed normally — acceptable

Benchmark results (Python 3.14, 2M iterations)

| Scenario | Before | After | Change |
|---|---|---|---|
| Empty orphaned set (common) | 80.6 ns | 23.2 ns | -57.4 ns (3.47x) |
| Non-empty set (rare) | 79.7 ns | 87.8 ns | +8.1 ns overhead |

Testing

  • 617 unit tests passed (10.4s)
  • No regressions

@mykaul mykaul force-pushed the perf/rlock-to-lock branch 2 times, most recently from c8f4db2 to 2e5a6c6 Compare April 11, 2026 16:07
mykaul added 2 commits April 11, 2026 19:27
Convert 7 of 8 RLock instances to plain Lock. All verified to use
only flat (non-recursive) acquisition patterns:
- Connection.lock (hot path: every message send/receive)
- Cluster._lock (connect/shutdown)
- ControlConnection._lock and _reconnection_lock
- Metadata._hosts_lock and TokenMap._rebuild_lock
- Host.lock and cqlengine Connection.lazy_connect_lock

Session._lock is kept as RLock because run_add_or_renew_pool() uses
manual release/acquire inside a 'with' block.

Benchmark: RLock 'with' stmt is ~14% slower than plain Lock.

Check orphaned_request_ids truthiness before acquiring the lock.  Since
orphaned requests are rare (only on timeouts), the set is almost always
empty.  Skipping the lock in the common case saves ~57 ns per response.

The unlocked truthiness check on a set is thread-safe under the GIL.
Worst case (false positive): we enter the lock block and re-check, which
is correct.  Worst case (false negative): an orphaned response is
processed normally — acceptable behavior.

Benchmark (2M iters, Python 3.14):
  Empty set (common):     80.6 -> 23.2 ns (3.47x, -57.4 ns/response)
  Non-empty set (rare):   79.7 -> 87.8 ns (+8.1 ns overhead)
@mykaul mykaul force-pushed the perf/rlock-to-lock branch from 2e5a6c6 to 014e82e Compare April 11, 2026 16:28