Add TLS session resumption via SSLSessionCache #789

Open
sylwiaszunejko wants to merge 4 commits into scylladb:master from sylwiaszunejko:tls-ticket

@sylwiaszunejko
Collaborator

@sylwiaszunejko sylwiaszunejko commented Apr 3, 2026

Summary

This PR implements TLS session resumption for the Python driver. After the first
successful TLS handshake with a node, the negotiated session is stored in a
thread-safe cache and reused on subsequent connections, skipping the full
handshake.

Both TLS 1.2 (session IDs) and TLS 1.3 (session tickets / PSK) are supported.

Changes

cassandra/connection.py: SSLSessionCache class & endpoint keys

  • _SessionCacheEntry namedtuple stores (session, timestamp) for TTL tracking.
  • SSLSessionCache: a thread-safe OrderedDict-based cache with LRU eviction,
    TTL expiration, and periodic cleanup (every 100 set() calls), keyed by
    endpoint tls_session_cache_key.
  • Configurable max_size (default 100) and ttl (default 3600 s).
  • Base EndPoint class provides a default tls_session_cache_key property
    returning (address, port). Subclasses override for context-specific keys:
    • DefaultEndPoint: (address, port) — inherits default
    • SniEndPoint: (address, port, server_name) — prevents proxy collisions
    • UnixSocketEndPoint: (unix_socket_path,)
    • ClientRoutesEndPoint: (host_id, address, port)
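
The cache described in the bullets above can be illustrated with a minimal sketch. This is not the PR's code: the class name `SessionCacheSketch`, the injectable `clock` argument, and the choice to ignore `None` sessions are assumptions made for illustration, and the periodic cleanup pass is left out for brevity.

```python
import threading
import time
from collections import OrderedDict, namedtuple

# Hypothetical names throughout; this is not the PR's implementation.
_Entry = namedtuple("_Entry", ["session", "timestamp"])


class SessionCacheSketch:
    """LRU + TTL cache guarded by a single lock."""

    def __init__(self, max_size=100, ttl=3600, clock=time.monotonic):
        self._max_size = max_size
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._lock = threading.Lock()
        self._entries = OrderedDict()

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            if self._clock() - entry.timestamp > self._ttl:
                del self._entries[key]  # expired: drop it and report a miss
                return None
            self._entries.move_to_end(key)  # mark as most recently used
            return entry.session

    def set(self, key, session):
        if session is None:
            return  # assumption: a missing session is simply not cached
        with self._lock:
            self._entries[key] = _Entry(session, self._clock())
            self._entries.move_to_end(key)
            while len(self._entries) > self._max_size:
                self._entries.popitem(last=False)  # evict least recently used


cache = SessionCacheSketch(max_size=2, ttl=60)
cache.set(("10.0.0.1", 9042), "session-a")                       # (address, port)
cache.set(("10.0.0.1", 9042, "node1.example.com"), "session-b")  # SNI-style key
cache.get(("10.0.0.1", 9042))                   # touch: now most recently used
cache.set(("/var/run/db.sock",), "session-c")   # evicts the SNI-style entry
```

Because the keys are plain tuples, two endpoints that share an address and port but differ in server name never share an entry, which is the collision the SNI key is meant to avoid.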

cassandra/connection.py: Connection wiring

  • Connection gains _ssl_session_cache attribute, set via ssl_session_cache
    kwarg in __init__.
  • _wrap_socket_from_context() restores a cached session via
    ssl_sock.session = ... after wrap_socket(); if the assignment raises
    ssl.SSLError or AttributeError, the connection falls back to a full
    handshake.
  • _ssl_session_cache_key() helper delegates to
    endpoint.tls_session_cache_key.
  • _cache_tls_session_if_needed() stores socket.session in the cache when
    ssl_context is set and the session is non-None.
  • Sessions are cached at three points to cover both TLS 1.2 and 1.3:
    1. After _initiate_connection() in _connect_socket() — TLS 1.2 sessions
      are available immediately after connect.
    2. After ReadyMessage in _handle_startup_response() — TLS 1.3 tickets
      arrive asynchronously after the first application-data exchange.
    3. After AuthSuccessMessage in _handle_auth_response() — same TLS 1.3
      coverage for authenticated connections.
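
Calling the store helper at several points is safe as long as it is a no-op when nothing can be cached and an overwrite otherwise. A rough sketch of that idea follows; the function signature, `DictCache`, and the placeholder objects are hypothetical stand-ins, not the PR's code.

```python
from types import SimpleNamespace


def cache_tls_session_if_needed(cache, key, ssl_context, sock):
    """Store sock.session under key; return True if something was stored."""
    if cache is None or ssl_context is None:
        return False  # resumption disabled, or the connection is not TLS
    session = getattr(sock, "session", None)  # ssl.SSLSocket exposes .session
    if session is None:
        return False  # no session available at this point
    cache.set(key, session)  # overwriting with a newer session is harmless
    return True


class DictCache:
    """Hypothetical stand-in for the real cache: only set() is needed here."""

    def __init__(self):
        self.data = {}

    def set(self, key, session):
        self.data[key] = session


cache = DictCache()
key = ("10.0.0.1", 9042)
context = object()                    # placeholder for an ssl.SSLContext
sock = SimpleNamespace(session=None)  # placeholder for an ssl.SSLSocket

stored_after_connect = cache_tls_session_if_needed(cache, key, context, sock)
# Simulate a session becoming available later, as the description says
# happens with TLS 1.3 tickets.
sock.session = "ticket-from-server"
stored_after_ready = cache_tls_session_if_needed(cache, key, context, sock)
```

The first call stores nothing and the second one fills the cache, so each call site can invoke the helper unconditionally.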

cassandra/cluster.py: Cluster integration

  • Imports SSLSessionCache.
  • Adds ssl_session_cache class attribute with docstring.
  • __init__ accepts ssl_session_cache=_NOT_SET parameter.
  • Auto-creates an SSLSessionCache() when ssl_context or ssl_options are
    set; no configuration required for the common case.
  • Pass ssl_session_cache=None explicitly to opt out.
  • A custom SSLSessionCache(max_size=…, ttl=…) can be supplied.
  • _make_connection_kwargs() passes the cache to every Connection via
    kwargs_dict.setdefault('ssl_session_cache', self.ssl_session_cache).
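
The `_NOT_SET` default lets the constructor tell "argument omitted" apart from an explicit `None`. A minimal sketch of that resolution logic, following the bullets above: the function name and `cache_factory` (standing in for `SSLSessionCache`) are hypothetical, and treating an empty `ssl_options` as "not set" is an assumption.

```python
_NOT_SET = object()  # sentinel: "argument omitted", distinct from an explicit None


def resolve_ssl_session_cache(ssl_session_cache=_NOT_SET, ssl_context=None,
                              ssl_options=None, cache_factory=dict):
    """Decide which cache (if any) connections should share."""
    if ssl_session_cache is not _NOT_SET:
        return ssl_session_cache  # explicit value wins, including None (opt out)
    if ssl_context is not None or ssl_options:
        return cache_factory()    # TLS configured: create a default cache
    return None                   # no TLS: nothing to cache


auto = resolve_ssl_session_cache(ssl_context=object())
opted_out = resolve_ssl_session_cache(ssl_session_cache=None, ssl_context=object())
no_tls = resolve_ssl_session_cache()

# setdefault() only fills the key when the caller has not supplied one.
kwargs = {"ssl_session_cache": "caller-supplied"}
kwargs.setdefault("ssl_session_cache", "cluster-default")
```

Using `setdefault()` when building connection kwargs keeps any cache passed explicitly for a single connection from being replaced by the cluster-wide one.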

cassandra/io/eventletreactor.py — Eventlet (PyOpenSSL) support

  • _wrap_socket_from_context() restores cached PyOpenSSL sessions via
    set_session() before the handshake.
  • _initiate_connection() calls _cache_pyopenssl_session() after
    do_handshake().
  • New _cache_pyopenssl_session() helper stores the session via
    get_session(), logs whether the session was reused
    (session_reused()), and catches all exceptions silently.
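
The restore-before, store-after ordering for PyOpenSSL can be sketched as below. The helpers only assume a connection object exposing `set_session()`, `get_session()`, and `session_reused()`, the methods named above; `FakeConn`, `DictCache`, and both function names are hypothetical stand-ins so the sketch runs without a TLS server.

```python
import logging

log = logging.getLogger(__name__)


def restore_pyopenssl_session(conn, cache, key):
    """Call before do_handshake(); return True if a cached session was applied."""
    session = cache.get(key) if cache is not None else None
    if session is None:
        return False
    try:
        conn.set_session(session)
        return True
    except Exception:
        log.debug("Could not restore TLS session for %s", key, exc_info=True)
        return False


def cache_pyopenssl_session(conn, cache, key):
    """Call after do_handshake(); never raises."""
    if cache is None:
        return
    try:
        session = conn.get_session()
        if session is not None:
            cache.set(key, session)
        log.debug("TLS session for %s reused: %s", key, conn.session_reused())
    except Exception:
        log.debug("Could not cache TLS session for %s", key, exc_info=True)


class DictCache:
    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, session):
        self.data[key] = session


class FakeConn:
    """Mimics only the three connection methods used above."""

    def __init__(self, fresh_session):
        self.fresh_session = fresh_session
        self.applied = None

    def set_session(self, session):
        self.applied = session

    def get_session(self):
        return self.fresh_session

    def session_reused(self):
        return self.applied is not None


cache, key = DictCache(), ("10.0.0.1", 9042)
first = FakeConn("session-1")
restored_first = restore_pyopenssl_session(first, cache, key)    # cache is empty
cache_pyopenssl_session(first, cache, key)                       # stores session-1
second = FakeConn("session-2")
restored_second = restore_pyopenssl_session(second, cache, key)  # finds session-1
```

Swallowing every exception in the store path mirrors the description's intent that a caching failure must never break a connection that has already completed its handshake.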

cassandra/io/twistedreactor.py — Twisted (PyOpenSSL) support

  • _SSLCreator.__init__ accepts an optional ssl_session_cache parameter.
  • clientConnectionForTLS() restores cached sessions via set_session().
  • info_callback() stores sessions after SSL_CB_HANDSHAKE_DONE via
    get_session(), logs reuse status.
  • TwistedConnection.add_connection() passes ssl_session_cache=self._ssl_session_cache
    to _SSLCreator.

Tests

tests/unit/test_connection.py

  • TestSSLSessionCache — empty lookup, set/get, key isolation by
    address/port/SNI, overwrite, thread safety, TTL expiration, LRU eviction,
    max_size enforcement, clear(), clear_expired(), automatic periodic
    cleanup, None session handling.
  • TestEndPointTLSSessionCacheKey — cache key correctness for
    DefaultEndPoint, SniEndPoint, UnixSocketEndPoint,
    ClientRoutesEndPoint, plus isolation between different paths/addresses.
  • TestConnectionSSLSessionRestore — session restore from cache,
    tolerance when cache is None, ssl.SSLError on session setter,
    SNI-specific cached session lookup.
  • TestConnectionCacheTLSSession — session stored after connect,
    no-op when session=None, no-op when cache=None, no-op when
    ssl_context=None, SNI-specific key used for storage.

tests/unit/test_cluster.py

  • TestSSLSessionCacheAutoCreation — auto-create with ssl_context,
    auto-create with ssl_options, no cache without TLS, explicit None
    opt-out, custom cache injection, cache passed to connection_factory.

Fixes: https://scylladb.atlassian.net/browse/DRIVER-165

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@Lorak-mmk

> This reduces reconnection latency and CPU overhead, especially in
> deployments with short-lived connections or frequent reconnects.

Such claims would ideally be supported by benchmarks. Could you try to create some?
I very vaguely remember this feature being postponed because the performance gains were underwhelming (but perhaps memory is failing me).

@sylwiaszunejko
Collaborator Author

> > This reduces reconnection latency and CPU overhead, especially in
> > deployments with short-lived connections or frequent reconnects.
>
> Such claims would ideally be supported by benchmarks. Could you try to create some? I very vaguely remember this feature being postponed because the performance gains were underwhelming (but perhaps memory is failing me).

That's the goal, but you're right: I don't have any tests to prove it, so I removed this claim from the PR description. If I manage to create proper benchmarks, I will post an update.

@mykaul

mykaul commented Apr 3, 2026

We could, if it helps, only support this for TLS 1.3.

@sylwiaszunejko sylwiaszunejko self-assigned this Apr 4, 2026

Introduce SSLSessionCache in connection.py: a thread-safe OrderedDict-based
cache with LRU eviction (max_size, default 100) and TTL expiration (default
3600s), keyed by endpoint tls_session_cache_key.

Add tls_session_cache_key property to all EndPoint subclasses:
  - DefaultEndPoint: (address, port)
  - SniEndPoint: (address, port, server_name) — prevents proxy collisions
  - UnixSocketEndPoint: (unix_socket_path,)
  - ClientRoutesEndPoint: (host_id, address, port)

Includes unit tests for basic ops, key isolation, SNI keys, overwrite,
thread safety, TTL expiration, LRU eviction, clear/clear_expired,
automatic cleanup, custom parameters, and endpoint cache key tests.

@sylwiaszunejko
Collaborator Author

@dkropachev @Lorak-mmk I pushed changes with improvements from Dmitry's older PR; I will update the PR description soon.

Collaborator

@dkropachev dkropachev left a comment


I rechecked the TLS session-resumption path against the current branch. The ssl_options configuration still builds a fresh SSLContext per Connection, and a cached stdlib session from the previous connection is incompatible with that new context. I reproduced the failure locally on Python 3.10.12; the session restore path raises ValueError: Session refers to a different SSLContext. Since the new code only catches AttributeError and ssl.SSLError, reconnects fail instead of falling back to a full handshake, and the regression is enabled by default because Cluster auto-creates SSLSessionCache for ssl_options.

    if cached_session is not None:
        try:
            ssl_sock.session = cached_session
        except (AttributeError, ssl.SSLError):

When TLS is configured through ssl_options, each Connection still builds a fresh SSLContext in Connection.__init__, so the cached stdlib ssl session from the previous connection is not compatible with the next one. I rechecked this against the current branch and reproduced it locally on Python 3.10.12: after a successful TLS 1.2 handshake with one context, assigning the cached session to an SSLSocket created from a different context fails with ValueError: Session refers to a different SSLContext. This block only falls back on AttributeError and ssl.SSLError, so reconnects now fail instead of doing a full handshake. Because Cluster auto-enables SSLSessionCache for ssl_options, this regresses an existing supported TLS configuration by default.

Collaborator Author


Good catch, thanks. Fixed in the latest push:

  • ValueError is now caught in _wrap_socket_from_context(), so even if a stale session somehow reaches that part of the code, the connection falls back to a full handshake instead of crashing.

  • Auto-creation of SSLSessionCache is now limited to the ssl_context path. Since providing only ssl_options without ssl_context is deprecated, I believe this is the right solution.

Added unit tests to make sure it works correctly.
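
For illustration, the widened fallback discussed in this thread could look roughly like the sketch below. It is not the code from the push: `try_restore_session`, `_RESTORE_ERRORS`, and `RejectingSocket` are hypothetical names, and the fake socket only imitates the error message reported in the review.

```python
import ssl
from types import SimpleNamespace

# ValueError covers a session that was created under a different SSLContext.
_RESTORE_ERRORS = (AttributeError, ssl.SSLError, ValueError)


def try_restore_session(ssl_sock, cached_session):
    """Return True if the session was applied, False to signal a full handshake."""
    if cached_session is None:
        return False
    try:
        ssl_sock.session = cached_session
        return True
    except _RESTORE_ERRORS:
        return False  # caller proceeds with a normal, full handshake


class RejectingSocket:
    """Stand-in for an SSLSocket built from a different SSLContext."""

    @property
    def session(self):
        return None

    @session.setter
    def session(self, value):
        raise ValueError("Session refers to a different SSLContext.")


fell_back = not try_restore_session(RejectingSocket(), object())
accepting = SimpleNamespace(session=None)
restored = try_restore_session(accepting, "cached-session")
```

Returning a boolean rather than re-raising keeps session restoration strictly best-effort: a stale or incompatible session costs one full handshake instead of a failed reconnect.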

- Add _ssl_session_cache attribute on Connection, set via ssl_session_cache param
- Restore cached TLS sessions in _wrap_socket_from_context with error tolerance
- Add _cache_tls_session_if_needed helper (delegates to endpoint.tls_session_cache_key)
- Cache sessions at 3 points: after connect, ReadyMessage, AuthSuccessMessage
  (handles TLS 1.3 async ticket delivery)
- Add TestConnectionSSLSessionRestore and TestConnectionCacheTLSSession tests

- Import SSLSessionCache in cluster.py
- Add ssl_session_cache attribute with comprehensive docstring
- Add ssl_session_cache parameter to Cluster.__init__ (default _NOT_SET)
- Auto-create SSLSessionCache when ssl_context or ssl_options are set
- Pass ssl_session_cache to connection factory via _make_connection_kwargs
- Add TestSSLSessionCacheAutoCreation tests (6 tests)

- EventletConnection: restore cached session before handshake via set_session(),
  store session after do_handshake() via _cache_pyopenssl_session()
- TwistedConnection: pass ssl_session_cache to _SSLCreator, restore cached
  session in clientConnectionForTLS(), store after handshake in info_callback()
- All operations wrapped in try/except for error tolerance
- Debug logging for session reuse and restore/store failures