
perf(cosmos): strip unused fields from partition key range cache to reduce memory #46297

Draft
tvaron3 wants to merge 6 commits into Azure:main from tvaron3:fix/strip-pk-range-fields


@tvaron3 (Member) commented Apr 14, 2026

Summary

Three optimizations to reduce the CollectionRoutingMap memory footprint when PPCB (Per Partition Circuit Breaker) is enabled. Each CosmosClient maintains its own routing map cache containing all partition key ranges, so for accounts with many partitions and many client instances, these caches dominate memory usage.

Changes

1. Strip unused fields → compact PKRange namedtuple

_routing/aio/routing_map_provider.py + _routing/routing_map_provider.py

The service returns 13 fields per partition key range, but CollectionRoutingMap only uses 4 (id, minInclusive, maxExclusive, parents). After fetching, each range is now converted to a PKRange namedtuple that keeps only those 4 fields and supports dict-style [key] access for backward compatibility.

Dropped fields: _rid, _etag, ridPrefix, _self, throughputFraction, status, ownedArchivalPKRangeIds, _ts, lsn
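A minimal sketch of such a PKRange, assuming the namedtuple field names mirror the service's JSON keys. The _PKRangeBase name and the raise-from in __getitem__ follow the commit notes further down; the to_pk_range helper and everything else are illustrative, not the PR's actual code.

```python
from collections import namedtuple
from typing import Any, Dict

# Field names mirror the service JSON keys so pk["minInclusive"] keeps working.
_PKRangeBase = namedtuple("_PKRangeBase", ["id", "minInclusive", "maxExclusive", "parents"])


class PKRange(_PKRangeBase):
    """Compact, immutable partition key range with dict-style read access."""

    # Without this, the subclass would get a per-instance __dict__ again.
    __slots__ = ()

    def __getitem__(self, key):
        if not isinstance(key, str):
            # Integer indexes and slices keep normal tuple behaviour.
            return super().__getitem__(key)
        try:
            index = self._fields.index(key)
        except ValueError as exc:
            raise KeyError(key) from exc
        return super().__getitem__(index)

    def get(self, key, default=None):
        return self[key] if key in self else default

    def __contains__(self, key):
        # Membership tests field names (like a dict), not tuple values.
        return key in self._fields


def to_pk_range(raw: Dict[str, Any]) -> PKRange:
    """Hypothetical helper: keep only the 4 fields CollectionRoutingMap uses."""
    return PKRange(
        id=raw["id"],
        minInclusive=raw["minInclusive"],
        maxExclusive=raw["maxExclusive"],
        parents=raw.get("parents", []),
    )
```

The empty __slots__ on the subclass is what keeps the saving: a namedtuple subclass without it regains a per-instance __dict__. Overriding __contains__ also switches `in` from tuple-value membership to key membership, matching dict semantics.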

2. Add __slots__ to Range class

_routing/routing_range.py

Range objects store 4 instance attributes (min, max, isMinInclusive, isMaxInclusive). Adding __slots__ eliminates the per-instance __dict__, saving ~100 bytes per Range object. With 100 partitions and 150 clients, that is 15K Range objects.
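To see the effect of __slots__ locally, the sketch below compares two hypothetical stand-ins for Range (RangeWithDict and RangeWithSlots are illustrative names, not SDK classes). sys.getsizeof does not include the strings the attributes point to, and exact byte counts vary by Python version, so no specific sizes are asserted here.

```python
import sys


class RangeWithDict:
    """Hypothetical stand-in for Range before the change."""

    def __init__(self, range_min, range_max, is_min_inclusive=True, is_max_inclusive=False):
        self.min = range_min
        self.max = range_max
        self.isMinInclusive = is_min_inclusive
        self.isMaxInclusive = is_max_inclusive


class RangeWithSlots:
    """Same attributes, stored in fixed slots instead of a __dict__."""

    __slots__ = ("min", "max", "isMinInclusive", "isMaxInclusive")

    def __init__(self, range_min, range_max, is_min_inclusive=True, is_max_inclusive=False):
        self.min = range_min
        self.max = range_max
        self.isMinInclusive = is_min_inclusive
        self.isMaxInclusive = is_max_inclusive


def shallow_size(obj) -> int:
    """Object size plus its __dict__ (if any); referenced strings are not counted."""
    size = sys.getsizeof(obj)
    instance_dict = getattr(obj, "__dict__", None)
    if instance_dict is not None:
        size += sys.getsizeof(instance_dict)
    return size


if __name__ == "__main__":
    plain = RangeWithDict("", "05C1")
    slotted = RangeWithSlots("", "05C1")
    print("with __dict__: ", shallow_size(plain), "bytes")
    print("with __slots__:", shallow_size(slotted), "bytes")
    try:
        slotted.extra = 1  # not declared in __slots__
    except AttributeError as exc:
        print("AttributeError:", exc)
```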

3. Skip redundant .upper() on hex strings

_routing/routing_range.py

Range.__init__ calls .upper() unconditionally on the min/max strings, but the Cosmos service already returns uppercase hex (e.g. 10F0F0F0...), so the call usually just produces an identical copy. The constructor now checks first and skips .upper() when the string is already uppercase.
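The PR text does not show the exact check, so the following is a sketch assuming str.isupper() is used as the guard; normalize_bound is a hypothetical helper name.

```python
def normalize_bound(value: str) -> str:
    """Return value uppercased, reusing the original object when possible.

    In CPython, str.upper() builds a new string even when no character
    changes, so already-uppercase input is returned as-is. Strings with no
    cased characters (e.g. "" or "05") report isupper() == False and still
    go through upper(), which is harmless.
    """
    return value if value.isupper() else value.upper()
```

One design note: isupper() scans the string once, so the saving is the avoided allocation and copy, not the scan.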

Memory Profiling Results

Test setup:

  • Account: ~100 physical partitions, 2 regions (East US 2 + West US 3), multi-write
  • VM: Standard_D16s_v5
  • Tool: tracemalloc (retained memory)
  • Operations per client: 1 read_item + 1 upsert_item
  • PPCB: AZURE_COSMOS_ENABLE_CIRCUIT_BREAKER=True

Current Memory (MB)

| Clients | 4.15.0 Original | Strip Only | All 3 Patches | PPCB=false |
|---|---|---|---|---|
| 1 | 14.3 | 14.3 | 14.3 | 14.0 |
| 25 | 23.0 | 20.5 | 20.0 | 17.9 |
| 50 | 31.9 | 27.4 | 25.8 | 21.7 |
| 100 | 44.9 | 39.9 | 36.6 | 29.4 |
| 150 | 63.8 | 52.9 | 43.3 | 36.4 |

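PPCB Overhead Reduction

PPCB overhead is the retained memory with PPCB enabled minus the PPCB=false column above; Reduction compares All 3 Patches against Original.

| Clients | Original | Strip Only | All 3 Patches | Reduction |
|---|---|---|---|---|
| 25 | 5.1 MB | 2.6 MB | 2.1 MB | -58% |
| 50 | 10.3 MB | 5.7 MB | 4.1 MB | -60% |
| 100 | 15.4 MB | 10.5 MB | 7.2 MB | -53% |
| 150 | 27.4 MB | 16.5 MB | 6.9 MB | -74% |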
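The overhead figures follow from the first table by subtraction. The sketch below recomputes them from the rounded MB values; the results land within 0.1 MB and 1 percentage point of the published numbers, which were presumably computed from unrounded measurements.

```python
# Retained memory in MB, copied from the "Current Memory (MB)" table.
ORIGINAL = {25: 23.0, 50: 31.9, 100: 44.9, 150: 63.8}
ALL_PATCHES = {25: 20.0, 50: 25.8, 100: 36.6, 150: 43.3}
PPCB_OFF = {25: 17.9, 50: 21.7, 100: 29.4, 150: 36.4}


def ppcb_overhead(measured: dict, clients: int) -> float:
    """PPCB overhead = memory with PPCB enabled minus the PPCB=false baseline."""
    return round(measured[clients] - PPCB_OFF[clients], 1)


def reduction_percent(clients: int) -> int:
    """Relative change in overhead from the original build to all 3 patches."""
    before = ppcb_overhead(ORIGINAL, clients)
    after = ppcb_overhead(ALL_PATCHES, clients)
    return round(100 * (after - before) / before)


for n in (25, 50, 100, 150):
    print(n, ppcb_overhead(ORIGINAL, n), ppcb_overhead(ALL_PATCHES, n), f"{reduction_percent(n)}%")
```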

Reproduction Script

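import asyncio
import os
import tracemalloc

# Start tracing before the SDK is imported so its allocations are counted.
tracemalloc.start()
# Enable PPCB before importing the SDK.
os.environ["AZURE_COSMOS_ENABLE_CIRCUIT_BREAKER"] = "True"

from azure.cosmos.aio import CosmosClient
from azure.cosmos.exceptions import CosmosHttpResponseError

N = int(os.environ.get("NUM_CLIENTS", "150"))

async def main():
    clients = []
    try:
        for _ in range(N):
            c = CosmosClient(os.environ["COSMOS_URI"], os.environ["COSMOS_KEY"],
                             preferred_locations=["East US 2"])
            clients.append(c)
            db = c.get_database_client("mydb")
            cont = db.get_container_client("mycont")
            try:
                # One point read per client; a 404 for the missing item is expected.
                await cont.read_item("x", partition_key="x")
            except CosmosHttpResponseError:
                pass
        # Measure while all clients (and their caches) are still alive.
        curr, peak = tracemalloc.get_traced_memory()
        print(f"{N} clients: {curr/1024/1024:.1f} MB current, {peak/1024/1024:.1f} MB peak")
    finally:
        # Close every client so no sessions are left open at exit.
        for c in clients:
            await c.close()

asyncio.run(main())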

@kushagraThapar (Member) left a comment

Thanks @tvaron3
I am curious, are there other places where we build the collection routing map? Shall we fix those as well?

@@ -39,6 +64,8 @@ class PartitionKeyRange(object):
class Range(object):
    """description of class"""

    __slots__ = ('min', 'max', 'isMinInclusive', 'isMaxInclusive')

Member

where is this used?

@tvaron3 (Member, Author), Apr 14, 2026

__slots__ tells Python to store instance attributes in a fixed-size array instead of a per-instance __dict__. The only thing to watch out for is that assigning a new attribute to the object at runtime raises an AttributeError, but we never do that.

Member

Can we please add a comment above it to explain this? Thanks!

@jeet1995 (Member), Apr 14, 2026

Can we use the __slots__ approach in _PartitionHealthInfo as well? I don't expect its attributes to be added or removed dynamically.

@tvaron3 (Member, Author)

Yes, I will add __slots__ to _PartitionHealthInfo as well, along with a comment explaining it.

@tvaron3 (Member, Author)

Will add a comment in the next push. The __slots__ = ('min', 'max', 'isMinInclusive', 'isMaxInclusive') declaration tells Python to store instance attributes in a fixed-size C array instead of a per-instance __dict__. This reduces each Range object from ~250 bytes to ~64 bytes, which is significant when there are 100K+ partition ranges cached per client.

@tvaron3 force-pushed the fix/strip-pk-range-fields branch 2 times, most recently from 6b801a2 to 378f07e (April 14, 2026 06:19)

for parentId in parents:
    parentIds.add(parentId)
return (
    PKRange(id=r[routing_range.PartitionKeyRange.Id],

Member

It'd be helpful to understand if the PKRange reference can be used as-is in _GlobalPartitionEndpointManagerForCircuitBreaker and _GlobalPartitionEndpointManagerForPerPartitionAutomaticFailoverAsync.

@tvaron3 (Member, Author)

Yes, PKRange is a drop-in replacement for the dict. It supports dict-style access (pk['id'], pk.get('minInclusive'), 'id' in pk) via __getitem__, get(), and __contains__. The circuit breaker and PPAF managers access partition key ranges through these same dict-style lookups, so they work transparently with PKRange without any code changes. The shared cache itself (_shared_routing_map_cache) is also transparent to those managers: they access the routing map through PartitionKeyRangeCache the same way as before; the cache just happens to be shared across clients with the same endpoint.

@tvaron3 (Member, Author) commented Apr 14, 2026

Superseded by shared cache approach.

1. Share CollectionRoutingMap cache across clients per endpoint.
   Eliminates N-1 redundant copies when N clients target the same account.
2. Add __slots__ to Range class (64 bytes vs ~250 bytes per instance).
3. Skip .upper() when string is already uppercase.

PPCB overhead (150 clients, tracemalloc):
  Original: 27.4 MB -> Patched: ~0 MB (-100%)
  At customer scale (200K partitions x 152 clients): ~2.1 GB -> ~14 MB (one shared copy instead of 152: 2.1 GB / 152 ≈ 14 MB)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
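Item 1 above (one routing map cache per endpoint) can be sketched as a process-wide registry keyed by endpoint. This is a hypothetical illustration: the PR names _shared_routing_map_cache but not its shape, so get_endpoint_cache, ClientStub, and the trailing-slash normalisation are all assumptions.

```python
import threading
from typing import Any, Dict

# Hypothetical process-wide registry: endpoint URL -> routing map cache
# (collection link -> routing map). Every client for the same endpoint
# gets the same inner dict, so each routing map is stored once.
_shared_caches: Dict[str, Dict[str, Any]] = {}
_shared_caches_lock = threading.Lock()


def get_endpoint_cache(endpoint: str) -> Dict[str, Any]:
    """Return the cache shared by all clients targeting this endpoint."""
    key = endpoint.rstrip("/")  # assumption: treat "host/" and "host" alike
    with _shared_caches_lock:
        return _shared_caches.setdefault(key, {})


class ClientStub:
    """Stand-in for a client; it only holds a reference to the shared cache."""

    def __init__(self, endpoint: str) -> None:
        self.routing_map_cache = get_endpoint_cache(endpoint)
```

Because every client now mutates one dict, refreshes of the same collection need coordination; the commit notes below mention per-collection locks (_collection_locks) for this. The sketch also never evicts entries, so each cache lives for the life of the process.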
@tvaron3 force-pushed the fix/strip-pk-range-fields branch from 128d459 to 8b03fa2 (April 14, 2026 18:40)
tvaron3 and others added 2 commits April 14, 2026 12:33
…storage

Convert raw service response dicts to PKRange namedtuples in both
full refresh (_build_routing_map_from_ranges) and incremental update
(process_fetched_ranges) paths. PKRange retains only 4 fields (id,
minInclusive, maxExclusive, parents) and supports dict-style access
for backward compatibility.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Import PKRange in _routing_map_provider_common.py (fixes all emulator tests)
- Fix namedtuple name mismatch (_PKRangeBase, not PKRange) for mypy
- Use raise-from pattern in PKRange.__getitem__ (pylint W0707)
- Move _locks_lock and _collection_locks init into __init__ (pylint W0201)
- Add 'pkrange' to cspell dictionary

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
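The commit above only says _locks_lock and _collection_locks are now initialised in __init__. A minimal sketch of what such a pair typically looks like, assuming _locks_lock guards a dict of per-collection locks; the class name and _get_collection_lock are hypothetical, and an async provider would presumably use asyncio.Lock for the per-collection locks.

```python
import threading
from typing import Dict


class RoutingMapProviderSketch:
    """Hypothetical provider showing a per-collection lock registry."""

    def __init__(self) -> None:
        # Creating both attributes here (not lazily in another method) is
        # what pylint W0201 (attribute-defined-outside-init) asks for.
        self._locks_lock = threading.Lock()
        self._collection_locks: Dict[str, threading.Lock] = {}

    def _get_collection_lock(self, collection_id: str) -> threading.Lock:
        # The guard lock makes lookup-or-create atomic, so two threads
        # refreshing the same collection always share one lock.
        with self._locks_lock:
            lock = self._collection_locks.get(collection_id)
            if lock is None:
                lock = threading.Lock()
                self._collection_locks[collection_id] = lock
            return lock
```

Keeping one lock per collection lets refreshes of different collections proceed in parallel while serialising refreshes of the same one.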
@tvaron3 force-pushed the fix/strip-pk-range-fields branch from 9f82746 to 2cd31c6 (April 14, 2026 21:52)
…ge __slots__

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tvaron3 force-pushed the fix/strip-pk-range-fields branch from bd9d741 to 5448e75 (April 14, 2026 21:56)
tvaron3 and others added 2 commits April 14, 2026 15:45
- Widen range_tuples type to List[Tuple[Any, Any]] for PKRange compatibility
- Move pkrange word to sdk/cosmos/azure-cosmos/cspell.json (not .vscode)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use upstream's ignoreWords format, add pkrange.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Projects

Status: Done

3 participants