[SPARK-57183][SS] Close LRUCache on RocksDB.close() in unbounded memory mode by kete1987 · Pull Request #56234 · apache/spark

kete1987 · 2026-05-31T14:32:41Z

What changes were proposed in this pull request?

In unbounded memory mode (the default, boundedMemoryUsage = false), RocksDBMemoryManager creates a new LRUCache per instance:

spark/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBMemoryManager.scala

Line 185 in d7df192

(null, new LRUCache(conf.blockCacheSizeMB * 1024 * 1024))

but RocksDB.close() never calls lruCache.close():

spark/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala

Lines 2125 to 2135 in d7df192

    
    def close(): Unit = {
      // Acquire DB instance lock and release at the end to allow for synchronized access
      try {
        closeDB()
        readOptions.close()
        writeOptions.close()
        flushOptions.close()
        nativeStats.close()
        rocksDbOptions.close()
        dbLogger.close()

The Java LRUCache wrapper holds a C++ shared_ptr<Cache>, so without an explicit close() the native object is freed only when the JVM GC finalizes the wrapper, which rarely happens under low heap pressure. Native memory therefore accumulates until a GC eventually runs, which can lead to OOM kills in long-running processes or in CI runs with many RocksDB-heavy test suites.

The fix adds an explicit lruCache.close() call in RocksDB.close() for unbounded mode. In bounded mode the cache is a shared singleton managed by RocksDBMemoryManager and must not be closed per instance.
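The ownership rule described above can be sketched as follows. This is a minimal, self-contained model and not the code in the patch: `FakeCache`, `SharedCaches`, and `StateStoreInstance` are hypothetical stand-ins for `org.rocksdb.LRUCache`, the singleton held by `RocksDBMemoryManager`, and the `RocksDB` wrapper, and the 64 MB shared capacity is an arbitrary illustrative value.

```scala
// Hypothetical stand-in for org.rocksdb.LRUCache: it only tracks whether
// the wrapper still owns its (pretend) native handle.
final class FakeCache(val capacityBytes: Long) extends AutoCloseable {
  private var owning = true
  def isOwningHandle: Boolean = owning
  override def close(): Unit = { owning = false }
}

// Hypothetical stand-in for the shared cache used in bounded mode.
object SharedCaches {
  val sharedCache = new FakeCache(64L * 1024 * 1024)
}

// Hypothetical stand-in for the RocksDB wrapper class.
final class StateStoreInstance(boundedMemoryUsage: Boolean, blockCacheSizeMB: Long) {
  // Unbounded mode: one cache per instance. Bounded mode: the shared singleton.
  val lruCache: FakeCache =
    if (boundedMemoryUsage) SharedCaches.sharedCache
    else new FakeCache(blockCacheSizeMB * 1024 * 1024)

  def close(): Unit = {
    // Only the per-instance cache is owned here; the shared cache must
    // outlive any single instance, so it is left open.
    if (!boundedMemoryUsage) lruCache.close()
  }
}
```

In this model, closing an unbounded instance releases its own cache, while closing a bounded instance leaves the shared cache usable by other instances.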

This is a separate issue from SPARK-56523 (Statistics native memory leak), which was already fixed.

Why are the changes needed?

Without an explicit close(), each RocksDB instance in unbounded mode holds on to one LRUCache's worth of native memory (blockCacheSizeMB, default 8 MB) after it is closed, until GC finalizes the wrapper. The memory is never reclaimed deterministically.

A standalone reproducer tool confirms ~8.5 MB of native memory growth per open/close cycle in leak mode vs flat memory in fixed mode:
https://github.com/kete1987/rocksdb-leak-tool
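For a sense of scale, the accumulation can be estimated with simple arithmetic. The sketch below is a rough estimate only: it assumes every leaked cache has filled to its configured capacity and that the GC finalizes none of the wrappers between cycles. `LeakEstimate` and its method names are hypothetical and do not come from the patch or the reproducer tool.

```scala
object LeakEstimate {
  // Configured cache capacity in bytes, mirroring the
  // `blockCacheSizeMB * 1024 * 1024` expression quoted in the description.
  def cacheBytes(blockCacheSizeMB: Long): Long =
    blockCacheSizeMB * 1024L * 1024L

  // Estimated unreclaimed native memory after `cycles` open/close cycles,
  // assuming no wrapper has been finalized in the meantime.
  def unreclaimedBytes(cycles: Int, blockCacheSizeMB: Long): Long =
    cycles.toLong * cacheBytes(blockCacheSizeMB)
}
```

With the default of 8 MB, `LeakEstimate.unreclaimedBytes(1000, 8)` evaluates to 8,388,608,000 bytes, roughly 7.8 GiB after 1,000 cycles. The ~8.5 MB per cycle measured by the reproducer is slightly higher than the configured capacity alone.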

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a test in RocksDBSuite (SPARK-57183: LRUCache is closed on RocksDB.close() in unbounded memory mode) that verifies the native handle is released after close() via LRUCache.isOwningHandle().

I affirm that the contribution is my original work and that I license the work to the project under the project's open source license.

…ry mode

In unbounded memory mode (the default, boundedMemoryUsage=false), RocksDBMemoryManager creates a new LRUCache per RocksDB instance but RocksDB.close() never calls lruCache.close().

The Java LRUCache wrapper holds a C++ shared_ptr<Cache>, so the native object is only freed when the JVM GC finalizes the wrapper -- which rarely happens under low heap pressure. Closing explicitly ensures native memory is reclaimed deterministically when the instance is released.

In bounded mode the cache is a shared singleton managed by RocksDBMemoryManager and must not be closed per instance.

Add a test that verifies the native handle is released after close() in unbounded mode via LRUCache.isOwningHandle().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

HeartSaVioR · 2026-06-01T04:03:28Z

I'll ask other folks to review the change (I've been a bit away from the recent improvements to the RocksDB state store provider), but I have a general suggestion.

Please do not remove a section of the PR template, and treat filling in each section as one of the requirements/duties.

I affirm that the contribution is my original work and that I license the work to the project under the project's open source license.

This doesn't replace what the PR template requires: stating whether an LLM was used and identifying the model. The template explains how to describe the model, so please check it again.

https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE

### Was this patch authored or co-authored using generative AI tooling?
<!--
If generative AI tooling has been used in the process of authoring this patch, please include the
phrase: 'Generated-by: ' followed by the name of the tool and its version.
If no, write 'No'.
Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
-->

HeartSaVioR · 2026-06-01T04:04:34Z

cc. @anishshri-db @micheal-o

kete1987 force-pushed the SPARK-57183-rocksdb-lrucache-leak branch from d4b935b to 02fd273 · May 31, 2026 14:35

kete1987 wants to merge 1 commit into apache:master from kete1987:SPARK-57183-rocksdb-lrucache-leak
