RMCache

High-Performance, Billion-Scale Off-Heap Cache for Java 25+ (LTS)

RMCache is a specialized caching library built for ultra-low latency and billion-entry capacity. It keeps both keys and values off-heap through the Java Foreign Function & Memory (FFM) API, with no sun.misc.Unsafe, so it sidesteps GC pauses at massive scale and stays forward-compatible as Unsafe's memory-access methods are deprecated for removal from the JDK. That is the structural edge: on-heap caches are GC-bound at scale, and most other off-heap caches either keep keys on-heap (heap-bound at scale) or are built on the deprecated Unsafe API.

In head-to-head JMH benchmarks, RMCache is the fastest off-heap cache measured, ahead of Chronicle Map, OHC, MapDB, and EhCache at every scale. Its write latency keeps pace with on-heap Caffeine and its eviction hit rate matches Caffeine's W-TinyLFU, all while keeping the Java heap nearly empty. See Performance Benchmarks below.

Requires JDK 25 or later (LTS release). The FFM API was finalized in JDK 22 (JEP 454), and JDK 25 is the first LTS release that ships it as a final API. Run with --enable-native-access=ALL-UNNAMED.

Key Features

Zero-heap data path: keys, values, and index structures live off-heap.
Java 25+ (LTS) FFM API: no Unsafe dependency.
64-bit slot packing: one 64-bit read for hash + slot.
Key-match fast path + fingerprint check to reduce unnecessary comparisons.
Zero-copy reads and large-value streaming support.
GhostCache L1: AUTO defaults to OFF_HEAP; HEAP and DISABLED are explicit opt-ins.
Memory estimator and index memory budgeting for predictable capacity planning.
Background eviction to keep hot path latency low.
Pull-based metrics with zero hot-path cost: Micrometer and OpenTelemetry bindings.
JSR-107 (JCache) provider (Phase 1): drop-in for Spring Cache and Hibernate second-level cache — core operations; see the Phase-1 limitations.

Architecture

flowchart LR
  A["Client API"] --> B["CacheBuilder"]
  B --> C["OffHeapCacheImpl"]
  C --> D["OffHeapHashTable"]
  C --> E["EntryPool"]
  E --> F["SlabAllocator"]
  F --> G["BuddyAllocator for large blocks"]
  C --> H["OffHeapGhostCache (default)"]
  C --> I["GhostCache (HEAP opt-in)"]
  D --> J["Native Memory"]
  E --> J
  F --> J
  G --> J

Performance Benchmarks

RMCache is an off-heap cache, so the fair comparison is against other off-heap caches — and it is the fastest of them at every scale. On-heap Caffeine is included only as a reference ceiling: on-heap and off-heap are different categories, because an on-heap cache never pays the cost of crossing the heap boundary. The result worth highlighting is that RMCache's PUT keeps pace with on-heap Caffeine even though every byte it stores lives off-heap.

Measured with JMH on macOS / JDK 25, 4 threads, 256-byte values, no eviction — every cache is populated and measured in the same run, so the relative ordering holds even as absolute latencies shift with hardware and load.

GET latency (ns/op) — lower is better

Cache	10K	100K	1M
RMCache	107	257	424
RMCache + GhostCache	126	236	413
Chronicle Map	251	305	445
OHC	284	430	629
MapDB	1,098	1,731	2,214
EhCache	1,639	1,765	2,003
Caffeine — on-heap reference	65	105	254

PUT latency (ns/op) — lower is better

Cache	10K	100K	1M
RMCache + GhostCache	130	278	474
RMCache	169	320	488
Chronicle Map	603	622	715
OHC	426	595	1,044
EhCache	2,460	2,740	3,095
MapDB	2,621	4,205	4,660
Caffeine — on-heap reference	152	248	511

What the numbers say

Fastest off-heap cache at every scale. RMCache leads Chronicle Map by up to 2.3× on GET and 1.5–3.6× on PUT, beats OHC by roughly 2× on both, and is 5–15× faster than MapDB and EhCache.
PUT rivals on-heap Caffeine. RMCache + GhostCache is faster than Caffeine at 10K (130 vs 152 ns) and at 1M (474 vs 511 ns), and about 12% slower at 100K (278 vs 248 ns). Off-heap writes normally carry a heavy penalty; RMCache nearly erases it.
GET is bounded by physics, not design. On-heap Caffeine returns an object reference directly; any off-heap cache must read native memory and materialize the value. RMCache pays that unavoidable cost and still lands within roughly 1.6–1.7× of Caffeine at 10K and 1M (about 2.4× at 100K), the smallest gap of any off-heap cache here.
What you get in return: zero GC pressure and headroom for billions of entries, because no key, value, or index structure ever touches the Java heap.

Methodology: JMH AverageTime, 1 fork, 1 warmup + 2 measurement iterations — representative single-machine numbers, not error-bar-grade. Peers: Caffeine 3.1.8, Chronicle Map 2026.1, OHC 0.7.4, MapDB 3.0.9, EhCache 3.10.8.

Reproduce on your own hardware:

./gradlew jmh -Pjmh.includes="FairComparisonScaleBenchmark|OHCComparisonBenchmark"

Tail latency (ns/op at 1M entries) — lower is better

Averages hide what latency-sensitive services actually feel: the tail. This is where off-heap earns its keep — no GC means no GC-induced jitter. Measured with JMH SampleTime, 4 threads, 256-byte values.

GET

Cache	p50	p90	p99	p99.9
RMCache	417	542	667	3,248
RMCache + GhostCache	416	583	750	3,164
Chronicle Map	500	625	750	3,208
Caffeine — on-heap reference	291	500	834	7,912

PUT

Cache	p50	p90	p99	p99.9
RMCache	459	625	917	7,520
RMCache + GhostCache	417	625	1,332	7,416
Chronicle Map	834	1,084	1,332	4,960
Caffeine — on-heap reference	459	709	1,250	9,584

The tail tells the real story:

RMCache has the lowest p99 GET of every cache here, including on-heap Caffeine: 667 ns vs Caffeine's 834 ns. At p99.9 the three off-heap configurations sit within about 3% of each other (3,164 to 3,248 ns), while Caffeine reaches 7,912 ns, about 2.4× higher than RMCache. Caffeine wins the median (291 ns) because it is on-heap, but its tail pays for GC jitter, which is exactly what RMCache avoids by living off-heap.
RMCache leads PUT through p99 (917 ns, the lowest of all). At the extreme p99.9, Chronicle Map's mmap write path is tighter (4,960 ns); RMCache still beats Caffeine (7,520 vs 9,584 ns).
This is the off-heap payoff: predictable tails that don't move with GC. For p99-sensitive systems, the flat tail — not the average — is the headline.

Eviction quality (hit rate)

Speed is worthless if eviction discards the wrong entries. RMCache's SLRU + TinyLFU admission holds its own against Caffeine's W-TinyLFU: at matched capacity on a Zipfian workload, hit rates land within ±1 pp of Caffeine and well above plain LRU; on a looping scan (the classic LRU killer) RMCache and Caffeine both stay scan-resistant (~88%) while LRU collapses to 0%. So the speed above does not come at a hit-rate cost.

Reproduce:

java -cp build/libs/rmcache-0.0.2-jmh.jar --enable-native-access=ALL-UNNAMED \
  com.codeabbot.rmcache.benchmark.HitRateSimulation
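
The looping-scan collapse of plain LRU can be checked independently of RMCache. The following is a minimal sketch, not the project's HitRateSimulation: the class name LruLoopSketch and the capacity and key counts are assumptions chosen for illustration. It builds a plain LRU from LinkedHashMap and scans cyclically over one more key than the cache can hold, so each key is evicted just before it is requested again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical demo: hit rate of a plain LRU under a cyclic (looping) scan. */
final class LruLoopSketch {

    /** Scans keys 0..keyCount-1 in order, `passes` times, through an LRU of `capacity` entries. */
    static double loopHitRate(int capacity, int keyCount, int passes) {
        // Access-ordered LinkedHashMap that drops the least recently used entry when full.
        Map<Integer, Boolean> lru = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacity;
            }
        };
        long hits = 0;
        long total = 0;
        for (int p = 0; p < passes; p++) {
            for (int k = 0; k < keyCount; k++) {
                total++;
                if (lru.get(k) != null) {
                    hits++;                    // hit: get() also refreshes recency
                } else {
                    lru.put(k, Boolean.TRUE);  // miss: insert, possibly evicting the eldest
                }
            }
        }
        return (double) hits / total;
    }

    public static void main(String[] args) {
        // Loop is one key larger than the cache: every access misses.
        System.out.println(loopHitRate(100, 101, 10)); // prints 0.0
        // Loop fits in the cache: only the first pass misses.
        System.out.println(loopHitRate(100, 100, 10)); // prints 0.9
    }
}
```

A frequency-aware admission filter avoids this failure mode by refusing to evict resident entries for keys it has not seen often enough, which is why the scan-resistant policies keep a large share of the loop resident.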

Why It's Fast

Off-heap caches are usually slower than on-heap ones, because every access crosses the heap boundary and contends for locks. RMCache closes that gap with deliberate design choices. Here is what actually makes the hot path fast — so you can evaluate the approach rather than trust the numbers on faith.

Read path (GET)

One 64-bit read locates an entry. The hash table packs a key fingerprint and its slot index into a single 64-bit word, so a lookup resolves with one aligned memory read instead of pointer chasing.
Lock-free optimistic reads. Reads take a StampedLock optimistic stamp — no lock on the happy path. The reader validates the stamp afterward and only falls back to a real lock if a writer interfered, so GETs essentially never block under read-heavy load.
Robin Hood hashing keeps probe sequences short and uniform, holding lookups at O(1) with low variance even at high load factors — this is what keeps tail latency flat at 1M+ entries.
Single-copy materialization. A GET copies the value from native memory into the caller's byte[] exactly once, with no intermediate buffers. For read-only access, getZeroCopy() / getView() return a bounds-checked view with zero copies.
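
The first two points of the read path can be sketched together. This is an illustration under stated assumptions, not RMCache's actual code: the class PackedSlotSketch, the 32/32 bit split between fingerprint and slot index, and the on-heap long[] standing in for the off-heap table are all hypothetical. The StampedLock calls (tryOptimisticRead, validate, readLock) are the standard JDK API.

```java
import java.util.concurrent.locks.StampedLock;

/** Hypothetical sketch: fingerprint + slot index in one long, read optimistically. */
final class PackedSlotSketch {
    private final StampedLock lock = new StampedLock();
    private final long[] table; // on-heap stand-in for an off-heap table of 64-bit words

    PackedSlotSketch(int capacity) {
        table = new long[capacity];
    }

    /** Assumed layout: high 32 bits hold the fingerprint, low 32 bits the slot index. */
    static long pack(int fingerprint, int slotIndex) {
        return ((long) fingerprint << 32) | (slotIndex & 0xFFFF_FFFFL);
    }

    static int fingerprint(long word) {
        return (int) (word >>> 32);
    }

    static int slotIndex(long word) {
        return (int) word;
    }

    void write(int bucket, int fingerprint, int slotIndex) {
        long stamp = lock.writeLock();
        try {
            table[bucket] = pack(fingerprint, slotIndex);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    long read(int bucket) {
        long stamp = lock.tryOptimisticRead(); // no lock taken on the happy path
        long word = table[bucket];             // one 64-bit read yields both fields
        if (!lock.validate(stamp)) {           // a writer interfered: retry under a read lock
            stamp = lock.readLock();
            try {
                word = table[bucket];
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return word;
    }

    public static void main(String[] args) {
        PackedSlotSketch t = new PackedSlotSketch(16);
        t.write(3, 0xCAFEBABE, 42);
        long w = t.read(3);
        System.out.println(Integer.toHexString(fingerprint(w)) + " -> slot " + slotIndex(w));
        // prints "cafebabe -> slot 42"
    }
}
```

Because both fields travel in a single word, a reader can reject a non-matching fingerprint without touching the entry itself, and the optimistic stamp only costs a validation when no writer was active.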

Write path (PUT)

Slab allocator with size classes. Values land in one of 11 size classes (64 B – 64 KB) via an O(1) lookup, and a free slot is claimed with a single CAS on a bitmap — no malloc-style search and no per-write object allocation. This is why PUT keeps pace with on-heap Caffeine.
Packed allocation handles. Slot, size class, and offset are packed into primitives, so the allocator creates zero Java objects on the hot path — nothing for the GC to scan.
Large values (> 64 KB) are served by an off-heap buddy allocator, so big payloads never fragment the slabs.
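
The slab write path described above can be approximated in a few lines. This is a sketch under assumptions, not the library's SlabAllocator: it assumes the 11 size classes are the powers of two from 64 B to 64 KB, uses a single 64-slot bitmap, and the class name SlabSketch is hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: O(1) size-class lookup plus CAS-based slot claiming. */
final class SlabSketch {
    static final int MIN_SHIFT = 6;    // smallest class is 2^6 = 64 B
    static final int NUM_CLASSES = 11; // 64 B, 128 B, ..., 64 KB (assumed powers of two)

    /** Returns the class index 0..10, or -1 when the size exceeds 64 KB. */
    static int sizeClass(int size) {
        if (size <= 64) {
            return 0;
        }
        // Bit length of (size - 1) is the exponent of the next power of two >= size.
        int cls = 32 - Integer.numberOfLeadingZeros(size - 1) - MIN_SHIFT;
        return cls < NUM_CLASSES ? cls : -1;
    }

    private final AtomicLong bitmap = new AtomicLong(); // one bit per slot, 64 slots

    /** Claims the lowest free slot; one CAS when uncontended. Returns -1 if all slots are taken. */
    int claim() {
        while (true) {
            long bits = bitmap.get();
            if (bits == -1L) {
                return -1;
            }
            int slot = Long.numberOfTrailingZeros(~bits); // index of the lowest zero bit
            if (bitmap.compareAndSet(bits, bits | (1L << slot))) {
                return slot;
            }
            // Another thread changed the bitmap first: reload and retry.
        }
    }

    void release(int slot) {
        bitmap.getAndUpdate(b -> b & ~(1L << slot));
    }

    public static void main(String[] args) {
        System.out.println(sizeClass(300));   // prints 3, the 512 B class
        SlabSketch slab = new SlabSketch();
        System.out.println(slab.claim());     // prints 0
        System.out.println(slab.claim());     // prints 1
    }
}
```

Neither method allocates a Java object, which matches the zero-allocation goal of the packed handles; a value that maps to -1 here is the case the buddy allocator would take over.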

Concurrency & scale

Striped locks (up to 64 shards) partition the table so writers to different keys rarely contend.
Background eviction. SLRU + TinyLFU admission (backed by a Count-Min sketch) runs off the hot path against high/low memory watermarks, so eviction decisions never appear in your GET/PUT latency.
Everything is off-heap by default — keys, values, the hash table, free lists, GhostCache L1, and eviction metadata. The Java heap stays nearly empty (< 50 MB for 1M entries), so GC pauses don't grow with cache size. GhostCacheMode.HEAP remains available as an explicit opt-in for small caches.
No sun.misc.Unsafe. RMCache uses only the stable Java FFM API (java.lang.foreign), so it stays forward-compatible as the JDK locks Unsafe down — unlike older Unsafe-based off-heap caches.
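
Lock striping, the first item in the list above, is a general technique that is easy to show in isolation. The sketch below is an assumption-laden illustration rather than RMCache's implementation: StripedLockSketch, the hash-spreading step, and the use of ReentrantLock are all hypothetical choices.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical sketch: a fixed, power-of-two array of locks selected by key hash. */
final class StripedLockSketch {
    private final ReentrantLock[] stripes;
    private final int mask;

    StripedLockSketch(int stripeCount) {
        if (stripeCount <= 0 || Integer.bitCount(stripeCount) != 1) {
            throw new IllegalArgumentException("stripeCount must be a positive power of two");
        }
        stripes = new ReentrantLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
        mask = stripeCount - 1;
    }

    /** Mixes the high bits into the low bits, then masks down to a stripe index. */
    int stripeFor(Object key) {
        int h = key.hashCode();
        return (h ^ (h >>> 16)) & mask;
    }

    /** Runs the action while holding only the stripe that owns the key. */
    <T> T withLock(Object key, Supplier<T> action) {
        ReentrantLock lock = stripes[stripeFor(key)];
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        StripedLockSketch locks = new StripedLockSketch(64);
        String result = locks.withLock("user:123", () -> "updated under stripe "
                + locks.stripeFor("user:123"));
        System.out.println(result);
    }
}
```

Two writers block each other only when their keys hash to the same stripe, so with 64 stripes and well-distributed keys the chance of a collision between any two concurrent writers is about 1 in 64.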

Full memory layout, concurrency model, and data-structure internals are documented in ARCHITECTURE.md and ARCHITECTURE-DEEP-DIVE.md.

Optimizations for 1B Scale

RMCache includes 9 optimizations designed to minimize memory overhead, reduce hot-path latency, and improve concurrency at billion-entry scale.

Category	Optimization	Impact
Memory	Compact Entry Header (24B → 20B)	–4 GB @ 1B entries
Memory	Compact LRU Metadata (9B → 8B)	–1 GB @ 1B entries
Memory	Capped FrequencySketch Table (16M max)	–7.9 GB @ 1B entries
Latency	Vectorized Key Comparison (8B/compare)	20-40% faster GET
Latency	O(1) Size Class Lookup	Faster allocation
Latency	Packed AllocationHandle (zero object GC)	No GC on hot path
Concurrency	Striped LRU Lock (up to 64 shards)	N× less lock contention
Concurrency	Larger Async Access Buffers (4096)	Better eviction accuracy
GC	Off-Heap Buddy Allocator	Eliminated heap data structures

Projected memory at 1B entries: ~326 GB (down from ~339 GB baseline).
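
One plausible reading of "8B/compare" in the table above is word-at-a-time comparison: two keys are compared as 64-bit words, with a byte-wise loop for the remainder. The sketch below illustrates that idea only. KeyCompareSketch is a hypothetical name, and direct ByteBuffers stand in for the FFM memory segments RMCache actually uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: compare two off-heap keys eight bytes at a time. */
final class KeyCompareSketch {

    /** Copies a key into a direct (off-heap) buffer. */
    static ByteBuffer offHeap(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length);
        buf.put(bytes); // relative put; the absolute reads below ignore the position
        return buf;
    }

    /** Returns true when the first `length` bytes of both buffers are identical. */
    static boolean keysEqual(ByteBuffer a, ByteBuffer b, int length) {
        int i = 0;
        // Main loop: one 64-bit comparison covers eight key bytes.
        for (; i + Long.BYTES <= length; i += Long.BYTES) {
            if (a.getLong(i) != b.getLong(i)) {
                return false;
            }
        }
        // Tail: fewer than eight bytes remain.
        for (; i < length; i++) {
            if (a.get(i) != b.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ByteBuffer stored = offHeap("user:123456789");
        ByteBuffer same = offHeap("user:123456789");
        ByteBuffer other = offHeap("user:123456780");
        System.out.println(keysEqual(stored, same, 14));  // prints true
        System.out.println(keysEqual(stored, other, 14)); // prints false
    }
}
```

For a 16-byte key this takes two comparisons instead of sixteen, which is where a speedup on key-heavy GETs would come from.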

When to Use RMCache

RMCache is built for a specific job. Use it when that job is yours — and reach for a simpler tool when it isn't.

Use RMCache when:

Your cache is large (hundreds of thousands to billions of entries) and on-heap caching would cause long GC pauses or exceed your heap budget.
You run large heaps and want cache data out of the GC's reach entirely, so pause times stay flat regardless of cache size.
You need predictable, low tail latency under concurrent load at scale.
You're on JDK 25+ and want a pure-FFM solution with no sun.misc.Unsafe.
Values are byte-array / serializable payloads (sessions, rendered fragments, feature vectors, protobuf/JSON blobs, etc.).

Prefer on-heap Caffeine when:

Your working set is small (well under ~100K entries) and fits comfortably on-heap — on-heap GET is faster and the API is simpler.
You cache live object graphs you want to read back without any serialization step.
You don't have GC-pause or heap-pressure problems to solve.

Look elsewhere when:

You need a distributed / networked cache shared across machines — RMCache is in-process; use Redis, Hazelcast, or Infinispan.
You need durability across restarts today — RMCache is in-memory (persistence is on the roadmap, not shipped).

In short: RMCache trades a small, fixed off-heap access cost for freedom from GC at scale. If GC pauses or heap limits aren't hurting you, you may not need it — and that's an honest answer.

Memory Estimator

RMCache exposes a memory estimator to size off-heap allocations accurately.

Entry size formula (approx):

EntrySize = HEADER(20) + 4 + pad(keyLen) + 4 + valueLen

Index size formula (approx):

IndexBytes = HashTable(8 * slots) + Offsets(8 * slots) + FreeList(4 * slots)

Total bytes (approx):

Total = DataBytes + IndexBytes + AllocatorOverhead(~10% default)
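
The three formulas above can be evaluated by hand. The sketch below does that arithmetic under explicit assumptions: pad() is taken to round the key length up to a multiple of 8 bytes, the allocator overhead is applied to the sum of data and index bytes, and the slot count is supplied by the caller. MemorySizingSketch is a hypothetical helper and is not the library's MemoryEstimator.

```java
/** Hypothetical helper that evaluates the approximate sizing formulas by hand. */
final class MemorySizingSketch {
    static final int HEADER_BYTES = 20;

    /** Assumption: pad() rounds the key length up to a multiple of 8 bytes. */
    static long pad(long keyLen) {
        return (keyLen + 7) & ~7L;
    }

    /** EntrySize = HEADER(20) + 4 + pad(keyLen) + 4 + valueLen */
    static long entrySize(long keyLen, long valueLen) {
        return HEADER_BYTES + 4 + pad(keyLen) + 4 + valueLen;
    }

    /** IndexBytes = HashTable(8 * slots) + Offsets(8 * slots) + FreeList(4 * slots) */
    static long indexBytes(long slots) {
        return 8 * slots + 8 * slots + 4 * slots;
    }

    /** Total = DataBytes + IndexBytes + overhead, with overhead as a percent of their sum (assumed). */
    static long totalBytes(long entries, long keyLen, long valueLen, long slots, int overheadPercent) {
        long base = entries * entrySize(keyLen, valueLen) + indexBytes(slots);
        return base + base * overheadPercent / 100;
    }

    public static void main(String[] args) {
        // 20 + 4 + 16 + 4 + 256
        System.out.println(entrySize(16, 256)); // prints 300
        // 1M entries with a hypothetical 2^21 index slots and 10% overhead.
        System.out.println(totalBytes(1_000_000, 16, 256, 2_097_152, 10)); // prints 376137344
    }
}
```

Treat the result as a lower-bound style estimate: the formulas are marked approximate, and measured usage also depends on how the allocator rounds each entry to a size class.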

Example

MemoryEstimator.MemoryEstimate estimate = new CacheBuilder<String, byte[]>()
        .maxEntries(1_000_000)
        .averageKeySize(16)
        .averageValueSize(256)
        .estimateMemory();

System.out.println("Total bytes: " + estimate.totalBytes());
System.out.println("Bytes/entry: " + estimate.bytesPerEntry());

Usage

Dependency

Requires Java 25+ (LTS) with --enable-native-access=ALL-UNNAMED.

dependencies {
    implementation 'com.codeabbot:rmcache:0.0.2'
}

<dependency>
    <groupId>com.codeabbot</groupId>
    <artifactId>rmcache</artifactId>
    <version>0.0.2</version>
</dependency>

Basic Example

import com.codeabbot.rmcache.CacheBuilder;
import com.codeabbot.rmcache.GhostCacheMode;
import com.codeabbot.rmcache.OffHeapCache;
import com.codeabbot.rmcache.Units;

try (OffHeapCache<String, byte[]> cache = new CacheBuilder<String, byte[]>()
        .maxEntries(1_000_000)
        .averageKeySize(16)
        .averageValueSize(256)
        .offHeapMemory(Units.gigabytes(4))
        .ghostCacheMode(GhostCacheMode.AUTO)
        .forByteArrayValues()
        .build()) {

    cache.put("user:123", new byte[256]);
    byte[] value = cache.get("user:123");
}

Zero-Heap Profile

try (OffHeapCache<String, byte[]> cache = new CacheBuilder<String, byte[]>()
        .zeroHeapProfile() // keeps off-heap ghost cache + background eviction
        .ghostCacheMode(GhostCacheMode.AUTO)
        .maxEntries(5_000_000)
        .offHeapMemory(Units.gigabytes(16))
        .forByteArrayValues()
        .build()) {

    // zero-heap hot-path
}

Modules & Integrations

RMCache is published as a set of modules — add only what you need. All share the core version (0.0.2) and are available from Maven Central under com.codeabbot.

Module	Artifact	Purpose
Core	`com.codeabbot:rmcache`	The off-heap cache + `CacheBuilder`
Metrics	`com.codeabbot:rmcache-metrics`	Opt-in latency-sampling decorator (`MeteredOffHeapCache`) + stats snapshot
Micrometer	`com.codeabbot:rmcache-micrometer`	Binds cache stats to a Micrometer `MeterRegistry`
OpenTelemetry	`com.codeabbot:rmcache-opentelemetry`	Exposes cache stats as OpenTelemetry observable metrics
JCache (JSR-107)	`com.codeabbot:rmcache-jcache`	Standard `javax.cache` provider — Phase 1 (Spring Cache / Hibernate L2; see limitations)

Metrics — Micrometer

implementation 'com.codeabbot:rmcache-micrometer:0.0.2'

import com.codeabbot.rmcache.micrometer.RMCacheMicrometerMetrics;

RMCacheMicrometerMetrics.monitor(meterRegistry, cache, "users");
// → cache.gets{result=hit|miss}, cache.puts, cache.removes, cache.evictions,
//   cache.puts.rejected, cache.evictions.cause{cause}, cache.size,
//   cache.memory.used / cache.memory.max

Pull-based: meters read cache.getStats() only on the registry's scrape interval — never on the get/put hot path.

Metrics — OpenTelemetry

implementation 'com.codeabbot:rmcache-opentelemetry:0.0.2'

import com.codeabbot.rmcache.opentelemetry.RMCacheOpenTelemetryMetrics;

AutoCloseable handle = RMCacheOpenTelemetryMetrics.register(meter, cache, "users");
// handle.close() on shutdown to stop observing

Observable instruments — read only on the OTel export interval, never on get/put.

JCache (JSR-107)

implementation 'com.codeabbot:rmcache-jcache:0.0.2'

import javax.cache.*;
import com.codeabbot.rmcache.jcache.RMCacheConfiguration;

CachingProvider provider = Caching.getCachingProvider();   // auto-discovers RMCache
CacheManager manager = provider.getCacheManager();
Cache<String, byte[]> cache = manager.createCache("users",
        new RMCacheConfiguration<String, byte[]>()
                .setTypes(String.class, byte[].class)
                .setOffHeapMemoryBytes(4L << 30)   // RMCache-specific sizing
                .setMaxEntries(20_000_000));

cache.put("user:1", data);
byte[] v = cache.get("user:1");

Drop-in JSR-107 provider for Spring Cache and Hibernate L2: store-by-value, atomic invoke, ExpiryPolicy → TTL, and JMX statistics. See JCache Provider for the full surface and Phase-1 limitations.

Serializer Helper

For custom value types, use SerializerHelper to build a SegmentValueSerializer without boilerplate.

SegmentValueSerializer<MyType> serializer = SerializerHelper.segment(
        MyType::estimatedSize,
        (value, segment, offset, maxLen) -> value.writeTo(segment, offset, maxLen),
        (bytes, off, len) -> MyType.from(bytes, off, len));

try (OffHeapCache<String, MyType> cache = new CacheBuilder<String, MyType>()
        .valueSerializer(serializer)
        .build()) {
    cache.put("k", new MyType());
}

Configuration

Parameter	Default	Description
`maxEntries`	1,000,000	Target entry count (capacity planning).
`averageKeySize`	32	Used for memory estimation.
`averageValueSize`	256	Used for memory estimation.
`offHeapMemory`	auto	Total off-heap pool size.
`hashTableStripes`	auto	Stripe count (power of 2). Auto: 256 (<100k), 1024 (100k-1M), 4096 (1M-10M), 16384 (10M+).
`entryPoolPartitions`	auto	Entry pool partitions (power of 2).
`hashTableLoadFactor`	0.60	Lower = faster probes, higher = lower index memory.
`hashTableInitialCapacity`	auto	Per-stripe hash table capacity.
`indexMemoryBudgetBytes`	unset	Budget index memory and auto-adjust load factor.
`indexMemoryBudgetPercent`	unset	Budget index memory as % of off-heap pool.
`ghostCacheMode`	AUTO	AUTO resolves to OFF_HEAP by default; HEAP and DISABLED are explicit opt-ins.
`ghostCacheSize`	auto	L1 cache capacity.
`stringKeyEncoding`	UTF8	UTF8 or LATIN1 (faster for ASCII).
`backgroundEviction`	true	Enable background eviction.
`backgroundEvictionInterval`	10ms	Eviction poll interval.
`evictionMemoryWatermarks`	0.95 / 0.90	High/low thresholds.
`prefetch`	false	Hash-table prefetching.
`slabSize`	64KB	Slab allocator chunk size.

Index Memory Budgeting

If you want predictable index memory usage, you can set a budget and let the builder derive the hash table size and load factor.

new CacheBuilder<String, byte[]>()
    .maxEntries(1_000_000)
    .offHeapMemory(Units.gigabytes(8))
    .indexMemoryBudgetPercent(0.15) // 15% of off-heap pool
    .forByteArrayValues()
    .build();

Memory Scalability

Use the built-in suite to measure actual off-heap usage at different scales:

./gradlew runMemoryScalability

This prints 10k/100k/1M measurements and a 1B extrapolation. Results depend on key/value sizes.

Latest run (macOS, Java 25, 16B keys / 256B values, 4GB off-heap):

Scale	Heap (MB)	Off-Heap (MB)	Bytes/Entry
10k	1.50	4.88	669.8
100k	0.85	48.83	520.9
1M	1.28	488.28	513.3

1B extrapolation: ~326 GB off-heap, < 50 MB heap.

Large Values

Large values (4KB, 8KB, 128KB, 512KB and above) are supported. Values larger than slab size are served by the buddy allocator. For large values, always prefer a SegmentValueSerializer to avoid heap buffers.

Notes on Load Factor

Lower load factor:

Fewer probes
Lower tail latency
Higher index memory usage

Higher load factor:

Lower index memory usage
Longer probes and higher latency at scale
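
The trade-off can be put into rough numbers. The sketch below is an estimate under assumptions, not a measurement of RMCache: it assumes slots = ceil(entries / loadFactor) with no power-of-two rounding, reuses the 20 bytes per slot from the index size formula, and applies Knuth's classic linear-probing estimate for a successful lookup, 0.5 * (1 + 1 / (1 - loadFactor)). LoadFactorSketch is a hypothetical name.

```java
/** Hypothetical sketch: index memory and expected probe count as the load factor changes. */
final class LoadFactorSketch {
    static final int INDEX_BYTES_PER_SLOT = 8 + 8 + 4; // hash table + offsets + free list

    /** Assumption: the table holds exactly ceil(entries / loadFactor) slots. */
    static long slotsFor(long entries, double loadFactor) {
        return (long) Math.ceil(entries / loadFactor);
    }

    static long indexBytes(long entries, double loadFactor) {
        return INDEX_BYTES_PER_SLOT * slotsFor(entries, loadFactor);
    }

    /** Knuth's estimate of probes per successful lookup under linear probing. */
    static double expectedProbes(double loadFactor) {
        return 0.5 * (1.0 + 1.0 / (1.0 - loadFactor));
    }

    public static void main(String[] args) {
        for (double lf : new double[] {0.50, 0.60, 0.75, 0.90}) {
            System.out.printf("load factor %.2f: index %d bytes, ~%.2f probes per hit%n",
                    lf, indexBytes(1_000_000, lf), expectedProbes(lf));
        }
    }
}
```

Under these assumptions, moving from 0.60 to 0.90 cuts index memory by about a third but roughly triples the expected probe count (1.75 to 5.5). Robin Hood reordering is generally understood to reduce the variance of probe lengths more than their mean, so the mean above is only part of the tail-latency picture.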

Documentation

Document	Description
Getting Started	Quick start, common patterns, sizing
Eviction Policies	LRU, TTL, composite, filters, listeners
Custom Serialization	Custom types, segment serializer, framework adapters
Zero-Copy Access	`getZeroCopy`/`getView` safety and usage
Heap Profile	Heap breakdown and zero-heap configurations
Metrics	Micrometer + OpenTelemetry integration (zero hot-path cost)
JCache Provider	JSR-107 provider for Spring Cache / Hibernate L2
Benchmark Results (2026-06-20)	Full competitor sweep: TPS, average latency, tail latency
Architecture	Internals: memory layout, concurrency model, data structures
Architecture Deep Dive	Full builder reference, troubleshooting
Security Policy	Vulnerability reporting

Contributing

See CONTRIBUTING.md for guidelines. All hot-path changes require JMH benchmarks before and after — zero regressions policy.

License

Apache License 2.0
