[None][feat] Add KV cache prefetch by lowsfer · Pull Request #14748 · NVIDIA/TensorRT-LLM

lowsfer · 2026-05-29T12:55:29Z

Summary by CodeRabbit

New Features
- Introduced a new prefetch capability for KV cache management that allows explicit prefetching of cache data to a specified cache level, enhancing cache operations during suspended states.
Tests
- Added test coverage for the prefetch functionality.

Description

Adds a suspended KV cache prefetch path so callers can migrate pages to a target cache level, such as staging disk-resident pages into host memory before resume() needs HBM.

The storage prefetch path prepares target-level slots, migrates pages from lower tiers, and restores eviction scheduling for pages that remain evictable after migration.
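
The three steps above (prepare slots, migrate, restore eviction scheduling) can be sketched with a toy model. This is a rough illustration only, not the code in this PR: `Level`, `ToyPage`, `ToyStorage`, and the tier numbering are hypothetical names invented for the sketch. The toy version raises when the destination tier is full, whereas the real path is described as preparing target-level slots.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical tier numbering for this sketch: smaller value = faster tier.
    GPU = 0
    HOST = 1
    DISK = 2


@dataclass
class ToyPage:
    page_id: int
    level: Level
    scheduled_for_eviction: bool = True


class ToyStorage:
    """Toy stand-in for a tiered page store; not the real StorageManager."""

    def __init__(self, capacity: dict) -> None:
        self.capacity = dict(capacity)
        self.used = {lvl: 0 for lvl in Level}

    def add(self, page: ToyPage) -> ToyPage:
        self.used[page.level] += 1
        return page

    def free_slots(self, level: Level) -> int:
        return self.capacity[level] - self.used[level]

    def prefetch(self, dst: Level, pages: list) -> None:
        # Only pages sitting in slower tiers than the destination need to move.
        to_move = [p for p in pages if p.level > dst]
        # Take the pages off the eviction schedule while they are in flight.
        for p in to_move:
            p.scheduled_for_eviction = False
        try:
            # Make sure the destination tier has room (the toy just fails).
            if self.free_slots(dst) < len(to_move):
                raise MemoryError("not enough free slots at destination level")
            # Migrate each page into the destination tier.
            for p in to_move:
                self.used[p.level] -= 1
                self.used[dst] += 1
                p.level = dst
        finally:
            # Restore eviction scheduling even if migration raised.
            for p in to_move:
                p.scheduled_for_eviction = True


storage = ToyStorage({Level.GPU: 2, Level.HOST: 4, Level.DISK: 8})
pages = [storage.add(ToyPage(i, Level.DISK)) for i in range(3)]
storage.prefetch(Level.HOST, pages)
```

The `finally` block is the point of the sketch: pages that were pulled off the eviction schedule are put back whether or not the migration succeeds.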

Test Coverage

LD_LIBRARY_PATH=/home/yaoy/tekit/tensorrt_llm/libs PYTHONPATH=/home/yaoy/tekit/tensorrt_llm/runtime/ python /home/yaoy/tekit/tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py TestResizeQuota.test_resize_quota -v

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

coderabbitai · 2026-05-29T13:01:46Z

📝 Walkthrough

This PR introduces a prefetch(target: CacheLevel) method to the KV cache manager that enables moving cached pages between storage levels when a cache is suspended. The implementation spans the public API declaration, core orchestration logic with SSM page support, storage-level page migration and eviction scheduling, and integration tests validating the feature.

Changes

KV Cache Prefetch Implementation

| Layer / File(s) | Summary |
| --- | --- |
| Prefetch API declaration `tensorrt_llm/runtime/kv_cache_manager_v2/__init__.pyi` | Public interface declares `_KVCache.prefetch(target: CacheLevel) -> None` method. |
| KV cache prefetch orchestration and page handling `tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache.py` | Imports `Page` type, implements `_KVCache.prefetch()` to validate target level, group active pages by pool and cache level (filtering pages below target), and delegate to storage prefetch. Updates internal `_page()` and `_block()` accessors to handle SSM pages identified by `BAD_BLOCK_ORDINAL` with matching life-cycle assertions. |
| Storage manager prefetch implementation `tensorrt_llm/runtime/kv_cache_manager_v2/_storage_manager.py` | Implements `StorageManager.prefetch(dst_lvl, pages)` to temporarily un-evict pages, compute evictable pages at destination level, reserve free slots, migrate pages from higher cache levels into destination level via `_batched_migrate` with `update_src=True`, and use a finally block to ensure all tracked pages are rescheduled for eviction. |
| Prefetch test coverage `tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py` | Adds helper functions to count active pages per cache level and assert evictability of host-level pages. Tests prefetch by calling `prefetch(HOST_LEVEL)` on a suspended request after quota resize and validates page distribution (GPU unchanged, disk to zero, host increased) and eviction marking. |
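
The grouping step and the per-level counting helper summarized in the table might look roughly like the following. This is a hedged sketch, not the PR's code: `PageInfo`, `count_pages_per_level`, `group_for_prefetch`, and the numeric level constants are hypothetical, and it assumes that the pages selected for prefetch are those in slower tiers than the target.

```python
from collections import Counter, defaultdict
from typing import NamedTuple

# Hypothetical numbering for this sketch: smaller value = faster tier.
GPU_LEVEL, HOST_LEVEL, DISK_LEVEL = 0, 1, 2


class PageInfo(NamedTuple):
    pool: int   # index of the pool the page belongs to
    level: int  # cache level the page currently lives in


def count_pages_per_level(pages):
    """Count pages per cache level, as a test helper might."""
    return Counter(p.level for p in pages)


def group_for_prefetch(pages, target):
    """Group pages in slower tiers than `target` by (pool, level)."""
    groups = defaultdict(list)
    for p in pages:
        if p.level > target:  # already at or above the target tier: skip
            groups[(p.pool, p.level)].append(p)
    return dict(groups)


pages = [
    PageInfo(0, GPU_LEVEL),
    PageInfo(0, DISK_LEVEL),
    PageInfo(1, DISK_LEVEL),
    PageInfo(0, HOST_LEVEL),
]
counts = count_pages_per_level(pages)
groups = group_for_prefetch(pages, HOST_LEVEL)
```

Here only the two disk-resident pages end up in `groups`, keyed by their pool, while the GPU and host pages are left alone.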

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

eopXD
lancelly

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 17.65% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title '[None][feat] Add KV cache prefetch' clearly summarizes the main change: adding a prefetch feature to the KV cache system. |
| Description check | ✅ Passed | The PR description addresses the template requirements with a clear explanation of the feature, test coverage details, and a completed checklist. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

🧹 Nitpick comments (1)

tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py (1)
1278-1293: ⚡ Quick win

Coverage is strong for the happy path, but failure-path coverage is still insufficient.

Please add negative tests in tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py for: (1) calling prefetch when cache status is not SUSPENDED, and (2) migration failure path to verify eviction scheduling is restored correctly after exceptions.

As per coding guidelines: “tests/**: Act as a QA engineer reviewing test changes and coverage… suggest concrete list file names and whether coverage is sufficient, insufficient, or needs follow-up outside the PR.”

Also applies to: 1348-1359
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py` around
lines 1278 - 1293, Summary: add two negative tests to cover prefetch when cache
status is not SUSPENDED and the migration failure path to ensure eviction
scheduling is restored. Add one test in
tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py that sets
the cache status to a non-SUSPENDED value and calls manager.prefetch (or
_KVCache.prefetch) and asserts the call either raises the expected error or
leaves state unchanged (no pages prefetched) by inspecting
_KVCache._active_pages()/_page() counts; add a second test that simulates a
migration exception by monkeypatching the migration helper (e.g., the method
used during prefetch/migrate) to raise, call prefetch/migrate inside a
pytest.raises context, and after the exception assert that pages at HOST_LEVEL
that manager._storage.is_evictable(page) still have page.scheduled_for_eviction
True (reuse assert_prefetched_pages_are_evictable logic or check
_page/_active_pages directly) to verify eviction scheduling is restored.
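
The two negative tests requested above could take roughly this shape. This is a sketch against a toy class, not against the real `_KVCache`: `ToyCache`, its string statuses, its `_migrate` helper, and the choice of `RuntimeError` for the non-suspended case are all assumptions made for illustration. It uses the standard library's `unittest` and `unittest.mock` rather than pytest so that it runs without extra dependencies.

```python
import unittest
from unittest import mock


class ToyCache:
    """Hypothetical stand-in for a cache whose prefetch needs SUSPENDED."""

    def __init__(self):
        self.status = "ACTIVE"
        self.pages = [
            {"level": "disk", "scheduled_for_eviction": True} for _ in range(2)
        ]

    def _migrate(self, page, target):
        page["level"] = target

    def prefetch(self, target):
        if self.status != "SUSPENDED":
            raise RuntimeError("prefetch requires a suspended cache")
        for page in self.pages:
            page["scheduled_for_eviction"] = False
        try:
            for page in self.pages:
                self._migrate(page, target)
        finally:
            # Restore eviction scheduling even when migration fails.
            for page in self.pages:
                page["scheduled_for_eviction"] = True


class PrefetchNegativeTests(unittest.TestCase):
    def test_rejects_non_suspended_cache(self):
        cache = ToyCache()
        with self.assertRaises(RuntimeError):
            cache.prefetch("host")
        # No page may have moved.
        self.assertTrue(all(p["level"] == "disk" for p in cache.pages))

    def test_restores_eviction_after_migration_failure(self):
        cache = ToyCache()
        cache.status = "SUSPENDED"
        # Simulate a migration failure by making the helper raise.
        with mock.patch.object(cache, "_migrate", side_effect=OSError("boom")):
            with self.assertRaises(OSError):
                cache.prefetch("host")
        self.assertTrue(all(p["scheduled_for_eviction"] for p in cache.pages))
```

In the real test file the same pattern would patch whichever migration helper prefetch actually calls, and would reuse the existing evictability assertions instead of inspecting dictionaries.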

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f6a39114-ede4-4c88-9b02-9ec33c8500e9

📥 Commits

Reviewing files that changed from the base of the PR and between c7683f2 and a3c3b55.

📒 Files selected for processing (4)

tensorrt_llm/runtime/kv_cache_manager_v2/__init__.pyi
tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache.py
tensorrt_llm/runtime/kv_cache_manager_v2/_storage_manager.py
tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py

lowsfer · 2026-05-30T07:18:46Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-30T07:24:58Z

PR_Github #51176 [ run ] triggered by Bot. Commit: a3c3b55

Add KV cache prefetch

a3c3b55

Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

lowsfer added the api-compatible label (Accepted LLM API contract change that is backwards-compatible) May 29, 2026

github-actions Bot assigned lowsfer May 29, 2026

coderabbitai Bot reviewed May 29, 2026

lowsfer requested review from longlee0622 and reasonsolo May 29, 2026 14:20

lowsfer wants to merge 1 commit into NVIDIA:main from lowsfer:prefetch
