[None][feat] Add KV cache prefetch #14748

Open
lowsfer wants to merge 1 commit into NVIDIA:main from lowsfer:prefetch

Conversation

@lowsfer
Member

@lowsfer lowsfer commented May 29, 2026

Summary by CodeRabbit

  • New Features

    • Introduced a new prefetch capability for KV cache management that allows explicit prefetching of cache data to a specified cache level, enhancing cache operations during suspended states.
  • Tests

    • Added test coverage for the prefetch functionality.

Description

Adds a suspended KV cache prefetch path so callers can migrate pages to a target cache level, such as staging disk-resident pages into host memory before resume() needs HBM.

The storage prefetch path prepares target-level slots, migrates pages from lower tiers, and restores eviction scheduling for pages that remain evictable after migration.
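The intended call order (suspend, prefetch to host, then resume) can be illustrated with a toy model. This is a minimal sketch, not the PR's implementation: `Level`, `ToyCache`, and the `RuntimeError` raised for a non-suspended cache are all hypothetical stand-ins, and the real `resume()` involves GPU memory management that is not modeled here.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict


class Level(IntEnum):
    """Hypothetical stand-in for CacheLevel; smaller value means faster tier."""
    GPU = 0
    HOST = 1
    DISK = 2


@dataclass
class ToyCache:
    """Toy model of a suspended KV cache (assumption, not the real _KVCache)."""
    pages: Dict[int, Level] = field(default_factory=dict)  # page id -> level
    suspended: bool = True

    def count(self, level: Level) -> int:
        return sum(1 for lvl in self.pages.values() if lvl == level)

    def prefetch(self, target: Level) -> None:
        # Assumed behavior: only a suspended cache may be prefetched.
        if not self.suspended:
            raise RuntimeError("prefetch requires a suspended cache")
        # Only pages on slower tiers than the target are moved.
        for pid, lvl in self.pages.items():
            if lvl > target:
                self.pages[pid] = target

    def resume(self) -> None:
        # Toy resume: every page ends up on the GPU tier.
        for pid in self.pages:
            self.pages[pid] = Level.GPU
        self.suspended = False


cache = ToyCache({0: Level.GPU, 1: Level.HOST, 2: Level.DISK, 3: Level.DISK})
cache.prefetch(Level.HOST)  # stage disk-resident pages into host memory
staged = {lvl.name: cache.count(lvl) for lvl in Level}
cache.resume()              # resume now starts from host instead of disk
```

After `prefetch(Level.HOST)` the GPU page count is unchanged and no pages remain on disk, which mirrors the page distribution the PR's test checks.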

Test Coverage

  • LD_LIBRARY_PATH=/home/yaoy/tekit/tensorrt_llm/libs PYTHONPATH=/home/yaoy/tekit/tensorrt_llm/runtime/ python /home/yaoy/tekit/tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py TestResizeQuota.test_resize_quota -v

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
@lowsfer lowsfer added the api-compatible label (Accepted LLM API contract change that is backwards-compatible) May 29, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 29, 2026

📝 Walkthrough

This PR introduces a prefetch(target: CacheLevel) method to the KV cache manager that enables moving cached pages between storage levels when a cache is suspended. The implementation spans the public API declaration, core orchestration logic with SSM page support, storage-level page migration and eviction scheduling, and integration tests validating the feature.
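The validate-and-group step of the orchestration can be sketched as follows. `PageRef`, `group_for_prefetch`, and the `num_levels` parameter are hypothetical names for illustration, and the sketch assumes that a larger level number means a slower tier, so only pages slower than the target are selected for migration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PageRef(NamedTuple):
    """Hypothetical page descriptor; the real Page type is not shown in the PR text."""
    page_id: int
    pool: int
    level: int


def group_for_prefetch(
    pages: Iterable[PageRef], target: int, num_levels: int = 3
) -> Dict[Tuple[int, int], List[int]]:
    """Group pages that need migration by (pool, current level)."""
    if not 0 <= target < num_levels:
        raise ValueError(f"invalid target cache level: {target}")
    groups: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for page in pages:
        # Pages already at the target level or faster are skipped.
        if page.level > target:
            groups[(page.pool, page.level)].append(page.page_id)
    return dict(groups)


active = [
    PageRef(page_id=1, pool=0, level=0),  # GPU, skipped
    PageRef(page_id=2, pool=0, level=2),  # disk, selected
    PageRef(page_id=3, pool=0, level=2),  # disk, selected
    PageRef(page_id=4, pool=1, level=1),  # host, already at target
    PageRef(page_id=5, pool=1, level=2),  # disk, selected
]
groups = group_for_prefetch(active, target=1)
```

Grouping by pool and level lets each batch be handed to the storage layer in one call, which is the delegation the walkthrough describes.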

Changes

KV Cache Prefetch Implementation

| Layer / File(s) | Summary |
|---|---|
| Prefetch API declaration (`tensorrt_llm/runtime/kv_cache_manager_v2/__init__.pyi`) | Public interface declares the `_KVCache.prefetch(target: CacheLevel) -> None` method. |
| KV cache prefetch orchestration and page handling (`tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache.py`) | Imports the `Page` type and implements `_KVCache.prefetch()` to validate the target level, group active pages by pool and cache level (filtering pages below target), and delegate to storage prefetch. Updates the internal `_page()` and `_block()` accessors to handle SSM pages, identified by `BAD_BLOCK_ORDINAL`, with matching life-cycle assertions. |
| Storage manager prefetch implementation (`tensorrt_llm/runtime/kv_cache_manager_v2/_storage_manager.py`) | Implements `StorageManager.prefetch(dst_lvl, pages)` to temporarily un-evict pages, compute evictable pages at the destination level, reserve free slots, and migrate pages from higher cache levels (slower tiers such as disk) into the destination level via `_batched_migrate` with `update_src=True`. A `finally` block ensures all tracked pages are rescheduled for eviction. |
| Prefetch test coverage (`tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py`) | Adds helper functions to count active pages per cache level and to assert evictability of host-level pages. Tests prefetch by calling `prefetch(HOST_LEVEL)` on a suspended request after a quota resize, then validates page distribution (GPU unchanged, disk at zero, host increased) and eviction marking. |
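The storage-level flow summarized above (un-evict, reserve slots, migrate, reschedule in a `finally` block) can be sketched with a toy class. Everything here is a hypothetical simplification: `ToyStorage`, its attributes, and the `MemoryError` on a full destination are assumptions, and the real `StorageManager.prefetch` reserves slots by computing evictable pages rather than failing outright.

```python
from typing import Dict, Iterable, Set


class ToyStorage:
    """Toy storage manager (assumption); levels are ints, larger means slower."""

    def __init__(self, capacity: Dict[int, int]) -> None:
        self.capacity = dict(capacity)    # level -> number of slots
        self.level_of: Dict[int, int] = {}  # page id -> current level
        self.scheduled: Set[int] = set()  # pages queued for eviction
        self.held: Set[int] = set()       # pages that must not be evicted

    def is_evictable(self, pid: int) -> bool:
        return pid not in self.held

    def _free_slots(self, lvl: int) -> int:
        used = sum(1 for level in self.level_of.values() if level == lvl)
        return self.capacity[lvl] - used

    def _migrate(self, pid: int, dst_lvl: int) -> None:
        self.level_of[pid] = dst_lvl

    def prefetch(self, dst_lvl: int, pages: Iterable[int]) -> None:
        # Track only pages on slower tiers than the destination.
        tracked = [p for p in pages if self.level_of[p] > dst_lvl]
        # Temporarily un-evict so pages cannot be evicted mid-migration.
        self.scheduled.difference_update(tracked)
        try:
            if self._free_slots(dst_lvl) < len(tracked):
                raise MemoryError("not enough free slots at destination level")
            for pid in tracked:
                self._migrate(pid, dst_lvl)
        finally:
            # Runs on success and on failure: restore eviction scheduling
            # for every tracked page that is still evictable.
            for pid in tracked:
                if self.is_evictable(pid):
                    self.scheduled.add(pid)


storage = ToyStorage({0: 2, 1: 4, 2: 8})
storage.level_of = {10: 0, 11: 2, 12: 2}
storage.scheduled = {11}
storage.held = {12}
storage.prefetch(1, [10, 11, 12])
```

Putting the rescheduling in `finally` is what keeps the eviction queue consistent when migration raises partway through, which is the property the review asks to be covered by a failure-path test.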

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • eopXD
  • lancelly
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 17.65% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title '[None][feat] Add KV cache prefetch' clearly summarizes the main change: adding a prefetch feature to the KV cache system. |
| Description check | ✅ Passed | The PR description addresses the template requirements with a clear explanation of the feature, test coverage details, and a completed checklist. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py (1)

1278-1293: ⚡ Quick win

Coverage is strong for the happy path, but failure-path coverage is still insufficient.

Please add negative tests in tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py for: (1) calling prefetch when cache status is not SUSPENDED, and (2) migration failure path to verify eviction scheduling is restored correctly after exceptions.

As per coding guidelines: “tests/**: Act as a QA engineer reviewing test changes and coverage… suggest concrete list file names and whether coverage is sufficient, insufficient, or needs follow-up outside the PR.”

Also applies to: 1348-1359
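The two requested negative tests could take roughly the following shape. This is a sketch against a hypothetical `FakeCache`, not the real `_KVCache` or its test fixtures: the `"ACTIVE"` status name, the `RuntimeError`, and the `_migrate` helper are assumptions, and the standard-library `unittest` is used here in place of `pytest.raises` so the example stays self-contained.

```python
import unittest
from unittest import mock


class FakeCache:
    """Hypothetical stand-in for a KV cache; levels: 0=GPU, 1=host, 2=disk."""

    def __init__(self) -> None:
        self.status = "SUSPENDED"
        self.levels = {0: 0, 1: 2, 2: 2}  # page id -> level
        self.scheduled = {1, 2}           # pages queued for eviction

    def _migrate(self, pid: int, dst: int) -> None:
        self.levels[pid] = dst

    def prefetch(self, target: int) -> None:
        if self.status != "SUSPENDED":
            raise RuntimeError("cache must be suspended")
        moving = [p for p, lvl in self.levels.items() if lvl > target]
        self.scheduled.difference_update(moving)
        try:
            for pid in moving:
                self._migrate(pid, target)
        finally:
            # Restore eviction scheduling even if migration raised.
            self.scheduled.update(moving)


class PrefetchNegativeTests(unittest.TestCase):
    def test_rejects_non_suspended_cache(self) -> None:
        cache = FakeCache()
        cache.status = "ACTIVE"  # hypothetical non-suspended status
        before = dict(cache.levels)
        with self.assertRaises(RuntimeError):
            cache.prefetch(1)
        self.assertEqual(cache.levels, before)  # no pages moved

    def test_restores_eviction_after_migration_failure(self) -> None:
        cache = FakeCache()
        # Simulate a migration failure by patching the helper to raise.
        failure = OSError("simulated migration failure")
        with mock.patch.object(cache, "_migrate", side_effect=failure):
            with self.assertRaises(OSError):
                cache.prefetch(1)
        self.assertEqual(cache.scheduled, {1, 2})
        self.assertEqual(cache.levels, {0: 0, 1: 2, 2: 2})
```

Saved as a module, the tests run with `python -m unittest`. In the real suite the same structure would patch whichever migration helper the prefetch path actually calls.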

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py` around
lines 1278 - 1293, Summary: add two negative tests to cover prefetch when cache
status is not SUSPENDED and the migration failure path to ensure eviction
scheduling is restored. Add one test in
tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py that sets
the cache status to a non-SUSPENDED value and calls manager.prefetch (or
_KVCache.prefetch) and asserts the call either raises the expected error or
leaves state unchanged (no pages prefetched) by inspecting
_KVCache._active_pages()/_page() counts; add a second test that simulates a
migration exception by monkeypatching the migration helper (e.g., the method
used during prefetch/migrate) to raise, call prefetch/migrate inside a
pytest.raises context, and after the exception assert that pages at HOST_LEVEL
that manager._storage.is_evictable(page) still have page.scheduled_for_eviction
True (reuse assert_prefetched_pages_are_evictable logic or check
_page/_active_pages directly) to verify eviction scheduling is restored.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f6a39114-ede4-4c88-9b02-9ec33c8500e9

📥 Commits

Reviewing files that changed from the base of the PR and between c7683f2 and a3c3b55.

📒 Files selected for processing (4)
  • tensorrt_llm/runtime/kv_cache_manager_v2/__init__.pyi
  • tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache.py
  • tensorrt_llm/runtime/kv_cache_manager_v2/_storage_manager.py
  • tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py

@lowsfer lowsfer requested review from longlee0622 and reasonsolo May 29, 2026 14:20
@lowsfer
Member Author

lowsfer commented May 30, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #51176 [ run ] triggered by Bot. Commit: a3c3b55 Link to invocation

Labels

api-compatible Accepted LLM API contract change that is backwards-compatible

2 participants