[Relax][Frontend][KVCache] Restructure kv_cache kernels by MasterJH5574 · Pull Request #19405 · apache/tvm

MasterJH5574 · 2026-04-15T01:11:19Z

Pure refactor — does not change any generated TIR / kernel behavior.

Deduplicate the tiled prefill kernels in kv_cache.py by extracting the shared online-softmax pieces as @T.macro helpers (init_states, compute_s_gemm, softmax_update_{causal,valid_length}, compute_o_gemm, advance_tile_batch, paged_store_output_lse), plus Python helpers for the common buffer allocations (softmax state, MHA/MLA Q/K/V/O, tile-walk scalars).
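For readers unfamiliar with the online-softmax structure that these helpers factor out, here is a minimal plain-Python sketch. It is not the TVMScript code from this PR: the function names only loosely mirror init_states and the softmax_update_* macros, and the signatures, the `finalize` step, the absence of masking, and the natural-log LSE are all assumptions made for illustration.

```python
import math


def init_states(head_dim):
    """Running max, running sum of exponentials, and unnormalized output."""
    return -math.inf, 0.0, [0.0] * head_dim


def softmax_update(state, scores, values):
    """Fold one tile of scores and value rows into the running state."""
    m_prev, d_prev, o_prev = state
    m_new = max(m_prev, max(scores))
    # Rescale what was accumulated under the old maximum.
    # On the first tile m_prev is -inf, so the scale is exp(-inf) == 0.0.
    scale = math.exp(m_prev - m_new)
    d_new = d_prev * scale
    o_new = [x * scale for x in o_prev]
    for s, v in zip(scores, values):
        p = math.exp(s - m_new)
        d_new += p
        o_new = [acc + p * x for acc, x in zip(o_new, v)]
    return m_new, d_new, o_new


def finalize(state):
    """Normalize the output and return it with the log-sum-exp (hypothetical step)."""
    m, d, o = state
    return [x / d for x in o], m + math.log(d)


def tiled_attention(q, keys, values, tile=2):
    """Single-query attention that walks K/V in tiles of `tile` rows."""
    state = init_states(len(values[0]))
    for start in range(0, len(keys), tile):
        k_tile = keys[start:start + tile]
        v_tile = values[start:start + tile]
        # Score computation for this tile (plain dot products here).
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_tile]
        state = softmax_update(state, scores, v_tile)
    return finalize(state)


def reference_attention(q, keys, values):
    """Untiled softmax attention, used only to check the tiled version."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(len(values[0]))
    ]
    return out, m + math.log(total)
```

Because the tile loop touches the state only through `init_states`, `softmax_update`, and `finalize`, several kernel variants can share those pieces and differ only in how they load tiles or mask scores, which is the kind of sharing the description refers to.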

Split the kernel factories out of kv_cache.py into private sibling modules: _kernel_common.py (shared helpers + macros + schedule), _page_kernels.py (append/debug/copy/compact), _prefill_kernels.py (paged/ragged/MLA/dense/masked-sequence), _decode_kernels.py (decode + state merge). kv_cache.py now holds only the PagedKVCache classes and re-exports every moved symbol so existing imports keep working. tree_attn.py also switches to the shared helpers.
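The re-export arrangement can be illustrated with a small self-contained sketch. All names here (`demo_pkg`, `_kernels`, `make_kernel`) are hypothetical stand-ins, and the modules are built in memory only so the example runs as one file. In a real package the public module would typically contain a line such as `from ._kernels import make_kernel`.

```python
import sys
import types


def make_kernel(n):
    """Stand-in for a kernel factory that was moved to a private module."""
    return f"kernel<{n}>"


# Private sibling module that now owns the implementation (hypothetical name).
impl = types.ModuleType("demo_pkg._kernels")
impl.make_kernel = make_kernel

# Public module keeps its old name and re-exports the moved symbol.
public = types.ModuleType("demo_pkg.kv_cache")
public.make_kernel = impl.make_kernel

# Register an in-memory package so the import statement below resolves.
pkg = types.ModuleType("demo_pkg")
pkg.__path__ = []  # marks the module as a package
sys.modules.update(
    {
        "demo_pkg": pkg,
        "demo_pkg._kernels": impl,
        "demo_pkg.kv_cache": public,
    }
)

# Existing callers keep importing from the old location.
from demo_pkg.kv_cache import make_kernel as imported_make_kernel
```

The old import path and the new private path resolve to the same function object, so callers see no difference after the move.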

kv_cache.py drops from 2815 to 668 lines; the package is ~2.4k lines smaller overall. No test files modified; GPU tests pass unchanged (72 passed, 4 pre-existing skips).

gemini-code-assist

Code Review

This pull request introduces a comprehensive set of TIR kernels for LLM attention, covering decode, prefill, and paged KV-cache operations across CPU and GPU targets. It refactors shared logic into a common utility module to improve maintainability. The review feedback identifies a critical out-of-bounds indexing bug in the CPU prefill kernel, recommends dynamic calculation of data type sizes to replace hardcoded values, and points out incorrect TVMScript type hints for local variables.

python/tvm/relax/frontend/nn/llm/_prefill_kernels.py

python/tvm/relax/frontend/nn/llm/_decode_kernels.py

python/tvm/relax/frontend/nn/llm/_page_kernels.py



gemini-code-assist bot reviewed Apr 15, 2026


MasterJH5574 force-pushed the tvm-dev/2026-04-14-kv-cache branch from 98ea654 to 4dd41a1 Compare April 15, 2026 03:52

tlopex approved these changes Apr 15, 2026

MasterJH5574 wants to merge 1 commit into apache:main from MasterJH5574:tvm-dev/2026-04-14-kv-cache
