extension/llm/server: warm append-only session resume (V2b.1) by mergennachin · Pull Request #20160 · pytorch/executorch

mergennachin · 2026-06-09T16:23:17Z

Builds on the isolated named sessions: a named session now keeps its decoded
context across requests. When the next request's prompt tokens are an exact
prefix extension of the session's resident tokens (the same conversation plus a
new turn), the worker prefills ONLY the new suffix -- continuing the
KV/recurrent state in place -- instead of resetting and re-prefilling the whole
prompt. The match is exact-token (never re-tokenized text), so it is always
correct: a token mismatch, a stop-string trim, or a prior error falls back to a
full reset + prefill. This is per-session resume, not global prefix caching.

The decision lives in a dependency-free pure helper (worker_prefill_plan.h,
unit-tested standalone); worker_loop.h tracks each session's resident token ids
(invariant: resident size == session position) and executes the plan, and the
done event reports reused_prompt_tokens / prefilled_prompt_tokens /
session_reset_reason for measuring the hit rate. POST /v1/sessions/{id}/reset
clears a session's context while keeping its slot. The qwen worker's
--warm_resume flag (disabled from serve.py with --no-warm-resume) gates the behavior for A/B measurement.
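The per-session bookkeeping can be sketched as follows. The sketch assumes that a planner has already decided how many resident tokens to reuse, with 0 meaning a full reset. Session, DoneStats, prefill_into and append_generated are hypothetical names, not the worker_loop.h API. The model call is stubbed out, so the sketch shows only three things: the resident-size == position invariant, the done-event counters, and the reset behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical names; field names mirror the done-event fields in the PR text.
struct DoneStats {
  std::size_t reused_prompt_tokens = 0;
  std::size_t prefilled_prompt_tokens = 0;
  std::string session_reset_reason;  // empty when the session resumed warm
};

class Session {
 public:
  // Execute a plan: reuse `reuse` resident tokens (0 = full reset) and
  // prefill the rest of `prompt`. Append-only means no partial rollback.
  DoneStats prefill_into(const std::vector<int64_t>& prompt, std::size_t reuse,
                         const std::string& reason) {
    assert(reuse == 0 || reuse == resident_.size());
    assert(reuse <= prompt.size());
    if (reuse == 0) reset();
    // A real worker would run the model on prompt[reuse..] here, advancing
    // the KV/recurrent state; this stub only tracks ids and position.
    for (std::size_t i = reuse; i < prompt.size(); ++i) {
      resident_.push_back(prompt[i]);
      ++position_;
    }
    assert(resident_.size() == position_);  // resident size == session position
    DoneStats stats;
    stats.reused_prompt_tokens = reuse;
    stats.prefilled_prompt_tokens = prompt.size() - reuse;
    if (reuse == 0) stats.session_reset_reason = reason;
    return stats;
  }

  // Decoded tokens also become resident context for the next turn.
  void append_generated(int64_t token) {
    resident_.push_back(token);
    ++position_;
  }

  // Roughly what POST /v1/sessions/{id}/reset does: drop context, keep the slot.
  void reset() {
    resident_.clear();
    position_ = 0;
  }

  const std::vector<int64_t>& resident() const { return resident_; }
  std::size_t position() const { return position_; }

 private:
  std::vector<int64_t> resident_;
  std::size_t position_ = 0;
};
```

The first assertion encodes the append-only rule: `reuse` must be either 0 or the full resident length. Because there is no partial rollback, every mismatch has to go through reset().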

Review order: worker_prefill_plan.h + its test; then worker_loop.h (resident
tracking + plan execution); then the control-plane reset op + metrics; then docs.

Part of #20001

[ghstack-poisoned]

mergennachin · 2026-06-09T16:23:18Z

Stack from ghstack (oldest at bottom):

pytorch-bot · 2026-06-09T16:23:21Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20160

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 3 Pending

As of commit b930b9f with merge base eeb0646:

NEW FAILURES - The following jobs have failed:

pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh)
RuntimeError: Command docker exec -t 0f6f17a0f3974026c429abbbeed1bad55780f2605fd25428bd1ca3cf9012482e /exec failed with exit code 92
pull / unittest / linux / linux-job (gh)
RuntimeError: Command docker exec -t 0c86ab577016019fafc1aeda5f1a3d39579fe39ffe62fc5d2d936c4bd17be9f5 /exec failed with exit code 1
pull / unittest / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
pull / unittest-editable / linux / linux-job (gh)
RuntimeError: Command docker exec -t 069bfa93fb96275cb56393b9a6ebc0b635bd51bd53c663894cf07e852fad4062 /exec failed with exit code 1
pull / unittest-editable / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
pull / unittest-nxp-neutron / linux-job (gh)
RuntimeError: Command docker exec -t 35221cd6b022a631bfc59a5a81c442a980f1c3fdfef403ffc5cb3cd6dfc8c953 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[INITIAL] Update

b930b9f

[ghstack-poisoned]

mergennachin requested review from kirklandsign and larryliu0820 as code owners June 9, 2026 16:23

meta-cla Bot added the CLA Signed label Jun 9, 2026


mergennachin wants to merge 1 commit into gh/mergennachin/9/head from gh/mergennachin/10/head
