
extension/llm/server: warm append-only session resume (V2b.1)#20160

Open
mergennachin wants to merge 1 commit into gh/mergennachin/9/head from gh/mergennachin/10/head


@mergennachin (Contributor)

Builds on the isolated named sessions: a named session now keeps its decoded
context across requests. When the next request's prompt tokens are an exact
prefix extension of the session's resident tokens (the same conversation plus a
new turn), the worker prefills ONLY the new suffix, continuing the
KV/recurrent state in place, instead of resetting and re-prefilling the whole
prompt. The match is exact-token (never re-tokenized text), so it is always
correct: a token mismatch, a stop-string trim, or a prior error falls back to a
full reset + prefill. This is per-session resume, not global prefix caching.
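A minimal C++ sketch of the kind of decision such a pure helper makes, assuming resident and prompt tokens are plain vectors of token ids and that a stop-string trim or prior error is recorded as a single resident_valid flag. PrefillPlan, plan_prefill, and the ResetReason values are hypothetical names, not the ones in worker_prefill_plan.h.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reasons; the real session_reset_reason values may differ.
enum class ResetReason { kNone, kEmptySession, kPriorInvalidation, kNotPrefixExtension };

// Hypothetical plan shape: what the worker should do before decoding.
struct PrefillPlan {
  bool reset = true;               // clear KV/recurrent state before prefill
  std::size_t reused_tokens = 0;   // resident tokens kept in place
  std::size_t prefill_start = 0;   // index into the prompt where prefill begins
  ResetReason reason = ResetReason::kEmptySession;
};

// resident: token ids already decoded into the session (assumed size == position).
// resident_valid: false once a stop-string trim or error desynced resident from the cache.
PrefillPlan plan_prefill(const std::vector<std::uint64_t>& resident,
                         bool resident_valid,
                         const std::vector<std::uint64_t>& prompt) {
  PrefillPlan plan;
  if (resident.empty()) {
    plan.reason = ResetReason::kEmptySession;
    return plan;
  }
  if (!resident_valid) {
    plan.reason = ResetReason::kPriorInvalidation;
    return plan;
  }
  // Exact token-id prefix match; at least one new token must remain so that
  // prefill runs a forward pass before the next decode step.
  if (prompt.size() > resident.size() &&
      std::equal(resident.begin(), resident.end(), prompt.begin())) {
    plan.reset = false;
    plan.reused_tokens = resident.size();
    plan.prefill_start = resident.size();  // prefill prompt[resident.size():] only
    plan.reason = ResetReason::kNone;
    return plan;
  }
  // Any divergence (or an identical/shorter prompt) falls back to full reset + prefill.
  plan.reason = ResetReason::kNotPrefixExtension;
  return plan;
}
```

Requiring at least one new token is a choice made in this sketch: an identical prompt leaves nothing to prefill, so it takes the reset path rather than special-casing a zero-length suffix.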

The decision lives in a dependency-free pure helper (worker_prefill_plan.h,
unit-tested standalone); worker_loop.h tracks each session's resident token ids
(invariant: resident size == session position) and executes the plan, and the
done event reports reused_prompt_tokens / prefilled_prompt_tokens /
session_reset_reason for measuring the hit rate. POST /v1/sessions/{id}/reset
clears a session's context while keeping its slot. The qwen worker's
--warm_resume flag (turned off via serve.py --no-warm-resume) gates the behavior for A/B measurement.
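The per-session bookkeeping described above could look roughly like the following C++ sketch. It assumes one slot per named session, that an empty session_reset_reason means the request resumed warm, and that "hit rate" is the fraction of requests that resumed warm. SessionSlot, append, invalidate, reset, and warm_hit_rate are hypothetical names; only the three metric field names come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-session state; not the worker_loop.h implementation.
struct SessionSlot {
  std::vector<std::uint64_t> resident;  // token ids whose state lives in the cache
  std::size_t position = 0;             // model position; invariant: == resident.size()
  bool valid = true;                    // false once resident no longer mirrors the cache

  // Record tokens just fed through the model (a prefill suffix or a decoded token).
  void append(const std::vector<std::uint64_t>& tokens) {
    resident.insert(resident.end(), tokens.begin(), tokens.end());
    position += tokens.size();
    assert(position == resident.size());
  }

  // E.g. after a stop-string trim or an error: force the next request to reset.
  void invalidate() { valid = false; }

  // Roughly what POST /v1/sessions/{id}/reset implies: drop context, keep the slot.
  void reset() {
    resident.clear();
    position = 0;
    valid = true;  // an empty resident trivially mirrors an empty cache
  }
};

// Metric fields named in the done event (assumed: empty reason == warm resume).
struct DoneMetrics {
  std::size_t reused_prompt_tokens = 0;
  std::size_t prefilled_prompt_tokens = 0;
  std::string session_reset_reason;
};

// Fraction of requests that resumed warm instead of resetting.
double warm_hit_rate(const std::vector<DoneMetrics>& events) {
  if (events.empty()) return 0.0;
  std::size_t warm = 0;
  for (const auto& e : events) {
    if (e.session_reset_reason.empty()) ++warm;
  }
  return static_cast<double>(warm) / static_cast<double>(events.size());
}
```

Summing reused_prompt_tokens against prefilled_prompt_tokens would give a token-weighted reuse ratio instead; which of the two is more useful depends on whether the A/B comparison cares about request counts or prefill cost.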

Review order: worker_prefill_plan.h + its test; then worker_loop.h (resident
tracking + plan execution); then the control-plane reset op + metrics; then docs.

Part of #20001

[ghstack-poisoned]

pytorch-bot (bot) commented Jun 9, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20160

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 3 Pending

As of commit b930b9f with merge base eeb0646:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


Labels: CLA Signed
