Skip to content

feat(lora): fix LoRA weight-sync deadlock and add stable LoRA RL recipe#20

Open
WWWjiahui wants to merge 2 commits into
Infini-AI-Lab:mainfrom
WWWjiahui:chore/bump-sglang-0.5.12
Open

feat(lora): fix LoRA weight-sync deadlock and add stable LoRA RL recipe#20
WWWjiahui wants to merge 2 commits into
Infini-AI-Lab:mainfrom
WWWjiahui:chore/bump-sglang-0.5.12

Conversation

@WWWjiahui

@WWWjiahui WWWjiahui commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

Makes LoRA RL train end-to-end. Two independent fixes: a weight-sync deadlock in the RaaS engine, and a training recipe that keeps LoRA from collapsing. Validated on Qwen3-1.7B / math with a clean rising eval curve.

1. Weight-sync

On every training step the freshly trained LoRA adapter is pushed to the SGLang inference server. The swap follows a fixed sequence so that no request ever runs against half-updated weights or stale cache:

  1. Pause — the server stops accepting new requests (/pause_generation).
  2. Drain / abort — it waits out a short grace period for in-flight requests to finish, aborting any still running, so nothing is mid-generation on the old adapter.
  3. Load — the new adapter is loaded onto every server.
  4. Flush — the KV cache is flushed to drop entries computed with the old adapter's weights.
  5. Resume — generation continues; new requests now use the new adapter.

The bug. The old code reloaded the adapter under a fixed name (lora_1), which required unloading the previous one before step 3. SGLang's /unload_lora_adapter calls wait_for_unload, which blocks until the adapter's usage counter reaches zero, But the requests aborted during the drain never released their hold on that counter, so it stayed non-zero and the unload hung forever, freezing the entire pipeline. This only struck LoRA runs under load (full fine-tuning takes a different, unload-free path), which made it look intermittent.

The fix. Never unload. Each sync loads the new adapter under a fresh versioned name (lora_v1, lora_v2, …) and lets SGLang's memory-pool LRU reclaim the stale ones. The deadlock-prone unload is removed entirely, while the pause → drain/abort → load → flush → resume sequence above is unchanged, so the swap stays correct. A lingering old adapter only costs GPU memory, which is bounded by max_loras_per_batch / max_loaded_loras.

Validation

Qwen3-1.7B, math DP-scaled SGLang: clean monotonic rise over the first ~100 steps with importance weight pinned at 1.0 throughout —

metric base step 100
eval-avg/math500/avg@4 70.0 77.55
eval-avg/overall_avg 30.2 36.58

@WWWjiahui WWWjiahui requested a review from haizhongzheng as a code owner June 18, 2026 08:34
Make LoRA RL (FSDP2 trainer + SGLang inference) train end-to-end.

Weight-sync deadlock (RaaS engine):
- Each sync used to unload+reload the adapter under a fixed name (lora_1).
  SGLang's /unload_lora_adapter blocks in wait_for_unload until every
  in-flight request releases the adapter's usage counter; paused/aborted
  requests at a sync never release it, so the unload hangs forever and the
  pipeline deadlocks.
- Fix: load each sync under a fresh versioned name (lora_v{seq}) and never
  unload; SGLang's mem-pool LRU evicts stale adapters. Track the active
  name (_current_lora_name), thread it through generation requests, and
  mirror it to the eval engines that share the server.

Stable LoRA RL recipe (examples/math/qwen3-1.7b-m2po-2gpus-lora):
- LoRA's alpha/rank scaling makes each weight update much larger than
  full-FT at the same lr, so the off-policy / multi-minibatch settings
  full-FT tolerates collapse the policy under LoRA. Use near-on-policy:
  ppo_n_minibatches=1, max_staleness=1, recompute_logprob=true (lr 5e-6).
- Validated: clean rising eval, math500 avg@4 70 -> 77.5, overall 30 -> 36.6.
@WWWjiahui WWWjiahui force-pushed the chore/bump-sglang-0.5.12 branch from 5f9c0a1 to cbd7f97 Compare June 19, 2026 06:38
Add a LoRA subsection to the math recipes doc: the 2-GPU LoRA variant,
the near-on-policy settings LoRA needs (ppo_n_minibatches=1,
max_staleness=1, recompute_logprob=true), and the pause/drain/abort/load
sequence used to swap the adapter on each weight sync.
@WWWjiahui WWWjiahui changed the base branch from dev to main June 19, 2026 07:12
haizhongzheng added a commit that referenced this pull request Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant