feat(lora): fix LoRA weight-sync deadlock and add stable LoRA RL recipe by WWWjiahui · Pull Request #20 · Infini-AI-Lab/astraflow

WWWjiahui · 2026-06-18T08:34:36Z

Summary

Makes LoRA RL train end-to-end. Two independent fixes: a weight-sync deadlock in the RaaS engine, and a training recipe that keeps LoRA from collapsing. Validated on Qwen3-1.7B / math with a clean rising eval curve.

1. Weight-sync

On every training step the freshly trained LoRA adapter is pushed to the SGLang inference server. The swap follows a fixed sequence so that no request ever runs against half-updated weights or stale cache:

1. Pause: the server stops accepting new requests (/pause_generation).
2. Drain / abort: it waits out a short grace period for in-flight requests to finish, aborting any still running, so nothing is mid-generation on the old adapter.
3. Load: the new adapter is loaded onto every server.
4. Flush: the KV cache is flushed to drop entries computed with the old adapter's weights.
5. Resume: generation continues; new requests now use the new adapter.
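
The five steps above can be sketched as a single function. This is a minimal illustration, not the code in this PR: the `post` callable is a hypothetical transport (server, endpoint, payload), and every endpoint name and payload other than /pause_generation is an assumption made for the example.

```python
import time
from typing import Callable, List

# Hypothetical transport: post(server, endpoint, payload). In a real
# deployment this would wrap an HTTP client; here it is injected so the
# ordering can be checked without a server.
Post = Callable[[str, str, dict], None]


def swap_adapter(
    post: Post,
    servers: List[str],
    name: str,
    path: str,
    grace_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run pause -> drain/abort -> load -> flush -> resume on every server."""
    for s in servers:  # 1. Pause: stop accepting new requests
        post(s, "/pause_generation", {})

    sleep(grace_s)  # 2. Drain: give in-flight requests a grace period...
    for s in servers:  # ...then abort whatever is still running (assumed endpoint)
        post(s, "/abort_request", {"abort_all": True})

    for s in servers:  # 3. Load the new adapter everywhere (assumed endpoint)
        post(s, "/load_lora_adapter", {"lora_name": name, "lora_path": path})

    for s in servers:  # 4. Flush KV cache built with the old weights (assumed endpoint)
        post(s, "/flush_cache", {})

    for s in servers:  # 5. Resume generation (assumed endpoint)
        post(s, "/continue_generation", {})
```

Each phase completes on all servers before the next begins, so no server resumes while another is still loading.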

The bug. The old code reloaded the adapter under a fixed name (lora_1), which required unloading the previous one before step 3 (Load). SGLang's /unload_lora_adapter calls wait_for_unload, which blocks until the adapter's usage counter reaches zero. But the requests aborted during the drain never released their hold on that counter, so it stayed non-zero and the unload hung forever, freezing the entire pipeline. This only struck LoRA runs under load (full fine-tuning takes a different, unload-free path), which made it look intermittent.
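
The failure mode can be reproduced with a toy model. The sketch below is not SGLang's implementation: `AdapterUsage`, `run_request`, and the `leak_on_abort` flag are hypothetical names, and a timeout stands in for "hangs forever" so the example terminates.

```python
import threading


class AdapterUsage:
    """Toy usage counter for one adapter (illustrative only)."""

    def __init__(self) -> None:
        self._count = 0
        self._cond = threading.Condition()

    def acquire(self) -> None:
        with self._cond:
            self._count += 1

    def release(self) -> None:
        with self._cond:
            self._count -= 1
            self._cond.notify_all()

    def wait_for_unload(self, timeout: float) -> bool:
        """Block until the counter is zero; return False if the timeout hits first."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)


def run_request(usage: AdapterUsage, aborted: bool, leak_on_abort: bool) -> None:
    """Simulate one request that holds the adapter while it runs."""
    usage.acquire()
    if aborted and leak_on_abort:
        return  # buggy path: the abort skips the release
    usage.release()  # normal completion, or an abort that cleans up
```

With `leak_on_abort=True`, a single aborted request leaves the counter at 1 and `wait_for_unload` can never succeed; without a timeout the caller would block indefinitely.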

The fix. Never unload. Each sync loads the new adapter under a fresh versioned name (lora_v1, lora_v2, …) and lets SGLang's memory-pool LRU reclaim the stale ones. The deadlock-prone unload is removed entirely, while the pause → drain/abort → load → flush → resume sequence above is unchanged, so the swap stays correct. A lingering old adapter only costs GPU memory, which is bounded by max_loras_per_batch / max_loaded_loras.
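
A minimal sketch of the versioned-name scheme, assuming a simple LRU pool. `AdapterPool` and `LoraSync` are hypothetical names for illustration; the pool's `capacity` stands in for a limit such as max_loaded_loras, and SGLang's actual eviction logic may differ in detail.

```python
from collections import OrderedDict
from typing import List, Optional


class AdapterPool:
    """Toy LRU pool of loaded adapters (illustrative, not SGLang's pool)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._loaded: "OrderedDict[str, str]" = OrderedDict()

    def load(self, name: str, path: str) -> Optional[str]:
        """Load an adapter; return the name evicted to make room, if any."""
        evicted = None
        if len(self._loaded) >= self.capacity:
            evicted, _ = self._loaded.popitem(last=False)  # least recently used
        self._loaded[name] = path
        return evicted

    def names(self) -> List[str]:
        return list(self._loaded)


class LoraSync:
    """Loads each sync under a fresh name and never calls unload."""

    def __init__(self, pool: AdapterPool) -> None:
        self.pool = pool
        self.seq = 0
        self.current_lora_name: Optional[str] = None

    def sync(self, path: str) -> str:
        self.seq += 1
        name = f"lora_v{self.seq}"  # fresh versioned name per sync
        self.pool.load(name, path)  # stale adapters fall out via LRU
        self.current_lora_name = name  # new requests should use this name
        return name
```

Because nothing waits on a usage counter, a leaked reference on an old adapter can no longer block the sync; the cost is that up to `capacity` adapters stay resident.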

Validation

Qwen3-1.7B, math, DP-scaled SGLang: clean monotonic rise over the first ~100 steps, with the importance weight pinned at 1.0 throughout:

| metric | base | step 100 |
|---|---|---|
| `eval-avg/math500/avg@4` | 70.0 | 77.55 |
| `eval-avg/overall_avg` | 30.2 | 36.58 |

Make LoRA RL (FSDP2 trainer + SGLang inference) train end-to-end.

Weight-sync deadlock (RaaS engine):
- Each sync used to unload+reload the adapter under a fixed name (lora_1). SGLang's /unload_lora_adapter blocks in wait_for_unload until every in-flight request releases the adapter's usage counter; paused/aborted requests at a sync never release it, so the unload hangs forever and the pipeline deadlocks.
- Fix: load each sync under a fresh versioned name (lora_v{seq}) and never unload; SGLang's mem-pool LRU evicts stale adapters. Track the active name (_current_lora_name), thread it through generation requests, and mirror it to the eval engines that share the server.

Stable LoRA RL recipe (examples/math/qwen3-1.7b-m2po-2gpus-lora):
- LoRA's alpha/rank scaling makes each weight update much larger than full-FT at the same lr, so the off-policy / multi-minibatch settings full-FT tolerates collapse the policy under LoRA. Use near-on-policy: ppo_n_minibatches=1, max_staleness=1, recompute_logprob=true (lr 5e-6).
- Validated: clean rising eval, math500 avg@4 70 -> 77.5, overall 30 -> 36.6.
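
One plausible reading of why the near-on-policy settings keep the importance weight at 1.0: if the reference log-probabilities are recomputed with the trainer's current weights and only one minibatch update follows, the two sets of log-probabilities are identical when the ratio is formed. The sketch below shows that arithmetic only; `importance_ratios` is a hypothetical helper, not a function from this repository, and the numbers are made up.

```python
import math
from typing import List


def importance_ratios(current_logp: List[float], behavior_logp: List[float]) -> List[float]:
    """Per-token importance ratio exp(logp_current - logp_behavior)."""
    return [math.exp(c - b) for c, b in zip(current_logp, behavior_logp)]


# Near-on-policy: the reference logprobs come from the same weights that
# are about to be updated, so every ratio is exactly 1.0.
on_policy = importance_ratios([-1.2, -0.3], [-1.2, -0.3])

# Off-policy: the policy has moved since the samples were drawn, so the
# ratios drift away from 1.0.
off_policy = importance_ratios([-0.9, -0.6], [-1.2, -0.3])
```

With several minibatches per step, the second and later minibatches would see a policy that has already moved, which is the off-policy case the recipe avoids.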

Add a LoRA subsection to the math recipes doc: the 2-GPU LoRA variant, the near-on-policy settings LoRA needs (ppo_n_minibatches=1, max_staleness=1, recompute_logprob=true), and the pause/drain/abort/load sequence used to swap the adapter on each weight sync.


WWWjiahui requested a review from haizhongzheng as a code owner June 18, 2026 08:34

WWWjiahui force-pushed the chore/bump-sglang-0.5.12 branch from 5f9c0a1 to cbd7f97 Compare June 19, 2026 06:38

WWWjiahui changed the base branch from dev to main June 19, 2026 07:12

haizhongzheng added a commit that referenced this pull request Jun 19, 2026

Merge pull request #20: feat(lora): fix LoRA weight-sync deadlock and add stable LoRA RL recipe (227d52f)

WWWjiahui wants to merge 2 commits into Infini-AI-Lab:main from WWWjiahui:chore/bump-sglang-0.5.12
