feat(p2p): add shard-level weight update with automatic broadcast fallback #2146
Open
CalvinXKY wants to merge 1 commit into
Benchmark: Qwen3-8B P2P vs NCCL broadcast (non-colocate, TP=4)
Setup
Logs (local copy):
Procedure
Results (
| Sync path | Step 1 (warmup) | Steady-state mean (steps 2–49) | Median | Min | Max | Std dev |
|---|---|---|---|---|---|---|
| P2P | 3.17 s | 0.484 s | 0.483 s | 0.406 s | 0.733 s | 0.054 s |
| NCCL broadcast | 3.77 s | 0.755 s | 0.751 s | 0.696 s | 0.840 s | 0.034 s |
| Delta | — | 1.56× faster | — | — | — | −35.9% sync time |
- 49 logged rollout steps (steps 1–49); step 1 excluded from steady-state stats due to first-connection warmup.
- Speedup is on the weight sync phase only, not end-to-end step time (rollout + train still dominate total step latency).
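The Delta row can be re-derived from the two steady-state means in the results table. A minimal sketch, assuming only the rounded table values (variable names are illustrative):

```python
# Steady-state means in seconds, copied from the results table.
p2p_mean_s = 0.484
broadcast_mean_s = 0.755

# Speedup of P2P relative to NCCL broadcast on the weight sync phase only.
speedup = broadcast_mean_s / p2p_mean_s

# Relative reduction in sync time, as a percentage of the broadcast time.
reduction_pct = (broadcast_mean_s - p2p_mean_s) / broadcast_mean_s * 100

print(f"speedup: {speedup:.2f}x")                    # speedup: 1.56x
print(f"sync-time reduction: {reduction_pct:.1f}%")  # sync-time reduction: 35.9%
```

Both figures describe the same difference: 1.56× is the ratio of the means, and −35.9% is the change relative to the broadcast baseline.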
Takeaway
On Qwen3-8B non-colocate TP=4, P2P shard sync reduces steady-state update_weights_time from ~0.76s to ~0.48s versus the existing NCCL broadcast path, with stable training to completion over 49 rollout steps.
…lback

Add P2P dist.send/recv path for non-colocate full weight sync when Megatron Bridge, shard conversion, and TP alignment preconditions are met; otherwise fall back to NCCL broadcast with a rank-0 log message.

- Apply sglang_p2p.patch in Docker build after slime sglang patches
- Add Qwen2/Qwen3 shard Megatron→HF conversion and updater implementation
- Wire --use-p2p-weight-update and SGLang tp_tensor_counts payload
- Document usage in docs/zh|en/advanced/p2p-weight-sync.md

Summary
Add an opt-in P2P shard weight sync path for non-colocate RL training with --update-weight-mode full. Each Megatron TP rank sends its local HF shard to the matching SGLang TP rank via dist.send/recv, avoiding the default all_gather + NCCL broadcast.

If preconditions are not met, slime automatically falls back to NCCL broadcast and logs the reason on rank 0. No breaking changes: default behavior is unchanged without --use-p2p-weight-update.

Motivation
In non-colocate mode, gathering full weights on the training side and broadcasting to rollout engines dominates update_weights time for large models. P2P shard sync keeps weights sharded end-to-end.

Changes
- --use-p2p-weight-update flag (requires --update-weight-mode full)
- UpdateWeightFromDistributedP2P + Qwen2/Qwen3 dense shard Megatron→HF conversion
- common.py
- sglang_p2p.patch applied in Dockerfile after sglang.patch / sglang-top_p.patch
- docs/en/advanced/p2p-weight-sync.md, docs/zh/advanced/p2p-weight-sync.md

Usage
Preconditions (auto-fallback)
- UpdateWeightFromTensor
- --megatron-to-hf-mode bridge
- sglang_pp_size > 1 falls back (P2P send/recv pairing is TP-only in Phase 1)

Megatron training PP and SGLang inference PP need not match when SGLang PP=1.
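The fallback rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not slime's actual code: SyncConfig, p2p_fallback_reason, choose_sync_path, and the log wording are hypothetical names, and both the equal-TP-size check and the treatment of a non-full update mode as a fallback (rather than an error) are assumptions. Only the [P2P] log prefix, the bridge requirement, and the sglang_pp_size > 1 rule come from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncConfig:
    """Hypothetical settings container; field names are illustrative only."""
    use_p2p_weight_update: bool
    colocate: bool
    update_weight_mode: str    # e.g. "full"
    megatron_to_hf_mode: str   # e.g. "bridge" or "raw"
    train_tp_size: int
    sglang_tp_size: int
    sglang_pp_size: int


def p2p_fallback_reason(cfg: SyncConfig) -> Optional[str]:
    """Return None when P2P looks usable, otherwise a fallback reason."""
    if cfg.colocate:
        return "P2P path targets non-colocate mode"
    if cfg.update_weight_mode != "full":
        # Assumption: handled as a fallback here; the real code may reject it.
        return "requires --update-weight-mode full"
    if cfg.megatron_to_hf_mode != "bridge":
        return "requires --megatron-to-hf-mode bridge"
    if cfg.sglang_pp_size > 1:
        return "sglang_pp_size > 1 (send/recv pairing is TP-only)"
    if cfg.train_tp_size != cfg.sglang_tp_size:
        # Assumption: "TP alignment" is modeled as equal TP sizes.
        return "training and SGLang TP sizes differ"
    return None


def choose_sync_path(cfg: SyncConfig, rank: int = 0) -> str:
    """Pick "p2p" or "broadcast"; log the fallback reason on rank 0 only."""
    if not cfg.use_p2p_weight_update:
        return "broadcast"  # default behavior, no log needed
    reason = p2p_fallback_reason(cfg)
    if reason is not None:
        if rank == 0:
            print(f"[P2P] falling back to NCCL broadcast: {reason}")
        return "broadcast"
    return "p2p"


ok = SyncConfig(True, False, "full", "bridge", 4, 4, 1)
raw = SyncConfig(True, False, "full", "raw", 4, 4, 1)
print(choose_sync_path(ok))   # p2p
print(choose_sync_path(raw))  # logs a [P2P] line, then prints: broadcast
```

Returning a reason string instead of a boolean keeps the rank-0 log specific about which precondition failed, which is what the fallback cases in the test plan check for.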
Manual SGLang patch (for reviewers only)
Merged PRs apply the patch at Docker build time. To validate on an existing slimerl/slime:latest container (which already has sglang.patch + sglang-top_p.patch) without rebuilding:

Then run a non-colocate smoke test with --use-p2p-weight-update (see docs).

Testing
Validated on 8×A100, slimerl/slime:latest + manual patch above, Qwen3-4B TP=4:
- …exit_code=0)
- update_weights ~0.33–0.40s
- sglang_p2p.patch passes git apply --check on top of slime SGLang patches

Test plan
- docker build with ENABLE_SGLANG_PATCH=1 succeeds (includes sglang_p2p.patch)
- --use-p2p-weight-update
- --megatron-to-hf-mode raw falls back to broadcast with [P2P] log
- sglang_pp_size > 1 falls back to broadcast with [P2P] log