Rework Megatron to Support Adding New Models by FurtherAI · Pull Request #674 · OpenPipe/ART

FurtherAI · 2026-05-08T18:42:42Z

New/General Model Support

This PR turns the Megatron/vLLM/Qwen model-support work into a production path rather than a pile of model-specific patches.

Motivation / Outcome

Decouple ART from vLLM dependencies. ART no longer installs or imports vllm in the main environment. vLLM now lives in a separate art-vllm-runtime package/venv with its own lockfile, so ART can pin its own Torch/Flash/TE stack independently.
Keep dedicated and shared-GPU serving working after separation. ART launches the external vLLM runtime, uses stock OpenAI/vLLM endpoints where possible, and keeps only small ART runtime routes for sleep/wake and served-model-name control. Shared-GPU mode still uses vLLM sleep mode.
Support efficient trainer-to-vLLM weight sync without importing vLLM. The trainer side now has an ART-owned NCCL weight-transfer subset and merged-weight export path, so Megatron can stream merged weights to the isolated runtime without depending on vLLM internals.
Introduce a real Megatron model-support framework. Model-specific behavior moved behind registry/handler boundaries: target modules, dense vs MoE topology, dependency floors, native vLLM LoRA status, provider patching, LoRA wrapping/export, and architecture discovery now live in one extensible system.
Add validated Qwen model families. Registers Qwen3 dense/MoE and Qwen3.5/Qwen3.6 dense/MoE support, including Qwen3.5/3.6 text-only Megatron runtime, GatedDeltaNet layers, packed mRoPE position handling, dense-vs-MoE topology selection, and explicit MTP disablement for ART training.
Fold most of PR #667 (Transformers 5 and Qwen 3.5/3.6 official support) into the new design. This includes Qwen3.5/Qwen3.6 support, chat-template compatibility, the official vLLM upgrade direction, and native vLLM LoRA serving, all implemented through the handler/registry/runtime-isolation architecture instead of scattered patches. Some features from that PR, such as chat-template kwargs, are not included.
Make LoRA disk format and serving paths explicit. Canonical Megatron LoRA checkpoints on disk are vLLM-compatible; Megatron loads through handler codecs, vLLM can serve native LoRA for validated handlers, and merged serving remains available for models that need it.
Improve GDN packed-training correctness and performance. Adds shared-prefix GDN execution support, uses Megatron/TE modules for linear/norm/LoRA behavior, preserves sequence-parallel semantics, and adds compile workarounds only through model-handler policy.
Harden process lifecycle. Adds managed child-process cleanup and backend/service teardown so vLLM and Megatron subprocesses do not survive parent death or failed runs.
Add packaging and release support for the split runtime. The package build now creates and bundles the art-vllm-runtime wheel plus its pyproject.toml, uv.lock, and manifest into the root openpipe-art wheel/sdist without adding vllm to root package metadata. Release/package workflows use the new build script and validate that published ART artifacts contain the runtime bundle while keeping vLLM install-time resolution isolated to the managed runtime environment. Packaging/release has been tested locally to the extent we can, but needs to be validated in the real GitHub release process and actual installation.
Reorganize Megatron code and tests. Megatron runtime, training, weights, model support, GDN, kernels, and runtime-isolation code are grouped into clearer submodules. New integration tests live under tests/integration/megatron/....
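
To illustrate the registry/handler boundary described above, here is a minimal sketch of how per-family behavior could sit behind a single lookup. Every name in it (ModelSupportSpec, register, resolve, serving_mode, and the "example-dense" family) is hypothetical and chosen for illustration; it is not the API added by this PR.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ModelSupportSpec:
    """Hypothetical per-family record; field names are illustrative only."""

    family: str
    target_modules: Tuple[str, ...]  # modules that receive LoRA adapters
    is_moe: bool                     # dense vs MoE topology
    native_vllm_lora: bool           # whether native LoRA serving is validated


# Single source of truth: model-specific behavior is looked up, not patched in.
_REGISTRY: Dict[str, ModelSupportSpec] = {}


def register(spec: ModelSupportSpec) -> ModelSupportSpec:
    """Add a family to the registry, refusing silent overrides."""
    if spec.family in _REGISTRY:
        raise ValueError(f"family already registered: {spec.family!r}")
    _REGISTRY[spec.family] = spec
    return spec


def resolve(family: str) -> ModelSupportSpec:
    """Return the spec for a family or fail loudly for unsupported models."""
    try:
        return _REGISTRY[family]
    except KeyError:
        raise LookupError(f"no model support registered for {family!r}") from None


def serving_mode(spec: ModelSupportSpec) -> str:
    """Pick native LoRA serving when validated, otherwise merged weights."""
    return "native_lora" if spec.native_vllm_lora else "merged"
```

In a layout like this, adding a model means registering one more spec rather than touching the training or serving code paths, and an unregistered family fails at lookup time instead of partway through a run.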

Validation

Added a model-support workflow with stages for dependency resolution, architecture discovery, HF parity, LoRA coverage, merged vLLM serving, correctness/sensitivity, chat-template rollout, packed position ids, native vLLM LoRA, and yes/no trainability. Together, these stages are the evidence that a new model has been implemented correctly.
Runtime-isolation tests verify ART import does not require vLLM or trigger vLLM import side effects, and that the runtime project/env boundary works.
Representative workflow artifacts cover Qwen3 MoE, Qwen3 dense, Qwen3.5/3.6 MoE, and Qwen3.5/3.6 dense paths, including trainability gates.
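
As a rough illustration of the import-isolation check mentioned above, the sketch below imports a package in a fresh interpreter and reports whether a forbidden module was loaded as a side effect. The helper name import_pulls_in is hypothetical, and this is not the test code from the PR; it only shows one way such a check can be written with the standard library.

```python
import subprocess
import sys


def import_pulls_in(package: str, forbidden: str) -> bool:
    """Return True if importing `package` also loads `forbidden`.

    The import runs in a child interpreter so that modules already loaded
    in the current process cannot mask or fake the result.
    """
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({package!r})\n"
        f"name = {forbidden!r}\n"
        "hit = any(m == name or m.startswith(name + '.') for m in sys.modules)\n"
        "print('1' if hit else '0')\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    # Only the last line is ours; the imported package may print on import.
    return result.stdout.strip().splitlines()[-1] == "1"
```

Assuming the main package is importable as art, an isolation test would assert that import_pulls_in("art", "vllm") is False in the main environment.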

Future Validation

Not yet included: training benchmarks and measurements of train-inference mismatch. The mismatch measurements would confirm that every trained weight is actually used in vLLM and goes through the same math there (sliced into the right places, etc.).
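
One simple way to quantify train-inference mismatch is to compare per-token log-probabilities for the same sampled tokens as computed by the trainer and by the inference engine. The sketch below assumes both sides can produce such a list; the function name and the choice of summary statistics are illustrative and not part of this PR.

```python
from typing import Dict, Sequence


def logprob_mismatch(
    train_logprobs: Sequence[float],
    infer_logprobs: Sequence[float],
) -> Dict[str, float]:
    """Summarize the gap between trainer and inference log-probabilities.

    Both sequences must cover the same tokens in the same order.
    """
    if len(train_logprobs) != len(infer_logprobs):
        raise ValueError("log-probability sequences must have equal length")
    if not train_logprobs:
        raise ValueError("at least one token is required")

    # Absolute per-token gap; zero everywhere means the two stacks agree.
    diffs = [abs(t - i) for t, i in zip(train_logprobs, infer_logprobs)]
    return {
        "mean_abs_diff": sum(diffs) / len(diffs),
        "max_abs_diff": max(diffs),
    }
```

A large max_abs_diff concentrated on a few tokens would point toward a mis-sliced or unused weight, while a small uniform gap is more consistent with ordinary numerical differences between kernels.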

mintlify · 2026-05-08T18:42:49Z

Preview deployment for your docs. Learn more about Mintlify Previews.

Project	Status	Preview	Updated (UTC)
openpipe-art	🟢 Ready	View Preview	May 8, 2026, 6:43 PM

Conflicts: pyproject.toml, src/art/local/backend.py, src/art/megatron/compile_workarounds.py, src/art/megatron/flex_attention.py, src/art/megatron/jobs.py, src/art/megatron/lora.py, src/art/megatron/offload.py, src/art/megatron/provider.py, src/art/megatron/runtime/backend.py, src/art/megatron/service.py, src/art/megatron/setup.sh, src/art/megatron/train.py, src/art/pipeline_trainer/trainer.py, src/art/preprocessing/tokenize.py, src/art/tinker/renderers.py, src/art/tinker/server.py, src/art/unsloth/service.py, src/art/unsloth/train.py, src/art/vllm/patches.py, tests/integration/megatron/model_support/oracle_worker.py, tests/unit/test_preprocessing_tokenize.py, tests/unit/test_vllm_patches_contract.py, uv.lock

FurtherAI added 30 commits April 8, 2026 02:53

Plumb packed sequence length through local training backends

127fb84

Add Megatron trainability runtime and service flow

2ef7969

Fix minor regressions

c52bff6

Merge remote-tracking branch 'origin/main' into austin/deepep_compile_and_trainability_main (conflicts: src/art/megatron/train.py)

0199fc1

Install nvshmem and remove patches

3d6e892

Update CI to sm_90 for DeepEP

16fd201

Fix CI uv cache upload hangs

9dfc106

Add megatron model support phase 1 scaffolding

4481357

Extract provider hooks into qwen model handler

c0d308b

Move megatron lora traversal into model handlers

78d07e8

Add canonical megatron adapter export helpers

e356dfb

Add megatron param name canonicalization helpers

8a9672d

Add dedicated megatron merged runtime flow

906f6ef

Add split vllm runtime package

654698b

Add megatron model support discovery scaffold

04cebfa

Add non-zero oracle signal checks

b2ce459

Improve architecture coverage recommendations

549f73d

Add minimal layer coverage workflow API

0ae31ce

Remove duplicate oracle replay suite variant

1b293e5

Add SFT HF parity scaffolding

9dc5cdc

Extract megatron weight export helpers

c2bec58

Use real HF parity deltas

4da6ab9

Achieve Qwen3.5 HF parity

60bc3f1

Remove flex attention compile disable plumbing

7076db9

Wire HF parity into validation workflow

2727104

Stabilize megatron HF parity runtime

e835237

Drop HF parity delta checks

84d59e0

Wire lora coverage and correctness into workflow

362160a

Wire merged vllm serving into workflow

8e43cdd

Isolate workflow stages in subprocesses

3580730

FurtherAI added 22 commits May 5, 2026 21:43

Avoid eager model support workflow imports

3d77ba3

Use compact packed GDN kernels for local buckets

3663266

Use chunked FLA GDN kernel

5d32ac0

Use fused Megatron cross entropy

697f392

Remove legacy GDN executor path

632eefb

Add harness CE fusion override worker

4d60c94

Add GDN timing hooks to harness wrapper

d57b48e

Organize Megatron modules and integration tests

02f221b

Fix HF parity invariant handler call

06814b0

Port main dependency and lifecycle updates

df52d07

Update Qwen handler for newer bridge mappings

4c1fde1

Validate Qwen3.5 vLLM LoRA layout

6c66d67

Remove flex attention compile tuning options

470f966

Ignore train inference mismatch artifacts

6b43ef0

Avoid assert bytecode in flex attention forward

5fe1f1b

Report flex attention bias type mismatches

70e9db4

Propagate Qwen3.5 MTP shared-prefix attention

f79e63e

Forward Qwen3.5 MTP attention bias to layers

1506236

Avoid checkpointing Qwen3.5 MTP attention state

dd16e0a

Disable Qwen3.5 MTP in ART Megatron

5bf2c87

Drop MTP diagnostic flex attention changes

e9b869d

Assert Qwen3.5 ART training has no MTP

d26ecb7

mintlify Bot deployed to staging - docs May 8, 2026 18:43 View deployment

Clean PR artifacts and fix type checks

6b40e71

mintlify Bot deployed to staging - docs May 8, 2026 19:15 View deployment

FurtherAI added 4 commits May 8, 2026 19:48

Unify runtime process supervision

7edba06

Model asyncio subprocess contract in runtime tests

a31a581

Defer supervised wait coroutine creation

815d577

FurtherAI wants to merge 195 commits into main from austin/megatron_models
