Multimodal: add VLM foundation + Joint-Decoder arch #76
Merged
Conversation
# Conflicts:
#	CHANGELOG.md
Naeemkh requested changes on May 1, 2026

Naeemkh (Member) left a comment
Thanks, @amazloumi. While I am reviewing, please bring back the code coverage setup.
token: ${{ secrets.CODECOV_TOKEN }}
slug: KempnerInstitute/KempnerForge
- run: uv run pytest tests/unit/ -v --timeout=60
Please bring back the code coverage setup.
Naeemkh approved these changes on May 4, 2026
Summary
Adds end-to-end VLM training to KempnerForge as a thin wrapper around the existing Transformer. Image tokens come from a frozen HF vision encoder (SigLIP2, CLIP, or a random test stub), pass through a 2-layer adapter, and feed the backbone via an arch-specific `ModalityStrategy` resolved through the registry.

This PR ships the foundation plus the simplest concrete arch, Joint-Decoder: image embeds are prepended to the text sequence and the LM head is applied to text positions only (see the sketch below). Dispatch is registry-driven and discriminated on `[model.vlm].arch`, so future arches (Cross-Attention, Mixture-of-Transformers) are small additive PRs that don't touch existing call sites. Both are listed in `_RESERVED_ARCHS`, so TOMLs aimed at them get a clear `NotImplementedError` until they land.

The default `model.vlm = None` keeps text-only training bit-for-bit identical to today; CI for non-VLM paths is unchanged.

CI: drops `pytest-cov` from dev deps and the `[tool.coverage]` blocks (consistent with the README's codecov badge removal); `.github/workflows/ci.yml` no longer runs `--cov` or uploads to Codecov.
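For orientation, here is a minimal sketch of the Joint-Decoder path described above. The class and argument names are illustrative only, not KempnerForge's actual API; the real wiring goes through the `ModalityStrategy` registry.

```python
# Illustrative sketch (not the repo's actual classes): frozen-encoder image
# features pass through a 2-layer adapter, are prepended to the text token
# embeddings, and the LM head reads text positions only.
import torch
import torch.nn as nn


class JointDecoderSketch(nn.Module):
    def __init__(self, d_img: int, d_model: int, backbone: nn.Module, lm_head: nn.Module):
        super().__init__()
        # 2-layer adapter: frozen vision-encoder width -> LM width
        self.adapter = nn.Sequential(
            nn.Linear(d_img, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )
        self.backbone = backbone  # the existing text Transformer, unchanged
        self.lm_head = lm_head

    def forward(self, image_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        img_embeds = self.adapter(image_feats)           # (B, T_img, d_model)
        x = torch.cat([img_embeds, text_embeds], dim=1)  # image tokens prepended
        h = self.backbone(x)                             # joint sequence through the decoder
        text_h = h[:, img_embeds.size(1):, :]            # drop image positions
        return self.lm_head(text_h)                      # logits for text positions only
```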
Testing

- `ruff check kempnerforge/ tests/ scripts/` — clean
- `ruff format --check kempnerforge/ tests/ scripts/` — clean
- `pyright kempnerforge/` — 0 errors / 0 warnings
- `pytest tests/unit/ tests/integration/` — 1066 passed, 1 skipped (single GPU)
- `torchrun --nproc_per_node=2 -m pytest tests/distributed/` — 65 passed, 8 skipped on 2× H200
- `tests/distributed/test_vlm_fsdp.py` (6 cases): 2-GPU build + forward, FSDP-sharded grads flow, variable-length rank consistency, dtype combinatorics (encoder fp32 / adapter+transformer bf16), `inner_transformer` under `torch.compile` + FSDP2, DCP checkpoint round-trip with freeze metadata.
Follow-ups (separate PRs)

- Cross-Attention: `CrossAttentionConfig` + `CrossAttentionBlock` + `CrossAttentionStrategy`. Llama-3-V style: the residual stream stays text-only; image K/V flows into separate CA blocks at a configurable cadence. Reserved in `_RESERVED_ARCHS` here.
- Mixture-of-Transformers: `MoTConfig` + `MoTBlock` + `MoTStrategy` + a JD→MoT warm-start helper. Per-modality Q/K/V/O + per-modality FFN at every layer, single global SDPA. Reserved in `_RESERVED_ARCHS` here.
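As a rough illustration of how the reservation works, a registry lookup keyed on `[model.vlm].arch` could fail fast for the two reserved arches, as sketched below. All names and string keys are assumptions for illustration, not the actual KempnerForge registry.

```python
# Illustrative sketch only (names and keys are assumptions): a registry keyed
# on [model.vlm].arch that raises a clear error for reserved arches.
class JointDecoderStrategy:
    """Placeholder standing in for the concrete Joint-Decoder strategy."""


_STRATEGIES = {"joint_decoder": JointDecoderStrategy}
_RESERVED_ARCHS = {"cross_attention", "mixture_of_transformers"}


def resolve_strategy(arch: str) -> type:
    if arch in _RESERVED_ARCHS:
        # TOMLs aimed at a reserved arch error out until that arch lands
        raise NotImplementedError(f"VLM arch {arch!r} is reserved for a follow-up PR")
    try:
        return _STRATEGIES[arch]
    except KeyError:
        raise ValueError(f"Unknown VLM arch: {arch!r}") from None
```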