Multimodal: add VLM foundation + Joint-Decoder arch #76

Merged
amazloumi merged 6 commits into main from feat/multimodal
May 7, 2026

@amazloumi
Member

Summary

Adds end-to-end VLM training to KempnerForge as a thin wrapper around the existing Transformer. Image tokens come from a frozen HF vision encoder (SigLIP2, CLIP, or a random test stub), pass through a 2-layer adapter, and feed the backbone via an arch-specific ModalityStrategy resolved through the registry.

This PR ships the foundation plus the simplest concrete arch — Joint-Decoder (image embeds prepended to the text sequence, LM head applied to text positions only). The dispatch is registry-driven and discriminated on [model.vlm].arch, so future arches (Cross-Attention, Mixture-of-Transformers) are small additive PRs without touching existing call sites. Both are listed in _RESERVED_ARCHS, so TOMLs aimed at them get a clear NotImplementedError until they land.
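
The dispatch and the Joint-Decoder layout described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `register_strategy`, `resolve_strategy`, `JointSequence`, the `build` method, and the arch key spellings are all hypothetical names, and plain lists stand in for embedding tensors.

```python
from dataclasses import dataclass

# Hypothetical names throughout: the PR mentions a registry, ModalityStrategy,
# and _RESERVED_ARCHS, but their real signatures are not shown in this thread.
_STRATEGIES: dict = {}
_RESERVED_ARCHS = {"cross_attention", "mot"}  # assumed key spellings


def register_strategy(arch: str):
    """Decorator that records a strategy class under its arch key."""
    def wrap(cls):
        _STRATEGIES[arch] = cls
        return cls
    return wrap


@dataclass
class JointSequence:
    embeds: list       # image embeds followed by text embeds
    text_slice: slice  # positions the LM head should be applied to


@register_strategy("joint_decoder")
class JointDecoderStrategy:
    def build(self, image_embeds: list, text_embeds: list) -> JointSequence:
        # Prepend image embeds, then remember where the text positions start.
        n_img = len(image_embeds)
        joint = list(image_embeds) + list(text_embeds)
        return JointSequence(joint, slice(n_img, len(joint)))


def resolve_strategy(arch: str):
    """Look up a strategy by arch key; reserved archs fail loudly."""
    if arch in _RESERVED_ARCHS:
        raise NotImplementedError(f"arch '{arch}' is reserved but not implemented yet")
    if arch not in _STRATEGIES:
        raise ValueError(f"unknown arch '{arch}'; known: {sorted(_STRATEGIES)}")
    return _STRATEGIES[arch]()


# Usage: two image positions, three text positions.
strategy = resolve_strategy("joint_decoder")
seq = strategy.build(["img0", "img1"], ["t0", "t1", "t2"])
text_only = seq.embeds[seq.text_slice]  # only these positions reach the LM head
```

Under this layout a new arch would only need its own decorated class and the removal of its key from the reserved set, which is one way the "small additive PR" property could be realized.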

Default model.vlm = None keeps text-only training bit-equal to today; CI for non-VLM paths is unchanged.
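
For illustration, the opt-in behaviour could look like the sketch below. `ModelConfig`, `VLMConfig`, `select_path`, and the `dim` field are hypothetical stand-ins, not the actual KempnerForge config classes; the only point is that an unset `vlm` field leaves the text-only path untouched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VLMConfig:
    # Hypothetical stand-in for the [model.vlm] table.
    arch: str = "joint_decoder"


@dataclass
class ModelConfig:
    # Hypothetical; only the fields relevant to this sketch.
    dim: int = 512
    vlm: Optional[VLMConfig] = None  # default: text-only training


def select_path(cfg: ModelConfig) -> str:
    """Return which build path a config would take."""
    if cfg.vlm is None:
        return "text_only"  # existing Transformer path, no VLM code involved
    return f"vlm:{cfg.vlm.arch}"
```

With these assumptions, `select_path(ModelConfig())` takes the text-only branch, and only a config that sets `vlm` reaches the VLM wrapper.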

CI:

  • Drops pytest-cov from dev deps and the [tool.coverage] blocks (consistent with the README's codecov badge removal); .github/workflows/ci.yml no longer runs --cov or uploads to Codecov

Testing

  • ruff check kempnerforge/ tests/ scripts/ — clean
  • ruff format --check kempnerforge/ tests/ scripts/ — clean
  • pyright kempnerforge/ — 0 errors / 0 warnings
  • pytest tests/unit/ tests/integration/ — 1066 passed, 1 skipped (single GPU)
  • torchrun --nproc_per_node=2 -m pytest tests/distributed/ — 65 passed, 8 skipped on 2× H200
  • tests/distributed/test_vlm_fsdp.py (6 cases): 2-GPU build + forward, FSDP-sharded grads flow, variable-length rank consistency, dtype combinatorics (encoder fp32 / adapter+transformer bf16), inner_transformer under torch.compile + FSDP2, DCP checkpoint round-trip with freeze metadata.

Follow-ups (separate PRs)

  • PR — Cross-Attention arch. CrossAttentionConfig + CrossAttentionBlock + CrossAttentionStrategy. Llama-3-V style: residual stream stays text-only; image K/V flows into separate CA blocks at a configurable cadence. Reserved in _RESERVED_ARCHS here.
  • PR — Mixture-of-Transformers arch. MoTConfig + MoTBlock + MoTStrategy + JD→MoT warm-start helper. Per-modality Q/K/V/O + per-modality FFN at every layer, single global SDPA. Reserved in _RESERVED_ARCHS here.
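
As a rough illustration of what "configurable cadence" might mean for the Cross-Attention follow-up, the sketch below assumes one cross-attention block after every `cadence`-th backbone layer. The function name and this interpretation are assumptions; the follow-up PR may define cadence differently.

```python
def cross_attention_layers(n_layers: int, cadence: int) -> list[int]:
    """Indices of backbone layers that are followed by a cross-attention block.

    Assumption: "cadence" means one CA block after every `cadence`-th layer.
    """
    if cadence < 1:
        raise ValueError("cadence must be >= 1")
    return [i for i in range(n_layers) if (i + 1) % cadence == 0]
```

For a 12-layer backbone with a cadence of 4, this places cross-attention blocks after layers 3, 7, and 11 (zero-indexed).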

@amazloumi amazloumi requested a review from mmshad May 1, 2026 15:33
# Conflicts:
#	CHANGELOG.md
@amazloumi amazloumi requested a review from Naeemkh May 1, 2026 15:49
Member

@Naeemkh Naeemkh left a comment

Thanks, @amazloumi. While I am reviewing, please bring back the code coverage setup.

Comment thread .github/workflows/ci.yml
token: ${{ secrets.CODECOV_TOKEN }}
slug: KempnerInstitute/KempnerForge
- run: uv run pytest tests/unit/ -v --timeout=60

Member

Please restore the code coverage setup.

Member Author

done

@codecov

codecov Bot commented May 1, 2026

Codecov Report

❌ Patch coverage is 93.09211% with 42 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| kempnerforge/distributed/parallel.py | 66.23% | 21 Missing and 5 partials ⚠️ |
| kempnerforge/config/loader.py | 73.58% | 6 Missing and 8 partials ⚠️ |
| kempnerforge/checkpoint/manager.py | 97.72% | 0 Missing and 1 partial ⚠️ |
| kempnerforge/data/vlm_dataset.py | 98.96% | 0 Missing and 1 partial ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| kempnerforge/config/checkpoint.py | 91.30% <100.00%> (+0.39%) ⬆️ |
| kempnerforge/config/data.py | 100.00% <100.00%> (ø) |
| kempnerforge/config/job.py | 96.61% <100.00%> (+0.69%) ⬆️ |
| kempnerforge/config/model.py | 100.00% <100.00%> (ø) |
| kempnerforge/config/registry.py | 100.00% <100.00%> (ø) |
| kempnerforge/config/schema.py | 100.00% <100.00%> (ø) |
| kempnerforge/config/vlm.py | 100.00% <100.00%> (ø) |
| kempnerforge/data/dataloader.py | 82.00% <100.00%> (+1.56%) ⬆️ |
| kempnerforge/distributed/setup.py | 50.00% <100.00%> (ø) |
| kempnerforge/model/modality.py | 100.00% <100.00%> (ø) |

... and 9 more

@amazloumi amazloumi requested a review from Naeemkh May 1, 2026 20:44
@amazloumi amazloumi merged commit e8d710b into main May 7, 2026
6 checks passed