
Multimodal: Add Cross-Attention Arch #77

Merged
amazloumi merged 3 commits into feat/multimodal from feat/multimodal-cross-attn on May 7, 2026

Conversation

@amazloumi (Member)

Summary

Adds the Cross-Attention VLM arch (arch = "cross_attention", Llama-3-V style) on top of the VLM foundation that landed in PR #76. The residual stream stays text-only; image features flow in as K/V through separate CrossAttentionBlocks inserted at a configurable cadence. CA blocks are zero-initialized, so adding the arch on top of a text-only checkpoint is an identity mapping at step 0 and learns from there.
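A minimal PyTorch sketch of the zero-init idea, assuming a Flamingo/Llama-3-V-style tanh gate (module and argument names here are illustrative, not this repo's actual CrossAttentionBlock API, which may instead zero-init the output projection):

```python
import torch
from torch import nn


class CrossAttentionBlockSketch(nn.Module):
    """Illustrative zero-init gated cross-attention block (names are assumptions)."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Gate starts at 0, so tanh(gate) == 0 and the block is the
        # identity at step 0; it learns to open during training.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, text: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # Text tokens are the queries; image features supply K/V only,
        # so the residual stream itself stays text-only.
        attn_out, _ = self.attn(
            self.norm(text), image_feats, image_feats, need_weights=False
        )
        return text + torch.tanh(self.gate) * attn_out
```

"Configurable cadence" then just means instantiating one such block after, say, every Nth decoder layer.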

This PR stacks on PR #76, so the diff shows only the CA additions.
_RESERVED_ARCHS shrinks to ("mot",) — Mixture-of-Transformers will land in the next PR.
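For context, a hypothetical sketch of what the reserved-arch guard might look like; only the _RESERVED_ARCHS tuple comes from this PR, while the validate_arch helper and its error message are assumptions:

```python
_RESERVED_ARCHS = ("mot",)  # "cross_attention" is implemented here; "mot" lands in PR 3


def validate_arch(arch: str) -> None:
    if arch in _RESERVED_ARCHS:
        raise NotImplementedError(f"arch={arch!r} is reserved for an upcoming PR")
```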

Testing

  • ruff check kempnerforge/ tests/ scripts/ — clean
  • ruff format --check kempnerforge/ tests/ scripts/ — clean
  • pyright kempnerforge/ — 0 errors / 0 warnings
  • sphinx-build -W --keep-going -b html docs docs/_build/html — clean (no Sphinx warnings)
  • pytest tests/unit/ tests/integration/ --cov --cov-branch — 1130 passed, 1 skipped, 81% coverage
  • torchrun --nproc_per_node=4 -m pytest tests/distributed/ --slow — 81 passed on 4× H200 (full distributed suite incl. 6 JD + 8 CA VLM tests)
  • CA + MoE smoke (tests/distributed/test_vlm_cross_attn_fsdp.py::TestMoEWithVLM) — passes on 2-GPU FSDP2

Follow-ups

  • PR 3 — Mixture-of-Transformers arch. MoTConfig + MoTBlock + MoTStrategy + JD→MoT warm-start helper. Per-modality Q/K/V/O + per-modality FFN at every layer, single global SDPA (see the sketch below). Reserved in _RESERVED_ARCHS here.
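PR 3 has not landed, so the following is only a rough sketch of the planned shape (all class and argument names are hypothetical, assuming two modalities; the per-modality FFN is omitted for brevity):

```python
import torch
import torch.nn.functional as F
from torch import nn


class MoTAttentionSketch(nn.Module):
    """Hypothetical per-modality Q/K/V/O attention with one global SDPA."""

    def __init__(self, dim: int, n_heads: int, n_modalities: int = 2):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads

        def proj_set() -> nn.ModuleList:
            # One projection per modality (e.g. 0 = text, 1 = image).
            return nn.ModuleList(
                nn.Linear(dim, dim, bias=False) for _ in range(n_modalities)
            )

        self.q, self.k, self.v, self.o = proj_set(), proj_set(), proj_set(), proj_set()

    def forward(self, x: torch.Tensor, modality: torch.Tensor) -> torch.Tensor:
        # x: (B, T, D) mixed-modality sequence; modality: (B, T) int ids.
        B, T, D = x.shape
        q, k, v = torch.zeros_like(x), torch.zeros_like(x), torch.zeros_like(x)
        for m, (wq, wk, wv) in enumerate(zip(self.q, self.k, self.v)):
            # Route each token through its own modality's projections
            # (dense compute per modality here; a real kernel would gather/scatter).
            sel = (modality == m).unsqueeze(-1)
            q = torch.where(sel, wq(x), q)
            k = torch.where(sel, wk(x), k)
            v = torch.where(sel, wv(x), v)
        # Single global SDPA over the full sequence, all modalities together.
        q, k, v = (
            t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
            for t in (q, k, v)
        )
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, D)
        y = torch.zeros_like(out)
        for m, wo in enumerate(self.o):
            # Per-modality output projection mirrors the input routing.
            sel = (modality == m).unsqueeze(-1)
            y = torch.where(sel, wo(out), y)
        return y
```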

amazloumi merged commit 5d151b5 into feat/multimodal on May 7, 2026
amazloumi deleted the feat/multimodal-cross-attn branch on May 7, 2026 at 18:17