Goal
Get the VL-JEPA captioning loop to produce instance-discriminative outputs on real-world images (COCO and similar), not the per-image-identical caption observed in current validation runs.
Current state
From VALIDATION_FINDINGS.md: the 18M-param VL-JEPA hits 100% on synthetic colored shapes but mode-collapses on real COCO across all 4 runs (baseline, stronger conditioning, prefix tokens + FiLM, and word dropout all collapse). The K=16 latent bottleneck carries category-level information (32.1% on the 20-class task, 6.4× above chance) but not instance-level detail; the decoder learns to ignore the latent plan.
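For context on the "prefix tokens + FiLM" run above: FiLM conditions the decoder by scale-and-shift modulation of its hidden features, with the scale/shift predicted from the latent plan. A minimal sketch of the modulation step (plain Python, illustrative names, not the repo's implementation):

```python
def film(features, gamma, beta):
    """Feature-wise Linear Modulation: per-channel scale-and-shift of
    decoder features, where (gamma, beta) would be predicted from the
    pooled latent plan by a small linear head (not shown here)."""
    return [g * f + b for f, g, b in zip(features, gamma, beta)]

# If gamma collapses to 1 and beta to 0 across images, the decoder is
# effectively unconditioned, which matches the observed mode collapse.
out = film([1.0, 2.0], gamma=[2.0, 0.5], beta=[0.0, 1.0])
```

The collapse symptom in this framing is that the predicted (gamma, beta) become image-independent, so the modulation carries no instance information.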
Hypotheses to test
- Scale. Train a 200M+ param model end-to-end on a larger dataset; see whether the bottleneck enriches enough to break collapse.
- Bottleneck capacity. Increase K (16 → 64+) and/or per-vector dim; observe whether decoder starts using the plan.
- Auxiliary loss. Add a contrastive or reconstructive auxiliary on the latent plan to force per-image distinctiveness before the decoder sees it.
- Decoder regularization. Schedule prefix-token dropout / word dropout differently to prevent the decoder from learning the unconditional caption distribution.
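The auxiliary-loss hypothesis can be sketched as an InfoNCE-style contrastive objective on pooled latent plans: two views of the same image are positives, all other images in the batch are negatives. This is a minimal stdlib sketch under assumed shapes (one L2-normalized pooled vector per image), not the project's actual loss:

```python
import math

def info_nce(plans_a, plans_b, temperature=0.1):
    """InfoNCE over pooled latent plans. plans_a[i] and plans_b[i] are
    two views of image i; every other pairing is a negative. Returns the
    mean -log softmax probability of the positive pair."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    n = len(plans_a)
    total = 0.0
    for i in range(n):
        logits = [dot(plans_a[i], b) / temperature for b in plans_b]
        m = max(logits)  # stabilize the log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n
```

A useful property for this ticket: if all plans are identical (full collapse), the loss sits exactly at log(batch_size), so the objective directly penalizes the failure mode observed in the validation runs.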
Acceptance
- A configuration where decoder produces image-specific captions on a held-out COCO subset, with caption diversity above a defined threshold (e.g. distinct-2 score > X).
- Findings written up in VALIDATION_FINDINGS.md (or a successor doc).
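The distinct-2 threshold above can be computed as the fraction of unique bigrams across all generated captions. A minimal sketch (function name and tokenization are illustrative; whitespace tokenization assumed):

```python
def distinct_n(captions, n=2):
    """Distinct-n: unique n-grams / total n-grams over a caption set.
    1.0 means every n-gram is unique; values near 0 indicate heavy
    repetition, e.g. the per-image-identical captions seen in collapse."""
    ngrams = []
    for cap in captions:
        toks = cap.lower().split()
        ngrams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

# A fully collapsed run (same caption repeated) scores 1/num_captions
# times the per-caption distinct-2, making collapse easy to flag.
score = distinct_n(["a man riding a horse"] * 100)
```

Whatever threshold X is chosen, it should be fixed on held-out data before the runs, so the acceptance check is not tuned post hoc.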
Notes
- This is research, not pure implementation. Budget accordingly.
- The native captioning capability is downstream of this. Sponsor/dependent alignment scoring is not — that work uses the LLM backbone instead.