[📃 Paper] | [🤗 Dataset] | [🤗 Model (1.3B)] | [🤗 Model (14B)] | [🚀 Blog]
- Apr 18, 2026: We released our training code.
- Jul 8, 2025: We released our paper, data, and project. Models are coming soon. Please stay tuned!
Recent advances in video generation have shown remarkable progress in open-domain settings, yet medical video generation remains largely underexplored. Medical videos are critical for applications such as clinical training, education, and simulation, requiring not only high visual fidelity but also strict medical accuracy. However, current models often produce unrealistic or erroneous content when applied to medical prompts, largely due to the lack of large-scale, high-quality datasets tailored to the medical domain. To address this gap, we introduce MedVideoCap-55K, the first large-scale, diverse, and caption-rich dataset for medical video generation. It comprises over 55,000 curated clips spanning real-world medical scenarios, providing a strong foundation for training generalist medical video generation models. Built upon this dataset, we develop MedGen, which achieves leading performance among open-source models and rivals commercial systems across multiple benchmarks in both visual quality and medical accuracy. We hope our dataset and model can serve as a valuable resource and help catalyze further research in medical video generation.
Note
We open-sourced our models, data, and code here.
- Paper: arXiv
- Dataset: HF (55k)
- Model Weights: HF (1.3B), HF (14B)
- Training Code
- Evaluation Code
- Agent for Medical Video Generation
You can ⬇️download our full MedVideoCap-55K from HuggingFace. Our dataset has several features:
- Superior in quantity. Our dataset comprises 55k medical videos, supporting video generation across diverse medical scenarios, including medical education, medical imaging, clinical practice, and more.
- Superior in visual quality. Clips are strictly filtered on aesthetics, temporal consistency, motion smoothness, and clarity.
- Expressive captions. Previously proposed medical video datasets typically use category labels as captions. In contrast, our dataset provides expressive and coherent video descriptions generated with the help of MLLMs.
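The quality-based curation above can be sketched as a simple per-clip filter. Note that the field names and the threshold below are hypothetical, chosen only to illustrate the idea of keeping a clip when all four criteria pass:

```python
# Hypothetical sketch of quality-based clip curation.
# Field names and the 0.5 threshold are illustrative, not the
# actual values used to build MedVideoCap-55K.

CRITERIA = ["aesthetics", "temporal_consistency", "motion_smoothness", "clarity"]

def keep_clip(clip: dict, threshold: float = 0.5) -> bool:
    """Keep a clip only if every quality score clears the threshold."""
    return all(clip.get(k, 0.0) >= threshold for k in CRITERIA)

clips = [
    {"id": "a", "aesthetics": 0.8, "temporal_consistency": 0.9,
     "motion_smoothness": 0.7, "clarity": 0.85},
    {"id": "b", "aesthetics": 0.3, "temporal_consistency": 0.9,
     "motion_smoothness": 0.7, "clarity": 0.85},
]
curated = [c for c in clips if keep_clip(c)]  # only clip "a" survives
```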
# Install
pip install -r requirements.txt
# Train LoRA on 1.3B
bash scripts/train_lora_1.3B.sh
# Train full on 14B with DeepSpeed
bash scripts/train_full_14B.sh
# Generate single video
python inference.py --model_name "Wan-AI/Wan2.1-T2V-1.3B" \
--checkpoint_dir "./models/train/wan_lora_1.3B" \
--is_lora --prompt "your prompt"
# Batch generation
bash scripts/infer_batch.sh --metadata_path data/metadata.json

For more details, please refer to quick start.
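A plausible minimal layout for `metadata.json` is a list of records, each carrying a text prompt. This schema is an assumption for illustration; the exact fields expected by `scripts/infer_batch.sh` may differ:

```python
import json

# Assumed metadata.json layout for batch generation: a list of
# records, each with at least a "prompt" field. Verify against the
# actual script before use.
entries = [
    {"prompt": "An endoscopic view of the gastric mucosa during a routine exam."},
    {"prompt": "An ultrasound sweep of a fetal heart in the four-chamber view."},
]
with open("metadata.json", "w") as f:
    json.dump(entries, f, indent=2)

# Reading the prompts back, as a batch runner might:
with open("metadata.json") as f:
    prompts = [e["prompt"] for e in json.load(f)]
```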
Please refer to evaluation.
Our work is inspired by the following projects.
- FastVideo: a lightweight framework for accelerating large video diffusion models.
- DiffSynth-Studio: an open-source Diffusion model engine.
- VBench: a comprehensive benchmark suite for video generative models.
- VideoScore: an automatic metric that simulates fine-grained human feedback for video generation.
@misc{wang2025medgenunlockingmedicalvideo,
title={MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos},
author={Rongsheng Wang and Junying Chen and Ke Ji and Zhenyang Cai and Shunian Chen and Yunjin Yang and Benyou Wang},
year={2025},
eprint={2507.05675},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.05675},
}
