This is the official implementation of the approach described in:
FMPose3D: monocular 3D pose estimation via flow matching CVPR 2026
Ti Wang, Xiaohang Yu, Mackenzie Weygandt Mathis
FMPose3D creates a 3D pose from a single 2D image. It leverages fast Flow Matching, generating multiple plausible 3D poses via an ODE in just a few steps, then aggregates them using a reprojection-based Bayesian module (RPEA) for accurate predictions, achieving state-of-the-art results on human and animal 3D pose benchmarks.
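The sample-then-aggregate idea described above can be illustrated with a small NumPy sketch. This is not the FMPose3D implementation: the velocity field is a hand-written stand-in for the learned network, the camera is assumed to be orthographic, and a softmax over reprojection error is used as a simple substitute for RPEA. All function names and the 17-joint toy pose are hypothetical and chosen for illustration only.

```python
import numpy as np

def toy_velocity(x, t, target):
    # Hypothetical stand-in for a learned velocity network v(x, t | 2D input).
    # It simply pulls every sample toward a fixed 3D pose; t is unused here.
    return target - x

def sample_hypotheses(target, num_hypotheses=8, num_steps=3, seed=0):
    # Euler integration of dx/dt = v(x, t) from t=0 (noise) to t=1.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_hypotheses,) + target.shape)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * toy_velocity(x, k * dt, target)
    return x

def project(pose_3d):
    # Assumed orthographic camera: drop the depth coordinate.
    return pose_3d[..., :2]

def reprojection_error(hypotheses, pose_2d):
    # Mean per-joint 2D distance for each hypothesis, shape (K,).
    return np.linalg.norm(project(hypotheses) - pose_2d, axis=-1).mean(axis=-1)

def aggregate(hypotheses, pose_2d, temperature=0.1):
    # Softmax weights that favor hypotheses with low reprojection error
    # (an assumed weighting, not the paper's RPEA module).
    err = reprojection_error(hypotheses, pose_2d)
    logits = -err / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    fused = np.tensordot(weights, hypotheses, axes=1)
    return fused, weights

# Toy data: the 2D input is the projection of a known random 3D pose.
rng = np.random.default_rng(1)
gt_3d = rng.standard_normal((17, 3))
pose_2d = project(gt_3d)
hypotheses = sample_hypotheses(gt_3d)
fused_pose, weights = aggregate(hypotheses, pose_2d)
```

Because the fused pose is a convex combination of the hypotheses, its reprojection error under this linear camera can never exceed that of the worst hypothesis. Lowering `temperature` moves the result toward the single best-reprojecting sample.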
- Feb 2026: The FMPose3D paper was accepted to CVPR 2026! 🔥
- Feb 2026: The FMPose3D code and our arXiv paper are released. Check out the demos here or on our project page.
- March 2026: This method is integrated into DeepLabCut
Make sure you have Python 3.10; the installation and demos are tested with this version. You can set up an environment with:
conda create -n fmpose_3d python=3.10
conda activate fmpose_3d
pip install fmpose3d
For the animal pipeline, install the optional DeepLabCut dependency:
pip install "fmpose3d[animals]"
PyTorch/CUDA note. FMPose3D pins torch>=2.4.1,<2.5 and torchvision>=0.19.1,<0.20, which use CUDA 12.1 wheels by default on Linux. If your driver does not support CUDA 12.1, or if you need a specific CUDA build, install PyTorch first using the matching command from pytorch.org, then install fmpose3d.
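To confirm that an existing environment matches the pinned ranges before installing, a small standard-library check like the one below may help. It is a sketch rather than part of FMPose3D: the helper names are hypothetical, and the version parser is deliberately simple (it keeps only the leading numeric release segments and ignores local tags such as `+cu121`).

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned ranges quoted from the note above: lower bound inclusive, upper exclusive.
PINS = {"torch": ("2.4.1", "2.5"), "torchvision": ("0.19.1", "0.20")}

def parse_version(text):
    # "2.4.1+cu121" -> (2, 4, 1); parsing stops at the first non-numeric segment.
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def in_range(installed, lower, upper):
    # True when lower <= installed < upper.
    return parse_version(lower) <= parse_version(installed) < parse_version(upper)

def check_installed():
    # Maps each pinned package to None (not installed) or (version, is_in_range).
    report = {}
    for name, (lower, upper) in PINS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = None
            continue
        report[name] = (installed, in_range(installed, lower, upper))
    return report

if __name__ == "__main__":
    print(check_installed())
```

With PyTorch installed, `torch.version.cuda` and `torch.cuda.is_available()` additionally report which CUDA build the wheel targets and whether a GPU is usable.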
This visualization script is designed for single-frame models and lets you run 3D human pose estimation on any single image.
Pre-trained weights are downloaded automatically from Hugging Face the first time you run inference, so no manual setup is needed.
Alternatively, you can use your own trained weights or download ours from Google Drive, place them in the ./pre_trained_models directory, and set model_weights_path in the shell script (e.g. demo/vis_in_the_wild.sh).
Next, put your test images into the demo/images folder. Then run the visualization script:
sh vis_in_the_wild.sh
The predictions will be saved to the demo/predictions folder.
You can obtain the Human3.6M dataset from the Human3.6M website, and then set it up using the instructions provided in VideoPose3D.
You can also access the processed data by downloading it from here.
Place the downloaded files in the dataset/ folder of this project:
<project_root>/
├── dataset/
│   ├── data_3d_h36m.npz
│   ├── data_2d_h36m_gt.npz
│   └── data_2d_h36m_cpn_ft_h36m_dbb.npz
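Before training or testing, it can be useful to verify that the three files are where the layout above expects them. The helper below is a hypothetical convenience, not part of the FMPose3D package; it only checks that the files exist and does not validate their contents.

```python
from pathlib import Path

# File names taken from the dataset layout shown above.
EXPECTED_FILES = (
    "data_3d_h36m.npz",
    "data_2d_h36m_gt.npz",
    "data_2d_h36m_cpn_ft_h36m_dbb.npz",
)

def missing_dataset_files(project_root="."):
    # Returns the expected files that are absent from <project_root>/dataset.
    dataset_dir = Path(project_root) / "dataset"
    return [name for name in EXPECTED_FILES if not (dataset_dir / name).is_file()]

if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing from dataset/:", ", ".join(missing))
    else:
        print("All Human3.6M files found.")
```

Run it from the project root, or pass the root path explicitly, for example `missing_dataset_files("/path/to/project")`.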
The training logs, checkpoints, and related files for each training run are saved in the ./checkpoint folder.
For training on Human3.6M:
sh ./scripts/FMPose3D_train.sh
Pre-trained weights are fetched automatically from Hugging Face on the first run. You can also use local weights by setting model_weights_path in the shell script (see Demos above for details).
To run inference on Human3.6M:
sh ./scripts/FMPose3D_test.sh
FMPose3D also ships a high-level Python API for end-to-end 3D pose estimation from images. See the Inference API documentation for the full reference.
For animal training/testing and demo scripts, see animals/README.md. The animal demo auto-downloads both checkpoints (a 26-joint SuperAnimal-Quadruped fine-tuned on Animal3D for 2D pose, and the FMPose3D animal flow-matching lifter for 3D) from Hugging Face on first run, so no manual setup is needed.
@misc{wang2026fmpose3dmonocular3dpose,
title={FMPose3D: monocular 3D pose estimation via flow matching},
author={Ti Wang and Xiaohang Yu and Mackenzie Weygandt Mathis},
year={2026},
journal={CVPR},
url={https://arxiv.org/abs/2602.05755},
}
We thank the Swiss National Science Foundation (SNSF Project # 320030-227871) and the Kavli Foundation for providing financial support for this project.
Our code is extended from the following repositories. We thank the authors for releasing the code.

