This is Hybrid 3D.
Built from AI depth + custom stereo logic —
Designed for cinema in VR.
Click to download or support the project 💙
- Key Features
- Guide Sheet: Install
- Guide Sheet: GUI Inputs
- Troubleshooting
- Dev Notes
- Acknowledgments & Credits
- Supports 20+ transformer-based models: ZoeDepth, Depth Anything, MiDaS, DPT, DepthPro, DINOv2, etc.
- One-click model selection with automatic downloads – zero CLI/config required
- PyTorch GPU-accelerated inference (no OpenCV recompile needed)
- Batch-ready: process image folders or video sequences
- Includes temporal smoothing and scene-adaptive normalization
- Built-in color inversion and customizable colormaps (Viridis, Inferno, Magma)
- Real-time GUI: progress bar, FPS meter, ETA tracker
- Handles large resolutions via auto-resizing + smart batching
- Full GPU task control: pause/resume/cancel
- Optional local ONNX model + TensorRT for max performance
- CUDA + PyTorch-powered depth parallax shifting (pixel-accurate)
- Fine-tuned control: foreground pop / midground balance / background pull
- Export formats: Half-SBS, Full-SBS, VR180, Anaglyph, Passive Interlaced
- Floating window engine with cinema-style dynamic masking
- Supports feathered masking, edge-aware smoothing, sharpening
- Convergence tracking based on subject movement
- Live performance stats: FPS, elapsed time, percent complete
- Integrated RIFE ONNX model – PyTorch-free, real-time frame doubling
- Supports 2x, 4x, 8x FPS interpolation
- Processes raw image folders + auto video reassembly
- Maintains frame count, resolution, audio sync, and aspect ratio
- Preview and export using FFmpeg codecs (GUI-integrated)
- Real-time progress, FPS, ETA feedback
- Uses Real-ESRGAN x4, exported to ONNX with full CUDA acceleration
- Intelligent VRAM-aware batching for 1–8 frames
- Upscaling: 720p → 1080p, 1080p → 4K, or custom targets
- Auto-scaling to match 3D or interpolated frame resolutions
- Uses fp16 inference for clean, artifact-free output
- Fully integrated into pipeline with FFmpeg NVENC export
- GUI includes progress bar, FPS, ETA tracking
- Edge-aware artifact suppression (e.g., hair, limbs)
- Feathered transition masks to eliminate ghosting/pop artifacts
- Adaptive depth sharpening + blending for polished output
- Floating window masking with cinematic easing
- Per-frame zero parallax estimation + smoothing
- Extract + reattach source audio using FFmpeg (GUI-based)
- Format options: AAC, MP3, WAV (bitrate adjustable)
- No shell access needed – fully built into GUI
- Real-time preview: Interlaced, HSBS, Depth Heatmap
- On-frame previews with convergence + parallax tuning
- Preview exports as images – no temp videos needed
- Language support: EN, FR, ES, DE
- Responsive multi-tab Tkinter interface with persistent settings
- Full GPU render control: pause, resume, cancel
- Codec selector with NVENC options (H.264, HEVC, AV1-ready)
- One-click launch – no pip or scripting required
- Formats: Half-SBS, Full-SBS, VR180, Anaglyph, Passive Interlaced
- Aspect Ratios: 16:9, 2.39:1, 2.76:1, 4:3, 21:9, 1:1, 2.35:1
- Export formats: MP4, MKV, AVI
- Codec support: XVID, MP4V, MJPG, DIVX, FFmpeg NVENC
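The Half-SBS and Full-SBS formats listed above pack the two eye views differently. As a rough sketch, assuming the usual conventions (Full-SBS keeps each eye at full width; Half-SBS squeezes each eye to half width so the packed frame keeps the source size), the output frame size works out as below. The helper names are hypothetical and not part of VisionDepth3D.

```python
def sbs_frame_size(width: int, height: int, mode: str) -> tuple[int, int]:
    """Return (packed_width, packed_height) for a side-by-side stereo frame.

    Hypothetical helper illustrating common SBS conventions:
    - "full": left and right views each keep the source width.
    - "half": each view is squeezed to half width, so the packed
      frame has the same size as the source.
    """
    if mode == "full":
        return 2 * width, height
    if mode == "half":
        if width % 2:
            raise ValueError("Half-SBS needs an even source width")
        return width, height
    raise ValueError(f"unknown SBS mode: {mode!r}")


def per_eye_width(width: int, mode: str) -> int:
    """Width of a single eye view inside the packed frame."""
    return width // 2 if mode == "half" else width


print(sbs_frame_size(1920, 1080, "full"))  # (3840, 1080)
print(sbs_frame_size(1920, 1080, "half"))  # (1920, 1080)
```

Full-SBS keeps every source pixel per eye at the cost of a double-width file; Half-SBS trades horizontal resolution for compatibility with players that expect a normal 16:9 frame.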
- ✔️ This program runs on Python 3.12
- ✔️ This program has been tested on CUDA 12.8
- ✔️ Conda (Optional, Recommended for Simplicity)
- 1️⃣ Download the VisionDepth3D zip file from the official download source (green button).
- 2️⃣ Extract the zip file to your desired folder (e.g., c:\user\VisionDepth3D).
- 3️⃣ Download the models here and extract the weights folder into the VisionDepth3D main folder.
- 4️⃣ (Optional) Download the Distill Any Depth ONNX models here and put the Distill Any Depth folder into the weights folder.
- 1. Press Win + R, type cmd, and hit Enter.
- 2. Clone the repository (skip the git clone if you downloaded the ZIP and start from cd):
git clone https://github.com/VisionDepth/VisionDepth3D.git
cd C:\VisionDepth3D-main
pip install -r requirements.txt
- Continue with installing PyTorch with CUDA, then run VisionDepth3D.bat
(Automatically manages dependencies & isolates environment.)
- 1. Clone the repository (skip the git clone if you downloaded the ZIP and start from cd).
- 2. Create the Conda environment.
To create the environment, copy and paste this into your Conda prompt and run it:
git clone https://github.com/VisionDepth/VisionDepth3D.git
cd VisionDepth3D-main
conda create -n VD3D python=3.12
conda activate VD3D
pip install -r requirements.txt
🔍 Find Your CUDA Version: Before installing PyTorch, check which CUDA version your GPU supports:
- 1️⃣ Open Command Prompt (Win + R, type cmd, hit Enter)
- 2️⃣ Run the following command:
nvcc --version
or
nvidia-smi
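- 3️⃣ Look for the CUDA version (e.g., CUDA 11.8, 12.1, etc.). nvidia-smi shows the highest CUDA version your driver supports; nvcc --version shows the installed CUDA Toolkit version, if a toolkit is installed.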
Go to the official PyTorch website to find the best install command for your setup: 🔗 https://pytorch.org/get-started/locally/
Install the PyTorch build for CUDA 12.8, or for whichever CUDA version you are running.
If you are running an AMD GPU, select the CPU build.
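After installing, a quick way to confirm that the PyTorch build you chose actually sees your GPU is a short Python check run inside the same environment. This is a generic sketch using standard PyTorch attributes (torch.version.cuda, torch.cuda.is_available), not a script shipped with VisionDepth3D.

```python
import importlib.util


def describe_torch_setup() -> dict:
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    info = {
        "installed": True,
        "torch_version": torch.__version__,
        # None on CPU-only builds (e.g. the CPU build suggested for AMD GPUs).
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_setup().items():
        print(f"{key}: {value}")
```

If built_with_cuda prints a version but cuda_available is False, the usual suspects are an outdated GPU driver or a CUDA build newer than the driver supports.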
- Once PyTorch and all dependencies are installed, update the startup script for the system you are running and run it:
Start_VD3D_Conda.bat
# or
Start_VD3D_Linux.sh
# or
Start_VD3D_Windows.bat
Congrats, you have successfully installed VisionDepth3D! This quick setup walks you through cloning the repository, configuring your environment, and launching the app in just a few simple steps.
- Back Up Your Weights
Move your weights folder out of the old VisionDepth3D-main directory.
- Download the Latest Version
Delete the old folder and extract or clone the updated version of VisionDepth3D-main.
- Restore Weights Folder
Place your weights folder back inside the newly downloaded main directory:
VisionDepth3D-main/weights
- Update the Path in Startup Scripts
Open the startup script matching your platform: Start_VD3D_Windows.bat, Start_VD3D_Conda.bat, or Start_VD3D_Linux.sh.
Edit the script and replace any old folder path with the path to your updated VisionDepth3D-main.
- Activate Conda Environment (if needed)
If you are using the Conda starter script, open a terminal or Anaconda Prompt and run:
cd path/to/updated/VisionDepth3D-main
Start_VD3D_Conda.bat
- Launch the App
Once everything is in place, run the appropriate script or shortcut to launch VisionDepth3D with your latest settings.
Note: If you customized any configuration, back up those files before replacing folders. If you run into import errors, run
pip install -r requirements.txt
inside the opened terminal to fix missing dependencies.
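The weights step of the update can also be scripted. A minimal sketch, assuming both installs follow the VisionDepth3D-main/weights layout described above; the function name and paths are hypothetical. It copies rather than moves, so the old folder stays as a backup until the new install is confirmed to work.

```python
import shutil
from pathlib import Path


def carry_over_weights(old_root: str, new_root: str) -> Path:
    """Copy the weights folder from an old install into a new one.

    Copying (instead of moving) keeps the old folder as a backup until
    the new install is confirmed to work.
    """
    src = Path(old_root) / "weights"
    dst = Path(new_root) / "weights"
    if not src.is_dir():
        raise FileNotFoundError(f"no weights folder found at {src}")
    # dirs_exist_ok lets this merge into a weights folder that already exists.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

Usage: carry_over_weights("C:/old/VisionDepth3D-main", "C:/new/VisionDepth3D-main"). Model weights can be several gigabytes, so make sure the drive has room for a temporary second copy.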
Use the GUI to fine-tune your 3D conversion settings.
- Description: Sets the output video encoder.
- Default: mp4v (CPU)
- Options:
- mp4v, XVID, DIVX – CPU-based
- libx264, libx265 – High-quality software (CPU)
- h264_nvenc, hevc_nvenc – GPU-accelerated (NVIDIA)
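When rendering through FFmpeg, the codec choice above and the CRF value from the render settings end up as encoder arguments. The sketch below builds such an argument list; the function and file names are hypothetical and the mapping is an assumption, not VisionDepth3D's code. FFmpeg's libx264/libx265 encoders take -crf, while the NVENC encoders are usually driven with -cq in VBR mode instead (for -cq, 0 means "automatic", not lossless). Check ffmpeg -h encoder=h264_nvenc for the options your build supports.

```python
def build_encode_args(src: str, dst: str, codec: str = "libx264", crf: int = 23) -> list[str]:
    """Build an FFmpeg command line for a quality-targeted encode.

    libx264/libx265: -crf (0-51, lower = better quality).
    h264_nvenc/hevc_nvenc: this sketch maps the same value to -cq in VBR mode.
    """
    if not 0 <= crf <= 51:
        raise ValueError("quality value must be between 0 and 51")
    args = ["ffmpeg", "-y", "-i", src, "-c:v", codec]
    if codec in ("libx264", "libx265"):
        args += ["-crf", str(crf)]
    elif codec in ("h264_nvenc", "hevc_nvenc"):
        args += ["-rc", "vbr", "-cq", str(crf)]
    else:
        # mp4v/XVID/DIVX are typically OpenCV FourCC codes, not FFmpeg encoders.
        raise ValueError(f"codec not covered by this sketch: {codec}")
    # yuv420p keeps the output playable on most players and headsets.
    args += ["-pix_fmt", "yuv420p", dst]
    return args
```

Usage: subprocess.run(build_encode_args("input_3d.mp4", "output_3d.mp4", "hevc_nvenc", 23), check=True).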
- Description: Pops foreground objects out of the screen.
- Default: 6.5
- Range: 3.0 to 8.0
- Effect: Strong values create a noticeable 3D "pop" in close objects.
- Description: Depth for the mid-layer transition between foreground and background.
- Default: 1.5
- Range: -3.0 to 5.0
- Effect: Smooths the 3D transition; higher values exaggerate depth between layers.
- Description: Shifts depth for background (far-away) layers.
- Default: -6.0
- Range: -10.0 to 0.0
- Effect: More negative values push content into the screen (deeper background).
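The three shift controls above can be pictured as anchor values along the depth range. The sketch below is only an illustration under stated assumptions: depth is normalized to [0, 1] with 1.0 = nearest, and the foreground, midground, and background values are linearly interpolated with the midground anchored at depth 0.5. VisionDepth3D's actual mapping is not documented here and may differ; the function name is hypothetical.

```python
def layer_shift(depth: float, fg: float = 6.5, mg: float = 1.5, bg: float = -6.0) -> float:
    """Interpolate a per-pixel shift from foreground/midground/background values.

    Assumes depth in [0, 1], where 1.0 is nearest to the camera.
    0.0 -> bg, 0.5 -> mg, 1.0 -> fg, linear in between.
    """
    d = min(max(depth, 0.0), 1.0)  # clamp out-of-range depth values
    if d < 0.5:
        t = d / 0.5
        return bg + (mg - bg) * t
    t = (d - 0.5) / 0.5
    return mg + (fg - mg) * t


for d in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(d, layer_shift(d))
```

With the defaults, the shift runs from -6.0 at the far end through 1.5 in the middle to 6.5 for the nearest objects, which is why raising the foreground value mainly affects close subjects.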
- Description: Applies a sharpening filter to the output.
- Default: 0.2
- Range: -1.0 (softer) to 1.0 (sharper)
- Effect: Brings clarity to 3D edges; avoid over-sharpening to reduce halos.
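A sharpening amount that runs from negative (softer) to positive (sharper) behaves like a classic unsharp mask: the output is pushed away from, or toward, a blurred copy. The 1-D sketch below assumes that technique and a 3-tap box blur; it illustrates the sign convention of the setting and where halos come from, not the filter VisionDepth3D actually uses.

```python
def unsharp_1d(signal: list[float], amount: float) -> list[float]:
    """Apply an unsharp mask to a 1-D row of pixel values.

    out = x + amount * (x - blur(x))
    amount > 0 sharpens (exaggerates local contrast),
    amount < 0 softens (moves each value toward its blurred version).
    """
    n = len(signal)
    out = []
    for i, x in enumerate(signal):
        # 3-tap box blur with edge pixels repeated at the borders.
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        blurred = (left + x + right) / 3.0
        out.append(x + amount * (x - blurred))
    return out


row = [10.0, 10.0, 10.0, 40.0, 40.0, 40.0]
print(unsharp_1d(row, 0.2))   # values dip/overshoot at the edge (sharper)
print(unsharp_1d(row, -1.0))  # edge becomes a ramp (softer)
```

The dip and overshoot on either side of the edge are exactly the halos the setting warns about; they grow linearly with the amount.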
- Description: Shifts the entire stereo image inward or outward to adjust the overall convergence point (zero-parallax plane).
- Default: 0.000
- Range: -0.050 to +0.050
- Effect:
- Positive values push the image deeper into the screen (stronger positive parallax).
- Negative values pull the scene forward (increased pop-out effect).
- Tip: Use small increments like ±0.010 for subtle depth balancing.
- Description: Limits the maximum pixel displacement caused by stereo shifting, expressed as a fraction of video width (0.020 = 2%).
- Default: 0.020 (2%)
- Range: 0.005 to 0.100
- Effect:
- Low values reduce eye strain but can flatten the 3D effect.
- High values create more dramatic depth but may introduce ghosting or artifacts.
- Best Use: Keep between 0.015 and 0.030 for clean results.
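The maximum pixel shift is a fraction of frame width, so its pixel meaning depends on resolution: at the default 0.020 on a 1920-pixel-wide frame, the cap is 0.020 × 1920 = 38.4 pixels. The sketch below assumes a per-pixel shift already expressed in pixels, adds the convergence offset as a uniform shift, and clamps the result to the cap. It additionally assumes the convergence offset is a fraction of width (its unit is not stated above); the order of operations and the function name are hypothetical.

```python
def final_shift_px(raw_shift_px: float, width: int,
                   convergence_offset: float = 0.0,
                   max_pixel_shift: float = 0.020) -> float:
    """Combine a per-pixel stereo shift with a global offset, then clamp it.

    Assumptions (illustrative only):
    - raw_shift_px is already in pixels;
    - convergence_offset and max_pixel_shift are fractions of frame width.
    """
    shifted = raw_shift_px + convergence_offset * width  # uniform global shift
    limit = max_pixel_shift * width                       # e.g. 0.020 * 1920 = 38.4 px
    return max(-limit, min(limit, shifted))


print(final_shift_px(12.0, 1920))                            # within the cap
print(final_shift_px(100.0, 1920))                           # clamped to the cap
print(final_shift_px(12.0, 1920, convergence_offset=0.010))  # +19.2 px offset
```

Because the cap scales with width, the same setting allows 76.8 pixels on a 3840-wide frame, so the on-screen angle stays roughly constant across resolutions.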
- Description: Adjusts how strongly the 3D effect favors the subject's depth versus full-scene stereo balance.
- Default: 0.80
- Range: 0.00 to 1.00
- Effect:
- 1.0 = Full parallax (strong 3D depth everywhere).
- 0.0 = Subject stays fixed, depth minimized elsewhere.
- Use For: Tuning stereo focus around people or central motion while avoiding exaggerated background distortion.
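One way to read the 0.0 to 1.0 balance described above is as a linear blend between each pixel's own shift and the subject's shift. The sketch assumes exactly that: balance = 1.0 keeps full parallax, while 0.0 gives every pixel the subject's shift so relative depth collapses around it. VisionDepth3D's internal formula may differ, and the function name is hypothetical.

```python
def balanced_shift(pixel_shift: float, subject_shift: float, balance: float = 0.80) -> float:
    """Blend a pixel's shift toward the subject's shift.

    balance = 1.0 -> pixel_shift unchanged (full parallax)
    balance = 0.0 -> subject_shift everywhere (depth minimized around the subject)
    """
    b = min(max(balance, 0.0), 1.0)  # keep the setting inside its documented range
    return subject_shift + b * (pixel_shift - subject_shift)
```

With the default 0.80, every pixel's offset relative to the subject shrinks to 80% of its full value, while the subject itself keeps its own shift at any setting.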
- Codec: Choose GPU-accelerated encoders (h264_nvenc, hevc_nvenc) for faster renders.
- CRF (Constant Rate Factor):
- Default: 23
- Range: 0 (lossless) to 51 (worst)
- Lower values = better visual quality.
- Default:
- Checkbox: Stabilize Zero-Parallax (center-depth)
- Effect: Enables Dynamic Zero Parallax Tracking — the depth plane will automatically follow the subject’s depth to minimize excessive 3D warping.
- Function: Dynamically adjusts the zero-parallax plane to follow the estimated subject depth (typically the central object or character). This keeps key elements at screen depth, reducing eye strain and excessive parallax.
- Effect: Helps stabilize the 3D effect by anchoring the subject at screen level, especially useful for scenes with depth jumps or fast movement.
- Recommended for: Dialogue scenes, human-centric content, or anything where central focus should feel "on screen" rather than floating in depth.
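Following the subject's depth frame by frame can jitter when the per-frame estimate is noisy, which is why zero-parallax estimation is paired with smoothing. A common way to do this, assumed here rather than taken from VisionDepth3D's code, is to estimate the subject depth from the centre of the depth map and smooth it with an exponential moving average. The centre-window size, the alpha value, and the function names are hypothetical.

```python
def center_depth(depth_map: list[list[float]], frac: float = 0.2) -> float:
    """Average depth over a central window covering `frac` of each dimension."""
    h, w = len(depth_map), len(depth_map[0])
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    window = [row[left:left + cw] for row in depth_map[top:top + ch]]
    return sum(sum(r) for r in window) / (ch * cw)


def smooth_track(per_frame_depths: list[float], alpha: float = 0.1) -> list[float]:
    """Exponential moving average: small alpha = steadier, slower-reacting plane."""
    smoothed = []
    current = None
    for d in per_frame_depths:
        # The first frame seeds the average; later frames nudge it by alpha.
        current = d if current is None else current + alpha * (d - current)
        smoothed.append(current)
    return smoothed
```

The smoothed value would then serve as the zero-parallax depth for each frame. A small alpha hides estimation noise but lags behind genuine depth jumps, so scene cuts may warrant resetting the average.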
- Match resolution and FPS between your input video and depth map.
- Use the Inverse Depth checkbox if bright = far instead of close.
- Recommended depth models: Distill Any Depth, Depth Anything V2, MiDaS, DPT-Large, etc.
- Choose Large models for better fidelity.
Clip Length | Estimated Time (with GPU) |
---|---|
30 seconds | 1–4 mins |
5 minutes | 10–25 mins |
Full Movie | 6–24+ hours |
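The estimates in the table above follow from simple frame arithmetic: total frames = duration × source FPS, and time ≈ frames ÷ processing speed. A small helper makes it easy to plug in the FPS meter reading from your own GPU. The 5 fps processing speed in the example is an assumed figure, not a benchmark, and the function name is hypothetical.

```python
def estimate_minutes(duration_s: float, source_fps: float, processing_fps: float) -> float:
    """Estimated processing time in minutes for a clip."""
    if processing_fps <= 0:
        raise ValueError("processing_fps must be positive")
    frames = duration_s * source_fps
    return frames / processing_fps / 60.0


# 5-minute clip at 24 fps = 7200 frames; at an assumed 5 fps that is 24 minutes.
print(round(estimate_minutes(5 * 60, 24, 5), 1))
```

Each pipeline stage (depth, 3D render, upscale, interpolation) runs at its own speed, so estimate them separately and add the results.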
- Select your depth model from the dropdown.
- Choose an output directory for saving results.
- Enable your preferred settings (invert, colormap, etc.).
- Set batch size depending on GPU/VRAM capacity.
(Tip: Resize your video or switch to a lighter model if memory is limited.)
- Select your image / video / folder and start processing.
- Once the depth map video is generated, head over to the 3D tab.
- Input your original video and the newly created depth map.
- Adjust 3D settings for the preferred stereo effect.
- Hit "Generate 3D Video" and let it roll!
Use these models to clean up and enhance 3D videos:
- In the Upscale tab, load your 3D video and enable “Save Frames Only”.
- Input the width × height of the 3D video.
(No need to set FPS or codec when saving frames.)
- Set batch size to 1, because some AI models do not support batch processing.
- Select AI Blend Mode and Input Resolution:
Mode | Blend Ratio (AI : Original) | Description |
---|---|---|
OFF | 100% : 0% | Full AI effect (only the ESRGAN result is used). |
LOW | 85% : 15% | Strong AI enhancement with mild natural tone retention. |
MEDIUM | 50% : 50% | Balanced mix for natural image quality. |
HIGH | 25% : 75% | Subtle upscale; mostly original with a hint of enhancement. |
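Each blend mode in the table is a fixed AI:original mix, i.e. a per-pixel weighted average. A minimal sketch of that arithmetic, with the ratios copied from the table and assuming the original frame has already been resized to the upscaled resolution; the dictionary and function names are hypothetical.

```python
# AI weight per blend mode, taken from the table (the original gets 1 - weight).
AI_WEIGHT = {"OFF": 1.00, "LOW": 0.85, "MEDIUM": 0.50, "HIGH": 0.25}


def blend_pixel(ai_value: float, original_value: float, mode: str) -> float:
    """Weighted mix of the upscaled (AI) pixel and the resized original pixel."""
    w = AI_WEIGHT[mode]
    return w * ai_value + (1.0 - w) * original_value
```

For example, with an AI pixel of 200 and an original pixel of 100, MEDIUM gives 150 and HIGH gives 125.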
Input Resolution | Processing Behavior | Performance & Quality Impact |
---|---|---|
100% | Uses full-resolution frames for AI upscaling. | ✅ Best quality. ❌ Highest GPU usage. |
75% | Slightly downsamples before feeding into AI. | ⚖️ Good balance. Minimal quality loss. |
50% | Halves frame size before AI. | ⚡ 2× faster. Some detail loss possible. |
25% | Very low-resolution input. | 🚀 Fastest speed. Noticeable softness — best for previews/tests. |
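The percentages in this table scale each dimension, so the number of pixels the model has to process drops with the square of the scale: 50% leaves a quarter of the pixels. A sketch of the size calculation, assuming dimensions are rounded to even numbers (a common requirement for video encoders); the helper is hypothetical, and actual speedups depend on the model and GPU.

```python
def scaled_input(width: int, height: int, percent: int) -> tuple[int, int, float]:
    """Return (w, h, pixel_fraction) for a downscaled AI input frame."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    s = percent / 100.0
    # Round each side to the nearest even number, never below 2.
    w = max(2, round(width * s / 2) * 2)
    h = max(2, round(height * s / 2) * 2)
    return w, h, (w * h) / (width * height)


for pct in (100, 75, 50, 25):
    print(pct, scaled_input(1920, 1080, pct))
```

For a 1920 × 1080 frame this gives 1440 × 810 at 75%, 960 × 540 at 50%, and 480 × 270 at 25%.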
- Select your Upscale Model and start the process.
- Once done, open the VDStitch tab:
- Input the upscaled frame folder.
- Set the video output directory and filename.
- Enter the same resolution and FPS as your original 3D video.
- Enable RIFE FPS Interpolation.
- Set the RIFE multiplier to ×2 for smooth results.
(⚠️ Higher multipliers like ×4 may cause artifacts on scene cuts.)
- Start processing. You now have an enhanced 3D video with upscaled clarity and smoother motion!
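Frame interpolation multiplies the frame rate while the running time stays the same. A sketch of that bookkeeping, assuming each source frame yields `multiplier` output frames; the helper name is hypothetical. Exact fractions are used because NTSC-style rates such as 23.976 fps are really 24000/1001.

```python
from fractions import Fraction

ALLOWED_MULTIPLIERS = (2, 4, 8)


def interpolated_timing(fps: Fraction, frame_count: int, multiplier: int) -> tuple[Fraction, int]:
    """Return (output_fps, output_frame_count) for RIFE-style frame multiplication.

    Assumes each source frame produces `multiplier` output frames, so the
    clip duration (frame_count / fps) is unchanged.
    """
    if multiplier not in ALLOWED_MULTIPLIERS:
        raise ValueError(f"multiplier must be one of {ALLOWED_MULTIPLIERS}")
    return fps * multiplier, frame_count * multiplier


fps, frames = interpolated_timing(Fraction(24000, 1001), 1000, 2)
print(float(fps), frames)  # ~47.952 fps, 2000 frames
```

Entering the original FPS in VDStitch and letting the multiplier raise it keeps audio sync intact, since the duration does not change.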
- Black/Empty Output: Wrong depth map resolution or mismatch with input FPS.
- Halo/Artifacts:
- Increase feather strength and blur size.
- Enable subject tracking and clamp the zero parallax offset.
- Out of Memory (OOM):
- Enable FFmpeg rendering for better memory usage.
- Use libx264 or h264_nvenc and avoid processing long clips in one go.
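For the Black/Empty Output case, comparing the input video and depth map before a long render can save time. The sketch below uses ffprobe (shipped alongside FFmpeg) to read the width, height, and frame rate of the first video stream, then reports any differences. It is a standalone helper, not part of VisionDepth3D, and the function names are hypothetical.

```python
import json
import subprocess
from fractions import Fraction


def probe_video(path: str) -> dict:
    """Read width, height and frame rate of the first video stream via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height,r_frame_rate",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_probe(out)


def parse_probe(ffprobe_json: str) -> dict:
    """Turn ffprobe JSON output into {'width', 'height', 'fps'}."""
    stream = json.loads(ffprobe_json)["streams"][0]
    return {
        "width": int(stream["width"]),
        "height": int(stream["height"]),
        # r_frame_rate is a ratio string such as "24000/1001".
        "fps": Fraction(stream["r_frame_rate"]),
    }


def mismatches(video: dict, depth: dict) -> list[str]:
    """List the properties that differ between the video and its depth map."""
    return [k for k in ("width", "height", "fps") if video[k] != depth[k]]
```

Usage: mismatches(probe_video("input.mp4"), probe_video("depth.mp4")) returns an empty list when both files line up.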
This tool is being developed by a solo dev with nightly grind energy (🕐 ~4 hours a night). If you find it helpful, let me know — feedback, bug reports, and feature ideas are always welcome!
Thank You!
A heartfelt thank you to all the researchers, developers, and contributors behind the incredible depth estimation models and open-source tools used in this project. Your dedication, innovation, and generosity have made it possible to explore the frontiers of 3D rendering and video processing. Your work continues to inspire and empower developers like me to build transformative, creative applications.
Model Name | Creator / Organization | Hugging Face Repository |
---|---|---|
Distill-Any-Depth-Large | xingyang1 | Distill-Any-Depth-Large-hf |
Distill-Any-Depth-Small | xingyang1 | Distill-Any-Depth-Small-hf |
Depth Anything V2 Large | Depth Anything Team | Depth-Anything-V2-Large-hf |
Depth Anything V2 Base | Depth Anything Team | Depth-Anything-V2-Base-hf |
Depth Anything V2 Small | Depth Anything Team | Depth-Anything-V2-Small-hf |
Depth Anything V1 Large | LiheYoung | depth-anything-large-hf |
Depth Anything V1 Base | LiheYoung | depth-anything-base-hf |
Depth Anything V1 Small | LiheYoung | depth-anything-small-hf |
V2-Metric-Indoor-Large | Depth Anything Team | Depth-Anything-V2-Metric-Indoor-Large-hf |
V2-Metric-Outdoor-Large | Depth Anything Team | Depth-Anything-V2-Metric-Outdoor-Large-hf |
DA_vitl14 | LiheYoung | depth_anything_vitl14 |
DA_vits14 | LiheYoung | depth_anything_vits14 |
DepthPro | Apple | DepthPro-hf |
ZoeDepth | Intel | zoedepth-nyu-kitti |
MiDaS 3.0 | Intel | dpt-hybrid-midas |
DPT-Large | Intel | dpt-large |
DinoV2 | Facebook | dpt-dinov2-small-kitti |
dpt-beit-large-512 | Intel | dpt-beit-large-512 |
This project utilizes the FFmpeg multimedia framework for video/audio processing via subprocess invocation. FFmpeg is licensed under the GNU LGPL v2.1 or later, or under the GNU GPL when built with optional GPL-licensed components, depending on how it was built. No modifications were made to the FFmpeg source or binaries; the software simply executes FFmpeg as an external process.
You may obtain a copy of the FFmpeg license at: https://www.gnu.org/licenses/
VisionDepth3D calls FFmpeg strictly for encoding, muxing, audio extraction, and frame rendering operations, in accordance with license requirements.
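As described above, FFmpeg is run as an external process. A minimal sketch of that pattern for reattaching the source audio to a rendered 3D video, using standard FFmpeg options; the function names and file names are hypothetical and this is not VisionDepth3D's own code. Other audio formats need a different encoder name (for example libmp3lame for MP3).

```python
import shutil
import subprocess


def reattach_audio_cmd(video: str, audio_source: str, output: str,
                       audio_codec: str = "aac", bitrate: str = "192k") -> list[str]:
    """FFmpeg command: keep the new video stream, take audio from the source file."""
    return [
        "ffmpeg", "-y",
        "-i", video,          # input 0: rendered 3D video (no audio)
        "-i", audio_source,   # input 1: original file with the soundtrack
        "-map", "0:v:0",      # video from input 0
        "-map", "1:a:0",      # first audio stream from input 1
        "-c:v", "copy",       # no video re-encode
        "-c:a", audio_codec,
        "-b:a", bitrate,
        "-shortest",          # stop at the shorter of the two streams
        output,
    ]


def run_ffmpeg(cmd: list[str]) -> None:
    """Execute FFmpeg as an external process, failing loudly on errors."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("ffmpeg was not found on PATH")
    subprocess.run(cmd, check=True)
```

Usage: run_ffmpeg(reattach_audio_cmd("movie_3d.mp4", "movie_original.mkv", "movie_3d_audio.mp4")). Copying the video stream means the audio step takes seconds rather than a full re-encode.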