egnake/LocalDub


LocalDub: Real-Time AI YouTube Dubbing

A local, real-time AI dubbing extension for YouTube.


What is LocalDub?

LocalDub is a Chrome extension paired with a Python backend that translates and dubs YouTube videos into your chosen language (English, Turkish, German, Spanish, or French) in real time. You do not pay for any cloud APIs: audio extraction, AI speech-to-text, and LLM translation all run on your local machine. Speech synthesis uses Edge-TTS, which is free but calls Microsoft's online text-to-speech service, so that step requires an internet connection.

Important

LocalDub uses a Zero-Drift HTML5 Time-Stretching Algorithm. When the AI generates a translation that is longer or shorter than the original video scene, the browser adjusts the playback speed of the dubbed audio while preserving its pitch. Each clip is fitted to its own scene, so the dub stays aligned with the video frames and delay does not accumulate over the course of the video.
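
As a rough illustration of the arithmetic behind time-stretching, the sketch below computes the playback rate needed to fit a dubbed clip into its original time slot. It is written in Python for readability; in the browser the equivalent result would be assigned to the audio element's `playbackRate`. The function name and the clamping bounds are assumptions made for this example, not values taken from LocalDub.

```python
def playback_rate(dub_seconds: float, slot_seconds: float,
                  min_rate: float = 0.5, max_rate: float = 2.0) -> float:
    """Return the rate at which a dub of `dub_seconds` fills `slot_seconds`.

    A rate above 1.0 speeds the dub up (it was too long for the slot);
    a rate below 1.0 slows it down (it was too short).
    """
    if dub_seconds <= 0 or slot_seconds <= 0:
        raise ValueError("durations must be positive")
    rate = dub_seconds / slot_seconds
    # Clamp to a range where stretched speech is still intelligible.
    # The bounds here are illustrative assumptions.
    return max(min_rate, min(max_rate, rate))


# A 4.5-second dub for a 3-second scene has to play 1.5x faster.
print(playback_rate(4.5, 3.0))
```

Because the rate is recomputed for every clip, an error in one clip is not carried into the next.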


Architecture & How it Works

The system operates using a seamless pipeline between the browser and your local GPU/CPU:

sequenceDiagram
    participant YT as YouTube (Chrome Extension)
    participant FF as FFmpeg (Downloader)
    participant ASR as Faster-Whisper (Speech-to-Text)
    participant LLM as Ollama Gemma2 (Translator)
    participant TTS as Edge-TTS (Voice Synthesizer)

    YT->>FF: "Video is at 0:03. Give me the audio."
    FF->>ASR: Extracts 3-second raw audio chunk
    ASR->>LLM: Transcribes: "Hello guys"
    LLM->>TTS: Translates: "Merhaba arkadaşlar"
    TTS->>YT: Sends perfectly timed MP3 via WebSockets
    Note over YT,TTS: The extension receives the audio, dynamically calculates playbackRate, and plays it in flawless lip-sync!
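
The first hop of the pipeline, cutting a short audio chunk at the current video position, could look roughly like the sketch below. It builds an FFmpeg command that seeks to a start time and emits 3 seconds of 16 kHz mono PCM on stdout, a format speech-to-text models commonly accept. The function names and the `source` argument (a local file or stream URL) are hypothetical; how LocalDub actually obtains the audio stream is not specified here.

```python
import subprocess


def build_chunk_command(source: str, start: float, duration: float = 3.0) -> list[str]:
    """Build an FFmpeg argv that extracts one raw audio chunk."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-ss", f"{start:.3f}",     # seek to the chunk start (input option)
        "-t", f"{duration:.3f}",   # read only this many seconds
        "-i", source,
        "-vn",                     # drop any video stream
        "-ac", "1",                # mono
        "-ar", "16000",            # 16 kHz sample rate
        "-f", "s16le",             # raw signed 16-bit little-endian PCM
        "pipe:1",                  # write to stdout
    ]


def extract_chunk(source: str, start: float, duration: float = 3.0) -> bytes:
    """Run FFmpeg and return the raw PCM bytes (requires ffmpeg on PATH)."""
    result = subprocess.run(
        build_chunk_command(source, start, duration),
        capture_output=True,
        check=True,
    )
    return result.stdout
```

For example, `extract_chunk("clip.wav", start=3.0)` would return the audio between 0:03 and 0:06, assuming `clip.wav` exists.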

Core Technologies

  • Faster-Whisper: A fast offline Speech-to-Text (ASR) engine, a CTranslate2-based reimplementation of OpenAI's Whisper. Its built-in Voice Activity Detection (VAD) filter strips silent frames.
  • Ollama (Gemma2 / Llama3): A local LLM pipeline constrained to act as a pure machine translator. It uses few-shot completion prompting, temperature 0.0, and a strict stop token (\n) to suppress conversational output and to preserve technical terms (e.g., Firewall, React, Cheatsheet).
  • Edge-TTS: Microsoft's natural-sounding neural voice synthesis engine.
  • HTML5 Time-Stretching: The frontend logic that scales each audio clip to fit its visual scene while preserving pitch, so the speech does not sound distorted.
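
To make the translator settings above concrete, here is a minimal sketch of a few-shot completion request against Ollama's local `/api/generate` endpoint, with temperature 0.0 and `\n` as the stop token. The example sentence pairs, the prompt layout, and the function names are assumptions for illustration; LocalDub's actual prompt may differ.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address

# Hypothetical few-shot pairs; the second shows a technical term kept as-is.
FEW_SHOT = [
    ("Hello guys", "Merhaba arkadaşlar"),
    ("Open the Firewall settings.", "Firewall ayarlarını açın."),
]


def build_prompt(text: str, source_lang: str = "English",
                 target_lang: str = "Turkish") -> str:
    """Lay out the examples so the model only has to complete one line."""
    lines = []
    for src, dst in FEW_SHOT:
        lines.append(f"{source_lang}: {src}")
        lines.append(f"{target_lang}: {dst}")
    # Collapse whitespace so the input occupies exactly one line.
    lines.append(f"{source_lang}: {' '.join(text.split())}")
    lines.append(f"{target_lang}:")
    return "\n".join(lines)


def build_payload(text: str, model: str = "gemma2:9b") -> dict:
    return {
        "model": model,
        "prompt": build_prompt(text),
        "stream": False,
        # Deterministic output, cut off at the first newline.
        "options": {"temperature": 0.0, "stop": ["\n"]},
    }


def translate(text: str) -> str:
    """Send the request; requires a running Ollama server with the model pulled."""
    data = json.dumps(build_payload(text)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["response"].strip()
```

Ending the prompt on the bare target-language label and stopping at the first newline is what keeps the model from appending commentary after the translated line.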

Installation Guide (Step-by-Step)

Follow these instructions carefully to set up your local AI dubbing studio.

Step 1: Install Python & FFmpeg

  1. Download and install Python 3.12 from Python.org.
    • Critical: During installation, make sure to check the box that says "Add Python to PATH".
  2. Download FFmpeg. Extract the folder and add its bin directory to your Windows Environment Variables (PATH). Open Command Prompt (CMD) and run ffmpeg -version to verify that it is installed correctly.
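
If you prefer to verify both prerequisites at once, a small script along these lines can help. It is only a sketch: the function name is hypothetical, and the minimum version simply mirrors the Python 3.12 recommendation above rather than a tested requirement.

```python
import shutil
import sys


def check_prerequisites(min_python: tuple[int, int] = (3, 12)) -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = f"{sys.version_info[0]}.{sys.version_info[1]}"
        wanted = f"{min_python[0]}.{min_python[1]}"
        problems.append(f"Python {wanted}+ recommended, found {found}")
    # shutil.which searches PATH the same way the shell would.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


for problem in check_prerequisites():
    print("Problem:", problem)
```

No output means both checks passed.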

Step 2: Install Ollama (The AI Brain)

  1. Download and install Ollama from Ollama.com.
  2. Open your terminal (CMD or PowerShell) and pull the translation model by running:
    ollama run gemma2:9b
    (Note: This is a 5GB+ download. Once it finishes, Ollama opens an interactive chat prompt; type /bye or simply close the terminal.)
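
To confirm that the model is available before starting the backend, you can query Ollama's local `/api/tags` endpoint, which lists installed models. The sketch below assumes Ollama is listening on its default port; the helper names are hypothetical.

```python
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"  # Ollama's default local address


def installed_models(tags_response: dict) -> set[str]:
    """Extract model names from a parsed /api/tags response."""
    return {m.get("name", "") for m in tags_response.get("models", [])}


def model_is_installed(model: str, tags_response: dict) -> bool:
    return model in installed_models(tags_response)


def fetch_tags(url: str = TAGS_URL, timeout: float = 5.0) -> dict:
    """Fetch the model list; raises an error if Ollama is not running."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Ollama running, `model_is_installed("gemma2:9b", fetch_tags())` should return `True` once the download has completed.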

Step 3: Install the Backend Server

  1. Clone or download this repository to your computer.
  2. Open a terminal inside the downloaded folder and navigate to the backend directory.
  3. Install the required Python libraries:
    cd backend
    pip install -r requirements.txt

Step 4: Load the Chrome Extension

  1. Open Google Chrome and go to chrome://extensions/.
  2. Enable Developer mode using the toggle in the top right corner.
  3. Click the Load unpacked button in the top left.
  4. Select the extension folder located inside the LocalDub project directory.
  5. The LocalDub logo will now appear in your browser's extension bar!

How to Use

  1. Start the AI Server: Open a terminal in the backend folder and run:

    python main.py

    (Wait until you see "WebSocket connected" and "Uvicorn running" in the logs).

  2. Start Dubbing:

    • Open any foreign language video on YouTube.
    • Click the LocalDub extension icon in your browser.
    • Select your desired target language from the dropdown menu.
    • Click Enable Dubbing.

The video will pause for 1-2 seconds to buffer the initial AI generation, and then play continuously with perfectly synced, natural-sounding AI dubbing!


Industry Standard Roadmap (Coming Soon)

We are actively working on upgrading LocalDub to compete with multi-million dollar AI dubbing startups:

  • Zero-Shot Voice Cloning: Instead of using Microsoft's default voices, the backend will dynamically clone the original YouTuber's exact voice print using XTTSv2 or CosyVoice.
  • Audio Separation (Background Music Preservation): Integrating Facebook's Demucs AI to strip ONLY the vocals from the video. Background music, explosions, and sound effects will be preserved and mixed beneath the translated AI dub.
  • Speaker Diarization: Integrating Pyannote.audio to detect multiple speakers in interviews or podcasts (Speaker A vs. Speaker B) and assigning them distinct AI voices automatically.
  • Semantic Chunking: Advanced VAD logic that chunks audio based on breathing pauses and sentence completion rather than strict 3-second blocks, so that each chunk contains a complete sentence and translates more cleanly.

License

This project is licensed under the MIT License. Feel free to use, modify, and distribute it.
