Skip to content

RTF22/VOCIX

Repository files navigation

🌐 English · Deutsch

VOCIX Landing Page

VOCIX — Voice Capture & Intelligent eXpression

Release Downloads License Platform

Local voice dictation app for Windows 11 with a global hotkey. Capture speech, transcribe it, transform it intelligently, and insert it system-wide at the cursor position — in any application (browser, Word, Outlook, IDEs, etc.).

Features

  • Push-to-Talk via global hotkey (default: Pause)
  • Three modes:
    • A — Clean: Clean transcription; strips filler words (um, uh, like, ...) with light corrections
    • B — Business: Rewrites speech into professional business language (LLM-powered)
    • C — Rage: De-escalates aggressive language into polite phrasing (LLM-powered)
  • Multi-provider LLM for modes B and C — pick your backend in the settings dialog: Anthropic Claude, any OpenAI-compatible API (OpenAI, Groq, OpenRouter, LM Studio, llama.cpp-server, vLLM via base_url) or local Ollama models. Per-mode override (e.g. Business on cloud Claude, Rage on local Llama). Provider failures fall back to Clean mode and surface an orange toast — no more silent degradation.
  • Settings dialog in the tray menu — four tabs (Basics / Advanced / Expert / AI Provider) with Test buttons, hotkey capture and per-mode validation
  • System tray with a colour-coded microphone icon and mode switching
  • Status overlay with a live VU meter while recording — instant visual feedback that the mic is picking up signal
  • History of the last 20 dictations in the tray — click an entry to re-insert it (saves your text when the target window has changed)
  • Usage statistics — words per day/week/total, estimated typing time saved (200 keystrokes/min), distribution across modes
  • Snippet expansion — your own shortcuts (/sig, /adr, …) inside the dictation are replaced with full text before insertion; Whisper transcripts like "slash sig" are normalised automatically
  • Auto-update from the tray — new releases are detected in the background; one click downloads the Win-x64 ZIP, verifies the SHA256 and swaps the files automatically
  • Local processing — speech-to-text runs fully offline (faster-whisper)
  • Configurable Whisper model — switch between tiny / base / small (default) / medium / large-v3 / large-v3-turbo at runtime via the tray menu (larger = more accurate, slower)
  • Optional NVIDIA GPU acceleration — auto-detects CUDA when available, falls back to CPU; switchable in the tray menu (Auto / GPU / CPU). Source install only (the packaged ZIP stays CPU-only)
  • Multilingual UI (German / English) — switchable at runtime via the tray menu, also drives Claude prompts and the Whisper STT language
  • Optional offline translation to English — toggle in the tray menu: speak in any of ~50 Whisper-supported languages and VOCIX inserts native English text at the cursor, fully offline (no API key needed)
  • Configurable hotkeys via .env
  • RDP mode for Remote Desktop sessions
  • Log file with configurable log level
  • Portable .exe — no Python installation required

Requirements

  • Windows 10/11
  • Microphone
  • Optional for modes B and C: any one of
    • Anthropic API key, or
    • an OpenAI-compatible endpoint (OpenAI, Groq, OpenRouter, LM Studio, llama.cpp-server, vLLM …), or
    • a local Ollama install — no API key needed

Installation

Option A: winget

winget install RTF22.VOCIX

Option B: Scoop

scoop bucket add vocix https://github.com/RTF22/scoop-vocix
scoop install vocix

Option C: Portable .exe

  1. Download a release
  2. Extract the folder anywhere
  3. Optional: rename .env.example to .env and fill in your API key
  4. Launch VOCIX.exe

The Whisper model (~500 MB) is downloaded automatically into the models/ subfolder on first start.

Option D: From source

git clone https://github.com/RTF22/VOCIX.git
cd VOCIX
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
copy .env.example .env
python -m vocix.main

GPU acceleration (optional, NVIDIA only)

pip install -r requirements-gpu.txt

Pulls cuBLAS + cuDNN (~600 MB) and lets ctranslate2 use the GPU. After the install, pick Beschleunigung → GPU (CUDA) in the tray menu (or set VOCIX_WHISPER_ACCELERATION=gpu). The packaged Win-x64 ZIP does not include these libraries — GPU is opt-in for source installs only.

Build the .exe yourself

pip install pyinstaller
build_exe.bat

The result lives in dist\VOCIX\ — the whole folder is portable.

Configuration

The recommended way to configure VOCIX is the Settings dialog (tray icon → Settings). The AI Provider tab carries three slots — Anthropic, OpenAI-compatible and Ollama — each with its own Test button. Pick a default and optionally override per mode (Business / Rage).

For headless setups everything is also available via .env:

# --- LLM providers (modes B and C) -----------------------------------------
# Pick a default provider and optionally override per mode.
VOCIX_LLM_DEFAULT=anthropic            # anthropic | openai | ollama
VOCIX_LLM_BUSINESS=                    # leave empty to use the default
VOCIX_LLM_RAGE=

# Anthropic Claude
VOCIX_LLM_ANTHROPIC_API_KEY=sk-ant-your-key-here
VOCIX_LLM_ANTHROPIC_MODEL=claude-sonnet-4-6
VOCIX_LLM_ANTHROPIC_TIMEOUT=15

# OpenAI-compatible (OpenAI, Groq, OpenRouter, LM Studio, llama.cpp, vLLM …)
VOCIX_LLM_OPENAI_API_KEY=
VOCIX_LLM_OPENAI_BASE_URL=https://api.openai.com/v1
VOCIX_LLM_OPENAI_MODEL=gpt-4o-mini
VOCIX_LLM_OPENAI_TIMEOUT=15

# Ollama (local, no API key)
VOCIX_LLM_OLLAMA_BASE_URL=http://localhost:11434
VOCIX_LLM_OLLAMA_MODEL=llama3.1
VOCIX_LLM_OLLAMA_TIMEOUT=30

# --- App ------------------------------------------------------------------
# Language — controls UI, LLM prompts and Whisper STT (de, en)
# The tray selection (stored in state.json) overrides this value.
VOCIX_LANGUAGE=en

# Whisper model — tiny | base | small (default) | medium | large-v3 | large-v3-turbo
# Tray selection (state.json) overrides this value.
VOCIX_WHISPER_MODEL=small

# Acceleration — auto (GPU if available) | gpu (force CUDA) | cpu (force CPU)
# GPU mode needs `pip install -r requirements-gpu.txt` and is source-install only.
VOCIX_WHISPER_ACCELERATION=auto

# Hotkeys — push-to-talk requires a single key, mode switchers may be combos
VOCIX_HOTKEY_RECORD=pause
VOCIX_HOTKEY_MODE_A=ctrl+shift+1
VOCIX_HOTKEY_MODE_B=ctrl+shift+2
VOCIX_HOTKEY_MODE_C=ctrl+shift+3

# Logging (DEBUG, INFO, WARNING, ERROR)
VOCIX_LOG_LEVEL=INFO
VOCIX_LOG_FILE=vocix.log

# RDP mode (longer clipboard delays)
VOCIX_RDP_MODE=true

Without any configured provider, modes B and C automatically fall back to mode A (Clean). Configurations from VOCIX 1.3.x (ANTHROPIC_API_KEY, ANTHROPIC_MODEL, ANTHROPIC_TIMEOUT) keep working unchanged — saving once in the new settings tab migrates them.

Env precedence: variables already present in the process environment are not overridden by the .env file (default behaviour of python-dotenv). To temporarily override a value, export it before launching the app.

Usage

Shortcut Action
Pause (hold) Push-to-talk — speak, release to process
Ctrl+Shift+1 Mode A: Clean transcription
Ctrl+Shift+2 Mode B: Business mode
Ctrl+Shift+3 Mode C: Rage mode

Workflow:

  1. Place the cursor in the target field (e-mail, chat, text editor, …)
  2. Hold Pause and speak
  3. Release — the text is transcribed, transformed and automatically inserted

Tray menu: right-click the tray icon → mode switch, Language / Sprache (English / Deutsch — switches UI, Claude prompts and Whisper STT), Whisper-Modell (tinylarge-v3-turbo at runtime), Beschleunigung (Auto / GPU / CPU — GPU greyed out when no CUDA detected), About (version + repo link), Quit

Note: tray-menu choices (mode, language, Whisper model, acceleration) are persisted in state.json and override the corresponding .env values on next launch.

Troubleshooting

Problem Solution
SmartScreen: "Windows protected your PC" on first launch Click More info → Run anyway. VOCIX is open source and the release ZIP is reproducible from main via build_exe.bat. Code signing is tracked in #12.
Tray icon not visible Check hidden icons in the taskbar (arrow pointing up)
"VOCIX requires a CPU with AVX support" on startup Your CPU is older than ~2012 and cannot run CTranslate2. VOCIX will not work on this machine.
Hotkey doesn't respond Run the app as administrator
Laptop without a Pause key Set VOCIX_HOTKEY_RECORD=scroll lock (or f7) in .env
"Microphone unavailable" Check microphone permissions in Windows settings
Modes B/C only return Clean results Open Settings → AI Provider, configure at least one slot and hit Test
Whisper download fails Check your internet connection; configure proxy/firewall if needed
Text contains wrong characters Make sure the target app supports Ctrl+V / paste
RDP: text is not inserted Set VOCIX_RDP_MODE=true in .env

Project structure

vocix/
├── main.py              # Entry point, orchestration
├── config.py            # Settings (.env, paths, hotkeys)
├── i18n.py              # Translation lookup
├── locales/             # JSON translation files (de.json, en.json)
├── audio/recorder.py    # Microphone capture (sounddevice)
├── stt/
│   ├── base.py          # Abstract STT interface
│   └── whisper_stt.py   # faster-whisper implementation
├── processing/
│   ├── base.py          # Abstract processor interface
│   ├── clean.py         # Mode A: filler-word cleanup (local)
│   ├── llm_backed.py    # Shared LLM-backed processor (used by B/C)
│   ├── business.py      # Mode B: business language
│   ├── rage.py          # Mode C: de-escalation
│   └── providers/       # Anthropic / OpenAI-compatible / Ollama backends
├── output/injector.py   # Clipboard-based text insertion
└── ui/
    ├── tray.py          # System tray with microphone icon
    ├── overlay.py       # Status overlay (tkinter)
    └── settings.py      # Settings dialog (Basics / Advanced / Expert / AI Provider)

License

MIT License — free to use, including commercially. No warranty.

VOCIX bundles third-party Python libraries in the portable distribution. See THIRD_PARTY_LICENSES.md for the required copyright and permission notices (MIT / BSD / HPND / LGPL-3.0).