Skip to content

feat: add Ollama Cloud provider support#274

Open
saurondark22 wants to merge 2 commits into
theJayTea:mainfrom
saurondark22:ollama-improvements
Open

feat: add Ollama Cloud provider support#274
saurondark22 wants to merge 2 commits into
theJayTea:mainfrom
saurondark22:ollama-improvements

Conversation

@saurondark22
Copy link
Copy Markdown

This PR significantly upgrades the Ollama integration by adding a new cloud provider option, and disabled thinking feature for reasoning-capable models.

Changes:

Ollama Cloud Provider (new)

  • Added OllamaCloudProvider class API key entry just similar to gemini.
  • Provider-specific after_load() hook handles endpoint selection and auth configuration
  • pre-added "gemma4:31b", "gemma3:12b", "deepseek-v4-flash", "nemotron-3-nano:30b" and "gpt-oss:20b" models to select, also option for custom models.

Local Ollama Improvements

  • Exposed three user-configurable settings: num_ctx (context window), num_predict (max tokens), temperature
  • Disabled the think step on reasoning-capable models to eliminate wasted latency on internal chain-of-thought when not needed
  • Added special handling for gpt-oss since Ollama only accepts string think levels — maps "low" to approximate "off"

Files modified:

  • aiprovider.py — major expansion with cloud provider and local improvements
  • WritingToolApp.py — registration and wiring for cloud provider
  • ui/SettingsWindow.py — new UI controls for Ollama settings
  • ui/OnboardingWindow.py — cloud provider in onboarding flow
  • locales/en/LC_MESSAGES/messages.po — new translation strings

Why this matters:

  • Cloud support: Users can now use managed Ollama endpoints (e.g., Ollama Cloud), no need to install Ollama or manually manage models.
  • Performance control: Power users can tune context windows and token limits instead of fighting hidden defaults
  • Reasoning model UX: Disabled "thinking" for ollama local and cloud for better latency when users just want a quick rewrite/summary

Expand Ollama provider with three user-facing settings (num_ctx, num_predict, temperature) replacing internal defaults, preventing silent context truncation and unbounded latency. Disable `think` step on reasoning-capable models (qwen3, deepseek-r1, gpt-oss) as they waste latency on internal chain-of-thought. Handle GPT-OSS specially since Ollama only accepts string think levels — "low" approximates "off". Added safe-parse helpers for malformed TextSetting values to fall back to sensible defaults.
Add `OllamaCloudProvider` with Bearer API key auth and weekly-quota tier. Register as second default provider (after Gemini) in settings and onboarding. Unify response-generation for local and cloud Ollama via shared OpenAI-style contract, using provider-specific `after_load()` for endpoint/auth.
@saurondark22 saurondark22 marked this pull request as ready for review June 5, 2026 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant