diff --git a/.agents/skills/deepgram-python-audio-intelligence/SKILL.md b/.agents/skills/deepgram-python-audio-intelligence/SKILL.md new file mode 100644 index 00000000..3dff437d --- /dev/null +++ b/.agents/skills/deepgram-python-audio-intelligence/SKILL.md @@ -0,0 +1,165 @@ +--- +name: deepgram-python-audio-intelligence +description: Use when writing or reviewing Python code in this repo that calls Deepgram audio analytics overlays on `/v1/listen` - summarize, topics, intents, sentiment, diarize, redact, detect_language, entity detection. Same endpoint as plain STT but with analytics params. Covers both REST (`client.listen.v1.media.transcribe_url`/`transcribe_file`) and the WSS-supported subset (`client.listen.v1.connect`). Use `deepgram-python-speech-to-text` for plain transcription, `deepgram-python-text-intelligence` for analytics on already-transcribed text. Triggers include "diarize", "summarize audio", "sentiment from audio", "redact PII", "topic detection audio", "audio intelligence", "detect language audio". +--- + +# Using Deepgram Audio Intelligence (Python SDK) + +Analytics overlays applied to `/v1/listen` transcription: summarize, topics, intents, sentiment, language detection, diarization, redaction, entities. Same endpoint / same client methods as STT — enable features via params. + +## When to use this product + +- You have **audio** (file, URL, or live stream) and want analytics alongside the transcript. +- REST is the primary path — most analytics are REST-only. + +**Use a different skill when:** +- You want a pure transcript with no analytics → `deepgram-python-speech-to-text`. +- Your input is already transcribed text → `deepgram-python-text-intelligence` (`/v1/read`). +- You need conversational turn-taking → `deepgram-python-conversational-stt`. +- You need a full interactive agent → `deepgram-python-voice-agent`. 
+ +## Feature availability: REST vs WSS + +| Feature | REST | WSS | +|---|---|---| +| `diarize` | yes | yes | +| `redact` | yes | yes | +| `punctuate`, `smart_format` | yes | yes | +| Entity detection | yes | yes | +| `summarize` | yes | **no** | +| `topics` | yes | **no** | +| `intents` | yes | **no** | +| `sentiment` | yes | **no** | +| `detect_language` | yes | **no** | +| `custom_topic` / `custom_intent` | yes | **no** | + +For the WSS-supported subset, use the same code path as `deepgram-python-speech-to-text`. + +## Authentication + +```python +from dotenv import load_dotenv +load_dotenv() + +from deepgram import DeepgramClient +client = DeepgramClient() +``` + +Header: `Authorization: Token <api_key>`. + +## Quick start — REST with full analytics + +```python +response = client.listen.v1.media.transcribe_url( + url="https://dpgr.am/spacewalk.wav", + model="nova-3", + smart_format=True, + punctuate=True, + diarize=True, # speaker separation + summarize="v2", # "v2" for the current model; True also accepted on /v1/listen + topics=True, + intents=True, + sentiment=True, + detect_language=True, + redact="pci", # single mode; see "Key parameters" for multi-mode options + language="en-US", +) + +r = response.results +print("transcript:", r.channels[0].alternatives[0].transcript) +print("summary:", r.summary) +print("topics:", r.topics) +print("intents:", r.intents) +print("sentiments:", r.sentiments) +print("detected_language:", r.channels[0].detected_language) + +# Speaker diarization +for word in r.channels[0].alternatives[0].words or []: + speaker = getattr(word, "speaker", None) + if speaker is not None: + print(f"Speaker {speaker}: {word.word}") +``` + +## Quick start — REST file + +```python +with open("call.wav", "rb") as f: + audio = f.read() + +response = client.listen.v1.media.transcribe_file( + request=audio, + model="nova-3", + diarize=True, + redact="pii", + summarize="v2", + topics=True, +) +``` + +## Quick start — WSS subset (diarize / redact / entities only) + +```python +import threading +from 
deepgram.core.events import EventType + +with client.listen.v1.connect(model="nova-3", diarize=True, redact="pii") as conn: + conn.on(EventType.MESSAGE, lambda m: print(m)) + threading.Thread(target=conn.start_listening, daemon=True).start() + for chunk in audio_chunks: + conn.send_media(chunk) + conn.send_finalize() +``` + +## Key parameters + +`summarize`, `topics`, `intents`, `sentiment`, `detect_language`, `diarize`, `redact`, `custom_topic`, `custom_topic_mode`, `custom_intent`, `custom_intent_mode`, `detect_entities`, plus all the standard STT params (`model`, `language`, `encoding`, `sample_rate`, ...). + +`redact` is typed as `Optional[str]` in the current generated SDK (`src/deepgram/listen/v1/media/client.py`). Pass a single redaction mode such as `"pci"`, `"pii"`, `"numbers"`, or `"phi"`. Multi-mode redaction at the transport level is supported by sending `redact` as a repeated query parameter — check `src/deepgram/types/listen_v1redact.py` for the current type and fall back to raw query-param construction (or multiple calls) if you need several modes. The earlier `Union[str, Sequence[str]]` override is no longer carried in `.fernignore`. + +## API reference (layered) + +1. **In-repo reference**: `reference.md` — "Listen V1 Media" (REST params include all analytics flags), "Listen V1 Connect" (WSS-supported subset). +2. **OpenAPI (REST)**: https://developers.deepgram.com/openapi.yaml +3. **AsyncAPI (WSS)**: https://developers.deepgram.com/asyncapi.yaml +4. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`. +5. 
**Product docs**: + - https://developers.deepgram.com/docs/stt-intelligence-feature-overview + - https://developers.deepgram.com/docs/summarization + - https://developers.deepgram.com/docs/topic-detection + - https://developers.deepgram.com/docs/intent-recognition + - https://developers.deepgram.com/docs/sentiment-analysis + - https://developers.deepgram.com/docs/language-detection + - https://developers.deepgram.com/docs/redaction + - https://developers.deepgram.com/docs/diarization + +## Gotchas + +1. **`summarize` on `/v1/listen` accepts a boolean OR the string `"v2"`.** Use `"v2"` to pin the current summarization model; `True` also works (maps to the default model). `/v1/read` is narrower — it accepts a boolean only. If you need summarization on already-transcribed text, see `deepgram-python-text-intelligence`. +2. **Sentiment / topics / intents / summarize / detect_language are REST-only.** Don't pass them on WSS — they'll be ignored or rejected. +3. **English-only** for sentiment / topics / intents / summarize. +4. **Not all models support all overlays.** Flux / Base models have restrictions. Stick to `nova-3` unless you have a reason. +5. **Redaction values** are `pci`, `pii`, `phi`, `numbers`, etc. — not arbitrary strings. +6. **`custom_topic` / `custom_intent` need a mode** (`"extended"` or `"strict"`). +7. **Diarization is noisy on short / low-quality audio.** Expect speaker churn on <30s clips. 
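The "Key parameters" note on `redact` points at raw query-param construction for multi-mode redaction. A minimal sketch of that fallback (it only builds the URL; the endpoint and `Token` auth header match the REST docs above, and any HTTP client can send the request):

```python
from urllib.parse import urlencode

# /v1/listen takes `redact` as a repeated query parameter, one entry per
# mode. urlencode over a list of tuples preserves the repeated key.
params = [
    ("model", "nova-3"),
    ("redact", "pci"),
    ("redact", "pii"),
]
listen_url = f"https://api.deepgram.com/v1/listen?{urlencode(params)}"
# -> https://api.deepgram.com/v1/listen?model=nova-3&redact=pci&redact=pii
```

POST the audio bytes (or a `{"url": ...}` JSON body) to `listen_url` with the usual `Authorization: Token <api_key>` header.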
+ +## Example files in this repo + +- `examples/15-transcription-advanced-options.py` — smart_format, punctuate, diarize +- `tests/wire/test_listen_v1_media.py` — wire test covering intelligence params + +## Related skills + +- `deepgram-python-speech-to-text` — same endpoint, plain transcription +- `deepgram-python-text-intelligence` — same analytics, text input +- `deepgram-python-conversational-stt` — Flux for turn-taking +- `deepgram-python-voice-agent` — interactive assistants + +## Central product skills + +For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills: + +```bash +npx skills add deepgram/skills +``` + +This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`). diff --git a/.agents/skills/deepgram-python-conversational-stt/SKILL.md b/.agents/skills/deepgram-python-conversational-stt/SKILL.md new file mode 100644 index 00000000..888f1259 --- /dev/null +++ b/.agents/skills/deepgram-python-conversational-stt/SKILL.md @@ -0,0 +1,154 @@ +--- +name: deepgram-python-conversational-stt +description: Use when writing or reviewing Python code in this repo that calls Deepgram Conversational STT v2 / Flux (`/v2/listen`) for turn-aware streaming transcription. Covers `client.listen.v2.connect(...)`, Flux models, end-of-turn detection. Use `deepgram-python-speech-to-text` for standard v1 ASR, `deepgram-python-voice-agent` for full-duplex interactive assistants. Triggers include "flux", "v2 listen", "conversational STT", "turn detection", "end of turn", "EOT", "listen.v2", "flux-general-en", "flux-general-multi". 
+--- + +# Using Deepgram Conversational STT / Flux (Python SDK) + +Turn-aware streaming STT at `/v2/listen` — optimized for conversational audio (end-of-turn detection, eager EOT, barge-in scenarios). + +## When to use this product + +- You're building a **conversational UI** and need explicit turn boundaries. +- You want **Flux models** (optimized for human-to-human or human-to-agent conversation). +- You want lower-latency turn signals than v1 utterance_end. + +**Use a different skill when:** +- You want general-purpose transcription (captions, batch, non-conversational) → `deepgram-python-speech-to-text`. +- You want a full interactive agent (STT + LLM + TTS) → `deepgram-python-voice-agent`. +- You want analytics (summarize/sentiment) → `deepgram-python-audio-intelligence`. + +## Authentication + +```python +import os +from dotenv import load_dotenv +load_dotenv() + +from deepgram import DeepgramClient +client = DeepgramClient(api_key=os.environ["DEEPGRAM_API_KEY"]) +``` + +Header: `Authorization: Token <api_key>`. WSS only — no REST path on v2. 
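The gotchas below recommend ~80 ms audio chunks. For linear16 mono at 16 kHz that works out to 16000 samples/s × 2 bytes × 0.08 s = 2560 bytes. A sketch of a chunker for raw PCM bytes (the helper name is illustrative, not an SDK API):

```python
def pcm_chunks(pcm: bytes, sample_rate: int = 16000,
               bytes_per_sample: int = 2, chunk_ms: int = 80):
    """Yield fixed-duration slices of raw PCM; the defaults produce
    2560-byte chunks (80 ms of 16 kHz linear16 mono)."""
    step = sample_rate * bytes_per_sample * chunk_ms // 1000
    for i in range(0, len(pcm), step):
        yield pcm[i:i + step]
```

Feed each yielded chunk to `conn.send_media(...)` as in the quick start below.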
+ +## Quick start + +```python +import threading, time +from pathlib import Path +from deepgram.core.events import EventType +from deepgram.listen.v2.types import ( + ListenV2CloseStream, + ListenV2Connected, + ListenV2FatalError, + ListenV2TurnInfo, +) + +with client.listen.v2.connect( + model="flux-general-en", + encoding="linear16", + sample_rate="16000", +) as conn: + + def on_message(m): + if isinstance(m, ListenV2TurnInfo): + print(f"turn {m.turn_index} [{m.event}] {m.transcript}") + elif isinstance(m, dict): # untyped fallback + if m.get("type") == "TurnInfo": + print(f"turn {m.get('turn_index')} [{m.get('event')}] {m.get('transcript')}") + else: + print(f"event: {getattr(m, 'type', type(m).__name__)}") + + conn.on(EventType.OPEN, lambda _: print("open")) + conn.on(EventType.MESSAGE, on_message) + conn.on(EventType.CLOSE, lambda _: print("close")) + conn.on(EventType.ERROR, lambda e: print(f"err: {type(e).__name__}: {e}")) + + def send_audio(): + for chunk in mic_chunks(): # 80ms recommended + conn.send_media(chunk) + time.sleep(0.01) + conn.send_close_stream(ListenV2CloseStream(type="CloseStream")) + + threading.Thread(target=send_audio, daemon=True).start() + conn.start_listening() +``` + +## Key parameters + +| Param | Notes | +|---|---| +| `model` | `flux-general-en` (English) or `flux-general-multi` (multilingual) — REQUIRED, must be a Flux model | +| `encoding` | `linear16`, `mulaw`, etc. Omit for containerized audio | +| `sample_rate` | String in the SDK signature, e.g. 
`"16000"` | +| `eager_eot_threshold` | Fire end-of-turn early at this confidence | +| `eot_threshold` | Primary end-of-turn confidence | +| `eot_timeout_ms` | Time-based fallback turn end | +| `keyterm` | Bias for domain keywords | +| `mip_opt_out`, `tag` | Metadata / privacy flags | +| `language_hint` | **ONLY for `flux-general-multi`** | +| `authorization`, `request_options` | Override auth or request options | + +**No `language` parameter** on v2 — language is implied by model (`flux-general-en`) or hinted via `language_hint` on multi. + +## Events (server → client) + +- `ListenV2Connected` — connection established +- `ListenV2ConfigureSuccess` / `ListenV2ConfigureFailure` — mid-session config changes +- `ListenV2TurnInfo` — per-turn transcript + event (`Update`, `EndOfTurn`, `EagerEndOfTurn`, ...) + `turn_index` +- `ListenV2FatalError` — terminal error + +Client messages: `ListenV2Media`, `ListenV2Configure`, `ListenV2CloseStream`. + +## Async equivalent + +```python +from deepgram import AsyncDeepgramClient +client = AsyncDeepgramClient() + +async with client.listen.v2.connect(model="flux-general-en", ...) as conn: + # same .on(...) handlers, then: + await conn.start_listening() +``` + +## API reference (layered) + +1. **In-repo reference**: `reference.md` — "Listen V2 Connect". +2. **AsyncAPI (WSS)**: https://developers.deepgram.com/asyncapi.yaml +3. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`. +4. **Product docs**: + - https://developers.deepgram.com/reference/speech-to-text/listen-flux + - https://developers.deepgram.com/docs/flux/quickstart + - https://developers.deepgram.com/docs/flux/language-prompting + +## Gotchas + +1. **`/v2/listen`, not `/v1/listen`.** Different route, different client path (`listen.v2` vs `listen.v1`). +2. **Flux models only.** `nova-3`, `base`, etc. will be rejected. Use `flux-general-en` or `flux-general-multi`. +3. **No `language` parameter.** Language is set by model choice. 
Use `language_hint` on `flux-general-multi`. +4. **`sample_rate` is a STRING** in the SDK (e.g. `"16000"`). +5. **Send ~80ms audio chunks** for best turn-detection latency. +6. **Close with `send_close_stream(ListenV2CloseStream(type="CloseStream"))`** — not `send_finalize` (that's v1). +7. **Messages may arrive as typed objects OR raw dicts** — the SDK uses a tagged union with `construct_type` for unknowns. Handle both branches (see `socket_client.py` patch in `.fernignore`). +8. **`socket_client.py` is patched / frozen** (see `.fernignore` → `src/deepgram/listen/v2/socket_client.py`). Don't overwrite that manual patch during regeneration; treat other `listen/v2` files as generated unless the regen workflow says otherwise. +9. **Omit `encoding`/`sample_rate` for containerized audio** (WAV, OGG, etc.) — the server detects them from the container. + +## Example files in this repo + +- `examples/14-transcription-live-websocket-v2.py` +- `tests/manual/listen/v2/connect/main.py` + +## Related skills + +- `deepgram-python-speech-to-text` — v1 general-purpose STT (REST + WSS) +- `deepgram-python-voice-agent` — full interactive assistant + +## Central product skills + +For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills: + +```bash +npx skills add deepgram/skills +``` + +This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`). 
diff --git a/.agents/skills/deepgram-python-maintaining-sdk/SKILL.md b/.agents/skills/deepgram-python-maintaining-sdk/SKILL.md new file mode 100644 index 00000000..21f14da2 --- /dev/null +++ b/.agents/skills/deepgram-python-maintaining-sdk/SKILL.md @@ -0,0 +1,84 @@ +--- +name: deepgram-python-maintaining-sdk +description: Use when regenerating this Python SDK with Fern, editing `.fernignore`, preparing the repo for a generator release, reconciling manually-patched files after regen, or deciding whether a file should be permanently frozen vs temporarily frozen. This SDK is Fern-generated - most files under `src/deepgram/` should NOT be edited directly. Triggers include "fern regen", "regenerate SDK", ".fernignore", "unfreeze", "re-apply patches", "SDK regeneration", "freeze classification", "generator release". +--- + +# Maintaining the Deepgram Python SDK + +This SDK is generated by [Fern](https://buildwithfern.com/). Most files under `src/deepgram/` are auto-generated and should not be edited directly. Some files have manual patches and are listed in `.fernignore` to prevent the generator from overwriting them. + +When a new Fern generator release is available, we prepare the repo so the generator can overwrite previously-frozen files, then re-apply manual patches after reviewing the diff. + +## Freeze classification rules + +Every entry in `.fernignore` falls into one of two categories. The **comment above each entry** in `.fernignore` indicates which category it belongs to, but when in doubt, apply these rules: + +### Never unfreeze (permanently frozen) + +These files are **entirely hand-written** — they have no Fern-generated counterpart. The generator would delete or replace them with something unrelated. They must stay in `.fernignore` at all times. + +How to identify: + +- The file was **created by us**, not by Fern (e.g. `src/deepgram/client.py`, custom tests, helpers, transport layer). 
+- The file is a **doc, config, or folder** we maintain independently (README, CHANGELOG, `.github`, `examples`, etc.). +- The file lives **outside `src/deepgram/`** in a hand-maintained location (e.g. `.claude/`, `docs/`). + +Current permanently frozen files: + +- `src/deepgram/client.py` — entirely custom (Bearer auth, session ID); no Fern equivalent +- `src/deepgram/helpers/` — hand-written TextBuilder helpers +- `src/deepgram/transport_interface.py`, `src/deepgram/transport.py`, `src/deepgram/transports/` — custom transport layer +- `tests/custom/test_text_builder.py`, `tests/custom/test_transport.py` — hand-written tests +- `tests/manual/` — manual standalone tests +- `README.md`, `CHANGELOG.md`, `CONTRIBUTING.md`, `reference.md` — docs +- `CLAUDE.md`, `AGENTS.md`, `.claude/`, `.agents/` — agent files (this skill lives under `.agents/`) +- `.github/`, `docs/`, `examples/` — folders + +### Unfreeze for regen (temporarily frozen) + +These files are **Fern-generated but carry manual patches** to fix issues in the generator output. We freeze them to protect our patches between regenerations, but unfreeze them before a regen so we can compare the new output against our patches. + +How to identify: + +- The file **exists in Fern's output** — if you removed it from `.fernignore` and ran the generator, Fern would produce a version of it. +- Our version is a **modified copy** of what Fern generates (e.g. changed `float` to `int`, added optional defaults, broadened a Union type). 
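The distinction above can be sketched as a tiny classifier (hypothetical helper, not part of the repo; `fern_output_paths` stands in for whatever file list a dry-run of the generator would produce):

```python
def classify_fernignore(entries, fern_output_paths):
    """Split .fernignore entries: paths Fern would regenerate are
    'temporarily frozen'; everything else is 'permanently frozen'."""
    permanent, temporary = [], []
    for raw in entries:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # skip comments and blank lines
        bucket = temporary if entry in fern_output_paths else permanent
        bucket.append(entry)
    return permanent, temporary
```

In practice the comments in `.fernignore` are the source of truth; this only encodes the "exists in Fern's output" rule.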
+ +Current temporarily frozen files (reflects the live `.fernignore` in this repo; treat `.fernignore` as the source of truth and re-check before starting a regen): + +- `src/deepgram/agent/v1/socket_client.py`, `src/deepgram/listen/v1/socket_client.py`, `src/deepgram/listen/v2/socket_client.py`, `src/deepgram/speak/v1/socket_client.py` — generator bugs in `construct_type` call convention and exception handling; broad `except Exception` catch for custom transports; optional control-message params on `send_keep_alive` / `send_close_stream` / etc.; agent client additionally carries `_sanitize_numeric_types` (float → int) for unknown WS message shapes. +- `src/deepgram/types/listen_v1response_results_utterances_item.py`, `...item_words_item.py`, and `...channels_item_alternatives_item_paragraphs_paragraphs_item.py` — manual `float` → `int` corrections for speaker / channel / num_words fields (waiting on internal-api-specs#205). + +Any other entries (previously frozen `listen_v1redact.py`, `listen/v1/client.py`, `listen/v2/client.py`, `tests/wire/test_listen_v1_media.py`, `wiremock/wiremock-mappings.json`) have been removed from `.fernignore` as Fern output improved. Read `.fernignore` directly before a regen rather than relying on this list staying perfectly current. + +## Prepare repo for regeneration + +1. **Create a new branch** off `main` named `lo/sdk-gen-`. +2. **Push the branch** and create a PR titled `chore: SDK regeneration ` (empty commit if needed). +3. **Read `.fernignore`** and classify each entry using the rules above. +4. **For each temporarily frozen file only:** + - Copy the file to `.bak` alongside the original. + - In `.fernignore`, **replace the original path with the `.bak` path**. This protects our patched version from the generator while allowing Fern to overwrite the original. +5. **Never touch permanently frozen entries.** Leave them in `.fernignore` as-is. +6. **Commit** as `chore: unfreeze files pending regen` and push. +7. 
The branch is now ready for the Fern generator to push changes. + +## After regeneration + +The `.bak` files are our manually-patched versions (protected by `.fernignore`). The original paths now contain the freshly generated versions. By comparing the two, we can see what the generator now produces vs what we had patched. + +1. **Diff each `.bak` file against the new generated version** to understand what changed and whether our patches are still needed. +2. **Re-apply any patches** that are still necessary to the newly generated files. +3. **In `.fernignore`, replace each `.bak` path back to the original path** for any files that still need manual patches. +4. **Remove `.fernignore` entries entirely** for any files where the generator now produces correct output (patches no longer needed). +5. **Delete all `.bak` files** once review is complete. +6. **Run tests and linting**: + ```bash + pytest + ruff check + mypy + ``` +7. **Commit** as `chore: re-apply manual patches after regen` and push. + +## Source-of-truth note + +This skill is the canonical maintainer workflow. `AGENTS.md` in the repo root contains the same content for agents that look there first; keep the two in sync. If the procedure changes, update both. diff --git a/.agents/skills/deepgram-python-management-api/SKILL.md b/.agents/skills/deepgram-python-management-api/SKILL.md new file mode 100644 index 00000000..31f42752 --- /dev/null +++ b/.agents/skills/deepgram-python-management-api/SKILL.md @@ -0,0 +1,181 @@ +--- +name: deepgram-python-management-api +description: Use when writing or reviewing Python code in this repo that calls Deepgram Management APIs - projects, API keys, members, invites, usage, billing, models, and reusable Voice Agent configurations. 
Covers `client.manage.v1.projects`, project-scoped resources under `client.manage.v1.projects.*` (keys, members, members.invites, usage, billing, models, requests), global `client.manage.v1.models`, think-model discovery at `client.agent.v1.settings.think.models`, and `client.voice_agent.configurations.*`. Use `deepgram-python-voice-agent` when you want to run an agent interactively, this skill to PERSIST/LIST agent configs. Triggers include "management API", "list projects", "API keys", "members", "usage stats", "billing", "list models", "agent configurations", "manage.v1". +--- + +# Using Deepgram Management API (Python SDK) + +Administrative REST endpoints at `api.deepgram.com/v1/projects`, `/v1/models`, and reusable agent configuration storage. Project-scoped resources live under `client.manage.v1.projects.*` (keys, members, members.invites, usage, billing, models, requests). Global models at `client.manage.v1.models`. Think-model discovery at `client.agent.v1.settings.think.models`. Reusable agent configs at `client.voice_agent.configurations.*`. + +## When to use this product + +- **Discover / pin models**: `client.manage.v1.models.list()` returns the active STT/TTS set. +- **Project admin**: list/get/update/delete/leave projects. +- **API key lifecycle**: list/create/delete project keys. +- **Member + invite management**: add/remove members, manage roles, send/revoke invites. +- **Usage + billing**: query request volume, balances. +- **Reusable Voice Agent configs**: persist the **`agent` block** of a Settings message on the server, reference by `agent_id`. The stored blob is the `agent` object only (listen / think / speak providers + prompt), not the full `AgentV1Settings`. + +**Use a different skill when:** +- You want to actually talk to an agent → `deepgram-python-voice-agent`. +- You want to transcribe or synthesize → STT/TTS skills. 
+ +## Authentication + +```python +from dotenv import load_dotenv +load_dotenv() + +from deepgram import DeepgramClient +client = DeepgramClient() +``` + +Header: `Authorization: Token <api_key>`. All methods are REST. + +## Quick start — projects + models + +```python +# Projects +projects = client.manage.v1.projects.list() +for p in projects.projects: + print(p.project_id, p.name) + +project = client.manage.v1.projects.get(project_id=projects.projects[0].project_id) +client.manage.v1.projects.update(project_id=project.project_id, name="New name") +# client.manage.v1.projects.delete(project_id=...) # irreversible +# client.manage.v1.projects.leave(project_id=...) + +# Models +models = client.manage.v1.models.list() +print("STT:", [m.canonical_name for m in models.stt]) +print("TTS:", [m.canonical_name for m in models.tts]) + +# Include deprecated/outdated models +older = client.manage.v1.models.list(include_outdated=True) + +# Per-project model access +project_models = client.manage.v1.projects.models.list(project_id=project.project_id) +``` + +## Quick start — keys / members / invites / usage / billing + +All project-scoped resources live under `client.manage.v1.projects.*`: + +```python +# Keys — `create` takes a single `request=` payload, not top-level kwargs +keys = client.manage.v1.projects.keys.list(project_id=pid) +client.manage.v1.projects.keys.create( + project_id=pid, + request={"comment": "CI key", "scopes": ["usage:write"]}, +) +client.manage.v1.projects.keys.delete(project_id=pid, key_id=kid) + +# Members + invites (invites are nested under members; method is `create`, not `send`) +members = client.manage.v1.projects.members.list(project_id=pid) +invites = client.manage.v1.projects.members.invites.list(project_id=pid) +client.manage.v1.projects.members.invites.create(project_id=pid, email="new@example.com", scope="member") + +# Usage (get, not list) + billing balances (nested) +usage = client.manage.v1.projects.usage.get(project_id=pid) +usage_breakdown = 
client.manage.v1.projects.usage.breakdown.list(project_id=pid) +balance = client.manage.v1.projects.billing.balances.get(project_id=pid) +``` + +See `examples/51-55` for each sub-module. + +## Quick start — Voice Agent configurations + +```python +# List reusable configs +configs = client.voice_agent.configurations.list(project_id=pid) + +# Create: `config` is a JSON string of the `agent` BLOCK ONLY — not the full +# Settings message. Do NOT include top-level Settings fields like `audio`; +# those are sent at connect-time in the live Settings message. The stored +# `agent_id` later replaces the inline `agent` object in a Settings message. +import json +config_json = json.dumps({ + "listen": {"provider": {"type": "deepgram", "model": "nova-3"}}, + "think": {"provider": {"type": "open_ai", "model": "gpt-4o-mini"}, "prompt": "..."}, + "speak": {"provider": {"type": "deepgram", "model": "aura-2-asteria-en"}}, +}) +created = client.voice_agent.configurations.create( + project_id=pid, + config=config_json, + metadata={"label": "support-en"}, +) +print(created.agent_id) + +# Update metadata (immutable config body — create a new one to change behavior) +client.voice_agent.configurations.update(project_id=pid, agent_id=created.agent_id, metadata={"label": "v2"}) + +# Get / delete +one = client.voice_agent.configurations.get(project_id=pid, agent_id=created.agent_id) +# client.voice_agent.configurations.delete(project_id=pid, agent_id=...) +``` + +Think-provider model discovery (which LLMs Agent supports): + +```python +think_models = client.agent.v1.settings.think.models.list() +``` + +## Async equivalent + +```python +from deepgram import AsyncDeepgramClient +client = AsyncDeepgramClient() +projects = await client.manage.v1.projects.list() +``` + +## API reference (layered) + +1. **In-repo reference**: `reference.md` — "Manage V1 Projects/Keys/Members/Invites/Usage/Billing/Models", "Voice Agent Configurations". +2. 
**OpenAPI (REST)**: https://developers.deepgram.com/openapi.yaml +3. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`. +4. **Product docs**: + - https://developers.deepgram.com/reference/manage/projects/list + - https://developers.deepgram.com/reference/manage/models/list + - https://developers.deepgram.com/reference/voice-agent/agent-configurations/list-agent-configurations + - https://developers.deepgram.com/reference/voice-agent/agent-configurations/create-agent-configuration + - https://developers.deepgram.com/reference/voice-agent/think-models + +## Gotchas + +1. **`Token` auth, not `Bearer`.** +2. **Project-scoped resources are nested under `.projects.*`.** There is no top-level `client.manage.v1.keys` / `.members` / `.invites` / `.usage` / `.billing`. Use `client.manage.v1.projects.keys`, `...projects.members`, `...projects.members.invites`, `...projects.usage`, `...projects.billing.balances`, and `...projects.requests` for request logs. The only top-level `client.manage.v1.*` namespaces are `projects` and `models`. +3. **Think-model discovery is on the Agent client**, not Manage: `client.agent.v1.settings.think.models.list()`. There is no `client.manage.v1.agent.*`. +4. **Agent config body is a JSON STRING on create**, not a nested object. Pass `config=json.dumps(...)`. +5. **Agent config is the `agent` block only**, not the full Settings message. Do not include top-level fields like `audio` — those go in the live Settings message at connect time. +6. **Agent configs are immutable** — you cannot edit the config body. Create a new one to change behavior. Only metadata is mutable. +7. **Use `include_outdated=True`** on `models.list()` when pinning older models. +8. **Delete is irreversible.** Wire tests typically comment out destructive calls. +9. **Project-scoped vs global models**: `client.manage.v1.models.list()` returns all; `client.manage.v1.projects.models.list(project_id=...)` returns what the project can access. +10. 
**Returned agent configs are uninterpolated** — raw stored JSON string. Parse before use. + +## Example files in this repo + +- `examples/50-management-projects.py` +- `examples/51-management-keys.py` +- `examples/52-management-members.py` +- `examples/53-management-invites.py` +- `examples/54-management-usage.py` +- `examples/55-management-billing.py` +- `examples/56-management-models.py` +- `tests/wire/test_manage_v1_projects.py` +- `tests/wire/test_manage_v1_models.py` +- `tests/wire/test_voiceAgent_configurations.py` + +## Related skills + +- `deepgram-python-voice-agent` — run an agent (use a config created here) + +## Central product skills + +For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills: + +```bash +npx skills add deepgram/skills +``` + +This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`). diff --git a/.agents/skills/deepgram-python-speech-to-text/SKILL.md b/.agents/skills/deepgram-python-speech-to-text/SKILL.md new file mode 100644 index 00000000..4332a9d3 --- /dev/null +++ b/.agents/skills/deepgram-python-speech-to-text/SKILL.md @@ -0,0 +1,147 @@ +--- +name: deepgram-python-speech-to-text +description: Use when writing or reviewing Python code in this repo that calls Deepgram Speech-to-Text v1 (`/v1/listen`) for prerecorded or live audio transcription. Covers `client.listen.v1.media.transcribe_url` / `transcribe_file` (REST) and `client.listen.v1.connect` (WebSocket). Use this skill for basic ASR; use `deepgram-python-audio-intelligence` for summarize/sentiment/topics/diarize overlays, `deepgram-python-conversational-stt` for turn-taking v2/Flux, and `deepgram-python-voice-agent` for full-duplex assistants. 
Triggers include "transcribe", "live transcription", "speech to text", "STT", "listen endpoint", "nova-3", "listen.v1". +--- + +# Using Deepgram Speech-to-Text (Python SDK) + +Basic transcription (ASR) for prerecorded audio (REST) or live audio (WebSocket) via `/v1/listen`. + +## When to use this product + +- **REST (`transcribe_url` / `transcribe_file`)** — one-shot transcription of a complete file or URL. Use for batch jobs, captioning pipelines, offline analysis. +- **WebSocket (`listen.v1.connect`)** — continuous streaming transcription. Use for live captions, real-time microphone input, phone audio. + +**Use a different skill when:** +- You want summaries, sentiment, topics, intents, diarization, or redaction on the audio → `deepgram-python-audio-intelligence` (same endpoint, different params). +- You need turn-taking / end-of-turn events → `deepgram-python-conversational-stt` (v2 / Flux). +- You need a full-duplex interactive assistant (STT + LLM + TTS + function calls) → `deepgram-python-voice-agent`. + +## Authentication + +```python +import os +from dotenv import load_dotenv +load_dotenv() + +from deepgram import DeepgramClient + +client = DeepgramClient() # reads DEEPGRAM_API_KEY from env +# or: DeepgramClient(api_key=os.environ["DEEPGRAM_API_KEY"]) +``` + +Header sent on every request: `Authorization: Token <api_key>` (NOT `Bearer`). + +## Quick start — REST (prerecorded URL) + +```python +response = client.listen.v1.media.transcribe_url( + url="https://dpgr.am/spacewalk.wav", + model="nova-3", + smart_format=True, + punctuate=True, +) +transcript = response.results.channels[0].alternatives[0].transcript +``` + +## Quick start — REST (prerecorded file) + +```python +with open("audio.wav", "rb") as f: + audio_bytes = f.read() + +response = client.listen.v1.media.transcribe_file( + request=audio_bytes, + model="nova-3", +) +``` + +`request=` accepts raw `bytes` or an iterator of `bytes` (stream large files chunk-by-chunk). Do NOT pass a file handle. 
+ +## Quick start — WebSocket (live streaming) + +```python +import threading +from deepgram.core.events import EventType +from deepgram.listen.v1.types import ( + ListenV1Results, ListenV1Metadata, + ListenV1SpeechStarted, ListenV1UtteranceEnd, +) + +with client.listen.v1.connect(model="nova-3", encoding="linear16", sample_rate=16000) as conn: + def on_message(m): + if isinstance(m, ListenV1Results) and m.channel and m.channel.alternatives: + print(m.channel.alternatives[0].transcript) + + conn.on(EventType.OPEN, lambda _: print("open")) + conn.on(EventType.MESSAGE, on_message) + conn.on(EventType.CLOSE, lambda _: print("close")) + conn.on(EventType.ERROR, lambda e: print(f"err: {e}")) + + # Start receive loop in background so we can send concurrently + threading.Thread(target=conn.start_listening, daemon=True).start() + + for chunk in audio_chunks: # raw PCM bytes at declared encoding/sample_rate + conn.send_media(chunk) + + conn.send_finalize() # flush final partial before closing +``` + +WSS message types live under `deepgram.listen.v1.types`. + +## Async equivalents + +```python +from deepgram import AsyncDeepgramClient +client = AsyncDeepgramClient() + +response = await client.listen.v1.media.transcribe_url(url=..., model="nova-3") + +async with client.listen.v1.connect(model="nova-3") as conn: + # same .on(...) handlers, then: + await conn.start_listening() +``` + +## Key parameters + +`model`, `language`, `encoding`, `sample_rate`, `channels`, `multichannel`, `punctuate`, `smart_format`, `diarize`, `endpointing`, `interim_results`, `utterance_end_ms`, `vad_events`, `keywords`, `search`, `redact`, `numerals`, `paragraphs`, `utterances`. + +## API reference (layered) + +1. **In-repo Fern-generated reference**: `reference.md` — sections "Listen V1 Media" (REST) and "Listen V1 Connect" (WSS). +2. **Canonical OpenAPI (REST)**: https://developers.deepgram.com/openapi.yaml +3. **Canonical AsyncAPI (WSS)**: https://developers.deepgram.com/asyncapi.yaml +4. 
**Context7** — natural-language queries over the full Deepgram docs corpus. Library ID: `/llmstxt/developers_deepgram_llms_txt`. +5. **Product docs**: + - https://developers.deepgram.com/reference/speech-to-text/listen-pre-recorded + - https://developers.deepgram.com/reference/speech-to-text/listen-streaming + +## Gotchas + +1. **Use the right auth scheme for the credential type.** API keys use `Authorization: Token <api_key>`. Temporary / access tokens (from `client.auth.v1.tokens.grant()` or an equivalent server) use `Authorization: Bearer <access_token>` — the custom `DeepgramClient` installs a Bearer override when you pass `access_token=...` (see `src/deepgram/client.py`). Sending `Bearer <api_key>` with a long-lived API key is what fails. +2. **Encoding must match the audio.** Declaring `encoding="linear16"` but sending Opus → garbage output or 400. +3. **Close streams cleanly.** Call `send_finalize()` before exiting the WSS context — otherwise the last partial is dropped. +4. **Keepalive on long WSS sessions.** If idle > ~10s, the server closes. Send `KeepAlive` messages or audio chunks. +5. **Intelligence features are REST-only.** `summarize`, `topics`, `intents`, `sentiment`, `detect_language` do NOT work over WSS — see `deepgram-python-audio-intelligence`. +6. **`transcribe_file(request=...)` takes bytes or an iterator**, not a file handle. +7. **`nova-3` is the current flagship STT model.** Check `client.manage.v1.models.list()` for the live set. +8. **Sync `connection.start_listening()` blocks.** Run it in a thread (sync) or as a task (async) so you can send audio concurrently.
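The keepalive gotcha can be handled with a small timer thread that fires during silent stretches. A minimal sketch, assuming your connection exposes some way to emit a KeepAlive message (the exact SDK method name is not confirmed here, so `conn.send_keep_alive` below is hypothetical):

```python
import threading

class KeepAlive:
    """Invoke `send` every `interval` seconds until stop() is called.

    `send` is whatever emits the KeepAlive for your connection,
    e.g. `lambda: conn.send_keep_alive()` (hypothetical name)."""

    def __init__(self, send, interval: float = 5.0):
        self._send = send
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() sets the flag
        while not self._stop.wait(self._interval):
            self._send()

    def start(self) -> "KeepAlive":
        self._thread.start()
        return self

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Start it right after the socket opens and stop it before closing; audio chunks themselves count as activity, so the timer only matters while no audio is being sent.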
+ +## Example files in this repo + +- `examples/10-transcription-prerecorded-url.py` +- `examples/11-transcription-prerecorded-file.py` +- `examples/12-transcription-prerecorded-callback.py` +- `examples/13-transcription-live-websocket.py` +- `tests/wire/test_listen_v1_media.py` — wire-level fixtures +- `tests/manual/listen/v1/connect/main.py` — live WSS connection test + +## Central product skills + +For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills: + +```bash +npx skills add deepgram/skills +``` + +This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`). diff --git a/.agents/skills/deepgram-python-text-intelligence/SKILL.md b/.agents/skills/deepgram-python-text-intelligence/SKILL.md new file mode 100644 index 00000000..012baf66 --- /dev/null +++ b/.agents/skills/deepgram-python-text-intelligence/SKILL.md @@ -0,0 +1,121 @@ +--- +name: deepgram-python-text-intelligence +description: Use when writing or reviewing Python code in this repo that calls Deepgram Text Intelligence / Read (`/v1/read`) for sentiment, summarization, topic detection, and intent recognition on text input. Covers `client.read.v1.text.analyze(...)` with body `text` or `url`. Use `deepgram-python-audio-intelligence` when the source is audio instead of text. Triggers include "read API", "text intelligence", "analyze text", "sentiment", "summarize text", "topics", "intents", "read.v1". +--- + +# Using Deepgram Text Intelligence (Python SDK) + +Analyze plain text (or a hosted text URL) for sentiment, summarization, topics, and intents via `/v1/read`. + +## When to use this product + +- You have **text already** (a transcript, document, chat log, email) and want analytics. 
+- You want a quick one-shot analysis — REST only, no streaming. + +**Use a different skill when:** +- The source is audio and you want analytics overlays → `deepgram-python-audio-intelligence` (same analytics, applied at transcription time). + +## Authentication + +```python +from dotenv import load_dotenv +load_dotenv() + +from deepgram import DeepgramClient +client = DeepgramClient() +``` + +Header: `Authorization: Token <api_key>`. + +## Quick start + +```python +response = client.read.v1.text.analyze( + request={"text": "Hello, world! This is a sample text for analysis."}, + language="en", + sentiment=True, + summarize=True, # /v1/read is boolean-only (see gotchas) + topics=True, + intents=True, +) + +if response.results.sentiments: + print("sentiment avg:", response.results.sentiments.average) +if response.results.summary: + print("summary:", response.results.summary.text) +if response.results.topics: + print("topics:", response.results.topics.segments) +if response.results.intents: + print("intents:", response.results.intents.segments) +``` + +Pass `request={"text": "..."}` for raw text OR `request={"url": "https://..."}` for a hosted plain-text document. + +## Async equivalent + +```python +from deepgram import AsyncDeepgramClient +client = AsyncDeepgramClient() +response = await client.read.v1.text.analyze(request={"text": "..."}, language="en", sentiment=True) +``` + +## Key parameters + +| Param | Type | Notes | +|---|---|---| +| `request` | `{"text": str}` or `{"url": str}` | One of these is required | +| `language` | `str` | Required for most analytics. English only today. | +| `sentiment` | `bool` | Per-segment + average sentiment | +| `summarize` | `bool` | `/v1/read` accepts **boolean only**. The SDK type alias `TextAnalyzeRequestSummarize = typing.Union[typing.Literal["v2"], typing.Any]` is shared with Listen and is broader than what Read actually supports — the `analyze` method docstring states: "For Read API, accepts boolean only." 
(Listen's `summarize="v2"` is a different product — see `deepgram-python-audio-intelligence`.) | +| `topics` | `bool` | Topic detection per segment | +| `intents` | `bool` | Intent recognition per segment | +| `custom_topic` / `custom_topic_mode` | `list[str]` / `str` | User-defined topics | +| `custom_intent` / `custom_intent_mode` | `list[str]` / `str` | User-defined intents | +| `callback`, `callback_method`, `tag` | | Async callback + metadata | + +## Response shape (abridged) + +``` +response.results.summary.text +response.results.sentiments.segments[] +response.results.sentiments.average +response.results.topics.segments[] +response.results.intents.segments[] +response.metadata +``` + +See `reference.md` → "Read V1 Text" for full shape. Request body model: `ReadV1RequestParams`. + +## API reference (layered) + +1. **In-repo reference**: `reference.md` — "Read V1 Text". +2. **OpenAPI (REST)**: https://developers.deepgram.com/openapi.yaml +3. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`. +4. **Product docs**: + - https://developers.deepgram.com/reference/text-intelligence/analyze-text + - https://developers.deepgram.com/docs/text-intelligence + - https://developers.deepgram.com/docs/text-sentiment-analysis + +## Gotchas + +1. **`Token` auth, not `Bearer`.** +2. **English-only** for sentiment / summarize / topics / intents today. +3. **`summarize` on `/v1/read` is boolean only.** Pass `True` or `False`. Do not pass `"v2"` on `/v1/read` — that's a Listen-only option (see `deepgram-python-audio-intelligence`). The SDK type `Union[Literal["v2"], Any]` is shared with Listen and wider than Read actually accepts; the `analyze` docstring clarifies: "For Read API, accepts boolean only." The generated wire test passing `summarize="v2"` against a mock server is a Fern artifact and does not indicate real `/v1/read` support. +4. **`language` is required** for the gated analytics features above. +5. **Body is JSON `request=`**, not query parameters. 
Don't confuse with `/v1/listen` which takes audio as the body. +6. **Custom topics/intents need a mode** (`custom_topic_mode="extended"`, `"strict"`) or they are ignored. + +## Example files in this repo + +- `examples/40-text-intelligence.py` +- `tests/wire/test_read_v1_text.py` + +## Central product skills + +For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills: + +```bash +npx skills add deepgram/skills +``` + +This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`). diff --git a/.agents/skills/deepgram-python-text-to-speech/SKILL.md b/.agents/skills/deepgram-python-text-to-speech/SKILL.md new file mode 100644 index 00000000..a8fed784 --- /dev/null +++ b/.agents/skills/deepgram-python-text-to-speech/SKILL.md @@ -0,0 +1,164 @@ +--- +name: deepgram-python-text-to-speech +description: Use when writing or reviewing Python code in this repo that calls Deepgram Text-to-Speech v1 (`/v1/speak`) for audio synthesis. Covers one-shot REST (`client.speak.v1.audio.generate`) and streaming WebSocket (`client.speak.v1.connect`). Also covers the in-repo `deepgram.helpers.TextBuilder` for incremental text assembly before synthesis. Use `deepgram-python-voice-agent` when you need full-duplex STT + LLM + TTS with barge-in. Triggers include "TTS", "speak", "synthesize voice", "aura", "text to speech", "speak.v1", "TextBuilder". +--- + +# Using Deepgram Text-to-Speech (Python SDK) + +Convert text to audio: one-shot REST download or low-latency streaming synthesis via `/v1/speak`. + +## When to use this product + +- **REST (`speak.v1.audio.generate`)** — one-shot synthesis, returns audio bytes. Use for rendered files, pre-generated prompts, anything where you have the full text upfront. 
+- **WebSocket (`speak.v1.connect`)** — incremental text input, streaming audio output. Use for low-latency playback while an LLM is still producing tokens. + +**Use a different skill when:** +- You need the agent to also listen and converse (full-duplex) → `deepgram-python-voice-agent`. + +## Authentication + +```python +from dotenv import load_dotenv +load_dotenv() + +from deepgram import DeepgramClient +client = DeepgramClient() # reads DEEPGRAM_API_KEY +``` + +Header: `Authorization: Token <api_key>` (NOT `Bearer`). + +## Quick start — REST (one-shot) + +```python +audio_iter = client.speak.v1.audio.generate( + text="Hello, this is a text to speech example.", + model="aura-2-asteria-en", + encoding="linear16", + sample_rate=24000, +) + +with open("output.raw", "wb") as f: + for chunk in audio_iter: + f.write(chunk) +``` + +Returns an iterator of `bytes` (streaming audio response). The response body is `audio/*`, NOT JSON. Useful response headers: `dg-model-name`, `dg-char-count`, `dg-request-id`. + +## Quick start — WebSocket (streaming) + +```python +from deepgram.core.events import EventType +from deepgram.speak.v1.types import SpeakV1Text + +with client.speak.v1.connect( + model="aura-2-asteria-en", + encoding="linear16", + sample_rate=24000, +) as conn: + def on_message(m): + if isinstance(m, bytes): + # audio chunk — write to file or audio output + ... + else: + print(f"event: {getattr(m, 'type', 'Unknown')}") + + conn.on(EventType.OPEN, lambda _: print("open")) + conn.on(EventType.MESSAGE, on_message) + conn.on(EventType.CLOSE, lambda _: print("close")) + conn.on(EventType.ERROR, lambda e: print(f"err: {e}")) + + conn.send_text(SpeakV1Text(text="Hello, this is streaming TTS.")) + conn.send_flush() + conn.send_close() + conn.start_listening() # blocks until server closes +``` + +In **sync** mode, `start_listening()` blocks — send all text + flush + close BEFORE calling it, OR run it in a thread. 
In **async** mode, run `start_listening()` as a task and send concurrently. + +## TextBuilder helper (incremental text assembly) + +`deepgram.helpers.TextBuilder` is a hand-maintained helper (NOT Fern-generated) that assembles text incrementally — useful when streaming LLM tokens into TTS. + +```python +from deepgram.helpers import TextBuilder + +final_text = ( + TextBuilder() + .text("Hello,") + .text(" this is built incrementally.") + .pronunciation("Deepgram", "ˈdiːpɡɹæm") + .pause(200) + .build() +) +``` + +The fluent API is `.text(...)` (append raw text), `.pronunciation(word, ipa)` (pin pronunciation), `.pause(duration_ms)` (insert a pause), and `.build()` (return the final SSML-ish string). There is no `.add(...)` method. + +See `examples/22-text-builder-demo.py`, `examples/23-text-builder-helper.py`, `examples/24-text-builder-streaming.py`. + +## Async equivalents + +```python +from deepgram import AsyncDeepgramClient +client = AsyncDeepgramClient() + +# REST +audio_iter = await client.speak.v1.audio.generate(text=..., model="aura-2-asteria-en") +async for chunk in audio_iter: + ... + +# WSS +async with client.speak.v1.connect(model="aura-2-asteria-en", ...) as conn: + listen_task = asyncio.create_task(conn.start_listening()) + await conn.send_text(SpeakV1Text(text="...")) + await conn.send_flush() + await conn.send_close() + await listen_task +``` + +## Key parameters + +REST & WSS: `model` (e.g. `aura-2-asteria-en`), `encoding` (`linear16`, `mulaw`, `alaw`, `opus`, `flac`, `mp3`, `aac`), `sample_rate`, `bit_rate`, `container`, `callback` (REST async), `tag`, `mip_opt_out`. + +WSS client messages: `SpeakV1Text`, `Flush`, `Clear`, `Close`. + +## API reference (layered) + +1. **In-repo reference**: `reference.md` — sections "Speak V1 Audio" (REST) and "Speak V1 Connect" (WSS). +2. **OpenAPI (REST)**: https://developers.deepgram.com/openapi.yaml +3. **AsyncAPI (WSS)**: https://developers.deepgram.com/asyncapi.yaml +4. 
**Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`. +5. **Product docs**: + - https://developers.deepgram.com/reference/text-to-speech/speak-request + - https://developers.deepgram.com/reference/text-to-speech/speak-streaming + - https://developers.deepgram.com/docs/tts-models + +## Gotchas + +1. **`Token` auth, not `Bearer`.** +2. **REST response is audio bytes, not JSON.** Iterate the response; don't `.json()` it. +3. **Flush before close (WSS).** `send_close()` without `send_flush()` may drop trailing audio. +4. **Sync `start_listening()` blocks.** Queue all messages first, or use async. +5. **`SpeakV1Text` is required** for WSS text input — don't send raw strings. +6. **`encoding`/`sample_rate`/`container` must match your playback path.** Mismatches cause silent failure or distortion. +7. **`TextBuilder` helpers are hand-maintained** (listed in `.fernignore` as permanently frozen). Don't move them under `src/deepgram/` auto-generated paths. + +## Example files in this repo + +- `examples/20-text-to-speech-single.py` — REST one-shot +- `examples/21-text-to-speech-streaming.py` — WSS streaming +- `examples/22-text-builder-demo.py` — TextBuilder (no API key) +- `examples/23-text-builder-helper.py` — TextBuilder + REST +- `examples/24-text-builder-streaming.py` — TextBuilder + WSS +- `tests/wire/test_speak_v1_audio.py` — REST wire test +- `tests/manual/speak/v1/connect/main.py` — live WSS test + +## Central product skills + +For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills: + +```bash +npx skills add deepgram/skills +``` + +This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`). 
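The encoding/playback gotcha above often surfaces when saving `linear16` output: it is headerless PCM, so most players will not open the raw file as-is. A sketch using the stdlib `wave` module to add a WAV container; the defaults mirror the quick-start values (`sample_rate=24000`, mono, 16-bit):

```python
import wave
from typing import Iterable

def pcm_to_wav(chunks: Iterable[bytes], path: str,
               sample_rate: int = 24000, channels: int = 1) -> None:
    """Wrap raw 16-bit linear PCM chunks in a WAV container."""
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)          # linear16 = 2 bytes per sample
        w.setframerate(sample_rate)
        for chunk in chunks:
            w.writeframes(chunk)

# e.g. feed it the REST audio iterator from the quick start:
# pcm_to_wav(audio_iter, "output.wav", sample_rate=24000)
```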
diff --git a/.agents/skills/deepgram-python-voice-agent/SKILL.md b/.agents/skills/deepgram-python-voice-agent/SKILL.md new file mode 100644 index 00000000..e121fa9d --- /dev/null +++ b/.agents/skills/deepgram-python-voice-agent/SKILL.md @@ -0,0 +1,162 @@ +--- +name: deepgram-python-voice-agent +description: Use when writing or reviewing Python code in this repo that builds an interactive voice agent via `agent.deepgram.com/v1/agent/converse`. Covers `client.agent.v1.connect()`, `AgentV1Settings`, `send_settings`, `send_media`, event handling, and function/tool calling. Full-duplex STT + LLM + TTS with barge-in. Use `deepgram-python-text-to-speech` for one-way synthesis, `deepgram-python-speech-to-text` / `deepgram-python-conversational-stt` for transcription only. Triggers include "voice agent", "agent converse", "full duplex", "interactive assistant", "barge-in", "agent.v1", "function calling", "AgentV1Settings". +--- + +# Using Deepgram Voice Agent (Python SDK) + +Full-duplex voice agent runtime: STT + LLM (think) + TTS + function calling over a single WebSocket at `agent.deepgram.com/v1/agent/converse`. + +## When to use this product + +- You want an **interactive voice assistant**: user speaks, agent thinks, agent speaks, interruptions allowed. +- You want **function / tool calling** triggered by the conversation. +- You want Deepgram to host the orchestration (vs wiring STT + LLM + TTS yourself). + +**Use a different skill when:** +- One-way transcription → `deepgram-python-speech-to-text` or `deepgram-python-conversational-stt`. +- One-way synthesis → `deepgram-python-text-to-speech`. +- Analytics on finished audio → `deepgram-python-audio-intelligence`. +- Managing reusable agent configs (persisted on the server) → `deepgram-python-management-api`. + +## Authentication + +```python +from dotenv import load_dotenv +load_dotenv() + +from deepgram import DeepgramClient +client = DeepgramClient() +``` + +Header: `Authorization: Token <api_key>`. 
Base URL: `wss://agent.deepgram.com/v1/agent/converse`. + +## Quick start + +```python +import threading, time +from deepgram.core.events import EventType +from deepgram.agent.v1.types import ( + AgentV1Settings, + AgentV1SettingsAgent, + AgentV1SettingsAgentListen, + AgentV1SettingsAgentListenProvider_V1, + AgentV1SettingsAudio, + AgentV1SettingsAudioInput, +) +from deepgram.types.speak_settings_v1 import SpeakSettingsV1 +from deepgram.types.speak_settings_v1provider import SpeakSettingsV1Provider_Deepgram +from deepgram.types.think_settings_v1 import ThinkSettingsV1 +from deepgram.types.think_settings_v1provider import ThinkSettingsV1Provider_OpenAi + +with client.agent.v1.connect() as agent: + settings = AgentV1Settings( + audio=AgentV1SettingsAudio( + input=AgentV1SettingsAudioInput(encoding="linear16", sample_rate=24000), + ), + agent=AgentV1SettingsAgent( + listen=AgentV1SettingsAgentListen( + provider=AgentV1SettingsAgentListenProvider_V1(type="deepgram", model="nova-3"), + ), + think=ThinkSettingsV1( + provider=ThinkSettingsV1Provider_OpenAi( + type="open_ai", model="gpt-4o-mini", temperature=0.7, + ), + prompt="You are a helpful assistant. 
Keep replies brief.", + ), + speak=SpeakSettingsV1( + provider=SpeakSettingsV1Provider_Deepgram(type="deepgram", model="aura-2-asteria-en"), + ), + ), + ) + + agent.send_settings(settings) # MUST be first message after connect + + def on_message(m): + if isinstance(m, bytes): + # agent speech audio — play or append to output buffer + return + t = getattr(m, "type", "Unknown") + if t == "ConversationText": + print(f"[{getattr(m, 'role', '?')}] {getattr(m, 'content', '')}") + elif t == "UserStartedSpeaking": print(">> user speaking") + elif t == "AgentThinking": print(">> agent thinking") + elif t == "AgentStartedSpeaking": print(">> agent speaking") + elif t == "AgentAudioDone": print(">> agent done") + elif t == "FunctionCallRequest": handle_tool_call(m) + + agent.on(EventType.OPEN, lambda _: print("open")) + agent.on(EventType.MESSAGE, on_message) + agent.on(EventType.CLOSE, lambda _: print("close")) + agent.on(EventType.ERROR, lambda e: print(f"err: {e}")) + + def send_audio(): + for chunk in mic_chunks(): + agent.send_media(chunk) + + threading.Thread(target=send_audio, daemon=True).start() + agent.start_listening() # blocks +``` + +## Event types (server → client) + +- `Welcome` — connection acknowledged +- `SettingsApplied` — your `Settings` accepted +- `ConversationText` — text of a turn (with `role`: `user` or `assistant`) +- `UserStartedSpeaking` — VAD detected user +- `AgentThinking` — LLM is working +- `FunctionCallRequest` — tool/function call initiated by the model +- `AgentStartedSpeaking` — TTS starting +- Binary frames — audio chunks +- `AgentAudioDone` — TTS finished for this turn +- `Warning`, `Error` + +## Client messages + +- Initial `Settings` (send first) +- `Media` (binary audio frames in declared encoding/sample_rate) +- `KeepAlive` (on long sessions) +- Prompt / think / speak update messages (change mid-session) +- User / assistant text injection +- Function call response (reply to `FunctionCallRequest`) + +## Reusable agent configurations + 
+You can persist the **`agent` block** of a Settings message server-side and reuse it by `agent_id`. `client.voice_agent.configurations.create` stores a JSON string representing the `agent` object only (listen / think / speak providers + prompt) — NOT the full `AgentV1Settings` payload. Do not send top-level Settings fields like `audio` to that API; those still go in the live Settings message at connect time. The returned `agent_id` replaces the inline `agent` object in future Settings messages. Managed via `client.voice_agent.configurations.*` — see `deepgram-python-management-api`. + +## API reference (layered) + +1. **In-repo reference**: `reference.md` — "Agent V1 Connect", "Voice Agent Configurations". +2. **AsyncAPI (WSS)**: https://developers.deepgram.com/asyncapi.yaml +3. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`. +4. **Product docs**: + - https://developers.deepgram.com/reference/voice-agent/voice-agent + - https://developers.deepgram.com/docs/voice-agent + - https://developers.deepgram.com/docs/configure-voice-agent + - https://developers.deepgram.com/docs/voice-agent-message-flow + +## Gotchas + +1. **Pick the right auth scheme for the credential type.** API keys use `Authorization: Token <api_key>`. Temporary / access tokens (created via `client.auth.v1.tokens.grant()` or an equivalent server) use `Authorization: Bearer <access_token>`. The custom `DeepgramClient` in this repo accepts an `access_token` parameter and installs a Bearer override for all HTTP + WebSocket calls — see `src/deepgram/client.py`. +2. **Base URL is `agent.deepgram.com`, not `api.deepgram.com`.** +3. **Send `Settings` IMMEDIATELY after connect** — no audio before settings are applied. +4. **Listen/speak encoding + sample_rate must match** both your input audio and your playback path. +5. **Keepalive on long idle sessions**, otherwise the server closes. +6. **Function call responses are synchronous to the turn** — reply promptly. +7. 
**Provider types are tagged unions** (`ThinkSettingsV1Provider_OpenAi`, `SpeakSettingsV1Provider_Deepgram`, ...). Pick the right union variant; don't pass raw dicts. +8. **`socket_client.py` is temporarily frozen** (see `.fernignore` → `src/deepgram/agent/v1/socket_client.py`) and currently carries `_sanitize_numeric_types` plus the `construct_type` / broad-catch fixes — needed for unknown WS message shapes. Expected to be unfrozen during a future Fern regen and re-compared. + +## Example files in this repo + +- `examples/30-voice-agent.py` +- `tests/manual/agent/v1/connect/main.py` — live connection test + +## Central product skills + +For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills: + +```bash +npx skills add deepgram/skills +``` + +This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`). diff --git a/.fernignore b/.fernignore index 95cce1f0..066def13 100644 --- a/.fernignore +++ b/.fernignore @@ -55,10 +55,12 @@ src/deepgram/transport_interface.py src/deepgram/transport.py src/deepgram/transports -# Claude Code agent files +# Agent files (Claude Code, OpenCode, other agent tools) +# .agents/skills/ holds agent-agnostic skills discoverable via `npx skills` CLAUDE.md AGENTS.md .claude +.agents # Folders to ignore .github