---
name: deepgram-python-audio-intelligence
description: Use when writing or reviewing Python code in this repo that calls Deepgram audio analytics overlays on `/v1/listen` - summarize, topics, intents, sentiment, diarize, redact, detect_language, entity detection. Same endpoint as plain STT but with analytics params. Covers both REST (`client.listen.v1.media.transcribe_url`/`transcribe_file`) and the WSS-supported subset (`client.listen.v1.connect`). Use `deepgram-python-speech-to-text` for plain transcription, `deepgram-python-text-intelligence` for analytics on already-transcribed text. Triggers include "diarize", "summarize audio", "sentiment from audio", "redact PII", "topic detection audio", "audio intelligence", "detect language audio".
---

# Using Deepgram Audio Intelligence (Python SDK)

Analytics overlays applied to `/v1/listen` transcription: summarize, topics, intents, sentiment, language detection, diarization, redaction, entities. Same endpoint / same client methods as STT — enable features via params.

## When to use this product

- You have **audio** (file, URL, or live stream) and want analytics alongside the transcript.
- REST is the primary path — most analytics are REST-only.

**Use a different skill when:**
- You want a pure transcript with no analytics → `deepgram-python-speech-to-text`.
- Your input is already transcribed text → `deepgram-python-text-intelligence` (`/v1/read`).
- You need conversational turn-taking → `deepgram-python-conversational-stt`.
- You need a full interactive agent → `deepgram-python-voice-agent`.

## Feature availability: REST vs WSS

| Feature | REST | WSS |
|---|---|---|
| `diarize` | yes | yes |
| `redact` | yes | yes |
| `punctuate`, `smart_format` | yes | yes |
| Entity detection | yes | yes |
| `summarize` | yes | **no** |
| `topics` | yes | **no** |
| `intents` | yes | **no** |
| `sentiment` | yes | **no** |
| `detect_language` | yes | **no** |
| `custom_topic` / `custom_intent` | yes | **no** |

For the WSS-supported subset, the code path is the same as in `deepgram-python-speech-to-text`.

## Authentication

```python
from dotenv import load_dotenv
load_dotenv()

from deepgram import DeepgramClient
client = DeepgramClient()
```

Header: `Authorization: Token <api_key>`.

## Quick start — REST with full analytics

```python
response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    smart_format=True,
    punctuate=True,
    diarize=True,    # speaker separation
    summarize="v2",  # "v2" pins the current model; True also accepted on /v1/listen
    topics=True,
    intents=True,
    sentiment=True,
    detect_language=True,
    redact="pci",    # typed as a single mode string in the current SDK; see "Key parameters"
    language="en-US",
)

r = response.results
print("transcript:", r.channels[0].alternatives[0].transcript)
print("summary:", r.summary)
print("topics:", r.topics)
print("intents:", r.intents)
print("sentiments:", r.sentiments)
print("detected_language:", r.channels[0].detected_language)

# Speaker diarization: word-level speaker tags
for word in r.channels[0].alternatives[0].words or []:
    speaker = getattr(word, "speaker", None)
    if speaker is not None:
        print(f"Speaker {speaker}: {word.word}")
```
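Word-level speaker tags are easier to read when collapsed into contiguous turns. A minimal pure-Python sketch, assuming word objects with `word` and `speaker` attributes shaped like the response above (the `Word` dataclass here is a stand-in for the SDK's word type):

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Word:
    word: str
    speaker: int

def group_by_speaker(words: Iterable[Word]) -> list[tuple[int, str]]:
    """Collapse consecutive same-speaker words into (speaker, utterance) turns."""
    turns: list[tuple[int, str]] = []
    for w in words:
        if turns and turns[-1][0] == w.speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1] = (w.speaker, turns[-1][1] + " " + w.word)
        else:
            # Speaker changed: start a new turn.
            turns.append((w.speaker, w.word))
    return turns

words = [Word("hello", 0), Word("there", 0), Word("hi", 1)]
print(group_by_speaker(words))  # [(0, 'hello there'), (1, 'hi')]
```

Feed it `r.channels[0].alternatives[0].words` to print a turn-by-turn transcript instead of one line per word.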

## Quick start — REST file

```python
with open("call.wav", "rb") as f:
    audio = f.read()

response = client.listen.v1.media.transcribe_file(
    request=audio,
    model="nova-3",
    diarize=True,
    redact="pii",  # single mode string; see "Key parameters" for multi-mode notes
    summarize="v2",
    topics=True,
)
```

## Quick start — WSS subset (diarize / redact / entities only)

```python
import threading
from deepgram.core.events import EventType

# audio_chunks: assumed iterable of raw PCM byte chunks, defined elsewhere
with client.listen.v1.connect(model="nova-3", diarize=True, redact="pii") as conn:
    conn.on(EventType.MESSAGE, lambda m: print(m))
    threading.Thread(target=conn.start_listening, daemon=True).start()
    for chunk in audio_chunks:
        conn.send_media(chunk)
    conn.send_finalize()
```

## Key parameters

`summarize`, `topics`, `intents`, `sentiment`, `detect_language`, `diarize`, `redact`, `custom_topic`, `custom_topic_mode`, `custom_intent`, `custom_intent_mode`, `detect_entities`, plus all the standard STT params (`model`, `language`, `encoding`, `sample_rate`, ...).

`redact` is typed as `Optional[str]` in the current generated SDK (`src/deepgram/listen/v1/media/client.py`). Pass a single redaction mode such as `"pci"`, `"pii"`, `"numbers"`, or `"phi"`. Multi-mode redaction at the transport level is supported by sending `redact` as a repeated query parameter — check `src/deepgram/types/listen_v1redact.py` for the current type and fall back to raw query-param construction (or multiple calls) if you need several modes. The earlier `Union[str, Sequence[str]]` override is no longer carried in `.fernignore`.
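If you do need several redaction modes in one request while the generated signature is `Optional[str]`, the transport-level shape is just a repeated query parameter. A hedged sketch of building that query with the standard library — the endpoint and auth header come from this doc, but the raw-HTTP fallback itself is a workaround, not an SDK feature:

```python
from urllib.parse import urlencode

# A list of (key, value) tuples makes urlencode emit one key=value pair per
# tuple, producing the repeated `redact` parameter /v1/listen expects.
params = [("model", "nova-3"), ("redact", "pci"), ("redact", "pii")]
query = urlencode(params)
print(query)  # model=nova-3&redact=pci&redact=pii
url = f"https://api.deepgram.com/v1/listen?{query}"
# POST this URL yourself (e.g. with httpx or requests) using the
# `Authorization: Token <api_key>` header shown above.
```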

## API reference (layered)

1. **In-repo reference**: `reference.md` — "Listen V1 Media" (REST params include all analytics flags), "Listen V1 Connect" (WSS-supported subset).
2. **OpenAPI (REST)**: https://developers.deepgram.com/openapi.yaml
3. **AsyncAPI (WSS)**: https://developers.deepgram.com/asyncapi.yaml
4. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`.
5. **Product docs**:
- https://developers.deepgram.com/docs/stt-intelligence-feature-overview
- https://developers.deepgram.com/docs/summarization
- https://developers.deepgram.com/docs/topic-detection
- https://developers.deepgram.com/docs/intent-recognition
- https://developers.deepgram.com/docs/sentiment-analysis
- https://developers.deepgram.com/docs/language-detection
- https://developers.deepgram.com/docs/redaction
- https://developers.deepgram.com/docs/diarization

## Gotchas

1. **`summarize` on `/v1/listen` accepts a boolean OR the string `"v2"`.** Use `"v2"` to pin the current summarization model; `True` also works (maps to the default model). `/v1/read` is the reverse — it accepts boolean only. If you need summarization on already-transcribed text, see `deepgram-python-text-intelligence`.
2. **Sentiment / topics / intents / summarize / detect_language are REST-only.** Don't pass them on WSS — they'll be ignored or rejected.
3. **English-only** for sentiment / topics / intents / summarize.
4. **Not all models support all overlays.** Flux / Base models have restrictions. Stick to `nova-3` unless you have a reason.
5. **Redaction values** are `pci`, `pii`, `phi`, `numbers`, etc. — not arbitrary strings.
6. **`custom_topic` / `custom_intent` need a mode** (`"extended"` or `"strict"`).
7. **Diarization is noisy on short / low-quality audio.** Expect speaker churn on <30s clips.
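Gotcha 6 in practice — a sketch of a custom-topic request. The topic strings are hypothetical placeholders; `"extended"` detects your topics alongside Deepgram's own, while `"strict"` restricts output to your list:

```python
# Hypothetical domain topics; swap in your own vocabulary.
analytics_params = dict(
    model="nova-3",
    topics=True,
    custom_topic=["refunds", "cancellations"],
    custom_topic_mode="extended",  # required alongside custom_topic: "extended" or "strict"
)
# response = client.listen.v1.media.transcribe_url(
#     url="https://dpgr.am/spacewalk.wav", **analytics_params
# )
```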

## Example files in this repo

- `examples/15-transcription-advanced-options.py` — smart_format, punctuate, diarize
- `tests/wire/test_listen_v1_media.py` — wire test covering intelligence params

## Related skills

- `deepgram-python-speech-to-text` — same endpoint, plain transcription
- `deepgram-python-text-intelligence` — same analytics, text input
- `deepgram-python-conversational-stt` — Flux for turn-taking
- `deepgram-python-voice-agent` — interactive assistants

## Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
---
name: deepgram-python-conversational-stt
description: Use when writing or reviewing Python code in this repo that calls Deepgram Conversational STT v2 / Flux (`/v2/listen`) for turn-aware streaming transcription. Covers `client.listen.v2.connect(...)`, Flux models, end-of-turn detection. Use `deepgram-python-speech-to-text` for standard v1 ASR, `deepgram-python-voice-agent` for full-duplex interactive assistants. Triggers include "flux", "v2 listen", "conversational STT", "turn detection", "end of turn", "EOT", "listen.v2", "flux-general-en", "flux-general-multi".
---

# Using Deepgram Conversational STT / Flux (Python SDK)

Turn-aware streaming STT at `/v2/listen` — optimized for conversational audio (end-of-turn detection, eager EOT, barge-in scenarios).

## When to use this product

- You're building a **conversational UI** and need explicit turn boundaries.
- You want **Flux models** (optimized for human-to-human or human-to-agent conversation).
- You want lower latency turn signals than v1 utterance_end.

**Use a different skill when:**
- You want general-purpose transcription (captions, batch, non-conversational) → `deepgram-python-speech-to-text`.
- You want a full interactive agent (STT + LLM + TTS) → `deepgram-python-voice-agent`.
- You want analytics (summarize/sentiment) → `deepgram-python-audio-intelligence`.

## Authentication

```python
import os
from dotenv import load_dotenv
load_dotenv()

from deepgram import DeepgramClient
client = DeepgramClient(api_key=os.environ["DEEPGRAM_API_KEY"])
```

Header: `Authorization: Token <api_key>`. WSS only — no REST path on v2.

## Quick start

```python
import threading, time
from deepgram.core.events import EventType
from deepgram.listen.v2.types import (
    ListenV2CloseStream,
    ListenV2Connected,
    ListenV2FatalError,
    ListenV2TurnInfo,
)

with client.listen.v2.connect(
    model="flux-general-en",
    encoding="linear16",
    sample_rate="16000",
) as conn:

    def on_message(m):
        if isinstance(m, ListenV2TurnInfo):
            print(f"turn {m.turn_index} [{m.event}] {m.transcript}")
        elif isinstance(m, dict):  # untyped fallback
            if m.get("type") == "TurnInfo":
                print(f"turn {m.get('turn_index')} [{m.get('event')}] {m.get('transcript')}")
        else:
            print(f"event: {getattr(m, 'type', type(m).__name__)}")

    conn.on(EventType.OPEN, lambda _: print("open"))
    conn.on(EventType.MESSAGE, on_message)
    conn.on(EventType.CLOSE, lambda _: print("close"))
    conn.on(EventType.ERROR, lambda e: print(f"err: {type(e).__name__}: {e}"))

    def send_audio():
        for chunk in mic_chunks():  # mic_chunks(): assumed generator of ~80ms raw PCM chunks
            conn.send_media(chunk)
            time.sleep(0.01)
        conn.send_close_stream(ListenV2CloseStream(type="CloseStream"))

    threading.Thread(target=send_audio, daemon=True).start()
    conn.start_listening()
```

## Key parameters

| Param | Notes |
|---|---|
| `model` | `flux-general-en` (English) or `flux-general-multi` (multilingual) — REQUIRED, must be a Flux model |
| `encoding` | `linear16`, `mulaw`, etc. Omit for containerized audio |
| `sample_rate` | String in the SDK signature, e.g. `"16000"` |
| `eager_eot_threshold` | Fire end-of-turn early at this confidence |
| `eot_threshold` | Primary end-of-turn confidence |
| `eot_timeout_ms` | Time-based fallback turn end |
| `keyterm` | Bias for domain keywords |
| `mip_opt_out`, `tag` | Metadata / privacy flags |
| `language_hint` | **ONLY for `flux-general-multi`** |
| `authorization`, `request_options` | Override auth or request options |

**No `language` parameter** on v2 — language is implied by model (`flux-general-en`) or hinted via `language_hint` on multi.
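For the multilingual model, the hint goes in at connect time. A minimal sketch — the `"es"` value is an assumed short language code, and the parameter names follow the table above; check the language-prompting docs linked below for the accepted hint format:

```python
connect_params = dict(
    model="flux-general-multi",   # language_hint is only valid on this model
    encoding="linear16",
    sample_rate="16000",          # string, per the table above
    language_hint="es",           # assumed short code; verify against the docs
)
# with client.listen.v2.connect(**connect_params) as conn:
#     conn.start_listening()
```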

## Events (server → client)

- `ListenV2Connected` — connection established
- `ListenV2ConfigureSuccess` / `ListenV2ConfigureFailure` — mid-session config changes
- `ListenV2TurnInfo` — per-turn transcript + event (`Update`, `EndOfTurn`, `EagerEndOfTurn`, ...) + `turn_index`
- `ListenV2FatalError` — terminal error

Client messages: `ListenV2Media`, `ListenV2Configure`, `ListenV2CloseStream`.

## Async equivalent

```python
from deepgram import AsyncDeepgramClient
client = AsyncDeepgramClient()

async with client.listen.v2.connect(model="flux-general-en", ...) as conn:
    # same .on(...) handlers, then:
    await conn.start_listening()
```

## API reference (layered)

1. **In-repo reference**: `reference.md` — "Listen V2 Connect".
2. **AsyncAPI (WSS)**: https://developers.deepgram.com/asyncapi.yaml
3. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`.
4. **Product docs**:
- https://developers.deepgram.com/reference/speech-to-text/listen-flux
- https://developers.deepgram.com/docs/flux/quickstart
- https://developers.deepgram.com/docs/flux/language-prompting

## Gotchas

1. **`/v2/listen`, not `/v1/listen`.** Different route, different client path (`listen.v2` vs `listen.v1`).
2. **Flux models only.** `nova-3`, `base`, etc. will be rejected. Use `flux-general-en` or `flux-general-multi`.
3. **No `language` parameter.** Language is set by model choice. Use `language_hint` on `flux-general-multi`.
4. **`sample_rate` is a STRING** in the SDK (e.g. `"16000"`).
5. **Send ~80ms audio chunks** for best turn-detection latency.
6. **Close with `send_close_stream(ListenV2CloseStream(type="CloseStream"))`** — not `send_finalize` (that's v1).
7. **Messages may arrive as typed objects OR raw dicts** — the SDK uses a tagged union with `construct_type` for unknowns. Handle both branches (see `socket_client.py` patch in `.fernignore`).
8. **`socket_client.py` is patched / frozen** (see `.fernignore` → `src/deepgram/listen/v2/socket_client.py`). Don't overwrite that manual patch during regeneration; treat other `listen/v2` files as generated unless the regen workflow says otherwise.
9. **Omit `encoding`/`sample_rate` for containerized audio** (WAV, OGG, etc.) — the server detects them from the container.

## Example files in this repo

- `examples/14-transcription-live-websocket-v2.py`
- `tests/manual/listen/v2/connect/main.py`

## Related skills

- `deepgram-python-speech-to-text` — v1 general-purpose STT (REST + WSS)
- `deepgram-python-voice-agent` — full interactive assistant

## Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).