# chore: add agent skills for product usage and maintenance [no-ci] (#695)
Open: lukeocodes wants to merge 7 commits into `main` from `lo/add-agent-skills`.
Commits (7, all by lukeocodes):

- `ece8bb7` chore: add agent skills for product usage and maintenance
- `7380302` docs: cross-reference deepgram/skills for central product knowledge
- `ec3836e` docs: fix Python SDK path and type errors found by Copilot review
- `6c255c3` docs: correct summarize parameter - both bool and 'v2' are valid on /…
- `6a3723c` docs: correct summarize semantics - /v1/read is bool-only, /v1/listen…
- `288b1bc` chore: namespace skill names with deepgram-python- prefix
- `3e22465` docs: fix stale Python SDK claims (Copilot re-review)
`.agents/skills/deepgram-python-audio-intelligence/SKILL.md` (165 additions, 0 deletions)
---
name: deepgram-python-audio-intelligence
description: Use when writing or reviewing Python code in this repo that calls Deepgram audio analytics overlays on `/v1/listen` - summarize, topics, intents, sentiment, diarize, redact, detect_language, entity detection. Same endpoint as plain STT but with analytics params. Covers both REST (`client.listen.v1.media.transcribe_url`/`transcribe_file`) and the WSS-supported subset (`client.listen.v1.connect`). Use `deepgram-python-speech-to-text` for plain transcription, `deepgram-python-text-intelligence` for analytics on already-transcribed text. Triggers include "diarize", "summarize audio", "sentiment from audio", "redact PII", "topic detection audio", "audio intelligence", "detect language audio".
---

# Using Deepgram Audio Intelligence (Python SDK)

Analytics overlays applied to `/v1/listen` transcription: summarize, topics, intents, sentiment, language detection, diarization, redaction, entities. Same endpoint / same client methods as STT — enable features via params.

## When to use this product

- You have **audio** (file, URL, or live stream) and want analytics alongside the transcript.
- REST is the primary path — most analytics are REST-only.

**Use a different skill when:**
- You want a pure transcript with no analytics → `deepgram-python-speech-to-text`.
- Your input is already transcribed text → `deepgram-python-text-intelligence` (`/v1/read`).
- You need conversational turn-taking → `deepgram-python-conversational-stt`.
- You need a full interactive agent → `deepgram-python-voice-agent`.

## Feature availability: REST vs WSS

| Feature | REST | WSS |
|---|---|---|
| `diarize` | yes | yes |
| `redact` | yes | yes |
| `punctuate`, `smart_format` | yes | yes |
| Entity detection | yes | yes |
| `summarize` | yes | **no** |
| `topics` | yes | **no** |
| `intents` | yes | **no** |
| `sentiment` | yes | **no** |
| `detect_language` | yes | **no** |
| `custom_topic` / `custom_intent` | yes | **no** |

For the WSS-only subset, use the same code path as `deepgram-python-speech-to-text`.
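The table above can be enforced mechanically before opening a WSS connection. A minimal sketch under assumptions: the helper name and the params dict are illustrative, not SDK API.

```python
# REST-only analytics params from the REST-vs-WSS table; passing these on WSS
# would be ignored or rejected, so strip them before connecting.
REST_ONLY = {
    "summarize", "topics", "intents", "sentiment",
    "detect_language", "custom_topic", "custom_intent",
}

def wss_safe_params(params: dict) -> dict:
    """Return only the params that /v1/listen supports over WSS."""
    return {k: v for k, v in params.items() if k not in REST_ONLY}

print(wss_safe_params({"model": "nova-3", "diarize": True, "summarize": "v2"}))
# {'model': 'nova-3', 'diarize': True}
```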
## Authentication

```python
from dotenv import load_dotenv
load_dotenv()

from deepgram import DeepgramClient
client = DeepgramClient()
```

Header: `Authorization: Token <api_key>`.
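If you need to hit the endpoint without the SDK, the same header can be set by hand. A minimal stdlib sketch that builds (but does not send) such a request; the URL and query params here are illustrative:

```python
import os
import urllib.request

# Same auth scheme the SDK uses: "Token <api_key>".
api_key = os.environ.get("DEEPGRAM_API_KEY", "<api_key>")
req = urllib.request.Request(
    "https://api.deepgram.com/v1/listen?summarize=v2&diarize=true",
    data=b"",  # raw audio bytes would go here
    headers={"Authorization": f"Token {api_key}"},
    method="POST",
)
print(req.get_header("Authorization"))
```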
## Quick start — REST with full analytics

```python
response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    smart_format=True,
    punctuate=True,
    diarize=True,           # speaker separation
    summarize="v2",         # "v2" for the current model; True also accepted on /v1/listen
    topics=True,
    intents=True,
    sentiment=True,
    detect_language=True,
    redact=["pci", "pii"],  # or Sequence[str]
    language="en-US",
)

r = response.results
print("transcript:", r.channels[0].alternatives[0].transcript)
print("summary:", r.summary)
print("topics:", r.topics)
print("intents:", r.intents)
print("sentiments:", r.sentiments)
print("detected_language:", r.channels[0].detected_language)

# Speaker diarization
for word in r.channels[0].alternatives[0].words or []:
    speaker = getattr(word, "speaker", None)
    if speaker is not None:
        print(f"Speaker {speaker}: {word.word}")
```

## Quick start — REST file

```python
with open("call.wav", "rb") as f:
    audio = f.read()

response = client.listen.v1.media.transcribe_file(
    request=audio,
    model="nova-3",
    diarize=True,
    redact=["pii"],
    summarize="v2",
    topics=True,
)
```

## Quick start — WSS subset (diarize / redact / entities only)

```python
import threading
from deepgram.core.events import EventType

with client.listen.v1.connect(model="nova-3", diarize=True, redact=["pii"]) as conn:
    conn.on(EventType.MESSAGE, lambda m: print(m))
    threading.Thread(target=conn.start_listening, daemon=True).start()
    for chunk in audio_chunks:
        conn.send_media(chunk)
    conn.send_finalize()
```

## Key parameters

`summarize`, `topics`, `intents`, `sentiment`, `detect_language`, `diarize`, `redact`, `custom_topic`, `custom_topic_mode`, `custom_intent`, `custom_intent_mode`, `detect_entities`, plus all the standard STT params (`model`, `language`, `encoding`, `sample_rate`, ...).

`redact` is typed as `Optional[str]` in the current generated SDK (`src/deepgram/listen/v1/media/client.py`). Pass a single redaction mode such as `"pci"`, `"pii"`, `"numbers"`, or `"phi"`. Multi-mode redaction at the transport level is supported by sending `redact` as a repeated query parameter — check `src/deepgram/types/listen_v1redact.py` for the current type and fall back to raw query-param construction (or multiple calls) if you need several modes. The earlier `Union[str, Sequence[str]]` override is no longer carried in `.fernignore`.
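The raw query-param fallback mentioned above amounts to repeating the `redact` key in the query string. A stdlib sketch of the construction only (no request is sent; the param values are the documented redaction modes):

```python
from urllib.parse import urlencode

# doseq=True expands list values into repeated keys: redact=pci&redact=pii
params = {"model": "nova-3", "redact": ["pci", "pii"]}
query = urlencode(params, doseq=True)
print(query)  # model=nova-3&redact=pci&redact=pii
```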
## API reference (layered)

1. **In-repo reference**: `reference.md` — "Listen V1 Media" (REST params include all analytics flags), "Listen V1 Connect" (WSS-supported subset).
2. **OpenAPI (REST)**: https://developers.deepgram.com/openapi.yaml
3. **AsyncAPI (WSS)**: https://developers.deepgram.com/asyncapi.yaml
4. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`.
5. **Product docs**:
   - https://developers.deepgram.com/docs/stt-intelligence-feature-overview
   - https://developers.deepgram.com/docs/summarization
   - https://developers.deepgram.com/docs/topic-detection
   - https://developers.deepgram.com/docs/intent-recognition
   - https://developers.deepgram.com/docs/sentiment-analysis
   - https://developers.deepgram.com/docs/language-detection
   - https://developers.deepgram.com/docs/redaction
   - https://developers.deepgram.com/docs/diarization

## Gotchas

1. **`summarize` on `/v1/listen` accepts a boolean OR the string `"v2"`.** Use `"v2"` to pin the current summarization model; `True` also works (maps to the default model). `/v1/read` is the reverse — it accepts boolean only. If you need summarization on already-transcribed text, see `deepgram-python-text-intelligence`.
2. **Sentiment / topics / intents / summarize / detect_language are REST-only.** Don't pass them on WSS — they'll be ignored or rejected.
3. **English-only** for sentiment / topics / intents / summarize.
4. **Not all models support all overlays.** Flux / Base models have restrictions. Stick to `nova-3` unless you have a reason.
5. **Redaction values** are `pci`, `pii`, `phi`, `numbers`, etc. — not arbitrary strings.
6. **`custom_topic` / `custom_intent` need a mode** (`"extended"` or `"strict"`).
7. **Diarization is noisy on short / low-quality audio.** Expect speaker churn on <30s clips.
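Speaker churn is easier to inspect when per-word labels are collapsed into turns. A minimal sketch over word objects shaped like the REST example's (a `word` string and an optional `speaker` index; mocked here with `SimpleNamespace`):

```python
from types import SimpleNamespace

def group_turns(words):
    """Collapse consecutive same-speaker words into (speaker, text) turns."""
    turns = []
    for w in words:
        speaker = getattr(w, "speaker", None)
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + w.word)
        else:
            turns.append((speaker, w.word))
    return turns

words = [SimpleNamespace(word=t, speaker=s) for t, s in
         [("hello", 0), ("there", 0), ("hi", 1), ("back", 0)]]
print(group_turns(words))  # [(0, 'hello there'), (1, 'hi'), (0, 'back')]
```

Many short turns from the same clip usually means churn, not real turn-taking.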
## Example files in this repo

- `examples/15-transcription-advanced-options.py` — smart_format, punctuate, diarize
- `tests/wire/test_listen_v1_media.py` — wire test covering intelligence params

## Related skills

- `deepgram-python-speech-to-text` — same endpoint, plain transcription
- `deepgram-python-text-intelligence` — same analytics, text input
- `deepgram-python-conversational-stt` — Flux for turn-taking
- `deepgram-python-voice-agent` — interactive assistants

## Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
`.agents/skills/deepgram-python-conversational-stt/SKILL.md` (154 additions, 0 deletions)
---
name: deepgram-python-conversational-stt
description: Use when writing or reviewing Python code in this repo that calls Deepgram Conversational STT v2 / Flux (`/v2/listen`) for turn-aware streaming transcription. Covers `client.listen.v2.connect(...)`, Flux models, end-of-turn detection. Use `deepgram-python-speech-to-text` for standard v1 ASR, `deepgram-python-voice-agent` for full-duplex interactive assistants. Triggers include "flux", "v2 listen", "conversational STT", "turn detection", "end of turn", "EOT", "listen.v2", "flux-general-en", "flux-general-multi".
---

# Using Deepgram Conversational STT / Flux (Python SDK)

Turn-aware streaming STT at `/v2/listen` — optimized for conversational audio (end-of-turn detection, eager EOT, barge-in scenarios).

## When to use this product

- You're building a **conversational UI** and need explicit turn boundaries.
- You want **Flux models** (optimized for human-to-human or human-to-agent conversation).
- You want lower latency turn signals than v1 `utterance_end`.

**Use a different skill when:**
- You want general-purpose transcription (captions, batch, non-conversational) → `deepgram-python-speech-to-text`.
- You want a full interactive agent (STT + LLM + TTS) → `deepgram-python-voice-agent`.
- You want analytics (summarize/sentiment) → `deepgram-python-audio-intelligence`.

## Authentication

```python
import os
from dotenv import load_dotenv
load_dotenv()

from deepgram import DeepgramClient
client = DeepgramClient(api_key=os.environ["DEEPGRAM_API_KEY"])
```

Header: `Authorization: Token <api_key>`. WSS only — no REST path on v2.

## Quick start

```python
import threading, time
from deepgram.core.events import EventType
from deepgram.listen.v2.types import (
    ListenV2CloseStream,
    ListenV2Connected,
    ListenV2FatalError,
    ListenV2TurnInfo,
)

with client.listen.v2.connect(
    model="flux-general-en",
    encoding="linear16",
    sample_rate="16000",
) as conn:

    def on_message(m):
        if isinstance(m, ListenV2TurnInfo):
            print(f"turn {m.turn_index} [{m.event}] {m.transcript}")
        elif isinstance(m, dict):  # untyped fallback
            if m.get("type") == "TurnInfo":
                print(f"turn {m.get('turn_index')} [{m.get('event')}] {m.get('transcript')}")
        else:
            print(f"event: {getattr(m, 'type', type(m).__name__)}")

    conn.on(EventType.OPEN, lambda _: print("open"))
    conn.on(EventType.MESSAGE, on_message)
    conn.on(EventType.CLOSE, lambda _: print("close"))
    conn.on(EventType.ERROR, lambda e: print(f"err: {type(e).__name__}: {e}"))

    def send_audio():
        for chunk in mic_chunks():  # 80ms recommended
            conn.send_media(chunk)
            time.sleep(0.01)
        conn.send_close_stream(ListenV2CloseStream(type="CloseStream"))

    threading.Thread(target=send_audio, daemon=True).start()
    conn.start_listening()
```

## Key parameters

| Param | Notes |
|---|---|
| `model` | `flux-general-en` (English) or `flux-general-multi` (multilingual) — REQUIRED, must be a Flux model |
| `encoding` | `linear16`, `mulaw`, etc. Omit for containerized audio |
| `sample_rate` | String in the SDK signature, e.g. `"16000"` |
| `eager_eot_threshold` | Fire end-of-turn early at this confidence |
| `eot_threshold` | Primary end-of-turn confidence |
| `eot_timeout_ms` | Time-based fallback turn end |
| `keyterm` | Bias for domain keywords |
| `mip_opt_out`, `tag` | Metadata / privacy flags |
| `language_hint` | **ONLY for `flux-general-multi`** |
| `authorization`, `request_options` | Override auth or request options |

**No `language` parameter** on v2 — language is implied by model (`flux-general-en`) or hinted via `language_hint` on multi.

## Events (server → client)

- `ListenV2Connected` — connection established
- `ListenV2ConfigureSuccess` / `ListenV2ConfigureFailure` — mid-session config changes
- `ListenV2TurnInfo` — per-turn transcript + event (`Update`, `EndOfTurn`, `EagerEndOfTurn`, ...) + `turn_index`
- `ListenV2FatalError` — terminal error

Client messages: `ListenV2Media`, `ListenV2Configure`, `ListenV2CloseStream`.
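`EagerEndOfTurn` lets downstream work start before a turn is confirmed, and a later `EndOfTurn` supersedes it. A minimal state-tracking sketch over raw-dict `TurnInfo` messages (field names follow the untyped fallback in the quick start; the message sequence is illustrative):

```python
def handle_turn_info(msg, state):
    """Track draft, eager, and confirmed transcripts from TurnInfo events."""
    event = msg.get("event")
    if event == "Update":
        state["draft"] = msg.get("transcript", "")
    elif event == "EagerEndOfTurn":
        state["eager"] = msg.get("transcript", "")  # speculative; may be superseded
    elif event == "EndOfTurn":
        state["final"] = msg.get("transcript", "")
        state.pop("eager", None)  # confirmed turn replaces the eager guess
    return state

state = {}
for m in [
    {"type": "TurnInfo", "event": "Update", "transcript": "book a"},
    {"type": "TurnInfo", "event": "EagerEndOfTurn", "transcript": "book a table"},
    {"type": "TurnInfo", "event": "EndOfTurn", "transcript": "book a table for two"},
]:
    handle_turn_info(m, state)
print(state)  # {'draft': 'book a', 'final': 'book a table for two'}
```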
## Async equivalent

```python
from deepgram import AsyncDeepgramClient
client = AsyncDeepgramClient()

async with client.listen.v2.connect(model="flux-general-en", ...) as conn:
    # same .on(...) handlers, then:
    await conn.start_listening()
```

## API reference (layered)

1. **In-repo reference**: `reference.md` — "Listen V2 Connect".
2. **AsyncAPI (WSS)**: https://developers.deepgram.com/asyncapi.yaml
3. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`.
4. **Product docs**:
   - https://developers.deepgram.com/reference/speech-to-text/listen-flux
   - https://developers.deepgram.com/docs/flux/quickstart
   - https://developers.deepgram.com/docs/flux/language-prompting

## Gotchas

1. **`/v2/listen`, not `/v1/listen`.** Different route, different client path (`listen.v2` vs `listen.v1`).
2. **Flux models only.** `nova-3`, `base`, etc. will be rejected. Use `flux-general-en` or `flux-general-multi`.
3. **No `language` parameter.** Language is set by model choice. Use `language_hint` on `flux-general-multi`.
4. **`sample_rate` is a STRING** in the SDK (e.g. `"16000"`).
5. **Send ~80ms audio chunks** for best turn-detection latency.
6. **Close with `send_close_stream(ListenV2CloseStream(type="CloseStream"))`** — not `send_finalize` (that's v1).
7. **Messages may arrive as typed objects OR raw dicts** — the SDK uses a tagged union with `construct_type` for unknowns. Handle both branches (see `socket_client.py` patch in `.fernignore`).
8. **`socket_client.py` is patched / frozen** (see `.fernignore` → `src/deepgram/listen/v2/socket_client.py`). Don't overwrite that manual patch during regeneration; treat other `listen/v2` files as generated unless the regen workflow says otherwise.
9. **Omit `encoding`/`sample_rate` for containerized audio** (WAV, OGG, etc.) — the server detects them from the container.
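The ~80ms chunk size in gotcha 5 translates to a byte count that depends on the raw audio format. A quick sketch for `linear16` (16-bit PCM, so 2 bytes per sample) mono at 16 kHz:

```python
def chunk_size_bytes(sample_rate_hz: int, ms: int = 80,
                     bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes per chunk for raw PCM audio of the given duration."""
    return sample_rate_hz * bytes_per_sample * channels * ms // 1000

print(chunk_size_bytes(16000))  # 2560 bytes per 80ms chunk of linear16 mono at 16 kHz
```

Slice your audio into chunks of this size before each `conn.send_media(chunk)` call.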
## Example files in this repo

- `examples/14-transcription-live-websocket-v2.py`
- `tests/manual/listen/v2/connect/main.py`

## Related skills

- `deepgram-python-speech-to-text` — v1 general-purpose STT (REST + WSS)
- `deepgram-python-voice-agent` — full interactive assistant

## Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).