Summary
When using client.listen.v2.connect() with flux-general-multi, all incoming messages are deserialized as ListenV2Connected regardless of the actual type field. Semantic fields on TurnInfo messages (event, transcript, end_of_turn_confidence, audio_window_start/end, words, languages) are lost.
Environment
- deepgram-sdk: 6.1.1 (also reproduced on 5.3.3)
- Python: 3.12 (Rocky Linux 9)
- Endpoint: wss://api.deepgram.com/v2/listen
- Model: flux-general-multi
Expected
TurnInfo messages from the server should be deserialized as ListenV2TurnInfo with all fields accessible.
Actual
All messages arrive as ListenV2Connected. model_dump() returns only {type, request_id, sequence_id}. The type attribute correctly contains "TurnInfo", but the discriminator appears to be ignored during deserialization. No errors are raised.
Repro
import asyncio
import os

from deepgram import AsyncDeepgramClient
from deepgram.core.events import EventType


async def main():
    client = AsyncDeepgramClient(api_key=os.environ["DEEPGRAM_API_KEY"])
    async with client.listen.v2.connect(
        model="flux-general-multi",
        encoding="linear16",
        sample_rate="16000",
        eot_threshold="0.7",
    ) as conn:
        def on_msg(msg):
            # Print the deserialized class next to the fields we expect on TurnInfo
            print(f"class={type(msg).__name__} type={getattr(msg,'type','?')} "
                  f"event={getattr(msg,'event',None)} "
                  f"transcript={getattr(msg,'transcript',None)}")

        conn.on(EventType.MESSAGE, on_msg)
        task = asyncio.create_task(conn.start_listening())
        await asyncio.sleep(0.2)

        # Send ~8s of Italian speech as linear16 16kHz PCM
        with open("italian_sample.raw", "rb") as f:
            audio = f.read()
        chunk = 16000 * 2 * 80 // 1000  # 80 ms of 16-bit mono audio = 2560 bytes
        for i in range(0, len(audio), chunk):
            await conn.send_media(audio[i:i+chunk])
            await asyncio.sleep(0.08)

        # Trailing silence to trigger EndOfTurn
        for _ in range(25):
            await conn.send_media(b"\x00" * chunk)
            await asyncio.sleep(0.08)
        await asyncio.sleep(2)


asyncio.run(main())
Sample output
class=ListenV2Connected type=Connected event=None transcript=None
class=ListenV2Connected type=TurnInfo event=None transcript=None
class=ListenV2Connected type=TurnInfo event=None transcript=None
class=ListenV2Connected type=TurnInfo event=None transcript=None
...
Expected class for the TurnInfo rows: ListenV2TurnInfo, with populated event/transcript/end_of_turn_confidence.
Likely cause
The v2 response union ListenV2SocketClientResponse appears to be discriminated on type, but deserialization via construct_type(..., skip_validation=True) seems to silently pick ListenV2Connected (the first member?) instead of the variant matching type.
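To illustrate the suspected failure mode without relying on SDK internals, here is a toy sketch using only the standard library. It is not the SDK's code: the classes Connected and TurnInfo, the UNION_MEMBERS list, and both constructor functions are hypothetical stand-ins. It only shows how a "take the first union member, skip validation" strategy could produce the symptom described above, and how dispatching on type avoids it.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


# Hypothetical stand-ins for the SDK models; field names are taken from this report.
@dataclass
class Connected:
    type: str
    request_id: Optional[str] = None
    sequence_id: Optional[int] = None


@dataclass
class TurnInfo:
    type: str
    request_id: Optional[str] = None
    sequence_id: Optional[int] = None
    event: Optional[str] = None
    transcript: Optional[str] = None
    end_of_turn_confidence: Optional[float] = None


UNION_MEMBERS = [Connected, TurnInfo]
TYPE_TO_CLASS = {"Connected": Connected, "TurnInfo": TurnInfo}


def build(cls, payload: Dict[str, Any]):
    """Construct cls from payload without validating, dropping unknown keys."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in payload.items() if k in known})


def construct_first_member(payload: Dict[str, Any]):
    """Suspected behaviour: with no validation, the first member always 'fits'."""
    return build(UNION_MEMBERS[0], payload)


def construct_by_discriminator(payload: Dict[str, Any]):
    """Expected behaviour: choose the variant whose name matches payload['type']."""
    cls = TYPE_TO_CLASS.get(payload.get("type"))
    if cls is None:
        raise ValueError(f"unknown message type: {payload.get('type')!r}")
    return build(cls, payload)


payload = {
    "type": "TurnInfo",
    "request_id": "req-1",
    "sequence_id": 3,
    "event": "EndOfTurn",
    "transcript": "ciao",
}
wrong = construct_first_member(payload)
right = construct_by_discriminator(payload)
print(type(wrong).__name__, wrong.type, hasattr(wrong, "transcript"))
print(type(right).__name__, right.transcript)
```

In this toy version the first constructor yields a Connected object whose type is "TurnInfo" and which has no transcript attribute, which matches the sample output in this report. Whether the SDK actually follows this path is unconfirmed.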
Impact
flux-general-multi is unusable via the Python SDK — no transcripts can be extracted. Downgrading to 5.3.3 does not help (same behavior).
Workaround
Temporarily bypassing the SDK with a raw websockets client against /v2/listen and parsing JSON manually works. Happy to share a minimal working example if useful for regression testing.
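For reference, the manual parsing side of the workaround can be sketched as below. This covers only the JSON handling for frames already received from the socket; the connection and authentication code is omitted. The helper name parse_turn_info is hypothetical, and the payload shape is assumed from the TurnInfo fields listed in the Summary.

```python
import json
from typing import Any, Dict, Optional, Union


def parse_turn_info(raw: Union[str, bytes]) -> Optional[Dict[str, Any]]:
    """Return the TurnInfo fields from one text frame, or None for other messages.

    Assumes each frame is a JSON object with a top-level "type" key.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("type") != "TurnInfo":
        return None
    # Use .get() so a missing optional field does not raise.
    return {
        "event": msg.get("event"),
        "transcript": msg.get("transcript", ""),
        "end_of_turn_confidence": msg.get("end_of_turn_confidence"),
        "words": msg.get("words", []),
        "languages": msg.get("languages", []),
    }


# Example frames, shaped as assumed above.
frames = [
    '{"type": "Connected", "request_id": "req-1", "sequence_id": 0}',
    '{"type": "TurnInfo", "event": "EndOfTurn", "transcript": "buongiorno", '
    '"end_of_turn_confidence": 0.91}',
]
for frame in frames:
    info = parse_turn_info(frame)
    if info is not None:
        print(info["event"], info["transcript"], info["end_of_turn_confidence"])
```

In a receive loop, each incoming text frame would be passed through this helper and the None results (Connected and any other non-TurnInfo messages) skipped.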
Thank you
Thanks for your work on the SDK. flux-general-multi for Italian is a great product — once this is fixed it'll unblock a production voicebot deployment.