v2 Listen: TurnInfo messages deserialized as ListenV2Connected (fields lost) #693

@simonepalomba

Summary

When using client.listen.v2.connect() with flux-general-multi, all incoming messages are deserialized as ListenV2Connected regardless of the actual type field. Semantic fields on TurnInfo messages (event, transcript, end_of_turn_confidence, audio_window_start/end, words, languages) are lost.

Environment

  • deepgram-sdk: 6.1.1 (also reproduced on 5.3.3)
  • Python: 3.12 (Rocky Linux 9)
  • Endpoint: wss://api.deepgram.com/v2/listen
  • Model: flux-general-multi

Expected

TurnInfo messages from the server should be deserialized as ListenV2TurnInfo with all fields accessible.

Actual

All messages arrive as ListenV2Connected. model_dump() returns only {type, request_id, sequence_id}. The type attribute correctly contains "TurnInfo", but the discriminator seems to be ignored during deserialization. No errors are raised.

Repro

import asyncio, os
from deepgram import AsyncDeepgramClient
from deepgram.core.events import EventType

async def main():
    client = AsyncDeepgramClient(api_key=os.environ["DEEPGRAM_API_KEY"])
    async with client.listen.v2.connect(
        model="flux-general-multi",
        encoding="linear16",
        sample_rate="16000",
        eot_threshold="0.7",
    ) as conn:
        def on_msg(msg):
            print(f"class={type(msg).__name__} type={getattr(msg,'type','?')} "
                  f"event={getattr(msg,'event',None)} "
                  f"transcript={getattr(msg,'transcript',None)}")

        conn.on(EventType.MESSAGE, on_msg)
        task = asyncio.create_task(conn.start_listening())
        await asyncio.sleep(0.2)

        # Send ~8s of Italian speech as linear16 16kHz PCM
        with open("italian_sample.raw", "rb") as f:
            audio = f.read()
        # 80 ms of 16-bit mono audio at 16 kHz = 2560 bytes per chunk
        chunk = 16000 * 2 * 80 // 1000
        for i in range(0, len(audio), chunk):
            await conn.send_media(audio[i:i+chunk])
            await asyncio.sleep(0.08)
        # Trailing silence to trigger EndOfTurn
        for _ in range(25):
            await conn.send_media(b"\x00" * chunk)
            await asyncio.sleep(0.08)
        await asyncio.sleep(2)

asyncio.run(main())

Sample output

class=ListenV2Connected type=Connected event=None transcript=None
class=ListenV2Connected type=TurnInfo event=None transcript=None
class=ListenV2Connected type=TurnInfo event=None transcript=None
class=ListenV2Connected type=TurnInfo event=None transcript=None
...

Expected class for the TurnInfo rows: ListenV2TurnInfo, with populated event/transcript/end_of_turn_confidence.

Likely cause

The v2 response union ListenV2SocketClientResponse appears to be discriminated on type, but deserialization via construct_type(..., skip_validation=True) seems to silently pick ListenV2Connected (the first member?) instead of the variant matching type.
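To make the suspected failure mode concrete, here is a minimal stdlib-only sketch. It is not the SDK's code: the Connected and TurnInfo dataclasses, construct_first_member, and construct_by_discriminator are hypothetical stand-ins, and the field lists are taken only from what this report mentions. It shows how building the first union member without validation keeps type="TurnInfo" but silently drops the other fields, whereas a lookup on type selects the right variant.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


# Hypothetical stand-ins for the SDK models (assumption, not the real classes).
@dataclass
class Connected:
    type: str
    request_id: Optional[str] = None
    sequence_id: Optional[int] = None


@dataclass
class TurnInfo:
    type: str
    request_id: Optional[str] = None
    sequence_id: Optional[int] = None
    event: Optional[str] = None
    transcript: Optional[str] = None
    end_of_turn_confidence: Optional[float] = None


VARIANTS = [Connected, TurnInfo]
BY_TYPE = {"Connected": Connected, "TurnInfo": TurnInfo}


def build(cls, payload: Dict[str, Any]):
    # Keep only the keys the target class declares; everything else is dropped
    # without any error, which mirrors an unvalidated construction.
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in payload.items() if k in known})


def construct_first_member(payload: Dict[str, Any]):
    # Suspected behavior: the first union member always wins.
    return build(VARIANTS[0], payload)


def construct_by_discriminator(payload: Dict[str, Any]):
    # Expected behavior: pick the variant whose tag matches payload["type"].
    cls = BY_TYPE.get(payload.get("type"))
    if cls is None:
        raise ValueError(f"unknown message type: {payload.get('type')!r}")
    return build(cls, payload)


payload = {
    "type": "TurnInfo",
    "request_id": "req-1",
    "sequence_id": 3,
    "event": "EndOfTurn",
    "transcript": "ciao a tutti",
}
wrong = construct_first_member(payload)
right = construct_by_discriminator(payload)
print(type(wrong).__name__, wrong.type, getattr(wrong, "transcript", None))
print(type(right).__name__, right.type, right.transcript)
```

If the SDK behaves like construct_first_member, that would match the sample output above: the class is ListenV2Connected while type still reads "TurnInfo".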

Impact

flux-general-multi is unusable via the Python SDK — no transcripts can be extracted. Downgrading to 5.3.3 does not help (same behavior).

Workaround

Temporarily bypassing the SDK with a raw websockets client against /v2/listen and parsing JSON manually works. Happy to share a minimal working example if useful for regression testing.
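As a rough illustration of the manual-parsing half of this workaround (not the exact code I am running), the sketch below decodes one received frame and extracts the transcript. The websocket transport is left out; parse_frame and final_transcript are hypothetical helper names, and treating event == "EndOfTurn" as the end-of-turn marker is an assumption based on the fields listed in the summary.

```python
import json
from typing import Any, Dict, Optional, Union


def parse_frame(raw: Union[str, bytes]) -> Dict[str, Any]:
    # Decode one text frame received from /v2/listen into a plain dict.
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("expected a JSON object")
    return msg


def final_transcript(msg: Dict[str, Any]) -> Optional[str]:
    # Return the transcript only for TurnInfo messages that close a turn.
    # Assumption: the end-of-turn marker is event == "EndOfTurn".
    if msg.get("type") != "TurnInfo":
        return None
    if msg.get("event") != "EndOfTurn":
        return None
    return msg.get("transcript")


frame = '{"type": "TurnInfo", "event": "EndOfTurn", "transcript": "buongiorno"}'
print(final_transcript(parse_frame(frame)))
```

Working on plain dicts sidesteps the union deserialization entirely, so every field the server sends stays accessible.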

Thank you

Thanks for your work on the SDK. flux-general-multi for Italian is a great product — once this is fixed it'll unblock a production voicebot deployment.
