Environment
- Model: gemini-live-2.5-flash-native-audio
- Endpoint: Vertex AI (vertexai=True, project-based auth via ADC)
- SDK version: google-genai v1.70.0
- Platform: Cloud Run (Python 3.11)
Configuration
from google import genai
from google.genai import types

# Client targets Vertex AI; project/location come from ADC and env (elided here)
client = genai.Client(vertexai=True)

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    input_audio_transcription=types.AudioTranscriptionConfig(),
    output_audio_transcription=types.AudioTranscriptionConfig(),
    # ... speech_config, tools, etc.
)

async with client.aio.live.connect(
    model="gemini-live-2.5-flash-native-audio", config=config
) as session:
    async for response in session.receive():
        sc = response.server_content
        # sc.output_transcription - works, text arrives reliably
        # sc.input_transcription - NEVER populated, always None
Problem
input_audio_transcription=AudioTranscriptionConfig() is accepted in LiveConnectConfig without any error or warning, but the Vertex AI backend never emits input_transcription events in LiveServerContent.
- output_transcription (model speech → text): works correctly
- input_transcription (user speech → text): never arrives - zero events across hundreds of sessions over multiple days
- No error, no warning, no rejection of the config - it is silently swallowed
The SDK has the field (LiveServerContent.input_transcription: Optional[Transcription]), the config type exists (AudioTranscriptionConfig), the documentation describes it - but on Vertex AI the feature simply does not function.
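For reference, the minimal diagnostic loop we used to confirm the asymmetry - a sketch, assuming the client and config from the Configuration section above; the counter variables are ours, not SDK API:

input_events = 0   # increments on sc.input_transcription - stays 0 on Vertex AI
output_events = 0  # increments on sc.output_transcription - grows normally

async with client.aio.live.connect(
    model="gemini-live-2.5-flash-native-audio", config=config
) as session:
    # ... stream microphone audio to the session here ...
    async for response in session.receive():
        sc = response.server_content
        if sc is None:
            continue
        if sc.input_transcription and sc.input_transcription.text:
            input_events += 1   # never reached
        if sc.output_transcription and sc.output_transcription.text:
            output_events += 1  # fires reliably

Across hundreds of sessions, input_events ended every run at 0.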
Impact
We run a production full-duplex voice AI assistant on Vertex AI. We architected our turn management system around input_transcription events:
- User transcription display - users could not see their own speech in the UI
- Turn semaphore - our dead-air detection relies on input_transcription to call record_user_speech(). Since it never fires, the system cannot distinguish "user is silent" from "user is speaking" (see the sketch after this list)
- Cascade failure - broken turn detection triggered aggressive reconnection loops, wasting compute and degrading UX
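For context, a simplified sketch of the turn semaphore - record_user_speech() is the real hook from our code; everything else here is illustrative:

import time

class TurnSemaphore:
    DEAD_AIR_THRESHOLD_S = 5.0  # illustrative threshold

    def __init__(self):
        self.last_user_speech = None  # monotonic timestamp of last user speech

    def record_user_speech(self):
        # Intended to be called whenever an input_transcription event arrives.
        # On Vertex AI the event never fires, so this is never invoked.
        self.last_user_speech = time.monotonic()

    def is_dead_air(self):
        # Without input_transcription, last_user_speech stays None, and
        # "user is speaking" is indistinguishable from "user is silent".
        if self.last_user_speech is None:
            return True
        return time.monotonic() - self.last_user_speech > self.DEAD_AIR_THRESHOLD_S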
We spent 8+ hours debugging what we initially thought was Gemini session instability, deploying four production hotfixes, before tracing it to this single missing feature. Our workaround was a one-line fallback to the browser-side SpeechRecognition API.
Expected behavior
If input_audio_transcription is set in config:
- server_content.input_transcription should contain user speech text (same as output_transcription does for model speech)
If the feature is not supported on Vertex AI:
- The config should be rejected with a clear error, not silently accepted
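Even a client-side guard would be an improvement. A rough sketch of the kind of check we mean - the function name and warning text are illustrative, not a proposal for a specific SDK API:

import warnings
from google.genai import types

def warn_on_unsupported_transcription(config: types.LiveConnectConfig, vertexai: bool):
    # Illustrative: surface the limitation instead of silently dropping it.
    if vertexai and config.input_audio_transcription is not None:
        warnings.warn(
            "input_audio_transcription is not honored on Vertex AI for "
            "gemini-live-2.5-flash-native-audio; no input_transcription "
            "events will be emitted.",
            UserWarning,
        )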
Related issues
This has been reported multiple times with no clear resolution on Vertex AI:
- Live API: Transcription Fails with "Invalid Argument" (1007) or No Messages in Node.js SDK (@google/genai@^1.34.0) - js-genai#1212: no transcription messages in the Node.js SDK, and the finished flag never updates
Ask
- Is input_audio_transcription actually supported on Vertex AI for gemini-live-2.5-flash-native-audio?
- If not, please reject the config or document the limitation clearly
- If it is supposed to work, what's the timeline for a fix?
Silently accepting a config option and then ignoring it is the worst possible developer experience - especially for paying enterprise customers building production infrastructure on your platform.