
Vertex AI: input_audio_transcription silently ignored - zero input_transcription events emitted #2348

@david-labs-ca

Description

Environment

  • Model: gemini-live-2.5-flash-native-audio
  • Endpoint: Vertex AI (vertexai=True, project-based auth via ADC)
  • SDK version: google-genai v1.70.0
  • Platform: Cloud Run (Python 3.11)

Configuration

from google import genai
from google.genai import types

client = genai.Client(vertexai=True)  # project/location resolved via ADC/env

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    input_audio_transcription=types.AudioTranscriptionConfig(),
    output_audio_transcription=types.AudioTranscriptionConfig(),
    # ... speech_config, tools, etc.
)

async with client.aio.live.connect(
    model="gemini-live-2.5-flash-native-audio", config=config
) as session:
    async for response in session.receive():
        sc = response.server_content
        # sc.output_transcription - works, text arrives reliably
        # sc.input_transcription  - NEVER populated, always None

Problem

input_audio_transcription=AudioTranscriptionConfig() is accepted in LiveConnectConfig without any error or warning, but the Vertex AI backend never emits input_transcription events in LiveServerContent.

  • output_transcription (model speech → text): works correctly
  • input_transcription (user speech → text): never arrives - zero events across hundreds of sessions over multiple days
  • No error, no warning, no rejection of the config - it is silently swallowed

The SDK has the field (LiveServerContent.input_transcription: Optional[Transcription]), the config type exists (AudioTranscriptionConfig), the documentation describes it - but on Vertex AI the feature simply does not function.
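A simple tally inside the receive loop makes the asymmetry concrete. The sketch below is illustrative and assumes it runs inside the connect context from the configuration above; the function name is ours, not the SDK's:

async def count_transcription_events(session) -> tuple[int, int]:
    # Tally input vs. output transcription events for one live session.
    input_events = output_events = 0
    async for response in session.receive():
        sc = response.server_content
        if sc and sc.input_transcription:
            input_events += 1   # observed: stays at 0 on Vertex AI
        if sc and sc.output_transcription:
            output_events += 1  # observed: increments reliably
    return input_events, output_events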

Impact

We run a production full-duplex voice AI assistant on Vertex AI. We architected our turn management system around input_transcription events:

  1. User transcription display - users cannot see their own speech in the UI
  2. Turn semaphore - our dead-air detection relies on input_transcription to call record_user_speech() (see the sketch after this list). Since the event never fires, the system cannot distinguish "user is silent" from "user is speaking"
  3. Cascade failure - broken turn detection triggered aggressive reconnection loops, wasting compute and degrading UX
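For context, the turn semaphore reduces to roughly the following. This is a simplified sketch: record_user_speech() is the real entry point, everything else here is illustrative:

import time

class TurnSemaphore:
    # Dead-air detector: separates "user is silent" from "user is speaking".

    def __init__(self, dead_air_seconds: float = 5.0) -> None:
        self.dead_air_seconds = dead_air_seconds
        self._last_user_speech = time.monotonic()

    def record_user_speech(self) -> None:
        # Meant to be driven by input_transcription events. On Vertex AI
        # those never arrive, so this is never called and every pause
        # looks like dead air.
        self._last_user_speech = time.monotonic()

    def is_dead_air(self) -> bool:
        return time.monotonic() - self._last_user_speech > self.dead_air_seconds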

We spent 8+ hours debugging what we initially thought was Gemini session instability, deploying four production hotfixes, before tracing it to this single missing feature. The fix was a one-line fallback to browser-side SpeechRecognition.

Expected behavior

If input_audio_transcription is set in config:

  • server_content.input_transcription should contain user speech text (same as output_transcription does for model speech)

If the feature is not supported on Vertex AI:

  • The config should be rejected with a clear error, not silently accepted
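Concretely, an eager check at connect time would have surfaced the limitation immediately. The error below is hypothetical; any wording that makes the unsupported field visible would do:

# Desired: connect() fails fast when the backend ignores the field,
# instead of opening the session and silently dropping it.
async with client.aio.live.connect(
    model="gemini-live-2.5-flash-native-audio", config=config
) as session:
    ...
# Hypothetical error:
#   ValueError: input_audio_transcription is not supported for
#   gemini-live-2.5-flash-native-audio on Vertex AI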

Related issues

This has been reported multiple times with no clear resolution on Vertex AI.

Ask

  1. Is input_audio_transcription actually supported on Vertex AI for gemini-live-2.5-flash-native-audio?
  2. If not, please reject the config or document the limitation clearly
  3. If it is supposed to work, what's the timeline for a fix?

Silently accepting a config option and then ignoring it is the worst possible developer experience - especially for paying enterprise customers building production infrastructure on your platform.

Labels

  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
