Voice SDK: Updated ForceEndOfUtterance to support padding the timestamp#117

Closed
sam-s10s wants to merge 36 commits into main from feat/va-rel

Conversation

@sam-s10s

Member

When the timestamp attribute is used with ForceEndOfUtterance messages, users have reported that some short utterances or single words are missing from the transcript.

The finalize() function of the VoiceAgentClient now accepts a pad argument (default 0.2s), which pads the timestamp sent with the ForceEndOfUtterance (FEOU) message.
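The padding idea can be sketched as below. This is a minimal illustration, not the SDK's code: the helper name `padded_feou_timestamp` and its parameters are hypothetical, and it assumes the pad is added to the amount of audio sent so far, so that trailing words fall before the timestamp.

```python
def padded_feou_timestamp(audio_seconds: float, pad: float = 0.2) -> float:
    """Return the timestamp to attach to a forced end-of-utterance request.

    Hypothetical helper (not part of the SDK). Assumes the pad is ADDED to
    the seconds of audio sent so far; the 0.2s default mirrors the default
    described in this PR.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if pad < 0:
        raise ValueError("pad must be non-negative")
    # Round to milliseconds to keep the value tidy in diagnostic logs.
    return round(audio_seconds + pad, 3)


# Default padding versus a caller-supplied pad.
default_ts = padded_feou_timestamp(3.5)
custom_ts = padded_feou_timestamp(3.5, pad=0.05)
```

A caller who finds that short words are still being dropped could raise the pad; one who wants the tightest possible turn latency could lower it.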

Tests to follow.

sam-s10s and others added 30 commits February 11, 2026 13:23
Introduce an optional `ws_headers` parameter to the `connect` method in
`VoiceAgentClient`. This allows users to pass custom headers when
establishing the WebSocket connection to the Speechmatics API.
Refactor the EndOfTurnPenaltyItem logic to improve clarity and
functionality. Group related penalty items with descriptive comments
for better maintainability. Adjust penalties for situations with
Smart Turn and VAD to improve detection accuracy, including new
conditions for SMART_TURN_FALSE and ACTIVE combinations. This change
is necessary to fine-tune the configuration for complex speech
patterns and ensure better end-of-turn detection in the transcription
process.
* Add No Signal Penalty for Smart Turn

* Update Penalty to Extend TTL
…chmatics-python-sdk into fix/smart-turn

# Conflicts:
#	sdk/voice/speechmatics/voice/_models.py
Introduce `test_no_feou_fix.py` to validate scenarios where
Forced End Of Utterance (FEOU) is disabled. This test ensures
correct behavior when the end-of-utterance mode is set to FIXED
in the `VoiceAgentConfig`. Utilize additional vocabulary and
message logging for enhanced debugging. Skipped in CI to avoid
unnecessary API calls without a valid key.
Add `validate_config` method to `VoiceAgentConfig` to ensure
cross-field validation post-merging. This enhances the robustness
of configurations by checking for inconsistencies and errors,
such as ensuring valid combinations of end-of-utterance modes
and features like VAD, and sample rates.

Enhance preset functionality by validating merged configurations.
This ensures that custom configurations derived from presets are
validated before use, preventing runtime errors due to invalid
configurations. Drop use of `model_validator` for clearer
validation flow and improve error reporting by raising specific
exceptions for validation failures.
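A rough sketch of validating after the merge, rather than per field, might look like the following. Everything here is illustrative: `SketchConfig`, its fields, the two rules, and `from_preset` are hypothetical stand-ins and do not reflect the real `VoiceAgentConfig` schema or its actual validation rules.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for a voice agent config (illustrative fields only)."""
    end_of_utterance_mode: str = "fixed"  # assumed values: "fixed" or "adaptive"
    vad_enabled: bool = False
    sample_rate: int = 16000


def validate_config(config: SketchConfig) -> None:
    """Check cross-field consistency and raise one error listing every problem."""
    errors = []
    # Assumed rule: only 8kHz and 16kHz audio are accepted.
    if config.sample_rate not in (8000, 16000):
        errors.append(f"unsupported sample_rate: {config.sample_rate}")
    # Assumed rule: the adaptive mode depends on VAD being enabled.
    if config.end_of_utterance_mode == "adaptive" and not config.vad_enabled:
        errors.append("adaptive end-of-utterance mode requires VAD")
    if errors:
        raise ValueError("; ".join(errors))


def from_preset(preset: SketchConfig, **overrides) -> SketchConfig:
    """Merge overrides into a preset, then validate the merged result."""
    merged = replace(preset, **overrides)
    validate_config(merged)
    return merged
```

Validating the merged object means a combination that is only invalid once preset and overrides are put together is caught before the config is used.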
Set `use_forced_eou` to True in EndOfTurnConfig to ensure
correct behavior for utterance detection. Previously,
`use_forced_eou` was set to False, which could lead to
inaccurate turn-taking scenarios. Added validation in
`validate_config` to prevent setting `use_forced_eou` to
False, ensuring configurations remain consistent with
intended usage and avoiding potential run-time errors.
Remove redundant flags and streamline end-of-utterance (EOU) and
voice activity detection (VAD) handling in the VoiceAgentClient class.

Changes include:
- Rename confusing boolean flags to improve clarity.
- Simplify logic for determining when to listen to EOU messages.
- Remove unused code paths and clean up comments for better readability.
- Combine similar conditional logic to avoid duplicated checks.

These changes are intended to make the codebase more maintainable,
reduce potential for errors, and improve overall performance.
Remove the `use_forced_eou` setting from the `EndOfTurnConfig`
in several test files to simplify configurations. Forced
end-of-utterance must always be true (default), so removed.
…n VoiceAgentConfig

Remove the conditional validation logic for 'use_forced_eou'
within the 'VoiceAgentConfig.validate_config' method. This
logic was enforcing that 'EndOfTurnConfig.use_forced_eou'
cannot be False, which is no longer required. This change
streamlines the validation process, aligning it with updated
requirements, and ensuring clarity around utterance handling
configurations. Additionally, cleanup of imports of
'EndOfTurnConfig' in test files reflects this update.
Refactor turn management logic to ensure better handling
of forced End of Utterance (FEOU) configurations. While FEOU
cannot be disabled in normal use, it can be disabled for
testing by directly manipulating the config value:
`config.end_of_turn_config.use_forced_eou = False`
Introduce two new tests to validate header
handling in the STT client. `test_with_headers`
checks successful connection using valid headers,
while `test_with_corrupted_headers` ensures
connection failure with invalid header format.
Remove unnecessary boolean conversion for
'end_of_utterance_mode' check and update the
conditional logic for '_listen_to_eou_messages'.
This resolves a logical error that prevented
proper handling of 'fixed' end of utterance mode,
and ensures the client correctly listens or
doesn't listen to EOU messages based on
'_listen_to_eou_messages' state. These changes
enhance the processing of end of utterance
events, improving overall speech-to-text
functionality.
Extract the configuration setup into a separate `config` variable to improve readability and maintainability. Add debug print statements for configuration details to aid in debugging. Move client disconnect logic to the end of the test to ensure the connection is properly closed, improving resource management.
Change speechmatics-rt version specifier from a minimum
version requirement to an exact version pin (==0.5.3).
Add FFT-based resampling in SmartTurnDetector for non-16kHz
audio. Parameterise Silero VAD chunk/context sizes to handle
both 8kHz and 16kHz natively.
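For readers unfamiliar with FFT-based resampling, a minimal sketch follows. It is not the SmartTurnDetector code: the function name `fft_resample` is hypothetical, and the use of NumPy is an assumption, since the commit does not say which library performs the transform.

```python
import numpy as np


def fft_resample(samples, src_rate: int, dst_rate: int) -> np.ndarray:
    """Resample a mono signal by resizing its spectrum (hypothetical helper).

    The real FFT of the input is truncated (downsampling) or zero-padded
    (upsampling) to the new length, then transformed back.
    """
    x = np.asarray(samples, dtype=np.float64)
    n_in = x.size
    n_out = int(round(n_in * dst_rate / src_rate))
    spectrum = np.fft.rfft(x)
    # irfft crops or zero-pads the spectrum to match the requested length n.
    y = np.fft.irfft(spectrum, n=n_out)
    # irfft normalises by n_out, so rescale to keep the original amplitude.
    return y * (n_out / n_in)


# Example: bring 0.1s of 8kHz audio (a 440Hz tone) up to 16kHz.
tone_8k = np.sin(2 * np.pi * 440 * np.arange(800) / 8000)
tone_16k = fft_resample(tone_8k, 8000, 16000)
```

This treats the buffer as periodic, so a real implementation working on short streaming chunks would likely need windowing or overlap to avoid edge artefacts.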

Refactor forced end-of-utterance control: replace the testing
flag with a declarative `_use_forced_eou` derived from config.
Defer audio format initialisation in AsyncClient until
start_session is called, and return the FEOU timestamp for
diagnostic logging.

Rename `_vad_evaluation` to `_speaker_start_stop_evaluation`
and remove unused `EndOfTurnConfig` from presets.
Disable smart turn cutoff skip that prevented
re-evaluation. Improve multiple speakers test with
accumulated error reporting and turn boundary tracking.
# Conflicts:
#	sdk/voice/speechmatics/voice/_client.py
# Conflicts:
#	sdk/rt/speechmatics/rt/_async_client.py
#	tests/voice/test_17_eou_feou.py
Allow callers to pass a `pad` value through finalize() to
_await_forced_eou(), replacing the fixed 0.02s timestamp padding.
Add forced_eou_padding config option (default 0.2s) to
EndOfTurnConfig and include timing info in the diagnostic message.
@sam-s10s sam-s10s closed this Jun 26, 2026

2 participants