Multi modal (speech + text) experience #5817
Open
pranavjoshi001 wants to merge 4 commits into microsoft:main
Pull request overview
This PR introduces a multi-modal (speech + text) Web Chat experience by enabling “voice mode” behavior (fire-and-forget / no echo-back), updating Fluent send box UX to allow mic + send coexistence, and adding voice-processing audio feedback and tests to cover mixed interaction flows.
Changes:
- Add “voice mode” support across DirectLine creation and core sagas (optimistic outgoing activities; skip `replyToId` ordering wait in voice mode).
- Update Fluent-theme send box, microphone iconography, and voice transcript status visuals for the multi-modal design.
- Add a voice “processing” sound bridge + new style options, and expand HTML2 test coverage (including a new multimodal test + updated snapshots).
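As a rough illustration of the `replyToId` ordering change in the first bullet, the sketch below shows how an incoming-activity queue might decide whether to hold an activity back. The names `IncomingActivity`, `shouldHoldForReplyTo`, and `knownOutgoingIds` are hypothetical and not taken from this PR; the actual saga logic may differ.

```typescript
// Hypothetical shape of an incoming activity; only the fields used here.
interface IncomingActivity {
  id: string;
  replyToId?: string;
}

/**
 * Returns true when the activity should be held back until the outgoing
 * activity it replies to has been observed.
 *
 * Assumption: voice mode has no echo back, so the referenced outgoing
 * activity may never be observed and the wait is skipped entirely.
 */
function shouldHoldForReplyTo(
  activity: IncomingActivity,
  knownOutgoingIds: ReadonlySet<string>,
  isVoiceModeEnabled: boolean
): boolean {
  if (isVoiceModeEnabled) {
    return false; // voice mode: never wait on replyToId ordering
  }
  if (activity.replyToId === undefined) {
    return false; // not a reply, nothing to wait for
  }
  // Hold only while the referenced outgoing activity is still unknown.
  return !knownOutgoingIds.has(activity.replyToId);
}
```

Keeping the decision in a pure function like this makes the voice-mode branch easy to cover in unit tests, independent of the saga machinery.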
Reviewed changes
Copilot reviewed 26 out of 33 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/test/page-object/src/globals/testHelpers/createDirectLineEmulator.js | Adds a voice-mode capability flag and changes emulator postActivity behavior for fire-and-forget mode. |
| packages/fluent-theme/src/components/sendBox/SendBox.tsx | Keeps send button visible alongside mic; disables text path while recording. |
| packages/fluent-theme/src/components/sendBox/MicrophoneToolbarButton.tsx | Updates mic icon variants by voice state (regular/filled/audio-playing). |
| packages/fluent-theme/src/components/icon/FluentIcon.module.css | Adds new mic icon masks (microphone-regular / microphone-filled). |
| packages/fluent-theme/src/components/activityStatus/VoiceTranscriptActivityStatus.tsx | Simplifies transcript status (timestamp + divider + icon) and removes agent label. |
| packages/fluent-theme/src/components/activityStatus/VoiceTranscriptActivityStatus.module.css | Adjusts transcript status spacing; adds icon styling. |
| packages/core/src/types/internal/WebChatOutgoingActivity.ts | Allows timestamp?: string on outgoing activities. |
| packages/core/src/sagas/queueIncomingActivitySaga.ts | Skips replyToId wait in voice mode. |
| packages/core/src/sagas/postActivitySaga.ts | Optimistic fulfillment in voice mode; stamps a timestamp for optimistic outgoing activities. |
| packages/bundle/src/createDirectLine.ts | Adds enableVoiceMode option passthrough to DirectLineJS. |
| packages/api/src/providers/SpeechToSpeech/private/VoiceProcessingSoundBridge.tsx | New bridge to play an audio cue during voice 'processing' state. |
| packages/api/src/providers/SpeechToSpeech/SpeechToSpeechComposer.tsx | Mounts the new processing-sound bridge. |
| packages/api/src/providers/Capabilities/types/Capabilities.ts | Adds the optional `isVoiceModeEnabled` flag to the capabilities type. |
| packages/api/src/providers/ActivityTyping/ActivityTypingComposer.tsx | Clears typing indicator on voice transcript activities. |
| packages/api/src/defaultStyleOptions.ts | Adds default processing sound + loop/volume defaults. |
| packages/api/src/StyleOptions.ts | Adds public style options for processing sound URL/loop/volume. |
| tests/html2/speechToSpeech/*.html (+ PNG snaps) | Updates existing tests for voice mode and adds a new multimodal text+voice interleaving test. |
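The processing-sound rows above can be pictured with a small sketch. It assumes the bridge starts the cue when the voice state is `'processing'` and stops it otherwise. The `SoundPlayer` interface, `clampVolume`, and `syncProcessingSound` are hypothetical names for illustration; only the option names `voiceProcessingSoundLoop` and `voiceProcessingSoundVolume` come from the PR description, and their types here are assumed.

```typescript
// Minimal player surface; a browser HTMLAudioElement satisfies it structurally.
interface SoundPlayer {
  loop: boolean;
  volume: number;
  currentTime: number;
  play(): unknown;
  pause(): void;
}

// Assumed types for the loop/volume style options.
interface ProcessingSoundOptions {
  voiceProcessingSoundLoop: boolean;
  voiceProcessingSoundVolume: number;
}

// HTMLMediaElement.volume only accepts values in the range [0, 1].
function clampVolume(volume: number): number {
  return Math.min(1, Math.max(0, volume));
}

/**
 * Starts the cue while the voice state is 'processing' and stops it
 * otherwise. Returns true when the cue was started.
 */
function syncProcessingSound(
  player: SoundPlayer,
  voiceState: string,
  options: ProcessingSoundOptions
): boolean {
  if (voiceState === 'processing') {
    player.loop = options.voiceProcessingSoundLoop;
    player.volume = clampVolume(options.voiceProcessingSoundVolume);
    // In a browser, play() returns a promise that can reject (for example
    // under autoplay restrictions); a real bridge should handle that.
    void player.play();
    return true;
  }
  player.pause();
  player.currentTime = 0; // rewind so the next cue starts from the beginning
  return false;
}
```

In a React bridge component, a function like this would typically run from an effect keyed on the voice state, with the player created once from the configured sound URL.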
Changelog Entry
Multi-Modal Voice Experience
Summary
Adds support for a multi-modal voice experience in which DirectLine uses bi-directional WebSocket communication for speech-to-speech scenarios. When voice mode is enabled, outgoing activities are fire-and-forget: the service does not echo them back, so Web Chat renders them optimistically.
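A minimal sketch of the fire-and-forget idea, assuming an outgoing activity is marked as sent and stamped with a local timestamp as soon as it is posted in voice mode. `OutgoingActivity`, `SendStatus`, `PostResult`, and `postOutgoing` are hypothetical names, not the PR's implementation.

```typescript
// Hypothetical outgoing activity; `timestamp` is an ISO 8601 string.
interface OutgoingActivity {
  text: string;
  timestamp?: string;
}

type SendStatus = 'sending' | 'sent';

interface PostResult {
  activity: OutgoingActivity;
  status: SendStatus;
}

/**
 * Voice mode: no echo back is expected, so the activity is treated as sent
 * immediately and stamped locally if it has no timestamp yet.
 * Otherwise: the activity stays pending until the echo back arrives.
 */
function postOutgoing(
  activity: OutgoingActivity,
  isVoiceModeEnabled: boolean,
  now: Date = new Date()
): PostResult {
  if (isVoiceModeEnabled) {
    return {
      activity: {
        ...activity,
        timestamp: activity.timestamp ?? now.toISOString()
      },
      status: 'sent'
    };
  }
  return { activity, status: 'sending' };
}
```

The local timestamp matters because, without an echo back, no server-assigned timestamp would arrive to order the activity in the transcript.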
Changes
Features
- Plays an audio cue while the voice state is `'processing'` to provide user feedback
- New style options: `voiceProcessingSound`, `voiceProcessingSoundLoop`, `voiceProcessingSoundVolume`
- When `getIsVoiceModeEnabled()` returns `true`, outgoing activities complete immediately without waiting for echo back
Core Changes
- `postActivitySaga.ts` - Added fire-and-forget branch for voice mode
- `queueIncomingActivitySaga.ts` - Enhanced handling for voice activities
- `VoiceProcessingSoundBridge.tsx` - New component to manage voice processing audio cue
- `createDirectLine.ts` - Exposed `enableVoiceMode` option
UI Updates
- `SendBox.tsx` - Disables send button and makes text input read-only while recording
- `MicrophoneToolbarButton.tsx` - Updated toggle behavior for voice mode
- `VoiceTranscriptActivityStatus` - Styling refinements
Test Infrastructure
- `multimodal.text.with.voice.html` - Verifies text can be sent alongside voice mode
- `CHANGELOG.md`
Review Checklist
- … `z-index`)
- `package.json` and `package-lock.json` reviewed