Multi modal (speech + text) experience by pranavjoshi001 · Pull Request #5817 · microsoft/BotFramework-WebChat

pranavjoshi001 · 2026-04-29T15:56:44Z

Changelog Entry

Added multi-modal text + voice experience, in PR #5817, by @pranavjoshi

Multi-Modal Voice Experience

Summary

Adds support for a multi-modal voice experience in which DirectLine uses bi-directional WebSocket communication for speech-to-speech scenarios. When voice mode is enabled, outgoing activities are fire-and-forget: they are not echoed back, so Web Chat renders them optimistically.

Changes

Features

- Voice Processing Sound: plays an audio cue while the voice state is 'processing' to give the user feedback
  - New style options: voiceProcessingSound, voiceProcessingSoundLoop, voiceProcessingSoundVolume
  - Configurable looping and volume control
- Multi-Modal SendBox: the send button and text input are disabled while the microphone is active (voice and text are mutually exclusive during recording)
- Fire-and-Forget postActivity: when getIsVoiceModeEnabled() returns true, outgoing activities complete immediately without waiting for the echo back
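
To illustrate how the three new style options could drive the audio cue, here is a minimal sketch. Only the option names and the 'processing' state come from this PR; the `planProcessingSound` helper, the `SoundPlan` type, the default values, and the 0..1 volume range are assumptions for illustration and are not taken from VoiceProcessingSoundBridge.tsx.

```typescript
// Hypothetical shape: only the three option names come from this PR.
interface VoiceProcessingSoundOptions {
  voiceProcessingSound?: string; // URL of the audio cue
  voiceProcessingSoundLoop?: boolean; // default below is an assumption
  voiceProcessingSoundVolume?: number; // assumed 0..1, like HTMLMediaElement.volume
}

type SoundPlan =
  | { play: false }
  | { play: true; src: string; loop: boolean; volume: number };

// Decides whether the cue should be playing for the given voice state.
function planProcessingSound(
  voiceState: string,
  options: VoiceProcessingSoundOptions
): SoundPlan {
  const {
    voiceProcessingSound,
    voiceProcessingSoundLoop = true,
    voiceProcessingSoundVolume = 1
  } = options;

  // The cue is only relevant while the voice state is 'processing'.
  if (voiceState !== 'processing' || !voiceProcessingSound) {
    return { play: false };
  }

  // Clamp the volume into the assumed 0..1 range.
  const volume = Math.min(1, Math.max(0, voiceProcessingSoundVolume));

  return { play: true, src: voiceProcessingSound, loop: voiceProcessingSoundLoop, volume };
}
```

A caller could apply the returned plan to an `HTMLAudioElement` by setting its `src`, `loop`, and `volume` before calling `play()`, and call `pause()` when `play` is false.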

Core Changes

postActivitySaga.ts - Added fire-and-forget branch for voice mode
queueIncomingActivitySaga.ts - Enhanced handling for voice activities
VoiceProcessingSoundBridge.tsx - New component to manage voice processing audio cue
createDirectLine.ts - Exposed the enableVoiceMode option
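
The fire-and-forget branch can be pictured with the simplified model below. It is a sketch under stated assumptions, not the saga code: `postOutgoingActivity`, `OutgoingActivity`, `PostResult`, and the status names are hypothetical, and the real postActivitySaga.ts is a redux-saga generator. The one detail taken from the review summary is that optimistic outgoing activities are stamped with a timestamp.

```typescript
// Hypothetical, simplified model of an outgoing activity.
interface OutgoingActivity {
  type: string;
  text?: string;
  timestamp?: string;
}

type SendStatus = 'sending' | 'sent';

interface PostResult {
  activity: OutgoingActivity;
  status: SendStatus;
}

// `now` is injectable so the behavior is easy to test.
function postOutgoingActivity(
  activity: OutgoingActivity,
  isVoiceModeEnabled: boolean,
  now: () => Date = () => new Date()
): PostResult {
  if (isVoiceModeEnabled) {
    // Voice mode: no echo back is expected, so stamp a client-side
    // timestamp and treat the activity as sent immediately.
    return {
      activity: { ...activity, timestamp: activity.timestamp ?? now().toISOString() },
      status: 'sent'
    };
  }

  // Default path: stay in 'sending' until the echo back arrives.
  return { activity, status: 'sending' };
}
```

Keeping the voice-mode check at the top makes the two paths easy to compare: only the non-voice path depends on a later echo.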

UI Updates

SendBox.tsx - Disables send button and makes text input read-only while recording
MicrophoneToolbarButton.tsx - Updated toggle behavior for voice mode
VoiceTranscriptActivityStatus - Styling refinements
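
The SendBox rule (voice and text are mutually exclusive during recording) reduces to a small derivation from the recording flag. The sketch below is illustrative only: `deriveSendBoxControls` and `SendBoxControls` are hypothetical names, not identifiers from SendBox.tsx.

```typescript
// Hypothetical view of the two controls affected while recording.
interface SendBoxControls {
  sendButtonDisabled: boolean;
  textInputReadOnly: boolean;
}

// While the microphone is recording, the text path is blocked;
// when recording stops, both controls become usable again.
function deriveSendBoxControls(isRecording: boolean): SendBoxControls {
  return {
    sendButtonDisabled: isRecording,
    textInputReadOnly: isRecording
  };
}
```

In a React component, the result could be mapped onto the `disabled` attribute of the send button and the `readOnly` attribute of the text input.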

Test Infrastructure

Existing test cases updated to align with the multi-modal experience
New test: multimodal.text.with.voice.html - verifies text can be sent alongside voice mode

I have added tests and executed them locally
I have updated CHANGELOG.md
I have updated documentation

Review Checklist

This section is for contributors to review your work.

Accessibility reviewed (tab order, content readability, alt text, color contrast)
Browser and platform compatibilities reviewed
CSS styles reviewed (minimal rules, no z-index)
Documents reviewed (docs, samples, live demo)
Internationalization reviewed (strings, unit formatting)
package.json and package-lock.json reviewed
Security reviewed (no data URIs, check for nonce leak)
Tests reviewed (coverage, legitimacy)

Copilot

Pull request overview

This PR introduces a multi-modal (speech + text) Web Chat experience by enabling “voice mode” behavior (fire-and-forget / no echo-back), updating Fluent send box UX to allow mic + send coexistence, and adding voice-processing audio feedback and tests to cover mixed interaction flows.

Changes:

Add “voice mode” support across DirectLine creation and core sagas (optimistic outgoing activities; skip replyToId ordering wait in voice mode).
Update Fluent-theme send box, microphone iconography, and voice transcript status visuals for the multi-modal design.
Add a voice “processing” sound bridge + new style options, and expand HTML2 test coverage (including a new multimodal test + updated snapshots).
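
As a rough illustration of the "skip replyToId ordering wait" change, the sketch below models the decision as a pure function. The names `shouldWaitForReplyTo` and `IncomingActivity` and the set of seen IDs are assumptions; the actual logic lives in queueIncomingActivitySaga.ts and may differ.

```typescript
// Hypothetical minimal shape of an incoming activity.
interface IncomingActivity {
  id: string;
  replyToId?: string;
}

// Returns true when the incoming activity should be held back until the
// activity it replies to has been seen.
function shouldWaitForReplyTo(
  activity: IncomingActivity,
  seenActivityIds: ReadonlySet<string>,
  isVoiceModeEnabled: boolean
): boolean {
  // In voice mode outgoing activities are not echoed back, so the
  // referenced activity may never show up; do not wait for it.
  if (isVoiceModeEnabled) {
    return false;
  }

  return !!activity.replyToId && !seenActivityIds.has(activity.replyToId);
}
```

For example, a reply to an unseen activity `a1` would be held in the default mode but passed through right away when voice mode is enabled.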

Reviewed changes

Copilot reviewed 26 out of 33 changed files in this pull request and generated no comments.

File	Description
packages/test/page-object/src/globals/testHelpers/createDirectLineEmulator.js	Adds a voice-mode capability flag and changes emulator `postActivity` behavior for fire-and-forget mode.
packages/fluent-theme/src/components/sendBox/SendBox.tsx	Keeps send button visible alongside mic; disables text path while recording.
packages/fluent-theme/src/components/sendBox/MicrophoneToolbarButton.tsx	Updates mic icon variants by voice state (regular/filled/audio-playing).
packages/fluent-theme/src/components/icon/FluentIcon.module.css	Adds new mic icon masks (`microphone-regular` / `microphone-filled`).
packages/fluent-theme/src/components/activityStatus/VoiceTranscriptActivityStatus.tsx	Simplifies transcript status (timestamp + divider + icon) and removes agent label.
packages/fluent-theme/src/components/activityStatus/VoiceTranscriptActivityStatus.module.css	Adjusts transcript status spacing; adds icon styling.
packages/core/src/types/internal/WebChatOutgoingActivity.ts	Allows `timestamp?: string` on outgoing activities.
packages/core/src/sagas/queueIncomingActivitySaga.ts	Skips `replyToId` wait in voice mode.
packages/core/src/sagas/postActivitySaga.ts	Optimistic fulfillment in voice mode; stamps a timestamp for optimistic outgoing activities.
packages/bundle/src/createDirectLine.ts	Adds `enableVoiceMode` option passthrough to DirectLineJS.
packages/api/src/providers/SpeechToSpeech/private/VoiceProcessingSoundBridge.tsx	New bridge to play an audio cue during voice `'processing'` state.
packages/api/src/providers/SpeechToSpeech/SpeechToSpeechComposer.tsx	Mounts the new processing-sound bridge.
packages/api/src/providers/Capabilities/types/Capabilities.ts	Adds `isVoiceModeEnabled?: boolean`
packages/api/src/providers/ActivityTyping/ActivityTypingComposer.tsx	Clears typing indicator on voice transcript activities.
packages/api/src/defaultStyleOptions.ts	Adds default processing sound + loop/volume defaults.
packages/api/src/StyleOptions.ts	Adds public style options for processing sound URL/loop/volume.
tests/html2/speechToSpeech/*.html (+ PNG snaps)	Updates existing tests for voice mode and adds a new multimodal text+voice interleaving test.

multi modal experience

d916b23

Copilot AI review requested due to automatic review settings April 29, 2026 15:56

pranavjoshi001 requested review from a-b-r-o-w-n, compulim, cwhitten, srinaath and tdurnford as code owners April 29, 2026 15:56

Copilot started reviewing on behalf of pranavjoshi001 April 29, 2026 15:57

Copilot AI reviewed Apr 29, 2026

pranavjoshi001 added 3 commits April 30, 2026 06:31

fix csp

cb23d8b

changelog entry

1ff7905

changlog correct name

b881612

pranavjoshi001 wants to merge 4 commits into microsoft:main from pranavjoshi001:multi-modal-experience
