
Multi modal (speech + text) experience #5817

Open
pranavjoshi001 wants to merge 4 commits into microsoft:main from pranavjoshi001:multi-modal-experience


@pranavjoshi001 (Contributor) commented Apr 29, 2026

Changelog Entry

Multi-Modal Voice Experience

Summary

Adds support for a multi-modal voice experience in which DirectLine uses bi-directional WebSocket communication for speech-to-speech scenarios. When voice mode is enabled, outgoing activities are sent fire-and-forget: they are not echoed back, so Web Chat renders them optimistically.
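To illustrate the two posting paths, here is a minimal TypeScript sketch. It is not the code in postActivitySaga.ts (which is a Redux saga); `send`, `waitForEcho`, and the simplified `OutgoingActivity` shape are hypothetical stand-ins for the DirectLine connection and the echo-matching logic.

```typescript
// Hypothetical, simplified outgoing activity; not Web Chat's real WebChatOutgoingActivity type.
interface OutgoingActivity {
  type: 'message';
  text: string;
  channelData: { clientActivityID: string };
  timestamp?: string;
}

// Hypothetical transport: resolves once the activity has been handed to the connection.
type Send = (activity: OutgoingActivity) => Promise<void>;

// Hypothetical helper: resolves with the echoed activity carrying the same clientActivityID.
type WaitForEcho = (clientActivityID: string) => Promise<OutgoingActivity>;

async function postActivity(
  activity: OutgoingActivity,
  send: Send,
  waitForEcho: WaitForEcho,
  isVoiceModeEnabled: () => boolean
): Promise<OutgoingActivity> {
  if (isVoiceModeEnabled()) {
    // Voice mode (fire-and-forget): no echo will arrive, so fulfill as soon as the
    // activity is sent and stamp a local timestamp for optimistic rendering.
    await send(activity);
    return { ...activity, timestamp: new Date().toISOString() };
  }

  // Default mode: subscribe to the echo before sending so it cannot be missed,
  // then treat the echoed activity (with the server timestamp) as the result.
  const echo = waitForEcho(activity.channelData.clientActivityID);
  await send(activity);
  return echo;
}
```

Subscribing to the echo before sending avoids a race where a fast echo arrives before anything is listening; in voice mode there is nothing to wait for, so the activity is fulfilled once it is sent.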

Changes

Features

  • Voice Processing Sound: Plays an audio cue while the voice state is 'processing' to give the user feedback
    • New style options: voiceProcessingSound, voiceProcessingSoundLoop, voiceProcessingSoundVolume
    • Configurable looping and volume control
  • Multi-Modal SendBox: The send button and text input are disabled while the microphone is active (voice and text input are mutually exclusive during recording)
  • Fire-and-Forget postActivity: When getIsVoiceModeEnabled() returns true, outgoing activities complete immediately instead of waiting for an echo back
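A sketch of how the processing cue could be driven by voice-state changes, assuming a hypothetical `AudioPlayer` wrapper (in a browser it would sit on top of an `HTMLAudioElement`) and an assumed set of voice states in which only 'processing' comes from this PR. The option names mirror the new style options; their types and the volume clamping are assumptions, and this is not VoiceProcessingSoundBridge.tsx itself.

```typescript
// Assumed voice states; only 'processing' is named in this PR.
type VoiceState = 'idle' | 'listening' | 'processing' | 'speaking';

// Mirrors the new style option names; types and ranges are assumptions.
interface ProcessingSoundStyleOptions {
  voiceProcessingSound: string; // URL of the audio cue ('' disables it in this sketch)
  voiceProcessingSoundLoop: boolean;
  voiceProcessingSoundVolume: number; // clamped to 0..1 below
}

// Hypothetical abstraction over an audio element so the logic runs outside a browser.
interface AudioPlayer {
  play(src: string, loop: boolean, volume: number): void;
  stop(): void;
}

// Returns a callback to invoke on every voice-state change.
function createProcessingSoundController(
  options: ProcessingSoundStyleOptions,
  player: AudioPlayer
): (state: VoiceState) => void {
  let isPlaying = false;

  return state => {
    const shouldPlay = state === 'processing' && options.voiceProcessingSound !== '';

    if (shouldPlay && !isPlaying) {
      const volume = Math.min(1, Math.max(0, options.voiceProcessingSoundVolume));
      player.play(options.voiceProcessingSound, options.voiceProcessingSoundLoop, volume);
      isPlaying = true;
    } else if (!shouldPlay && isPlaying) {
      // Leaving 'processing' (for example, when the bot starts speaking) stops the cue.
      player.stop();
      isPlaying = false;
    }
  };
}
```

Tracking `isPlaying` keeps a looping cue from restarting if the same state is reported twice.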

Core Changes

  • postActivitySaga.ts - Added fire-and-forget branch for voice mode
  • queueIncomingActivitySaga.ts - Skips the replyToId ordering wait for incoming activities in voice mode
  • VoiceProcessingSoundBridge.tsx - New component to manage voice processing audio cue
  • createDirectLine.ts - Exposed the enableVoiceMode option
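The replyToId change in queueIncomingActivitySaga.ts amounts to a decision like the following. This is a hypothetical helper, not the saga's code; it models only whether to wait, not how long. Skipping the wait in voice mode is presumably because the replied-to outgoing activity is never echoed back, so holding the reply would only delay it.

```typescript
// Hypothetical, minimal incoming activity shape.
interface IncomingActivity {
  id: string;
  replyToId?: string;
}

// Decides whether an incoming activity should be held until the activity it replies to
// has been received. In voice mode the wait is skipped entirely.
function shouldWaitForReplyTo(
  activity: IncomingActivity,
  receivedActivityIds: ReadonlySet<string>,
  isVoiceModeEnabled: boolean
): boolean {
  if (isVoiceModeEnabled) {
    return false;
  }

  return activity.replyToId !== undefined && !receivedActivityIds.has(activity.replyToId);
}
```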

UI Updates

  • SendBox.tsx - Disables send button and makes text input read-only while recording
  • MicrophoneToolbarButton.tsx - Updated toggle behavior and icon variants (regular/filled/audio-playing) for voice mode
  • VoiceTranscriptActivityStatus - Styling refinements
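The send box rule can be expressed as derived UI state plus a guard for submit paths that do not go through the button, such as pressing Enter. Both functions are hypothetical illustrations rather than the SendBox.tsx implementation, and disabling the button for empty text is an extra assumption.

```typescript
// Hypothetical derived UI state for the send box.
interface SendBoxUiState {
  sendButtonDisabled: boolean;
  textInputReadOnly: boolean;
}

function getSendBoxUiState(isRecording: boolean, text: string): SendBoxUiState {
  return {
    // Recording locks the text path; blocking empty text is an extra assumption.
    sendButtonDisabled: isRecording || text.trim() === '',
    textInputReadOnly: isRecording
  };
}

// Guard for submit paths that bypass the button, such as pressing Enter.
function trySubmit(isRecording: boolean, text: string, submit: (text: string) => void): boolean {
  if (getSendBoxUiState(isRecording, text).sendButtonDisabled) {
    return false;
  }

  submit(text);
  return true;
}
```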

Test Infrastructure

  • Existing test cases updated to align with the multi-modal experience
  • New test: multimodal.text.with.voice.html - verifies text can be sent alongside voice mode
  • I have added tests and executed them locally
  • I have updated CHANGELOG.md
  • I have updated documentation

Review Checklist

This section is for contributors to review your work.

  • Accessibility reviewed (tab order, content readability, alt text, color contrast)
  • Browser and platform compatibilities reviewed
  • CSS styles reviewed (minimal rules, no z-index)
  • Documents reviewed (docs, samples, live demo)
  • Internationalization reviewed (strings, unit formatting)
  • package.json and package-lock.json reviewed
  • Security reviewed (no data URIs, check for nonce leak)
  • Tests reviewed (coverage, legitimacy)


Copilot AI left a comment


Pull request overview

This PR introduces a multi-modal (speech + text) Web Chat experience by enabling “voice mode” behavior (fire-and-forget / no echo-back), updating Fluent send box UX to allow mic + send coexistence, and adding voice-processing audio feedback and tests to cover mixed interaction flows.

Changes:

  • Add “voice mode” support across DirectLine creation and core sagas (optimistic outgoing activities; skip replyToId ordering wait in voice mode).
  • Update Fluent-theme send box, microphone iconography, and voice transcript status visuals for the multi-modal design.
  • Add a voice “processing” sound bridge + new style options, and expand HTML2 test coverage (including a new multimodal test + updated snapshots).
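One way the voice-mode flag might travel from `createDirectLine` to the code that checks it is sketched here. Only `enableVoiceMode` and `isVoiceModeEnabled` are named in this PR; `createDirectLineSketch`, `getCapabilities`, and `isVoiceModeOn` are hypothetical, and the actual DirectLineJS and Capabilities APIs may differ.

```typescript
// Hypothetical options shape: only enableVoiceMode is named in this PR.
interface CreateDirectLineOptions {
  token: string;
  enableVoiceMode?: boolean;
}

// Mirrors the optional flag added to Capabilities.ts; other members are omitted.
interface Capabilities {
  isVoiceModeEnabled?: boolean;
}

// Hypothetical stand-in for a DirectLineJS instance that reports capabilities.
interface VoiceAwareDirectLine {
  readonly options: Readonly<Required<CreateDirectLineOptions>>;
  getCapabilities(): Capabilities;
}

// Passes enableVoiceMode through unchanged and advertises it as a capability.
function createDirectLineSketch({ token, enableVoiceMode = false }: CreateDirectLineOptions): VoiceAwareDirectLine {
  const options: Readonly<Required<CreateDirectLineOptions>> = { token, enableVoiceMode };

  return {
    options,
    getCapabilities: () => ({ isVoiceModeEnabled: options.enableVoiceMode })
  };
}

// Consumers treat a missing capability as "voice mode off".
function isVoiceModeOn(directLine: Pick<VoiceAwareDirectLine, 'getCapabilities'>): boolean {
  return directLine.getCapabilities().isVoiceModeEnabled === true;
}
```

Treating an absent flag as "off" keeps adapters that predate the capability on the existing echo-back path.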

Reviewed changes

Copilot reviewed 26 out of 33 changed files in this pull request and generated no comments.

Summary per file:

  • packages/test/page-object/src/globals/testHelpers/createDirectLineEmulator.js - Adds a voice-mode capability flag and changes the emulator's postActivity behavior for fire-and-forget mode.
  • packages/fluent-theme/src/components/sendBox/SendBox.tsx - Keeps the send button visible alongside the mic; disables the text path while recording.
  • packages/fluent-theme/src/components/sendBox/MicrophoneToolbarButton.tsx - Updates mic icon variants by voice state (regular/filled/audio-playing).
  • packages/fluent-theme/src/components/icon/FluentIcon.module.css - Adds new mic icon masks (microphone-regular / microphone-filled).
  • packages/fluent-theme/src/components/activityStatus/VoiceTranscriptActivityStatus.tsx - Simplifies the transcript status (timestamp + divider + icon) and removes the agent label.
  • packages/fluent-theme/src/components/activityStatus/VoiceTranscriptActivityStatus.module.css - Adjusts transcript status spacing; adds icon styling.
  • packages/core/src/types/internal/WebChatOutgoingActivity.ts - Allows timestamp?: string on outgoing activities.
  • packages/core/src/sagas/queueIncomingActivitySaga.ts - Skips the replyToId wait in voice mode.
  • packages/core/src/sagas/postActivitySaga.ts - Optimistic fulfillment in voice mode; stamps a timestamp on optimistic outgoing activities.
  • packages/bundle/src/createDirectLine.ts - Adds the enableVoiceMode option passthrough to DirectLineJS.
  • packages/api/src/providers/SpeechToSpeech/private/VoiceProcessingSoundBridge.tsx - New bridge that plays an audio cue during the voice 'processing' state.
  • packages/api/src/providers/SpeechToSpeech/SpeechToSpeechComposer.tsx - Mounts the new processing-sound bridge.
  • packages/api/src/providers/Capabilities/types/Capabilities.ts - Adds `isVoiceModeEnabled?: boolean
  • packages/api/src/providers/ActivityTyping/ActivityTypingComposer.tsx - Clears the typing indicator on voice transcript activities.
  • packages/api/src/defaultStyleOptions.ts - Adds the default processing sound and loop/volume defaults.
  • packages/api/src/StyleOptions.ts - Adds public style options for the processing sound URL/loop/volume.
  • tests/html2/speechToSpeech/*.html (+ PNG snaps) - Updates existing tests for voice mode and adds a new multimodal text+voice interleaving test.
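The typing-indicator change in ActivityTypingComposer.tsx can be pictured as a small reducer over the set of senders currently typing. This is a hypothetical sketch: it assumes ordinary messages already clear the indicator, and since the PR does not describe how a voice transcript activity is identified, detection is passed in as a predicate (the `isAssumedVoiceTranscript` event shape is made up for illustration).

```typescript
// Hypothetical, simplified activity shape.
interface Activity {
  type: 'message' | 'typing' | 'event';
  from: { id: string };
  name?: string;
}

// Returns the next set of sender IDs that should show a typing indicator.
function reduceTyping(
  typing: ReadonlySet<string>,
  activity: Activity,
  isVoiceTranscript: (activity: Activity) => boolean
): ReadonlySet<string> {
  const next = new Set(typing);

  if (activity.type === 'typing') {
    next.add(activity.from.id);
  } else if (activity.type === 'message' || isVoiceTranscript(activity)) {
    // A message, or (with this change) a voice transcript, ends the sender's typing indicator.
    next.delete(activity.from.id);
  }

  return next;
}

// Assumed transcript detection, for illustration only.
const isAssumedVoiceTranscript = (activity: Activity): boolean =>
  activity.type === 'event' && activity.name === 'voiceTranscript';
```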
