Get WebRTC ADM into Rust #1037
Conversation
…o a room and thus failing the audio mode switching
…different local audio tracks
Changeset
The following package versions will be affected by this PR:
faf99f6 to 0ba69b0 (Compare)
…hub.com/livekit/rust-sdks into sxian/CLT-2765/bring-webrtc-adm-to-rust
- Text-to-speech (TTS) audio
- Audio from files or network streams
- Testing without audio hardware
This is the original audio input right? Existing Unity clients who want to keep the "Unity" style microphone management would also use this.
### Hybrid Approach
You can combine both approaches: use `PlatformAudio` for automatic speaker playback while also creating `NativeAudioStream` for audio processing/analysis.
Is this also possible from a Unity client? As we discussed, for the lip sync animation Unity clients might want read access to the audio data, but still want output through the platform audio.
Yes, that is possible with a Unity client: lip sync animation plus mic/speaker audio via `PlatformAudio`.
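A rough model of the hybrid fan-out discussed above (illustrative only; the real SDK wires `PlatformAudio` playback and `NativeAudioStream` internally): each decoded frame is both played out and handed to an analysis tap, e.g. an RMS level such as lip sync might use.

```rust
// Illustrative fan-out: one decoded frame feeds both platform playback
// and an analysis tap (here, an RMS level such as lip sync might use).
// This is a model of the data flow, not the actual SDK API.

fn rms(samples: &[i16]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = samples.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / samples.len() as f64).sqrt()
}

fn main() {
    let frame: Vec<i16> = vec![3, -4, 3, -4]; // one tiny audio frame
    // Playback path would consume `frame` here (handled by PlatformAudio).
    // Analysis path reads the same samples:
    let level = rms(&frame);
    assert!((level - 3.5355).abs() < 1e-3);
    println!("frame level: {level:.3}");
}
```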
```proto
// Set recording device
message SetRecordingDeviceRequest {
  uint64 platform_audio_handle = 1;
  uint32 index = 2;
}

message SetRecordingDeviceResponse {
  optional string error = 1;
}

// Set playout device
message SetPlayoutDeviceRequest {
  uint64 platform_audio_handle = 1;
  uint32 index = 2;
}
```
How does it handle switching the device at runtime?
WebRTC handles the runtime device switching automatically.
If the current device is removed during a call, it will automatically switch over to the default device, and the call should still be ongoing.
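The fallback behavior described above can be sketched as a small model (hypothetical types; the real switching happens inside WebRTC's ADM, not in SDK code):

```rust
// Hypothetical model (not SDK code): WebRTC's ADM falls back to the
// default device when the active one is removed, keeping the call alive.
#[derive(Debug)]
struct DeviceState {
    available: Vec<String>, // enumerated device names
    active: String,         // device currently in use
}

impl DeviceState {
    /// Called when the OS reports that a device was removed.
    fn on_device_removed(&mut self, name: &str) {
        self.available.retain(|d| d != name);
        if self.active == name {
            // Fall back to the default (first enumerated) device.
            self.active = self.available.first().cloned().unwrap_or_default();
        }
    }
}

fn main() {
    let mut state = DeviceState {
        available: vec!["Default".to_string(), "USB Headset".to_string()],
        active: "USB Headset".to_string(),
    };
    state.on_device_removed("USB Headset");
    // The call continues on the default device.
    assert_eq!(state.active, "Default");
}
```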
**Suitable for:**
- Server-side agents
- Text-to-speech (TTS) audio
- Audio from files or network streams
Or screen share audio right?
Hi @ladvoc, I think I addressed your comments, PTAL.
ladvoc left a comment:
This is looking great, just a few more comments.
    /// ```
    ///
    /// [`recording_device_guid`]: Self::recording_device_guid
    pub fn set_recording_device_by_guid(&self, guid: &str) -> AudioResult<()> {
issue: It looks like these methods that don't accept the new types have been left here for FFI support. A better pattern would be to remove them and add a constructor on the new types to initialize from the raw value (explicitly marked as unchecked, only to be used for FFI):

```rust
impl PlayoutDeviceId {
    #[doc(hidden)]
    pub fn from_unchecked_guid(guid: &str) -> Self {
        Self(guid.to_string())
    }
}
```

Then FFI can still construct as needed and use the existing set/switch methods:

```rust
match ffi_audio.audio.set_recording_device(PlayoutDeviceId::from_unchecked_guid(&req.device_id)) {
    // ...
}
```

    }

    // =========================================================================
    // Audio Processing (AEC, AGC, NS)
nitpick: Prefer grouping related methods into separate impl blocks. impl blocks can also have doc comments attached to them, and each block will get its own section in the generated docs to make it easier for developers to find things. Preview generated docs locally with `cargo doc --no-deps` from within the livekit crate. Example:

```rust
/// Audio Processing (AEC, AGC, NS)
impl PlatformAudio {
    // ...
}
```

    /// // Reset handle references
    /// reset_platform_audio();
    /// ```
    pub fn reset_platform_audio() {
suggestion (non-blocking): Maybe make this an associated function (no `&self`) on PlatformAudio to avoid the extra import for the user:

```rust
impl PlatformAudio {
    pub fn reset() {
        // ...
    }
}
```

User can invoke with `PlatformAudio::reset()`.
Last thing I just noticed:
    std::shared_ptr<RtcRuntime> rtc_runtime() const { return rtc_runtime_; }

    // Device enumeration
    int16_t playout_devices() const;
Should we move the following method to the audio_device.rs file? And add a PeerConnectionFactory::audio_device() -> SharedPtr<AudioDevice> pointing to (AudioDevice/AdmProxy)?
This isn't a strong requirement, but it will make the webrtc-sys codebase look clearer.
Hmm, I see where this comment comes from.
I refactored it into audio_device_controller.cpp and audio_device_controller.rs, and moved the device-handling code into this AudioDeviceController so that the peer connection factory is cleaner.
It is a larger refactoring than I initially expected.
The layers now look like this:
- PlatformAudio
- LkRuntime
- libwebrtc::PeerConnectionFactory
- webrtc-sys::PeerConnectionFactory
- webrtc-sys::AudioDeviceController
- AdmProxy

What each layer does:
- PlatformAudio
  - public Rust API for device enumeration, selection, switching, and platform audio lifecycle
- LkRuntime
  - internal LiveKit runtime wrapper
  - forwards platform-audio operations to libwebrtc
- libwebrtc::PeerConnectionFactory
  - Rust wrapper over the native bridge
  - still owns the main sys factory handle
  - now also caches one sys audio-device controller handle
- webrtc-sys::PeerConnectionFactory
  - native C++ factory object
  - owns the real shared audio-device state via AdmProxy
  - creates and caches one AudioDeviceController
- webrtc-sys::AudioDeviceController
  - focused facade for audio-device/ADM operations only
  - no independent audio state of its own
- AdmProxy
  - the actual implementation behind device enumeration, selection, playout/recording control, and platform ADM activation

How AdmProxy is shared:
- webrtc-sys::PeerConnectionFactory constructs AdmProxy once in its constructor
- it stores that in adm_proxy_
- it also constructs one AudioDeviceController with the same adm_proxy_
- both hold references to the same AdmProxy

So the ownership/reference picture is:
- PeerConnectionFactory
  - owns adm_proxy_
  - owns cached audio_device_
- AudioDeviceController
  - holds a scoped_refptr
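The shared-AdmProxy ownership described above can be sketched in Rust terms, with `Arc` standing in for C++'s `scoped_refptr` (names mirror the layers above; this is an illustrative model, not the actual bindings):

```rust
use std::sync::Arc;

// Stand-in for the C++ AdmProxy: the single owner of device state.
struct AdmProxy {
    // device enumeration/selection state would live here
}

// Focused facade for ADM operations; holds a reference, no state of its own.
struct AudioDeviceController {
    adm: Arc<AdmProxy>,
}

// Constructs AdmProxy once and shares it with one cached controller.
struct PeerConnectionFactory {
    adm_proxy: Arc<AdmProxy>,
    audio_device: AudioDeviceController,
}

impl PeerConnectionFactory {
    fn new() -> Self {
        let adm_proxy = Arc::new(AdmProxy {});
        let audio_device = AudioDeviceController { adm: Arc::clone(&adm_proxy) };
        Self { adm_proxy, audio_device }
    }
}

fn main() {
    let factory = PeerConnectionFactory::new();
    // Both the factory and its controller reference the same AdmProxy.
    assert!(Arc::ptr_eq(&factory.adm_proxy, &factory.audio_device.adm));
    assert_eq!(Arc::strong_count(&factory.adm_proxy), 2);
}
```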
    // Create synthetic ADM for synthetic mode.
    // This pumps the WebRTC audio pipeline without platform audio,
    // allowing FFI callbacks to receive decoded remote audio.
    synthetic_adm_ = webrtc::make_ref_counted<AudioDevice>(env_);
Should we rename AudioDevice to DummyAudioDevice/dummy_audio_device.cpp? (maybe SyntheticAudioDevice)
Good idea, SyntheticAudioDevice is clearer than AudioDevice.
Done with renaming it.
Done with adding livekit-ffi to the changeset and also added …
62cc378 to 5ae4d00 (Compare)
    @@ -136,9 +136,19 @@ export declare enum AudioStreamType {
     */
    export declare enum AudioSourceType {
@ladvoc, I changed the audio_frame proto in this PR; should I commit this pb.d.ts change? Not sure why the Node proto code is in the Rust SDK.
@lukasIO moved the Node bindings to this repo a month or two ago (I don't remember the exact reasons). But yes, the generated Node bindings get committed. However, they are automatically generated by CI so you shouldn't have to worry about generating/committing manually.
    /// ```
    ///
    /// [`recording_device_guid`]: Self::recording_device_guid
    pub fn set_recording_device_by_guid(&self, guid: &str) -> AudioResult<()> {

Good idea, done with addressing the comment.
Landing now
> [!IMPORTANT]
> Merging this pull request will create these releases

# livekit 0.7.40 (2026-05-14)

## Fixes
- feat: add scalability mode for AV1/VP9. - #1076 (@cloudwebrtc)
- Add `LIVEKIT_PREFERRED_HW_ENCODER` to prefer `nvenc` or `vaapi` hardware video encoding when both are available.
- Relocate unrelated types out of `livekit-protocol`

### Get WebRTC ADM into Rust - #1037 (@xianshijing-lk)

This PR introduces platform audio device management via WebRTC's Audio Device Module (ADM).

### Features
- **ADM Proxy**: New `AdmProxy` class that switches between Dummy ADM (synthetic mode) and Platform ADM (real audio I/O)
- **PlatformAudio API**: High-level Rust API for microphone capture and speaker playout with AEC/AGC/NS
- **Device enumeration**: List and select recording/playout devices by index or GUID
- **Mode switching**: Seamlessly switch between synthetic mode (FFI callbacks) and platform mode (native speakers) while audio is active
- **FFI platform audio support**: Expose platform audio device enumeration and selection through `livekit-ffi`
- **Audio processing**: Configure echo cancellation, noise suppression, and auto gain control with platform-specific defaults (hardware on iOS, software elsewhere)

### Audio Modes

| Mode | Recording | Playout | Use Case |
|------|-----------|---------|----------|
| Synthetic | NativeAudioSource | Dummy ADM + FFI | Unity audio, agents |
| Platform | Platform ADM mic | Platform ADM speakers | VoIP with AEC |

### API

```rust
// Create PlatformAudio for microphone/speaker access
let audio = PlatformAudio::new()?;

// Enumerate and select devices
for i in 0..audio.recording_devices() as u16 {
    println!("Mic {}: {}", i, audio.recording_device_name(i));
}
audio.set_recording_device(0)?;

// Create audio track for publishing
let track = LocalAudioTrack::create_audio_track("mic", audio.rtc_source());
```

# livekit-api 0.4.22 (2026-05-14)

## Fixes
- Proper client SDK - #1081 (@stephen-derosa)

# livekit-protocol 0.7.7 (2026-05-14)

## Features
- Relocate unrelated types out of `livekit-protocol`

# livekit-ffi 0.12.57 (2026-05-14)

## Fixes
- feat: add scalability mode for AV1/VP9. - #1076 (@cloudwebrtc)
- Add `LIVEKIT_PREFERRED_HW_ENCODER` to prefer `nvenc` or `vaapi` hardware video encoding when both are available.
- Reword audio filter logs to be less confusing - #1092 (@1egoman)

### Get WebRTC ADM into Rust - #1037 (@xianshijing-lk)

(Same release notes as under livekit 0.7.40 above.)

# libwebrtc 0.3.33 (2026-05-14)

## Fixes
- feat: add scalability mode for AV1/VP9. - #1076 (@cloudwebrtc)
- Add `LIVEKIT_PREFERRED_HW_ENCODER` to prefer `nvenc` or `vaapi` hardware video encoding when both are available.
- Relocate unrelated types out of `livekit-protocol`

### Get WebRTC ADM into Rust - #1037 (@xianshijing-lk)

(Same release notes as under livekit 0.7.40 above.)

# webrtc-sys 0.3.31 (2026-05-14)

## Fixes
- chore: bump libwebrtc version to webrtc-51ef663
- Add `LIVEKIT_PREFERRED_HW_ENCODER` to prefer `nvenc` or `vaapi` hardware video encoding when both are available.

### fix: fix LICENSE.md generation in webrtc build scripts
- Add fix_license_json_parsing.patch to handle GN warnings in JSON output
- Enable add_licenses.patch for iOS and Android builds (was commented out)
- Restore LICENSE.md copy in iOS build script (regression from #1053)

The license generation script was failing because `gn desc --format=json` outputs warnings before the JSON when certain build args trigger deprecation notices. The new patch strips non-JSON content before parsing.

### Get WebRTC ADM into Rust - #1037 (@xianshijing-lk)

(Same release notes as under livekit 0.7.40 above.)

Co-authored-by: knope-bot[bot] <152252888+knope-bot[bot]@users.noreply.github.com>
Summary
This PR implements Platform Audio support for the LiveKit Rust SDK, enabling WebRTC's built-in audio device handling with microphone capture and speaker playout. The implementation introduces a handle-based PlatformAudio API that coexists with the existing NativeAudioSource for manual audio pushing.
Key Features
Design Document
See docs/ADM_PROXY_DESIGN.md for full architecture details including:
API Overview
```rust
use livekit::prelude::*;

// Create PlatformAudio instance (enables ADM recording)
let audio = PlatformAudio::new()?;

// Enumerate and select devices
for i in 0..audio.recording_devices() as u16 {
    println!("Mic [{}]: {}", i, audio.recording_device_name(i));
}
audio.set_recording_device(0)?;

// Connect and publish
let (room, _) = Room::connect(&url, &token, RoomOptions::default()).await?;
let track = LocalAudioTrack::create_audio_track("mic", audio.rtc_source());
room.local_participant().publish_track(LocalTrack::Audio(track), opts).await?;

// Cleanup - just drop the handle
room.close().await?;
drop(audio); // ADM recording disabled when all handles released
```
Testing
Run Standalone Tests (no LiveKit server required)

```sh
# Set custom WebRTC build path
export LK_CUSTOM_WEBRTC="/path/to/webrtc-sys/libwebrtc/mac-arm64-debug"

# Run standalone PlatformAudio tests
cargo test -p livekit --test platform_audio_test test_platform_audio_standalone -- --nocapture

# Run FFI request handler tests
cargo test -p livekit-ffi requests::tests -- --nocapture
```
Run E2E Integration Tests (requires LiveKit server)

Start a local LiveKit server first, then:

```sh
LIVEKIT_URL=ws://localhost:7880 \
LIVEKIT_API_KEY=devkey \
LIVEKIT_API_SECRET=secret \
cargo test -p livekit --test platform_audio_test --features __lk-e2e-test -- --nocapture
```
Test Coverage
| Category | Tests | Description |
|----------|-------|-------------|
| Standalone - Creation | 1 | PlatformAudio creation, device enumeration |
| Standalone - Ref Counting | 1 | Clone, sharing, drop behavior |
| Standalone - Device Selection | 1 | Set devices, invalid index handling |
| Standalone - Processing | 1 | AEC/AGC/NS configuration, hardware availability |
| Standalone - Reset | 1 | reset_platform_audio() function |
| Standalone - Lifecycle | 1 | Full create→configure→use→release cycle |
| FFI - Handlers | 6 | NewPlatformAudio, GetDevices, SetDevice, handle lifecycle |
| E2E - Room Connection | 4+ | Platform audio with room, two participants, device switching |
All tests handle missing audio devices gracefully (CI-friendly).
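One way such a CI-friendly guard can look (a sketch, not the actual test code; `recording_devices` stands in for a probe like `audio.recording_devices()`):

```rust
// Sketch of a CI-friendly guard: skip device-dependent assertions when
// no audio hardware is present (e.g. on headless CI runners).
fn run_device_test(recording_devices: u16) -> &'static str {
    if recording_devices == 0 {
        // No hardware available: treat as a graceful skip, not a failure.
        return "skipped: no recording devices";
    }
    // ... device-dependent assertions would run here ...
    "ran"
}

fn main() {
    assert_eq!(run_device_test(0), "skipped: no recording devices");
    assert_eq!(run_device_test(2), "ran");
}
```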
Run the Example

List Audio Devices

```sh
cargo run -p basic_room -- --list-devices
```

Connect with Platform Audio (microphone capture)

```sh
LIVEKIT_URL=wss://your-server.livekit.cloud \
LIVEKIT_API_KEY=your-key \
LIVEKIT_API_SECRET=your-secret \
cargo run -p basic_room -- --platform-audio
```

Connect with File Audio

```sh
cargo run -p basic_room -- --file path/to/audio.raw
```

Connect with Both Platform Audio and File

```sh
cargo run -p basic_room -- --platform-audio-and-file path/to/audio.raw
```
WebRTC Build Requirements
The external_audio_source.patch must be applied to WebRTC. The patch is automatically applied by all platform build scripts:
For local development, set LK_CUSTOM_WEBRTC to point to your patched WebRTC build.
Known Limitations
| Limitation | Details |
|------------|---------|
| Process-global | Audio configuration affects all rooms in the process |
| Device indices | May change on hot-plug; match by name for persistence |
| Single device track | One device audio track per ADM (use NativeAudioSource for additional streams) |
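The "match by name" workaround for unstable device indices can be sketched as follows (a hypothetical helper; `device_names` stands in for an enumeration like iterating `0..audio.recording_devices()` and collecting `audio.recording_device_name(i)`):

```rust
// Sketch: persist the device *name*, then resolve it back to a current
// index at startup, since indices may change on hot-plug.
fn index_for_name(device_names: &[String], wanted: &str) -> Option<u16> {
    device_names
        .iter()
        .position(|n| n == wanted)
        .map(|i| i as u16)
}

fn main() {
    let names = vec![
        "MacBook Pro Microphone".to_string(),
        "USB Headset".to_string(),
    ];
    // An index resolved from the persisted name survives device reordering.
    assert_eq!(index_for_name(&names, "USB Headset"), Some(1));
    assert_eq!(index_for_name(&names, "Unplugged Mic"), None);
}
```

The resolved index can then be passed to a selection call such as `set_recording_device`.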