
Diarization benchmark reports 80.8% DER on AMI ES2004a (target <30%) #752

Description

@Alex-Wengg

Summary

The automated diarization benchmark on PR #744 reports a DER of 80.8% and a JER of 80.9% on AMI ES2004a. That is roughly 2.7× the <30% DER target and far outside the 18–30% research baseline. At ~80% DER, most of the reference speech time is being missed, falsely detected, or attributed to the wrong speaker, so the diarizer is essentially failing rather than merely degraded.
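For reference when reading these numbers: DER is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time divided by total reference speech time, so it can exceed 100% when false alarms are large. The Swift sketch below shows that breakdown and the ratio between the reported DER and the target. `DERComponents` is a hypothetical type for illustration, not part of FluidAudio's API.

```swift
import Foundation

/// Hypothetical container for DER error components, all in seconds.
struct DERComponents {
    let missedSpeech: Double          // reference speech with no hypothesis speaker
    let falseAlarm: Double            // hypothesis speech where the reference is silent
    let speakerConfusion: Double      // speech assigned to the wrong (mapped) speaker
    let totalReferenceSpeech: Double  // denominator: total reference speech time

    /// DER = (miss + false alarm + confusion) / total reference speech.
    var der: Double {
        precondition(totalReferenceSpeech > 0, "reference must contain speech")
        return (missedSpeech + falseAlarm + speakerConfusion) / totalReferenceSpeech
    }
}

// Ratio between the reported DER and the target stated in this issue.
let reportedDER = 0.808
let targetDER = 0.30
let ratio = reportedDER / targetDER
print(String(format: "reported DER is %.2fx the target", ratio))
```

If the benchmark logged these three components separately, it would show at a glance whether the 80.8% is dominated by confusion (pointing at clustering or speaker mapping) or by miss/false alarm (pointing at segmentation or a misaligned reference).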

Benchmark comment: #744 (comment)

Reported results (AMI Corpus ES2004a, 1049.0s audio)

| Metric | Value  | Target | Status |
|--------|--------|--------|--------|
| DER    | 80.8%  | <30%   | ⚠️     |
| JER    | 80.9%  | <25%   | ⚠️     |
| RTFx   | 29.91x | >1.0x  |        |

Pipeline timing looked nominal (Segmentation 10.5s, Embedding 17.5s, Clustering 7.0s), so this is an accuracy failure rather than a crash or timeout.
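As a quick consistency check, assuming RTFx is audio duration divided by processing time, the three stage timings alone reproduce an RTFx very close to the reported value, which supports reading this as an accuracy problem rather than a performance one:

```swift
import Foundation

// Stage timings reported by the CI benchmark, in seconds.
let stageTimings: [String: Double] = [
    "Segmentation": 10.5,
    "Embedding": 17.5,
    "Clustering": 7.0,
]
let audioDuration = 1049.0  // ES2004a length from the benchmark header

// RTFx = audio duration / processing time; values > 1 mean faster than real time.
let processingTime = stageTimings.values.reduce(0, +)
let rtfx = audioDuration / processingTime
print(String(format: "stages: %.1fs, RTFx ≈ %.2fx", processingTime, rtfx))
```

The stages sum to 35.0s (≈29.97x). The small gap to the reported 29.91x presumably comes from work outside these three stages; that is an assumption, not something the benchmark output states.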

Notes

  • PR #744 (fix(asr/nemotron): native-Swift mel front-end to fix iPadOS cold-start zero output (#739)) only touches the Nemotron ASR front-end, so the diarization failure is very likely pre-existing and unrelated to that PR's changes. It is being surfaced by the CI benchmark that runs on every PR.
  • Need to confirm whether this reproduces on main, i.e., whether it is a real regression in the diarization pipeline or a benchmark/CI harness issue (e.g., wrong reference RTTM, model download/version mismatch, or a bug in the eval collar or speaker mapping).
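One cheap way to rule out a wrong reference RTTM is to parse it and check the file ID, speaker set, total speaker time, and whether any turn runs past the end of the 1049.0s audio. The sketch below assumes the standard 10-field RTTM layout; `parseRTTM`, `summarize`, and the sample lines are hypothetical and are not taken from the FluidAudio harness or the real ES2004a annotation.

```swift
import Foundation

/// One speaker turn from an RTTM file (hypothetical type for this sketch).
struct SpeakerTurn {
    let fileID: String
    let onset: Double     // seconds
    let duration: Double  // seconds
    let speaker: String
}

/// Parses `SPEAKER` lines of a 10-field RTTM file:
/// SPEAKER <file> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
/// Lines that are not well-formed SPEAKER records are skipped.
func parseRTTM(_ text: String) -> [SpeakerTurn] {
    var turns: [SpeakerTurn] = []
    for line in text.split(whereSeparator: \.isNewline) {
        let fields = line.split(whereSeparator: { $0 == " " || $0 == "\t" }).map(String.init)
        guard fields.count >= 8, fields[0] == "SPEAKER",
              let onset = Double(fields[3]), let duration = Double(fields[4]) else { continue }
        turns.append(SpeakerTurn(fileID: fields[1], onset: onset, duration: duration, speaker: fields[7]))
    }
    return turns
}

/// Facts worth comparing against what the benchmark expects for this recording.
struct RTTMSummary {
    let fileIDs: Set<String>
    let speakers: Set<String>
    let speakerTime: Double  // sum of turn durations (overlap counted once per active speaker)
    let turnsPastEnd: Int    // turns ending after the audio does: a sign of a mismatched reference
}

func summarize(_ turns: [SpeakerTurn], audioDuration: Double) -> RTTMSummary {
    RTTMSummary(
        fileIDs: Set(turns.map(\.fileID)),
        speakers: Set(turns.map(\.speaker)),
        speakerTime: turns.reduce(0.0) { $0 + $1.duration },
        turnsPastEnd: turns.filter { $0.onset + $0.duration > audioDuration }.count
    )
}

// Illustrative input only; these are not real ES2004a annotations.
let sample = """
SPEAKER ES2004a 1 0.50 2.25 <NA> <NA> spk_A <NA> <NA>
SPEAKER ES2004a 1 2.00 1.50 <NA> <NA> spk_B <NA> <NA>
SPEAKER ES2004a 1 1048.0 5.00 <NA> <NA> spk_A <NA> <NA>
"""
let summary = summarize(parseRTTM(sample), audioDuration: 1049.0)
print(summary.fileIDs, summary.speakers.sorted(), summary.speakerTime, summary.turnsPastEnd)
```

A nonzero `turnsPastEnd`, an unexpected file ID, or a speaker count that differs from the meeting's participants would all point at the harness rather than the models.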

Next steps

  • Reproduce locally: swift run fluidaudiocli diarization-benchmark on AMI ES2004a
  • Confirm whether main shows the same ~80% DER or if this is PR-branch specific
  • Check the diarization models actually downloaded/compiled (vs. a fallback) in the CI run
  • Verify the reference annotation and DER scoring (collar, speaker mapping) are correct
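A speaker-mapping bug alone can produce a DER in this range, because hypothesis cluster IDs are arbitrary and must be matched to reference speakers before confusion is counted. The sketch below is a simplified frame-level scorer under stated assumptions: one speaker per frame, no overlap, no collar, and brute-force search over one-to-one mappings (real scorers use an optimal assignment such as the Hungarian algorithm and usually a forgiveness collar). `frameDER` and `bestMapping` are hypothetical helpers, not FluidAudio functions.

```swift
import Foundation

/// Frame-level DER with one label per frame (nil = silence) and a fixed
/// hypothesis-to-reference speaker mapping. No collar, no overlap handling.
func frameDER(reference: [String?], hypothesis: [String?], mapping: [String: String]) -> Double {
    precondition(reference.count == hypothesis.count, "label sequences must align")
    var errors = 0
    var refSpeech = 0
    for (r, h) in zip(reference, hypothesis) {
        if r != nil { refSpeech += 1 }
        switch (r, h) {
        case let (ref?, hyp?):
            if mapping[hyp] != ref { errors += 1 }  // speaker confusion
        case (.none, .none):
            break                                   // both silent
        default:
            errors += 1                             // missed speech or false alarm
        }
    }
    return refSpeech == 0 ? 0 : Double(errors) / Double(refSpeech)
}

/// Exhaustive search over one-to-one mappings; only practical for a handful of speakers.
func bestMapping(reference: [String?], hypothesis: [String?]) -> [String: String] {
    let refSpeakers = Set(reference.compactMap { $0 }).sorted()
    let hypSpeakers = Set(hypothesis.compactMap { $0 }).sorted()
    var best: [String: String] = [:]
    var bestDER = Double.infinity

    func search(_ i: Int, _ used: Set<String>, _ current: [String: String]) {
        if i == hypSpeakers.count {
            let d = frameDER(reference: reference, hypothesis: hypothesis, mapping: current)
            if d < bestDER { bestDER = d; best = current }
            return
        }
        search(i + 1, used, current)  // leave this hypothesis speaker unmapped
        for r in refSpeakers where !used.contains(r) {
            var next = current
            next[hypSpeakers[i]] = r
            search(i + 1, used.union([r]), next)
        }
    }
    search(0, [], [:])
    return best
}

// Toy labels: the hypothesis is perfect up to a relabeling of its cluster IDs.
let ref: [String?] = ["A", "A", "B", "B", nil, "C"]
let hyp: [String?] = ["spk1", "spk1", "spk0", "spk0", nil, "spk2"]
let naive = frameDER(reference: ref, hypothesis: hyp, mapping: ["spk0": "A", "spk1": "B", "spk2": "C"])
let optimal = frameDER(reference: ref, hypothesis: hyp, mapping: bestMapping(reference: ref, hypothesis: hyp))
print(naive, optimal)
```

On this toy input the fixed mapping scores 0.8 while the optimal mapping scores 0.0, so comparing the harness's mapping step against an optimal assignment on the same ES2004a output is a quick way to separate a scoring bug from a model failure.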

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working), speaker-diarization (Issues related to speaker diarization)
