Accuracy
Delivers high diarization performance with DER and speaker-attribution results.
Robustness
Stays consistent in noisy, overlapping, & multi-speakers conversations.
Speed
Processes full diarization and speaker-segmentation output in seconds.
What can go wrong?

Broadcast Interview - Radio interview speech.

Clinical - Clinical child assessment interviews.

Courtroom - Formal multi-speaker legal speech.

Conversational telephone speech - Two-speaker telephone conversations.

Map task - Task-oriented dyadic dialogue.

Meeting - Spontaneous multi-speaker meetings.

Restaurant - Noisy informal group conversations.

Sociolinguistic (field) - Field sociolinguistic interviews.

Sociolinguistic (lab) - Controlled sociolinguistic interviews.

Web video - Diverse online video speech.
DIHARD Broadcast
DIHARD Clinical
DIHARD Court
DIHARD CTS
DIHARD Maptask
DIHARD Meeting
DIHARD Restaurant
DIHARD Socio Field
DIHARD Socio Lab
DIHARD Webvideo
pyannoteAI - Precision-2
pyannoteAI - OSS Community-1
AssemblyAI - Universal
Deepgram - Nova-3
ElevenLabs - Scribe-v1
Soniox - STT-async-preview-v1
Speechmatics - Enhanced
OpenAI - GPT-4o-transcribe-diarize
AWS - Transcribe, word-level
NVIDIA - OSS NeMo streaming sortformer (very high latency)


