Almost two years after the release of pyannote.audio 3.1, and with the recent upgrade of our premium model precision-2, we are proud to unveil pyannote.audio 4.0 together with community-1, the latest iteration of our open-source pretrained speaker diarization model. This release represents a significant milestone in our journey to offer state-of-the-art speaker diarization technology.
community-1 is aptly named – it embodies the collaborative spirit that has driven pyannote.audio to 8k+ GitHub stars, 140k unique registered users, and 45M monthly downloads on Hugging Face. Over the past year, our community has highlighted crucial pain points that we have addressed head-on. Two challenges stood out as critical: performance gaps in real-world applications and the complexity of reconciling diarization output with speech-to-text timestamps.
community-1 tackles each of these challenges with targeted improvements that maintain our commitment to accessible, cutting-edge open-source tools.
Setting a new benchmark for open-source diarization
community-1 represents a major leap forward in open-source speaker diarization, establishing itself as the best open-source solution available and significantly outperforming pyannote.audio 3.1 across all key metrics.

Better speaker assignment and counting
While segmentation has always been pyannote's core strength, community-1 brings significant improvements to speaker assignment and counting. The model demonstrates marked reductions in speaker confusion (speech assigned to the wrong speaker) while preserving the same segmentation performance (voice activity and overlapped speech detection) as pyannote.audio 3.1, the model that the community knows and loves.
These improvements translate directly into more reliable speaker counting and consistent speaker identity tracking across entire conversations – crucial for downstream applications like meeting transcription and call center analytics.
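For instance, here is a minimal sketch of what speaker counting looks like in practice. The checkpoint identifier is our assumption, and we assume the pipeline output exposes the familiar pyannote.core.Annotation interface from 3.1 – check the release notes for the exact 4.0 API:

```python
from pyannote.audio import Pipeline

# assumed checkpoint identifier; gated models may require a Hugging Face token
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-community-1")
diarization = pipeline("meeting.wav")

# assuming the result exposes the pyannote.core.Annotation interface (as in 3.1):
speakers = diarization.labels()
print(f"estimated number of speakers: {len(speakers)}")
for speaker in speakers:
    print(f"{speaker}: {diarization.label_duration(speaker):.1f}s of speech")
```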
Streamlined reconciliation with STT timestamps
Whisper and other speech-to-text models have become integral to many pyannote workflows, yet reconciling their timestamps with pyannote's precise diarization output remains challenging. STT models often struggle with overlapped speech and short backchannels, while pyannote excels at detecting these nuances.
community-1 introduces an exclusive speaker diarization mode to address this fundamental challenge: in “exclusive” mode, only one speaker (the one most likely to be transcribed) is active at any given time, dramatically simplifying the alignment between STT word timestamps and speaker labels. This feature, initially developed for precision-2, is now available in community-1, and it forms the foundation for exciting new products pyannoteAI will release in the coming months 👀.
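In code, the idea might look like the following sketch. The `exclusive_speaker_diarization` accessor is an assumption on our part rather than the confirmed 4.0 interface, so refer to the documentation for the exact name:

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-community-1")
output = pipeline("meeting.wav")

# Exclusive diarization: at most one speaker is active at any given time,
# so each STT word timestamp falls inside exactly one speaker turn.
# NOTE: this accessor name is an assumption, not the confirmed 4.0 API.
exclusive = output.exclusive_speaker_diarization

for segment, _, speaker in exclusive.itertracks(yield_label=True):
    print(f"{segment.start:6.1f}s - {segment.end:6.1f}s  {speaker}")
```

With a single active speaker per time frame, assigning a transcribed word to a speaker reduces to a simple interval lookup instead of heuristics over overlapping turns.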
community-1 hosted at cost
To further democratize access to high-quality speaker diarization, we are introducing hosted community-1 on pyannoteAI's platform – available at cost. This provides an alternative to existing hosting options, with the added benefit of seamless switching between our open-source community-1 and premium precision-2 models.
You can now switch from local community-1 to hosted community-1, and then to precision-2, by changing a single line of code:
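A sketch of what that one-line switch looks like. The hosted checkpoint identifiers and the token parameter below are illustrative assumptions; the documentation has the exact names:

```python
from pyannote.audio import Pipeline

# 1. local, open-source community-1 (weights downloaded from Hugging Face)
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-community-1")

# 2. hosted community-1 on the pyannoteAI platform (assumed identifier)
pipeline = Pipeline.from_pretrained(
    "pyannoteAI/speaker-diarization-community-1", token="PYANNOTEAI_API_KEY")

# 3. premium precision-2, the same one-line change (assumed identifier)
pipeline = Pipeline.from_pretrained(
    "pyannoteAI/speaker-diarization-precision-2", token="PYANNOTEAI_API_KEY")

diarization = pipeline("meeting.wav")  # identical call in all three cases
```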
No more infrastructure headaches. No setup complexity. Just the same powerful models you know and trust, ready to integrate into your applications.
The Python code above is simply a wrapper around our API, which you can of course use outside of the Python environment: read the documentation.
Faster training & optimized tooling
In developing precision-2, we implemented important infrastructure improvements that accelerated our internal development. We wanted to share these optimizations with the entire community through the release of pyannote.audio 4.0, on which community-1 is also built.
Metadata caching and optimized dataloaders make training on large-scale datasets dramatically faster: in our internal benchmarks, these improvements delivered a 15x speed-up on large-scale training pipelines. While these enhancements will primarily benefit power users training custom models on substantial datasets, we believe in giving back to the community that has supported us. The same optimizations that power our premium models are now freely available to advance research and development across the entire ecosystem.
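For those power users, a hedged sketch of what a cached training setup might look like. The `cache` argument, protocol name, and hyperparameters below are assumptions based on recent pyannote.audio task APIs; consult the 4.0 training documentation for the definitive interface:

```python
import pytorch_lightning as pl
from pyannote.database import registry
from pyannote.audio.tasks import SpeakerDiarization
from pyannote.audio.models.segmentation import PyanNet

# register a custom dataset described by a pyannote.database protocol
registry.load_database("database.yml")
protocol = registry.get_protocol("MyDataset.SpeakerDiarization.MyProtocol")

# assumed: `cache` persists pre-computed protocol metadata to disk so that
# subsequent runs skip re-indexing the whole dataset, while `num_workers`
# controls the optimized dataloaders
task = SpeakerDiarization(
    protocol, duration=5.0, cache="metadata.cache", num_workers=8)

model = PyanNet(task=task)
pl.Trainer(max_epochs=10).fit(model)
```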
Looking ahead
community-1 represents more than just a model update – it is a testament to our commitment to putting cutting-edge science in the hands of builders across the globe. We extend our heartfelt thanks to every member of our community who has contributed code, reported issues, or provided feedback. Your contributions continue to make our open-source ecosystem thrive.
We invite you to join us for the community-1 release webinar on October 7th at 5pm CET, where we'll dive deeper into the technical details, demonstrate the new features, and answer your questions. Your feedback during this session will help shape the future direction of pyannoteAI as a whole.
Together, we're building the foundation for the next generation of voice AI applications.
Join the Discord community, try community-1 today on Hugging Face, or explore hosted options on pyannoteAI.