
We're happy to announce that pyannoteAI now hosts open-source speaker diarization models through our API. Starting today, you can access state-of-the-art OSS diarization models alongside our Precision models, all through the same unified API, with zero infrastructure setup required. This means you can now experiment with open-source models, prototype faster, and scale your diarization pipeline without managing GPU clusters, dependency conflicts, or deployment complexity.
Whether you're a developer building production applications or a researcher exploring new approaches, pyannoteAI provides the infrastructure so you can focus on what matters: your application logic.
What this unlocks in practice:
Zero GPU setup: run Community-1 from the cloud, with no local CUDA drivers and no container maintenance.
No infrastructure ops: we handle scaling, scheduling, and storage.
Models ready via API: send an audio URL or an uploaded file and receive diarization annotations.
👉 Get the world’s most advanced open-source diarization model running in just 30 seconds!
Why We Decided to Host OSS Models
At pyannote, we've always believed in supporting the open-source community. The pyannote.audio toolkit has been open-source since its inception, and we've seen firsthand how OSS accelerates innovation in speech processing research.
By hosting open-source diarization models on our infrastructure, we're addressing three critical pain points:
Lower the barrier to entry for researchers and devs. Many users told us their experiments stalled on ops: provisioning GPUs, resolving dependency conflicts, or adapting models to cloud runtimes. Hosting removes that friction so you can focus on model behavior and downstream evaluation.
Faster iteration. Community-1 brings key improvements to speaker counting, speaker assignment, and STT reconciliation; hosting it at cost helps more teams validate results on real data faster.
Offer a single, consistent integration path. You can flip one parameter to switch between Community-1 (hosted) and our Precision-2 models: same request pattern, same outputs, which drastically simplifies product development and A/B testing.
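For instance, switching models is a one-field change in the request payload. The sketch below is illustrative only; the exact request shape is walked through in the tutorial later in this post.

```python
# Illustrative sketch: the same request body works for both models;
# only the "model" field changes (values as documented in the tutorial below).
payload = {"url": "https://example.com/meeting.wav", "model": "community-1"}
payload = {"url": "https://example.com/meeting.wav", "model": "precision-2"}  # one-line switch
```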
Deep dive into the models
We now offer two distinct models through the pyannoteAI API, each optimized for different use cases and accuracy requirements.
Hosted Community-1: Open-source model
Who it’s for: researchers, developers, rapid prototypers, and teams who value transparency and reproducibility.
Typical use cases: benchmarking, academic work, product iteration, prototyping, and custom diarization deployment (e.g., dataset-specific fine-tuning or custom reconciliation with STT).
Benefits:
Open-source flexibility: full transparency into model weights and code paths (useful for debugging and research).
State-of-the-art OSS baseline: Community-1 is the newest open-source pyannote diarization model and significantly improves speaker assignment and counting over prior OSS releases.
Cost efficiency: hosted at cost, which is attractive for experimentation and low-volume workloads.
Trade-offs: if you scale to high throughput or need the last few percentage points of accuracy in noisy or edge conditions, you may need extra engineering effort (careful batching, customized post-processing), or you may want to evaluate the Precision series for production reliability.
Precision-2: Commercial model
Who it’s for: startups, SMEs, and enterprises that need production-grade accuracy, enterprise support, and a deeper feature set for products such as call analytics, meeting products, broadcasting workflows, and transcription services.
Typical use cases: phone call analytics, meeting transcription with speaker attribution, video dubbing, timestamp-critical workflows, and large-scale monitored call-center pipelines.
Benefits:
Higher accuracy and robustness: tuned on large training runs, Precision models systematically reduce speaker confusion, missed detections, and false alarms relative to OSS baselines. Precision-2 in particular was designed for more accurate speaker assignment and timestamp precision.
Production-oriented tooling: lower latency options, tighter integration into enterprise flows, and support for large workloads.
Precision-2 headline numbers: our latest flagship is ~28% more accurate than Community-1 on pyannote’s internal benchmarks, a meaningful delta for applications where human-in-the-loop correction is expensive.
How to choose between the models
In short: choose Community-1 for benchmarking, academic work, prototyping, and cost-efficient experimentation where transparency and reproducibility matter most. Choose Precision-2 when you need the highest accuracy, robustness on noisy or complex audio, and production-grade support at scale. Because both models share the same API, you can start with Community-1 and switch later by changing a single parameter.
Tutorial: accessing models via API
Using pyannoteAI models (whether open-source or commercial) is straightforward. Here's how to diarize an audio file in less than five minutes.
Step 1: Get Your API Token
Sign up at pyannoteAI and generate an API token from your dashboard. You'll use this token to authenticate all API requests.
Step 2: Send a Diarization Request
The pyannoteAI API accepts audio file URLs and returns structured job data in JSON format. Here's a minimal example using cURL:
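(The endpoint path below is a sketch; confirm the exact path in the API reference. The placeholders are explained right after the example.)

```bash
# Submit a diarization job (endpoint path is illustrative; see the API docs).
curl -X POST https://api.pyannote.ai/v1/diarize \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/meeting.wav", "model": "community-1"}'
```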
Response:
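(The field names below are illustrative of the job object you get back; check the API reference for the exact schema.)

```json
{
  "jobId": "01234567-89ab-cdef-0123-456789abcdef",
  "status": "created"
}
```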
Replace YOUR_API_TOKEN with your actual token and https://example.com/meeting.wav with your audio file URL. The model parameter accepts:
community-1 for open-source diarization
precision-2 for best-in-class performance
Step 3: Get and Parse the Results
Use the Get Job endpoint to poll for the job results until the status is succeeded. Results are automatically deleted after 24 hours, so make sure to save them. For production use, we recommend setting up webhooks to receive results automatically.
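For example, with cURL (the endpoint path and job ID below are placeholders; see the Get Job reference in the docs):

```bash
# Poll the job; repeat until "status" is "succeeded".
curl https://api.pyannote.ai/v1/jobs/YOUR_JOB_ID \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```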
The API returns a JSON response with speaker segments and timestamps:
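(The segment field names below are illustrative; consult the API reference for the exact schema.)

```json
{
  "jobId": "01234567-89ab-cdef-0123-456789abcdef",
  "status": "succeeded",
  "output": {
    "diarization": [
      { "start": 0.5, "end": 4.2, "speaker": "SPEAKER_00" },
      { "start": 4.6, "end": 9.8, "speaker": "SPEAKER_01" },
      { "start": 10.1, "end": 13.7, "speaker": "SPEAKER_00" }
    ]
  }
}
```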
Each segment indicates when a speaker was active, making it trivial to integrate with transcription pipelines, analytics dashboards, or conversational AI systems.
Step 4: Integrate into Your Application
Because pyannoteAI handles all infrastructure complexity, integrating diarization into your application requires minimal code. Here's a Python example:
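(This is a minimal sketch built on the requests library; the endpoint paths and response field names mirror the illustrative examples above, so check them against the API reference before shipping.)

```python
import time

import requests

API_TOKEN = "YOUR_API_TOKEN"  # generated from your pyannoteAI dashboard
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}

# Endpoint paths and response field names are illustrative; confirm them in the API docs.
BASE_URL = "https://api.pyannote.ai/v1"


def diarize(audio_url: str, model: str = "community-1") -> list[dict]:
    """Submit a diarization job and poll until it succeeds."""
    # 1. Submit the job with the audio URL and the chosen model.
    job = requests.post(
        f"{BASE_URL}/diarize",
        headers=HEADERS,
        json={"url": audio_url, "model": model},
        timeout=30,
    )
    job.raise_for_status()
    job_id = job.json()["jobId"]

    # 2. Poll the Get Job endpoint until the status is "succeeded".
    while True:
        result = requests.get(f"{BASE_URL}/jobs/{job_id}", headers=HEADERS, timeout=30)
        result.raise_for_status()
        body = result.json()
        if body["status"] == "succeeded":
            return body["output"]["diarization"]
        if body["status"] in ("failed", "canceled"):
            raise RuntimeError(f"Job {job_id} ended with status {body['status']}")
        time.sleep(5)


if __name__ == "__main__":
    segments = diarize("https://example.com/meeting.wav", model="community-1")
    for segment in segments:
        print(f'{segment["speaker"]}: {segment["start"]:.1f}s -> {segment["end"]:.1f}s')
```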
No PyTorch installation. No CUDA setup. No GPU management. Just a simple HTTP request that returns production-ready diarization results.
👉 Start running Community-1 with pyannote
Get ready to simplify your speaker diarization pipeline
Hosting Community-1 through the pyannote API gives you the best of both worlds: the transparency and flexibility of open source with the convenience and operational reliability of a managed API. For rapid prototyping, evaluation, and reproducible research, Community-1 hosted at cost is a fast path. For production systems that require the smallest possible diarization error and the most robust handling of noisy or complex audio, Precision-2 is the recommended choice.
Start today to:
Try Community-1 (hosted at cost) to accelerate your experiments and reduce ops overhead.
Evaluate Precision-2 on a representative slice of your production data to measure the practical accuracy gains and reduced correction cost.
Read the API docs for request examples, upload flows, webhook integration, and model parameters.
At pyannote, our mission is twofold: support the OSS community and deliver production-grade tools for teams building real-world voice applications. Hosting Community-1 is one more step toward that mission, and it’s designed so you can move quickly from experimental idea to robust product.
We’re really looking forward to seeing what you build with both of our diarization models!