ElevenLabs launches Scribe v2 Realtime speech-to-text model with 93.5% accuracy

Ayaan Zaveri

Introduction to Scribe v2 Realtime

ElevenLabs has recently unveiled Scribe v2 Realtime, an advanced, ultra-low latency Speech-to-Text (STT) model optimized for agentic and real-time conversational AI applications elevenlabs.io+2. Launched in mid-November 2025, this model marks a significant leap in voice AI technology, setting new benchmarks for speed, accuracy, and multilingual support in live transcription elevenlabs.io+3. It represents a substantial improvement over its predecessor, Scribe v1, particularly in latency and real-time capabilities elevenlabs.io.

What is Scribe v2 Realtime?

Scribe v2 Realtime is a cutting-edge speech recognition model designed to convert spoken audio into text with exceptional speed and precision. It is built from the ground up with predictive transcription capabilities, allowing for highly responsive and natural interactions news.ycombinator.com.

Why it Matters

This advancement enhances real-time communication and accessibility across various sectors. By providing highly accurate and rapid transcription, Scribe v2 Realtime is poised to revolutionize how businesses and individuals interact with voice AI, improving efficiency and user experience in critical applications elevenlabs.io+2.

Unmatched Performance and Accuracy

Scribe v2 Realtime is engineered for superior performance, particularly in challenging real-world audio conditions.

Ultra-Low Latency

The model transcribes speech in as little as 30 to 80 milliseconds (ms) elevenlabs.io. More broadly, it is reported to deliver live transcription in under 150 ms, which suits applications that need text output from spoken audio almost immediately elevenlabs.io+6. At that speed, conversational AI systems can respond without a noticeable pause, which is crucial for natural user interactions elevenlabs.io.

Superior Accuracy

ElevenLabs positions Scribe v2 Realtime as the most accurate model in its category, delivering "human-quality" transcription elevenlabs.io+2. It achieved 93.5% accuracy on the FLEURS multilingual benchmark across 30 languages, outperforming competitors such as Gemini 2.5 Flash (91.4%) and GPT-4o Mini Transcribe (90.7%) linkedin.com+3. The model is specifically trained to handle diverse accents and poor audio quality, so it can capture user intent even in noisy environments elevenlabs.io+2.
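To make the accuracy figures concrete, here is a minimal sketch of how a word error rate (WER) is typically computed, and how an accuracy percentage can be derived from it. It assumes that accuracy is reported as 1 minus WER; the article does not state the exact scoring method used for the FLEURS numbers, and the function names `word_error_rate` and `accuracy_from_wer` are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def accuracy_from_wer(wer: float) -> float:
    """Assumed convention: accuracy is the complement of WER."""
    return 1.0 - wer


if __name__ == "__main__":
    wer = word_error_rate("the quick brown fox jumps",
                          "the quick brown box jumps")
    print(f"WER: {wer:.2f}, accuracy: {accuracy_from_wer(wer):.2f}")
```

Under this convention, a reported 93.5% accuracy would correspond to a WER of about 6.5%, that is, roughly 6 or 7 word-level errors per 100 reference words.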

Extensive Multilingual Support

A key strength of Scribe v2 Realtime is its multilingual support. It covers over 90 languages, with some sources indicating support for 99 languages, including major global languages like English, French, German, Italian, Spanish, Portuguese, Hindi, and Japanese threads.com+7. This includes 11 Indian languages, such as Hindi, Tamil, Malayalam, and Telugu linkedin.com+2. The model achieved the lowest Word Error Rate (WER) on the FLEURS multilingual benchmark, underscoring its global applicability elevenlabs.io.

Advanced Features for Enhanced Control

Scribe v2 Realtime incorporates several innovative features designed to improve transcription quality and user experience:

  • Negative Latency Prediction: This feature enhances user experience by predicting the next word and punctuation, making the transcription feel even more instantaneous linkedin.com+2.
  • Voice Activity Detection (VAD): Automatic speech segmentation based on silence detection improves the overall quality and readability of transcripts adtechtoday.com+2.
  • Manual Commit Controls: Users have the flexibility to control when to finalize and commit transcript segments, offering greater precision for streaming applications linkedin.com+3.
  • Custom Vocabulary: The model can be tailored with domain-specific vocabulary to enhance accuracy in specialized contexts adtechtoday.com.
  • Precise Timestamps: It provides detailed word-level timestamps, which are crucial for applications that need to align transcript text with the audio elevenlabs.io.
  • Zero Retention Mode: Ensures compliance for sensitive workloads by not retaining data adtechtoday.com.
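The silence-based segmentation idea behind the VAD feature listed above can be sketched with a simple energy threshold. This is only an illustration of the general technique, not ElevenLabs' implementation: the function name `segment_by_silence`, the per-frame energy input, and all default values (`frame_ms`, `energy_threshold`, `min_silence_ms`) are assumptions made for the example.

```python
from typing import List, Tuple


def segment_by_silence(
    frame_energies: List[float],
    frame_ms: int = 20,               # assumed frame length
    energy_threshold: float = 0.01,   # assumed "voiced" cutoff
    min_silence_ms: int = 300,        # assumed pause that closes a segment
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech spans, closed after enough silence."""
    segments: List[Tuple[int, int]] = []
    start = None        # frame index where the open segment began
    last_voiced = None  # most recent frame at or above the threshold
    silence_frames_needed = max(1, min_silence_ms // frame_ms)

    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced >= silence_frames_needed:
            # Enough trailing silence: commit the segment up to the last voiced frame.
            segments.append((start * frame_ms, (last_voiced + 1) * frame_ms))
            start = None

    if start is not None:
        # The stream ended while a segment was still open.
        segments.append((start * frame_ms, (last_voiced + 1) * frame_ms))
    return segments


if __name__ == "__main__":
    energies = [0.0] * 2 + [0.5] * 5 + [0.0] * 20 + [0.4] * 3
    print(segment_by_silence(energies))
```

In a setup like this, an automatic commit happens whenever the silence window elapses. A manual commit control would instead leave it to the calling application to decide when an open segment is finalized.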

Diverse Applications and Use Cases

The versatility of Scribe v2 Realtime makes it suitable for a wide array of industries and applications:

  • Conversational AI and Voice Agents: Its ultra-low latency is critical for creating natural and efficient AI agents and voice assistants elevenlabs.io+4.
  • Live Events and Broadcasting: Ideal for live captioning, conferences, webinars, and presentations, enhancing accessibility for multilingual audiences elevenlabs.io+4.
  • Customer Service: Enables real-time transcription of customer calls, improving efficiency and record-keeping elevenlabs.io+4.
  • Meetings and Education: Facilitates real-time meeting notetakers, medical dictation, and educational accessibility tools threads.com+6.
  • Developers: Aimed at developers building voice-driven applications and tools linkedin.com+2.
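For the live captioning use case, word-level timestamps are what let an application turn a stream of words into timed caption lines. The sketch below shows one way to do that grouping. The `Word` and `Caption` types, the function `group_into_captions`, and the limits `max_chars` and `max_gap` are hypothetical and chosen for illustration; they do not reflect the response format of the ElevenLabs API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the audio
    end: float


@dataclass
class Caption:
    text: str
    start: float
    end: float


def _to_caption(words: List[Word]) -> Caption:
    """Join words into one caption spanning the first start to the last end."""
    return Caption(" ".join(w.text for w in words), words[0].start, words[-1].end)


def group_into_captions(
    words: List[Word],
    max_chars: int = 32,   # assumed maximum caption line length
    max_gap: float = 0.6,  # assumed pause (seconds) that forces a new caption
) -> List[Caption]:
    """Start a new caption when the line would get too long or the speaker pauses."""
    captions: List[Caption] = []
    current: List[Word] = []
    for word in words:
        if current:
            candidate_len = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            gap = word.start - current[-1].end
            if candidate_len > max_chars or gap > max_gap:
                captions.append(_to_caption(current))
                current = []
        current.append(word)
    if current:
        captions.append(_to_caption(current))
    return captions


if __name__ == "__main__":
    stream = [
        Word("Hello", 0.0, 0.4),
        Word("everyone", 0.45, 0.9),
        Word("welcome", 2.0, 2.4),
        Word("back", 2.45, 2.7),
    ]
    for caption in group_into_captions(stream):
        print(f"[{caption.start:.2f}-{caption.end:.2f}] {caption.text}")
```

Breaking on pauses as well as on length keeps each caption aligned with a natural phrase, which tends to read better on screen than fixed-size chunks.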
ElevenLabs — Scribe v2 Realtime live in ElevenLabs Agents

Seamless Integration and Accessibility

ElevenLabs has ensured broad accessibility for Scribe v2 Realtime. It is fully integrated into ElevenLabs Agents, where users can enable it under the Advanced configuration section for immediate use elevenlabs.io+5. The model is also available via the ElevenLabs API and through a Python SDK, allowing developers to integrate it into their custom applications threads.com+5. For client-side implementation, an npm SDK is available, along with an open-source real-time transcription UI component news.ycombinator.com+2.

Commitment to Compliance and Security

ElevenLabs emphasizes compliance and security with Scribe v2 Realtime. The model meets enterprise standards, including SOC 2, PCI, and HIPAA, and offers EU data residency options linkedin.com. For regions like India, data residency options are available to comply with local regulations linkedin.com+2. A single-use token system is utilized for secure API credentials, further enhancing privacy and security elevenlabs.io+1.

ElevenLabs Blog - Company, Research & Product Updates
ElevenLabs Launches Scribe v2 Realtime for Sub-150 ms Live Transcription Across 90+ Languages
ElevenLabs launches Scribe v2 Realtime
Realtime Speech to Text | ElevenLabs Documentation
Build Live Caption Broadcasting for Events with Next.js and Elevenlabs API
From Kannada to Hindi, ElevenLabs’ Scribe v2 Transcribes in Real Time | AIM
ElevenLabs docs | ElevenLabs Documentation
ElevenLabs
