Speech Recognition

Voice AI

New Hierarchical Model Improves Real-Time Conversational AI Turn-

Researchers introduce a hierarchical end-of-turn detection model with primary speaker segmentation, advancing real-time conversational AI systems for more natural voice interactions.

Voice AI

Spatial Audio Meets LLMs: Multi-Talker Speech Understanding

New research equips large language models with directional multi-talker speech capabilities, enabling AI to understand who is speaking and from where in complex audio environments.

Mistral AI

Mistral AI Unveils Voxtral Transcribe 2 With Real-Time ASR

Mistral AI launches Voxtral Transcribe 2, combining batch speaker diarization with open real-time automatic speech recognition for multilingual production workloads at enterprise scale.

Voice Synthesis

Building Low-Latency Voice Agents: A Technical Deep Dive

A comprehensive guide to designing fully streaming voice agents with end-to-end latency budgets, covering incremental ASR, LLM streaming, and real-time text-to-speech synthesis.

Speech Recognition

Three-Stage LLM Framework Tackles ASR Errors and Hallucinations

New research introduces a verification-based approach to correct speech recognition errors while minimizing LLM hallucinations through structured multi-stage processing.

Voice Cloning

Latent Mixup Creates Diverse Synthetic Voices for Fair ASR

New research uses latent space mixing to generate diverse synthetic voices, addressing the underrepresentation of some accents in automatic speech recognition training data. The technique improves ASR equity without requiring real voice data from marginalized communities.