Voice AI
Spatial Audio Meets LLMs: Multi-Talker Speech Understanding
New research equips large language models with directional multi-talker speech capabilities, enabling AI to understand who is speaking and from where in complex audio environments.
Mistral AI
Mistral AI launches Voxtral Transcribe 2, combining batch speaker diarization with open real-time automatic speech recognition for multilingual production workloads at enterprise scale.
voice synthesis
A comprehensive guide to designing fully streaming voice agents with end-to-end latency budgets, covering incremental ASR, LLM streaming, and real-time text-to-speech synthesis.
speech recognition
New research introduces a verification-based approach to correct speech recognition errors while minimizing LLM hallucinations through structured multi-stage processing.
voice cloning
New research uses latent space mixing to generate diverse synthetic voices, addressing the underrepresentation of some accents in automatic speech recognition training data. The technique improves ASR equity without requiring real voice data from marginalized communities.