Mistral has launched Voxtral, a language model focused on speech recognition, in two variants: Voxtral Mini (3B) and Voxtral Small (24B). The models aim to combine the cost-efficient transcription of classic ASR systems with the semantic understanding of advanced LLM-based models. Voxtral provides a 32K-token context for processing audio, enabling tasks such as Q&A and summarization without chaining separate systems. Available for local deployment and via API, it supports multilingual processing, retains text-only capabilities, and claims significant cost advantages over competitors such as OpenAI Whisper.
Mistral has released Voxtral, a language model aimed at speech recognition (ASR) applications that need to go beyond simple transcription and integrate more advanced LLM-based capabilities.
According to Mistral, Voxtral closes a gap between classic ASR systems, which deliver cost-efficient transcription but lack semantic understanding, and more advanced LLM-based models.
Voxtral has a 32K-token context, which enables it to process audio up to 30 minutes long for transcription, or up to 40 minutes for understanding tasks.
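A quick back-of-envelope calculation from these figures shows the audio-to-token rate implied by the 32K context in each mode; these are derived estimates, not official numbers from Mistral:

```python
# Implied token budget per second of audio, derived from the article's
# stated limits (32K-token context; 30 min transcription / 40 min understanding).
CONTEXT_TOKENS = 32_000
TRANSCRIPTION_SECONDS = 30 * 60
UNDERSTANDING_SECONDS = 40 * 60

transcription_rate = CONTEXT_TOKENS / TRANSCRIPTION_SECONDS  # ~17.8 tokens/s
understanding_rate = CONTEXT_TOKENS / UNDERSTANDING_SECONDS  # ~13.3 tokens/s

print(round(transcription_rate, 1))  # 17.8
print(round(understanding_rate, 1))  # 13.3
```

The understanding mode stretches the same context over more audio, presumably leaving headroom in the window for the prompt and the generated answer.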
Mistral claims both cost and performance advantages over other solutions like OpenAI Whisper, ElevenLabs Scribe, and Gemini 2.5 Flash.
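To illustrate the API-based usage mentioned above, the sketch below assembles (but does not send) a transcription request against Mistral's hosted API. The endpoint path, model name (`voxtral-mini-latest`), and form-field names are assumptions modeled on common OpenAI-compatible transcription APIs, not details confirmed in this article; check Mistral's API reference before using them:

```python
import os

# Assumed endpoint and model identifier -- verify against Mistral's docs.
API_URL = "https://api.mistral.ai/v1/audio/transcriptions"
MODEL = "voxtral-mini-latest"

def build_transcription_request(audio_path: str, api_key: str) -> dict:
    """Assemble the parts of a multipart transcription request
    (URL, auth header, form fields, file to attach) without sending it."""
    return {
        "url": API_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {"model": MODEL},
        # When actually sending, this would be an open file handle,
        # e.g. {"file": open(audio_path, "rb")} passed to requests.post.
        "files": {"file": audio_path},
    }

req = build_transcription_request(
    "meeting.mp3", os.environ.get("MISTRAL_API_KEY", "test-key")
)
print(req["data"]["model"])  # voxtral-mini-latest
```

Sending the request with an HTTP client such as `requests` would then return the transcript, with no separate LLM stage needed for follow-up Q&A or summarization.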