Low-latency real-time speech-to-text, text-to-speech and translation APIs.

United States
Pinned Tweet
Soniox v5 Real-Time is now available.
Live speech AI is not batch transcription with lower latency. It has to turn raw, noisy, continuous audio into structured intelligence while people speak.
What’s new:
• Higher accuracy across 60+ languages
• Completely reengineered speaker separation
• Better spoken language identification
• Higher-quality real-time translation across 3,600+ language pairs
• Faster semantic endpointing for voice agents
• Better alphanumeric recognition
• More robust native context handling
Built for voice agents, meetings, captions, translation, dictation, customer support, contact centers, and multilingual products.
Read more: soniox.com/blog/soniox-v5-re…
Don’t trust STT benchmarks. Too clean. Too closed. Too English-heavy. Too easy to cherry-pick. Real speech is chaos: accents, noise, code-switching, interruptions, names, numbers, IDs, domain terms. So we built Soniox Compare STT. Open source. Raw output. Trust no one. Test everyone. soniox.com/compare
Telugu with the dialect intact, used to write actual stories. That's the entire point of building for 60+ languages instead of one. Thanks for this.
Have tried a lot of transcription apps but @soniox_ai is the best. It has live transcription and translation facilities. I was surprised it transcribed Telugu so well even the dialect was transcribed properly. Thanks a ton @soniox_ai you made my writing stories easy.
Sub-250ms endpointing with Soniox v5 Real-Time. This is the next level of voice agent experience.
Real-time voice AI breaks when end-of-turn detection is wrong. Manifone, a telecom and voice AI company in France, integrated Soniox v5 Real-Time endpointing into Manivox.ai and is now seeing endpoint finalization in under 250ms after the phrase ends. Fast turn-taking. Fewer false endpoints. Natural voice agents.
Voice AI cannot be English-first. ATLO, a startup in South Korea, is building AI companion apps, robots, meeting assistants, and smart home devices. They use Soniox extensively for real-world voice AI. Korean is not an edge case. Every language matters. The future of AI is global, multilingual, real-time, and voice-first. Thank you, Sunghyun Park and ATLO.
With Soniox v5 Real-Time you get far more robust mid-sentence language switching and speaker diarization. Diarization also holds up on harder audio, such as when people talk over each other. All our models provide multilingual output by default, and speaker diarization can be turned on with a single parameter (enable_speaker_diarization: true) at no additional cost. Works across all 60+ languages.
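A minimal sketch of what turning on that single parameter might look like in a request config, and how diarized output could be grouped into turns. Only `enable_speaker_diarization` and the model name `stt-rt-v5` come from the posts in this feed; the `model` key and the `speaker`/`text` token fields are assumptions made for illustration, so check the official docs for the real schema.

```python
import json


def build_stt_config(model="stt-rt-v5", diarization=True):
    """Build a request config with speaker diarization toggled by one flag."""
    return {
        "model": model,  # key name is an assumption
        "enable_speaker_diarization": diarization,  # parameter named in the post
    }


def group_by_speaker(tokens):
    """Merge consecutive tokens from the same speaker into (speaker, text) turns.

    The token shape {"speaker": ..., "text": ...} is hypothetical.
    """
    turns = []
    for token in tokens:
        speaker, text = token["speaker"], token["text"]
        if turns and turns[-1][0] == speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + text)
        else:
            # Speaker changed: start a new turn.
            turns.append((speaker, text))
    return turns


config = build_stt_config()
payload = json.dumps(config)

# Made-up sample tokens, including a language switch between speakers.
sample_tokens = [
    {"speaker": "1", "text": "Hello"},
    {"speaker": "1", "text": " there."},
    {"speaker": "2", "text": "Hola."},
]
turns = group_by_speaker(sample_tokens)
```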
With the Soniox stt-rt-v5 model, endpoint detection gains additional configuration controls, so users can fine-tune endpointing behavior to the needs of their implementation:
- endpoint_latency_adjustment_level
- endpoint_sensitivity
- max_endpoint_delay_ms
Read more on how they work together and how to tune them for specific use cases in our docs: soniox.com/docs/stt/rt/endpo…
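As a rough sketch, the three controls could be carried in a small typed container that only sends the fields you actually set. The parameter names and the model name come from the post; their types, the `model` key, and the placeholder value below are assumptions, since valid ranges and defaults are not given here.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class EndpointTuning:
    # Field names are from the post; the types are assumptions.
    endpoint_latency_adjustment_level: Optional[int] = None
    endpoint_sensitivity: Optional[float] = None
    max_endpoint_delay_ms: Optional[int] = None

    def to_request_fields(self):
        """Return only the controls that were explicitly set."""
        return {k: v for k, v in asdict(self).items() if v is not None}


base = {"model": "stt-rt-v5"}  # "model" key name is an assumption

# 1500 is a placeholder, not a recommended value.
tuning = EndpointTuning(max_endpoint_delay_ms=1500)
config = {**base, **tuning.to_request_fields()}
```

Leaving unset fields out of the request keeps the service defaults in play for anything you have not deliberately tuned.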
Soniox v5 Real-Time introduces endpoint_sensitivity. It adjusts how likely the model is to emit an endpoint. Higher values make endpoints more likely, finalizing segments sooner. Lower values make them less likely, so the system waits longer before finalizing. Tune it up for fast voice agent turn-taking, or down for dictation and speakers who pause mid-sentence. Learn more in the endpoint detection docs: soniox.com/docs/stt/rt/endpo…
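The direction of that trade-off can be illustrated with a toy model. This is not how Soniox implements endpointing: the per-frame scores, the 0 to 1 sensitivity scale, and the threshold mapping are all invented for illustration. It only shows why a higher sensitivity finalizes sooner.

```python
def first_endpoint(scores, sensitivity):
    """Return the index of the first frame treated as an endpoint, or None.

    Toy assumption: sensitivity lies in [0, 1] and lowers the score a frame
    must reach before it counts as an endpoint.
    """
    threshold = 1.0 - sensitivity
    for index, score in enumerate(scores):
        if score >= threshold:
            return index
    return None


# Made-up end-of-turn scores that rise as a pause gets longer.
scores = [0.1, 0.2, 0.45, 0.7, 0.95]

eager = first_endpoint(scores, sensitivity=0.6)    # voice-agent style
patient = first_endpoint(scores, sensitivity=0.1)  # dictation style
```

With these made-up scores, the higher sensitivity fires on an earlier frame, while the lower one waits through the hesitation before finalizing.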
Soniox retweeted
We use Soniox extensively at Tana, great stuff!
Speaker diarization is one of the hardest problems in speech AI. People interrupt, laugh, and talk at once. Acoustic-only systems break when voices sound alike or overlap. Soniox v5 Async uses the sound and the meaning together to figure out who said what, which leads to better separation in real-life conversations.
The easiest way to try Soniox Async v5 in your code: use our Python or Node SDK. Call transcribe_and_wait_with_tokens, wait, read the audio transcription from the result. Done.
Soniox v5 Async is live.
Our new async speech-to-text model turns real-world audio into more accurate, structured speech data.
What’s improved:
• Higher accuracy across 60+ languages
• Completely reengineered speaker separation for identifying who said what
• Improved language identification for multilingual and accented speech
• Better recognition and formatting of numbers, dates, emails, IDs, codes, names, and addresses
• More robust context usage for names, domain vocabulary, product terms, and custom phrases
stt-async-v5 is fully compatible with the existing async API. Just update the model name.
Read more: soniox.com/blog/soniox-v5-as…
Google now has Gemini Live Translate.
Soniox has Real-World Live Translate.
Soniox's lead already shows on simple audio input. Once you throw in IDs, numbers, emails, addresses, and genuinely hard speech, the accuracy gap only grows. A broken speech recognition layer makes the rest of the pipeline fall apart, and a laggy service makes it worse. Your voice agents deserve a speech system that does not fall apart.
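I ran a comparison test of the much-discussed Gemini 3.5 Live Translate against GPT-Realtime-Translate, which was making the rounds a little while ago, and Soniox, which I use in my own homemade app for its overwhelming value for money. Conclusion: GPT is unstable, Gemini is as good as you would expect, and Soniox is amazing. Its ASR speed and accuracy look like the best of the three. That said, I fed in my voice through a microphone and also played the audio output through my PC speakers, so it's possible none of them could show what they can really do 😅 I'd like to run a proper comparison test another time.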
Stop overpaying for speech AI. Compare your bill across providers with our new pricing calculator. soniox.com/compare#calculato…