Architecting AI that learns and interacts like humans

San Francisco, CA
Pinned Tweet
Two new models just dropped 👀 Sonic-3.5 and Ink-2 are the #1 streaming models for text to speech and speech to text
We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.
Sonic-3.5 is also the #1 streaming TTS model on @voicearena_ai for notable languages including English 🇺🇸, Hindi 🇮🇳, and Portuguese 🇧🇷
Cartesia Sonic 3.5 is the #1 streaming TTS model on Voice Arena US English Leaderboard. In the overall leaderboard (streaming + non-streaming) it jumped from rank #9 → #2, moving ahead of Grok TTS, ElevenLabs v3 & OpenAI's gpt-4o-mini-tts. Sonic-3.5 is the latest TTS model from @cartesia. It supports 42 languages, with 500+ voices available out of the box. The model has been highly preferred among raters on @voicearena_ai. Results backed by 11,110 blind, head-to-head listener votes.
Cartesia retweeted
Announcing the Bolna × Cartesia VOC-A-THON! Calling the most cracked builders in Voice AI. Come ship voice agents powered by Sonic 3.5, Cartesia's most natural and expressive TTS model yet.
- Build with Sonic 3.5, Cartesia's newest TTS model
- @bolna_dev and @cartesia teams in the room
- Free Sonic 3.5 & Bolna credits for every participant
- Exciting prizes for the best voice agents
Apply now: luma.com/ubu85bxv
Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from @ArtificialAnlys! We designed Ink-2 from the ground up for voice agents - with low latency, eager transcripts, and semantic endpointing.
A great speech-to-text model for voice agents first and foremost needs high accuracy in production settings - this means noisy environments and conventionally difficult audio like silences, short transcripts, phone numbers, and UUIDs. For the conversation to be smooth, it also needs low latency with eager transcripts to reduce end-to-end response time. Finally, highly accurate semantic endpointing is critical so that agents respond appropriately and don't interrupt the user.
We've built Ink-2 to excel on all of these axes in production. Give it a try on our website: cartesia.ai/ink
Cartesia is excited to be the Voice that powers @avaturn_me's new Open Weights AVTR-1 avatar models. Check out the links, repo and docs below 👇
🚨 AVTR-1 New Model is OPEN WEIGHTS. Duplex Native, #1 on benchmarks. Here’s what's being released. Links in comments.
- Model + Paper now on HF
- Full GitHub repo to run it really fast
Run it anywhere for as low as $0. Comment, share, star on GH to get the word out
Sonic 3.5 is now the #1 text to speech model on the @ArtificialAnlys leaderboard! You no longer have to trade off quality and latency - Sonic 3.5 also has the fastest time to first audio at 82ms end to end. See full benchmark results 👇
Cartesia’s Sonic-3.5 takes the #1 spot on the Artificial Analysis Speech Arena Leaderboard, surpassing Inworld Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS.
Sonic-3.5 is the latest TTS model from @cartesia. It supports 42 languages, including 9 Indian languages, with 500+ voices available out of the box. The model has been highly preferred among voters in the TTS Arena, with its demonstrated naturalness and accurate transcript following.
Key takeaways:
➤ Quality: Sonic-3.5 has an Elo score of 1,218 (+16/-16) based on 1,144 arena appearances, placing it ahead of Inworld Realtime TTS 1.5 Max at 1,194 and Gemini 3.1 Flash TTS at 1,209
➤ Pricing: Sonic-3.5 is priced at $39/1M characters, a premium compared to Gemini 3.1 Flash TTS at $18.3/1M characters, and Inworld Realtime TTS 1.5 Max at $35/1M characters
➤ Speed: 105.5 characters per second, compared to 205 characters per second for Inworld Realtime TTS 1.5 Max and 26.3 characters per second for Gemini 3.1 Flash TTS
See more details and listen to samples below 🧵
Cartesia retweeted
Voice cloning is now available on LiveKit Inference. We’re launching with @inworld_ai and @cartesia. Clone a voice once and use it across multiple TTS providers, with automatic fallback to the same voice if a provider fails mid-call. Free to create and available on all paid plans today.
Cartesia retweeted
Every day I curse @krandiash for tainting us, removing @cartesia’s purity as a neo lab in the pursuit of “revenue”
The Ultimate List of Artificial Intelligence "Neolabs": May 2026. A Neolab is a pre-revenue scale startup working on long-term AI breakthroughs, usually with a $1B+ valuation. There are now 63 of them!
Nobody in voice AI is talking about TCPA compliance. Enterprise buyers ask about it first. @2xSolutions CEO Kevin DeMeritt processed $4B in business at Lear Capital - and built 2X Solutions around the TCPA compliance reality that most platforms ignore. Switching to Cartesia’s sub-100ms latency kept them well within a two-second TCPA window to power millions of calls. Voice quality sealed the deal for live demos. 2X customers even ask to “talk to Mary” (Mary is the AI 🤖)
“There were moments where the Cartesia team was telling us things that were happening with our product before we even knew it,” said @fundamentoAI Co-Founder Vickram Saigal. “That gave us a lot of trust – we’ve got the right partners.” @fundamentoAI runs 20M+ monthly outbound calls for India’s largest lenders and insurers. Cartesia was 2x faster than any other provider they tested, and delivered true enterprise partnership.
Mamba-3 is out! 🐍 SSMs marked a major advance for the efficiency of modern LLMs. Mamba-3 takes the next step, shaping SSMs for a world where AI workloads are increasingly dominated by inference. Read about it on the Cartesia blog: blog.cartesia.ai/p/mamba-3
The world’s leading AI infrastructure platforms are converging on the same voice model 🔥 Excited to announce that Cartesia is now a dedicated model partner on @togethercompute's Voice Platform for the 450K+ teams and developers building on Together.