Foundation models have enabled amazing human-in-the-loop systems (ChatGPT, Copilot, ++). How can we bring them to bear on important batch computing tasks (like information extraction) - where we need efficiency and reliability at scale? Early thoughts at: hazyresearch.stanford.edu/bl…
I work at cartesia but unfortunately am bad at math contests. If you're bad at math, there's a home for you here too!
14% of @cartesia is named Brandon, collectively winning the USAMO 4 times. If you're a Brandon, come find a home here.
The independent human evals are coming out - Sonic is the highest quality TTS model with conversational latency.
2.5 months ago @elevenlabs put up this comparison with our 10 day old Sonic model: elevenlabs.io/blog/elevenlab… The team took it as a challenge, here's our new scorecard. Higher quality, cheaper & the fastest voice model period. labelbox.com/guides/evaluati… Next 3 months will be fun.
Thrilled to be sharing some of our early work at Cartesia - Sonic, a blazing fast generative voice model. New architectures will be key to the next generation AI - real time, interactive, and on-device. Grateful to be building with an amazing team!
Today, we’re excited to release the first step in our mission to build real time multimodal intelligence for every device: Sonic, a blazing fast (🚀 135ms model latency), lifelike generative voice model and API. Read cartesia.ai/blog/sonic and try Sonic at play.cartesia.ai/
New model is out! This has been endlessly fun to play with - and opens up a new way to create audio that sounds exactly the way you like. Try it on our Playground!
We're releasing a new model called Voice Changer. Transform any input voice clip into an output voice from your voice library, and preserve key characteristics of the input voice like intonation, prosody, and emphasis. Try now at play.cartesia.ai
It's been amazing to see our work on SSMs go from academia to powering real-time voice in production across thousands of customers. And excited to share a sneak peek into our research on multi-stream models for multimodal data. Grateful for our early team and supporters :)
We've raised $27M from Index Ventures, Lightspeed, Factory, Conviction, SVA, General Catalyst, A* and our wonderful angels. Cartesia's audio models power the next generation of voice agents, digital media, and assistants across startups and large enterprises. Our mission is to build real-time intelligence with long memory, that runs wherever you are. Multimodal brains for everyone!
I joined Snorkel because I was inspired by the team + vision for a unique data-centric approach to transform every step of the AI lifecycle. Two years later and somehow I'm more excited than I was, and there's even more left to do. Come join us!🦄
We are delighted to announce our $85 million Series C at a $1 billion valuation to accelerate #DataCentricAI, with funding by @blackrock, Addition Capital, @lightspeedvp, @greylockvc, @googleventures, and more. Read more in @Fortune: snkl.ai/seriesc
We launched Snorkel Flow today! We're building a data-first platform that leads the way towards iterative, end-to-end ML development. Grateful to get to work with folks I deeply admire @SnorkelML. More on our product and vision at: snorkel.ai/07-14-2020-snorke….
So excited to share our work using LLMs for large-scale information extraction, with asymptotically lower costs! My personal takeaway - LLMs enable fundamentally new system designs, with lots of fun, new trade-offs to explore!
LMs can be expensive for document processing. E.g., inference over the 55M Wiki pages costs >$100K (>$0.002/1k toks) 💰 We propose a strategy that reduces inference cost by 110x and can even improve quality vs. running inference over each doc directly! 💻 github.com/HazyResearch/evap…
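As a rough sanity check on the figures quoted in the post above, here is a minimal Python sketch of the arithmetic. It treats the post's lower bounds (55M pages, $0.002 per 1k tokens, $100K total) as exact values, which is an assumption; the function names are hypothetical and are not taken from the linked repository.

```python
# Back-of-envelope check of the figures quoted in the post.
# Assumption: the post's lower bounds are treated as exact values.
NUM_PAGES = 55_000_000        # "55M Wiki pages"
PRICE_PER_1K_TOKENS = 0.002   # ">$0.002/1k toks", in dollars
TOTAL_COST = 100_000          # ">$100K", in dollars
COST_REDUCTION = 110          # "reduces inference cost by 110x"


def implied_tokens_per_page(total_cost, price_per_1k, num_pages):
    """Average tokens per page implied by the total cost and the unit price."""
    total_tokens = total_cost / price_per_1k * 1000
    return total_tokens / num_pages


def reduced_cost(total_cost, reduction_factor):
    """Cost after dividing the baseline by the claimed reduction factor."""
    return total_cost / reduction_factor


tokens_per_page = implied_tokens_per_page(TOTAL_COST, PRICE_PER_1K_TOKENS, NUM_PAGES)
cost_after = reduced_cost(TOTAL_COST, COST_REDUCTION)

print(f"Implied average tokens per page: {tokens_per_page:.0f}")
print(f"Cost after a {COST_REDUCTION}x reduction: ${cost_after:,.0f}")
```

Under these assumptions the quoted figures imply roughly 900 tokens per page on average, and a 110x reduction would bring the $100K baseline down to under $1,000.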
Thrilled to share our work to bring AI to the edge with SSMs. The early AI applications of today run in the datacenter as APIs you can query - but I think the next generation of AI applications will run on your device: always on, proactively helpful, fully private, and secure.
Today, we’re unveiling a significant milestone in our journey toward ubiquitous artificial intelligence: AI On-Device. Our team pioneered a radically more efficient architecture for AI with state space models (SSMs). Now, we’ve optimized and deployed them at the edge. We believe the future of AI runs on your device, where it can process continuously and is reliable, private, and secure. Read our full blog post here: cartesia.ai/blog/on-device
Super excited about this and new possibilities around genome level design and understanding. I'm also a believer in the principle of getting down to the most fundamental level (raw nucleotides) and scaling up with long context models + compute + data. Also pretty pictures :)
A new Science study presents “Evo”—a machine learning model capable of decoding and designing DNA, RNA, and protein sequences, from molecular to genome scale, with unparalleled accuracy. Evo’s ability to predict, generate, and engineer entire genomic sequences could change the way synthetic biology is done. Learn more in this week's issue: bit.ly/3OsmUPr
It's been awesome to work with @mavolpi, @shardul_shah, @ishanit5 and the entire team at Index!
We’re thrilled to lead $27M in new funding for @cartesia as they build the next generation of real-time AI. Their pioneering SSMs offer multimodal intelligence available on any device, powering new solutions for customer service, healthcare, transportation, robotics, and more. indexventures.com/perspectiv…
Excited to share results and weights for scaling up Mamba text models by @_albertgu @tri_dao with @cartesia @togethercompute! Strong performance, fast inference - and Apache 2.0 🚀
Cartesia Chief Scientist @_albertgu teamed up with Together Chief Scientist @tri_dao to release a new 3B Mamba text model trained on the SlimPajama dataset, in a close collaboration with Cartesia & @togethercompute. Read more on our blog: cartesia.ai/mamba-3b-slimpj
Thanks for having me @DrBahijjaRaimiA! Applications to healthcare are what first got me excited about AI, and it was great to discuss how the field is evolving, outstanding challenges, and how we're working to address some of them at @SnorkelAI!
Tune into tomorrow's episode where Dr. Bahijja, our host, and machine learning engineer Brandon Yang (@bclyang) from Snorkel AI (@SnorkelAI) dive deep into the application of AI in healthcare! @anchor @spotifypodcasts #mondaysciencepodcast #DeepLearning #ArtificialIntelligence
snek!
excited to finally release Mamba-2!! 8x larger states, 50% faster training, and even more S's 🐍🐍 Mamba-2 aims to advance the theory of sequence models, developing a framework of connections between SSMs and (linear) attention that we call state space duality (SSD) w/@tri_dao
Chat with @krandiash and @_albertgu to talk about our research on multi-modal SSM architectures! No booth but you might be able to convince them to buy you a drink
.@_albertgu and I will be meeting folks to talk all things @cartesia research tomorrow (Th) 2-3pm. You can find us near the NeurIPS registration desk
Excited to start my first blog with @maithra_raghu! Exploring the AI Landscape: extail.github.io, looking at AI research, products, and deployment. First post is on a longstanding interest of ours - Digital Health and AI for Health: extail.github.io/jekyll/upda….
Exploring the AI Landscape: extail.github.io/ New blog by @bclyang and me! We'll be covering topics in AI from fundamental research to considerations for deployment. Our first post: extail.github.io/jekyll/upda… is on Digital Health and AI for Health, a longstanding interest!
A fully on-device assistant built with our new release! Assistants in particular will transform with on-device AI - they'll be reliable, always available, and continuously processing multimodal information to be proactively helpful when you need it without you having to ask.
Check out this fully on-device interview assistant running with our open source repo Edge and our new model Sonic On-Device 📲
Super excited about the future of real-time assistants! Check out the OSS repo: github.com/ai-ng/swift.
swift-ai.vercel.app an oss voice assistant that pipelines state-of-the-art high-performance ai models: @groqinc whisper → llama3 → @cartesia sonic
This is the first article that helped me understand what people mean by “biology becoming more engineering”. In CS, we build on systems built by other people. In biology, we build on systems built by nature that we discover. Fascinated by how discovery enables engineering.
Was awesome seeing everyone as part of SF tech week. The SF conversational AI scene is amazing and I'm grateful to be a part of it!
We had a fantastic time hosting our Conversational AI Leaders Dinner for SF Tech Week last night. It’s always energizing to spend time with our community and hear how they’re pushing the boundaries of voice agents. Thanks to this group for sharing their insights - go check out what they’re building!
@timshi_ai, CTO of @cresta - bringing enterprise-grade generative AI to contact centers
@davidzh, CTO of @livekit - open source infrastructure for real time AI
@swolephia, former Head of Product @inworld_ai
@qfav95, COO of Tavus - digital twins
Jeffery Liu, Co-CEO of @AssortHealth - ai call centers for healthcare
@peggy_wang, CTO of Ego.live - ai native 3d simulation engines
Samir Sen, AI Engineer at @crescendoCX - ai powered customer service
@DPankaew, Founder and CEO at Listening.com - text to audio for academic papers
You hear a lot about how slow technology adoption can be, but it's been amazing to see how fast all industries are adopting real-time voice. Some great insights from the enterprises, startups, and creators we've been lucky to work with this year.
We’re excited to share our 2024 State of Voice! 🎙️ We've been fortunate to work with hundreds of incredible founders, product leaders, and engineers this year, and we couldn't help but notice some exciting patterns emerging in voice tech. We dive into key infrastructure breakthroughs and innovative use cases we're seeing in the space. Plus, we share our thoughts on what's coming next for voice in 2025!
Excited to share some great work by the team on applying Snorkel to healthcare, and how these ideas can address the training data bottleneck, empower subject matter experts, and make machine learning more private and auditable!
AI has so much potential to improve healthcare. However, there are still many practical and ethical challenges to be overcome for AI to deliver value. In this post, one of our engineers, @bclyang, analyzes these challenges and possible solutions. snorkel.ai/ai-challenges-in-…
Sad to announce I'm leaving the @GoogleAI Residency! So grateful for everyone who took a chance on me as a researcher: my mentors @quocleix, @jiquanngiam, @maithra_raghu, & Jon Shlens and our amazing friends/collaborators @Waymo and @Google.
Super cool work. Long sequence models + efficiently learning from raw data is going to be a big part of the next generation of ML techniques!
Bio2Token: All-atom tokenization of any biomolecular structure with Mamba @FlagshipPioneer
• This paper introduces “Bio2Token”, a method that tokenizes biomolecular structures at an all-atom level using Mamba. Unlike many current approaches that rely on coarse-grained residue-level representations, Bio2Token focuses on a more detailed atomic-level tokenization.
• The innovation here lies in the use of quantized auto-encoders that learn atom-level representations, achieving reconstruction accuracies below and around 1 Ångström.
• Mamba, a state space model, plays a key role by providing efficient and scalable encoding, overcoming computational limitations of traditional transformer-based models. Bio2Token can handle structures up to 95,000 atoms, which is significantly larger than the limit for many transformer models.
• This approach not only achieves high accuracy but also uses fewer parameters and training resources compared to existing methods like AlphaFold-3 and ESM-3.
• Bio2Token demonstrates versatility by tokenizing proteins, RNA, and small molecules, making it a flexible tool for biomolecular structure representation.
• The quantized auto-encoders (QAE) efficiently transform 3D structures into 1D discrete tokens, allowing future integration with language models for biomolecular tasks.
• The authors present domain-specific tokenizers (mol2token, protein2token, RNA2token) and a combined tokenizer (bio2token) that generalizes across different types of biomolecules.
• Compared to ESM-3, Bio2Token achieves a lower reconstruction RMSE and superior performance across protein and RNA datasets, demonstrating its potential as a robust tool for accurate structural modeling.
• The combination of Mamba-based architecture and quantized auto-encoders provides a lightweight yet powerful solution, avoiding the quadratic computational cost seen in transformers.
• Limitations include ensuring chemical validity in reconstructed structures, as even small deviations can lead to unrealistic bonding. Future directions involve improving accuracy by adding more training data and integrating post-processing steps for chemical validity.
@oliviaviessmann
📜 Paper: arxiv.org/abs/2410.19110
#biomoleculardesign #proteinmodeling #machinelearning #stateSpaceModel #bioinformatics #Mamba #tokenization
Congrats @lucaswcampa and the @trytobyAI team! Inspired by the potential for real-time speech AI to connect the world :)
Congrats to our friends at @trytobyAI on their recent launch! We’re excited to support them in their mission to empower global workforces. With Sonic, toby achieves real-time speech translation across every language. Read more about our partnership here: cartesia.ai/blog/2024-08-19-…
Replying to @saranormous
It takes a rare investor to believe in you before you even know you want to start a company. Thanks for all the support @saranormous @pranavreddy!
Exciting way of thinking about supervision by designing losses! Reminds me of @SnorkelML, which uses programmatic labeling functions to label many examples at once. I think encoding human knowledge as supervision more efficiently than individual labels is an important direction.
This is a cool way to build highly specific prior knowledge into neural nets: via highly specific loss functions arxiv.org/abs/1609.05566 Don’t just maximize likelihood of the data or predict the next frame—build all the constraints you can know into the loss and make NN satisfy.
Try Sonic on @Quora with Poe!
Excited to share that Quora is partnering with Cartesia to add audio capabilities to Poe! ✨ Users can already interact with advanced LLMs on the @poe_platform. Now they can also get audio responses in diverse voices and languages, transforming text into audio for a richer experience. Want to hear today's news in our vintage 1920's Radioman voice, or create epic stories narrated by our immersive Wizardman voice? You can do it all on Poe. "Poe brings together the world's best AI, all in one place. With Cartesia's Sonic model, users can interact with a wide range of high-quality, human-like voices in multiple languages, enhancing their experience on our platform," says Spencer Chan, Quora's Head of Poe Product.
Come join us!
Conversational Voice and Video Hackathon updates
🌟 @cartesia and @googlecloud have joined as sponsors
🌟 @ProductHunt is sponsoring and is facilitating a remote participation track
Oct 19th-20th. In-person in San Francisco and remote.
💸 $20,000 in cash prizes 💸
I'm grateful to Product Hunt for making a remote track possible. People always ask if they can join our real-time AI events remotely. We're going to do a really great job supporting remote participants for this one!
💪 The Product Hunt team will do special consultations and feedback about launching on Product Hunt for remote track winners
💪 UI and product feedback from the Product Hunt team
💪 Full support of remote developers by all the hackathon sponsors, in the hackathon Discord
💪 Top secret Product Hunt swag for remote track winners
💪 Remote teams can compete on equal footing with in-person participants for the $20,000 in prize money that's up for grabs
Join us if you're interested in building and learning about:
💠 voice AI agents
💠 virtual avatar experiences
💠 UIs for multi-modal AI
💠 apps built around conversational dynamics
💠 art projects
💠 anything else real-time AI you can dream up!
Oct 19th and 20th. Registration link in the thread below.
Amazing nostalgia - and a peek into the future of AI and gaming
An Homage To Metal Gear Solid
a playable voice AI puzzle game
<overheard in slack>
me: I wrote some sample code to show how you switch out LLM context on the fly and why you might want to.
@JonPTaylor: hold my beer ...
</>
Tech stack:
- input speech processing @DeepgramAI
- LLM @AIatMeta Llama 3.1 70B on @togethercompute
- voices @cartesia
- app code and hosting @vercel
- orchestration @trydaily Daily Bots / @pipecat_ai
Watch the video. Or just play the game.
Come hang out with us!
Excited to be hosting our first Conversational AI Meetup with @usepylon next Tuesday evening! 🎉 We'll have an expert panel discussion on conversational agents followed by quick demos. Pizza and drinks will be served. Come meet our team! Sign up to attend or show a demo 👇
Excited for all the progress over the last couple months - check out how we're doing on the latest human evals!
Users come to us all the time with questions around how to evaluate the best voice generation APIs. To help, we put together a systematic comparison on the important features to look at when comparing Cartesia Sonic to ElevenLabs (link below). Another great resource is the @ArtificialAnlys Text-to-Speech Arena, which conducts a blind human preference test on voice quality across providers.
It's been amazing to work with @bobs and the team at Goodcall - they're speech experts and moving super fast!
@bobs previously led Google’s Text-to-Speech product and co-founded a business phone assistant within Google’s product incubator, Area 120. Today, Bob’s new company, Goodcall, announced that they have switched 100% of their text-to-speech generations of 2,217 unique voice agents from Eleven Labs to Cartesia. Here's why they chose Cartesia:
⚡️ Industry-leading Latency: 90 ms average time-to-first-audio including model and network latency. 3X faster than comparables.
🔊 Best-In-Class Conversational Quality: Bob was able to silently switch his voice provider to Sonic with 0 customer complaints and consistently hit 97% interaction rate, with several customers asking how he got the voices to be so human-like.
🎨 Voice Design Capabilities: Cartesia gives Bob the ability to clone voices and blend them in a couple of clicks.
Read the full story and try Goodcall's Sonic-powered agents here 👇
Super cool from friends @togethercompute !
New Cookbook: An open source implementation of NotebookLM from Google. In short it involves: 1. Structured decoding with Llama 3.1 70B on Together AI to extract a JSON script. 2. Cartesia Text to Speech to bring the script to life! Check it out below!
Thanks for the support and all the great work on MLX!!
Replying to @tavus @cartesia
It's been awesome to work with you @hassaanraza97 @qfav95 !
Thanks for your support @awnihannun ! We <3 MLX!
Thanks, this is a great suggestion! We'll add to our pipeline.
Replying to @nikhilro_
Congrats @nikhilro_!!
brrr
SSMs go brrrr super excited to announce the first model powered by our latest research into efficient architectures 👀 stay tuned for more details soon!
Replying to @_albertgu
Awesome work! Congrats @avivbick @kevinyli_!
Replying to @anothercohen
Awesome to work with you (and loved the launch) @anothercohen!
Replying to @nickoates_
This was SO cool!
Thanks for all the support @ishanit5!
Replying to @kwindla
Awesome to work with you @kwindla !
It's been great working with you!! Thanks for all the support
Great to meet you!
It's updated in the docs, but it's billed at 15 characters per second!
Thanks to @EyubogluSabri and @HazyResearch for being amazing collaborators on this post! Additional thanks to @simran_s_arora, Arjun Desai, @krandiash, @DS3Lab, @chamii22, Ben Spector for their comments + contributions 🙏
Replying to @lumetric @cartesia
Thanks for being one of our earliest users!!
Replying to @lateinteraction
Congrats omar!!
Thanks for all of your support sarah!!