100% open source framework for realtime voice and multimodal AI. Maintained by @trydaily engineering team with support from the Pipecat developer community.

Thank you to our community, and all the Pipecat developers out there. You guys are amazing 🫡
Big day today. Pipecat version 1.0. Two years in the making. The most widely used framework for voice agents, but not just voice agents. Pipecat is a framework for realtime, multi-modal, multi-model AI applications. Contributions from NVIDIA, all the foundation labs, AWS, GCP, and Azure. Used by thousands of startups, scale-ups, and enterprises. Pipecat Subagents v0.1.0. A new library for sub-agent orchestration. Which is just a fancy way of saying running lots of inference loops in parallel, with partially shared context. The basic architecture of Pipecat Subagents is an event bus that works locally, and over the network. And Gradient Bang. The side project that broke containment. Built with Pipecat and Pipecat Subagents. Gradient Bang was actually the proving ground for the early Subagents work. But ... it's also a really fun game.
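The Subagents description above (parallel inference loops, partially shared context, an event bus) can be illustrated with a small, local-only sketch. Everything here is hypothetical: `EventBus`, `subagent`, and the topic name `"result"` are illustrative names, not the pipecat-subagents API, and `asyncio.sleep(0)` stands in for a real LLM call. It covers only the in-process case, not a bus that spans the network.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[None]]


class EventBus:
    """Minimal in-process pub/sub bus (hypothetical; not the Pipecat Subagents API)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    async def publish(self, topic: str, event: dict) -> None:
        # Fan the event out to every handler registered for this topic.
        await asyncio.gather(*(handler(event) for handler in self._subscribers[topic]))


async def subagent(name: str, bus: EventBus, shared: list[str]) -> None:
    # Private context belongs to this loop only; `shared` is visible to every subagent.
    private = [f"{name} system prompt"]
    context = shared + private
    await asyncio.sleep(0)  # stand-in for an LLM inference call
    await bus.publish("result", {"agent": name, "context_size": len(context)})


async def main() -> list[dict]:
    bus = EventBus()
    results: list[dict] = []

    async def collect(event: dict) -> None:
        results.append(event)

    bus.subscribe("result", collect)
    shared_context = ["user: plan a trip", "assistant: sure"]
    # Run several subagent loops in parallel over the same shared context.
    await asyncio.gather(
        *(subagent(n, bus, shared_context) for n in ("planner", "booker", "critic"))
    )
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Each subagent sees the two shared messages plus its own private prompt, which is one simple way to model "partially shared context."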
The new sonic-2 voice models from @cartesia are available in Pipecat. Latency is 40ms for the `sonic-turbo` version of the model and 90ms for the larger `sonic-2` model. - Cartesia developer docs: docs.cartesia.ai/build-with-… - A complete voice agent example project: github.com/pipecat-ai/pipeca…
link in bio
Today, 2:40 🙌 Open source + voice agents at the @aiDotEngineer Summit in New York. Nik from Superdial talks automating millions of healthcare calls
0.0.61 released. Support for @GroqInc's new voice model.
Lots of new things in 0.0.40.
✨ Function calling and prompt caching for @AnthropicAI Claude 3.5 Sonnet
✨ Llama 3.1 function calling support in the @togethercompute service
✨ A complete implementation of the RTVI standard
✨ Studypal, a new application example from the team at @cartesia
✨ A GStreamer pipeline source
Pipecat 0.0.77 🫡
This Thursday, Sep 4th - Voice AI Meetup in London! We love talking with you all at these meetups, and are excited to be back in the UK. Register for your spot in the thread. 🍻 Networking, food, drinks with fellow voice AI engineers and teams 🫡 Live demos, panel, and chats with @pipecat_ai @Speechmatics @trydaily @GoogleDeepMind @tavus: the latest on multi-speaker diarization, multimodal research, Open Source UI Kit Demo 🔥 🔥 Shoutout to @Speechmatics for hosting at their new office, and to our cohost, Daily.
🚀 Today we’re launching Pipecat TV! A video podcast about Pipecat: new features, how-tos, advanced tricks, community interviews & more. 🎙️ Fun + useful for all voice AI builders. Watch the pilot episode now! 🐱📺 piped.video/Nw0VyfVTbGQ
Pipecat 0.0.62 released today. Highlights include:
➡️ Support for @gladia_io's Solaria speech-to-text model released today.
➡️ A new memory layer service courtesy of @mem0ai.
➡️ WhisperSTTServiceMLX for Whisper transcription on Apple Silicon
➡️ A new peer-to-peer WebRTC transport
.@gladia_io announced their new Solaria speech-to-text model today. I hear so, so many rave reviews about Gladia from people building French-language voice AI agents. Gladia's new model supports more than 100 languages. The language auto-detection is the best I've seen from any speech model. You can configure Gladia to translate whatever language is spoken into a target language. In this video I start out speaking English and then switch to (terrible, terrible) French. You can see that Gladia "transcribes" both English and French to English. 🧵 with a live demo you can try out, plus code for the demo, and more links ...
Here's an example of parallel pipelines and complex function calling using a speech-to-speech API.
Can you beat my 1-929-LLM-GAME high score? We've been exploring what you can do with speech-to-speech models. Here's a word guessing game, built with the Gemini Multimodal Live API, Vercel, and Twilio, that has a bunch of interesting features ... 🧵
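A common way to handle several tool calls from a single model turn is to run them concurrently and return the results in order. The sketch below assumes each call arrives as a dict with `id`, `name`, and a JSON-encoded `arguments` string (a shape loosely modeled on typical chat-completion tool calls, not the Gemini Multimodal Live wire format). The tools `check_guess` and `get_score`, the target word, and the stub score are all hypothetical.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable

ToolFn = Callable[..., Awaitable[Any]]
TOOLS: dict[str, ToolFn] = {}


def tool(fn: ToolFn) -> ToolFn:
    """Register an async function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
async def check_guess(word: str) -> dict:
    await asyncio.sleep(0)  # stand-in for real I/O
    return {"correct": word.lower() == "pipecat"}  # hypothetical secret word


@tool
async def get_score(player: str) -> dict:
    await asyncio.sleep(0)  # stand-in for a database lookup
    return {"player": player, "score": 0}  # hypothetical stub score


async def dispatch(tool_calls: list[dict]) -> list[dict]:
    # Run every requested tool call concurrently; gather preserves input order.
    async def run_one(call: dict) -> dict:
        fn = TOOLS[call["name"]]
        args = json.loads(call["arguments"])
        return {"id": call["id"], "result": await fn(**args)}

    return await asyncio.gather(*(run_one(c) for c in tool_calls))
```

Keeping the `id` on each result lets the caller match results back to the model's requests, even though `asyncio.gather` already returns them in request order.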
Pipecat 0.0.37 is now available with the first RTVI-AI backend implementation! Meow!
Today we’re announcing an open standard for Real-time Voice and Video Inference: RTVI-AI. The RTVI abstractions and data structures define how client applications communicate with inference services. These are the “real-time APIs” for use cases like:
- Voice chat with LLMs
- Enterprise voice workflows such as healthcare patient intake
- Video avatars and immersive experiences
- Voice-driven user interfaces
- Voice conversational apps for education, customer support, and games
- High-framerate image generation and streaming generative video
We’re shipping open source reference JavaScript and React SDKs today, with iOS, Android and other platform SDKs coming soon. (Links in the thread below.) This first release has been several months in the making, and incorporates work and insights from @GroqInc, @DeepgramAI, @FAL, @cartesia, @cerebriumai, @Vapi_AI, and @trydaily. With RTVI, a “hello world” voice-to-voice AI chat app in JavaScript is 21 lines of code. If you want to build real-time AI applications, implement infrastructure for real-time inference, or implement your own SDKs that leverage the RTVI standard, you are more than welcome to join this project. We welcome all contributions and ideas!
Meow
Something new cooking ...
NVIDIA GTC 11:00 today — @kwindla is talking AI agents on a panel, SJCC 230B (L2). Come say hi if you're around. We love meeting you all in person.
Conversational Voice + RAG example code, with Pipecat and Daily Bots.
Talk to (a bootleg) virtual @benthompson [Meta-note: I recorded this video in a Waymo. So you're watching an AI experience inside an AI experience.]

We did an internal voice AI hackathon a couple of weeks ago at @trydaily. Several of us are long-time @stratechery fans; @mark_backman had the idea of creating a "talk to Ben Thompson" toy demo. This kind of project is a really nice testbed for combining RAG with voice. I'll put some notes about building voice + RAG below, but if you just want to jump to a live demo, there's a link further down in this thread.

The tech stack here breaks down into two parts: preparing and indexing the data, and running the live experience. There are lots and lots of choices right now for chunking, embedding, and storage/retrieval tooling. Mark used these:
- @spacy_io for semantic chunking
- @OpenAI text-embedding-3-small
- @pinecone to store the embeddings

The live app uses:
- @OpenAI GPT-4o mini
- function calls trigger a @langchain query
- the voice is a @cartesia clone
- @pipecat_ai does the low-latency phrase endpointing, interruption handling, context management, and orchestration
- @trydaily Daily Bots voice transport and Pipecat hosting
- the demo app is hosted on @vercel

A link to the full source code of the app (but not the copyrighted Stratechery content) is in the thread below.

Several things about building this are tricky:
- Latency really matters, and it's hard to make function calling + RAG fast. This was an experiment, not a production app, but Mark was still able to get the median total (voice-to-voice) response time down below 1.5s. In general, we aim for ~800ms for conversational voice AI response times, so this is slower than we want these experiences to be. But the median here isn't terrible. The outliers do feel too slow, though.
- RAG is complicated to get right. Mark did a lot of experimenting with chunking and embeddings. I think this definitely clears the 80/20 bar of being an interesting demo.
I'm interested in what you think if you try it! For a production app, we'd want to do significantly more work on the retrieval subsystem. The quality of the data fetch heavily influences the quality of the conversational output.

I'm convinced that talking to an LLM "personality" is going to be a very, very common thing in the near future. Sometimes, we'll talk to personalities that are slices of real people's public personas. Like this one. I also think there will be hugely popular personas that are "natively AI," personalities that are not based on a specific, real person. These new apps pose interesting, interrelated questions about copyright, user expectations and desires, and UI design.

We trained this app on copyrighted Stratechery content and cloned a real person's voice. This is clearly a copyright violation and of course we'll take the demo down if Ben Thompson objects to it being publicly available. Note that it's not possible to retrieve the copyrighted material here. It's only possible to get GPT-4o mini's "remix" of the content.

We needed a "behind a paywall" corpus of content to build an interesting RAG demo, because today's large LLMs are trained on much of the freely accessible information on the Internet. There are two things to note about that:
1. You can build a decent "clone" of a public person's personality just by creatively prompting a state-of-the-art LLM. You won't necessarily get the specific content grounding you probably want, but writing style usually shines through pretty nicely.
2. Almost all of the content used for training state-of-the-art LLMs is copyrighted content, even when it's not behind a paywall.

Courts haven't yet ruled on whether mixing *a lot* of copyrighted content from many sources together constitutes a legal use of copyrighted content. Perhaps this falls under the category of "fair use." Perhaps not. I went in person to see the Eldred v. Ashcroft oral argument at the Supreme Court in 2002.
The court's decision in that case upheld the Copyright Term Extension Act. That felt momentous at the time, and wrongly decided. It seems certain that there will be an even more momentous case about how copyright law applies to large model training. Perhaps our highly polarized Congress will find a way to pass new laws that extend and clarify copyright for this new era. If so, we should hope that corporate lobbyists aren't the primary authors of that law, as they were with the Copyright Term Extension Act.

We did not use any of the Stratechery podcast content for this demo, because adding multi-modal, multi-person content was beyond the scope of a hackathon project. But it sure seems like you'd want to add all of that great audio source material to a bigger, production-quality, authorized version of an app like this. It's less obvious to me whether you would want to try to add in non-Stratechery content from Ben Thompson. (NBA commentary and analysis!) Thompson maintains a "no-tech" X account, @NoTechBen. Should this content separation that makes sense on X also port over to the new generative AI personality world?

Anyway, this is now a very long post ... so go play with the demo if you're so inclined. Link in the next tweet.
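The retrieval step in the voice + RAG stack described above boils down to "embed the query, find the nearest chunks, hand them to the LLM as the function-call result." The sketch below is a deliberately tiny stand-in: it assumes toy 3-dimensional vectors and brute-force cosine similarity in place of text-embedding-3-small and Pinecone, and the chunk IDs and text are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    # Brute-force ranking of every chunk ID by similarity to the query vector.
    ranked = sorted(index, key=lambda cid: cosine(query, index[cid]), reverse=True)
    return ranked[:k]


def retrieve(query: list[float], index: dict[str, list[float]],
             chunks: dict[str, str], k: int = 2) -> str:
    # Join the best-matching chunk texts into one string for the LLM's context.
    return "\n".join(chunks[cid] for cid in top_k(query, index, k))


if __name__ == "__main__":
    index = {"aggregation": [1.0, 0.0, 0.0], "bundling": [0.9, 0.1, 0.0], "nba": [0.0, 0.0, 1.0]}
    chunks = {cid: f"<text of the {cid} chunk>" for cid in index}
    print(retrieve([1.0, 0.0, 0.0], index, chunks))
```

A real vector store replaces the `sorted` scan with an approximate nearest-neighbor index, which matters for the latency budget discussed in the post.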
Meow! 🙀
Pipecat 0.0.54 is out! 🔥 Performance improvements, a new task management system to track asyncio tasks within Pipecat, an initial Unit Test framework for testing frame processors and more improvements and bug fixes! 🔥 @pipecat_ai github.com/pipecat-ai/pipeca…
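The release note doesn't say how the new task management system works internally. As a rough illustration of the general idea (keeping a registry of live asyncio tasks so they can be listed and cancelled together), here is a hypothetical `TaskTracker`; its names and behavior are assumptions, not Pipecat's implementation.

```python
import asyncio
from typing import Any, Coroutine


class TaskTracker:
    """Hypothetical registry of live asyncio tasks (not Pipecat's actual API)."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task] = set()

    def create_task(self, coro: Coroutine[Any, Any, Any], name: str) -> asyncio.Task:
        task = asyncio.get_running_loop().create_task(coro, name=name)
        self._tasks.add(task)
        # Forget the task as soon as it finishes, is cancelled, or raises.
        task.add_done_callback(self._tasks.discard)
        return task

    def active(self) -> list[str]:
        # Names of tasks that have not finished yet.
        return sorted(t.get_name() for t in self._tasks)

    async def cancel_all(self) -> None:
        tasks = list(self._tasks)
        for task in tasks:
            task.cancel()
        # Wait for cancellation to finish; CancelledError is collected, not raised.
        await asyncio.gather(*tasks, return_exceptions=True)


async def demo() -> tuple[list[str], list[str]]:
    tracker = TaskTracker()
    tracker.create_task(asyncio.sleep(10), name="tts")  # long-running work
    tracker.create_task(asyncio.sleep(0), name="stt")   # finishes almost immediately
    await asyncio.sleep(0.01)
    before = tracker.active()
    await tracker.cancel_all()
    await asyncio.sleep(0)  # give any pending done callbacks a chance to run
    return before, tracker.active()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Tracking tasks centrally makes leaks visible (anything still listed by `active()` at shutdown is a task nobody awaited) and gives one place to cancel everything cleanly.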
Meow meow meow meow!!! 🚀🚀🚀
We are super excited to announce Pipecat 0.0.50! This is a HUGE release that includes support for the new Gemini 2.0 Flash multi-modal model, NVIDIA (NIM and RIVA), Groq, Grok, client SDKs (web, Android and iOS) and a bunch of improvements. Check it out! github.com/pipecat-ai/pipeca…
i can talk on the phone now!
Pipecat dial-in on @trydaily with interruptions. Latency isn’t optimal since I’m dialling from the UK (using Skype, no less), but gosh it’s cool!
I had a small paw in this launch.
We’re thrilled to introduce the world’s fastest Conversational Video Interface for developers. Build rich, real-time video experiences with digital twins that can speak, see, and hear. ⏱️ Less than one second of latency 🤖 Realistic, intelligent digital twins 🔌 Plug and play end-to-end building blocks 🏷️ Fully white-labeled tech 🧩 Modular, customizable components like LLM & TTS See the magic! Try talking to Carter live: tavus.io
Excited for developers to build with @SarvamForDevs + @pipecat_ai. The voice AI community in India is 🔥 So many great teams and developers building at scale and pushing forward use cases.
We're on a mission to help developers build voice experiences, faster. Sarvam’s STT, STT Translate, and TTS models are now supported on @pipecat_ai. Making it easier to build real-time conversational agents with Indian language support. Get started here: docs.sarvam.ai/api-reference…
ᓚᘏᗢ // 0.0.63
The Gemini Multimodal Live API has some new features, and @pipecat_ai 0.0.63 is out, with support for them. ➡️ Control for image processing resolution ➡️ Configurable VAD ➡️ Support for 30 languages
psst, super-secret new t-shirts are in the works for this ...
Voice and multimodal AI Hackathon at @ycombinator on October 11th. Engineers from @trydaily and the @pipecat_ai core team will be there. Join us to build something new, learn from other fun people, and win prizes like a guaranteed Y Combinator interview!
Pipecat makes this easy, really, is what I took away from this demo.
Old 4o vs New 4o — a dialog between two generations of voice AI Here's the demo I showed last night at the @cloudflare/@openai builders event. This is two GPT-4o Voice AI bots talking to each other. The first voice is coming from the phone and is powered by the standard Daily Bots demo app. It uses @DeepgramAI transcription, GPT-4o as the LLM, and @cartesia for voice generation. The second voice is GPT-4o voice-to-voice via the new OpenAI realtime API.
gemini-1.5-flash is fast
Replying to @altryne @Google
Latency is great, too. Time-to-first-token is consistently under 400ms.
Meoooooooooooow!

Nine days is a long time nowadays, so here's a new release: Pipecat 0.0.55 is now available! This release comes with a bunch of nice improvements. Check out the changelog for details: github.com/pipecat-ai/pipeca… @pipecat_ai
👏🏽
Nice PR from @yousifa adding streamable_http support to Pipecat's `MCPClient`! github.com/pipecat-ai/pipeca… If you're doing MCP-related things with voice AI, I'd love to hear about both what's working well for you and what issues you're hitting.
Replying to @Sanava_AI @trydaily
🙌 Thank you! And h/t to all the Pipecat developer community!
0.0.25 puts the fun in function calling
Meow meow
Pipecat 0.0.40 is now available. There's so much stuff in this release that you better just check it out. Happy meowing! github.com/pipecat-ai/pipeca…
Lots and lots of (human) languages!
The text-to-speech and speech-to-text services in @pipecat_ai now support voice-to-voice LLM conversational AI in 79 languages. People "outside the AI world" often ask me how much of the hype about AI is real. One of the things I always say is that the global, multi-lingual, and translation capabilities of AI models are a big, big deal. The Internet collapsed distance globally, but doesn't bridge language gaps and for a long time was very English-language-centric. I'm heartened, this time around, that a lot of people are doing a lot of great work training models in a lot of languages.
Meow!
Building real-time AND open-source conversational AI is tough—unless you're using Tavus and the open-source framework @pipecat_ai (by @trydaily) With Pipecat’s native Tavus API integration, developers can build lifelike, modular video AI interactions in minutes 👇
Hackathon, Oct 19th-20th in San Francisco. It's going to be fun!
Conversational Voice and Video AI Hackathon Oct 19th-20th at @solarislll in San Francisco
$20,000 in prize money for the best ...
💠 voice AI agents
💠 virtual avatar experiences
💠 UIs for multi-modal AI
💠 apps built around conversational dynamics
💠 art projects
💠 something else ...
The focus is on using Open Source tools to build real-time, multi-modal, conversational AI applications. We want to see amazing things. Bring projects you're already working on to get new eyes and collaborators. Start something new. Come to meet people and get inspired. We'll be interviewing startups and AI leaders during the hackathon and posting the interviews. So sign up if you want to demo your real-time AI-related product or project to the broader community, too.
👏🏽
Most voice AI agents fall apart in noisy environments, especially with cross talk, music, or a TV blaring. At Vocality, we've spent the last 6 months making sure ours doesn't. Here’s a live demo of our medical interpreter and a short thread on what worked, didn't work, and surprised us 🧵
Thinking of you @aconchillo 💛 Thank you for all that you do for the Pipecat community.
0.0.48 @pipecat_ai release out today. Lots of good stuff, including...
ᓚᘏᗢ ‣ Server-side (in the Pipecat pipeline) support for @krispHQ's excellent audio processing models. These are commercial models, but if you need state-of-the-art background noise reduction and speaker isolation, they're well worth paying for. Krisp client-side models are available for free in the @trydaily Web, iOS, and Android SDKs. But for telephony use cases, or if you want to minimize CPU use of your client apps, you can now run Krisp models server-side.
ᓚᘏᗢ ‣ A new @tavus video avatars pipeline element. Clone yourself. Talk to yourself. Clone your friends. Talk to your friends. Introduce your clone to the clones of your friends and see what happens.
ᓚᘏᗢ ‣ Output audio mixers. Add music or background noise to your voice AI agents. See `examples/foundational/23-bot-background-sound.py`
ᓚᘏᗢ ‣ New frame processor input queues make it easier to sequence things like TTS speech fragments.
@aconchillo named this release after his father, who is in the hospital right now. We are thinking about Aleix and his family.
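Conceptually, an output audio mixer adds a scaled background signal to the bot's speech and clamps the result to the sample range. The helper below is a hypothetical sketch, not Pipecat's mixer: it assumes both inputs are mono, 16-bit signed PCM at the same sample rate, loops a shorter background track, and uses plain hard clipping.

```python
from array import array


def mix_pcm16(voice: array, background: array, background_gain: float = 0.3) -> array:
    """Mix mono int16 PCM buffers sample by sample (hypothetical helper)."""
    out = array("h")
    for i in range(len(voice)):
        # Loop the background track if it is shorter than the voice buffer.
        bg = background[i % len(background)] if background else 0
        sample = round(voice[i] + bg * background_gain)
        # Clamp to the int16 range so loud peaks clip instead of wrapping around.
        out.append(max(-32768, min(32767, sample)))
    return out


if __name__ == "__main__":
    voice = array("h", [1000, -1000, 32000])
    music = array("h", [10000])
    print(list(mix_pcm16(voice, music, background_gain=0.5)))
```

A low background gain keeps music under the speech; a production mixer would more likely duck the background while the bot talks rather than rely on clipping.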
with v0.0.24 you can now route calls to pipecat voice bots through any provider that supports SIP connections. if you already have @twilio phone numbers and workflows, this is purr-fect for you.
Pipecat panel at GTC on Wednesday ...
I'm at GTC to be on a panel about conversational voice and video. Wednesday at 11:00 in SJCC 230B (L2). Transforming Experiences with AI Agents and Digital Humans [S73441] I brought copies of the "Illustrated Guide to Voice AI" that we wrote for the @aiDotEngineer summit in NYC. We gave out every copy we had at the Summit, so we did another print run for GTC. I don't think we'll ever print any more, so if you want one, come to the panel!
Tonight! Voice AI Meetup, Wed, 6:30p. Register at link in thread, for SF + livestream. Voice agents are live in production today, across a wide range of use cases. The voice AI community has put together core building blocks — turn detection and interruption handling, low-latency inference and processing, telephony, tool calling abstractions, etc. What else are we tackling now? What will help us deliver more value, improve the experience, expand use cases, improve the developer experience? @kwindla (Pipecat) moderates: ✳️ Side Project Launch ✳️ Panel with @jundesai @cartesia, @tarunipaleru @TeamHathora, @jpalioto @Google, @trydaily We’ll be talking about things like: - Coordinating between multiple agents - Incorporating thinking models into voice agents - Using sandboxes for long-running sub-agents - Dynamic user interfaces, and more s/o to @trychroma for hosting!
🙌
The team @trydaily have been great partners in our journey to build the fastest voice technology for developers. Check out Sonic in their open source conversational AI framework at @pipecat_ai!
@GroqInc + @AIatMeta + @pipecat_ai + @trydaily = real-time Voice AI with 405B and tool use!
We are proud to collaborate with @trydaily on real-time voice #AI. Check out enterprise voice workflows, such as this healthcare patient intake demo running on #Llama 3.1 405B by @AIatMeta! hubs.la/Q02HF_2k0 #VoiceAI #Meta #Inference #LLM #Llama3
ᓚᘏᗢ // 0.0.67
ᓚᘏᗢ // 0.0.70
Meow can see it now, the Pipecat @covaldev vest collab
ᓚᘏᗢ v0.0.49
🕵️‍♀️
Did you know that the "cat" in Pipecat doesn't actually refer to a cat? I think it's a very easy one... but does anyone know what it could be referring to? @pipecat_ai
Replying to @cairns
🙌
Meow meow!
We have achieved so much in @pipecat_ai! Thank you all! The community is amazing and keeps growing (1445 on Discord)! Pretty sure it's probably the most complete and powerful conversational AI orchestration framework out there (and this list is even missing a few things!):