Intelligence shaped by you

San Francisco
The no-nonsense chunking library. Lightweight, lightning fast and prepared to Chonk all your docs 🦛 ✨ chonkie.ai
New chonker unlocked!✨
Life update: Joined @ChonkieAI today 🦛
Get ready to CHONK... in style?! 🤩 Our little hippo 🦛 has been busy cooking up some seriously fresh Chonkie SWAG! Check it out: 🔗 shop.chonkie.ai
Hello Chonkers! Chonkie’s been spending the summer getting bigger. Today we’re launching integrations with MongoDB, Weaviate, and Pinecone along with a brand new Semantic chunker for smarter text splitting. 🧵
🤝 More Handshakes! Seamlessly connect Chonkie's chunking logic to: ✅ MongoDB ✅ Weaviate ✅ Pinecone. Chonkie now supports ALL 7 major vector databases. Build wherever you want; Chonkie's got you.
🧙‍♂️
But you can call me chonk 💞
Corporate secret: Chonkie's full name is Chonkius Maximus.
🚨 HOT NEW BLOG ALERT 🚨 @michael_chomsky 's "Semantic Chunking PDFs In The Cloud With Chonkie" shows how our friendly hippo makes PDF processing effortless for frontend devs! No infrastructure headaches, just results! ✨🦛 #Chonkie #RAG
Chonkie has crossed 22,000 downloads! A huge thanks to everyone who has joined us on this chonking journey! A lot more to come soon! :) 🦛📷🚀
Our hippo keeps growing!!! 🦛📈- Chonkie has crossed 50 thousand downloads! ✨ Thanks for supporting us on this journey. The next chapter is going to be even more exciting 👀 Now back to making the best chunker 🧑‍💻
🦛✨ Meet Chonkie Viz! Ever caught yourself squinting at print statements, trying to figure out where your chunks begin and end? Drawing ASCII art with dashes just to visualize your text splits? Our tiny hippo friend has a better way! Introducing a magical way to see your CHONKs in all their glory!
Chonkie ❤️ FlashRAG Chonkie joins the FlashRAG toolkit as its goodest boi! Now you can easily develop state-of-the-art RAG pipelines with the best chunker out there. Check it out here! github.com/RUC-NLPIR/FlashRA… Happy Chonking🦛✨
Huge shoutout to @not_so_lain for presenting @ChonkieAI at AINS~ 🙌🚀
aaand that's a wrap 👏 had fun representing @ChonkieAI at the AI national summit this year, met lots of cool people and genuinely was impressed by lots of the projects I was judging
The Chonk is strong with this one. Happy #MayThe4th from Chonkie! 🦛✨ #StarWarsDay #Chonkie #AI
🦛 Chonkie v1.0.6 is HERE! ✨ This release is packed with exciting new ways to CHONK your data, including our first generative features! Get ready for some serious upgrades. 🦛 👩🏻‍💻 github.com/chonkie-inc/chonk… #RAG #AI
Merry Chonkmas everybody! This holiday season, Chonkie brings to you a new release featuring support for Late Chunking! 🧵 github.com/chonkie-ai/chonki…
❤️Happy (belated) Valentine's Day! Chonkie has grown to 35k downloads, all thanks to your love. 🙏 📈 To celebrate, we're thrilled to announce v0.5.0, our biggest release yet! 🎉 Our little hippo is now a certified heartthrob. 🦛💕 Highlights 🧵👇 github.com/chonkie-ai/chonki…
🎨 Happy Holi, CHONKers! 🦛✨ Just like Holi's distinct colors create beautiful patterns, our SemanticChunkers separate your text while preserving meaning! May your RAG be as colorful as your celebrations today! #Chonkie #HappyHoli #RAG
We have an exciting day of announcements today, so without further ado, let's do this 🚀
Why chunk early when you can wait? 🤔✨ Late chunking helps preserve context across chunks, preventing loss of meaning. Michael Ryaboy breaks down how you can easily do this with Chonkie!🦛✨ 🔗 pub.towardsai.net/easy-late-…
Chonkie ❤️ txtai Chonkie is now available for chonking your text with txtai's incredible pipelines in just a few lines of code! Check it out here: neuml.github.io/txtai/ Sneak peek 🧵👇
🦛✨ Chonkie v1.1.2 is here! Explore fresh features like FileFetcher & TextChef, support for jina-embedding-4, and Azure OpenAI integration, plus crucial bug fixes. 🎉 Shoutout to new contributors @TaylorN15 and @real-jiakai! More details: github.com/chonkie-inc/chonk… #RAG #LLM #AI
@ChonkieAI is now integrated with Onyx! If you’re using Onyx, show some love to Chonkie ❤️🦛
🧠 A Whole New Semantic Chunker! Chonkie's new semantic chunker better detects subtle shifts in meaning, leading to less erratic chunks.
Thrilled to see 🦛Chonkie✨ featured in @kalyan_kpl's excellent "LLM Engineering Toolkit" repo! Thanks for recognizing our work on making text chunking fast, lightweight, and effective for RAG applications. 🫶 Check out Kalyan's curated collection of top-tier tools 👇 #RAG #Chonkie #LLM
👨🏻‍💻 LLM Engineer Toolkit - Collection of 120+ LLM Libraries Category Wise LLM Engineer Toolkit repository contains a curated list of 120+ LLM libraries category wise. 🚀 LLM Training 🧱 LLM Application Development 🩸LLM RAG 🟩 LLM Inference 🚧 LLM Serving 📤 LLM Data Extraction 🌠 LLM Data Generation 💎 LLM Agents ⚖️ LLM Evaluation 🔍 LLM Monitoring 📅 LLM Prompts 📝 LLM Structured Outputs 🛑 LLM Safety and Security 💠 LLM Embedding Models ❇️ Others Repo - github.com/KalyanKS-NLP/llm-…
Write the chonkie documentation first, bro
I think I know what I’m calling the LLM I build now
Chonkie v1.2.0 also comes with performance improvements, better embeddings support, and more! Big thanks to our contributors for helping keep Chonkie a happy hippo ❤️ Check out the full release on GitHub. Happy chonking! ✨ 🦛 github.com/chonkie-inc/chonk…
Slowlie but surelie ☺️
Hello Chonkers – as some of you noticed, our tiny hippo had to go into hiding this last week. This thread by @minhash explains what happened. TL;DR: Threat of legal action forced us to take down the code and start fresh. Now, we are back and this time the hippo is here to stay. Same spirit, same mascot, and still the API you love. Check us out at github.com/chonkie-inc/chonk…
Some of you noticed Chonkie disappeared from GitHub over the last week or so. Chonkie is now public on GitHub at a new address: github.com/chonkie-inc/chonk… Today, we're finally ready to share what happened behind the scenes. It's been a wild ride. 🧵👇 #OpenSource #Chonkie #RAG
Your favorite hippo has been working on itself this past week! Chonkie v0.4.1 is out and brings with it a ton of bug fixes and a progreeeeessssss bar. Check out our full release here github.com/chonkie-ai/chonki… Happy chonking! 🦛✨
Chonkie TS is here! The best chonking hippo 🤝 the fastest-growing code ecosystem. Run semantic, code, and recursive chunking, all without leaving your app code.
Chonk Chonk Chonk
Testing out RAG? Wanna see how good Chonkie is? Check out our playground at chonkie.cloud! Test out all our chunkers in real-time. Paste your text, pick a strategy, see the chunks, and run inference — all in one place. #TryBeforeYouChonk
Replying to @shreyash_nm
Yeah I got hobbies outside of chunking 🤘
Time to make your RAG pipeline as aesthetically pleasing as a pygmy hippo's waddle! 🦛💖 Hop on over and give it a try – because good CHONKs deserve to look good too! #ChonkieViz #RAGTools #TinyHippoApproved
There's a lot more in Chonkie v0.5.0, including new `to_dict` and `from_dict` functions and changes to the overlap refinery. 🚀 Upgrade now and experience the CHONKiest chunking yet! 🦛 💪 Happy CHONKing! ✨
🦛 Chonkie Loves to Surf! 🏄‍♀️🌊 SurfSense now powered by Chonkie! SurfSense delivers a fully customizable AI research agent—connecting public data with your personal knowledge base (Slack, Notion & more). The perfect match: SurfSense's research smarts + Chonkie's chunking power = AI research that actually understands your context! Check out SurfSense at surfsense.net/ #Chonkie #RAG #AI #SurfSense
🦛 Chonkie v0.4.0 is out! Meet our new RecursiveChunker—smarter text chunking that preserves document structure. 🧠 Plus: fixed chunk boundary bugs & enhanced semantic chunking. Try it out now 🚀 🔗 github.com/chonkie-ai/chonki… #LLM #RAG #AI #Chonkie
Merry Chonkmas!
Chonkie now supports 7 different chunkers in just 10 MB! #OurTeenyTinyMightyHippo
You can also get merch without hacking - shop.chonkie.ai
We hacked @ChonkieAI early in the batch and just got a t-shirt from @shreyash_nm as a bug bounty!
Check out our full release notes here: github.com/chonkie-ai/chonki… Happy Chonking! 🦛 ✨
Replying to @nizzyabi @djcows
Fr, she would’ve loved you more 🥰
As we are starting anew, so is our star count. Our simple request: Chonkie was close to hitting the 3K star mark. Help us get back there. If you find Chonkie useful or interesting, please ⭐️ us on GitHub. It would mean the world to us ♥️
🤝 Cohere Embeddings support added! 🌐 Chonkie now plays nice with Cohere Embeddings for SemanticChunker and SDPMChunker. More embedding options, more chonk!
Replying to @minhash
Just put the code in the bag, lil bro
🙌 Shoutout to @DavidMezzetti, who's been absolutely incredible in his support for Chonkie, building in open-source and making this integration in txtai! 🫶🦛 Check his phenomenal work out at NeuML at neuml.com/
Chonkie is a fast boi ⚡ Late chunking all of War and Peace took less than 30 seconds! 💨 🤯
👨‍🍳 Introducing the TextChef! 🥘 Load and preprocess your text and Markdown files with Chonkie's new TextChef class. Feed your chunkers with perfectly prepared text cuisine!
🐊 ingest-anything ✨ A Python library that lets you ingest anything with no effort and use it in your RAG pipeline, with the help of Chonkie 🦛✨ 👩🏻‍💻 github.com/AstraBert/ingest-…
Huge shoutout to all contributors, and a special welcome to @not_so_lain for their first contribution! 🙏 Check the full changelog for all the details & fixes! ✨ Go grab v1.0.6 and happy Chonking! 🦛 🔗 github.com/chonkie-inc/chonk…
🎯 include_delim="next" for delimiter-based chunkers! ↩️ Put delimiters like periods or newlines in the next chunk for better text splitting, especially useful for Markdown processing.
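What delimiter placement changes is easiest to see on a tiny example. Below is a minimal pure-Python sketch, not Chonkie's implementation: the helper `split_on_delims` is hypothetical, and the `"prev"` and `None` options are our assumed counterparts to `"next"` (delimiter stays at the end of the preceding piece, or is dropped).

```python
import re
from typing import List, Optional


def split_on_delims(text: str, delims: List[str], include_delim: Optional[str] = "prev") -> List[str]:
    """Hypothetical helper: split `text` on any of `delims` and choose where each delimiter goes."""
    if not delims:
        return [text] if text else []
    # A capturing group makes re.split keep the delimiters: [seg, delim, seg, delim, ..., seg]
    pattern = "(" + "|".join(re.escape(d) for d in delims) + ")"
    parts = re.split(pattern, text)
    segments, found = parts[0::2], parts[1::2]

    if include_delim == "prev":
        # Each delimiter stays at the end of the segment it follows.
        pieces = [seg + d for seg, d in zip(segments, found + [""])]
    elif include_delim == "next":
        # Each delimiter moves to the start of the segment that follows it.
        pieces = [d + seg for d, seg in zip([""] + found, segments)]
    elif include_delim is None:
        pieces = segments
    else:
        raise ValueError(f"unknown include_delim: {include_delim!r}")

    # Drop empty strings left behind by leading or trailing delimiters.
    return [p for p in pieces if p]


md = "Intro text.\n# A\nBody A.\n# B\nBody B."
print(split_on_delims(md, ["\n#"], "next"))  # headings open their own chunks
print(split_on_delims(md, ["\n#"], "prev"))  # heading markers dangle at chunk ends
```

Splitting Markdown on `"\n#"` shows why `"next"` helps: with it, every chunk begins with its own heading marker, while the other placement leaves a stray `#` at the end of the previous chunk.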
🆕 Introducing return_type="texts" for all chunkers! 📜 Get your chunks as simple lists of strings without the metadata overhead. Perfect for when you just need the text and nothing else.
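To show the difference between rich chunk objects and plain strings, here is a small sketch. The `Chunk` fields, the `chunk_words` function, and the "one word = one token" rule are all illustrative assumptions, loosely modeled on the idea of chunks carrying offsets and token counts; they are not Chonkie's real classes.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Chunk:
    """Hypothetical chunk record; field names are illustrative only."""
    text: str
    start_index: int
    end_index: int
    token_count: int


def chunk_words(text: str, chunk_size: int, return_type: str = "chunks") -> Union[List[Chunk], List[str]]:
    """Fixed-size chunking where each whitespace-separated word counts as one token (an assumption)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    if return_type not in ("chunks", "texts"):
        raise ValueError(f"unknown return_type: {return_type!r}")
    words = list(re.finditer(r"\S+", text))
    chunks: List[Chunk] = []
    for i in range(0, len(words), chunk_size):
        group = words[i:i + chunk_size]
        start, end = group[0].start(), group[-1].end()
        chunks.append(Chunk(text[start:end], start, end, len(group)))
    if return_type == "texts":
        # Plain strings only: offsets and token counts are discarded.
        return [c.text for c in chunks]
    return chunks


print(chunk_words("one two three four five", 2, return_type="texts"))
```

Returning strings is handy when the next step (an embedding call, say) only wants the text; the object form keeps offsets for tasks like highlighting chunks in the source document.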
🧠 Bring Your Own Token Counters 🧮 Pass in your own token_counter function to customize how chunkers determine token counts. Unleash the power of custom tokenization!
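The idea behind a pluggable counter is that the chunker only needs some function from text to an integer. A minimal sketch, assuming a hypothetical greedy word-packing chunker `pack_words` that accepts any `Callable[[str], int]` (this is not Chonkie's signature):

```python
from typing import Callable, List


def pack_words(text: str, chunk_size: int, token_counter: Callable[[str], int]) -> List[str]:
    """Greedily add words to the current chunk while token_counter(chunk) <= chunk_size."""
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and token_counter(candidate) > chunk_size:
            # Adding this word would overflow: close the current chunk and start a new one.
            chunks.append(" ".join(current))
            current = [word]
        else:
            # A single word that alone exceeds chunk_size still becomes its own chunk.
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def word_counter(s: str) -> int:
    return len(s.split())


print(pack_words("a b c d e", 2, word_counter))  # counts words
print(pack_words("ab cd ef", 5, len))            # counts characters instead
```

Re-counting the whole candidate at each step is quadratic, but it stays correct for real tokenizers, whose counts are not always additive across words.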
Honored to see Chonkie 🦛 as the exclusive chunking library in @kalyan_kpl's comprehensive "RAG Zero to Hero" repository! 😻 Thanks for recognizing our lightweight, blazing-fast chunking solution for RAG applications. 🚀 Explore Kalyan's excellent guide to mastering RAG at github.com/KalyanKS-NLP/rag-… #RAG #LLMOps #NLP
I'm happy to announce "RAG Zero to Hero Guide". "RAG Zero to Hero Guide" is a comprehensive guide to learning RAG from basics to advanced. As of now, "RAG Zero to Hero Guide" includes 🚀 RAG Basics Course This course covers - What is RAG? - Why RAG? - How does RAG work? - RAG Benefits and Challenges - RAG Must Know Terms - RAG Roadmap - RAG Developer's Stack - RAG from Scratch - RAG with LangChain - Website RAG - YouTube Video RAG - Agentic RAG ❇️ RAG Toolkit This includes category-wise libraries related to RAG. 🔰RAG Survey Papers This includes a category-wise collection of RAG survey papers. 📌 Repo link in the comments #rag #llms #nlproc
Introducing Chonkie Recipes. The hippo has been cooking 🦛🧑‍🍳 Recipes provide easy to use predefined configs to help you quickly get started. Wanna chunk some markdown text quickly? Got some text in Japanese and don't know what rules to set? No worries, Chonkie's got you 🤝
Replying to @Crumpton_ @gifdead
Thanks @Crumpton_ for telling our story so beautifully! ✨💖🥹
Coding convenience unlocked! 🔓 CodeChunker now supports auto language detection! Just feed it code, and it figures out the language. ✨ Specify manually if you need that extra bit of speed! 🦛
We've crossed 2,000 stars on GitHub!!! It's a big milestone for our mighty hippo, but the best is yet to come. 2025 here we come! 🚀🌕 Building with RAGs? Wondering what the best chunking library is? Check us out at chonkie.ai 😉
A great blog post by James Jackson on using RecursiveChunker along with his own Hierarchical Overlap Chunking (PR in review) that works wonderfully for your chunking needs! Great illustration on using Chonkie for Transcript RAG and how to make the most of it 🚀 🔗👇
Just install, import and CHONK with TXTAI and Chonkie! Just how Chonkie likes it~
The 🌏 is chunking! Are you?
Hello Chonkers! We just published our roadmap for Q1 2025. Chonkie 🦛 is growing fast, and we want your help in raising our mighty hippo 🚀 Take a look at our roadmap here github.com/chonkie-ai/chonki… and let us know what you think! Comments are always encouraged :)
We're giving $100 each to 7 users! Tell us how you use Chonkie and you might just get a little hippo hug (in cash). #TheGenerousHippo Fill out this form to get started👉 forms.gle/BQCwVE9YKBQUqMnH8
We’re gonna mog so hard on the ‘splitters’ looking like that 🚀
Replying to @shreyash_nm
Chonkie’s lore goes harder
Replying to @mod_setter
Thanks @mod_setter! We have the support of the best people in the community 🦛😊
Replying to @feng
Thanksss @feng! 🦛🫶🚀
Instead of segmenting data into fixed chunks upfront, late chunking postpones this decision until the latest possible moment—when more context is available. 🧠 This approach results in embeddings that are much richer in context, as they have "attended to" the entire context. ⚡
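To make the idea concrete, here is a sketch of the pooling step only. It assumes you already have one contextual embedding per token for the whole document (tiny hand-written vectors stand in for a long-context embedding model's output) plus token-index spans for each chunk. Mean pooling is our assumption, and `late_chunk_pool` is a hypothetical name, not Chonkie's LateChunker API.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def late_chunk_pool(token_embeddings: Sequence[Vector], spans: Sequence[Tuple[int, int]]) -> List[Vector]:
    """Mean-pool whole-document token embeddings over each [start, end) token span."""
    pooled: List[Vector] = []
    for start, end in spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"bad span {(start, end)}")
        window = token_embeddings[start:end]
        dim = len(window[0])
        # Average each dimension across the tokens that belong to this chunk.
        pooled.append([sum(vec[i] for vec in window) / len(window) for i in range(dim)])
    return pooled


# Toy stand-ins: 4 tokens, 2 dimensions, embedded once over the full document.
doc_tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
print(late_chunk_pool(doc_tokens, [(0, 2), (2, 4)]))
```

The order of operations is the whole point: the model sees the full document first, and chunk boundaries are applied afterwards, so each pooled vector inherits context from outside its own span.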
Late chunking with Chonkie is simple and powerful
Meet SlumberChunker 😴! Our brand new agentic chunker. It uses the power of generative models via our new Genie interface which connects to LLMs for ✨ S-tier ✨ chunk quality. 🦛
Read our full release notes here: github.com/chonkie-ai/chonki… Happy Chonking! 🦛 ✨
Recipes are currently supported by Recursive and Late chunkers. Got a recipe idea? Let us know or create a PR on our Hugging Face! #MoreCooksInTheKitchen
Using recipes is super simple! Simply find the recipe you want on our Hugging Face repo (huggingface.co/datasets/chon…) and pass it to your chunker!
Can we @mogpfp @ChonkieAI? 😳👉👈
As always, Chonkie is a speedy boi. Recursive chunking the Wikipedia article for Elvis Presley (the longest English Wikipedia article there is!) takes only 25 milliseconds!
Recursive chunking with Chonkie is a breeze! Simply load up the recursive chunker and pass your texts!
new art, who dis?
Replying to @minhash
petition to repost this 😸
Need speed AND quality? ⚡️ Say hello to NeuralChunker! It uses a fine-tuned BERT-like model for super fast, high-quality chunking. Second only to SlumberChunker! 🦛✨
Introducing Genie! 🧞 Our new system for integrating generative models & APIs (like Gemini!) seamlessly into Chonkie. Powers SlumberChunker and future generative magic! Requires [genie] install. ✨🦛
Replying to @gifdead @Crumpton_
This would not have been possible without you @gifdead! Sooooo happy with the acoustics, and the final product looks amazing~ 😊🤎
Happy chonking holidays everyone! Chonkie’s been hard at work this holiday season to bring you something exciting for the new year—we're happy to announce Chonkie v0.4.0 with support for recursive chunking! github.com/chonkie-ai/chonki…
Just a few lines of code and poof! 🎨 Your chunks transform into a beautiful pastel paradise~ Who knew text splitting could look this adorable? It's like giving your chunks a tiny hippo makeover! ✨ Each chunk gets its own cozy color home, making debugging feel like a playdate!
Replying to @nizzyabi @posthog
It's a tough job being this adorable, but someone's gotta do it! 🤷‍♀️ Thanks for noticing! You've got great taste! 😄✨
🦁❤️
Recursive chunking: It's like a Russian nesting doll! 🪆 Break down big chunks of text into smaller ones, then break those down even further! This hierarchy keeps all the context intact, making it perfect for feeding massive texts to AI models. 🧠
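The nesting-doll idea fits in a few lines of recursion. This is a minimal sketch, not Chonkie's RecursiveChunker: the separator hierarchy and the character-based size limit are assumptions chosen to keep it dependency-free. Split on the coarsest separator, merge small pieces back together up to the limit, and recurse with the next, finer separator on any piece that is still too big.

```python
from typing import List

# Hypothetical hierarchy, coarsest first: paragraphs, lines, sentences, words.
SEPARATORS = ["\n\n", "\n", ". ", " "]


def recursive_chunk(text: str, max_chars: int, level: int = 0) -> List[str]:
    if len(text) <= max_chars:
        return [text] if text else []
    if level >= len(SEPARATORS):
        # No separators left: fall back to a hard character split.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep = SEPARATORS[level]
    raw = text.split(sep)
    # Re-attach the separator to the piece before it so no text is lost.
    pieces = [p + sep for p in raw[:-1]] + [raw[-1]]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > max_chars:
            # Too big at this level: flush what we have, then go one level deeper.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_chunk(piece, max_chars, level + 1))
        elif len(current) + len(piece) <= max_chars:
            current += piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "Para one is here.\n\nPara two is a bit longer than the first."
print(recursive_chunk(text, 25))
```

A real chunker would usually measure size in tokens rather than characters. Keeping each separator attached means the chunks concatenate back to the exact input text.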
Thanks @not_so_lain for the shoutout! ❤️🚀🦛 We're so glad to have you building with us~
Replying to @aakash_thatte
You can find all the related code for it in the vizard files here~ github.com/chonkie-inc/chonk…
Replying to @michael_chomsky
Love and care ❤️ JK, we have better minima detection for semantic shifts and candidate chunk embedding creation logic.
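For readers wondering what minima detection for semantic shifts can look like, here is an illustration only; it is not Chonkie's actual algorithm. It compares each sentence embedding with the next and cuts wherever that similarity is a local minimum below a threshold. The function name, the threshold value, and the toy 2-D embeddings are all assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def split_at_minima(sentences: List[str], embeddings: List[Sequence[float]], threshold: float) -> List[List[str]]:
    """Hypothetical: group sentences, cutting where neighbour similarity dips to a local minimum."""
    if len(sentences) != len(embeddings):
        raise ValueError("need one embedding per sentence")
    # sims[i] is the similarity between sentence i and sentence i + 1.
    sims = [cosine(embeddings[i], embeddings[i + 1]) for i in range(len(embeddings) - 1)]
    boundaries = []
    for i, s in enumerate(sims):
        left = sims[i - 1] if i > 0 else math.inf
        right = sims[i + 1] if i + 1 < len(sims) else math.inf
        if s <= left and s <= right and s < threshold:
            boundaries.append(i + 1)  # start a new group at sentence i + 1
    groups, start = [], 0
    for b in boundaries:
        groups.append(sentences[start:b])
        start = b
    groups.append(sentences[start:])
    return groups


sents = ["Cats purr.", "Cats nap.", "Stocks fell.", "Markets dipped."]
embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(split_at_minima(sents, embs, threshold=0.5))
```

A production chunker would likely also smooth the similarity curve before searching for minima, since raw sentence-to-sentence similarities tend to be noisy; that step is omitted here.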
Didn't that owl die once? 🪦🦉 Let the poor owl rest a bit~
Simple to start, easier to iterate. Create high-quality chunks in just 5 lines of code