Bhavnick Minhas · Oct 5, 2025 · 4:44 AM UTC

Bhavnick Minhas

@minhash

5 Oct 2025

pip install wife

627

74,209

Bhavnick Minhas · Nov 10, 2024 · 9:47 AM UTC

Bhavnick Minhas

@minhash

10 Nov 2024

🦛 Introducing Chonkie: The no-nonsense RAG chunking library that's lightweight, lightning-fast, and ready to CHONK your texts! 🔗 pypi.org/project/chonkie/ 👩🏻‍💻 github.com/bhavnicksm/chonki… A thread 🧵

194

53,139

Bhavnick Minhas · Feb 24, 2025 · 11:46 AM UTC

Bhavnick Minhas

@minhash

24 Feb 2025

Replying to @maharshii

If you’re a lucky 1, you’ll learn how to install Ubuntu after this

177

9,298

Bhavnick Minhas · May 25, 2024 · 9:51 PM UTC

Bhavnick Minhas

@minhash

25 May 2024

Something I learned about FSDP today: if you can fit your model on one GPU, or even on one node, then don't use FSDP. The communication overhead makes it slower and lowers your MFU compared to DDP or pipeline parallelism. 🔗 medium.com/pytorch/pytorch-d…

156

87,085
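The rule of thumb in the post above can be written down as a rough decision helper. This is only a sketch under stated assumptions: `pick_strategy`, the 16 bytes per parameter (a common estimate for mixed-precision Adam training), and the 70% usable-memory fraction are illustrative choices, not values from the post or from PyTorch. Reading "fits on one node" as a case for pipeline parallelism is likewise one interpretation of the post.

```python
GB = 1e9  # bytes per gigabyte (decimal), used for simple arithmetic


def pick_strategy(
    n_params: float,
    gpu_mem_gb: float,
    gpus_per_node: int,
    bytes_per_param: float = 16.0,   # assumption: fp16 weights + grads, fp32 master copy + Adam states
    usable_fraction: float = 0.7,    # assumption: leave ~30% of memory for activations and buffers
) -> str:
    """Pick a parallelism strategy from a rough training-memory estimate."""
    needed = n_params * bytes_per_param
    per_gpu = gpu_mem_gb * GB * usable_fraction

    if needed <= per_gpu:
        # Whole training state fits on one GPU: replicate it and use DDP.
        return "DDP"
    if needed <= per_gpu * gpus_per_node:
        # Fits inside one node: split the model across the node's GPUs.
        return "pipeline parallelism"
    # Does not fit in a single node: sharding across nodes becomes worthwhile.
    return "FSDP"
```

For example, `pick_strategy(7e9, 80, 8)` estimates about 112 GB of training state, which exceeds a single 80 GB GPU but fits in an 8-GPU node under these assumptions. Real decisions would also need measured activation memory and interconnect bandwidth.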

Bhavnick Minhas · Nov 10, 2024 · 8:23 AM UTC

Bhavnick Minhas

@minhash

10 Nov 2024

Replying to @wordgrammer

If you want to make progress on low-level computing, take the left path. If you wanna build AI wrappers, take the right path. (NVIDIA GPUs have insanely high core counts compared to Mac; C+CUDA makes a pretty low-level stack too)

144

18,374

Bhavnick Minhas · Jun 6, 2025 · 7:35 AM UTC

Bhavnick Minhas

@minhash

6 Jun 2025

Replying to @nizzyabi

"When I get rich there will be signs" The signs:

3,091

Bhavnick Minhas · Nov 11, 2024 · 9:34 AM UTC

Bhavnick Minhas

@minhash

11 Nov 2024

Woah! 🤯 I'm absolutely blown away by all the love for 🦛Chonkie✨ - our smol but mighty Python chunking library! You're making this tiny hippo's (and my) heart grow bigger 💙 Thanks for every star, download, and CHONK of your support! Keep CHONKING 🫶 🔗github.com/bhavnicksm/chonki…

4,606

Bhavnick Minhas · Dec 28, 2022 · 4:58 PM UTC

Bhavnick Minhas

@minhash

28 Dec 2022

📰 #NLProc Paper Summary ”Transcending Scaling Laws with 0.1% Extra Compute” Post-training adaptation can cause significant increase in upstream and downstream performance with negligible compute requirement. 📜 abs: arxiv.org/abs/2210.11399 🧵🔻

ALT Paper Image

11,356

Bhavnick Minhas · Oct 26, 2025 · 3:59 AM UTC

Bhavnick Minhas

@minhash

26 Oct 2025

chat, I think @himanshustwts likes chonkie :)

3,902

Bhavnick Minhas · Sep 2, 2024 · 1:24 PM UTC

Bhavnick Minhas

@minhash

2 Sep 2024

Replying to @prajdabre

I am 24 and unhappy 💀 Striving for excellence but my skill issues and imposter syndrome don’t leave me alone. Haven’t built anything of high impact yet. What do I do, Raj Sensei?

48,662

Bhavnick Minhas · Oct 5, 2025 · 4:17 AM UTC

Bhavnick Minhas

@minhash

5 Oct 2025

@ryanvogel finally met @nizzyabi today. Best day of his life!

4,362

Bhavnick Minhas · Oct 14, 2024 · 9:27 PM UTC

Bhavnick Minhas

@minhash

14 Oct 2024

Cooking hard at @CohereForAI on the next big thingie — bringing people together has never been this rewarding

Sara Hooker

@sarahookr

14 Oct 2024

we are cooking 🔥🔥

9,388

Bhavnick Minhas · Nov 27, 2024 · 6:59 PM UTC

Bhavnick Minhas

@minhash

27 Nov 2024

After a while of just shifting docs, I've finally transferred them all to the new docs! 🔗 docs.chonkie.ai

2,713

Bhavnick Minhas · Feb 5, 2025 · 6:34 PM UTC

Bhavnick Minhas

@minhash

5 Feb 2025

Replying to @prajdabre

Krutrim? More like Copy-Trim

2,945

Bhavnick Minhas · Oct 24, 2024 · 6:48 PM UTC

Bhavnick Minhas

@minhash

24 Oct 2024

🍳 Introducing AyaMCooking: your multilingual AI sous chef that speaks 10 languages! Built with @CohereForAI's Aya Expanse, it's the perfect kitchen companion that lets you cook hands-free. 🔗 GitHub: github.com/bhavnicksm/AyaMCo…

9,454

Bhavnick Minhas · Sep 8, 2024 · 3:36 PM UTC

Bhavnick Minhas

@minhash

8 Sep 2024

Society if Google Colab received even 1% attention from Google

2,147

Bhavnick Minhas · Oct 28, 2024 · 11:50 AM UTC

Bhavnick Minhas

@minhash

28 Oct 2024

I believe time spent per day on coding is a bad metric for productivity.

As an ML Engineer, especially one who's closer to research than LLMOps, it gets really awkward to have someone watch me work. To the outside observer, it looks like 80% of the time I am just doing nothing, staring at the ceiling or my research notebook, with the remaining 20% spent on actual coding.

Most of my job is making proper decisions, really. Decisions that deal with questions like the following:

1. What are the top problems to solve for the company? Which are manageable in the current timeframe?
2. Is the technical decision a proper one? Can we do it any other way? Is this the best we can manage?
3. Should this be using an LLM, an SLM or a VLM? Can we fit it into the current deployment stack?
4. How do I test this hypothesis to answer the questions? Has anyone done research on this before? What's the standard practice?

And more...

So, I don't measure the time I spend working for my company or actually coding, since it is not reflective of the time I've spent considering, evaluating, and making that decision. After all, (imho) time spent cutting down tasks by asking the right questions is of higher value than time spent doing unnecessary work.

That is not to say I work any less; I probably work a lot more, tbh (especially with me getting nerd-sniped and super into some research topics), though my point still stands.

I would say this for ML Engineers, Researchers and knowledge workers in general: time spent is a bad metric of productivity.

2,431

Bhavnick Minhas · Dec 21, 2022 · 8:25 AM UTC

Bhavnick Minhas

@minhash

21 Dec 2022

Replying to @irinarish

Definitely something like open-source Chinchilla-optimal PaLM models of various sizes, or an open-source InstructGPT (GPT 3.5) so we can have a free version of ChatGPT sooner! 💙🩵 P.S. Love what you're doing for the community! Thank you ❤️

3,761

Bhavnick Minhas · Dec 2, 2022 · 4:57 PM UTC

Bhavnick Minhas

@minhash

2 Dec 2022

📰 #NLProc Paper Summary "UL2: Unifying Language Learning Paradigms" Understanding the *BEST* pre-training objective for training LLMs and more... 📜 abs: arxiv.org/abs/2205.05131 🤗 HF: huggingface.co/google/ul2 👩‍💻 GH: github.com/google-research/g…

ALT UL2 Paper

Bhavnick Minhas · Oct 23, 2024 · 4:11 AM UTC

Bhavnick Minhas

@minhash

23 Oct 2024

As the Hindi Language Ambassador for Aya Expanse (@CohereForAI), I've spent the last few weeks rigorously testing its Hindi capabilities. I'm excited to share that the results have been remarkable! 🚀 Here are some of my favourite use cases 🧵 #AyaExpanse #MysteryBot

3,357

Bhavnick Minhas · Nov 12, 2024 · 8:22 AM UTC

Bhavnick Minhas

@minhash

12 Nov 2024

Replying to @maharshii

Making my lil hippo awesomer 🦛✨

Bhavnick Minhas

@minhash

10 Nov 2024

15,856

Bhavnick Minhas · May 12, 2024 · 3:53 AM UTC

Bhavnick Minhas

@minhash

12 May 2024

Real talk

François Fleuret

@francoisfleuret

11 May 2024

Calling SwiGLU an activation function is weird. It's a full-fledged parametrized gating layer.

7,054
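To make the point in the quoted post concrete, here is a minimal pure-Python sketch of a bias-free SwiGLU, assuming the usual definition SwiGLU(x) = Swish(xW) ⊙ (xV). The names `swish`, `matvec`, `swiglu`, `w_gate` and `w_value` are illustrative. Unlike ReLU or GELU, which take no parameters, the function below needs two learned weight matrices, which is what makes it look more like a gating layer than an activation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape: d_in rows, d_out columns


def swish(z: float) -> float:
    """Swish / SiLU with beta = 1: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(x: Vector, w: Matrix) -> Vector:
    """Row-vector times matrix: returns x @ w."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def swiglu(x: Vector, w_gate: Matrix, w_value: Matrix) -> Vector:
    """Bias-free SwiGLU: Swish(x @ w_gate) multiplied elementwise by (x @ w_value)."""
    gate = [swish(g) for g in matvec(x, w_gate)]   # learned gate projection
    value = matvec(x, w_value)                     # learned value projection
    return [g * v for g, v in zip(gate, value)]
```

With both matrices set to the identity, `swiglu(x, I, I)` reduces to `swish(x_i) * x_i` per element. With a zero gate matrix the output is all zeros, whatever the value projection is.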

Bhavnick Minhas · Nov 13, 2024 · 9:42 AM UTC

Bhavnick Minhas

@minhash

13 Nov 2024

🎉 CHONK ALERT! 🦛 🦛 Chonkie just hit 1000 stars & 2000 downloads in just 3 days from release! Our tiny hippo is making big waves in the RAG pond! Turns out people really like their text chunking to be smol but mighty 💪 Let's CHONK to infinity and beyond! 🚀

3,848

Bhavnick Minhas · Mar 17, 2025 · 12:00 PM UTC

Bhavnick Minhas

@minhash

17 Mar 2025

Every time I talk to @prajdabre1 sensei, I get a burst of motivation to work harder! A bunch of really cool things are in motion 😄

4,283

Bhavnick Minhas · Dec 1, 2022 · 7:35 PM UTC

Bhavnick Minhas

@minhash

1 Dec 2022

My application to join the amazing @forai_ml community, led by the even more amazing @sarahookr, got accepted today! Super excited to become a part of this effort and generate some value! ✨ Hoping to connect with everyone in this awesome community :))

Bhavnick Minhas · Oct 21, 2024 · 9:48 AM UTC

Bhavnick Minhas

@minhash

21 Oct 2024

something is happening today folks 👀 #MysteryBot

2,972

Bhavnick Minhas · Oct 26, 2025 · 9:09 AM UTC

Bhavnick Minhas

@minhash

26 Oct 2025

Replying to @HarveenChadha

Proposal: Ask medical students and doctors for their notes to make the toughest OCR eval the world has ever seen

2,832

Bhavnick Minhas · Sep 8, 2024 · 11:13 AM UTC

Bhavnick Minhas

@minhash

8 Sep 2024

I go by minhash now btw

Raj Dabre

@prajdabre

31 Aug 2024

Replying to @minhash

Exactly and it's a nice hacker name. "I'm Minhash and my job is creating efficiency by eliminating redundancy"

3,910

Bhavnick Minhas · Apr 10, 2025 · 1:51 PM UTC

Bhavnick Minhas

@minhash

10 Apr 2025

Some of you noticed Chonkie disappeared from GitHub over the last week or so. Chonkie is now public on GitHub at a new address: github.com/chonkie-inc/chonk… Today, we're finally ready to share what happened behind the scenes. It's been a wild ride. 🧵👇 #OpenSource #Chonkie #RAG

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines - chonkie-inc/chonkie

github.com

5,115

Bhavnick Minhas · Oct 23, 2024 · 2:53 AM UTC

Bhavnick Minhas

@minhash

23 Oct 2024

Remember all those #mysterybot hints? 😉 Introducing Aya Expanse by @CohereForAI - where language barriers become language bridges 🌉 SOTA multilingual NLP, now at your fingertips ✨ Watch the magic unfold 👇

2,487

Bhavnick Minhas · Sep 30, 2025 · 4:55 AM UTC

Bhavnick Minhas

@minhash

30 Sep 2025

Chonkie was at Times Square today 🫣

1,981

Bhavnick Minhas · Aug 6, 2024 · 4:46 PM UTC

Bhavnick Minhas

@minhash

6 Aug 2024

Replying to @karpathy @repligate

Old gen llms: "In my era, we would just make some sh*t up, things changed now, you gotta follow rules"

1,051

Bhavnick Minhas · Oct 22, 2024 · 2:12 AM UTC

Bhavnick Minhas

@minhash

22 Oct 2024

Did you know they got #mysterybot on WhatsApp now? 👀

1,365

Bhavnick Minhas · Nov 16, 2025 · 7:26 PM UTC

Bhavnick Minhas

@minhash

16 Nov 2025

Replying to @bekacru @lauradang0

thanks for saving me from the flood, I was about to drown if you hadn’t

2,023

Bhavnick Minhas · Nov 6, 2025 · 6:24 AM UTC

Bhavnick Minhas

@minhash

6 Nov 2025

a millie a millie a millie a millie a miilie

1,104

Bhavnick Minhas · Nov 17, 2024 · 1:36 PM UTC

Bhavnick Minhas

@minhash

17 Nov 2024

Chonkie v0.2 is out! 🦛✨ 👉Tiny 9.7MB footprint 👉Zero bloat –– one dependency 👉New batch processing support 👉Native TokenChunker batching 👉Fixed index labeling 👉 (slightly) Better docs The smol hippo got even smoler! 🔗 github.com/bhavnicksm/chonki… #RAG #Python #NLP

2,946

Bhavnick Minhas · Nov 26, 2024 · 2:19 AM UTC

Bhavnick Minhas

@minhash

26 Nov 2024

🧵 How I got up to 5x speed-up on SentenceChunker in Chonkie with Token Estimate Validate Loops (TEVL)

A lot of the world runs on control systems and negative feedback loops. Most PID controllers that control your kettles, induction stoves and geysers work with negative feedback loops to maintain temperature. Even high-end espresso machines have PIDs. Feedback loops are amazing! And I happened to be inspired by one to speed up SentenceChunker by up to 5x.

Chunkers, especially rule-based chunkers like the SentenceChunker, are built on a few very common algorithms that are honestly capped in how much you can optimize them. The idea behind the sentence chunker is that you first split the text into sentences with a splitting algorithm, then group the sentences together until a particular chunk_size is reached, then step a few sentences back until the chunk_overlap is achieved, and then repeat the grouping process.

Naive or brute-force approaches (which some well-known packages actually use) add one sentence to a candidate chunk and run the tokenizer on the chunk to count, then add another, and so on. You get the idea. Tokenization is usually the bottleneck, so the tokenize, check, tokenize, check process gets really cumbersome and slow.

That's why earlier in Chonkie, we would use pre-computation and caching (of a sort). Chonkie before this was using a linear O(n) algorithm where we would split the sentences and get the token counts for each sentence beforehand to use for the entire grouping process, via a scan-lookback styled algo (think prefix sum). The disadvantage is that, while it is O(n), the add, check, add, check process still adds real overhead.

One simple optimization is to first calculate the running sums of the token counts (again, think prefix sum) and then use binary search (which is O(log n)) to get to the ideal point. This would still be O(n) overall since we calculate the prefix sums, but it saves a little overhead for really, really long texts.

But all these are micro-optimisations, when the realisation should be that tokenization is insanely slow! At least on the order of 1000x slower than counting characters. But just because it's slow doesn't mean we can remove it altogether and go to CharacterChunkers (very uncool).

So we finally come to token estimates. Essentially, we mimic the tokenizer's average-case behaviour using a mean statistic (which could be calibrated based on the piece of text). Some academic text has a higher character-to-token ratio and some children's books have a lower one, so calibration makes it more effective. But essentially, we approximate the number of characters per token on average based on the tokenizer passed. For example, the average for GPT2 is ~6.38 while the average for Llama 3 is ~6.57.

Then we use these stats to approximate the number of tokens in each sentence, group them into chunks, and before the final output we validate with an actual tokenizer call on the entire chunk. This is important! Earlier, if we had a text with 100 sentences, we would have 100 tokenizer calls, which you could only optimize so much with batching. But now, with a TEVL cycle, we only make tokenizer calls when we finally output the chunks. In the example, if it's grouping 5 sentences into a chunk, that implies 20 chunks. So we have 20 tokenizer calls, reducing the calls by a factor of 5. And that seems to be a pretty large speed boost: almost 2-3x by itself.

But feedback is also important, because sometimes we overshoot and sometimes we undershoot. And because we want the user to have accurate chunks, we add or subtract sentences after the validation phase to give the best chunk. This is slow again because we need extra tokenizer calls for these. So, we use feedback to reduce future calls if we notice a large discrepancy between the estimated and actual token counts, and to iteratively get the estimate closer to the actual counts. The feedback mechanism seems to provide another 20-50% boost and is generally never slower than not having any feedback. So, it makes sense to use it.

That's how we can get SentenceChunker at light speed! Thanks for reading 📖
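The prefix-sum plus binary-search grouping mentioned in the thread can be sketched roughly as follows. This is an illustration, not Chonkie's actual code: `group_by_prefix_sum` is a hypothetical helper, it assumes per-sentence token counts have already been computed, and it ignores chunk_overlap for brevity.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def group_by_prefix_sum(token_counts: List[int], chunk_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) sentence index ranges whose token totals fit chunk_size."""
    # prefix[k] holds the total number of tokens in sentences[:k].
    prefix = [0] + list(accumulate(token_counts))
    spans: List[Tuple[int, int]] = []
    start = 0
    while start < len(token_counts):
        # Binary search for the largest end with prefix[end] - prefix[start] <= chunk_size.
        end = bisect_right(prefix, prefix[start] + chunk_size) - 1
        # Always take at least one sentence, even if it alone exceeds chunk_size.
        end = max(end, start + 1)
        spans.append((start, end))
        start = end
    return spans
```

Each chunk boundary is found with one `bisect_right` call instead of a sentence-by-sentence add-and-check loop. Building `prefix` is still linear in the number of sentences, which matches the thread's point that the overall cost stays O(n).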

900
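The estimate, validate, feedback cycle from the thread above might look something like the sketch below. It is a simplified illustration under loud assumptions, not Chonkie's implementation: `toy_count_tokens` stands in for a real tokenizer, the starting ratio of 4.0 characters per token and the 0.5 smoothing factor are arbitrary, chunk_overlap is ignored, and only overshoot is corrected (adding sentences back after an undershoot is left out).

```python
import re
from typing import Callable, List, Tuple


def toy_count_tokens(text: str) -> int:
    """Stand-in for a real tokenizer: counts words and punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def tevl_chunk(
    sentences: List[str],
    chunk_size: int,
    count_tokens: Callable[[str], int] = toy_count_tokens,
    chars_per_token: float = 4.0,  # assumption: initial estimate, refined by feedback
) -> Tuple[List[str], int]:
    """Group sentences into chunks; returns (chunks, number of tokenizer calls)."""
    chunks: List[str] = []
    calls = 0
    i = 0
    while i < len(sentences):
        # Estimate: grow the candidate chunk using cheap character counts only.
        j, estimate = i, 0.0
        while j < len(sentences):
            next_estimate = len(sentences[j]) / chars_per_token
            if j > i and estimate + next_estimate > chunk_size:
                break
            estimate += next_estimate
            j += 1

        # Validate: one real tokenizer call on the whole candidate chunk.
        text = " ".join(sentences[i:j])
        actual = count_tokens(text)
        calls += 1

        # Correct overshoot: drop trailing sentences until the chunk fits.
        # A single oversized sentence is emitted as-is.
        while actual > chunk_size and j - i > 1:
            j -= 1
            text = " ".join(sentences[i:j])
            actual = count_tokens(text)
            calls += 1

        # Feedback: move the ratio toward what the tokenizer actually reported.
        observed = len(text) / max(actual, 1)
        chars_per_token = 0.5 * chars_per_token + 0.5 * observed

        chunks.append(text)
        i = j
    return chunks, calls
```

With 20 short sentences and a chunk_size that holds four or five of them, this makes one tokenizer call per emitted chunk rather than one per sentence, which is the reduction the thread describes. The feedback step matters when the initial ratio is far off, since every overshoot costs extra validation calls.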

Bhavnick Minhas · Oct 16, 2024 · 8:04 AM UTC

Bhavnick Minhas

@minhash

16 Oct 2024

Just around the corner folks, be on the lookout 👀 You really don’t want to miss this 🤫

Bhavnick Minhas

@minhash

14 Oct 2024

Cooking hard at @CohereForAI on the next big thingie — bringing people together has never been this rewarding

3,992

Bhavnick Minhas · Nov 10, 2024 · 12:02 PM UTC

Bhavnick Minhas

@minhash

10 Nov 2024

This means a lot for us smol accounts 🥹

3,293

Bhavnick Minhas · Jul 18, 2025 · 8:10 PM UTC

Bhavnick Minhas

@minhash

18 Jul 2025

@thdxr cooked hard with sst/opencode

886

Bhavnick Minhas · Oct 29, 2024 · 5:34 PM UTC

Bhavnick Minhas

@minhash

29 Oct 2024

'Twas a wonderful time collaborating with such a diverse and fabulous community of people leading to the launch. I miss it; I can't believe it's over 🥺💙

Cohere Labs

@Cohere_Labs

29 Oct 2024

We create breakthroughs together. ✨ Aya Expanse Ambassadors represent 45 countries and 23 languages. Before the launch of Aya Expanse, we invited 110 ambassadors to join us to shape how Aya worked for communities all over the world. 🌍

2,061

Bhavnick Minhas · Jun 19, 2025 · 10:53 PM UTC

Bhavnick Minhas

@minhash

19 Jun 2025

I'm excited to announce that i have joined @mail0dotcom to work on 🦛 chonkie email 🤗tysm to @nizzyabi for the opportunity

Beka

@bekacru

19 Jun 2025

I’m excited to announce that I have joined @mail0dotcom team to work on better-mail.

2,811

Bhavnick Minhas · Jun 8, 2025 · 8:37 PM UTC

Bhavnick Minhas

@minhash

8 Jun 2025

Heading out to pitch soon — wish us luck! 🍀

1,217

Bhavnick Minhas · Sep 3, 2025 · 2:49 PM UTC

Bhavnick Minhas

@minhash

3 Sep 2025

“There are probably 5 problems you can address fully in your lifetime” @sarahookr

1,546

Bhavnick Minhas · Jul 7, 2025 · 10:38 PM UTC

Bhavnick Minhas

@minhash

7 Jul 2025

Happy to announce that I turned a quarter of a century today! Here are 25 things I learnt about B2B SaaS:

1,131

Bhavnick Minhas · Dec 12, 2024 · 4:08 PM UTC

Bhavnick Minhas

@minhash

12 Dec 2024

Aya 🫶

435

Bhavnick Minhas · Oct 20, 2025 · 8:40 AM UTC

Bhavnick Minhas

@minhash

20 Oct 2025

AWS went down and now nothing is working

3,935

Bhavnick Minhas · Oct 24, 2024 · 6:38 PM UTC

Bhavnick Minhas

@minhash

24 Oct 2024

Huh? Aya Expanse gave notebooks to run with it?? I wonder what's this all about 👀 AyaMCooking sounds punny... I wonder🥸 🔗huggingface.co/CohereForAI/a…

1,127

Bhavnick Minhas · Oct 21, 2024 · 9:50 AM UTC

Bhavnick Minhas

@minhash

21 Oct 2024

👀🕵️‍♂️

Bhavnick Minhas

@minhash

21 Oct 2024

something is happening today folks 👀 #MysteryBot

1,205

Bhavnick Minhas · Feb 21, 2023 · 8:52 PM UTC

Bhavnick Minhas

@minhash

21 Feb 2023

I wonder if something LEAD to this development? 😂 Pun Intended. Excited to be working as a Community Lead for ML Efficiency at @forai_ml! I love this community and hope to get a lot more people interested in ML Efficiency 💪💙

8,683

Bhavnick Minhas · Nov 22, 2024 · 10:57 AM UTC

Bhavnick Minhas

@minhash

22 Nov 2024

🦛 The smol hippo is back with Chonkie v0.2.1! 🚀 Default SemanticChunker now uses Model2Vec (10x faster & lighter!) ✨ Added OpenAI embeddings support 💪 More powerful, still tiny! Your favorite RAG chunking library just got even better 🦛✨ 🔗 #RAG #LLM #Python

2,427

Bhavnick Minhas · Sep 5, 2025 · 5:58 PM UTC

Bhavnick Minhas

@minhash

5 Sep 2025

Hitting another milestone soon 👀

894

Bhavnick Minhas · Oct 21, 2024 · 12:34 AM UTC

Bhavnick Minhas

@minhash

21 Oct 2024

Weekends are for Mystery Bot and I to explore all the best momo spots 😋🥟 Hope you had a good one :)) #MysteryBot #RehesayaBot

3,499

Bhavnick Minhas · Oct 21, 2025 · 6:27 PM UTC

Bhavnick Minhas

@minhash

21 Oct 2025

Chonkie's gonna rise to the mooooon~🚀

Feyn

@FeynAI

21 Oct 2025

🎉 3,000🌟 for Chonkie! Huge thanks to the amazing community for all the support—here's to many more milestones together! #Chonkie #opensourcecode #Python

1,930

Bhavnick Minhas · Aug 5, 2025 · 6:24 PM UTC

Bhavnick Minhas

@minhash

5 Aug 2025

Gm! ❤️ Waking up to an OpenAI <> Chonkie Example is much Goals~ ✨

Shreyash Nigam

@shreyash_nm

5 Aug 2025

Good morning, OpenAI recommends @ChonkieAI for building agents cookbook.openai.com/examples…

1,187

Bhavnick Minhas · Nov 8, 2025 · 11:51 PM UTC

Bhavnick Minhas

@minhash

8 Nov 2025

You know, lowkey I dig this website

1,214

Bhavnick Minhas · Nov 24, 2024 · 5:40 PM UTC

Bhavnick Minhas

@minhash

24 Nov 2024

Woah! We hit 900 follows, lfg! 🥹 I’ve had a wonderful time here talking to some amazingly smart people, received a lot of love and support, and genuinely grown a lot over the past few months since I started posting — it’s been really gratifying to be here 🫶 Thank You!

413

Bhavnick Minhas · Oct 26, 2025 · 8:10 AM UTC

Bhavnick Minhas

@minhash

26 Oct 2025

when your training run completes but the model fails to save on disk because the disk is full

Yem🌹

@big_yemm

21 Oct 2025

Apart from breakup, what else can make a man be like this?

4,843

Bhavnick Minhas · Oct 22, 2024 · 1:40 AM UTC

Bhavnick Minhas

@minhash

22 Oct 2024

🩵 #mysterybot passing the mic to… Italian 🇮🇹

Sara Hooker

@sarahookr

22 Oct 2024

💙 #mysterybot passing the mic to ... Hindi.

4,700

Bhavnick Minhas · Jun 24, 2025 · 1:43 AM UTC

Bhavnick Minhas

@minhash

24 Jun 2025

Fully chonked up! 🦛✨

529

Bhavnick Minhas · Nov 9, 2024 · 6:59 AM UTC

Bhavnick Minhas

@minhash

9 Nov 2024

Replying to @lilianweng

So hyped for the blog posts to come (^^)

1,206

Bhavnick Minhas · May 13, 2025 · 3:03 PM UTC

Bhavnick Minhas

@minhash

13 May 2025

🦛 CHONK all the data in the world w/ @ChonkieAI 🗺️

Y Combinator

@ycombinator

13 May 2025

Chonkie (@ChonkieAI) is building the open source library for connecting your data to AI. Split unstructured data into optimized AI-ingestible chunks that boost your AI accuracy, improve app performance, and reduce token costs. ycombinator.com/launches/NUw… Congrats on the launch, @shreyash_nm and @minhash!

1,356

Bhavnick Minhas · Nov 15, 2025 · 6:04 PM UTC

Bhavnick Minhas

@minhash

15 Nov 2025

Us fr

Beka

@bekacru

15 Nov 2025

what is vip startup 👀

8,994

Bhavnick Minhas · Oct 22, 2024 · 1:27 AM UTC

Bhavnick Minhas

@minhash

22 Oct 2024

The true identity of the #mysterybot is… . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . …going to be revealed soon :) Meanwhile, try talking with it in WhatsApp at +1 (431) 302-8498

2,618

Bhavnick Minhas · Nov 19, 2024 · 3:45 PM UTC

Bhavnick Minhas

@minhash

19 Nov 2024

Chonkie got its first tutorial! Let's gooooo Thanks to Fahid Mizra, he covered all the important points of Chonkie! 🔗👇

783

Bhavnick Minhas · Dec 13, 2024 · 11:29 AM UTC

Bhavnick Minhas

@minhash

13 Dec 2024

Replying to @maharshii

problems in the back of my head

891

Bhavnick Minhas · Nov 13, 2025 · 10:08 PM UTC

Bhavnick Minhas

@minhash

13 Nov 2025

introducing ceo driven development when the ceo says yes to a feature that doesn’t exist

989

Bhavnick Minhas · Jun 11, 2025 · 5:20 AM UTC

Bhavnick Minhas

@minhash

11 Jun 2025

Just got our logo updated from Apple Design — they call this the 💧 liquid glonkie 🦛 #WWDC25

415

Bhavnick Minhas · Jan 12, 2025 · 11:14 AM UTC

Bhavnick Minhas

@minhash

12 Jan 2025

When cooking your own embedding model, it's necessary to have a quick evaluation set to validate your ideas. That's what I needed for my own set of experiments when I found @ZetaVector's NanoBEIR set. It's perfect! A subset of BEIR to validate ideas on~

One thing I was missing before using it, though: how correlated are scores on NanoBEIR with those on BEIR? I didn't find this metric on their blog, so I decided to calculate it myself with a few models.

Generally, from what I see on a limited set of models that publish BEIR scores publicly, and after calculating their NanoBEIR scores myself, the correlation is ~99%, which is great! The NanoBEIR scores usually come out on the higher end, so they can't be compared directly against BEIR scores, but to check what works and what doesn't, it's good enough.

Then again, STSBenchmark scores, which were my previous "quick" evaluation set, are said to be ~70% correlated too.

3,444
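A check like the one described above could be computed with a plain Pearson correlation. The post does not say which correlation measure was used, so Pearson is an assumption here, and `pearson` is a hypothetical helper. The score lists in the usage note are made-up placeholders to show the call, not real benchmark results.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, without the shared 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)
```

Usage would look like `pearson(placeholder_nanobeir, placeholder_beir)`, with one score per model in each list, for example `[0.61, 0.58, 0.66]` and `[0.52, 0.49, 0.57]` as stand-in values. Pearson correlation ignores a constant offset between the two lists, which fits the post's observation that NanoBEIR scores run higher than BEIR scores yet still rank models the same way.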

Bhavnick Minhas · Sep 3, 2025 · 6:25 AM UTC

Bhavnick Minhas

@minhash

3 Sep 2025

if you aren’t 3D printing your anime pfp character, what’s even the point?

963

Bhavnick Minhas · Oct 24, 2024 · 6:52 PM UTC

Bhavnick Minhas

@minhash

24 Oct 2024

This amazing work would not have been possible without my amazing hack partner @Sree_Harsha_N, who matched the energy and was down to hack from 11PM on a Friday Night continuously all the way to Saturday Night 1AM when we submitted it. We didn't even need to do it in one day xD But we did! TYSM and looking forward to cooking more stuff with you! 🙌🫶

Bhavnick Minhas

@minhash

24 Oct 2024

725

Bhavnick Minhas · Nov 11, 2024 · 7:05 PM UTC

Bhavnick Minhas

@minhash

11 Nov 2024

Got the dependencies down to 9.7MB guys! That's all!! You know what this means, right? (Chonkie would soon be thinner)

606

Bhavnick Minhas · Jun 19, 2025 · 7:19 AM UTC

Bhavnick Minhas

@minhash

19 Jun 2025

Replying to @thejustinguo

am i the only one seeing the similarities?

409

Bhavnick Minhas · Nov 11, 2024 · 11:03 PM UTC

Bhavnick Minhas

@minhash

11 Nov 2024

8x speed-up 👀 And it's literally so simple to attain... 😮‍💨

430

Bhavnick Minhas · Jan 13, 2025 · 4:18 PM UTC

Bhavnick Minhas

@minhash

13 Jan 2025

Putting Chonkie under the Flash! 📸📸 Super glad that we were able to integrate Chonkie into the FlashRAG library, as our first ever adoption of Chonkie~ Chonkie sped up chunking in FlashRAG by orders of magnitude and also came with a lot more support out of the box! 📦⚡️

Feyn

@FeynAI

13 Jan 2025

Chonkie ❤️ FlashRAG Chonkie joins the FlashRAG toolkit as its goodest boi! Now, you can easily develop state of the art RAG pipelines with the best chunker out there Check it out here! github.com/RUC-NLPIR/FlashRA… Happy Chonking🦛✨

994

Bhavnick Minhas · Oct 16, 2024 · 9:44 AM UTC

Bhavnick Minhas

@minhash

16 Oct 2024

It really does feel good 😊

neo @niobiumneo

16 Oct 2024

@1vnzh probably doesn’t know what’s cooking at @CohereForAI, but I do… stay tuned…

968

Bhavnick Minhas · Sep 24, 2025 · 9:13 PM UTC

Bhavnick Minhas

@minhash

24 Sep 2025

Chonkie Cloud served about 50K requests with a 99.6% uptime in just the last month. Onwards and upwards! 🚀

1,397

Bhavnick Minhas · Nov 9, 2025 · 11:49 PM UTC

Bhavnick Minhas

@minhash

9 Nov 2025

> my data migrating to the 4th serverless pgsql db in a month

1,077

Bhavnick Minhas · Nov 11, 2024 · 3:02 PM UTC

Bhavnick Minhas

@minhash

11 Nov 2024

Feels like just yesterday when I was at 681, nostalgia is hitting hard 🥹 (Oh wait what?)

Bhavnick Minhas

@minhash

10 Nov 2024

Can we get 9 more chat?

701

Bhavnick Minhas · Dec 9, 2024 · 11:00 AM UTC

Bhavnick Minhas

@minhash

9 Dec 2024

We have hit a nice milestone

347

Bhavnick Minhas · Aug 30, 2024 · 8:49 PM UTC

Bhavnick Minhas

@minhash

30 Aug 2024

Applied ✅ Just the application process made me learn a lot about myself, somehow 💪

Sara Hooker

@sarahookr

29 Aug 2024

Applications for @CohereForAI scholars program close tomorrow. Something special about the program is our commitment that a research scientist or engineer will read every application.

3,848

Bhavnick Minhas · Aug 31, 2024 · 11:46 AM UTC

Bhavnick Minhas

@minhash

31 Aug 2024

I resisted… I persisted for so long but I couldn’t do it much longer — anime pfp set, just need to become 10x cracked now

1,887

Bhavnick Minhas · Oct 23, 2024 · 11:36 AM UTC

Bhavnick Minhas

@minhash

23 Oct 2024

I want to see Aya Expanse and Pangea fight it out in the boxing ring for who's the multilingual champion of the people 🥊🩳 (I support Aya Expanse obviously… 😁)

466

Bhavnick Minhas · Mar 18, 2025 · 12:36 PM UTC

Bhavnick Minhas

@minhash

18 Mar 2025

The silent battles I fight, no one knows about 💔🤐🥀

584

Bhavnick Minhas · Jun 25, 2025 · 1:55 AM UTC

Bhavnick Minhas

@minhash

25 Jun 2025

Twas a good talk!

Ben Clavié

@bclavie

25 Jun 2025

1,264

Bhavnick Minhas · Feb 5, 2025 · 6:38 PM UTC

Bhavnick Minhas

@minhash

5 Feb 2025

Replying to @prajdabre

Even if it’s MIT license, crediting the original repository or cloning it is part of good ethics 🙂

2,654

Bhavnick Minhas · Oct 21, 2024 · 11:23 AM UTC

Bhavnick Minhas

@minhash

21 Oct 2024

Hindi AGI has been achieved 🍛🩵

harsha @Sree_Harsha_N

21 Oct 2024

Sambhar idli or Masala Dosa? YOU DECIDE! #mysterybot

471

Bhavnick Minhas · Nov 4, 2024 · 12:13 PM UTC

Bhavnick Minhas

@minhash

4 Nov 2024

Tiktoken is way faster than Hugging Face Tokenizers. Tiktoken is almost 2-3x faster than Tokenizers (on avg) in my tests and, when every second and millisecond counts (which it often does), it does make a difference. People are really sleeping on this library fr 🤷🏽‍♂️

742
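A comparison like the one in the post above could be reproduced with a small timing harness. The sketch below is an assumption about how such a test might be set up, not the author's benchmark: `benchmark` is a hypothetical helper that times any encode callables, and the stand-in encoders keep it runnable without extra packages. To compare the real libraries, one would pass `tiktoken.get_encoding("gpt2").encode` and `tokenizers.Tokenizer.from_pretrained("gpt2").encode` instead.

```python
import time
from typing import Callable, Dict, List, Sequence


def benchmark(
    encoders: Dict[str, Callable[[str], object]],
    texts: Sequence[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the best wall-clock time in seconds for each encoder over all texts."""
    results: Dict[str, float] = {}
    for name, encode in encoders.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for text in texts:
                encode(text)
            # Keep the fastest run to reduce noise from other processes.
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# Stand-in encoders so the sketch runs without third-party packages.
sample_texts: List[str] = ["Chunking is a free lunch for RAG pipelines."] * 100
timings = benchmark({"whitespace": str.split, "characters": list}, sample_texts)
```

Taking the minimum over a few repeats is a design choice to dampen warm-up and scheduling noise. Results will still depend on text length, batching, and whether the tokenizer is already loaded.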

Bhavnick Minhas · Jul 8, 2025 · 8:09 AM UTC

Bhavnick Minhas

@minhash

8 Jul 2025

old skool cool vibes cause i'm an unc now

721

Bhavnick Minhas · Sep 1, 2025 · 6:19 PM UTC

Bhavnick Minhas

@minhash

1 Sep 2025

haters thought I couldn’t set up my 3D printer

297

Bhavnick Minhas · Nov 6, 2024 · 10:57 AM UTC

Bhavnick Minhas

@minhash

6 Nov 2024

Don’t kill yourself

442

Bhavnick Minhas · Oct 19, 2024 · 8:51 PM UTC

Bhavnick Minhas

@minhash

19 Oct 2024

Oh me and @Sree_Harsha_N cooked 🍳 We cooked so hard over the past 24 hrs, it's just insane 💀

ALT I'M Sakamoto GIF

2,840

Bhavnick Minhas · Nov 10, 2024 · 8:46 AM UTC

Bhavnick Minhas

@minhash

10 Nov 2024

Replying to @wordgrammer

doesn’t seem like that big of an issue to me, ngl But then again, I’m just used to PyTorch I suppose

2,853

Bhavnick Minhas · Aug 6, 2025 · 11:16 PM UTC

Bhavnick Minhas

@minhash

6 Aug 2025

🚨BREAKING NEWS!!!🚨 Here's a random picture of Wednesday alongside chonkie who both had a new release today! Check out Wednesday S2 on Netflix and Chonkie v1.1.2 on 👩🏻‍💻GitHub Happy coding~

693

Bhavnick Minhas · Nov 9, 2024 · 12:11 PM UTC

Bhavnick Minhas

@minhash

9 Nov 2024

Better chunking is literally a free lunch for RAGs and I love free lunches :) Putting my money where my mouth is, planning to (beta) release something tomorrow

1,366

Bhavnick Minhas · Nov 15, 2025 · 10:19 PM UTC

Bhavnick Minhas

@minhash

15 Nov 2025

there’s only one task

2,932

Bhavnick Minhas · Jun 11, 2025 · 5:23 PM UTC

Bhavnick Minhas

@minhash

11 Jun 2025

It’s x25 d-day!

460

Bhavnick Minhas · Jul 22, 2025 · 9:09 PM UTC

Bhavnick Minhas

@minhash

22 Jul 2025

Take your cofounder on dates, so the company keeps running well 🥰

330

Bhavnick Minhas · Nov 2, 2024 · 8:37 PM UTC

Bhavnick Minhas

@minhash

2 Nov 2024

I wrote a little excerpt on why chunking is needed in #RAG and may always be? (happy to accept feedback/criticisms)

499

Bhavnick Minhas · Jun 12, 2025 · 6:47 AM UTC

Bhavnick Minhas

@minhash

12 Jun 2025

Been a long time coming but we hit 3K downloads in 1 day and I literally can’t believe it 🥹 So many people use it every single day? I’m so grateful to everyone who has supported us so far and continues to do so I build and ship for you 🥰

477