Subham Sahoo · Oct 6, 2025 · 5:21 AM UTC

Subham Sahoo

Pinned Tweet

Subham Sahoo

@ssahoo_

6 Oct 2025

🎓 Officially a doctor now 😊!!! As a first-gen college kid, this moment means the world to me. Grateful beyond words to all my mentors who’ve guided me along the way — from @GMartius who first introduced me to research back in 2017, to @volokuleshov who sparked my love for generative modeling, and finally to @jwthickstun and @Jimantha for their incredible mentorship through the final stretch of my PhD. ❤️

1,652

102,614

Subham Sahoo · Sep 14, 2025 · 4:50 PM UTC

Subham Sahoo

@ssahoo_

14 Sep 2025

For a PhD, you need to be a romantic at some level. Your papers will get rejected. Your ideas will get scooped. All while your peers flourish. And yes, it will sting. 2023 was one such year for me. Yet I call it my golden year, because that’s when I truly fell in love with my research. I found my escape in it.

Hoang @hwangnamd

12 Sep 2025

Replying to @ssahoo_

What do you think would make a good PhD candidate? What specific traits do you see in a smart/talented PhD? Would love to hear some feedback from your side

811

91,369

Subham Sahoo · Jun 13, 2025 · 11:59 PM UTC

Subham Sahoo

@ssahoo_

13 Jun 2025

🚨 “The Diffusion Duality” is out! @ICML2025 ⚡️ Few-step generation in discrete diffusion language models by exploiting the underlying Gaussian diffusion. 🦾Beats AR on 3/7 zero-shot likelihood benchmarks. 📄 Paper: arxiv.org/abs/2506.10892 💻 Code: github.com/s-sahoo/duo 🧠 Blog: s-sahoo.com/duo/ (1/8)

100

551

144,093

Subham Sahoo · Sep 12, 2025 · 5:20 AM UTC

Subham Sahoo

@ssahoo_

12 Sep 2025

As I wrap up my thesis, I can’t help but look back on the past year of working on Diffusion LLMs. People often ask me why and how I got into this strange little world of discrete diffusion. I usually give the textbook answer, the kind you’d find in any random paper, and make myself sound like some visionary who foresaw this whole field exploding the way it did. Lie. A blatant lie. The truth is simpler: the day I first stumbled upon this topic, I felt like a kid again. Remember that toy you loved so much as a child that you couldn’t stop obsessing over? That’s what discrete diffusion felt like to me. My head exploded with possibilities. ⚡️2023 was my golden year. No. of papers published: 0. BUT I was working on things that made me genuinely happy: no expectations, no thought of return. That was the year I learned almost everything I know about diffusion. The seeds of MDLM and Duo were planted in late Nov / Dec. I didn’t start formally working on discrete diffusion until early February '24, when SEDD came out and showed a lot of promise in this area. P.S. I’ll miss my desk, which had a gorgeous view of Manhattan.

408

37,853

Subham Sahoo · Oct 30, 2025 · 10:24 PM UTC

Subham Sahoo

@ssahoo_

30 Oct 2025

Overwhelmed by the number of Diffusion LLM papers? 🌊 Same here 😭 So I’m starting a Discrete Diffusion Reading Group (@diffusion_llms) with my favorite disciples @jdeschena and @zhihanyang_ ✨ We’ll cover everything—from theory to empirics, from language to molecules. Join us 👉 Google Group: groups.google.com/g/diffusio… webpage: d-llms.io Follow us @diffusion_llms

314

30,478

Subham Sahoo · Nov 11, 2025 · 4:02 AM UTC

Subham Sahoo

@ssahoo_

11 Nov 2025

✨New beginnings: I’ve joined the Institute of Foundation Models @llm360, where I’ll be leading research on diffusion-LLMs. 🚨Goals > Design frontier diffusion-LLMs > Advance these algorithms through fundamental research ✌️About to go on a hiring frenzy, so stay tuned.

274

20,897

Subham Sahoo · Jun 3, 2025 · 5:02 AM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

🚨 [New paper alert] Esoteric Language Models (Eso-LMs) First Diffusion LM to support KV caching w/o compromising parallel generation. 🔥 Sets new SOTA on the sampling speed–quality Pareto frontier 🔥 🚀 65× faster than MDLM ⚡ 4× faster than Block Diffusion 📜 Paper: arxiv.org/abs/2506.01928 📘 Blog: s-sahoo.com/Eso-LMs/ 💻 Code: github.com/s-sahoo/Eso-LMs 🔬 Colab: colab.research.google.com/dr… 🤗 Hugging Face: huggingface.co/collections/s… Project co-led with @zhihanyang_ (1/9)

256

92,527

Subham Sahoo · Oct 27, 2025 · 8:01 PM UTC

Subham Sahoo

@ssahoo_

27 Oct 2025

🔥 Rethinking Reasoning (with Diffusion LLMs) This work changes how you think about reasoning in LLMs. 🤯 Turns out: you don’t need the full chain-of-thought — only a small subset of CoT tokens actually matter for the final answer. ❌ Autoregressive LLMs can’t exploit this since they must generate the entire CoT. ✨But MDLMs enable early-exit reasoning — predicting the answer without materializing the whole CoT. Huge congrats to the team behind this breakthrough: @zachary_horvitz @_rk_singhal @cdomingoenrich @Zhou_Yu_AI

228

15,485

Subham Sahoo · Sep 7, 2025 · 6:03 AM UTC

Subham Sahoo

@ssahoo_

7 Sep 2025

Pre-training for Diffusion LLMs will be solved in the next 6 months. ^That’s underestimating both myself and the community.

201

31,455

Subham Sahoo · Jul 14, 2025 · 1:33 AM UTC

Subham Sahoo

@ssahoo_

14 Jul 2025

Attending ICML ✈️Tues-Fri to present "The Diffusion Duality" 🗓️Wed, July 16 @ 4:30pm 📍East Exhibition Hall A-B (E-3003) DM if you want to chat about diffusion LMs, or my current work on Duality or Esoteric LMs!

Subham Sahoo

@ssahoo_

13 Jun 2025

156

10,302

Subham Sahoo · Nov 2, 2025 · 5:35 PM UTC

Subham Sahoo

@ssahoo_

2 Nov 2025

We’re building a space that connects researchers, students, and practitioners working on discrete diffusion. Join the Discord — collaborate, learn, and share! Whether you’re 💼hiring or showcasing your work, this is the place 👇 Discord: discord.gg/JxSCwpNb

Discrete Diffusion Reading Group

@diffusion_llms

2 Nov 2025

The Discrete Diffusion Reading Group is growing — 400+ members strong! We’ve launched a Discord for discussions, research ideas, help, and job opportunities. Join the conversation 👇 💬 discord.gg/JxSCwpNb 📧 groups.google.com/g/diffusio…

107

15,721

Subham Sahoo · Oct 11, 2025 · 1:28 PM UTC

Subham Sahoo

@ssahoo_

11 Oct 2025

We’re dropping “The Diffusion Duality, Chapter 2” soon! So, stay tuned 🤗

Sander Dieleman

@sedielem

10 Oct 2025

In diffusion LMs, discrete methods have all but displaced continuous ones (🥲). Interesting new trend: why not both? Use continuous methods to make discrete diffusion better. Diffusion duality: arxiv.org/abs/2506.10892 CADD: arxiv.org/abs/2510.01329 CCDD: arxiv.org/abs/2510.03206

10,917

Subham Sahoo · Nov 13, 2025 · 9:39 PM UTC

Subham Sahoo

@ssahoo_

13 Nov 2025

🚨“We have only one internet” (@ilyasut) — and that’s exactly why diffusion is the future of LLMs. 🔥Come for the hot takes, stay for @mihirp98’s deep dive at Monday’s @diffusion_llms reading group. ⏲️10 am ET (4pm CET)

5,225

Subham Sahoo · Oct 16, 2025 · 8:52 PM UTC

Subham Sahoo

@ssahoo_

16 Oct 2025

Impressive work by @jdeschena! They propose replacing the encoder-only denoising transformer with an encoder-decoder architecture, which leads to faster training and inference for MDLM.

Justin Deschenaux @jdeschena

16 Oct 2025

📢 « Partition Generative Modeling (PGM): Masked Modeling without Masks » is out! 🚯 Masked diffusion models waste FLOPs processing countless mask tokens that carry no real information. ⚡We show how partitioning can replace masking, boosting throughput by >5.3x on text and up to 7.5x on VQ-ImageNet! 📄 paper: arxiv.org/abs/2505.18883 💻 Code: github.com/jdeschena/pgm 🤗 Models: huggingface.co/jdeschena/pgm 1/9 🧵

7,532

Subham Sahoo · Nov 13, 2025 · 4:09 PM UTC

Subham Sahoo

@ssahoo_

13 Nov 2025

We've finalized the schedule for our weekly reading group starting this Monday, Nov 17th. Do join us and sign up if you haven't already.

Discrete Diffusion Reading Group

@diffusion_llms

13 Nov 2025

📅 Weekly meetings on Mondays starting November 17, 10–11 AM ET (4–5 PM CET). Details about our first session are coming soon! 🚀

5,382

Subham Sahoo · Nov 7, 2025 · 6:35 PM UTC

Subham Sahoo

@ssahoo_

7 Nov 2025

Now @elonmusk has joined the chat!

Elon Musk

@elonmusk

7 Nov 2025

Replying to @StefanoErmon @_inception_ai

Diffusion will obviously work on any bitstream. With text, since humans read from first word to last, there is just the question of whether the delay to first sentence for diffusion is worth it. That said, the vast majority of AI workload will be video understanding and generation, so good chance diffusion is the biggest winner overall. Also means that the ratio of compute to memory bandwidth will increase.

7,903

Subham Sahoo · Jun 12, 2024 · 2:33 AM UTC

Subham Sahoo

@ssahoo_

12 Jun 2024

Our work on text diffusion. paper: arxiv.org/abs/2406.07524 code: github.com/kuleshov-group/md…

Simple and Effective Masked Diffusion Language Models

While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this...

arxiv.org

Aran Komatsuzaki

@arankomatsuzaki

12 Jun 2024

Simple and Effective Masked Diffusion Language Models Achieves a new SotA among diffusion models on a range of LM tasks and approaches AR perplexity repo: github.com/kuleshov-group/md… abs: arxiv.org/abs/2406.07524

8,607

Subham Sahoo · Aug 1, 2025 · 4:47 AM UTC

Subham Sahoo

@ssahoo_

1 Aug 2025

📢 @BytedanceTalk just dropped their diffusion LLM!!! And boy it's fast 💨 From their technical report, it seems like they are using MDLM (my research) 😊 lf3-static.bytednsdoc.com/ob…

1,882

Subham Sahoo · Mar 15, 2025 · 10:17 PM UTC

Subham Sahoo

@ssahoo_

15 Mar 2025

Honored to have been invited to the Research/PostTraining Round Table at @NVIDIAGTC! Thrilled that Diffusion-LMs are going mainstream. Here’s hoping the next generation of GPUs will supercharge both training and inference of these models.

16,815

Subham Sahoo · Jun 13, 2024 · 2:59 AM UTC

Subham Sahoo

@ssahoo_

13 Jun 2024

🔥Diffusion Models 𝐚𝐥𝐦𝐨𝐬𝐭 beat AR models on text generation. Presenting MDLM, a Masked discrete Diffusion Language Model featuring a Rao-Blackwellized ELBO, which is a mixture of classical masked language modeling losses, and achieving SOTA results among all DMs. (1/10)

4,303
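Editor's sketch of what "a mixture of classical masked language modeling losses" can look like in code. It assumes a linear masking schedule (each token is masked independently with probability t), under which the per-sample objective reduces to the masked-token cross-entropy scaled by 1/t. The helper names (`mask_tokens`, `weighted_mlm_loss`, `uniform_model`) and the toy uniform predictor are hypothetical illustrations, not code from the MDLM repository.

```python
import math
import random

MASK = "[MASK]"

def mask_tokens(tokens, t, rng):
    """Forward process (assumed linear schedule): mask each token with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]

def weighted_mlm_loss(tokens, noised, t, prob_of_true_token):
    """(1 / t) * sum over masked positions of -log p(true token | noised sequence).

    Unmasked positions contribute nothing: the denoiser is assumed to copy them.
    """
    nll = sum(
        -math.log(prob_of_true_token(i, noised))
        for i, z in enumerate(noised)
        if z == MASK
    )
    return nll / t

def uniform_model(vocab_size):
    """Hypothetical stand-in for a network: assigns 1 / vocab_size to every token."""
    return lambda position, noised: 1.0 / vocab_size

tokens = ["the", "cat", "sat", "down"]
rng = random.Random(0)
t = 0.5
noised = mask_tokens(tokens, t, rng)
print(noised, weighted_mlm_loss(tokens, noised, t, uniform_model(vocab_size=4)))
```

Averaging this quantity over random t and random maskings gives a Monte Carlo estimate of the bound; in practice t is usually kept away from 0, since the 1/t weight makes the estimate noisy there.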

Subham Sahoo · Jun 17, 2025 · 5:30 PM UTC

Subham Sahoo

@ssahoo_

17 Jun 2025

Duo has 56.3K 🔥downloads on HF already, Jesus! Please find the colab notebook below to play around with the HF model. 🤗HuggingFace model: huggingface.co/s-sahoo/duo-d… 🤗HuggingFace paper: huggingface.co/papers/2506.1… 🖥️Colab: colab.research.google.com/dr…

Subham Sahoo

@ssahoo_

13 Jun 2025

1,583

Subham Sahoo · Sep 29, 2025 · 7:21 PM UTC

Subham Sahoo

@ssahoo_

29 Sep 2025

📢 Excited to defend my PhD thesis: "Foundations of Diffusion Language Models" 🎓✨ 📅 October 3 | 11:30 am PT / 2:30 pm ET 🔗Zoom: cornell.zoom.us/j/9586300292… Topics covered: 1⃣ MDLM 2⃣The Diffusion Duality 3⃣Esoteric Language Models

5,421

Subham Sahoo · Nov 11, 2025 · 6:30 PM UTC

Subham Sahoo

@ssahoo_

11 Nov 2025

WHY DID I NOT KNOW ABOUT THIS!!!

Jon Barron

@jon_barron

11 Nov 2025

Just learned that tacking a * to the \operatorname latex tag causes it to underset its subscript, very handy.

2,320
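Editor's example of the LaTeX tip quoted above, assuming the `amsmath` package is loaded. In display math, the starred form sets the subscript underneath the operator name, while the unstarred form sets it to the side:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\[
  \operatorname{arg\,max}_{x} f(x)
  \qquad \text{vs.} \qquad
  \operatorname*{arg\,max}_{x} f(x)
\]
\end{document}
```

In inline math both forms place the subscript to the side unless `\limits` is added explicitly.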

Subham Sahoo · Jun 13, 2025 · 11:59 PM UTC

Subham Sahoo

@ssahoo_

13 Jun 2025

🎥 Sampling Viz: Duo vs MDM vs AR 🔥Notice how Duo self-corrects, unlike masked diffusion or AR. (7/8)

5,425

Subham Sahoo · Oct 2, 2025 · 6:58 PM UTC

Subham Sahoo

@ssahoo_

2 Oct 2025

Happening tomorrow at 2:30pm ET / 11:30 am PT

Subham Sahoo

@ssahoo_

29 Sep 2025

3,657

Subham Sahoo · Jul 22, 2025 · 6:05 AM UTC

Subham Sahoo

@ssahoo_

22 Jul 2025

📢 Duo and Eso-LMs at 2B scale on Slim Pajama These models will finish training in a few days. While HF release may take time due to corporate red tape, we'll try providing early access case-by-case. Email susahoo@nvidia.com with the subject “Early access”. Duo: s-sahoo.com/duo/ Eso-LMs: s-sahoo.com/Eso-LMs/

1,222

Subham Sahoo · Jun 13, 2025 · 11:59 PM UTC

Subham Sahoo

@ssahoo_

13 Jun 2025

😱 Discrete diffusion emerges from Gaussian diffusion. 🧮 The argmax operator maps a Gaussian latent to a discrete one. 🔄 The resulting transition dynamics match Uniform-state discrete diffusion. 📉 We prove the discrete ELBO is tighter — making discrete space preferable. Details in the paper 😊 (3/8)

4,135
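Editor's sketch of the argmax step described above. It adds Gaussian noise to a one-hot vector, takes the argmax, and estimates the resulting distribution over tokens by Monte Carlo. By symmetry, the original token keeps the largest probability and every other token is equally likely, which is the shape of a uniform-state discrete marginal. The function names and the values of `alpha` and `sigma` are illustrative assumptions, not the parameterization used in the paper.

```python
import random

def argmax_of_noised_one_hot(token, vocab_size, alpha, sigma, rng):
    """Sample w = alpha * onehot(token) + sigma * eps, eps ~ N(0, I); return argmax(w)."""
    latent = [
        alpha * (1.0 if k == token else 0.0) + sigma * rng.gauss(0.0, 1.0)
        for k in range(vocab_size)
    ]
    return max(range(vocab_size), key=latent.__getitem__)

def empirical_marginal(token, vocab_size, alpha, sigma, num_samples=20000, seed=0):
    """Monte Carlo estimate of the distribution of the discrete latent."""
    rng = random.Random(seed)
    counts = [0] * vocab_size
    for _ in range(num_samples):
        counts[argmax_of_noised_one_hot(token, vocab_size, alpha, sigma, rng)] += 1
    return [c / num_samples for c in counts]

# Mass concentrates on the clean token; the remainder is spread evenly.
print(empirical_marginal(token=2, vocab_size=4, alpha=1.0, sigma=1.0))
```

As `alpha / sigma` shrinks toward zero, the estimate approaches the uniform distribution over the vocabulary; as it grows, the mass concentrates on the clean token.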

Subham Sahoo · Aug 8, 2025 · 3:07 PM UTC

Subham Sahoo

@ssahoo_

8 Aug 2025

Honored to see MDLM featured in the tutorial 😊

Jia-Bin Huang

@jbhuang0604

8 Aug 2025

Diffusion LLMs are promising ways to overcome the limitations of autoregressive LLMs. Less error propagation, easier to control, and faster to sample! But how do Diffusion LLMs actually work? 🤔 Let's explore some ideas on this fascinating topic! piped.video/8BTOoc0yDVA

1,968

Subham Sahoo · Oct 13, 2025 · 8:43 PM UTC

Subham Sahoo

@ssahoo_

13 Oct 2025

Funny enough, after we released MDLM last year, @srush_nlp came up with the exact same idea!

Cai Zhou

@zhuci19

13 Oct 2025

(1/5) Beyond Next-Token Prediction, introducing Next Semantic Scale Prediction! Our @NeurIPSConf NeurIPS 2025 paper HDLM is out! Check out the new language modeling paradigm: Next Semantic Scale Prediction via Hierarchical Diffusion Language Models. It largely generalizes Masked Diffusion Models (MDM), and provides progressive denoising capability for each token at the semantic level. Minimal computational overhead, much better results! arxiv: arxiv.org/abs/2510.08632 code: github.com/zhouc20/HDLM

2,511

Subham Sahoo · May 16, 2025 · 11:31 AM UTC

Subham Sahoo

@ssahoo_

16 May 2025

coming soon.

806

Subham Sahoo · Jun 13, 2025 · 11:59 PM UTC

Subham Sahoo

@ssahoo_

13 Jun 2025

🧩 Background Uniform-state discrete diffusion models can self-correct—unlike AR or masked diffusion (MDMs). ❌But they trail MDMs and AR in terms of perplexity. (2/8)

2,506

Subham Sahoo · Sep 19, 2025 · 7:24 PM UTC

Subham Sahoo

@ssahoo_

19 Sep 2025

And that’s where diffusion shines!!

Percy Liang

@percyliang

19 Sep 2025

-2016 (classic era): focus on data efficiency 2017-2025 (pretraining era): focus on compute efficiency 2026-: focus on data efficiency (again) The standard Transformer paradigm is optimized for compute efficiency. As we look at data efficiency, we'll see very different design decisions, which will be exciting!

2,037

Subham Sahoo · Nov 4, 2025 · 11:42 AM UTC

Subham Sahoo

@ssahoo_

4 Nov 2025

The term AGI gives me the same ick that “AI” did back in 2015. If it takes hundreds of billions of tokens just to get a respectable score on grade school math (GSM8K), that says everything about where we actually are.

991

Subham Sahoo · Nov 6, 2025 · 11:42 AM UTC

Subham Sahoo

@ssahoo_

6 Nov 2025

Please fill out your availability for the reading group

Discrete Diffusion Reading Group

@diffusion_llms

6 Nov 2025

As we get started with our discrete diffusion reading group, we’d like to schedule a recurring one-hour meeting time that works for everyone. Form: forms.gle/Xtogq4T7xuKBfFjr7 > Please fill out your availability in the Google form, and be sure to select your local time zone when setting your availability. > This will help us find a time that accommodates everyone across time zones. Once the responses are in, I’ll follow up with the finalized meeting time and our first reading.

3,321

Subham Sahoo · Jun 13, 2025 · 11:59 PM UTC

Subham Sahoo

@ssahoo_

13 Jun 2025

🎯 100× reduction in sampling steps via Discrete Consistency Distillation. ✨1024 → 16 steps: no quality/diversity loss ✨1024 → 8 steps: same quality, slight drop in diversity (6/8)

1,256

Subham Sahoo · Jun 3, 2025 · 5:02 AM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

Masked Diffusion Models (MDMs) are a strong alternative to autoregressive (AR) LMs—but they have two fatal flaws: 🐌 Slow: No KV caching = much slower than AR in practice 📉 Quality gap: Struggle on complex tasks, lower likelihood than AR (2/9)

1,679

Subham Sahoo · Jun 13, 2025 · 11:59 PM UTC

Subham Sahoo

@ssahoo_

13 Jun 2025

🤝 With amazing collaborators: @jdeschena , @SkyLi0n , @Guanghan__Wang , @justintchiu , @volokuleshov (8/8)

1,132

Subham Sahoo · Jun 13, 2025 · 11:59 PM UTC

Subham Sahoo

@ssahoo_

13 Jun 2025

🚀 Application 1: Faster Training + Better Likelihood We use this duality to create a curriculum: we start off with a Gaussian diffusion model and anneal it to a discrete diffusion model! 🤯 Our method: Duo ➡️ 2× faster training ➡️ Outperforms AR on 3/7 zero-shot likelihood benchmarks (4/8)

1,593

Subham Sahoo · Jun 13, 2025 · 11:59 PM UTC

Subham Sahoo

@ssahoo_

13 Jun 2025

⚡ Application 2: Few-Step Generation Few-step sampling in Gaussian Diffusion relies on PF-ODEs + consistency distillation. 🦾Discrete models lack such tools—until now. 🕺Our duality allows us to port consistency distillation to the discrete domain. cc:@DrYangSong (5/8)

1,406

Subham Sahoo · Oct 18, 2025 · 9:55 AM UTC

Subham Sahoo

@ssahoo_

18 Oct 2025

How do you even compute such probabilities?

Elon Musk

@elonmusk

18 Oct 2025

My estimate of the probability of Grok 5 achieving AGI is now at 10% and rising

1,815

Subham Sahoo · Jun 14, 2025 · 1:29 AM UTC

Subham Sahoo

@ssahoo_

14 Jun 2025

I tagged the wrong ICML FML 🤦🏽

Subham Sahoo

@ssahoo_

13 Jun 2025

1,201

Subham Sahoo · Jun 15, 2025 · 6:53 PM UTC

Subham Sahoo

@ssahoo_

15 Jun 2025

And in 2025 we unify discrete-space and Gaussian-space diffusion 😊

Ayan Das

@dasayan05

15 Mar 2025

How Diffusion unification went: > score based model > then DDPM came along > we have two formalism, DDPM & SBM > SDE came to unify them > now we have Score, DDPM & SDE > Then came flow matching to unify them > now we have Score, DDPM, SDE & Flow Models > Then consistency models came > now we have Score, DDPM, SDE, Flow & Consistency Models

960

Subham Sahoo · Jun 3, 2025 · 4:28 PM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

Replying to @Scobleizer

Thank you so much for the shoutout! Much appreciated 😊 More details: 📜 Paper: arxiv.org/abs/2506.01928 🧾 Blog: s-sahoo.com/Eso-LMs/ 🖥️ Code: github.com/s-sahoo/Eso-LMs

Esoteric Language Models: A Family of Any-Order Diffusion LLMs

Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Within this family, Masked Diffusion Models (MDMs)...

arxiv.org

Subham Sahoo · Jun 3, 2025 · 5:02 AM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

BD3-LMs improve speed via blockwise semi-autoregressive diffusion. They partially support KV caching and outperform MDMs in speed. But… ⚠️ Mode collapse at low steps = bad samples 🧠 Partial caching only = intra-block KV still missing Speed & quality still compromised. (3/9)

1,284

Subham Sahoo · Dec 11, 2024 · 11:55 PM UTC

Subham Sahoo

@ssahoo_

11 Dec 2024

Excited to present our #NeurIPS2024 paper: "Simple and Effective Masked Diffusion Language Models" on Thurs at 11:30 a.m. in Hall A-C (#2505) 🔥Our method almost surpasses AR models in text generation 📜arxiv.org/abs/2406.07524 🔖s-sahoo.com/mdlm/ 💻github.com/kuleshov-group/md…

1,192

Subham Sahoo · Oct 20, 2025 · 4:52 PM UTC

Subham Sahoo

@ssahoo_

20 Oct 2025

Happy Diwali — from mine to yours ✨

882

Subham Sahoo · Jul 2, 2025 · 4:46 PM UTC

Subham Sahoo

@ssahoo_

2 Jul 2025

Is it just me, or is OpenReview down for everyone?

1,645

Subham Sahoo · Jun 3, 2025 · 5:02 AM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

Training Innovations in Eso-LMs: 🔀 Half the batch → AR-style: Mask tokens see clean context + prior clean tokens → AR loss 📥 Other half → MDLM-style: Shuffled inputs, left = clean, right = masked + causal attention → MDLM loss 🏁 Outcome: ✅ Unified Denoising Model for AR and diffusion ✅ KV caching during diffusion (yes, really!) (5/9)

1,238

Subham Sahoo · Jun 3, 2025 · 5:02 AM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

We introduce a new LM paradigm that fuses AR and MDMs. 💡 Trained with a hybrid loss (AR + MDM), our model interpolates smoothly between both styles—balancing: ✅ Perplexity ✅ Sample quality ✅ Inference speed (4/9)

987

Subham Sahoo · Jun 3, 2025 · 7:27 PM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

🔥 Not sure if 2025 is the year of AGI, but it definitely belongs to Diffusion LMs. Dropping another banger next week — stay tuned. 👀💥 #DiffusionLMs #NLP #GenerativeAI #LLMs

654

Subham Sahoo · Jun 3, 2025 · 5:02 AM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

Inference time Innovations: ⏩ Reformulated ancestral sampling: Only do forward pass on scheduled MASK + clean tokens—not the whole sequence = Massive FLOP savings 💾 🧠 Thanks to any-order autoregressive training, KV of clean tokens is cacheable! 🏁 Outcome: 🚀 65× faster than MDLM ⚡ 4× faster than Block Diffusion (6/9)

777
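Editor's toy cost model for the two inference-time ideas above. It counts how many token positions pass through the network when (a) the whole sequence is processed at every step, (b) only clean tokens plus the masks scheduled for the current step are processed, and (c) clean tokens are additionally served from a KV cache. It assumes equal-sized unmasking steps and ignores attention cost, so it illustrates where the savings come from and does not reproduce the 65× and 4× measurements. The function name is hypothetical.

```python
def token_forward_counts(seq_len, num_steps):
    """Count token positions fed through the network under three toy strategies."""
    assert seq_len % num_steps == 0, "toy model: equal-sized unmasking steps"
    per_step = seq_len // num_steps
    full = restricted = cached = 0
    clean = 0  # tokens already unmasked before the current step
    for _ in range(num_steps):
        full += seq_len                 # (a) whole sequence, every step
        restricted += clean + per_step  # (b) clean tokens + masks scheduled now
        cached += per_step              # (c) clean tokens come from the KV cache
        clean += per_step
    return {"full": full, "restricted": restricted, "cached": cached}

print(token_forward_counts(seq_len=1024, num_steps=16))
```

Under these assumptions strategy (a) grows as `seq_len * num_steps`, while strategy (c) touches each position once regardless of the number of steps.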

Subham Sahoo · Jul 7, 2025 · 9:27 PM UTC

Subham Sahoo

@ssahoo_

7 Jul 2025

Replying to @jaschasd

Just reached out! Would love to chat about diffusion-LLMs with you 😊

Subham Sahoo

@ssahoo_

13 Jun 2025

1,530

Subham Sahoo · Jun 3, 2025 · 6:32 PM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

Replying to @iScienceLuvr

And theirs will be Esoteric

Subham Sahoo

@ssahoo_

3 Jun 2025

484

Subham Sahoo · Apr 27, 2025 · 5:26 AM UTC

Subham Sahoo

@ssahoo_

27 Apr 2025

Replying to @RickyTQChen

Looking forward to it! I have an oral presentation there, where I’ll present our work “The Diffusion Duality”, in which we unlock few-step generation in diffusion language models. Hopefully you’ll like it 😊 s-sahoo.com/duo/

The Diffusion Duality

Unlocking Few-Step Generation in Discrete Diffusion Language Models

s-sahoo.com

435

Subham Sahoo · Jun 18, 2020 · 8:46 PM UTC

Subham Sahoo

@ssahoo_

18 Jun 2020

Replying to @geoffreyhinton

We published a paper at #ICML where we used periodic functions along with 1 / x as activations to perform symbolic regression. Doing this helped the NN to generalize to unseen domains (and it outperformed Eureqa). @GMartius arxiv.org/abs/1806.07259

Learning Equations for Extrapolation and Control

We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional...

arxiv.org

Subham Sahoo · Jun 3, 2025 · 5:02 AM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

🥇 New SOTA on the Speed–Quality Pareto Frontier Eso-LMs redefine what’s possible: 🔁 MDLM-level perplexity at high speed ✍️ AR-level perplexity when needed ❌ No mode collapse at low steps — unlike Block Diffusion One model. Full control. P.S. Low Gen PPL = High sample quality (8/9)

755

Subham Sahoo · Oct 27, 2025 · 8:04 PM UTC

Subham Sahoo

@ssahoo_

27 Oct 2025

paper: arxiv.org/pdf/2510.19990 code: github.com/rajesh-lab/Reason…

885

Subham Sahoo · Jun 3, 2025 · 5:02 AM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

In collaboration with: @zhihanyang_ @yashakha Johnna Deepansha @ChengZhoujun Hector Liu @ericxing @jwthickstun @ArashVahdat (9/9)

676

Subham Sahoo · Jul 6, 2020 · 10:20 PM UTC

Subham Sahoo

@ssahoo_

6 Jul 2020

Our work uses gradient-based methods to scale SMT solvers (#Microsoft Z3) to analyze deep networks, such as Inception, for model explanation. (1 / 3) @rishabhs #NeurIPS2020 #Google #GoogleAI

arxiv @arxiv_org

2 Jul 2020

Scaling Symbolic Methods using Gradients for Neural Model Explanation. arxiv.org/abs/2006.16322

Subham Sahoo · Dec 11, 2024 · 11:26 PM UTC

Subham Sahoo

@ssahoo_

11 Dec 2024

Excited to present our #NeurIPS2024 🌟spotlight🌟paper: "MuLAN: Diffusion Models with Learned Adaptive Noise" on Fri at 4:30 p.m. in Hall A-C (#2604) 📜arxiv.org/abs/2312.13236 💻github.com/s-sahoo/MuLAN 🔖s-sahoo.com/MuLAN/ w/ @SkyLi0n Chris @volokuleshov

762

Subham Sahoo · Jun 16, 2025 · 9:58 PM UTC

Subham Sahoo

@ssahoo_

16 Jun 2025

@sedielem Very kind of you to share our work; it's such an honor 😊 Not sure if you recall, but during my NeurIPS poster session, I briefly mentioned an idea about why adding Gaussian noise to one-hot vectors might be better than adding it to embeddings. It was because of this connection to Uniform-state diffusion.

741

Subham Sahoo · Jun 13, 2025 · 7:45 PM UTC

Subham Sahoo

@ssahoo_

13 Jun 2025

Replying to @iScienceLuvr

You can find the code and checkpoints here: s-sahoo.com/duo/

The Diffusion Duality

Unlocking Few-Step Generation in Discrete Diffusion Language Models

s-sahoo.com

483

Subham Sahoo · Jun 18, 2020 · 9:26 PM UTC

Subham Sahoo

@ssahoo_

18 Jun 2020

Replying to @danielbigham @geoffreyhinton

Learning Equations for Extrapolation and Control

We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional...

arxiv.org

Subham Sahoo · Apr 9, 2025 · 5:12 AM UTC

Subham Sahoo

@ssahoo_

9 Apr 2025

Replying to @keenanisalive

The same holds for diffusion processes too! We show that discrete diffusion emerges from Gaussian diffusion in our paper “The Diffusion Duality” openreview.net/forum?id=CB0U…

The Diffusion Duality

In the context of language modeling, Uniform State discrete Diffusion Models (USDMs) hold the promise of faster generation due to their ability to self-correct. However, they are typically...

openreview.net

210

Subham Sahoo · Jun 15, 2025 · 7:08 PM UTC

Subham Sahoo

@ssahoo_

15 Jun 2025

.@keenanisalive From quantum mechanics, where the quantized energy states of electrons arise as solutions to continuous wave equations, to the binary logic of digital circuits, fundamentally driven by smooth analog currents, discreteness has repeatedly and naturally emerged from an underlying continuum. In the following work, we show that a discrete diffusion process is, in fact, an emergent phenomenon of an underlying continuous Gaussian diffusion process. nitter.app/ssahoo_/status/1933675…

Keenan Crane

@keenanisalive

7 Apr 2025

We often use discretization to approximate continuous laws of physics, but it also goes the other way: You can use continuous equations to approximate the behavior of discrete systems! Here we'll see how electrical circuits can be modeled using the Laplace equation Δφ=0. [1/n]

825

Subham Sahoo · Jun 16, 2025 · 7:01 PM UTC

Subham Sahoo

@ssahoo_

16 Jun 2025

Replying to @_akhaliq

Thanks for the shoutout 😊For details see this thread:

Subham Sahoo

@ssahoo_

13 Jun 2025

464

Subham Sahoo · Sep 14, 2025 · 6:43 PM UTC

Subham Sahoo

@ssahoo_

14 Sep 2025

Replying to @kfountou

Indeed. Very fortunate to be in this place and I couldn't have asked for more.

2,799

Subham Sahoo · Oct 26, 2025 · 2:00 PM UTC

Subham Sahoo

@ssahoo_

26 Oct 2025

Replying to @huybery

Diffusion LLMs. 🔥Few-step generation in LLMs (ICML 25): s-sahoo.com/duo/ ✨Used by ByteDance's Seed Diffusion (NeurIPS 24): s-sahoo.com/mdlm/

The Diffusion Duality

Unlocking Few-Step Generation in Discrete Diffusion Language Models

s-sahoo.com

937

Subham Sahoo · Jun 3, 2025 · 5:02 AM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

Eso-LMs seamlessly interpolate between MDLM and AR perplexities on OWT and LM1B (7/9)

595

Subham Sahoo · Jul 31, 2025 · 7:42 PM UTC

Subham Sahoo

@ssahoo_

31 Jul 2025

Replying to @GoogleResearch

Congrats @yashakha

580

Subham Sahoo · Jun 3, 2025 · 7:57 PM UTC

Subham Sahoo

@ssahoo_

3 Jun 2025

Replying to @jarridrb

We’re working on a benchmarking paper that evaluates existing diffusion LMs, highlighting where they excel and where they fall short. I’ll share it with you as soon as it’s ready :)

Subham Sahoo · Sep 14, 2025 · 9:59 PM UTC

Subham Sahoo

@ssahoo_

14 Sep 2025

Replying to @kohjingyu

Yeah, they are adorable

2,060

Subham Sahoo · Sep 9, 2025 · 6:58 PM UTC

Subham Sahoo

@ssahoo_

9 Sep 2025

Replying to @tw_killian

Congrats to the team!

274

Subham Sahoo · Jun 14, 2025 · 5:04 AM UTC

Subham Sahoo

@ssahoo_

14 Jun 2025

Replying to @bspectacledGOAT @ICML2025

no rest for the wicked 😊

467

Subham Sahoo · Jul 6, 2020 · 10:20 PM UTC

Subham Sahoo

@ssahoo_

6 Jul 2020

In this work we aim to find a minimal subset of input features relevant for the model’s prediction. Earlier approaches, which used these solvers for interpretability, were limited to small networks featuring a few thousand neurons. (2 / 3)

Subham Sahoo · Oct 31, 2025 · 1:30 AM UTC

Subham Sahoo

@ssahoo_

31 Oct 2025

Replying to @miguelsaavedra @diffusion_llms @jdeschena @zhihanyang_

It’s remote for now, but we’ll consider in-person sessions depending on the number of participants

575

Subham Sahoo · Nov 4, 2025 · 9:17 PM UTC

Subham Sahoo

@ssahoo_

4 Nov 2025

Replying to @LucaAmb @Cornell

Thank you so much for having me 😊

784

Subham Sahoo · May 23, 2025 · 12:21 AM UTC

Subham Sahoo

@ssahoo_

23 May 2025

Replying to @giladturok

We do that in our ICML’25 paper: s-sahoo.com/duo/

The Diffusion Duality

Unlocking Few-Step Generation in Discrete Diffusion Language Models

s-sahoo.com

Subham Sahoo · Sep 12, 2025 · 9:29 PM UTC

Subham Sahoo

@ssahoo_

12 Sep 2025

Replying to @yingheng_wang

Omg, yesss!! Feels like it was yesterday. This was an unplanned PhD, ngl. I wanted to drop out of the program for the longest time haha. And thank you so much for your kind words 😊😊

565

Subham Sahoo · Jul 6, 2020 · 10:20 PM UTC

Subham Sahoo

@ssahoo_

6 Jul 2020

The unique problem formulation leverages gradient information to partially encode the network, which helps these solvers scale to networks with millions of parameters. (3 / 3)

Subham Sahoo · Jun 10, 2025 · 9:05 PM UTC

Subham Sahoo

@ssahoo_

10 Jun 2025

Replying to @jxmnop

Thanks for the RT!😊 Folks might find this useful: 📘 Blog: s-sahoo.com/Eso-LMs/ 📷 Code: github.com/s-sahoo/Eso-LMs

Esoteric Language Models

s-sahoo.com

203

Subham Sahoo · Jul 2, 2025 · 10:45 PM UTC

Subham Sahoo

@ssahoo_

2 Jul 2025

Ouch, my ego took a hit. Chemistry is a subject that can be gamed with rote learning, yet surprisingly, Gemini performs worse in it than in physics and math.

Deedy

@deedydas

2 Jul 2025

AI now beats every single human in the hardest college entrance exam in India, the IIT JEE. Bytedance silently published this result this week. The top scorer was Rajit Gupta with 332/360, but Google's Gemini 2.5 Pro was at rank 1 with 336/360.

757

Subham Sahoo · Sep 13, 2025 · 3:20 AM UTC

Subham Sahoo

@ssahoo_

13 Sep 2025

Replying to @NiJinjie

And I've learned from yours! Both you and @mihirp98 have done an amazing job with the scaling laws for MDMs in the data-constrained regime.

Subham Sahoo · Jun 19, 2020 · 1:39 AM UTC

Subham Sahoo

@ssahoo_

19 Jun 2020

Replying to @vincesitzmann

You might find our work interesting (published at #ICML) where we used periodic functions along with 1 / x as activations to perform symbolic regression. Doing this helped the NN to generalize to unseen domains (and it outperformed Eureqa). arxiv.org/abs/1806.07259

Subham Sahoo · Jun 13, 2024 · 3:04 AM UTC

Subham Sahoo

@ssahoo_

13 Jun 2024

This work was done in collaboration with @mariannearr @SchiffYair @SkyLi0n Edgar @justintchiu @srush_nlp @volokuleshov. (10/10)

281

Subham Sahoo · Nov 7, 2025 · 11:32 PM UTC

Subham Sahoo

@ssahoo_

7 Nov 2025

Replying to @giannis_daras @elonmusk

Lmao 😂😂

114

Subham Sahoo · Oct 7, 2025 · 4:57 PM UTC

Subham Sahoo

@ssahoo_

7 Oct 2025

Replying to @ozgurgulerx @GMartius @volokuleshov

Thanks!! This is a good starting point: piped.video/WjAUX23vgfg?si=rGEs…

Simple Diffusion Language Models

Short tutorial on text diffusion. Simple and Effective Masked Diff...

youtube.com

146

Subham Sahoo · Oct 31, 2025 · 3:12 PM UTC

Subham Sahoo

@ssahoo_

31 Oct 2025

Replying to @PackBropagated @diffusion_llms @jdeschena @zhihanyang_

Thank you for your kind words! We look forward to having you 😊

236

Subham Sahoo · Oct 31, 2025 · 1:24 AM UTC

Subham Sahoo

@ssahoo_

31 Oct 2025

Replying to @mocutobi @diffusion_llms @jdeschena @zhihanyang_

Would love to have you! Please join the mailing list: groups.google.com/g/diffusio…

334

Subham Sahoo · Oct 31, 2025 · 1:21 AM UTC

Subham Sahoo

@ssahoo_

31 Oct 2025

Replying to @Debargha_ @diffusion_llms @jdeschena @zhihanyang_

We are in the process of figuring it out. Meanwhile, subscribe to the mailing list and we’ll keep you posted:)

372

Subham Sahoo · Oct 6, 2025 · 3:05 PM UTC

Subham Sahoo

@ssahoo_

6 Oct 2025

Replying to @sovon_haidar

Thank you so much!! I'll release a tutorial on that paper soon. Stay tuned 😊😊

263

Subham Sahoo · Jun 13, 2024 · 3:17 AM UTC

Subham Sahoo

@ssahoo_

13 Jun 2024

Replying to @arankomatsuzaki

Thank you @arankomatsuzaki for sharing our work. Here's a more detailed thread if you're interested😄

Subham Sahoo

@ssahoo_

13 Jun 2024

Subham Sahoo · Sep 15, 2025 · 1:49 PM UTC

Subham Sahoo

@ssahoo_

15 Sep 2025

Replying to @FoldMani

That’s the unfortunate reality

300

Subham Sahoo · May 16, 2025 · 4:18 PM UTC

Subham Sahoo

@ssahoo_

16 May 2025

Replying to @jdeschena

Looking forward to it :)

101

Subham Sahoo · Aug 8, 2025 · 3:04 PM UTC

Subham Sahoo

@ssahoo_

8 Aug 2025

Replying to @jbhuang0604

Thank you so much for covering MDLM 😊 In our ICML paper, we show that uniform state diffusion, a type of discrete diffusion, emerges from Gaussian diffusion—enabling few-step generation in diffusion language models. paper: s-sahoo.com/duo/ tweet:

The Diffusion Duality

Unlocking Few-Step Generation in Discrete Diffusion Language Models

s-sahoo.com

Subham Sahoo

@ssahoo_

13 Jun 2025

264

Subham Sahoo · Jun 14, 2025 · 1:42 AM UTC

Subham Sahoo

@ssahoo_

14 Jun 2025

Thanks for your interest😊More details here:

Subham Sahoo

@ssahoo_

13 Jun 2025

Subham Sahoo · Oct 31, 2025 · 12:42 AM UTC

Subham Sahoo

@ssahoo_

31 Oct 2025

Replying to @LouisaHempel @diffusion_llms @jdeschena @zhihanyang_

🙏

294

Subham Sahoo · Nov 3, 2025 · 11:56 PM UTC

Subham Sahoo

@ssahoo_

3 Nov 2025

Replying to @pranamanam

Many congratulations! You just couldn’t resist talking about work, could you?

517

Subham Sahoo · Dec 14, 2024 · 6:37 AM UTC

Subham Sahoo

@ssahoo_

14 Dec 2024

Replying to @NathanYan2012

Haha