Welcome to the diffusion era. We bet on parallel generation years ago, when it was a contrarian idea. It's great to see the industry arrive. Mercury 2 continues to lead the Pareto frontier for quality, speed, and cost among publicly available diffusion LLMs.
Next-Edit just shipped in @kilocode, powered by Mercury Edit 2, our diffusion LLM. It predicts your next change anywhere in the file, not just the tokens ahead of your cursor. Diffusion means the edit comes back fast enough to stay in flow. Tab to accept. Free for everyone through July 23, no key required. blog.kilo.ai/p/announcing-ne…
Thanks for the shout-out @radicalvcfund!
Inception retweeted
Next-Edit is live in Kilo, powered by Mercury Edit 2 from @_inception_ai. Autocomplete predicts the next few tokens ahead of your cursor. Next-Edit predicts your next actual edit anywhere in the file. Hit Tab to accept. And it's free for everyone for 30 days!
Why are autoregressive LLMs still generating one token at a time? That question led to the breakthrough behind the first commercially available diffusion LLM. The team applied diffusion, the technical approach that transformed image and video generation, to text and code. The result: a dLLM that matched the quality of traditional speed-optimized autoregressive models while running 10x faster. Our CEO @StefanoErmon joined @YourProtagonist, host of the Fund/Build/Scale Podcast, to talk about the journey from lab to commercial product. Full episode linked in thread.
Inception retweeted
Excited to see Mercury 2 live on @baseten. Mercury 2 delivers Groq/Cerebras-like speeds (>1000 tokens/sec) with quality comparable to speed-optimized models like Claude Haiku. If you have latency-sensitive workloads, we’d love to hear from you.
We are excited to announce that we have partnered with @_inception_ai to make Mercury 2 available on Baseten. This makes us the first inference platform to bring Inception’s diffusion LLM to production. Inception’s dLLM architecture fixes the bottlenecks of sequential token generation and can deliver 1,000+ tokens/sec on standard NVIDIA GPUs. Early users like @augmentcode have seen impressive results, such as an 82% reduction in latency and 90% cost savings, while maintaining high quality.
The fastest reasoning LLM is now in production on Baseten. Mercury 2 is a diffusion LLM, so it generates tokens in parallel and hits 1,000+ tokens/sec on @NVIDIAAI GPUs, speeds that used to require specialized hardware. @augmentcode is already using Mercury 2, cutting cost 90% and latency 82%. Proud to partner with the @baseten team to bring dLLMs to production.
We've been named to the @wef's 2026 Technology Pioneers community. Diffusion changed image generation, and Mercury 2 is doing the same for text and code. More from @StefanoErmon below.
Honored that @_inception_ai has been named to the @WEF's 2026 Technology Pioneers community. Diffusion reshaped image generation. With Mercury 2 we're bringing that leap to text and code. Grateful to the team and everyone who backed us early.
Inception retweeted
Excited to see Mercury 2 recognized by @ArtificialAnlys as the fastest model. Autoregressive models generate one token at a time, while diffusion LLMs refine many tokens in parallel. Mercury 2 shows what this unlocks in practice. artificialanalysis.ai/models
Autoregressive models generate text one token at a time. That sequential process becomes a major bottleneck at inference scale, with memory-bound workloads, poor GPU utilization, and growing infrastructure demands. Diffusion LLMs work differently. Instead of generating tokens one at a time, Mercury refines multiple tokens in parallel, which is why diffusion models can achieve dramatically higher throughput. Part 2 from @StefanoErmon's keynote at @StartupGrind on why diffusion models are the future of LLMs.
The question is no longer just which model is the smartest. It’s which model is most efficient without sacrificing quality. The highest-volume AI workloads are bottlenecked by latency, token generation speed, and serving cost. Autoregressive models were not designed for that world. Part 1 from @StefanoErmon's keynote at @startupgrind on why diffusion models are the future of LLMs.
Inception retweeted
Replying to @DavidSHolz
That’s exactly the bet we’re making at @_inception_ai. We’re already matching speed-optimized models from frontier labs on quality, while being faster and more cost efficient. That gap will only widen as we continue to scale.
Hiring our first Forward Deployed AI Engineer at Inception. We built the world's fastest reasoning LLM and the first commercially available diffusion LLM, Mercury 2. >1,000 tokens/sec on standard GPUs via diffusion, 10x faster than speed-optimized autoregressive models at comparable quality. Enterprise demand has outpaced what we can serve as a research-led team. You'll define how we run customer engagements, scope POCs, build evals, and turn deployments into a flywheel for the next generation of models. Apply: jobs.gem.com/inception/am9ic…
Will the next decade of LLMs run on autoregression, or on diffusion? One of the top questions we got at MLSys this week. Part 6, the final part of our founder story series with @timt at @MenloVentures. Featuring @StefanoErmon, @adityagrover_, @volokuleshov
Day 2 at @MLSysConf. Thanks to everyone who came by yesterday. The conversations on diffusion for language, the future of language models, and what fast inference unlocks have been the highlight. Come find us at the booth today and meet the team behind Mercury 2. And join us tonight for drinks. 🔗 luma.com/9rw9lx31