Nathan Lambert · Jun 22, 2026 · 1:49 PM UTC

Nathan Lambert

Pinned Tweet

Nathan Lambert

@natolambert

Jun 22

TMax: An open RL recipe for terminal agents

I’m very excited to get to share a new RL paper today that I got to have a small part in – a type of paper I suspect we’ll see much more of in the future.

The key is that RL research is very different today, in mid-2026, than what most observers have in their context. The average conception of an RL paper is grounded in the RLVR revolution of early 2025, where many people could use vanilla RLVR libraries to hillclimb on math benchmarks. Crucially, this style of math work could be done on base models or fairly stably on already trained models. With agents, the tasks of focus are very hard, requiring complex tool-use, harnesses where the model automatically manages its history, and much more training to make smaller eval improvements. We’re shifting from a renaissance of RL study to rapidly needing to improve its empirical rigor and common community engagements.

TMax is the best open data for hillclimbing on frontier terminal tasks. It’s been validated with rigorous experiments, and if the authors wanted to just form an “RL environments startup” they could probably sell it for millions of dollars. This data work is some of my favorite stuff to be around in my 2.5+ years at Ai2.

As a general summary, the recipe is open data and recipe lessons from hillclimbing the Qwen 3.5 smaller, dense models on terminal tasks. These models are super hard to hillclimb in this area, as they’re already trained heavily on the task. The training is very infrastructure-dependent, and most of the RL innovations are more designed to make training stable than to improve the rate of learning.

I strongly recommend this paper. I joke around that I was happy to be an author just so I had to read it twice! You can find Hamish’s thread sharing more here or read the paper here.

You can click through to find the model weights, the data, and even some fun further artifacts to study like all the RL rollouts from a training run – where the model sometimes became aware that it was being tested.

The biggest takeaway I have from following this work, and more of the work in the community, is how important recipe work is. Let me define “recipe work.” It is a style of paper that explains all the steps you need to make crucial model improvements – data, algorithm, codebase, pitfalls, etc.

Getting started in meaningful RL experiments today is a substantial expense. There are a ton of companies, an entire industry emerging really, around the idea of taking open-weight language models and finetuning them with RL on your domain-specific tasks. What I see in many projects is that getting an initial baseline is very hard. This phase, which can cost weeks and anywhere from $10K to $1M+, feels like spinning your wheels. (A fun fact: an RL step on a model like Nvidia Nemotron 3 Ultra on Tinker costs $1K, and a meaningful RL run would be hundreds of steps – credit Edward Hu.) It takes a lot of time to get traction in learning signal on meaningful, hard RL tasks.

What we need as a community is a way for people to study small ablations to established RL recipes, as most labs won’t have the resources to do it from scratch in a meaningful way. This is what I hope TMax can be for terminal agents, or at least the start of it. Yes, the training jobs are expensive, as the paper documents a standard training job being 8 nodes of H100s (2 train, 6 inference) for 2-3 days, but that is approaching something academics can study. The establishment of this recipe took O(100) of these training jobs to get right.

This isn’t my first time trying to establish this direction. When we launched Olmo 3 we had the “RL Zero” model families, which are clean RL runs from a base model on a certain domain.

This type of recipe-dependent work is a clear indicator that meaningful post-training work today looks much more like pretraining work of years past. We need decision-making ladders, clear ways of seeing small improvements in the models, stability, and so on. Part of this is down to academic gatekeepers, who won’t reward a paper doing very clean empirical work to push a recipe 1-2% up. They’ll favor a “new algorithm” that matches results, or something sort of bogus. My hope is that we can have multiple, stable, clear recipes across agent types, so innovations can be tested more clearly in multiple domains. (If you’re working on this, please reach out – I’m happy to support if I can, but I likely can’t reply to every email.)

As a quick aside, the RL frameworks in vogue today seem to be SLIME and SkyRL. The libraries of choice have shifted throughout these seasons in RL, which further contributes to a form of fragility in the literature. A bit of continuity will go a long way.

So, go read this paper. It’s a really great example of how seemingly simple data and infrastructure work can be very hard and impactful. It’s also got me looking for more applications of Divergence Proximal Policy Optimization (DPPO) as another small evolution to the best RL algorithms of the day, by virtue of being a bit more stable by improving token-level clipping.
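To put the compute figures quoted in the post into one number, here is a rough back-of-the-envelope sketch. It uses only the figures stated above (8 nodes, 2-3 days per job, roughly 100 jobs). The 8 GPUs per node is an assumption, not something the post states, and O(100) is read as exactly 100 for illustration.

```python
def gpu_hours(nodes: int, gpus_per_node: int, days: float) -> float:
    """Total GPU-hours for one training job that runs on every GPU for `days` days."""
    return nodes * gpus_per_node * days * 24


NODES = 8            # from the post: 8 nodes of H100s per job
GPUS_PER_NODE = 8    # ASSUMPTION: a typical H100 node size, not stated in the post
JOBS = 100           # ASSUMPTION: "O(100)" read as exactly 100

low = gpu_hours(NODES, GPUS_PER_NODE, days=2)    # 2-day job
high = gpu_hours(NODES, GPUS_PER_NODE, days=3)   # 3-day job

print(f"one job:     {low:,.0f} to {high:,.0f} H100-hours")
print(f"full recipe: {low * JOBS:,.0f} to {high * JOBS:,.0f} H100-hours")
```

Under these assumptions a single job is a few thousand H100-hours, and the full recipe search is a few hundred thousand.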

Hamish Ivison @ ICML

@hamishivi

Jun 22

Trained some terminal agents with friends! Introducing Tmax, open RL terminal agent models. Under default settings and shorter length (65k) token budgets, tmax outperforms prior open work on terminal use. We are releasing all data+weights+rollouts publicly!

641

157,592

Nathan Lambert · Jan 23, 2025 · 5:34 PM UTC

Nathan Lambert

@natolambert

23 Jan 2025

Meta is definitely not alone in this. And it's normally overblown too.

170

827

9,370

3,176,244

Nathan Lambert · Oct 21, 2025 · 3:28 PM UTC

Nathan Lambert

@natolambert

21 Oct 2025

Airbnb CEO Brian Chesky: “We’re relying a lot on Alibaba’s Qwen model. It’s very good. It’s also fast and cheap... We use OpenAI’s latest models, but we typically don’t use them that much in production because there are faster and cheaper models.” The valley is built on Qwen?

123

342

4,403

1,124,869

Nathan Lambert · Nov 17, 2023 · 8:42 PM UTC

Nathan Lambert

@natolambert

17 Nov 2023

I'm happy to announce I'm the next CEO of OpenAI and we're going to start doing open source again

198

4,079

881,634

Nathan Lambert · Sep 5, 2025 · 10:09 PM UTC

Nathan Lambert

@natolambert

5 Sep 2025

The weirdest VC subsidizing of our time, 10% of the Anthropic series F goes to writers

151

3,555

290,152

Nathan Lambert · Oct 20, 2025 · 2:05 PM UTC

Nathan Lambert

@natolambert

20 Oct 2025

Life update, she said yes. 🤩👩‍❤️‍👨🐕‍🦺

275

3,379

178,226

Nathan Lambert · Apr 25, 2025 · 10:59 PM UTC

Nathan Lambert

@natolambert

25 Apr 2025

transparency in AI is dying: no evals, no release notes, just vibes and more bad naming

Sam Altman

@sama

25 Apr 2025

we updated GPT-4o today! improved both intelligence and personality.

115

2,686

234,851

Nathan Lambert · Apr 8, 2025 · 2:19 PM UTC

Nathan Lambert

@natolambert

8 Apr 2025

vagueposting is the worst part of the ai community here

william

@wgussml

8 Apr 2025

i just witnessed a new form of human computer interaction completely blown away by what’s coming in the next month

2,620

131,761

Nathan Lambert · May 22, 2025 · 4:40 PM UTC

Nathan Lambert

@natolambert

22 May 2025

Anthropic sliding into that code tooling company role instead of AGI race role

2,136

242,801

Nathan Lambert · Jun 5, 2025 · 11:54 PM UTC

Nathan Lambert

@natolambert

5 Jun 2025

In my recent trip, the waymo market in SF has converged to ~2-3x the wait time and ~2-3x the cost of uber because that's how much more people are willing to pay for Waymo.

1,915

159,269

Nathan Lambert · May 11, 2025 · 5:40 PM UTC

Nathan Lambert

@natolambert

11 May 2025

leaders in ai talk like there's some master plan but it's literally just this

105

1,868

68,530

Nathan Lambert · Oct 12, 2025 · 7:26 PM UTC

Nathan Lambert

@natolambert

12 Oct 2025

Recurring frontier lab gossip: OpenAI has best post-training/rl and has pushed it super hard on weaker pretraining. Gemini has spectacular pretraining. Making a reasoning model was super easy for them & OpenAI folks were surprised. Anthropic? Secretive i guess.

1,784

277,493

Nathan Lambert · Jun 25, 2025 · 2:05 PM UTC

Nathan Lambert

@natolambert

25 Jun 2025

A sign Google is waking up is them dropping by far and away the biggest free plans across the industry. This time is for the Claude Code competitor. Super excited, next time can be them getting rid of "Gemini Advanced" or whatever it is. Free Gemini.

116

1,684

325,295

Nathan Lambert · May 20, 2025 · 7:55 PM UTC

Nathan Lambert

@natolambert

20 May 2025

Most important plot from IO today -- AI usage is skyrocketing. This is real.

124

1,584

148,896

Nathan Lambert · Dec 13, 2024 · 10:49 PM UTC

Nathan Lambert

@natolambert

13 Dec 2024

ILYA: "PRETRAINING IS DONE. WE ARE NOW IN THE POST TRAINING ERA."

1,512

199,422

Nathan Lambert · Sep 29, 2025 · 5:13 PM UTC

Nathan Lambert

@natolambert

29 Sep 2025

This is a way bigger deal than the Claude release.

OpenAI

@OpenAI

29 Sep 2025

ChatGPT already helps millions of people find what to buy. Now it can help them buy it too. We’re introducing Instant Checkout in ChatGPT with @Etsy and @Shopify, and open-sourcing the Agentic Commerce Protocol that powers it, built with @Stripe, so more merchants and developers can integrate agentic checkout.

1,479

309,260

Nathan Lambert · Apr 17, 2025 · 7:57 PM UTC

Nathan Lambert

@natolambert

17 Apr 2025

Props to Google for including O4-mini in their Flash 2.5 release. A model released *yesterday*, while some companies only compare to their own models. Looking good gemini.

1,399

73,474

Nathan Lambert · Dec 20, 2024 · 11:44 PM UTC

Nathan Lambert

@natolambert

20 Dec 2024

o3 does not use different training nor inference methods than o1 (at least in pro mode). No special "search". OpenAI just found a hill and very quickly started hillclimbing it. Excited to build an open-source one and prove this to you in 2025. interconnects.ai/p/openais-o…

o3: The grand finale of AI in 2024

A step change as influential as the release of GPT-4. Reasoning language models are the current big thing.

interconnects.ai

100

1,343

127,213

Nathan Lambert · Mar 24, 2025 · 2:27 AM UTC

Nathan Lambert

@natolambert

24 Mar 2025

Okay okay, spent my weekend gooning around learning GRPO math. Here's some takes. Essentially, this is me yapping through a recap of smaller details on how GRPO is implemented, what Dr. GRPO changes, why, DAPO, connections to PPO, aggregating batches... Reading list below.

168

1,366

123,097

Nathan Lambert · Feb 1, 2025 · 10:05 PM UTC

Nathan Lambert

@natolambert

1 Feb 2025

Since everyone wants to learn RL for language models now post DeepSeek, reminder that I've been working on this book quietly in the background for months. Policy gradient chapter is coming together. Plugging away at the book every day now. rlhfbook dot com

163

1,353

108,328

Nathan Lambert · Oct 16, 2025 · 1:59 PM UTC

Nathan Lambert

@natolambert

16 Oct 2025

The first fantastic paper on scaling RL with LLMs just dropped. I strongly recommend taking a look and will be sharing more thoughts on the blog soon. The Art of Scaling Reinforcement Learning Compute for LLMs Khatri & Madaan et al.

187

1,232

91,695

Nathan Lambert · Nov 22, 2023 · 11:57 PM UTC

Nathan Lambert

@natolambert

22 Nov 2023

TIN HAT TIME on what OpenAI is cooking with Q* RLHF

To start, the hilarious quotes from the @Reuters article: "long-time executive Mira Murati told employees on Wednesday that a letter about the AI breakthrough called Q* (pronounced Q-Star), precipitated the board's actions." + "Given vast computing resources, the new model was able to solve certain mathematical problems... Though only performing math on the level of grade-school students, acing such tests made researchers very optimistic about Q*’s future success"

Now, buckle up: OpenAI's new technology is Q* (Q-star), two prominent things: Q learning (RL algorithm) and A Star (search algorithm)

1. Q learning makes sense, the first notable RL algorithm with many variants used today, the tokens / words are states and some response is actions.
2. A star is a graph search algorithm known for saving results in memory as it goes. The post says "Given vast computing resources, the new model was able to solve certain mathematical problems" --> needs to store A TON of data in the new RLHF training.

Search is important for making multi-turn optimization in training work??? Essentially, I guess applying the A* formula over Q-values for multi-turn reasoning.

Why may this work really well and be hard?
* Optimizing over multiple turns means more model forward passes in memory + more gradients
* need this type of thing to solve hard math problems
* really, it's prolly closer to RLAIF
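For readers unfamiliar with the Q-learning half of the speculation above, here is a minimal sketch of the standard tabular Q-learning update. It illustrates the textbook algorithm only and makes no claim about what OpenAI built. The state names, action names, and hyperparameters are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q(state, action)."""
    # Bootstrapped target: immediate reward plus discounted best next-state value.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical toy problem: two states, two actions, all Q-values start at 0.
q = defaultdict(float)
actions = ["left", "right"]
first = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
second = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
print(first, second)
```

Each update pulls the estimate halfway (alpha = 0.5) toward the target, so repeated identical transitions converge geometrically.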

141

1,154

517,490

Nathan Lambert · Apr 16, 2025 · 7:01 PM UTC

Nathan Lambert

@natolambert

16 Apr 2025

First draft online version of The RLHF Book is DONE. Recently I've been creating the advanced discussion chapters on everything from Constitutional AI to evaluation and character training, but I also sneak in consistent improvements to the RL specific chapter. rlhfbook.com/

RLHF has a long future ahead of it and this will do a lot to make it more accessible to the next generation.

What's next: Getting a physical copy in your hands (may not be exactly 1to1, we'll see) and minor fixes at a slower cadence (thanks to many github contributors, some of you will get a copy from me).

Here are all the chapters.
1. Introduction: Overview of RLHF and what this book provides.
2. Seminal (Recent) Works: Key models and papers in the history of RLHF techniques.
3. Definitions: Mathematical definitions for RL, language modeling, and other ML techniques leveraged in this book.
4. RLHF Training Overview: How the training objective for RLHF is designed and basics of understanding it.
5. What are preferences?: Why human preference data is needed to fuel and understand RLHF.
6. Preference Data: How preference data is collected for RLHF.
7. Reward Modeling: Training reward models from preference data that act as an optimization target for RL training (or for use in data filtering).
8. Regularization: Tools to constrain these optimization tools to effective regions of the parameter space.
9. Instruction Tuning: Adapting language models to the question-answer format.
10. Rejection Sampling: A basic technique for using a reward model with instruction tuning to align models.
11. Policy Gradients: The core RL techniques used to optimize reward models (and other signals) throughout RLHF.
12. Direct Alignment Algorithms: Algorithms that optimize the RLHF objective directly from pairwise preference data rather than learning a reward model first.
13. Constitutional AI and AI Feedback: How AI feedback data and specific models designed to simulate human preference ratings work.
14. Reasoning and Reinforcement Finetuning: The role of new RL training methods for inference-time scaling with respect to post-training and RLHF.
15. Synthetic Data: The shift away from human to synthetic data and how distilling from other models is used.
16. Evaluation: The ever-evolving role of evaluation (and prompting) in language models.
17. Over-optimization: Qualitative observations of why RLHF goes wrong and why over-optimization is inevitable with a soft optimization target in reward models.
18. Style and Information: How RLHF is often underestimated in its role in improving the user experience of models due to the crucial role that style plays in information sharing.
19. Product, UX, Character: How RLHF is shifting in its applicability as major AI laboratories use it to subtly match their models to their products.

207

1,201

90,674

Nathan Lambert · Jul 16, 2025 · 11:53 PM UTC

Nathan Lambert

@natolambert

16 Jul 2025

It is a major policy failure that the US cannot accommodate top AI conferences due to visa issues.

153

1,196

328,255

Nathan Lambert · Aug 8, 2025 · 1:30 AM UTC

Nathan Lambert

@natolambert

8 Aug 2025

I'm going to miss o3

1,150

106,618

Nathan Lambert · Apr 5, 2025 · 10:30 PM UTC

Nathan Lambert

@natolambert

5 Apr 2025

It’s very common for leadership at top labs to be in the know of other labs’ release schedules, so the simplest explanation to Meta releasing today is that next week is going to be bonkers, or at least plausibly outshine llama 4 (and they wanted to release last week, but other news was too big).

1,169

343,273

Nathan Lambert · Jun 11, 2025 · 3:13 AM UTC

Nathan Lambert

@natolambert

11 Jun 2025

Major reasoning models so far with technical reports (focused on those w RL):
2025-01-22 - DeepSeek R1 - arxiv.org/abs/2501.12948
2025-01-22 - Kimi 1.5 - arxiv.org/abs/2501.12599
2025-03-31 - Open-Reasoner-Zero - arxiv.org/abs/2503.24290
2025-04-10 - Seed 1.5-Thinking - arxiv.org/abs/2504.13914
2025-04-30 - Phi-4 Reasoning - arxiv.org/abs/2504.21318
2025-05-02 - Llama-Nemotron - arxiv.org/abs/2505.00949
2025-05-14 - Qwen 3 - arxiv.org/abs/2505.09388
2025-05-28 - Skywork Open Reasoner 1 - arxiv.org/abs/2505.22312
2025-06-04 - Xiaomi MiMo - arxiv.org/abs/2505.07608
2025-06-10 - Magistral - mistral.ai/static/research/m…
Did I miss any?

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via...

General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting,...

arxiv.org

185

1,153

83,513

Nathan Lambert · Jan 20, 2025 · 4:38 PM UTC

Nathan Lambert

@natolambert

20 Jan 2025

For those trying to understand DeepSeek's Group Relative Policy Optimization (GRPO): GRPO is just PPO without a value function using Monte Carlo estimates of the advantage. So, study why PPO exists (lots of docs / writing on that) and understand that value functions are tricky with LLMs. Left ppo, right grpo
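The "no value function, Monte Carlo advantage" point can be sketched in a few lines. Instead of subtracting a learned value estimate, each sampled completion's reward is compared with the other completions for the same prompt. This is a minimal sketch of the commonly described group-normalized form. The function name and reward values are hypothetical, and details such as whether to divide by the standard deviation vary across implementations (Dr. GRPO, for example, drops that division).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Advantage of each completion relative to its own sampling group.

    `rewards` holds one scalar reward per completion sampled for the SAME prompt.
    The group mean plays the role of the baseline that a value function
    would otherwise provide in PPO.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an implementation choice, not a fixed rule
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of 4 completions with binary (verifiable) rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct completions get a positive advantage and incorrect ones a negative advantage, and the advantages in a group sum to roughly zero. If every completion in a group receives the same reward, all advantages are zero and that prompt contributes no gradient.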

118

1,095

77,576

Nathan Lambert · May 27, 2025 · 4:58 PM UTC

Nathan Lambert

@natolambert

27 May 2025

immortalizing this moment forever when RL is so easy that you can just use random rewards and your benchmarks still go up smh

1,082

95,609

Nathan Lambert · Jun 18, 2025 · 1:27 PM UTC

Nathan Lambert

@natolambert

18 Jun 2025

Here's a recent talk I gave recapping the last 6-12 months of AI progress, why getting perfect models is hard, how labs are likely approaching the next phase of training (for agents), and other interesting tidbits across the reasoning landscape.

Topics:
00:00 Introduction & the state of reasoning
05:50 Hillclimbing imperfect evals
09:18 Technical bottlenecks
13:02 Sycophancy
18:08 The Goldilocks Zone
19:28 What comes next? (hint, planning)
26:40 Q&A

YouTube etc in replies. Thanks @corbtt and @OpenPipeAI for hosting me.

140

1,072

89,524

Nathan Lambert · Aug 17, 2025 · 3:49 PM UTC

Nathan Lambert

@natolambert

17 Aug 2025

A tier list of China's top 19 open model builders. Who did we miss?

At the frontier
* DeepSeek
* Qwen

Close competitors
* Moonshot AI (Kimi)
* Zhipu / Z AI

Noteworthy
* StepFun
* Tencent (Hunyuan)
* RedNote (Xiaohongshu)
* MiniMax
* OpenGVLab / InternLM
* Skywork

On the rise
* ByteDance Seed
* OpenBMB
* Xiaomi (MiMo)
* Baidu (ERNIE)

Honorable Mentions
* Multimodal Art Projection
* Alibaba International Digital Commerce Group
* Beijing Academy of Artificial Intelligence (BAAI)
* inclusionAI
* Pangu (Huawei)

I learned a lot from these. We have so much more we need to do to understand how their AI ecosystem works.

Interconnects

@interconnectsai

17 Aug 2025

China's Top 19 Open Model Labs We ranked all the organizations in China releasing open models, from the top of DeepSeek to small, newer academic labs making waves with tech reports and niche models. interconnects.ai/p/chinas-to…

159

1,027

558,483

Nathan Lambert · Jan 26, 2025 · 8:22 PM UTC

Nathan Lambert

@natolambert

26 Jan 2025

DeepSeek app sitting at number 1 overall in the US iPhone App Store is not on my bingo card and is the biggest sign yet that the ChatGPT moat can maybe be cracked.

936

157,511

Nathan Lambert · Oct 1, 2024 · 4:06 PM UTC

Nathan Lambert

@natolambert

1 Oct 2024

I'm flying so I'm working on my mini RLHF book today :) rlhfbook dot com

936

87,845

Nathan Lambert · Mar 13, 2025 · 6:14 PM UTC

Nathan Lambert

@natolambert

13 Mar 2025

A very exciting day for open-source AI! We're releasing our biggest open source model yet -- OLMo 2 32B -- and it beats the latest GPT 3.5, GPT 4o mini, and leading open weight models like Qwen and Mistral. As usual, all data, weights, code, etc. are available. For a long time, people have asked for a truly open-source version of ChatGPT and we finally have it. This is multiple years coming into efforts following the release of ChatGPT and builds on the efforts of so many at both Ai2 and in the broader open AI ecosystem. With just a bit more progress everyone can pretrain, midtrain, post-train, whatever they need to get a GPT 4 class model in their class. This is a major shift in how open-source AI can grow into real applications. Oh yeah, it's also Apache 2 as always, so happy to make things that are simple to use. I did NOT expect to be undercutting OpenAI's offerings this year but here we are :D

146

933

99,666

Nathan Lambert · Aug 17, 2025 · 8:35 PM UTC

Nathan Lambert

@natolambert

17 Aug 2025

Since everyone loved my Chinese lab ranking based on open model releases, here’s the equivalent for the Western companies based on released open models aligned in tiers to comparable Chinese companies. We have no one comparable in the top two tiers.

Nathan Lambert

@natolambert

17 Aug 2025

882

238,363

Nathan Lambert · May 13, 2024 · 5:36 PM UTC

Nathan Lambert

@natolambert

13 May 2024

did OpenAI tell me to downgrade to a free plan today?

864

284,297

Nathan Lambert · Jul 7, 2022 · 5:51 PM UTC

Nathan Lambert

@natolambert

7 Jul 2022

I put together all the interview timelines, reflections, and advice from my job search. I focused on RL jobs, varying from applied ML engineer to research scientist! Lots of people want to get a PhD and this is what getting a job after looks like! natolambert.com/writing/ai-p…

Job Hunt as a PhD in AI / ML / RL: How it Actually Happens

The full breakdown of what a job search in AI with a new Ph.D. looks like.

natolambert.com

121

884

Nathan Lambert · Jan 20, 2025 · 3:01 PM UTC

Nathan Lambert

@natolambert

20 Jan 2025

R1 making me feel very heard. Will read more thoroughly later. Laughs in continued shock that RL working like this.

895

86,467

Nathan Lambert · Apr 6, 2025 · 3:25 AM UTC

Nathan Lambert

@natolambert

6 Apr 2025

bruh what's on monday

Nathan Lambert

@natolambert

5 Apr 2025

880

190,641

Nathan Lambert · Jan 10, 2025 · 3:39 AM UTC

Nathan Lambert

@natolambert

10 Jan 2025

New export controls incoming, Bloomberg reporting: "But if an AI company wants to fine-tune a general-purpose open weight model for a specific purpose, and that process uses a significant amount of computing power, they would need to apply for a US government license to do so in a Tier 2 country." Controlling who in the entire world can finetune on what seems like a losing and generally bad proposition.

105

104

819

253,886

Nathan Lambert · Oct 17, 2025 · 8:59 PM UTC

Nathan Lambert

@natolambert

17 Oct 2025

New toy!

802

54,633

Nathan Lambert · Jul 19, 2025 · 1:23 PM UTC

Nathan Lambert

@natolambert

19 Jul 2025

Not falling for OpenAI’s hype-vague posting about the new IMO gold model with “general purpose RL” and whatever else “breakthrough.” Google also got IMO gold (harder than mastering AIME), but remember, simple ideas scale best.

860

117,054

Nathan Lambert · Apr 6, 2025 · 3:34 PM UTC

Nathan Lambert

@natolambert

6 Apr 2025

> be me > be zuck > need llama 4 to land > send a model/prompt to LMSYS to get a top1 score, cringe be damned > release a different model as "open source" > think people won't find out even with weights

802

136,131

Nathan Lambert · Feb 28, 2025 · 7:57 PM UTC

Nathan Lambert

@natolambert

28 Feb 2025

oh no

803

56,893

Nathan Lambert · Sep 7, 2025 · 2:39 PM UTC

Nathan Lambert

@natolambert

7 Sep 2025

I'm going to sound like a shill but I describe paying for better AIs right now as a way that you can "pay to win" in your career. Normally dynamics like this are restricted to video games.

finbarr

@finbarrtimbers

7 Sep 2025

I am very impressed by GPT-5 Pro. Had a bug in a script. Claude Code w/ Opus couldn’t find it after repeated attempts. dumped the problem and all relevant code into GPT-5 Pro and it found it first shot. Very impressive.

812

111,342

Nathan Lambert · Oct 29, 2025 · 5:19 PM UTC

Nathan Lambert

@natolambert

29 Oct 2025

I'd put good money on this being a high-impact finetune of one of the large, Chinese MoE models. I'm very excited to see more companies able to train models that suit their needs. Bodes very well for the ecosystem that specific data is stronger than a bigger, general model.

Cursor

@cursor_ai

29 Oct 2025

Introducing Cursor 2.0. Our first coding model and the best way to code with agents.

788

80,592

Nathan Lambert · Sep 12, 2025 · 2:39 AM UTC

Nathan Lambert

@natolambert

12 Sep 2025

They're showcasing the RL to prod pipeline as a good form of continual learning. Totally shocking if you had told me this just 12 months ago.

Cursor

@cursor_ai

11 Sep 2025

We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.

702

97,816

Nathan Lambert · Mar 20, 2025 · 5:16 PM UTC

Nathan Lambert

@natolambert

20 Mar 2025

FINALLY!!!

778

47,022

Nathan Lambert · Jul 10, 2025 · 1:25 PM UTC

Nathan Lambert

@natolambert

10 Jul 2025

Grok 4 coming soon after Llama 4 with a completely different trajectory should help people finally take in how important culture is to progress in technology generally and AI specifically. I don't agree with many of xAI's values but give full props to hard work.

786

56,163

Nathan Lambert · Nov 21, 2024 · 5:01 PM UTC

Nathan Lambert

@natolambert

21 Nov 2024

I've spent the last two years scouring all available resources on RLHF specifically and post training broadly. Today, with the help of a totally cracked team, we bring you the fruits of that labor — Tülu 3, an entirely open frontier model post training recipe. We beat Llama 3.1 Instruct at 8B and 70B on the tasks we focused on. So many things to share: New SFT data, recipes for scaling preference fine-tuning, a new RL optimization stage, extensive evaluation details, etc.

139

789

118,324

Nathan Lambert · Oct 4, 2023 · 3:57 PM UTC

Nathan Lambert

@natolambert

4 Oct 2023

Life update: Today's my last day at @huggingface after 1.5 years. It's been an awesome ride, I'm moving within the open RLHF space to have a slightly more research oriented role, continue to figure out what makes RLHF tick, share everything, and make some in-person friends.

Some key lessons:
* If you don't promote and communicate your work, no one else will.
* It's harder to get visibility for your work than to do good work. People don't like this reality.
* Open-source moves very fast, so it takes clever leadership and guidance to maximize collective effort.
* Open-source ML is in its very early days. We're figuring out what it means to do open-source ML. OSS will forever be changed.
* Open-source succeeds in multiplicity. Just because someone is trying something similar does NOT mean you should stop.
* RLHF is very, very under-explored. Leaders in the space have just tried a few more things. Please join us.

Thanks to everyone who helped this be an awesome ride. I'm sure I'll still collaborate on many of the same projects.

774

151,435

Nathan Lambert · Mar 27, 2025 · 6:41 PM UTC

Nathan Lambert

@natolambert

27 Mar 2025

Gemini 2.5 Pro long context goes hard. First model to take an entire paper latex (15k tokens), tell it to ignore comments, find all typos. Does it perfectly. Even o1pro didn't even feel coherent on that!

766

57,885

Nathan Lambert · Aug 9, 2025 · 7:43 PM UTC

Nathan Lambert

@natolambert

9 Aug 2025

In some ways the GPT-5 release feels like the Llama 4 release. They just waited too long to get it out. Feels like some weirdness may be happening behind the scenes. Messy release in terms of presentation & technical details. Blip or trend for OpenAI?

754

176,506

Nathan Lambert · Sep 28, 2025 · 3:32 PM UTC

Nathan Lambert

@natolambert

28 Sep 2025

RL research is becoming like pretraining/modeling. This is a huge vibe shift. Most research published on RL isn't using enough compute to make many of these decisions matter as much. This is slowly shifting.

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

28 Sep 2025

practical, modern GRPO tweaks as described in Meta's Code World Models paper

733

92,719

Nathan Lambert · Aug 15, 2025 · 10:08 PM UTC

Nathan Lambert

@natolambert

15 Aug 2025

GPT 5o has arrived.

OpenAI

@OpenAI

15 Aug 2025

We’re making GPT-5 warmer and friendlier based on feedback that it felt too formal before. Changes are subtle, but ChatGPT should feel more approachable now. You'll notice small, genuine touches like “Good question” or “Great start,” not flattery. Internal tests show no rise in sycophancy compared to the previous GPT-5 personality. Changes may take up to a day to roll out, more updates soon.

739

70,397

Nathan Lambert · Jul 27, 2025 · 1:44 PM UTC

Nathan Lambert

@natolambert

27 Jul 2025

I bet pretty soon a Chinese research org drops an LLM scaling laws for RL paper. Closed frontier labs have definitely done this and won't share it, academics haven't mastered the data + infra tweaks yet.

742

67,569

Nathan Lambert · Jun 20, 2025 · 1:50 PM UTC

Nathan Lambert

@natolambert

20 Jun 2025

Mastering the GRPO math, its implementation details, and other policy gradient algorithms has made it way easier to understand new research on reasoning algorithms. Read the policy gradients chapter of the rlhf book. Studying pays off, I started writing before R1 was released.

747

50,364

Nathan Lambert · Nov 2, 2025 · 6:58 PM UTC

Nathan Lambert

@natolambert

2 Nov 2025

Claude 4.1 Opus > Claude 4.5 Sonnet

141

721

119,528

Nathan Lambert · Apr 5, 2023 · 11:03 PM UTC

Nathan Lambert

@natolambert

5 Apr 2023

Almost everyone I know working in AI these days feels one step away from total burnout. I took the time to take you behind the curtain and show what people working on state-of-the-art AI are struggling with: robotic.substack.com/p/behin…

132

688

338,055

Nathan Lambert · Apr 18, 2025 · 4:07 PM UTC

Nathan Lambert

@natolambert

18 Apr 2025

rlhfbook also available on arxiv for SEO 😀 happy friday arxiv.org/abs/2504.12501

Reinforcement Learning from Human Feedback

Reinforcement learning from human feedback (RLHF) has become an important technical and storytelling tool to deploy the latest machine learning systems. In this book, we hope to give a gentle...

arxiv.org

694

38,929

Nathan Lambert · Nov 27, 2024 · 7:51 PM UTC

Nathan Lambert

@natolambert

27 Nov 2024

Qwen's first o1 inspired model. Blog: buff.ly/3ZbWCpz Model weights: buff.ly/4i7AJjA

103

663

76,389

Nathan Lambert · Sep 29, 2024 · 5:44 PM UTC

Nathan Lambert

@natolambert

29 Sep 2024

NotebookLM and OpenAI Advanced Voice Mode feel like we have entirely new ways we need to learn how to work with AI/LLMs again. Normally, when this happens we unlock a bunch of value.

668

54,440

Nathan Lambert · Feb 15, 2025 · 3:55 PM UTC

Nathan Lambert

@natolambert

15 Feb 2025

Glad to see DeepSeek team members writing more papers than just their fun tech reports :) (maybe I just missed them in the past)

672

167,649

Nathan Lambert · Jul 30, 2025 · 1:36 PM UTC

Nathan Lambert

@natolambert

30 Jul 2025

New Zuck post, what a difference a few years makes: Today: "We'll need to be rigorous about mitigating these risks and careful about what we choose to open source." 2024: "Meta is committed to open source AI... and therefore a platform that will be around for the long term."

678

188,220

Nathan Lambert · Nov 24, 2023 · 3:15 PM UTC

Nathan Lambert

@natolambert

24 Nov 2023

The Q* hypothesis I can stand behind (from literature): 1. Tree of Thoughts reasoning: something to search over 2. Process reward models: rank all the steps of reasoning 3. GPT4 to score all vertices of the tree (RLAIF) 4. Q-learning to optimize 🚀 interconnects.ai/p/q-star

The Q* hypothesis: Tree-of-thoughts reasoning, process reward models, and supercharging synthetic...

Emergency special: The information we need to understand what Q* is was right in front of us, but the memes are more fun than reality.

interconnects.ai

122

650

358,592

Nathan Lambert · Oct 30, 2023 · 10:55 PM UTC

Nathan Lambert

@natolambert

30 Oct 2023

Excited! I've started as a Research Scientist at @allen_ai (on @ai2_allennlp) working on all things "RLHF research" - it encompasses a lot. Let's show the world that openness can foster broadly beneficial AI. I'm excited to work with everyone who wants to make that happen.

670

151,912

Nathan Lambert · Jan 20, 2025 · 3:55 PM UTC

Nathan Lambert

@natolambert

20 Jan 2025

hahahahah there were actually two technical reports for RL reasoning models today, kimi 1.5 also has good stuff on reward shaping + RL infra

680

54,998

Nathan Lambert · Apr 7, 2025 · 12:43 AM UTC

Nathan Lambert

@natolambert

7 Apr 2025

Models that are actually really really good. Way better than what we were using in 2024: Gemini 2.5 Pro, o1 pro

Jack Morris

@jxmnop

6 Apr 2025

Recent AI model progress feels mostly like bullshit (2025)

664

68,272

Nathan Lambert · Jun 16, 2021 · 12:46 PM UTC

Nathan Lambert

@natolambert

16 Jun 2021

In a world before derivatives, Fermat solved optimization by assuming two converging points are not equal (so they don’t divide by 0 in algebra) but are approximately 0 so can be substituted. 🤯 The steps people took to discover modern math is crazy. Now we have autodiff.

622

Nathan Lambert · Jun 10, 2025 · 4:46 PM UTC

Nathan Lambert

@natolambert

10 Jun 2025

Zuck just buying Scale to cut off the competition from data obviously

653

55,911

Nathan Lambert · Oct 1, 2024 · 5:44 PM UTC

Nathan Lambert

@natolambert

1 Oct 2024

going to go on the record and say this is a bad idea

Aidan McLaughlin

@aidan_mclau

1 Oct 2024

4o vision fine-tuning enables autonomous driving

628

83,393

Nathan Lambert · Jan 30, 2025 · 2:08 PM UTC

Nathan Lambert

@natolambert

30 Jan 2025

Very happy to show that we can do RL finetuning on 405B models with open-source code, beat Llama 405B instruct with their base model, and beat DeepSeek V3 too. Enjoy building off this team's hard work. Here's Tulu 3 405B. A holiday present from @hamishivi, @vwxyzjn and team.

Ai2

@allen_ai

30 Jan 2025

Here is Tülu 3 405B 🐫 our open-source post-training model that surpasses the performance of DeepSeek-V3! The last member of the Tülu 3 family demonstrates that our recipe, which includes Reinforcement Learning from Verifiable Rewards (RLVR) scales to 405B - with performance on par with GPT-4o, and surpassing prior open-weight post-trained models of the same size including Llama 3.1

659

74,015

Nathan Lambert · Mar 31, 2025 · 4:05 PM UTC

Nathan Lambert

@natolambert

31 Mar 2025

I hear people are pretty into GRPO and RL these days, so I wrote up a pretty comprehensive research survey of recent papers I liked. Kimi 1.5, OpenReasonerZero, DAPO and Dr. GRPO. + discussion on if GRPO is special and further reading. interconnects.ai/p/papers-im…

Recent reasoning research: GRPO tweaks, base model RL, and data curation

The papers I endorse as worth reading among a cresting wave of reasoning research.

interconnects.ai

665

76,135

Nathan Lambert · Aug 10, 2025 · 2:31 PM UTC

Nathan Lambert

@natolambert

10 Aug 2025

Anthropic is the only leading AI lab to not release a reasonable open weights model. Is notable that pretty much everyone has a touchpoint here now.

622

64,332

Nathan Lambert · Mar 21, 2025 · 3:19 PM UTC

Nathan Lambert

@natolambert

21 Mar 2025

Qwen 3 coming imminently! Meta's smart to have locked in LlamaCon, else Llama 4 maybe would've been delayed again 🤭. Really I'm hype for Llama 4, bring it asap.

638

100,574

Nathan Lambert · Jan 21, 2025 · 5:20 PM UTC

Nathan Lambert

@natolambert

21 Jan 2025

DeepSeek makes it quite clear how they trained R1. None of these steps alone are super surprising, but how to sequence and blend them together definitely is.

Nathan Lambert

@natolambert

21 Jan 2025

The DeepSeek R1 recipe, what questions we need to answer to train an o1 replication ourselves at home, and what it means for the near future of AI. interconnects.ai/p/deepseek-…

615

64,973

Nathan Lambert · Sep 29, 2025 · 7:03 PM UTC

Nathan Lambert

@natolambert

29 Sep 2025

Thinking Machines proving you can be worth $10B with your one product being great content.

Thinking Machines

@thinkymachines

29 Sep 2025

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lor…

648

83,621

Nathan Lambert · Feb 3, 2025 · 12:45 AM UTC

Nathan Lambert

@natolambert

3 Feb 2025

Stoked to get to talk to @lexfridman + my homie @dylan522p for 5+ hours to try and get to the bottom of what is actually happening in AI right now. DeepSeek R1 & V3, China v US, open vs closed, decreasing hype, datacenters, everything in between... 🚀 what a fun whirlwind week

Lex Fridman

@lexfridman

3 Feb 2025

Here's my 5-hour conversation with @dylan522p and @natolambert on DeepSeek, China, OpenAI, NVIDIA, xAI, Google, Anthropic, Meta, Microsoft, TSMC, Stargate, megacluster buildouts, RL, reasoning, and a lot of other topics at the cutting edge of AI. This was a mind-blowing, super-technical, and fun conversation. Yes, we discuss r1 and o3-mini, but more importantly we look into the future of technology, geopolitics, and humanity in a world that stands on the precipice of a global AI revolution. The first 4 hours are here on X (4 hours is current limit), and the full 5 hours are up everywhere else. Links in comment. Timestamps: 0:00 - Introduction 3:33 - DeepSeek-R1 and DeepSeek-V3 25:07 - Low cost of training 51:25 - DeepSeek compute cluster 58:57 - Export controls on GPUs to China 1:09:16 - AGI timeline 1:18:41 - China's manufacturing capacity 1:26:36 - Cold war with China 1:31:05 - TSMC and Taiwan 1:54:44 - Best GPUs for AI 2:09:36 - Why DeepSeek is so cheap 2:22:55 - Espionage 2:31:57 - Censorship 2:44:52 - Andrej Karpathy and magic of RL 2:55:23 - OpenAI o3-mini vs DeepSeek r1 3:14:31 - NVIDIA 3:18:58 - GPU smuggling 3:25:36 - DeepSeek training on OpenAI data 3:36:04 - AI megaclusters 4:11:26 - Who wins the race to AGI? 4:21:39 - AI agents 4:30:21 - Programming and AI 4:37:49 - Open source 4:47:01 - Stargate 4:54:30 - Future of AI

641

89,552

Nathan Lambert · Dec 20, 2024 · 6:08 PM UTC

Nathan Lambert

@natolambert

20 Dec 2024

OpenAI skips o2, previews o3 scores, and they're truly crazy. Huge progress on the few benchmarks we think are truly hard today. Including ARC AGI. Rip to people who say any of "progress is done," "scale is done," or "llms cant reason" 2024 was awesome. I love my job.

623

86,065

Nathan Lambert · Oct 31, 2025 · 2:13 PM UTC

Nathan Lambert

@natolambert

31 Oct 2025

I'm convinced to try it asap, we should all try fp16, look at this plot man. FP16 is like perfect in error reduction. "This is precisely why switching to FP16 provides a fundamental solution. With its 10 mantissa bits, FP16 offers 8 times more precision (2^10 values vs. 2^7 values) than BF16. This higher fidelity means that the outputs of the training and inference engines are much more likely to be numerically identical. The increased precision creates a buffer that absorbs the minor implementation differences between the two engines, preventing rounding errors from accumulating and causing a policy divergence. For RL fine-tuning, the dynamic range of the model’s weights and activations has already been established during pre-training. Therefore, the extreme range of BF16 is less critical, while the precision it sacrifices becomes a dominant drawback. By reverting to FP16, we trade the unnecessary range of BF16 for the critical precision, effectively closing the gap between training and inference without any complex algorithmic or engineering workaround."
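The precision claim in the quoted passage (10 mantissa bits vs. 7, so 2^10 vs. 2^7 values per binade) can be checked with a small sketch. This is an illustration only, not code from the quoted paper: the helper names `round_to_fp16` and `round_to_bf16` are hypothetical, BF16 is emulated by rounding the float32 bit pattern to its top 16 bits (round-to-nearest-even, with NaN and overflow handling omitted), and FP16 uses the standard library's half-precision `struct` format.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value (10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_bf16(x: float) -> float:
    """Emulate bfloat16 (7 mantissa bits) by keeping the top 16 bits of float32.

    Assumption: round-to-nearest-even on the dropped bits; NaN/inf and
    overflow cases are ignored in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Spacing between adjacent representable values in [1, 2).
FP16_EPS = 2.0 ** -10
BF16_EPS = 2.0 ** -7

x = 1.001
fp16_err = abs(round_to_fp16(x) - x)
bf16_err = abs(round_to_bf16(x) - x)

print(f"spacing ratio BF16/FP16: {BF16_EPS / FP16_EPS}")
print(f"fp16({x}) = {round_to_fp16(x)}, abs error = {fp16_err:.2e}")
print(f"bf16({x}) = {round_to_bf16(x)}, abs error = {bf16_err:.2e}")
```

For values near 1, BF16 rounds 1.001 all the way back to 1.0 while FP16 keeps the nearest 1/1024 step, which is the kind of gap the quote says can separate training and inference engine outputs. The flip side, also noted in the quote, is that FP16 has far less dynamic range than BF16.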

646

144,104

Nathan Lambert · Oct 3, 2025 · 2:38 PM UTC

Nathan Lambert

@natolambert

3 Oct 2025

Who's using GPT-OSS and for what? Was it cheaper, better, faster than other open models? Or just not from China? Download numbers are actually very strong on HuggingFace for first model releases.

123

590

125,360

Nathan Lambert · Jul 4, 2025 · 2:05 PM UTC

Nathan Lambert

@natolambert

4 Jul 2025

My latest post: The American DeepSeek Project Build fully open models in the US in the next two years to enable a flourishing, global scientific AI ecosystem to balance China's surge in open-source and an alternative to building products on top of leading closed models.

627

145,083

Nathan Lambert · Mar 1, 2025 · 4:53 PM UTC

Nathan Lambert

@natolambert

1 Mar 2025

Tbh I’m happily using GPT-4.5. thanks OpenAI for not being too eval obsessed

619

92,542

Nathan Lambert · Mar 26, 2025 · 1:56 PM UTC

Nathan Lambert

@natolambert

26 Mar 2025

Gemini 2.5 is amazing -- a bigger jump than the recent releases of Claude 3.7, Grok 3, and GPT 4.5. Google needs to have the same drive for excellence across the product and cloud orgs They can reclaim the crown in AI if they commit to it

Interconnects

@interconnectsai

26 Mar 2025

Gemini 2.5 Pro and Google's second chance with AI Plus some coverage for the latest DeepSeek. interconnects.ai/p/gemini-25…

621

72,463

Nathan Lambert · Apr 19, 2025 · 3:25 AM UTC

Nathan Lambert

@natolambert

19 Apr 2025

o3's search abilities are incredible. Can find extremely niche information without a ton of additional context. Just what I would say to a colleague.

617

144,592

Nathan Lambert · Apr 16, 2025 · 5:12 PM UTC

Nathan Lambert

@natolambert

16 Apr 2025

TLDR on o3 and o4-mini: incremental. pace of progress still really high, no dramatic changes in performance or shocking new features. Pressure to ship fast across the industry has never been higher.

617

46,616

Nathan Lambert · Sep 6, 2025 · 5:43 PM UTC

Nathan Lambert

@natolambert

6 Sep 2025

I'm using GPT 5 Pro a lot. Mostly for research in my case, but I am bullish it or Gemini Deep Think are the smartest models available publicly today. You should use one or both of them.

Andrej Karpathy

@karpathy

5 Sep 2025

I think congrats again to OpenAI for cooking with GPT-5 Pro. This is the third time I've struggled on something complex/gnarly for an hour on and off with CC, then 5 Pro goes off for 10 minutes and comes back with code that works out of the box. I had CC read the 5 Pro version and it wrote up 2 paragraphs admiring it (very wholesome). If you're not giving it your hardest problems you're probably missing out.

481

75,804

Nathan Lambert · Aug 4, 2025 · 2:08 PM UTC

Nathan Lambert

@natolambert

4 Aug 2025

America needs to take open models more seriously. This summer the early lead in open model adoption of the US via Llama has been overtaken by Chinese models. With The American Truly Open Models (ATOM) Project we're looking to build support and express the urgency of this issue.

100

621

132,162

Nathan Lambert · Apr 12, 2024 · 3:33 PM UTC

Nathan Lambert

@natolambert

12 Apr 2024

Watch my @stanford CS 25 lecture next week, "aligning open language models," it'll be good, v excited

598

141,112

Nathan Lambert · Apr 21, 2025 · 4:13 PM UTC

Nathan Lambert

@natolambert

21 Apr 2025

"Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?" This isn't a new intuition, but a nice new set of results. The paper in question Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? has a lot of discussions underway on if Reinforcement Learning from Verifiable Rewards (RLVR) is actually improving the models we’re training. The core figures are attached. These are all using pass@k as the core metric. Pass@k is the metric that checks to see if the right answer exists in k completions. This is not how practical inference works, but is a good test to see if the model is “in distribution.” You should focus on the bottom three rows here which are in-distribution for the RL training data. Second, Qwen is known to be predisposed to learning reasoning so those base models may be stronger. I can’t say a lot more about the base models as that’s an open area of research — what are the right base models for reasoning? Some are surprised that the base model does so well, but really we’ve been saying for a while that RL training is increasing the probability of correct behaviors — elicitation. With this view, the results align totally with what RLVR should be doing. There are also some caveats on the work that make it have the usual academic grains of salt. Mostly, they only train on the MATH and GSM8K training sets. While this is great for controlled ablations, it’s not great for showing the fundamental limits of RL training. OpenAI and others have shown that scaling RL is a crucial aspect of it, and with only these narrow training sets that isn’t really possible. Second, the paper doesn’t have a ton of plots showing the training curves for their models. It’s safe to assume they’re decent because the results look reasonable, but the base model training is much more reliable than the RL training in another paper trying to make a point. 
The pass@1 results for RL are extremely promising and should reinforce that RLVR is working. That being said, if we had perfect verifiers — an oracle — we’d never need RLVR in the first place (or post-training really), and we could just use that instead of trying to make the model better. My very first post on inference models made the same point that random sampling with pass@k metrics is important as a baseline for inference scaling! This isn’t new. This is a nice reminder that there’s no free lunch. We should keep checking how this changes as we: Scale RL training to many more prompts, and Scale RL to bigger base models. A final caveat, which I think is minor. These results are all RL-Zero style, i.e. just on the base model with no warm start. DeepSeek and others stated that better performance comes from a warmup with on-policy SFT before RL. This’ll make the RL results above even stronger, where the base model results won’t change.
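A minimal sketch of the pass@k metric described above ("the right answer exists in k completions"). Assumption: this uses the unbiased estimator popularized by the HumanEval/Codex evaluation, where n completions are sampled per prompt and c of them are correct; the thread does not say which estimator the paper in question uses, and the function name `pass_at_k` is just for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k completions is correct.

    n: total completions sampled for the prompt
    c: number of those completions that are correct
    k: budget of completions being scored (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn completions are incorrect), drawing without replacement.
    return 1.0 - comb(n - c, k) / comb(n, k)


# One correct completion out of four samples.
print(pass_at_k(n=4, c=1, k=1))  # chance a single draw is the correct one
print(pass_at_k(n=4, c=1, k=2))  # chance at least one of two draws is correct
```

Averaging this quantity over prompts gives the benchmark number. Because the estimate rises with k whenever c > 0, a base model with a small but nonzero success rate can catch up to an RL-trained model at large k, which is consistent with the elicitation reading above.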

627

118,668

Nathan Lambert · Feb 3, 2025 · 12:17 AM UTC

Nathan Lambert

@natolambert

3 Feb 2025

RL keeps cooking 🤣🤡🫡 “Deep research was trained using end-to-end reinforcement learning on hard browsing and reasoning tasks across a range of domains.”

582

62,533

Nathan Lambert · Oct 5, 2025 · 9:06 PM UTC

Nathan Lambert

@natolambert

5 Oct 2025

I gave a talk today at The Curve on the state of open models. Here are the slides, recording soon. Topics include: Chinese ecosystem, reflections on DeepSeek, the demise of Llama, who will fill the U.S. market, what local models do, ATOM project & ai2, and more topics

595

91,422

Nathan Lambert · Jul 5, 2025 · 8:49 PM UTC

Nathan Lambert

@natolambert

5 Jul 2025

There are like 10-20 Chinese orgs shipping open models that I try and keep a somewhat close eye on and there are like 3-4 in the rest of the world 😳

581

72,983

Nathan Lambert · Mar 3, 2025 · 4:20 PM UTC

Nathan Lambert

@natolambert

3 Mar 2025

If you look at most of the models we've received from OpenAI, Anthropic, and Google in the last 18 months you'll hear a lot of "Most of the improvements were in the post-training phase." Here's a simple analogy for how so many gains can be made on mostly the same base model: The intuition I've been using to understand the potential of post-training is what I call the elicitation interpretation of post-training, where all we are doing is extracting and amplifying valuable behaviors in the base model. Consider F1, most of the teams show up to the beginning of the year with a new chassis and engine. Then, they spend all year on aerodynamics and systems changes (ofc a minor oversimplification), and can dramatically improve the performance of the car. The best F1 teams improve way more during a season than chassis to chassis. The same is true for post-training. The best post-training teams extract a ton of performance in a very short time frame. The set of techniques is everything after the end of most of pretraining. It includes "mid-training" like Annealing / high-quality end of pre-training, instruction tuning, RLVR, preference-tuning, etc. Then, when you look at models such as GPT4.5, you can see this as a way more dynamic and exciting base for OpenAI to build onto. We also know that bigger base models can absorb far more diverse changes than their smaller counterparts. This is to say -- scaling also allows post-training to move faster. Of course, to do this, you need the infrastructure to train the models. This is why all the biggest companies are still building gigantic clusters. Still, it is very important to understand how much craft there is to post-training and how these labs are grabbing so much available performance. Improvements have been easier than most people think -- fitting them all together in one model is harder, but we have a lot more to gain still.

594

95,879

Nathan Lambert · Jul 11, 2025 · 2:59 PM UTC

Nathan Lambert

@natolambert

11 Jul 2025

Modern licenses so funny. Amazing looking model. MIT-Modified. Marketing is king. "Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2" on the user interface of such product or service."

596

63,064

Nathan Lambert · Jul 18, 2023 · 4:04 PM UTC

Nathan Lambert

@natolambert

18 Jul 2023

Excited to share my analysis of the LLAMA2 model. In short, this model and paper are incredibly well done. Meta has stepped up the level for open-source and signaled another path for the future of LLMs. Influencers are now right with "equals chatgpt". interconnects.ai/p/llama-2-f…

Llama 2: an incredible open LLM

Meta is continuing to deliver high-quality research artifacts and not backing down from pressure against open source.

interconnects.ai

130

571

135,671

Nathan Lambert · Sep 14, 2025 · 5:01 PM UTC

Nathan Lambert

@natolambert

14 Sep 2025

Getting ready to invest more time into the RLHF book to prepare for print edition. What do you wish was clearer or had more coverage in it?

585

38,037

Nathan Lambert · Oct 25, 2025 · 2:34 PM UTC

Nathan Lambert

@natolambert

25 Oct 2025

A new essay on the crazy, all or nothing approach to work happening in AI today, the looming human costs, and the lack of a finish line. I wouldn't say it's okay, but I'm not sure how to fix it.

592

284,914

Nathan Lambert · Sep 16, 2025 · 7:16 PM UTC

Nathan Lambert

@natolambert

16 Sep 2025

So, Dylan, where does Google land on this chart?

578

92,216

Nathan Lambert · Aug 30, 2025 · 5:13 PM UTC

Nathan Lambert

@natolambert

30 Aug 2025

Got the essentials, not much more is needed.

255

14,222

Nathan Lambert · May 22, 2024 · 6:28 PM UTC

Nathan Lambert

@natolambert

22 May 2024

If you're a student wanting an exciting life, good comp, and impact on the real world: work on LLMs

Yann LeCun

@ylecun

22 May 2024

If you are a student interested in building the next generation of AI systems, don't work on LLMs

524

213,535