Joseph Suarez 🐡 · Apr 6, 2026 · 3:14 PM UTC

Joseph Suarez 🐡

@jsuarez

Apr 6

Releasing PufferLib 4.0: Train agents in seconds

100

1,198

203,812

Joseph Suarez 🐡 · Oct 16, 2025 · 4:11 PM UTC

Joseph Suarez 🐡

@jsuarez

16 Oct 2025

I don't like courses. Most were a waste of time. Yes, even at Stanford. If you're new to ML, take CS231N.

sankalp

@dejavucoder

15 Oct 2025

your honor i object, i dont know about harvard but stanford literally releases SOTA courses

141

2,910

248,937

Joseph Suarez 🐡 · Jul 11, 2025 · 3:24 PM UTC

Joseph Suarez 🐡

@jsuarez

11 Jul 2025

x.com/i/article/194157995594…

My Advice for Programming and ML

This article is a prequel to my opinionated reinforcement learning guide. It exists because people have asked me for it repeatedly. Start here if you a) do not know how to program b) don't know ML or

263

2,794

558,966

Joseph Suarez 🐡 · Nov 1, 2025 · 4:20 PM UTC

Joseph Suarez 🐡

@jsuarez

1 Nov 2025

Guillermo Rauch

@rauchg

30 Oct 2025

The world runs on TypeScript & JavaScript. Our bet is that AI engineering will follow suit. The growth in @aisdk downloads and adoption has been astonishing. When we wrote the Ship AI keynote it was at 3.4M weekly downloads. A couple weeks later, it’s now at 4.1M 😳 vercel.fyi/ai-npm

2,228

169,715

Joseph Suarez 🐡 · Oct 12, 2025 · 12:05 PM UTC

Joseph Suarez 🐡

@jsuarez

12 Oct 2025

You have to have been in ML for over a decade to really understand how bad this was. From 2015-2018ish, any time anything went wrong in an experiment, at least someone would say "local minimum"

kache

@yacineMTB

12 Oct 2025

sent this to my dad this morning. It is really incredible how much you can learn today if you want to learn. There are no barriers except for yourself

2,148

247,701

Joseph Suarez 🐡 · Oct 19, 2025 · 1:56 AM UTC

Joseph Suarez 🐡

@jsuarez

19 Oct 2025

I don't care if Karpathy is down on RL. He, Carmack, Ilya, and Alec Radford could all show up in person to tell me I'm wasting my time and I'd still keep doing it. Because damn it this tech is too cool not to exist.

102

1,911

300,707

Joseph Suarez 🐡 · Jul 19, 2025 · 5:26 PM UTC

Joseph Suarez 🐡

@jsuarez

19 Jul 2025

x.com/i/article/194660565757…

The Tragedy of Reinforcement Learning

Or: how the genie went back into the bottle for years. This is, at least from my perspective, the true story of what happened to reinforcement learning and why it's only really starting to kick off

182

1,894

413,377

Joseph Suarez 🐡 · Jul 11, 2025 · 3:24 PM UTC

Joseph Suarez 🐡

@jsuarez

11 Jul 2025

x.com/i/article/194073770512…

An Ultra Opinionated Guide to Reinforcement Learning

Reinforcement learning is about learning through interaction. Applications include robotics, logistics, gaming, and even control problems in science like nuclear fusion. It's an underexplored niche of

178

1,856

357,546

Joseph Suarez 🐡 · Nov 16, 2025 · 12:49 AM UTC

Joseph Suarez 🐡

@jsuarez

16 Nov 2025

Welcome to RL. If the pink curve overtakes the green curve, we'll have big news

1,781

436,524

Joseph Suarez 🐡 · Jul 23, 2024 · 1:35 PM UTC

Joseph Suarez 🐡

@jsuarez

23 Jul 2024

Whenever there's an Internet spike, it launches flechettes out of the roof

1,409

56,381

Joseph Suarez 🐡 · Nov 2, 2025 · 4:39 PM UTC

Joseph Suarez 🐡

@jsuarez

2 Nov 2025

If you ask me to do this I will 1) not be able to 2) assume you're a moron who tf remembers toy algos from CS101 with 0 application

This tweet is unavailable

1,155

149,170

Joseph Suarez 🐡 · Oct 19, 2025 · 12:27 PM UTC

Joseph Suarez 🐡

@jsuarez

19 Oct 2025

RL really sucks. It takes 10 hours just to learn breakout. ... a few years ago. It's <30 seconds on 1 GPU now in PufferLib and still dropping. Write faster code.

1,170

267,633

Joseph Suarez 🐡 · Jun 23, 2025 · 4:54 PM UTC

Joseph Suarez 🐡

@jsuarez

23 Jun 2025

PufferLib 3.0: We trained reinforcement learning agents on 1 Petabyte / 12,000 years of data with 1 server. Now you can, too! Our latest release includes algorithmic breakthroughs, massively faster training, and 10 new environments. Live demos on our site. Volume on for trailer!

108

1,053

194,727

Joseph Suarez 🐡 · Oct 15, 2025 · 11:05 PM UTC

Joseph Suarez 🐡

@jsuarez

15 Oct 2025

Accidentally said "you all are status obsessed hollywood wannabees" instead of "you're literally building the future bro" and they kicked me out of SF

846

27,471

Joseph Suarez 🐡 · May 29, 2025 · 11:00 PM UTC

Joseph Suarez 🐡

@jsuarez

29 May 2025

Replying to @trashh_dev

Lead from the front by replacing the Warp CEO with AI!

822

19,986

Joseph Suarez 🐡 · Jul 11, 2025 · 2:01 PM UTC

Joseph Suarez 🐡

@jsuarez

11 Jul 2025

25 pages of educational material going live today!

kache

@yacineMTB

11 Jul 2025

i love pufferlib and i love joseph suarez

801

93,196

Joseph Suarez 🐡 · Aug 7, 2025 · 3:01 PM UTC

Joseph Suarez 🐡

@jsuarez

7 Aug 2025

PufferLib has won a best paper award for resourcefulness in reinforcement learning! Thank you to our entire open-source community + of course Spencer @spenccheng, who has built more of the environments than anyone else! Come chat with us in person today/tomorrow!

713

48,662

Joseph Suarez 🐡 · Jul 1, 2025 · 5:08 PM UTC

Joseph Suarez 🐡

@jsuarez

1 Jul 2025

x.com/i/article/194003856135…

Reinforcement Learning on a Petabyte of Data at Home

6.2 petabytes uncompressed. Our agents played >12,000 years of Neural MMO 3 on a single 6x4090 tinybox in 80 hours. You can watch the agent live in your browser at puffer.ai. Everything

725

124,597

Joseph Suarez 🐡 · Sep 3, 2025 · 11:25 PM UTC

Joseph Suarez 🐡

@jsuarez

3 Sep 2025

x.com/i/article/196332982738…

Why RL Failed to Bootstrap

Off-policy RL is slow. On-policy RL is data hungry. I figured out why. This is a precise technical summary of my findings from the past 3 weeks of research and literature review across various

682

126,046

Joseph Suarez 🐡 · May 14, 2024 · 2:20 PM UTC

Joseph Suarez 🐡

@jsuarez

14 May 2024

My thesis has been accepted - 170 pages of Neural MMO available on my github io (jsuarez5341). Thank you to my advisor @phillip_isola, my committee @pulkitology @EugeneVinitsky, and my parents Jose and Patricia Suarez, to whom I devote my thesis.

587

53,741

Joseph Suarez 🐡 · Mar 6, 2025 · 5:26 PM UTC

Joseph Suarez 🐡

@jsuarez

6 Mar 2025

We trained a tiny model to beat Pokemon Red on 1 GPU. This is a qualitatively new result for RL because the game takes ~25 hours to beat! Most tasks used in research run seconds to minutes. Work with @dsrubinstein @DanAdvantage @kywch500 @computerender

598

37,040

Joseph Suarez 🐡 · Jun 5, 2024 · 3:27 PM UTC

Joseph Suarez 🐡

@jsuarez

5 Jun 2024

Replying to @mycoliza

Instabrick (tm)

535

20,151

Joseph Suarez 🐡 · Jul 18, 2025 · 6:41 PM UTC

Joseph Suarez 🐡

@jsuarez

18 Jul 2025

I will be working on RL for drone racing and swarms on stream here/YT/Twitch for the next few hours. Goal is a ~100k param multitask model that we can deploy on real hardware

584

34,304

Joseph Suarez 🐡 · Oct 17, 2025 · 2:35 PM UTC

Joseph Suarez 🐡

@jsuarez

17 Oct 2025

This is one of my favorite books. I've never read it. Created the easiest way ever to spot a snake.

Steve the Beaver

@beaversteever

17 Oct 2025

I was confused when the new intern said my name 10 times in the span of a 2 minute conversation, then I saw this on his desk:

579

60,041

Joseph Suarez 🐡 · Nov 8, 2024 · 11:57 AM UTC

Joseph Suarez 🐡

@jsuarez

8 Nov 2024

x.com/i/article/185130546762…

Reinforcement Learning Quickstart Guide

So you want to learn reinforcement learning? It's a hard mountain to climb, but I'm going to be giving you some of the best tricks and insights from my playbook. Star PufferLib on GitHub if you learn

584

72,230

Joseph Suarez 🐡 · Jul 9, 2025 · 4:30 PM UTC

Joseph Suarez 🐡

@jsuarez

9 Jul 2025

Nobody has spun up in RL faster than Spencer. He started contributing to PufferLib last September and is already better at applications than the vast majority of PhDs. My guides miss a lot of newcomer confusions because I've been doing this too long. Spencer's guides don't.

Spencer Cheng

@spenccheng

9 Jul 2025

x.com/i/article/193499874920…

Intro to Multi-Agent RL

If you can simulate it, you can solve it. That’s the power of reinforcement learning. In the last few years, we’ve seen RL topple DoTA champions, accelerate chip design, and pilot drones at superhuman

561

58,791

Joseph Suarez 🐡 · Nov 8, 2025 · 1:12 PM UTC

Joseph Suarez 🐡

@jsuarez

8 Nov 2025

Live RL Research on PufferLib w/ Joseph Suarez nitter.app/i/broadcasts/1YpKkkdby…

557

337,096

Joseph Suarez 🐡 · Dec 28, 2024 · 4:43 PM UTC

Joseph Suarez 🐡

@jsuarez

28 Dec 2024

Replying to @vikhyatk

That would have taken me under a minute as a freshman in undergrad and I'd have failed it now. Get better interview material.

522

55,058

Joseph Suarez 🐡 · Jul 20, 2025 · 12:42 AM UTC

Joseph Suarez 🐡

@jsuarez

20 Jul 2025

Found the bug. Trains in 2 minutes. No collisions/randomization for first test but pretty zippy! Multi-task + domain randomized next

488

35,381

Joseph Suarez 🐡 · Oct 18, 2025 · 1:22 AM UTC

Joseph Suarez 🐡

@jsuarez

18 Oct 2025

I just stream everything except private sim dev for clients. This is all from this week

Robbie Pasquale

@RobbiePasquale

18 Oct 2025

Replying to @jsuarez

Bruh how tf you stream so much, some kind of mad man

488

149,218

Joseph Suarez 🐡 · Oct 24, 2025 · 5:25 PM UTC

Joseph Suarez 🐡

@jsuarez

24 Oct 2025

A lot of cutting-edge simulation development for RL looks like this type of low-level game dev

Wouter Weynants

@WWeynants

24 Oct 2025

flecs, raylib, ttf, aheasing, kdtree, gpu mesh instancing loving C perfection through minimalism, everything pixel perfect and snappy as hell #flecs #raylib #indiedev #gamedev @ajmmertens @raysan5

461

36,435

Joseph Suarez 🐡 · Sep 27, 2024 · 2:22 AM UTC

Joseph Suarez 🐡

@jsuarez

27 Sep 2024

RL is useless... except if you want super-human perf on games, control, LLMs, chip design, rideshare matching, 5G, and more! It's also an area where you can make major progress with very few resources. Join PufferAI's open source efforts at discord gg/puffer or DM me!

446

45,852

Joseph Suarez 🐡 · Oct 2, 2025 · 8:58 PM UTC

Joseph Suarez 🐡

@jsuarez

2 Oct 2025

Offline RL is not RL. RL is about interaction. No interaction, no RL.

458

174,959

Joseph Suarez 🐡 · Jun 8, 2025 · 1:04 AM UTC

Joseph Suarez 🐡

@jsuarez

8 Jun 2025

RL games research hard trolled from 2015-2025 by writing 10,000x slow code. We fixed it and now it feels like something straight out of scifi. Currently scaling the exact same methods to a variety of industry sims. LLMs are the nerdsnipe braindraining all other AI progress

xjdr

@_xjdr

6 Jun 2025

the 3 largest nerdsnipes of my career:
- RL for games
- Graph Neural Nets / MPNN
- AlphaGo / MuZero
Its amazing to watch the AI community wander right into these tarpits. its also amazing to watch RL at least be useful at scale this time around.

454

50,412

Joseph Suarez 🐡 · Nov 6, 2025 · 1:20 PM UTC

Joseph Suarez 🐡

@jsuarez

6 Nov 2025

Welcome to academia. I submitted PufferLib when it made RL 10x faster. Rejected. 100x? Rejected. 1000x? Okay fine sry published + award. You have to be really, really stubborn

Teknium 🪽

@Teknium

6 Nov 2025

What are these reviewers doing?? We've been doing something like this since @dmayhem93 joined as RL lead and it's the single biggest efficiency gain you can add to an rl infra right now if you don't already use it.

441

77,003

Joseph Suarez 🐡 · Jul 30, 2025 · 4:00 PM UTC

Joseph Suarez 🐡

@jsuarez

30 Jul 2025

x.com/i/article/194854344668…

Game Reinforcement Learning isn't Playing Around

Or: why superintelligence is just Runescape with a new coat of paint. From the outside, a lot of reinforcement learning research looks like we're wasting our time and playing too many games. A world

434

78,150

Joseph Suarez 🐡 · Oct 21, 2025 · 7:00 PM UTC

Joseph Suarez 🐡

@jsuarez

21 Oct 2025

Listening to this dude is a waste of time

Pedro Domingos

@pmddomingos

20 Oct 2025

Reinforcement learning is revealed yet again to be a waste of time.

426

57,441

Joseph Suarez 🐡 · Nov 9, 2025 · 3:00 AM UTC

Joseph Suarez 🐡

@jsuarez

9 Nov 2025

Today, I wrote three kernels fusing various gates and scans in our recurrent cell and a fused PPO loss kernel. Result is +2M steps/second training before I even start optimizing. Streamed >12 hours of dev today. Tomorrow, I lift weights all day and relax. Star pufferlib. Gn

419

28,881

Joseph Suarez 🐡 · Mar 4, 2025 · 9:31 PM UTC

Joseph Suarez 🐡

@jsuarez

4 Mar 2025

Replying to @sama

... So you're just deleting the subscription and replacing it with an API wrapper, but still charging monthly up front

409

31,500

Joseph Suarez 🐡 · Jul 17, 2025 · 10:40 PM UTC

Joseph Suarez 🐡

@jsuarez

17 Jul 2025

Early prototype of the new drone racing sim. Every drone here is a different size, weight, axial inertias, etc. We reinforcement learn the policy in <2 minutes with PufferLib. This is an extension to the original sim submitted by Fin and Sam

412

24,882

Joseph Suarez 🐡 · Dec 9, 2024 · 2:24 PM UTC

Joseph Suarez 🐡

@jsuarez

9 Dec 2024

x.com/i/article/185380866896…

PufferLib 2.0: Reinforcement Learning at 1M Steps per Second

Welcome to a new age of reinforcement learning. This is our biggest release ever. 11 new environments totaling ~20,000 lines of pure C. They all run >1M steps/second (sps) on a single CPU core, and

402

36,580

Joseph Suarez 🐡 · Sep 2, 2024 · 8:01 PM UTC

Joseph Suarez 🐡

@jsuarez

2 Sep 2024

The Full RL Iceberg - everything wrong with reinforcement learning and how PufferLib is fixing it
Join me for a dive through 10 layers of the RL stack. There's something here for beginners and world-class experts alike. Star the project on GitHub to feed the puffer!

387

48,176

Joseph Suarez 🐡 · Oct 17, 2025 · 10:57 PM UTC

Joseph Suarez 🐡

@jsuarez

17 Oct 2025

Never stopped. Now 1000x faster

Hassan Hayat 🔥

@TheSeaMouse

17 Oct 2025

It's funny how almost a decade later, the frontier labs are back at the same spot, building RL gyms

380

46,480

Joseph Suarez 🐡 · Oct 16, 2024 · 4:16 PM UTC

Joseph Suarez 🐡

@jsuarez

16 Oct 2024

RL Quickstart guide:
1. Papers: DQN, GAE, PPO, OpenAI Five, AlphaStar.
2. Read CleanRL DQN and PPO implementations (500 lines total)
3. Watch my RL Iceberg video + read PufferLib paper.
You now know enough to be useful. Star pufferai/pufferlib and come build with us on Discord!

366

22,376

Joseph Suarez 🐡 · Sep 13, 2025 · 3:08 PM UTC

Joseph Suarez 🐡

@jsuarez

13 Sep 2025

The pufferlib core is a few thousand lines. You have to be borderline nuts to believe that 1000x incompetence can exist, but it does

the tiny corp

@__tinygrad__

13 Sep 2025

If tinygrad succeeds, the effects will ripple far beyond a Tensor library. Can ~15k lines replace a 15 million line stack? Is software three orders of magnitude too large?

369

63,932

Joseph Suarez 🐡 · Apr 3, 2025 · 10:32 PM UTC

Joseph Suarez 🐡

@jsuarez

3 Apr 2025

The SF echo-chamber must be truly airtight for such sad, scared fan-fiction to gain traction. At risk of giving he who must not be named even more attention, here are some thoughts on the piece and AI in general.

363

78,138

Joseph Suarez 🐡 · May 18, 2025 · 11:31 AM UTC

Joseph Suarez 🐡

@jsuarez

18 May 2025

They used hundreds of GPUs and way more CPUs to simulate and train 100B steps. PufferLib simulates and trains 100B steps on your desktop. You can do fun RL with us right now!

ludwig

@ludwigABAP

17 May 2025

those were such fun times

358

25,229

Joseph Suarez 🐡 · May 27, 2025 · 10:32 PM UTC

Joseph Suarez 🐡

@jsuarez

27 May 2025

Remember when RL used to take hours to solve pong? Our latest PufferLib sweep solves in 4 seconds. Release soon!

359

25,368

Joseph Suarez 🐡 · Jun 24, 2025 · 4:52 PM UTC

Joseph Suarez 🐡

@jsuarez

24 Jun 2025

x.com/i/article/192234466119…

Puffing up PPO

Our day-to-day reinforcement learning work feels like a different field thanks to our new training algorithm. It often solves new environments out-of-the-box in seconds with default hyperparameters,

364

58,932

Joseph Suarez 🐡 · Nov 14, 2025 · 3:15 AM UTC

Joseph Suarez 🐡

@jsuarez

14 Nov 2025

If this works, it will make wandb PufferLib's top pick for monitoring. As is right now, all the web platforms are bloated and unusable for any reasonable amount of runs.

Weights & Biases

@wandb

12 Nov 2025

🤫 Something's been brewing in stealth. Our SDK team's side project, codenamed W&B LEET, is being unleashed. We are releasing a full Terminal UI (TUI) for live, interactive W&B monitoring right in your terminal. No browser, no internet, no problem.

349

34,997

Joseph Suarez 🐡 · Jul 2, 2025 · 1:34 AM UTC

Joseph Suarez 🐡

@jsuarez

2 Jul 2025

Replying to @willccbb

Oh they're good at visualizing flows alright

339

6,126

Joseph Suarez 🐡 · Jun 7, 2025 · 3:14 PM UTC

Joseph Suarez 🐡

@jsuarez

7 Jun 2025

Reinforcement learning on 100,000,000,000 observations overnight on a single TinyBox has set new SOTA on Neural MMO 3. Previous best was just under 6.0. This has an effective batch size of 3 million and a minibatch of ~180k. Star PufferLib to support and come dev with us!

340

28,420

Joseph Suarez 🐡 · Jul 25, 2025 · 6:23 PM UTC

Joseph Suarez 🐡

@jsuarez

25 Jul 2025

You can just do things! I've gotten dozens of DMs from people who have wanted to get involved with RL in PufferLib. The difference is that Spencer kept at it and is now helping us push the state of the art

Spencer Cheng

@spenccheng

25 Jul 2025

x.com/i/article/194874807143…

My first year in reinforcement learning

I sent a DM one year ago that changed my life. I was running construction on an 88-home development in Frisco when I first reached out to @jsuarez5341. I had only ever heard about reinforcement

332

19,554

Joseph Suarez 🐡 · Jul 26, 2025 · 6:17 PM UTC

Joseph Suarez 🐡

@jsuarez

26 Jul 2025

Reinforcement Learning + Material Science nitter.app/i/broadcasts/1rmxPyoaM…

324

23,006

Joseph Suarez 🐡 · Dec 28, 2024 · 5:55 PM UTC

Joseph Suarez 🐡

@jsuarez

28 Dec 2024

Replying to @vikhyatk

Sure, but you look stupid af after 1 minute

309

13,471

Joseph Suarez 🐡 · Oct 18, 2024 · 3:34 PM UTC

Joseph Suarez 🐡

@jsuarez

18 Oct 2024

Reinforcement learning research is hard blocked by lack of fast, easy to use environments. PufferLib is fixing this, all open source. I provide compute and mentorship to contributors. Only req is being able to write good code. DM me if you want to help!

315

21,045

Joseph Suarez 🐡 · Oct 1, 2025 · 8:02 PM UTC

Joseph Suarez 🐡

@jsuarez

1 Oct 2025

We are not blank slates. I have given this response every time this topic has come up for the past ~7 years. At least a few dozen times by now. Someone still makes this argument in conversation at least every few months.

François Fleuret

@francoisfleuret

1 Oct 2025

"the human brain doesn't need tons of training data to do stuff"

314

31,109

Joseph Suarez 🐡 · Jul 20, 2025 · 6:43 PM UTC

Joseph Suarez 🐡

@jsuarez

20 Jul 2025

I post a lot about how good RL is with PufferLib that I'm realizing sounds increasingly grifty. Please just go try it. It's free. If you've done RL years ago, it will feel like a different field. We have new programmers doing RL on custom sims. That wasn't a thing before.

310

19,825

Joseph Suarez 🐡 · Dec 9, 2024 · 2:25 PM UTC

Joseph Suarez 🐡

@jsuarez

9 Dec 2024

x.com/i/article/186396803064…

Neural MMO 3.0

The real world is not just multiagent, it's massively multiagent. I spent 7+ years during and before my PhD developing Neural MMO, a game-inspired environment for many-agent learning research. Today,

315

39,810

Joseph Suarez 🐡 · Aug 6, 2025 · 9:09 PM UTC

Joseph Suarez 🐡

@jsuarez

6 Aug 2025

Come say hi! Poster 41 at rlc

295

9,314

Joseph Suarez 🐡 · Jul 31, 2025 · 3:51 AM UTC

Joseph Suarez 🐡

@jsuarez

31 Jul 2025

It is a special form of torture to see the field I have dedicated the last decade of my life to perverted and twisted by grifters on every billboard lining the highway to sf

285

23,715

Joseph Suarez 🐡 · Aug 26, 2024 · 9:22 PM UTC

Joseph Suarez 🐡

@jsuarez

26 Aug 2024

You're not Karpathy. Mute the AI slop and learn to code. Supermaven for 1-line autocomplete is great. Save typing, not thinking.

268

27,879

Joseph Suarez 🐡 · Nov 19, 2025 · 1:04 PM UTC

Joseph Suarez 🐡

@jsuarez

19 Nov 2025

Replying to @tatarianempire @juliet_turner6

Yo moron. That one's actually useful. Ideas help in multiagent AI and other algos

335

16,269

Joseph Suarez 🐡 · Oct 23, 2025 · 1:16 AM UTC

Joseph Suarez 🐡

@jsuarez

23 Oct 2025

Findings from today:
- Jax is not magically faster than torch
- Fighting torch compile sucks but you can usually get more perf out of it if you do
- You're not getting more than a few TFLOPS out of a small LSTM with small batches either way
- Large batches still 20% MFU
Bah.

283

20,197

Joseph Suarez 🐡 · Oct 26, 2025 · 10:25 PM UTC

Joseph Suarez 🐡

@jsuarez

26 Oct 2025

The next version will be even faster!

kache

@yacineMTB

26 Oct 2025

you guys should actually just go run the code. It's literally just a pip install. Install it and train a model on your computer in 60 seconds. Then literally just go read the code. It's actually simple
Lots of AI salesmen selling complicated bullshit. This is simple and good

275

62,258

Joseph Suarez 🐡 · Aug 3, 2024 · 4:38 PM UTC

Joseph Suarez 🐡

@jsuarez

3 Aug 2024

RL researcher? You've had to suffer. But now there's Puffer! Pong solved in ~80 seconds on 1 GPU. Star pufferai/pufferlib for more!

260

28,488

Joseph Suarez 🐡 · Jul 31, 2025 · 3:56 PM UTC

Joseph Suarez 🐡

@jsuarez

31 Jul 2025

PufferAI's current collection of RL environments, playable on our site. The majority are by contributors, and we're merging a few new ones soon! Most train in minutes on your laptop, and we have extensive guides on how to get started. Check my articles tab for more!

268

24,272

Joseph Suarez 🐡 · Jul 12, 2025 · 3:53 PM UTC

Joseph Suarez 🐡

@jsuarez

12 Jul 2025

Reinforcement Learning Research Live nitter.app/i/broadcasts/1BRKjmnAn…

265

26,204

Joseph Suarez 🐡 · Aug 11, 2025 · 9:05 PM UTC

Joseph Suarez 🐡

@jsuarez

11 Aug 2025

PufferLib RLC 2025 Outstanding Paper: RL at Millions of Steps per Second. Free + open source for academic and commercial use. We offer support packages for businesses with existing internal RL efforts and larger contracts on solving new problems.

270

11,839

Joseph Suarez 🐡 · Jul 9, 2024 · 5:48 PM UTC

Joseph Suarez 🐡

@jsuarez

9 Jul 2024

x.com/i/article/181051489481…

Reinforcement Learning Infra is Trash and it's Our Fault

This snake environment simulates 14M and trains 1M snake steps per second. I wrote it from scratch in 5 days, including ASCII, RGB, and human-playable renderers. It is ~450 lines of dead simple code.

267

149,032

Joseph Suarez 🐡 · Jul 13, 2025 · 9:47 PM UTC

Joseph Suarez 🐡

@jsuarez

13 Jul 2025

And LLMs won't even be the biggest application! Massive but diffuse impact across industries. Anywhere you can build sims

Andrej Karpathy

@karpathy

13 Jul 2025

Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly increase (/decrease) the probability of every action I took for the future". You get a lot more leverage from verifier functions than explicit supervision, this is great.
But first, it looks suspicious asymptotically - once the tasks grow to be minutes/hours of interaction long, you're really going to do all that work just to learn a single scalar outcome at the very end, to directly weight the gradient?
Beyond asymptotics and second, this doesn't feel like the human mechanism of improvement for majority of intelligence tasks. There's significantly more bits of supervision we extract per rollout via a review/reflect stage along the lines of "what went well? what didn't go so well? what should I try next time?" etc. and the lessons from this stage feel explicit, like a new string to be added to the system prompt for the future, optionally to be distilled into weights (/intuition) later a bit like sleep. In English, we say something becomes "second nature" via this process, and we're missing learning paradigms like this. The new Memory feature is maybe a primordial version of this in ChatGPT, though it is only used for customization not problem solving. Notice that there is no equivalent of this for e.g. Atari RL because there are no LLMs and no in-context learning in those domains.
Example algorithm: given a task, do a few rollouts, stuff them all into one context window (along with the reward in each case), use a meta-prompt to review/reflect on what went well or not to obtain string "lesson", to be added to system prompt (or more generally modify the current lessons database). Many blanks to fill in, many tweaks possible, not obvious.
Example of lesson: we know LLMs can't super easily see letters due to tokenization and can't super easily count inside the residual stream, hence 'r' in 'strawberry' being famously difficult. Claude system prompt had a "quick fix" patch - a string was added along the lines of "If the user asks you to count letters, first separate them by commas and increment an explicit counter each time and do the task like that". This string is the "lesson", explicitly instructing the model how to complete the counting task, except the question is how this might fall out from agentic practice, instead of it being hard-coded by an engineer, how can this be generalized, and how lessons can be distilled over time to not bloat context windows indefinitely.
TLDR: RL will lead to more gains because when done well, it is a lot more leveraged, bitter-lesson-pilled, and superior to SFT. It doesn't feel like the full story, especially as rollout lengths continue to expand. There are more S curves to find beyond, possibly specific to LLMs and without analogues in game/robotics-like environments, which is exciting.

260

42,003
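The quoted description of RL ("this happened to go well (/poorly), let me slightly increase (/decrease) the probability of every action I took") can be sketched as a REINFORCE-style update. This is a minimal illustration under stated assumptions: a stateless softmax policy over a handful of discrete actions, one scalar outcome per rollout, and hypothetical names (`softmax`, `reinforce_update`) that come from neither post.

```python
import math


def softmax(logits):
    """Convert raw action preferences into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits, actions, outcome, lr=0.1):
    """Return new logits after one rollout.

    Every action taken in the rollout is credited with the same scalar
    outcome: a positive outcome raises its probability, a negative one
    lowers it. Gradients are taken at the pre-update policy.
    """
    probs = softmax(logits)
    new_logits = list(logits)
    for a in actions:
        for i in range(len(logits)):
            # d/d logit_i of log pi(a) for a softmax policy
            grad_logp = (1.0 if i == a else 0.0) - probs[i]
            new_logits[i] += lr * outcome * grad_logp
    return new_logits


# Hypothetical rollout: action 0 taken twice, action 1 once, good outcome.
updated = reinforce_update([0.0, 0.0], actions=[0, 0, 1], outcome=1.0)
print(softmax(updated))
```

The single `outcome` multiplying every step is the "single scalar at the very end" that the quoted post finds suspicious for long rollouts.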

Joseph Suarez 🐡 · Jul 28, 2025 · 6:27 PM UTC

Joseph Suarez 🐡

@jsuarez

28 Jul 2025

This is why I just made all the RL research 1000x faster. You don't need h100s when we train on petabytes with 4090s

nisten🇨🇦e/acc

@nisten

28 Jul 2025

China is publishing MIT licensed AI models while MIT doesn't even have any modern GPUs that can properly run them.

259

21,631

Joseph Suarez 🐡 · Sep 28, 2025 · 6:16 PM UTC

Joseph Suarez 🐡

@jsuarez

28 Sep 2025

What's the last research topic where you've been completely, unambiguously wrong? I'll go first: I thought optimizer research was just a nerd snipe for a decade. Then I integrated Muon into PufferLib and fully reswept hyperparams. Step change in capabilities, core default in 3.0.

262

29,668

Joseph Suarez 🐡 · Aug 30, 2025 · 5:14 PM UTC

Joseph Suarez 🐡

@jsuarez

30 Aug 2025

Don't slouch
Clean your room
Don't mix beer and wine
...
Don't train on-policy RL with off-policy data
I'm sick of rules!

326

29,825

Joseph Suarez 🐡 · Oct 19, 2025 · 1:42 AM UTC

Joseph Suarez 🐡

@jsuarez

19 Oct 2025

I have streamed 11 hours of RL dev today, 50 this week. Tomorrow I will lift weights and listen to books all day. Good night.

248

20,102

Joseph Suarez 🐡 · Feb 14, 2025 · 12:13 AM UTC

Joseph Suarez 🐡

@jsuarez

14 Feb 2025

If you want to learn as much as possible about RL by reading only one paper... Read the OpenAI Five report on achieving superhuman performance at DoTA

250

17,164

Joseph Suarez 🐡 · May 19, 2025 · 8:08 PM UTC

Joseph Suarez 🐡

@jsuarez

19 May 2025

There are no intuitions about what is going on here. MDPs are a bad model for real RL problems, and even for most toy ones. RL is hard to explain because your data comes from interacting with an environment. Non-stationary + hard to make fast!

will brown

@willccbb

19 May 2025

RL is hard to explain to people because it doesn't really make any sense without internalizing strong intuitions about everything going on here

247

25,158

Joseph Suarez 🐡 · Sep 19, 2024 · 4:32 PM UTC

Joseph Suarez 🐡

@jsuarez

19 Sep 2024

You're not GPU poor, you're data rich. It only took 168M params to beat DoTA world champs. Now what if we had complex envs on 1 cpu that could run as fast as the 50-100k cores used for OpenAI Five?

235

13,120

Joseph Suarez 🐡 · Sep 17, 2025 · 2:45 PM UTC

Joseph Suarez 🐡

@jsuarez

17 Sep 2025

Raylib is the best library I've had the pleasure of using. It completely removes the friction from graphics, is flexible enough for scientific visualization, and works in your browser via wasm with no code changes!

Ray

@raysan5

17 Sep 2025

WOW! raylib got its first Gold Sponsor!!! 🤯 It is @puffer_ai by @jsuarez, developers of PufferLib, an open-source Reinforcement Learning library for complex game environments! Every single sample game environment provided uses raylib! 🚀 Thanks for supporting raylib! ❤️

[Image: raylib getting the first gold sponsor!]

237

18,339

Joseph Suarez 🐡 · May 14, 2025 · 5:38 PM UTC

Joseph Suarez 🐡

@jsuarez

14 May 2025

We're retrying all those late 2010's deep RL ideas but without trolling by wasting 99.9% of the compute on slow sims. Next release in a few weeks!

Rohan Pandey

@khoomeik

12 May 2025

someone should probably retry all those late 2010s deep RL ideas to see if they work on LLMs

235

16,819

Joseph Suarez 🐡 · Jul 9, 2024 · 4:24 AM UTC

Joseph Suarez 🐡

@jsuarez

9 Jul 2024

Everyone go follow @vwxyzjn. CleanRL is the highest impact single contribution in RL. My work would not be possible without it. He better have more followers than me again by morning.

231

51,251

Joseph Suarez 🐡 · Dec 9, 2024 · 2:23 PM UTC

Joseph Suarez 🐡

@jsuarez

9 Dec 2024

PufferLib 2.0: Reinforcement Learning at 1,000,000 steps/second
11 new environments, all >1M steps/second/core
Human and agent-playable in your browser
20,000 lines of C. All free + open source.
Star on GitHub to feed the puffer!
Priority service for business from 10k/mo

226

49,647

Joseph Suarez 🐡 · Nov 15, 2025 · 1:34 AM UTC

Joseph Suarez 🐡

@jsuarez

15 Nov 2025

This ain't complicated. LLM submission or review =
- Banned from submitting
- Banned from registering
- Uni/Company informed
- Suspended/fired if your org is worth a damn

Micah Goldblum

@micahgoldblum

13 Nov 2025

An LLM-generated paper is in the top 17% of ICLR submissions in terms of average reviewer score, having received two 8's. The paper has tons of BS jargon and hallucinated references. Fortunately, one reviewer actually looked at the paper and gave it a zero. 1/3

230

57,244

Joseph Suarez 🐡 · Jun 3, 2025 · 6:59 PM UTC

Joseph Suarez 🐡

@jsuarez

3 Jun 2025

The tinyboxes are here! 12x 4090s from @__tinygrad__ ready for reinforcement learning. Want to run experiments on these? Contribute cool environments to pufferlib for access!

224

13,612

Joseph Suarez 🐡 · Sep 24, 2025 · 4:08 AM UTC

Joseph Suarez 🐡

@jsuarez

24 Sep 2025

A couple of undergrads PRed an RL env for drone control to PufferLib. So I sent them a drone to play with

228

15,427

Joseph Suarez 🐡 · Nov 18, 2024 · 11:06 PM UTC

Joseph Suarez 🐡

@jsuarez

18 Nov 2024

Replying to @PandaAshwinee

Reviewer should be permabanned. Can't blame authors for fighting incoherent robo autorejects

215

9,113

Joseph Suarez 🐡 · Sep 29, 2025 · 2:54 AM UTC

Joseph Suarez 🐡

@jsuarez

29 Sep 2025

Frontier RL is open source. PufferLib 3.0 uses a strict generalization of GAE and VTrace. You can recover either algorithm by setting gamma, lambda, rho_clip, and c_clip appropriately. It is very similar to retrace but for PPO.

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

27 Sep 2025

OpenAI employees saying GRPO and other open-source research is significantly behind frontier tech what kinds of RL algorithms could frontier labs be using?

219

24,947
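
For readers curious what "a strict generalization of GAE and VTrace" in the 29 Sep 2025 post could look like, here is a minimal Python sketch. It is not taken from PufferLib's source. The function name `clipped_trace_advantages` and its argument layout are hypothetical; only the parameter names gamma, lambda (`lam`), rho_clip, and c_clip come from the post. The assumption is that the estimator is the usual backward GAE recursion with the TD error scaled by a clipped importance ratio (rho) and the trace scaled by a second clipped ratio (c), as in V-trace.

```python
def clipped_trace_advantages(rewards, values, next_value, ratios, dones,
                             gamma=0.99, lam=0.95, rho_clip=1.0, c_clip=1.0):
    """Backward recursion mixing GAE and V-trace style corrections.

    rewards, values, ratios, dones: per-step lists of equal length.
    next_value: bootstrap value estimate for the state after the last step.
    ratios: importance ratios pi(a|s) / mu(a|s); all 1.0 when on-policy.
    Hypothetical sketch, not PufferLib's actual implementation.
    """
    n = len(rewards)
    advantages = [0.0] * n
    running = 0.0
    for t in reversed(range(n)):
        # Stop bootstrapping and trace propagation across episode ends.
        not_done = 0.0 if dones[t] else 1.0
        v_next = next_value if t == n - 1 else values[t + 1]
        rho = min(ratios[t], rho_clip)  # clips the TD error weight
        c = min(ratios[t], c_clip)      # clips the trace coefficient
        delta = rho * (rewards[t] + gamma * v_next * not_done - values[t])
        running = delta + gamma * lam * c * not_done * running
        advantages[t] = running
    return advantages
```

Under these assumptions, on-policy data (all ratios equal to 1) with both clips at or above 1 reduces the recursion to plain GAE, and setting `lam=1.0` gives the V-trace correction term (the V-trace target minus the value estimate).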

Joseph Suarez 🐡 · Aug 2, 2025 · 6:12 AM UTC

Joseph Suarez 🐡

@jsuarez

2 Aug 2025

At what point does perf optimization get ridiculous? During my PhD, everything was 500-5000 sps. Then I got 10k and was very proud. Then 100k in early versions of PufferLib. Then 1M in 2.0... and now we're at up to 6M productive SPS on some RL envs

215

12,270

Joseph Suarez 🐡 · Aug 28, 2024 · 1:08 AM UTC

Joseph Suarez 🐡

@jsuarez

28 Aug 2024

Replying to @tekbog

If you can replace something with a 300 line script... do it. If you mean "can I spend 3 months overengineering more garbage" ... then no, stop it, get some help

203

8,056

Joseph Suarez 🐡 · Mar 13, 2025 · 5:18 PM UTC

Joseph Suarez 🐡

@jsuarez

13 Mar 2025

Replying to @tenobrus

death is stupid and we should cure it

209

17,794

Joseph Suarez 🐡 · Dec 21, 2022 · 8:45 PM UTC

Joseph Suarez 🐡

@jsuarez

21 Dec 2022

🧵You have a PyTorch model, an environment, and an RL framework. They should work together but don't. Today, I'm releasing PufferLib, a toolkit that makes them play nice. Initial support for CleanRL and RLlib. pufferai.github.io/

208

53,582

Joseph Suarez 🐡 · Jul 12, 2025 · 12:25 AM UTC

Joseph Suarez 🐡

@jsuarez

12 Jul 2025

Since I've been getting lots of questions today - PufferAI is a private reinforcement learning lab with all OSS research and tools. Our business is helping companies solve RL problems and in-house the capabilities. DM if you would like to chat!

207

33,178

Joseph Suarez 🐡 · Oct 13, 2025 · 12:34 AM UTC

Joseph Suarez 🐡

@jsuarez

13 Oct 2025

Yesterday, I streamed ~11 hours of RL data vis dev in C. It was my day off from exercise. Today, I lifted weights all day. ~50 sets. It was my day off from thinking.

206

35,511

Joseph Suarez 🐡 · Nov 18, 2025 · 9:20 PM UTC

Joseph Suarez 🐡

@jsuarez

18 Nov 2025

HeavyBall's Muon from @Clashluke outperforms PyTorch's Muon. I implemented a numerically-matched version in <150 lines. There's a cpp version, too! In the latest PufferLib 4 dev branch. Star the repo to feed the puffer!

212

12,565

Joseph Suarez 🐡 · Sep 22, 2025 · 4:36 PM UTC

Joseph Suarez 🐡

@jsuarez

22 Sep 2025

x.com/i/article/196798114277…

I Cut 18 Pounds in 6 Weeks

Running and lifting saved my life. I'm sharing this rather personal story because, at least for many in my audience, this will help you more than any of my technical content. If this happens to be the

258

64,669

Joseph Suarez 🐡 · Nov 14, 2025 · 9:13 PM UTC

Joseph Suarez 🐡

@jsuarez

14 Nov 2025

Me: I ported Muon to cpp! Time to PR to torch! Torch: Cool, give me an hour to compile every possible flash attention kernel

207

14,710

Joseph Suarez 🐡 · Jun 23, 2023 · 6:11 PM UTC

Joseph Suarez 🐡

@jsuarez

23 Jun 2023

Announcing the Neural MMO 2.0 Competition on Multi-Task Reinforcement Learning and Curriculum Generation at NeurIPS 2023! Partnered with @StabilityAI @carperai @ParametrixAI @aicrowdHQ! Details in the coming weeks … 1/4🧵

200

46,168

Joseph Suarez 🐡 · Jul 23, 2025 · 3:09 PM UTC

Joseph Suarez 🐡

@jsuarez

23 Jul 2025

The highway to SF is paved with LLM agent billboards. No AI for revolutionizing manufacturing. No massive breakthroughs in other fields powered by AI. Just the Hollywood of saas. We have magic scifi tech, and the rest of the world should look the part!

199

14,082

Joseph Suarez 🐡 · Mar 5, 2025 · 6:40 PM UTC

Joseph Suarez 🐡

@jsuarez

5 Mar 2025

We beat Pokemon Red with online RL! Details here over the next several days. Led by @dsrubinstein. Follow him, me, @DanAdvantage, @kywch500, @computerender for more!

drubinstein

@dsrubinstein

5 Mar 2025

Excited to finally share our progress in developing a reinforcement learning system to beat Pokémon Red. Our system successfully completes the game using a policy under 10M parameters, PPO, and a few novel techniques. Blog posted below

204

145,608

Joseph Suarez 🐡 · Aug 9, 2024 · 7:43 PM UTC

Joseph Suarez 🐡

@jsuarez

9 Aug 2024

NeurIPS D&B reviews are out. Well done Academia. I'm out. No more papers

193

105,096

Joseph Suarez 🐡 · Jul 26, 2025 · 6:31 PM UTC

Joseph Suarez 🐡

@jsuarez

26 Jul 2025

After over a year of full-time development, PufferLib has reached 3,000 stars and ~2,000 Discord members! The future of reinforcement learning is grassroots OSS + ultra performant simulation

197

9,410