I build sane open-source RL tools. MIT PhD, creator of Neural MMO and founder of PufferAI. DM for business: non-LLM sim engineering, RL R&D, infra & support.

Releasing PufferLib 4.0: Train agents in seconds
40
100
1,198
203,812
I don't like courses. Most were a waste of time. Yes, even at Stanford. If you're new to ML, take CS231N.
your honor i object, i dont know about harvard but stanford literally releases SOTA courses
23
141
2,910
248,937
The world runs on TypeScript & JavaScript. Our bet is that AI engineering will follow suit. The growth in @aisdk downloads and adoption has been astonishing. When we wrote the Ship AI keynote it was at 3.4M weekly downloads. A couple weeks later, it’s now at 4.1M 😳 vercel.fyi/ai-npm
38
67
2,228
169,715
You have to have been in ML for over a decade to really understand how bad this was. From 2015-2018ish, any time anything went wrong in an experiment, at least someone would say "local minimum"
sent this to my dad this morning. It is really incredible how much you can learn today if you want to learn. There are no barriers except for yourself
26
52
2,148
247,701
I don't care if Karpathy is down on RL. He, Carmack, Ilya, and Alec Radford could all show up in person to tell me I'm wasting my time and I'd still keep doing it. Because damn it this tech is too cool not to exist.
102
42
1,911
300,707
Welcome to RL. If the pink curve overtakes the green curve, we'll have big news
84
25
1,781
436,524
Whenever there's an Internet spike, it launches flechettes out of the roof
1
8
1,409
56,381
If you ask me to do this I will 1) not be able to 2) assume you're a moron who tf remembers toy algos from CS101 with 0 application
54
28
1,155
149,170
RL really sucks. It takes 10 hours just to learn breakout. ... a few years ago. It's <30 seconds on 1 GPU now in PufferLib and still dropping. Write faster code.
53
55
1,170
267,633
PufferLib 3.0: We trained reinforcement learning agents on 1 Petabyte / 12,000 years of data with 1 server. Now you can, too! Our latest release includes algorithmic breakthroughs, massively faster training, and 10 new environments. Live demos on our site. Volume on for trailer!
34
108
1,053
194,727
Accidentally said "you all are status obsessed hollywood wannabes" instead of "you're literally building the future bro" and they kicked me out of SF
26
27
846
27,471
Replying to @trashh_dev
Lead from the front by replacing the Warp CEO with AI!
2
10
822
19,986
25 pages of educational material going live today!
i love pufferlib and i love joseph suarez
17
20
801
93,196
PufferLib has won a best paper award for resourcefulness in reinforcement learning! Thank you to our entire open-source community + of course Spencer @spenccheng, who has built more of the environments than anyone else! Come chat with us in person today/tomorrow!
54
40
713
48,662
My thesis has been accepted - 170 pages of Neural MMO available on my github io (jsuarez5341). Thank you to my advisor @phillip_isola, my committee @pulkitology @EugeneVinitsky, and my parents Jose and Patricia Suarez, to whom I devote my thesis.
31
56
587
53,741
We trained a tiny model to beat Pokemon Red on 1 GPU. This is a qualitatively new result for RL because the game takes ~25 hours to beat! Most tasks used in research run seconds to minutes. Work with @dsrubinstein @DanAdvantage @kywch500 @computerender
17
47
598
37,040
Replying to @mycoliza
Instabrick (tm)
3
5
535
20,151
I will be working on RL for drone racing and swarms on stream here/YT/Twitch for the next few hours. Goal is a ~100k param multitask model that we can deploy on real hardware
10
36
584
34,304
This is one of my favorite books. I've never read it. Created the easiest way ever to spot a snake.
I was confused when the new intern said my name 10 times in the span of a 2 minute conversation, then I saw this on his desk:
44
10
579
60,041
Nobody has spun up in RL faster than Spencer. He started contributing to PufferLib last September and is already better at applications than the vast majority of PhDs. My guides miss a lot of newcomer confusions because I've been doing this too long. Spencer's guides don't.
8
49
561
58,791
Live RL Research on PufferLib w/ Joseph Suarez nitter.app/i/broadcasts/1YpKkkdby…
7
27
557
337,096
Replying to @vikhyatk
That would have taken me under a minute as a freshman in undergrad and I'd have failed it now. Get better interview material.
7
2
522
55,058
Found the bug. Trains in 2 minutes. No collisions/randomization for first test but pretty zippy! Multi-task + domain randomized next
13
24
488
35,381
I just stream everything except private sim dev for clients. This is all from this week
Replying to @jsuarez
Bruh how tf you stream so much, some kind of mad man
10
20
488
149,218
A lot of cutting-edge simulation development for RL looks like this type of low-level game dev
flecs, raylib, ttf, aheasing, kdtree, gpu mesh instancing loving C perfection through minimalism, everything pixel perfect and snappy as hell #flecs #raylib #indiedev #gamedev @ajmmertens @raysan5
9
17
461
36,435
RL is useless... except if you want super-human perf on games, control, LLMs, chip design, rideshare matching, 5G, and more! It's also an area where you can make major progress with very few resources. Join PufferAI's open source efforts at discord gg/puffer or DM me!
14
24
446
45,852
Offline RL is not RL. RL is about interaction. No interaction, no RL.
61
21
458
174,959
RL games research hard trolled from 2015-2025 by writing 10,000x slow code. We fixed it and now it feels like something straight out of scifi. Currently scaling the exact same methods to a variety of industry sims. LLMs are the nerdsnipe braindraining all other AI progress
the 3 largest nerdsnipes of my career: - RL for games - Graph Neural Nets / MPNN - AlphaGo / MuZero It's amazing to watch the AI community wander right into these tarpits. It's also amazing to watch RL at least be useful at scale this time around.
11
9
454
50,412
Welcome to academia. I submitted PufferLib when it made RL 10x faster. Rejected. 100x? Rejected. 1000x? Okay fine sry published + award. You have to be really, really stubborn
What are these reviewers doing?? We've been doing something like this since @dmayhem93 joined as RL lead and it's the single biggest efficiency gain you can add to an rl infra right now if you don't already use it.
10
8
441
77,003
Listening to this dude is a waste of time
Reinforcement learning is revealed yet again to be a waste of time.
26
3
426
57,441
Today, I wrote three kernels fusing various gates and scans in our recurrent cell and a fused PPO loss kernel. Result is +2M steps/second training before I even start optimizing. Streamed >12 hours of dev today. Tomorrow, I lift weights all day and relax. Star pufferlib. Gn
11
9
419
28,881
Replying to @sama
... So you're just deleting the subscription and replacing it with an API wrapper, but still charging monthly up front
8
5
409
31,500
Early prototype of the new drone racing sim. Every drone here is a different size, weight, axial inertias, etc. We reinforcement learn the policy in <2 minutes with PufferLib. This is an extension to the original sim submitted by Fin and Sam
20
29
412
24,882
The Full RL Iceberg - everything wrong with reinforcement learning and how PufferLib is fixing it Join me for a dive through 10 layers of the RL stack. There's something here for beginners and world-class experts alike. Star the project on GitHub to feed the puffer!
18
40
387
48,176
Never stopped. Now 1000x faster
It's funny how almost a decade later, the frontier labs are back at the same spot, building RL gyms
3
18
380
46,480
RL Quickstart guide: 1. Papers: DQN, GAE, PPO, OpenAI Five, AlphaStar. 2. Read CleanRL DQN and PPO implementations (500 lines total). 3. Watch my RL Iceberg video + read PufferLib paper. You now know enough to be useful. Star pufferai/pufferlib and come build with us on Discord!
8
28
366
22,376
The pufferlib core is a few thousand lines. You have to be borderline nuts to believe that 1000x incompetence can exist, but it does
If tinygrad succeeds, the effects will ripple far beyond a Tensor library. Can ~15k lines replace a 15 million line stack? Is software three orders of magnitude too large?
6
9
369
63,932
The SF echo-chamber must be truly airtight for such sad, scared fan-fiction to gain traction. At risk of giving he who must not be named even more attention, here are some thoughts on the piece and AI in general.
22
16
363
78,138
They used hundreds of GPUs and way more CPUs to simulate and train 100B steps. PufferLib simulates and trains 100B steps on your desktop. You can do fun RL with us right now!
those were such fun times
3
14
358
25,229
Remember when RL used to take hours to solve pong? Our latest PufferLib sweep solves in 4 seconds. Release soon!
9
18
359
25,368
If this works, it will make wandb PufferLib's top pick for monitoring. As is right now, all the web platforms are bloated and unusable for any reasonable amount of runs.
🤫 Something's been brewing in stealth. Our SDK team's side project, codenamed W&B LEET, is being unleashed. We are releasing a full Terminal UI (TUI) for live, interactive W&B monitoring right in your terminal. No browser, no internet, no problem.
5
16
349
34,997
Replying to @willccbb
Oh they're good at visualizing flows alright
1
1
339
6,126
Reinforcement learning on 100,000,000,000 observations overnight on a single TinyBox has set new SOTA on Neural MMO 3. Previous best was just under 6.0. This has an effective batch size of 3 million and a minibatch of ~180k. Star PufferLib to support and come dev with us!
6
27
340
28,420
Reinforcement Learning + Material Science nitter.app/i/broadcasts/1rmxPyoaM…
3
28
324
23,006
Replying to @vikhyatk
Sure, but you look stupid af after 1 minute
4
309
13,471
Reinforcement learning research is hard blocked by lack of fast, easy to use environments. PufferLib is fixing this, all open source. I provide compute and mentorship to contributors. Only req is being able to write good code. DM me if you want to help!
5
21
315
21,045
We are not blank slates. I have given this response every time this topic has come up for the past ~7 years. At least a few dozen times by now. Someone still makes this argument in conversation at least every few months.
"the human brain doesn't need tons of training data to do stuff"
15
11
314
31,109
I post a lot about how good RL is with PufferLib that I'm realizing sounds increasingly grifty. Please just go try it. It's free. If you've done RL years ago, it will feel like a different field. We have new programmers doing RL on custom sims. That wasn't a thing before.
11
9
310
19,825
Come say hi! Poster 41 at rlc
5
8
295
9,314
It is a special form of torture to see the field I have dedicated the last decade of my life to perverted and twisted by grifters on every billboard lining the highway to sf
12
4
285
23,715
You're not Karpathy. Mute the AI slop and learn to code. Supermaven for 1-line autocomplete is great. Save typing, not thinking.
15
13
268
27,879
Yo moron. That one's actually useful. Ideas help in multiagent AI and other algos
4
335
16,269
Findings from today:
- Jax is not magically faster than torch
- Fighting torch compile sucks but you can usually get more perf out of it if you do
- You're not getting more than a few TFLOPS out of a small LSTM with small batches either way
- Large batches still 20% MFU
Bah.
12
9
283
20,197
The next version will be even faster!
you guys should actually just go run the code. It's literally just a pip install. Install it and train a model on your computer in 60 seconds. Then literally just go read the code. It's actually simple Lots of AI salesmen selling complicated bullshit. This is simple and good
5
3
275
62,258
RL researcher? You've had to suffer. But now there's Puffer! Pong solved in ~80 seconds on 1 GPU. Star pufferai/pufferlib for more!
4
11
260
28,488
PufferAI's current collection of RL environments, playable on our site. The majority are by contributors, and we're merging a few new ones soon! Most train in minutes on your laptop, and we have extensive guides on how to get started. Check my articles tab for more!
4
16
268
24,272
Reinforcement Learning Research Live nitter.app/i/broadcasts/1BRKjmnAn…
2
14
265
26,204
PufferLib RLC 2025 Outstanding Paper: RL at Millions of Steps per Second. Free + open source for academic and commercial use. We offer support packages for businesses with existing internal RL efforts and larger contracts on solving new problems.
7
26
270
11,839
And LLMs won't even be the biggest application! Massive but diffuse impact across industries. Anywhere you can build sims
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly increase (/decrease) the probability of every action I took for the future". You get a lot more leverage from verifier functions than explicit supervision, this is great. But first, it looks suspicious asymptotically - once the tasks grow to be minutes/hours of interaction long, you're really going to do all that work just to learn a single scalar outcome at the very end, to directly weight the gradient? Beyond asymptotics and second, this doesn't feel like the human mechanism of improvement for majority of intelligence tasks. There's significantly more bits of supervision we extract per rollout via a review/reflect stage along the lines of "what went well? what didn't go so well? what should I try next time?" etc. and the lessons from this stage feel explicit, like a new string to be added to the system prompt for the future, optionally to be distilled into weights (/intuition) later a bit like sleep. In English, we say something becomes "second nature" via this process, and we're missing learning paradigms like this. The new Memory feature is maybe a primordial version of this in ChatGPT, though it is only used for customization not problem solving. Notice that there is no equivalent of this for e.g. Atari RL because there are no LLMs and no in-context learning in those domains. Example algorithm: given a task, do a few rollouts, stuff them all into one context window (along with the reward in each case), use a meta-prompt to review/reflect on what went well or not to obtain string "lesson", to be added to system prompt (or more generally modify the current lessons database). Many blanks to fill in, many tweaks possible, not obvious. 
Example of lesson: we know LLMs can't super easily see letters due to tokenization and can't super easily count inside the residual stream, hence 'r' in 'strawberry' being famously difficult. Claude system prompt had a "quick fix" patch - a string was added along the lines of "If the user asks you to count letters, first separate them by commas and increment an explicit counter each time and do the task like that". This string is the "lesson", explicitly instructing the model how to complete the counting task, except the question is how this might fall out from agentic practice, instead of it being hard-coded by an engineer, how can this be generalized, and how lessons can be distilled over time to not bloat context windows indefinitely. TLDR: RL will lead to more gains because when done well, it is a lot more leveraged, bitter-lesson-pilled, and superior to SFT. It doesn't feel like the full story, especially as rollout lengths continue to expand. There are more S curves to find beyond, possibly specific to LLMs and without analogues in game/robotics-like environments, which is exciting.
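The update described in the quoted post ("this happened to go well (/poorly), let me slightly increase (/decrease) the probability of every action I took") can be sketched as plain REINFORCE. This is a minimal illustration, assuming a state-free softmax policy and a single scalar return for the whole episode. The names `softmax` and `reinforce_update` are hypothetical and are not taken from any library.

```python
import math


def softmax(logits):
    """Convert logits to action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits, actions, episode_return, lr=0.1):
    """One REINFORCE step for a state-free softmax policy (illustrative only).

    Every action taken in the episode gets the same scalar weight:
    the episode return. A positive return raises the probability of
    the actions taken, a negative return lowers it.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a in actions:
        # Gradient of log pi(a) w.r.t. the logits is onehot(a) - probs.
        for i in range(len(logits)):
            grad[i] += (1.0 if i == a else 0.0) - probs[i]
    return [x + lr * episode_return * g for x, g in zip(logits, grad)]


if __name__ == "__main__":
    start = [0.0, 0.0]
    taken = [0, 0, 1]  # action 0 was chosen twice, action 1 once
    print(softmax(reinforce_update(start, taken, episode_return=+1.0)))
    print(softmax(reinforce_update(start, taken, episode_return=-1.0)))
```

The single scalar `episode_return` multiplies the gradient for every action in the episode, which is the "one scalar outcome at the very end" concern the post raises for long rollouts.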
8
17
260
42,003
This is why I just made all the RL research 1000x faster. You don't need h100s when we train on petabytes with 4090s
China is publishing MIT licensed AI models while MIT doesn't even have any modern GPUs that can properly run them.
5
6
259
21,631
What's the last research topic where you've been completely, unambiguously wrong? I'll go first: I thought optimizer research was just a nerd snipe for a decade. Then I integrated Muon into PufferLib and fully reswept hyperparams. Step change in capabilities, core default in 3.0.
12
5
262
29,668
Don't slouch Clean your room Don't mix beer and wine ... Don't train on-policy RL with off-policy data I'm sick of rules!
16
8
326
29,825
I have streamed 11 hours of RL dev today, 50 this week. Tomorrow I will lift weights and listen to books all day. Good night.
9
4
248
20,102
If you want to learn as much as possible about RL by reading only one paper... Read the OpenAI Five report on achieving superhuman performance at DoTA
10
8
250
17,164
There are no intuitions about what is going on here. MDPs are a bad model for real RL problems, and even for most toy ones. RL is hard to explain because your data comes from interacting with an environment. Non-stationary + hard to make fast!
RL is hard to explain to people because it doesn't really make any sense without internalizing strong intuitions about everything going on here
9
11
247
25,158
You're not GPU poor, you're data rich. It only took 168M params to beat DoTA world champs. Now what if we had complex envs on 1 cpu that could run as fast as the 50-100k cores used for OpenAI Five?
8
14
235
13,120
Raylib is the best library I've had the pleasure of using. It completely removes the friction from graphics, is flexible enough for scientific visualization, and works in your browser via wasm with no code changes!
WOW! raylib got its first Gold Sponsor!!! 🤯 It is @puffer_ai by @jsuarez, developers of PufferLib, an open-source Reinforcement Learning library for complex game environments! Every single sample game environment provided uses raylib! 🚀 Thanks for supporting raylib! ❤️
1
13
237
18,339
We're retrying all those late 2010's deep RL ideas but without trolling by wasting 99.9% of the compute on slow sims. Next release in a few weeks!
someone should probably retry all those late 2010s deep RL ideas to see if they work on LLMs
5
13
235
16,819
Everyone go follow @vwxyzjn. CleanRL is the highest impact single contribution in RL. My work would not be possible without it. He better have more followers than me again by morning.
5
9
231
51,251
PufferLib 2.0: Reinforcement Learning at 1,000,000 steps/second 11 new environments, all >1M steps/second/core Human and agent-playable in your browser 20,000 lines of C. All free + open source. Star on GitHub to feed the puffer! Priority service for business from 10k/mo
6
23
226
49,647
This ain't complicated. LLM submission or review =
- Banned from submitting
- Banned from registering
- Uni/Company informed
- Suspended/fired if your org is worth a damn
An LLM-generated paper is in the top 17% of ICLR submissions in terms of average reviewer score, having received two 8's. The paper has tons of BS jargon and hallucinated references. Fortunately, one reviewer actually looked at the paper and gave it a zero. 1/3
16
8
230
57,244
The tinyboxes are here! 12x 4090s from @__tinygrad__ ready for reinforcement learning. Want to run experiments on these? Contribute cool environments to pufferlib for access!
6
6
224
13,612
A couple of undergrads PRed an RL env for drone control to PufferLib. So I sent them a drone to play with
12
8
228
15,427
Replying to @PandaAshwinee
Reviewer should be permabanned. Can't blame authors for fighting incoherent robo autorejects
1
1
215
9,113
Frontier RL is open source. PufferLib 3.0 uses a strict generalization of GAE and VTrace. You can recover either algorithm by setting gamma, lambda, rho_clip, and c_clip appropriately. It is very similar to retrace but for PPO.
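One way an advantage estimator could generalize both GAE and V-trace with the four knobs named above is sketched here. This is an assumption-laden illustration, not PufferLib's source: the function name `clipped_trace_advantages`, its argument layout, and the exact placement of the clipped importance ratios are hypothetical. With all ratios equal to 1 and both clips at or above 1, the recursion is ordinary GAE. With `lam=1`, it matches the V-trace value-target correction `v_s - V(x_s)`.

```python
def clipped_trace_advantages(rewards, values, dones, ratios,
                             gamma=0.99, lam=0.95,
                             rho_clip=1.0, c_clip=1.0):
    """Backward recursion mixing GAE with V-trace style ratio clipping.

    rewards, dones, ratios: length T lists for one trajectory.
    values: length T + 1 list (last entry is the bootstrap value).
    ratios: importance ratios pi(a|s) / mu(a|s); all 1.0 when on-policy.
    Hypothetical sketch, not a library API.
    """
    T = len(rewards)
    advantages = [0.0] * T
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 0.0 if dones[t] else 1.0
        rho = min(ratios[t], rho_clip)  # clips the TD error weight
        c = min(ratios[t], c_clip)      # clips the trace carried backward
        delta = rho * (rewards[t] + gamma * values[t + 1] * nonterminal
                       - values[t])
        last = delta + gamma * lam * c * nonterminal * last
        advantages[t] = last
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 1.0]
    values = [0.0, 0.0, 0.0]
    dones = [False, False]
    # On-policy data: identical to GAE with gamma=0.5, lambda=1.
    print(clipped_trace_advantages(rewards, values, dones, [1.0, 1.0],
                                   gamma=0.5, lam=1.0))
    # Off-policy data: ratios above the clip are capped, ratios below pass through.
    print(clipped_trace_advantages(rewards, values, dones, [2.0, 0.5],
                                   gamma=0.5, lam=1.0))
```

Setting `rho_clip` and `c_clip` very large leaves the raw importance ratios in place, while setting them to 1.0 caps how much any off-policy sample can amplify the update.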
OpenAI employees saying GRPO and other open-source research is significantly behind frontier tech what kinds of RL algorithms could frontier labs be using?
4
14
219
24,947
At what point does perf optimization get ridiculous. During my PhD, everything was 500-5000 sps. Then I got 10k and was very proud. Then 100k in early versions of PufferLib. Then 1M in 2.0... and now we're at up to 6M productive SPS on some RL envs
6
8
215
12,270
Replying to @tekbog
If you can replace something with a 300 line script... do it. If you mean "can I spend 3 months overengineering more garbage" ... then no, stop it, get some help
1
1
203
8,056
Replying to @tenobrus
death is stupid and we should cure it
11
1
209
17,794
🧵 You have a PyTorch model, an environment, and an RL framework. They should work together but don't. Today, I'm releasing PufferLib, a toolkit that makes them play nice. Initial support for CleanRL and RLlib. pufferai.github.io/
5
39
208
53,582
Since I've been getting lots of questions today - PufferAI is a private reinforcement learning lab with all OSS research and tools. Our business is helping companies solve RL problems and in-house the capabilities. DM if you would like to chat!
7
3
207
33,178
Yesterday, I streamed ~11 hours of RL data vis dev in C. It was my day off from exercise. Today, I lifted weights all day. ~50 sets. It was my day off from thinking.
14
1
206
35,511
HeavyBall's Muon from @Clashluke outperforms PyTorch's Muon. I implemented a numerically-matched version in <150 lines. There's a cpp version, too! In the latest PufferLib 4 dev branch. Star the repo to feed the puffer!
9
9
212
12,565
Me: I ported Muon to cpp! Time to PR to torch! Torch: Cool, give me an hour to compile every possible flash attention kernel
9
4
207
14,710
Announcing the Neural MMO 2.0 Competition on Multi-Task Reinforcement Learning and Curriculum Generation at NeurIPS 2023! Partnered with @StabilityAI @carperai @ParametrixAI @aicrowdHQ! Details in the coming weeks … 1/4 🧵
6
38
200
46,168
The highway to SF is paved with LLM agent billboards. No AI for revolutionizing manufacturing. No massive breakthroughs in other fields powered by AI. Just the Hollywood of saas. We have magic scifi tech, and the rest of the world should look the part!
16
5
199
14,082
We beat Pokemon Red with online RL! Details here over the next several days. Led by @dsrubinstein. Follow him, me, @DanAdvantage, @kywch500, @computerender for more!
Excited to finally share our progress in developing a reinforcement learning system to beat Pokémon Red. Our system successfully completes the game using a policy under 10M parameters, PPO, and a few novel techniques. Blog posted below
8
20
204
145,608
NeurIPS D&B reviews are out. Well done Academia. I'm out. No more papers
9
5
193
105,096
After over a year of full-time development, PufferLib has reached 3,000 stars and ~2,000 Discord members! The future of reinforcement learning is grassroots OSS + ultra performant simulation
5
5
197
9,410