Jakob Foerster · Apr 21, 2025 · 4:14 PM UTC

Pinned Tweet

Jakob Foerster

@j_foerst

21 Apr 2025

Making offline RL more honest, reproducible, and robust.

Matthew Jackson @JacksonMattT

18 Apr 2025

🌹 Today we're releasing Unifloral, our new library for Offline Reinforcement Learning! We make research easy: ⚛️ Single-file 🤏 Minimal ⚡️ End-to-end Jax Best of all, we unify prior methods into one algorithm - a single hyperparameter space for research! ⤵️

133

61,192

Jakob Foerster · Oct 9, 2024 · 2:35 PM UTC

Jakob Foerster

@j_foerst

9 Oct 2024

When I discussed quitting Google to do a PhD, my manager, Steve Cheng, gave me the advice of "6 shots": Doing something meaningful usually takes about 5 years and we are productive for roughly 30 years. That gives you 6 attempts. So pick each one carefully and give it your best.

132

2,256

21,301

1,374,109

Jakob Foerster · Oct 9, 2024 · 2:53 PM UTC

Jakob Foerster

@j_foerst

9 Oct 2024

At the meta level, looking back I think it's mind-boggling how much positive impact a few minutes of good advice can have. Giving (and listening to) life advice is one of the highest ROI activities ever.

1,759

87,460

Jakob Foerster · May 17, 2022 · 4:01 PM UTC

Jakob Foerster

@j_foerst

17 May 2022

I drafted a quick "How to" guide for writing ML papers. I hope this will be useful (if a little late!) for #NeurIPS2022. Happy paper writing and best of luck!! docs.google.com/document/d/1…

How to ML Paper - A brief Guide

How to ML Paper - A brief Guide Feel free to comment / share and happy paper writing! Also, please see caveats* below. If you like this, why not follow How to ML on Twitter and share the advice/love?...

docs.google.com

281

1,367

Jakob Foerster · Nov 4, 2024 · 3:51 PM UTC

Jakob Foerster

@j_foerst

4 Nov 2024

Currently Deep RL is going through an ImageNet moment and very few people are aware. This has major implications for RL applications and anyone interested in modeling behaviour (e.g. Econ and neuroscience). To find out more watch my recent talk @ICML2024: slideslive.com/39022179

Jakob Foerster · Reinforcement Learning at the Hyperscale

Deep reinforcement learning is currently undergoing a revolution of scale, fuelled by jointly running the environment, data collection, and training loop on the GPU, which has resulted in orders of...

slideslive.com

113

836

69,288

Jakob Foerster · Oct 23, 2024 · 5:19 PM UTC

Jakob Foerster

@j_foerst

23 Oct 2024

Cold emails are hard and good ones can change a life. Here is my email to @NandoDF that started my career in ML (at the time I was a PM at Google) docs.google.com/document/d/1… Real effort (incl feedback) went into drafting it. Thanks to @EugeneVinitsky for nudging me to put it online

Jakob Foerster's 2015 Cold Email to Nando de Freitas

Subject Follow-up: PhD positions this fall Body Hi Nando, I would like to follow up on the first contact my friend Christoph made today in your Machine Learning lecture at Oxford. I am very interes...

docs.google.com

736

326,475

Jakob Foerster · Sep 21, 2022 · 7:12 PM UTC

Jakob Foerster

@j_foerst

21 Sep 2022

My "How to ML Paper - A brief Guide" (docs.google.com/document/d/1…) is getting visitors again! Good luck with your #ICLR2023 submissions :)

618

Jakob Foerster · Jun 29, 2025 · 1:57 PM UTC

Jakob Foerster

@j_foerst

29 Jun 2025

I was working at Google before my PhD. But quitting tech to do a PhD allowed me to retool/retrain and to rebrand. Both our skills and how others see them can be limiting factors.

Noam Brown

@polynoamial

28 Jun 2025

You don’t need a PhD to be a great AI researcher. Even @OpenAI’s Chief Research Officer doesn’t have a PhD.

573

54,315

Jakob Foerster · Nov 26, 2024 · 6:09 PM UTC

Jakob Foerster

@j_foerst

26 Nov 2024

My group at Oxford (@FLAIR_Ox) is talent rich but GPU poor (both compared to industry), so adding more GPUs would be a win for open science, but is difficult to finance from grants. Does anyone have leads for possible donors? Christmas is coming up so I guess I am allowed to dream

571

75,876

Jakob Foerster · Dec 5, 2020 · 6:21 PM UTC

Jakob Foerster

@j_foerst

5 Dec 2020

The gradient is a locally greedy direction. Where do you get if you follow the eigenvectors of the Hessian instead? Our new paper, “Ridge Rider” (papers.nips.cc/paper/2020/fi…), explores how to do this and what happens in a variety of (toy) problems (if you dare to do so)... Thread 1/N

560
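As a rough illustration of the question posed in the Ridge Rider tweet above, the sketch below starts at a saddle point, where the gradient is zero and gradient descent would not move, and instead steps along the Hessian eigenvector with the most negative eigenvalue. This is only a toy 2-D example and not the algorithm from the paper; the function `f`, the helper names, the finite-difference Hessian, and the step size are all assumptions made for illustration.

```python
import math

def f(x, y):
    # Hypothetical toy saddle: a minimum along x, a maximum along y.
    return x**2 - y**2

def hessian(fn, x, y, h=1e-3):
    """Central finite-difference Hessian entries (fxx, fxy, fyy) of a 2-D function."""
    fxx = (fn(x + h, y) - 2 * fn(x, y) + fn(x - h, y)) / h**2
    fyy = (fn(x, y + h) - 2 * fn(x, y) + fn(x, y - h)) / h**2
    fxy = (fn(x + h, y + h) - fn(x + h, y - h)
           - fn(x - h, y + h) + fn(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def min_eigenpair(fxx, fxy, fyy):
    """Smallest eigenvalue and a unit eigenvector of [[fxx, fxy], [fxy, fyy]]."""
    mean = (fxx + fyy) / 2
    radius = math.hypot((fxx - fyy) / 2, fxy)
    lam = mean - radius
    # Any non-zero solution of (A - lam * I) v = 0 is an eigenvector.
    vx, vy = fxy, lam - fxx
    if math.hypot(vx, vy) < 1e-12:
        vx, vy = lam - fyy, fxy
    if math.hypot(vx, vy) < 1e-12:
        vx, vy = 1.0, 0.0  # Hessian is a multiple of the identity.
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)

def ride(fn, x, y, step=0.1, n_steps=5):
    """Repeatedly step along the most negative-curvature eigenvector."""
    prev = None
    for _ in range(n_steps):
        _, (vx, vy) = min_eigenpair(*hessian(fn, x, y))
        # Eigenvectors have no preferred sign; keep the direction consistent.
        if prev is not None and vx * prev[0] + vy * prev[1] < 0:
            vx, vy = -vx, -vy
        prev = (vx, vy)
        x, y = x + step * vx, y + step * vy
    return x, y

lam0, _ = min_eigenpair(*hessian(f, 0.0, 0.0))
x_end, y_end = ride(f, 0.0, 0.0)
print(lam0, f(x_end, y_end))  # negative curvature, and a value below f(0, 0) = 0
```

Starting from the saddle at the origin, the loss drops even though the gradient there is exactly zero, which is the intuition behind following Hessian eigenvectors rather than the gradient.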

Jakob Foerster · Dec 14, 2023 · 5:31 PM UTC

Jakob Foerster

@j_foerst

14 Dec 2023

LLMs are finally catching up to deep RL - we have been training on test from long before it was cool.

521

70,917

Jakob Foerster · Sep 11, 2019 · 4:26 PM UTC

Jakob Foerster

@j_foerst

11 Sep 2019

Excited to be starting as an Assistant Prof (👨‍🎓!!) at the @UofT (Scarborough Campus) w/ appointment at the @VectorInst in September of 2020. I am looking for exceptional Master/PhD students and Postdocs to be starting with me next fall. Till then, ..

526

Jakob Foerster · Oct 1, 2022 · 8:57 AM UTC

Jakob Foerster

@j_foerst

1 Oct 2022

"this amounts to solving the multi-agent planning problem" Tesla has now realised that self-driving is a multi-agent problem.. piped.video/ODSJsviD_SU?t=3997 4 years ago I tried to explain to @elonmusk that once CV etc was working, this was the next frontier. He said SL is all you need.

442

Jakob Foerster · Nov 12, 2025 · 11:36 AM UTC

Jakob Foerster

@j_foerst

12 Nov 2025

FAIRwell, @ylecun. you will be missed.

460

83,159

Jakob Foerster · Oct 22, 2024 · 2:59 PM UTC

Jakob Foerster

@j_foerst

22 Oct 2024

Life update! I have returned to FAIR (@AIatMeta) 50% of my time where I'll be supporting @yorambac in building up the Multi-Agent Universal Intelligence (MAUI) team in London. Instead of playing catchup, MAUI's mission is methods which allow open-source and science to leapfrog!

422

34,017

Jakob Foerster · Oct 21, 2024 · 4:35 PM UTC

Jakob Foerster

@j_foerst

21 Oct 2024

unpopular opinion: ML conferences should charge $100 per submission. For accepted papers this would count towards the registration fee of the attending author, so it's free. Extra funds collected could be used e.g. for replication studies or other improvements to the review process

380

86,319

Jakob Foerster · Oct 9, 2024 · 7:16 PM UTC

Jakob Foerster

@j_foerst

9 Oct 2024

Replying to @tosinolaseinde @MuyiwaSaka

I quit and did the PhD. One of the best decisions. Have used this framework since as well for other big decisions.

372

27,674

Jakob Foerster · Oct 12, 2021 · 2:46 PM UTC

Jakob Foerster

@j_foerst

12 Oct 2021

Personal update: I just started as an Associate Prof in the engineering department @UniofOxford (and Tutorial Fellow @StAnnesCollege). It’s an incredible honour to return to this beautiful city and to have the chance to work with brilliant, friendly colleagues and students..

366

Jakob Foerster · Jul 12, 2024 · 8:16 AM UTC

Jakob Foerster

@j_foerst

12 Jul 2024

Waymo car failing to coordinate w/ another Waymo (credits in the comment). Interesting to see a toy example from my grant applications play out in the real world. Two cars playing a best-response to a human driver model are not mutually compatible, multi-agent challenges are real

349

65,324

Jakob Foerster · Sep 2, 2020 · 1:36 AM UTC

Jakob Foerster

@j_foerst

2 Sep 2020

Dear Reviewer: I don't really mind that you gave a low score because you had a suggestion for simplifying our method. I do mind that you evidently didn't read our rebuttal, where we tried your idea, showed that it doesn't work and explained why. We can all do better. Thanks a lot.

321

Jakob Foerster · Oct 23, 2025 · 11:31 PM UTC

Jakob Foerster

@j_foerst

23 Oct 2025

Google Brain around 2016 also was a very special place. People were pursuing a ton of diverse, exploratory and ambitious directions to push the field forward. Here's a section of @JeffDean's Google Brain "2017 Look-back", see if you can spot the transformer :) The full document is in the link below and is full of wisdom. It also features many of the ideas that are now finally becoming mainstream and some alternative approaches that have been forgotten by the community. Needless to say that many of the current "big shots" in AI were at Brain during that period (or had just left, @ilyasut!), often as interns (like me) or AI residents.

327

92,772

Jakob Foerster · Oct 23, 2025 · 8:20 PM UTC

Jakob Foerster

@j_foerst

23 Oct 2025

Yuandong was my manager during my first stint at FAIR. A fantastic researcher. Thank you for everything you have done for FAIR, Meta, and beyond (..and for taking any residual awkwardness out of being laid off by big tech..)

Yuandong Tian

@tydsh

23 Oct 2025

Several of my team members + myself are impacted by this layoff today. Welcome to connect :)

314

59,257

Jakob Foerster · Nov 30, 2023 · 6:30 PM UTC

Jakob Foerster

@j_foerst

30 Nov 2023

The field used to be 30 years behind Jürgen's ideas, now we have reduced the collective lag to 8 years thanks to OpenAI. If you extrapolate we might catch up by 2027. Singularity is near?

Jürgen Schmidhuber

@SchmidhuberAI

30 Nov 2023

Q*? 2015: reinforcement learning prompt engineer in Sec. 5.3 of “Learning to Think...” arxiv.org/abs/1511.09249. A controller neural network C learns to send prompt sequences into a world model M (e.g., a foundation model) trained on, say, videos of actors. C also learns to interpret answers of M, extracting algorithmic information from M. Acid test: does C learn its control tasks faster with M than without? Is it cheaper to learn C’s tasks from scratch, or to address algorithmic info in M in some computable way, enabling things such as abstract hierarchical planning and reasoning? 2018: collapsing C and M into a single network arxiv.org/abs/1802.08864 using the neural network distillation of 1991 nitter.app/SchmidhuberAI/st… 1990: online planning & reinforcement learning with recurrent world models and artificial curiosity / GANs: people.idsia.ch/~juergen/wor…

284

90,979

Jakob Foerster · Oct 23, 2024 · 10:33 AM UTC

Jakob Foerster

@j_foerst

23 Oct 2024

Doing a PhD in ML and tired of playing catch-up w arxiv and X? Catch yourself wondering what's next after LLMs run out of human data? Come do an internship with our Multi-Agent Universal Intelligence team at @AIatMeta to find out! Updated link metacareers.com/jobs/4498396… w @yorambac

281

31,568

Jakob Foerster · Sep 2, 2025 · 12:29 PM UTC

Jakob Foerster

@j_foerst

2 Sep 2025

After thousands of papers on meta-learning, the approach that ended up being successful (ICL) was an accidental byproduct of language modeling. Serendipity at its best and a good reminder that research needs to be open-ended and pursue a diversity of goals to escape local minima.

260

25,270

Jakob Foerster · Dec 13, 2023 · 6:22 PM UTC

Jakob Foerster

@j_foerst

13 Dec 2023

If I was @sundarpichai I would try to buy @perplexity_ai, urgently. Best time was a year ago, second best time is now. It's not good to be the second best product on the market in an area that's 90% (?) of your profit...

240

94,495

Jakob Foerster · Feb 15, 2018 · 3:16 PM UTC

Jakob Foerster

@j_foerst

15 Feb 2018

Excited to share "DiCE: The Infinitely Differentiable Monte Carlo Estimator": arxiv.org/abs/1802.05098 Try this one weird objective for correct any-order gradient estimators in all your stochastic graphs ;) With fantastic Oxford/CMU team: @greg_far @alshedivat @_rockt @shimon8282

225

Jakob Foerster · Nov 22, 2024 · 4:10 PM UTC

Jakob Foerster

@j_foerst

22 Nov 2024

Joao Henriques (joao.science) and I are hiring a fully funded PhD student (UK/international) for the FAIR-Oxford program. The student will spend 50% of their time @UniofOxford and 50% @AIatMeta (FAIR), while completing a DPhil (Oxford PhD). Deadline: 2nd of Dec AOE!!

231

51,567

Jakob Foerster · Jul 29, 2025 · 1:59 PM UTC

Jakob Foerster

@j_foerst

29 Jul 2025

I recently had a lunchtime conversation with a very senior AI researcher about how multi-agent problems differ from single-agent ones (their starting point was that they do not). One point that made them think: As computers scale, the rest of the world (i.e. the non-agentic parts) is not going to speed up or get more clever, so compute-scaling methods will succeed (think single-agent robotics). In contrast, other agents will also become smarter/faster. So finding successful methods here is not a question of compute alone. No matter how much compute I have for decision making, I will be compute limited if I need to model other agents in the environment with the same budget as part of my inner loop. As a corollary it follows that in the (long term) future almost all FLOPs will be spent on simulating other agents. Not many know this and you are invited to consider the implications for a second.

233

31,551

Jakob Foerster · Jan 26, 2019 · 4:29 PM UTC

Jakob Foerster

@j_foerst

26 Jan 2019

This has been an amazing journey that many of you have been part of. A true multi-agent endeavour 🤖😎 🤖😃🤖!! Huge thanks to the collaborators, friends, and institutions that made this possible.. Yours sincerely, Dr. Foerster (still getting used to it.. )

WhiRL (Now Part of BOLD)@whi_rl

26 Jan 2019

Huge congratulations to Dr. Jakob Foerster (@j_foerst) who successfully defended his PhD thesis "Deep Multi-Agent Reinforcement Learning" this week! 🎉🤓🎲🎓

200

Jakob Foerster · Jul 12, 2022 · 7:04 PM UTC

Jakob Foerster

@j_foerst

12 Jul 2022

Can an agent learn to optimise an MDP, while simultaneously encoding secret messages in its actions? Our ICML 2022 paper “Communicating via Markov Decision Processes” (arxiv.org/abs/2107.08295) shows: yes, indeed! @casdewitt, @MaxiIgl, @luisa_zintgraf, @zicokolter, @shimon8282 🧵

185

Jakob Foerster · Jan 21, 2025 · 4:53 PM UTC

Jakob Foerster

@j_foerst

21 Jan 2025

RL has always been the future and the future is now. Having an open-source version released _before_ major closed-source labs managed to rediscover this internally (as far as I know) is amazing.

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)

@rao2z

21 Jan 2025

So @karthikv792 checked out @deepseek_ai's R1 LRM on PlanBench (arxiv.org/abs/2206.10498)--and found that it is very much competitive with o1 (preview), but at a fraction of the cost. The fact that it is open source and doesn't hide its intermediate tokens opens up a rich avenue for understanding LRMS based on RL post-training. 1/

185

22,848

Jakob Foerster · Sep 11, 2025 · 1:39 PM UTC

Jakob Foerster

@j_foerst

11 Sep 2025

Replying to @_Mira___Mira_

Yes, we have done that! openai.com/index/nonlinear-c… 3 Linear layers is all you need for 99% accuracy on MNIST

Nonlinear computation in deep linear networks

openai.com

183

32,778

Jakob Foerster · Sep 18, 2019 · 5:31 PM UTC

Jakob Foerster

@j_foerst

18 Sep 2019

Thesis is online. Sorry for the delay & enjoy! ora.ox.ac.uk/objects/uuid:a5… Huge thanks to everyone involved in this multi-agent endeavor! 👨‍🎓👨‍🎓👨‍🎓..🤖🤖

169

Jakob Foerster · Nov 26, 2018 · 4:46 PM UTC

Jakob Foerster

@j_foerst

26 Nov 2018

Our practitioner's guide for turning RL into a differentiable loss function with any-order gradients is now available as a blog post with code examples. Huge thanks to @y0b1byte for pushing this!

WhiRL (Now Part of BOLD)@whi_rl

26 Nov 2018

We have a new blog post! Using higher-order gradients in your research? Working on Meta-Learning in RL? Learn about DiCE, an objective for correct any-order gradient estimators in stochastic graphs! 🤓🎲 whirl.cs.ox.ac.uk/blog/dice-…

161

Jakob Foerster · Jan 21, 2023 · 7:34 PM UTC

Jakob Foerster

@j_foerst

21 Jan 2023

Currently very little credit goes to the reviewers ('critics') compared to the authors ('generators'). As technology makes it easier and easier to generate ML papers, that balance needs to shift radically. Once it's easy to generate all papers, judging the good ones is the work

152

84,830

Jakob Foerster · Mar 20, 2023 · 8:23 PM UTC

Jakob Foerster

@j_foerst

20 Mar 2023

I am extremely grateful to my wonderful collaborators across different institutions and timezones who helped sharpen my thinking about coordination problems from a principled pov. This #ERCStG is an exciting next step towards machines that work smoothly and safely w/ humans 🤖+👤

Engineering Science, Oxford @oxengsci

20 Mar 2023

Professor Jakob Foerster has been awarded a 2.3m Euro, 5-year @ERC_Research starting grant to develop foundational #machinelearning algorithms for human-AI coordination in complex settings such as situations where humans & robots work alongside each other eng.ox.ac.uk/news/grant-to-f…

The research will set the foundations for AI systems that interact smoothly with human users in complex settings such as mixed-autonomy teams or traffic situations. This could have important applications in situations where humans and robots work alongside each other, such as in warehouses or service settings.

158

18,880

Jakob Foerster · Nov 3, 2022 · 6:37 PM UTC

Jakob Foerster

@j_foerst

3 Nov 2022

Today I was approached by an expert in the area of competitive games who shared their concerns about this work with me. Since I believe this feedback will be useful for the community and understand they like to protect their anonymity I am sharing it below 0/N

Adam Gleave

@ARGleave

2 Nov 2022

Even superhuman RL agents can be exploited by adversarial policies. In arxiv.org/abs/2211.00241 we train an adversary that wins 99% of games against KataGo 🖥️ set to top-100 European strength. Below our adversary 😈=⚫ plays a surprising strategy that tricks 🖥️=⚪ into losing.🧵

149

Jakob Foerster · Sep 3, 2020 · 7:56 PM UTC

Jakob Foerster

@j_foerst

3 Sep 2020

PSA: As scientists we spend a lot of time in meetings, but typically don't get much guidance (if any) on how to make them effective. Here are a few best practices around note-keeping I adopted for research meetings (incl. supervision etc.) from my time as a product manager: 1/6

150

Jakob Foerster · Dec 14, 2023 · 11:50 AM UTC

Jakob Foerster

@j_foerst

14 Dec 2023

It's time for ML academia to cut the cord/ our reliance on big tech. @NeurIPS and other ML conferences need to commit to and require open, reproducible science, rather than falling for PR gigs and product placements disguised as science. For better or worse the honeymoon is over.

Alex Hernandez-Garcia @alexhdezgcia

13 Dec 2023

The panel discussion at @NeurIPSConf about LLMs and beyond has just featured three panelists who were not willing to speak about the details of their work. It's secret stuff. Is this appropriate at a scientific conference?

149

29,550

Jakob Foerster · Jan 22, 2023 · 3:18 PM UTC

Jakob Foerster

@j_foerst

22 Jan 2023

Google invented the transformer and legacy auto developed the technology for early EVs. Both entities are now in "code red". Does anyone know other examples of this pattern? Also, it should have a name!

137

78,886

Jakob Foerster · Oct 15, 2019 · 11:39 PM UTC

Jakob Foerster

@j_foerst

15 Oct 2019

BBC headline: "Robot hand solves Rubik’s cube, but not the grand challenge". Also: "..OpenAI’s research paper was not peer-reviewed." Reporting on AI progress seems to be getting a lot more nuanced/accurate recently, a step in the right direction!(from:bbc.com/news/technology-5006…)

Robot hand solves Rubik’s cube, but not the grand challenge

Researchers hail robot hand as step in the right direction towards a multi-purpose robot at home or at work.

bbc.com

138

Jakob Foerster · Aug 10, 2020 · 1:31 AM UTC

Jakob Foerster

@j_foerst

10 Aug 2020

If you are disappointed/sad about @NeurIPSConf reviews, remember: a) Reviews are extremely noisy b) A good rebuttal can work magic c) Rejected papers have become best papers d) Look out for actionable insights, even if you disagree w/ score e) you may have been fortunate so far

135

Jakob Foerster · Mar 5, 2025 · 4:13 PM UTC

Jakob Foerster

@j_foerst

5 Mar 2025

What a well-timed Turing Award for a fantastic contribution. Great credit assignment :)

Association for Computing Machinery @TheOfficialACM

5 Mar 2025

Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! bit.ly/4hpdsbD

137

4,809

Jakob Foerster · Aug 13, 2024 · 6:14 AM UTC

Jakob Foerster

@j_foerst

13 Aug 2024

Scientific progress is one of humanity's most impressive and impactful intellectual achievements. We introduce The AI Scientist, the first AI to carry out end-to-end science, from ideation to implementation, data analysis, struggling w/ latex, reviewing and iterative improvement!

Sakana AI

@SakanaAILabs

13 Aug 2024

Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery! sakana.ai/ai-scientist/ From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI Scientist opens a new era of AI-driven scientific research and accelerated discovery. Here are 4 example Machine Learning research papers generated by The AI Scientist. We published our report, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, and open-sourced our project! Paper: arxiv.org/abs/2408.06292 GitHub: github.com/SakanaAI/AI-Scien… Our system leverages LLMs to propose and implement new research directions. Here, we first apply The AI Scientist to conduct Machine Learning research. Crucially, our system is capable of executing the entire ML research lifecycle: from inventing research ideas and experiments, writing code, to executing experiments on GPUs and gathering results. It can also write an entire scientific paper, explaining, visualizing and contextualizing the results. Furthermore, while an LLM author writes entire research papers, another LLM reviewer critiques resulting manuscripts to provide feedback to improve the work, and also to select the most promising ideas to further develop in the next iteration cycle, leading to continual, open-ended discoveries, thus emulating the human scientific community. As a proof of concept, our system produced papers with novel contributions in ML research domains such as language modeling, Diffusion and Grokking. We (@_chris_lu_, @RobertTLange, @hardmaru) proudly collaborated with the @UniOfOxford (@j_foerst, @FLAIR_Ox) and @UBC (@cong_ml, @jeffclune) on this exciting project.

127

25,362

Jakob Foerster · Dec 16, 2023 · 9:31 PM UTC

Jakob Foerster

@j_foerst

16 Dec 2023

Diffusion models have revolutionised a number of areas in ML, now they are coming for offline RL. In our paper we guide the samples to be closer to our current policy, reducing the off-policy-ness of the generated data. This will unlock novel world applications of off-policy RL.

Matthew Jackson @JacksonMattT

16 Dec 2023

Come check out a sneak peek of our work **Policy-Guided Diffusion** today at the NeurIPS Workshop on Robot Learning! Using offline data, we generate entire trajectories that are: ✅ On-policy, ✅ Without compounding error, ✅ Without model pessimism!

131

73,662

Jakob Foerster · Jul 31, 2025 · 1:47 PM UTC

Jakob Foerster

@j_foerst

31 Jul 2025

I used to think that sharing research ideas and insights publicly and informally (e.g. here) would universally increase the likelihood of those ideas becoming a reality. However, it could also have the opposite effect of creating "scorched earth" since readers who independently had had the same idea might now assume someone else is going to do it and may no longer feel ownership of their original insight. I don't have a good answer to this, but I think it's worth thinking about. One option might be sharing those ideas with a small randomly selected subset.

126

16,392

Jakob Foerster · Oct 9, 2024 · 2:51 PM UTC

Jakob Foerster

@j_foerst

9 Oct 2024

Replying to @firstadopter

nah, realistic

110

32,058

Jakob Foerster · Dec 16, 2024 · 9:34 PM UTC

Jakob Foerster

@j_foerst

16 Dec 2024

flying back from #NeurIPS2024: Academia and open-source are starting to "feel the AGI". if we coordinate better, we have magnitudes more brain power and creativity than all of the closed labs. new coordination tools also help prepare for and align AGI. win-win. 🧵

122

11,820

Jakob Foerster · Jun 18, 2025 · 12:27 PM UTC

Jakob Foerster

@j_foerst

18 Jun 2025

I suggest a new metric: Pass@1/K. For a given "K", you only get a point if all "K" attempts were successful. So it's a continuation of the Pass@K graph to the left-hand side and intuitively measures robustness / confidence.

121

11,201
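To make the Pass@1/K idea from the tweet above concrete, here is a minimal sketch that computes it next to a plain Pass@K on the same attempt logs. It assumes each problem comes with a list of boolean attempt outcomes and simply uses the first K attempts of each problem; it is not the unbiased Pass@K estimator used in some benchmarks. The function names and the `results` data are hypothetical.

```python
from typing import Sequence

def _first_k(outcomes: Sequence[bool], k: int) -> Sequence[bool]:
    """Return the first k attempt outcomes, checking that enough exist."""
    if k < 1 or len(outcomes) < k:
        raise ValueError("each problem needs at least k >= 1 attempts")
    return outcomes[:k]

def pass_at_k(attempts: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of problems where at least one of the first k attempts succeeded."""
    return sum(any(_first_k(a, k)) for a in attempts) / len(attempts)

def pass_at_one_over_k(attempts: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of problems where all of the first k attempts succeeded."""
    return sum(all(_first_k(a, k)) for a in attempts) / len(attempts)

# Hypothetical outcomes: one row per problem, one entry per attempt.
results = [
    [True, True, True, True],      # always solved
    [True, False, True, True],     # usually solved
    [False, False, True, False],   # rarely solved
    [False, False, False, False],  # never solved
]

print(pass_at_k(results, 1), pass_at_one_over_k(results, 1))  # 0.5 0.5
print(pass_at_k(results, 4), pass_at_one_over_k(results, 4))  # 0.75 0.25
```

The two metrics coincide at K = 1 and move apart as K grows: Pass@K rewards a single lucky success, while Pass@1/K only credits problems that are solved on every attempt.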

Jakob Foerster · Apr 10, 2024 · 4:38 PM UTC

Jakob Foerster

@j_foerst

10 Apr 2024

Diffusion is an extremely powerful and general purpose approach - here we combine it with _policy guidance_ to improve the distribution mismatch in offline RL, which in turn offers the chance to bring RL to the real world without having to collect online data.

Matthew Jackson @JacksonMattT

10 Apr 2024

🎮 Introducing the new and improved Policy-Guided Diffusion! Vastly more accurate trajectory generation than autoregressive models, with strong gains in offline RL performance! Plus a ton of new theory and results since our NeurIPS workshop paper... Check it out ⤵️

112

13,289

Jakob Foerster · Dec 6, 2019 · 5:44 PM UTC

Jakob Foerster

@j_foerst

6 Dec 2019

The research on Hanabi just got a lot more exciting - today we are adding search to the mix, vastly improving upon the previous SOTA 🎆🎇🤖 We are open-sourcing all code, incl. a new RL method and trained agents. A cooperative effort with @adamlerer, Hengyuan Hu, @polynoamial

AI at Meta

@AIatMeta

6 Dec 2019

To advance research on AI that can understand others’ points of view and collaborate effectively, Facebook AI has developed a bot that sets a new state of the art in Hanabi, a card game in which all players work together. ai.facebook.com/blog/buildin…

112

Jakob Foerster · Oct 9, 2024 · 8:44 AM UTC

Jakob Foerster

@j_foerst

9 Oct 2024

I am going on the record with this - when I grow up, I want to be like Geoff.

Jonathan Mannhart 🔎🔸@JMannhart

9 Oct 2024

“I'd also like to acknowledge my students (…) they've gone on to do many great things. I'm particularly proud of the fact that one of my students fired Sam Altman.“ 😳🫡

110

13,802

Jakob Foerster · May 27, 2025 · 3:19 PM UTC

Jakob Foerster

@j_foerst

27 May 2025

Hello World: My team at FAIR / @metaai (AI Research Agent) is looking to hire contractors across software engineering and ML. If you are interested and based in the UK, please fill in the following short EoI form: docs.google.com/forms/d/e/1F…

AIRA contracting - Expression of Interest

We are looking for contractors. If you have a track record of ML-Ops and / or SWE excellence and are looking to work with us on a contracting basis, please fill in below.

docs.google.com

114

15,851

Jakob Foerster · Apr 6, 2018 · 12:48 AM UTC

Jakob Foerster

@j_foerst

6 Apr 2018

Train and test sets for RL?! What is this, the 21st century??

OpenAI

@OpenAI

5 Apr 2018

Introducing the OpenAI Retro Contest — a contest where agents use their past experience to adapt to new environments: blog.openai.com/retro-contes…

107

Jakob Foerster · Jul 29, 2025 · 11:43 AM UTC

Jakob Foerster

@j_foerst

29 Jul 2025

I have a new example of why Multi-Agent / Zero-shot coordination matters in the real world for my slides! Interestingly, this problem is going to get worse rather than better as we deploy more autonomous vehicles: Currently humans act as a robust regularizer of the system and AVs can usually play a safe "best response" (such as being passive) while the humans navigate around them. I expect that the tail of this distribution requires large scale Multi-Agent training in simulation using self-Play. Self-Play opens the door to emergent protocols and over-coordination, i.e. learning policies that are not compatible with independently trained agents. I coined this the "zero-shot coordination" challenge a few years ago and it's still wide open, while also rapidly becoming relevant to the real world of agentic AI.

No Safe Words

@Cyber_Trailer

28 Jul 2025

San Francisco, CA (today). This is a banger. Gets better and better as the video goes on. File under ‘Non Safety Critical’ And under ‘WTF’

104

11,644

Jakob Foerster · Jul 7, 2025 · 1:17 PM UTC

Jakob Foerster

@j_foerst

7 Jul 2025

In May I missed a single email from OpenReview saying I'd be auto-enlisted as a reviewer. Then a few ACs missed my immediate and repeated messages on OpenReview saying that I won't be able to review since I'll be taking the second half of my paternity leave. Now all of my co-authors (most of them junior PhD students) are getting emails that their papers are at risk of being desk rejected since I haven't submitted my reviews. @NeurIPSConf - I appreciate the intent here but this is not good.

104

16,012

Jakob Foerster · May 15, 2024 · 8:00 PM UTC

Jakob Foerster

@j_foerst

15 May 2024

Good luck with @NeurIPSConf 2024 submissions.. docs.google.com/document/d/1…

How to ML Paper - A brief Guide

docs.google.com

12,502

Jakob Foerster · Jun 30, 2025 · 2:01 PM UTC

Jakob Foerster

@j_foerst

30 Jun 2025

The AIRA team @metaai has the ambitious goal of building/training an agent that can do frontier AI research to help the open-source ecosystem leapfrog closed source LLMs. As a relatively small team we cannot succeed in this mission without the support of the community so we'll be open-sourcing our tools, methods, and benchmarks along the way. 🚨Meet our LLM Speedrunning Benchmark,🚨 which probes the ability of LLM agents to do LLM engineering in the "GPT2 speedrun", which is fast enough for efficient, high signal evals. Crucially, past human records provide an existence proof for higher performance and allow us to test where the limiting factors for performance are (ideation vs implementation). Spoiler: both are currently a problem! Stay tuned - we are just getting started - and (even better) join the journey!

Minqi Jiang

@MinqiJiang

30 Jun 2025

Recently, there has been a lot of talk of LLM agents automating ML research itself. If Llama 5 can create Llama 6, then surely the singularity is just around the corner. How can we get a pulse check on whether current LLMs are capable of driving this kind of total self-improvement? Well, we know humans are pretty good at improving LLMs. In the NanoGPT speedrun challenge, created by @kellerjordan0, human researchers iteratively improved @karpathy's GPT-2 replication, slashing the training time (to the same target validation loss) from 45 minutes to under 3 minutes in just under a year (!). Surely, a necessary (but not sufficient) ability for an LLM that can automatically improve frontier techniques is the ability to *reproduce* known innovations on GPT-2, a tiny language model from over 5 years ago. 🤔 So we took several of the top models and combined them with various search scaffolds to create *LLM speedrunner agents*. We then asked these agents to reproduce each of the NanoGPT speedrun records, starting from the previous record, while providing them access to different forms of hints that revealed the exact changes needed to reach the next record. The results were surprising—not because we thought these agents would ace the benchmark, but because even the best agent failed to recover even half of the speed-up of human innovators on average in the easiest hint mode, where we show the agent the full pseudocode of the changes to the next record. We believe The Automated LLM Speedrunning Benchmark provides a simple eval for measuring the lower bound of LLM agents’ ability to reproduce scientific findings close to the frontier of ML. Beyond scientific reproducibility, this benchmark can also be run without hints, transforming into an automated *scientific innovation* benchmark. When run in "innovation mode," this benchmark effectively extends the NanoGPT speedrun to AI participants! 
While initial results here indicate that current agents seriously struggle to match human innovators beyond just a couple of records, benchmarks have a tendency to fall. This one is particularly exciting to watch, as new state-of-the-art here by definition implies a form of *superhuman innovation*.

102

12,793
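The phrase "recover even half of the speed-up" in the thread above can be made concrete with a small sketch. This is only one plausible way to score an agent against a human record, assuming three wall-clock training times per record (previous record, agent attempt, next human record). The function names and the example times are hypothetical and are not taken from the benchmark's code.

```python
def fraction_of_speedup_recovered(prev_time, agent_time, next_time):
    """Share of the human speed-up (prev -> next record) that the agent achieved.

    All arguments are training times to the same target validation loss.
    1.0 means the agent matched the next human record, 0.0 means no gain.
    """
    human_gain = prev_time - next_time
    if human_gain <= 0:
        raise ValueError("next record must be faster than the previous one")
    return (prev_time - agent_time) / human_gain


def mean_fraction_recovered(records):
    """Average the score over (prev_time, agent_time, next_time) triples."""
    scores = [fraction_of_speedup_recovered(*r) for r in records]
    return sum(scores) / len(scores)


# Hypothetical times in minutes, for illustration only.
example = [(10.0, 8.0, 6.0), (6.0, 6.0, 4.0)]
print(mean_fraction_recovered(example))
```

Under this reading, an average below 0.5 corresponds to the "failed to recover even half" result described in the thread.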

Jakob Foerster · Oct 18, 2023 · 4:52 PM UTC

Jakob Foerster

@j_foerst

18 Oct 2023

Moving to JAX has been a huge change (i.e. 1000x speedup) for our RL work at @FLAIR_Ox, it's really exciting to see Google Brain following suit here!! See our purejax library for sota implementations: github.com/luchris429/pureja…

GitHub - luchris429/purejaxrl: Really Fast End-to-End Jax RL Implementations

github.com

Google DeepMind

@GoogleDeepMind

18 Oct 2023

Introducing MuJoCo 3.0: a major new release of our fast, powerful and open source tool for robotics research. 🤖 📈 GPU & TPU acceleration through #JAX 🖼️ Better simulation of more diverse objects - like clothes, screws, gears and donuts 💡 Find out more: mujoco.org/3

Announcing MuJoCo 3.0

17,647

Jakob Foerster · Dec 22, 2019 · 10:54 PM UTC

Jakob Foerster

@j_foerst

22 Dec 2019

Our "Simplified Action Decoder" (openreview.net/forum?id=B1xm…, w/ Hengyuan Hu), current SOTA for RL(w/o search) on 2-5 player Hanabi🎇 will be #ICLR spotlight! The code, github.com/facebookresearch/…, includes trained agents and fast Pytorch version of R2D2&Ape-x 🔥github.com/facebookresearch/…🔥

Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning

We develop Simplified Action Decoder, a simple MARL algorithm that beats previous SOTA on Hanabi by a big margin across 2- to 5-player games.

openreview.net

Jakob Foerster · Dec 14, 2020 · 11:57 AM UTC

Jakob Foerster

@j_foerst

14 Dec 2020

When you wonder whether your WiFi isn't working because #Gmail, #Youtube and #GoogleDrive aren't responding. #Googledown?

Jakob Foerster · Dec 28, 2023 · 3:00 PM UTC

Jakob Foerster

@j_foerst

28 Dec 2023

How do you explain LLMs to the younger generation? @UniofOxford asked me to produce a 90s explainer, targeted at a TikTok audience. I don't use TikTok, but here is my attempt - feedback welcome and happy holidays!

University of Oxford

@UniofOxford

28 Dec 2023

EXPLAINED: What is an LLM? 🤔 Associate Prof @j_foerst shares everything you need to know about LLM (large language model) in 90 seconds. #OxfordAI

14,366

Jakob Foerster · Mar 10, 2020 · 2:52 AM UTC

Jakob Foerster

@j_foerst

10 Mar 2020

Moving beyond self-play: Communication, cooperation and coordination with humans and other AI systems zero-shot is one of the exciting frontiers of multi-agent learning. "Other-Play" is an exciting step in this direction! Thanks to a team of fantastic collaborators 🎇🎇🤖🙎‍♀️🎇🤖!

hengyuan-hu @HengyuanH

10 Mar 2020

How can we learn policies that can coordinate w/ humans (w/o human data)? 'Other-Play' (arxiv.org/abs/2003.02979 w/ @adamlerer @alex_peys @j_foerst) uses symmetries to avoid 'over-coordinating' during training. Final policies coordinate better w/ humans and bots in Hanabi🎇🙎‍♀️🤖🎇!

Jakob Foerster · Dec 10, 2022 · 1:37 AM UTC

Jakob Foerster

@j_foerst

10 Dec 2022

I am looking for an acronym for "Good Old Fashioned Machine Learning", i.e. supervised/RL systems etc that are trained for and good at a specific set of tasks and definitely know nothing about everything else (which is quite comforting). "GOFML" doesn't really roll off the tongue

Jakob Foerster · Apr 29, 2024 · 8:48 AM UTC

Jakob Foerster

@j_foerst

29 Apr 2024

I am honoured to have been awarded an Amazon Research Award for our proposal "Compute-only Scaling of Large Language Models" (i.e. Q* before it was cool!). Thanks to @AmazonScience amazon.science/research-awar… and to my amazing students @clockwk7 & @JonnyCoook! #AmazonResearchAwards

Amazon Research Awards

Collaborating with scientists around the globe to fund research, share knowledge and encourage innovation.

amazon.science

15,773

Jakob Foerster · Jul 19, 2022 · 1:36 AM UTC

Jakob Foerster

@j_foerst

19 Jul 2022

You think you understand why popular algorithms like PPO work? So did we @FLAIR_Ox, but then we “reflected” deeply upon it ;) Check out our @ICMLconf 2022 paper “Mirror Learning: A Unifying Framework of Policy Optimisation” (arxiv.org/pdf/2201.02373.pdf) w/ @kuba_AI, @casdewitt 1/N

Jakob Foerster · Jul 15, 2025 · 11:11 AM UTC

Jakob Foerster

@j_foerst

15 Jul 2025

Our benchmarks measure capabilities. What matters is the ability to learn and adapt. This disconnect is mind boggling.

6,818

Jakob Foerster · Aug 2, 2023 · 7:30 AM UTC

Jakob Foerster

@j_foerst

2 Aug 2023

Great to see activity on our short #HowToMLrebuttal guide -- good luck with #NeurIPS2023 rebuttals! docs.google.com/document/d/1… @HowTo_ML

12,563

Jakob Foerster · Jun 26, 2025 · 6:25 PM UTC

Jakob Foerster

@j_foerst

26 Jun 2025

Multi-agent interactions are the new frontier of AI and the ability to make sense of others (i.e. "theory of mind") is at the core of this 🧑‍🦰 ↔️🤖❓. Surprisingly, this is not commonly tested for in standard benchmarks. We address this with our Decrypto benchmark, which specifically focusses on ToM in a multi-turn setting, isolating it from common confounders such as symbolic reasoning or long term planning. We find LLMs do surprisingly poorly, so a lot of work needs to be done!

Andrei Lupu @_andreilupu

26 Jun 2025

Theory of Mind (ToM) is crucial for next gen LLM Agents, yet current benchmarks suffer from multiple shortcomings. Enter 💽 Decrypto, an interactive benchmark for multi-agent reasoning and ToM in LLMs! Work done with @TimonWilli & @j_foerst at @AIatMeta & @FLAIR_Ox 🧵👇

13,976

Jakob Foerster · Oct 23, 2024 · 5:19 PM UTC

Jakob Foerster

@j_foerst

23 Oct 2024

PS: Did he reply? No -- he was not taking students at the time. But, he did forward it to @shimon8282, then incoming faculty to Oxford, and the rest is history..

14,061

Jakob Foerster · Dec 10, 2024 · 10:15 AM UTC

Jakob Foerster

@j_foerst

10 Dec 2024

En route to #neurips2024 after traveling to Germany so that my wonderful in-laws can help take care of our two-under-two. 2024 has felt accelerated, both at the personal and professional level. Personally, our second son was born, professionally I went 50/50 with FAIR @AIatMeta🧵

9,689

Jakob Foerster · Nov 22, 2022 · 4:50 PM UTC

Jakob Foerster

@j_foerst

22 Nov 2022

If you are looking for a PhD position in ML, why not apply w @FLAIR_Ox? Deadline for applications is 9th Dec, instructions and recent work are on our website: foersterlab.com/research/. I am looking in particular for strong maths skills, creativity, and willingness to work in teams

Jakob Foerster · Nov 9, 2022 · 5:51 PM UTC

Jakob Foerster

@j_foerst

9 Nov 2022

Wow - @CompSciOxford is looking to hire not 1, 2 or 3 but 4 (!) professors in CS: cs.ox.ac.uk/aboutus/vacancie…. This is unprecedented (and weirdly timely..!) It's a fantastic department (and you get to collaborate with @oxengsci ;) I highly recommend applying. Deadline is 14th of Dec⏰

Jakob Foerster · Jul 29, 2025 · 2:08 PM UTC

Jakob Foerster

@j_foerst

29 Jul 2025

When I started my PhD in deep learning in 2015 I thought I was late to the party. When I bought a few nvidia shares over a beer at @NeurIPSConf 2016 I was sure I had missed the boat (given a 3x run-up) but told my peers "better late than never". Which boat did you miss?

7,998

Jakob Foerster · Feb 27, 2025 · 8:43 AM UTC

Jakob Foerster

@j_foerst

27 Feb 2025

value functions are losing value quickly

15,584

Jakob Foerster · Nov 21, 2024 · 6:41 PM UTC

Jakob Foerster

@j_foerst

21 Nov 2024

🎲Alea iacta est 🎲I am attending my first @NeurIPSConf conference since pre-covid! Super excited to see old friends and make new ones :) I'll be around from the 12th to the 16th, so come find me if you'd like to chat. Oh, and pack your running shoes + gloves.. #runconference

6,456

Jakob Foerster · Nov 22, 2024 · 11:54 AM UTC

Jakob Foerster

@j_foerst

22 Nov 2024

Dear reviewers, please engage. Dear ACs, please remind the reviewers to engage. Thank you everyone!

11,887

Jakob Foerster · Nov 11, 2024 · 4:59 PM UTC

Jakob Foerster

@j_foerst

11 Nov 2024

GenAI is changing the world but struggles with decision making/ taking actions. We push towards a foundation model for 2D control using #RLatTheHyperscale and show both zero-shot generalisation and fast fine-tuning!! All code is open source and you can be the agent!

Michael Matthews @mitrma

11 Nov 2024

We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵

9,331

Jakob Foerster · Feb 5, 2021 · 2:27 AM UTC

Jakob Foerster

@j_foerst

5 Feb 2021

"Complete proofs are in the appendix" (silently crosses fingers)

Jakob Foerster · Feb 19, 2025 · 7:58 PM UTC

Jakob Foerster

@j_foerst

19 Feb 2025

I am late to the party, but the full episode of my @MLStreetTalk is now out! Find it in the comments (pls no downranking, dear ranking system). Btw, I lost the scarf somewhere in Oxford. If you find it, please let me know - @MinqiJiang had gifted it to me and I like it a lot.

11,241

Jakob Foerster · Jul 20, 2024 · 8:18 AM UTC

Jakob Foerster

@j_foerst

20 Jul 2024

1/🚀 @FLAIR_Ox is coming to #icml2024 in Vienna 🎉 (I am literally posting from the train) and we are very excited to share our work with you! You can find us here ⬇️✨ see below 🔗 for clickable links

8,254

Jakob Foerster · Sep 9, 2025 · 1:51 PM UTC

Jakob Foerster

@j_foerst

9 Sep 2025

Come join us! @FLAIR_Ox has a long history of hosting visiting students and we are now trialling a slightly more formal process for a six-month internship in early 2026. The successful students will become fully-fledged members of FLAIR working on cutting edge ML in a wonderfully supportive environment featuring some of the smartest and nicest people I have ever had the chance to work with 🫶

Foerster Lab for AI Research (now part of BOLD)@FLAIR_Ox

9 Sep 2025

🚨🚨Introducing the FLAIR internship program!🚨🚨 We are looking for two talented students to join us for an internship working in FLAIR for 6 months (5th January to 4th July 2026)! For details and eligibility criteria, please check: foersterlab.com/internship/

13,611

Jakob Foerster · Oct 9, 2024 · 3:15 PM UTC

Jakob Foerster

@j_foerst

9 Oct 2024

Replying to @animesh_garg

great pointer. Personally I'd be happy for _one_ of these long shots to _really_ land. But even that's a high bar.. !

39,573

Jakob Foerster · Nov 20, 2023 · 3:42 PM UTC

Jakob Foerster

@j_foerst

20 Nov 2023

❤️JAX meets multi-agent RL, a match made in heaven❤️ This would have made so many things faster and easier in my life. Can't wait to see the amazing things that people will build on this using _academic compute_. The frontier of the open-world just moved by orders of magnitude 🤯

Chris Lu

@_chris_lu_

20 Nov 2023

Crazy times. Anyways, excited to unveil JaxMARL! JaxMARL provides popular Multi-Agent RL environments and algorithms in pure JAX, enabling an end-to-end training speed up of up to 12,500x! Co-led w/ @alexrutherford0 @benjamin_ellis3 @MatteoGallici Post: blog.foersterlab.com/jaxmarl…

10,359

Jakob Foerster · Jul 12, 2024 · 7:53 AM UTC

Jakob Foerster

@j_foerst

12 Jul 2024

DQN kick-started the field of deep RL 12 years ago, but Q-learning has recently taken a backseat compared to PPO and other on-policy methods. We introduce PQN, a greatly simplified version of DQN which is highly GPU compatible and theoretically supported by convergence proofs.

Matteo Gallici @MatteoGallici

12 Jul 2024

🚀 We're very excited to introduce Parallelised Q-Network (PQN), the result of an effort to bring Q-Learning into the world of pure-GPU training based on JAX! What’s the issue? Pure-GPU training can accelerate RL by orders of magnitude. However, Q-Learning heavily relies on replay buffers and target networks, making training computationally slow and memory-intensive on GPUs. As a result, researchers often prefer PPO, leaving Q-Learning behind 🥲 Our solution? Eliminate Q-Learning's legacy components. PQN challenges the standard DQN paradigm by training a Q-Network without replay buffers or target networks: just online Q-Learning with vectorised exploration and network normalization (layer or batch norm). Despite its simplicity, PQN sets a new strong baseline in many single-agent and multi-agent scenarios. Check out the thread for more details 🔥 📄 Paper: arxiv.org/abs/2407.04811 ⚙️ Code: github.com/mttga/purejaxql/ A fantastic collaboration with @mattiefoxcs and @benjamin_ellis3, and the support of the amazing people at @FLAIR_Ox directed by @j_foerst. Inspired by the groundbreaking work of @_chris_lu_ on compiling entire RL pipelines in GPU: github.com/luchris429/pureja…

7,954
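The recipe described in the thread above (online Q-learning on transitions from parallel environments, no replay buffer, no target network, normalised network inputs) can be illustrated with a toy sketch. This is not the PQN implementation: it assumes a linear Q-function over layer-normalised observations as a stand-in for a neural network, is written in plain Python rather than JAX, and omits exploration entirely. All names are hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalise a feature vector to zero mean and (roughly) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def q_values(weights, obs):
    """Linear Q-head on normalised features; weights holds one row per action."""
    feats = layer_norm(obs)
    return [sum(w * f for w, f in zip(row, feats)) for row in weights]


def online_q_update(weights, batch, gamma=0.99, lr=0.1):
    """One semi-gradient Q-learning step on a fresh batch of transitions.

    batch: list of (obs, action, reward, next_obs, done), e.g. one transition
    per parallel environment. The batch is used once and then discarded
    (no replay buffer), and the bootstrap target is computed with the same
    weights that are being updated (no target network).
    """
    grads = [[0.0] * len(row) for row in weights]
    for obs, action, reward, next_obs, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_values(weights, next_obs))
        target = reward + bootstrap
        feats = layer_norm(obs)
        q_sa = sum(w * f for w, f in zip(weights[action], feats))
        td_error = target - q_sa
        for j, f in enumerate(feats):
            grads[action][j] += td_error * f / len(batch)
    return [[w + lr * g for w, g in zip(row, g_row)]
            for row, g_row in zip(weights, grads)]


# Two actions, three features, a single terminal transition with reward 1.
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
batch = [([1.0, 2.0, 3.0], 0, 1.0, [0.0, 0.0, 0.0], True)]
new_weights = online_q_update(weights, batch)
print(q_values(new_weights, [1.0, 2.0, 3.0]))
```

After one step, the Q-value of the taken action moves from 0 towards the target of 1, while the other action's weights are unchanged.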

Jakob Foerster · Oct 2, 2025 · 7:07 PM UTC

Jakob Foerster

@j_foerst

2 Oct 2025

staying away from X leads to clarity of mind
clarity of mind leads to good ideas and insights
good ideas and insights want to be shared on X
sharing on X leads to engagement on X
engagement on X leads to loss of clarity
loss of clarity means nothing else to share
nothing to share means staying away from X

14,209

Jakob Foerster · Apr 6, 2023 · 4:03 PM UTC

Jakob Foerster

@j_foerst

6 Apr 2023

This is a fundamental shift regarding the RL capabilities of academic research labs. At @FLAIR_Ox we have now done a number of projects on single digit GPUs that would have taken an entire data centre to run using prior approaches. 4000x speed-up is quite a big deal, it turns out 🚀

Chris Lu

@_chris_lu_

6 Apr 2023

1/ 🚀 Presenting PureJaxRL: A game-changing approach to Deep Reinforcement Learning! We achieve over 4000x training speedups in RL by vectorizing agent training on GPUs with concise, accessible code. Blog post: chrislu.page/blog/meta-disco… 🧵

12,051

Jakob Foerster · Nov 7, 2018 · 2:57 AM UTC

Jakob Foerster

@j_foerst

7 Nov 2018

Agents learn to communicate by considering beliefs of others🤖📞🤖! Provides a way of exploring in the space of compatible encoders and decoders, getting around the "local minimum" problem of learning communication protocols. Huge thanks to a team of fantastic collaborators!🙏🙏

Google DeepMind

@GoogleDeepMind

6 Nov 2018

Bayesian Action Decoder (arxiv.org/abs/1811.01458): A new multi-agent RL method for learning to communicate via informative actions using ToM-like reasoning. Achieves the best known score for 2 players on the challenging #hanabigame

Jakob Foerster · Feb 22, 2023 · 9:34 AM UTC

Jakob Foerster

@j_foerst

22 Feb 2023

I watched Ex-Machina a few years ago. Looking back, the most unrealistic part of the movie is how much effort the scientists put into physically _isolating and containing_ the AI. Clearly they hadn't realised they can increase stock prices by just unleashing it on humanity ASAP.

6,280

Jakob Foerster · Jul 21, 2022 · 2:12 PM UTC

Jakob Foerster

@j_foerst

21 Jul 2022

Second session of the #runconference 🏃‍♂️ at #ICML2022 was a great success (photos below credit to @pcastr). For anyone who didn't make it today, we'll meet again tomorrow at 8am in front of the hilton.

Jakob Foerster · Oct 25, 2018 · 10:27 PM UTC

Jakob Foerster

@j_foerst

25 Oct 2018

Amazing @PyTorch implementation of our 2016 "Learning to Communicate with Deep MARL" paper. DIAL and RIAL for the win!! Goodbye, @TorchML and welcome to 2018 :) Also, the deadline for our NIPS emergent communication workshop is in 8 days - perfect timing..

Minqi Jiang

@MinqiJiang

25 Oct 2018

If you're interested in teaching deep reinforcement-learning agents to communicate with each other, check out my open-source PyTorch implementation of the classic RIAL and DIAL models by @j_foerst, @iassael, @NandoDF, and @shimon8282: github.com/minqi/learning-to…

Jakob Foerster · Nov 17, 2021 · 12:02 PM UTC

Jakob Foerster

@j_foerst

17 Nov 2021

How can we train RL agents that act optimally, *without* sharing any information between them through emergent conventions? "Off-Belief Learning" finally solves this! It takes the weirdness out of learning in Dec-POMDPs and is a huge leap for human-AI coordination & AI safety🤖🧑‍🔧

hengyuan-hu @HengyuanH

17 Nov 2021

How can AI agents discover human-compatible policies *without requiring human data*? An important step is to develop meaningful, interpretable conventions for communicating information, rather than relying on arbitrary encodings. (1)

Jakob Foerster · Jul 13, 2022 · 3:43 PM UTC

Jakob Foerster

@j_foerst

13 Jul 2022

General-sum games describe many scenarios, from negotiations to autonomous driving. How should an AI act in the presence of other learning agents? Our @icmlconf 2022 paper, “Model-Free Opponent Shaping”(M-FOS) approaches this as a meta-game. @_chris_lu_ @TimonWilli @casdewitt 🧵

Jakob Foerster · Feb 28, 2024 · 3:03 PM UTC

Jakob Foerster

@j_foerst

28 Feb 2024

Are you looking for an RL environment that is: 1) blazing fast 2) open-ended 3) language enabled 4) easy enough to get started on and 5) super fun to play? Your wish has been fulfilled! The only thing that's missing is the multi-agent extension :)

Michael Matthews @mitrma

28 Feb 2024

I’m excited to announce Craftax, a new benchmark for open-ended RL! ⚔️ Extends the popular Crafter benchmark with Nethack-like dungeons ⚡Implemented entirely in Jax, achieving speedups of over 100x 1/

7,990

Jakob Foerster · Oct 5, 2023 · 6:36 PM UTC

Jakob Foerster

@j_foerst

5 Oct 2023

Meta-learning is great, but what distribution of environments shall we train over to enable generalization? And wouldn't curriculum discovery for meta-learning be too compute intensive for a lab in academia? Curious? Then this is for you!

Matthew Jackson @JacksonMattT

5 Oct 2023

Meta-learned policy optimizers have shown incredible generalization, e.g. Grid-World to Atari games. But how do we discover training environments for truly general-purpose optimizers? I'm excited to announce our #NeurIPS2023 work studying this question!

10,092

Jakob Foerster · Oct 27, 2025 · 5:31 PM UTC

Jakob Foerster

@j_foerst

27 Oct 2025

Talent Density X Agency = Fun @FLAIR_Ox

5,745

Jakob Foerster · Jun 27, 2019 · 8:36 PM UTC

Jakob Foerster

@j_foerst

27 Jun 2019

someone asked me recently what breakthrough could prevent a major AI winter in the next 5 years. I said robotics and they looked confused.

Misha Denil @notmisha

27 Jun 2019

This is very impressive.

Jakob Foerster · Jul 13, 2020 · 5:07 PM UTC

Jakob Foerster

@j_foerst

13 Jul 2020

How can RL agents discover policies that can coordinate w/ humans w/o using human data? Why do we have to think beyond self-play and seriously consider Zero-Shot coordination? New (and improved??) 30min video on what I think is an exciting frontier for AI! piped.video/watch?v=VQ8h8kiQ…

Jakob Foerster · Feb 17, 2024 · 5:35 PM UTC

Jakob Foerster

@j_foerst

17 Feb 2024

Ok, it's been 24h so it's time for a resolution: This is a real video recorded by me. The fact that we genuinely can't tell whether this is real or not is really bothersome. Lastly, the audio and *Super-Human* tic-tac-toe (not a thing) were supposed to be little hints / giveaways

6,518