Associate Prof in ML @UniofOxford. Something Something Research Scientist @MetaAI. Something @BOLD_LAB_AI. Always #teamhuman. Opinions belong to the world.

Making offline RL more honest, reproducible, and robust.
🌹 Today we're releasing Unifloral, our new library for Offline Reinforcement Learning! We make research easy: ⚛️ Single-file 🤏 Minimal ⚡️ End-to-end Jax Best of all, we unify prior methods into one algorithm - a single hyperparameter space for research! ⤵️
4
11
133
61,192
When I discussed quitting Google to do a PhD, my manager, Steve Cheng, gave me the advice of "6 shots": Doing something meaningful usually takes about 5 years and we are productive for roughly 30 years. That gives you 6 attempts. So pick each one carefully and give it your best.
132
2,256
21,301
1,374,109
At the meta level, looking back I think it's mindboggling how much positive impact a few minutes of good advice can have. Giving (and listening to) life advice is one of the highest ROI activities ever.
8
57
1,759
87,460
Currently Deep RL is going through an ImageNet moment and very few people are aware. This has major implications for RL applications and anyone interested in modeling behaviour (e.g. Econ and neuroscience). To find out more watch my recent talk @ICML2024: slideslive.com/39022179
17
113
836
69,288
Cold emails are hard and good ones can change a life. Here is my email to @NandoDF that started my career in ML (at the time I was a PM at Google) docs.google.com/document/d/1… Real effort (incl feedback) went into drafting it. Thanks to @EugeneVinitsky for nudging me to put it online
16
61
736
326,475
My "How to ML Paper - A brief Guide" (docs.google.com/document/d/1…) is getting visitors again! Good luck with your #ICLR2023 submissions :)
5
82
618
I was working at Google before my PhD. But quitting tech to do a PhD allowed me to retool/retrain and to rebrand. Both our skills and how others see them can be limiting factors.
You don’t need a PhD to be a great AI researcher. Even @OpenAI’s Chief Research Officer doesn’t have a PhD.
13
24
573
54,315
My group at Oxford (@FLAIR_Ox) is talent rich but GPU poor (both compared to industry), so adding more GPUs would be a win for open science, but is difficult to finance from grants. Does anyone have leads for possible donors? Christmas is coming up so I guess I am allowed to dream
47
28
571
75,876
The gradient is a locally greedy direction. Where do you get if you follow the eigenvectors of the Hessian instead? Our new paper, “Ridge Rider” (papers.nips.cc/paper/2020/fi…), explores how to do this and what happens in a variety of (toy) problems (if you dare to do so)... Thread 1/N
5
71
560
LLMs are finally catching up to deep RL - we have been training on test from long before it was cool.
13
43
521
70,917
Excited to be starting as an Assistant Prof (👨‍🎓!!) at the @UofT (Scarborough Campus) w/ appointment at the @VectorInst in September of 2020. I am looking for exceptional Master/PhD students and Postdocs to be starting with me next fall. Till then, ..
54
33
526
"this amounts to solving the multi-agent planning problem" Tesla has now realised that self-driving is a multi-agent problem.. piped.video/ODSJsviD_SU?t=3997 4 years ago I tried to explain to @elonmusk that once CV etc was working, this was the next frontier. He said SL is all you need.
9
40
442
FAIRwell, @ylecun. you will be missed.
5
4
460
83,159
Life update! I have returned to FAIR (@AIatMeta) 50% of my time where I'll be supporting @yorambac in building up the Multi-Agent Universal Intelligence (MAUI) team in London. Instead of playing catchup, MAUI's mission is methods which allow open-source and science to leapfrog!
23
14
422
34,017
unpopular opinion: ML conferences should charge $100 per submission. For accepted papers this would count towards the registration fee of the attending author, so it's free. Extra funds collected could be used eg. for replication studies or other improvement to the review process
38
8
380
86,319
I quit and did the PhD. One of the best decisions. Have used this framework since as well for other big decisions.
7
4
372
27,674
Personal update: I just started as an Associate Prof in the engineering department @UniofOxford (and Tutorial Fellow @StAnnesCollege). It’s an incredible honour to return to this beautiful city and to have the chance to work with brilliant, friendly colleagues and students..
21
11
366
Waymo car failing to coordinate w/ another Waymo (credits in the comment). Interesting to see a toy example from my grant applications play out in the real world. Two cars playing a best-response to a human driver model are not mutually compatible, multi-agent challenges are real
8
30
349
65,324
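A toy illustration of the point in the post above (two cars that each best-respond to a human driver model are not mutually compatible). This is only a sketch: the two-action game, the payoff numbers, and the human model are all made up for illustration and are not taken from any grant application, paper, or real driving system.

```python
# Hypothetical 2-action "narrow gap" game; every number below is an assumption.
GO, WAIT = "go", "wait"
ACTIONS = (GO, WAIT)

# PAYOFF[(my_action, partner_action)] -> my payoff
PAYOFF = {
    (GO, GO): -10.0,    # both enter the gap: collision
    (GO, WAIT): 1.0,    # I pass first
    (WAIT, GO): 0.5,    # I yield and pass second
    (WAIT, WAIT): 0.0,  # nobody moves: deadlock
}

# Assumed model of an assertive human driver (probability of each action).
HUMAN_MODEL = {GO: 0.9, WAIT: 0.1}


def expected_payoff(my_action, partner_policy):
    """Expected payoff of my_action against a stochastic partner policy."""
    return sum(p * PAYOFF[(my_action, a)] for a, p in partner_policy.items())


def best_response(partner_policy):
    """Action with the highest expected payoff against partner_policy."""
    return max(ACTIONS, key=lambda a: expected_payoff(a, partner_policy))


# Each autonomous car independently best-responds to the human model.
av_action = best_response(HUMAN_MODEL)
payoff_vs_human = expected_payoff(av_action, HUMAN_MODEL)

# Now pair the car with a copy of itself instead of a human.
payoff_vs_copy = PAYOFF[(av_action, av_action)]

print(av_action, payoff_vs_human, payoff_vs_copy)
```

Under these assumed numbers the best response to the human model is to wait, which works well against an assertive human but leaves two such cars stuck waiting for each other, even though one going and one yielding would be better for both.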
Dear Reviewer: I don't really mind that you gave a low score because you had a suggestion for simplifying our method. I do mind that you evidently didn't read our rebuttal, where we tried your idea, showed that it doesn't work and explained why. We can all do better. Thanks a lot.
2
6
321
Google Brain around 2016 also was a very special place. People were pursuing a ton of diverse, exploratory and ambitious directions to push the field forward. Here's a section of @JeffDean's Google Brain "2017 Look-back", see if you can spot the transformer :) The full document is in the link below and is full of wisdom. It also features many of the ideas that are now finally becoming mainstream and some alternative approaches that have been forgotten by the community. Needless to say that many of the current "big shots" in AI were at Brain during that period (or had just left, @ilyasut!), often as interns (like me) or AI residents.
6
25
327
92,772
Yuandong was my manager during my first stint at FAIR. A fantastic researcher. Thank you for everything you have done for FAIR, Meta, and beyond (..and for taking any residual awkwardness out of being laid off by big tech..)
Several of my team members + myself are impacted by this layoff today. Welcome to connect :)
2
8
314
59,257
The field used to be 30 years behind Jürgen's ideas, now we have reduced the collective lag to 8 years thanks to OpenAI. If you extrapolate we might catch up by 2027. Singularity is near?
Q*? 2015: reinforcement learning prompt engineer in Sec. 5.3 of “Learning to Think...” arxiv.org/abs/1511.09249. A controller neural network C learns to send prompt sequences into a world model M (e.g., a foundation model) trained on, say, videos of actors. C also learns to interpret answers of M, extracting algorithmic information from M. Acid test: does C learn its control tasks faster with M than without? Is it cheaper to learn C’s tasks from scratch, or to address algorithmic info in M in some computable way, enabling things such as abstract hierarchical planning and reasoning?  2018: collapsing C and M into a single network arxiv.org/abs/1802.08864 using the neural network distillation of 1991 nitter.app/SchmidhuberAI/st… 1990: online planning & reinforcement learning with recurrent world models and artificial curiosity / GANs: people.idsia.ch/~juergen/wor…
8
19
284
90,979
Doing a PhD in ML and tired of playing catch-up w arxiv and X? Catch yourself wondering what's next after LLMs run out of human data? Come do an internship with our Multi-Agent Universal Intelligence team at @AIatMeta to find out! Updated link metacareers.com/jobs/4498396… w @yorambac
5
30
281
31,568
After thousands of papers on meta-learning, the approach that ended up being successful (ICL) was an accidental byproduct of language modeling. Serendipity at its best and a good reminder that research needs to be open-ended and pursue a diversity of goals to escape local minima.
12
22
260
25,270
If I was @sundarpichai I would try to buy @perplexity_ai, urgently. Best time was a year ago, second best time is now. It's not good to be the second best product on the market in an area that's 90% (?) of your profit...
31
11
240
94,495
Excited to share "DiCE: The Infinitely Differentiable Monte Carlo Estimator": arxiv.org/abs/1802.05098 Try this one weird objective for correct any-order gradient estimators in all your stochastic graphs ;) With fantastic Oxford/CMU team: @greg_far @alshedivat @_rockt @shimon8282
3
74
225
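For readers who want the "one weird objective" from the DiCE post above spelled out, here is a sketch of it as I understand the paper's formulation. The notation is my own shorthand: ⊥ is a stop-gradient operator, 𝒞 is the set of cost nodes, and 𝒲_c is the set of θ-dependent stochastic nodes that influence cost c.

```latex
\mathcal{L}_{\square} \;=\; \sum_{c \in \mathcal{C}} \square(\mathcal{W}_c)\, c,
\qquad
\square(\mathcal{W}) \;=\; \exp\bigl(\tau - \bot(\tau)\bigr),
\qquad
\tau \;=\; \sum_{w \in \mathcal{W}} \log p(w;\theta)

% Forward evaluation: \square(\mathcal{W}) = 1, so the objective equals the sampled cost.
% Differentiation:
\nabla_\theta\, \square(\mathcal{W})
  \;=\; \square(\mathcal{W}) \sum_{w \in \mathcal{W}} \nabla_\theta \log p(w;\theta)
```

Because the operator evaluates to 1 but reproduces itself under differentiation, the first derivative gives the usual score-function estimator and repeated differentiation keeps producing the correct higher-order terms.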
Joao Henriques (joao.science) and I are hiring a fully funded PhD student (UK/international) for the FAIR-Oxford program. The student will spend 50% of their time @UniofOxford and 50% @AIatMeta (FAIR), while completing a DPhil (Oxford PhD). Deadline: 2nd of Dec AOE!!
3
42
231
51,567
I recently had a lunch time conversation with a very senior AI researcher about how multi-agent problems differ from single-agent ones (their starting point was they do not). One point that made them think: As computers scale, the rest of the world (i.e. the non-agentic parts) is not going to speed up or get more clever, so compute-scaling methods will succeed (think single agent robotics). In contrast, other agents will also become smarter/faster. So finding successful methods here is not a question of compute alone. No matter how much compute I have for decision making, I will be compute limited if I need to model other agents in the environment with the same budget as part of my inner loop. As a corollary it follows that in the (long term) future almost all flops will be spent on simulating other agents. Not many know this and you are invited to consider the implications for a second.
24
17
233
31,551
This has been an amazing journey that many of you have been part of. A true multi-agent endeavour 🤖😎 🤖😃🤖!! Huge thanks to the collaborators, friends, and institutions that made this possible.. Yours sincerely, Dr. Foerster (still getting used to it.. )
Huge congratulations to Dr. Jakob Foerster (@j_foerst) who successfully defended his PhD thesis "Deep Multi-Agent Reinforcement Learning" this week! 🎉🤓🎲🎓
15
6
200
Can an agent learn to optimise an MDP, while simultaneously encoding secret messages in its actions? Our ICML 2022 paper “Communicating via Markov Decision Processes” (arxiv.org/abs/2107.08295) shows: yes, indeed! @casdewitt, @MaxiIgl, @luisa_zintgraf, @zicokolter, @shimon8282 🧵
7
33
185
RL has always been the future and the future is now. Having an open-source version released _before_ major closed-source labs managed to rediscover this internally (as far as I know) is amazing.
So @karthikv792 checked out @deepseek_ai's R1 LRM on PlanBench (arxiv.org/abs/2206.10498)--and found that it is very much competitive with o1 (preview), but at a fraction of the cost. The fact that it is open source and doesn't hide its intermediate tokens opens up a rich avenue for understanding LRMS based on RL post-training. 1/
9
10
185
22,848
Replying to @_Mira___Mira_
Yes, we have done that! openai.com/index/nonlinear-c… 3 Linear layers is all you need for 99% accuracy on MNIST
4
15
183
32,778
Thesis is online. Sorry for the delay & enjoy! ora.ox.ac.uk/objects/uuid:a5… Huge thanks to everyone involved in this multi-agent endeavor! 👨‍🎓👨‍🎓👨‍🎓..🤖🤖
5
18
169
Our practitioner's guide for turning RL into a differentiable loss function with any-order gradients is now available as a blog post with code examples. Huge thanks to @y0b1byte for pushing this!
We have a new blog post! Using higher-order gradients in your research? Working on Meta-Learning in RL? Learn about DiCE, an objective for correct any-order gradient estimators in stochastic graphs! 🤓🎲 whirl.cs.ox.ac.uk/blog/dice-…
2
52
161
Currently very little credit goes to the reviewers ('critics') compared to the authors ('generators'). As technology makes it easier and easier to generate ML papers, that balance needs to shift radically. Once it's easy to generate all papers, judging the good ones is the work
16
8
152
84,830
I am extremely grateful to my wonderful collaborators across different institutions and timezones who helped sharpen my thinking about coordination problems from a principled pov. This #ERCStG is an exciting next step towards machines that work smoothly and safely w/ humans 🤖+👤
Professor Jakob Foerster has been awarded a 2.3m Euro, 5-year @ERC_Research starting grant to develop foundational #machinelearning algorithms for human-AI coordination in complex settings such as situations where humans & robots work alongside each other eng.ox.ac.uk/news/grant-to-f…
24
5
158
18,880
Today I was approached by an expert in the area of competitive games who shared their concerns about this work with me. Since I believe this feedback will be useful for the community and understand they would like to protect their anonymity, I am sharing it below 0/N
Even superhuman RL agents can be exploited by adversarial policies. In arxiv.org/abs/2211.00241 we train an adversary that wins 99% of games against KataGo 🖥️ set to top-100 European strength. Below our adversary 😈=⚫ plays a surprising strategy that tricks 🖥️=⚪ into losing.🧵
2
14
149
PSA: As scientists we spend a lot of time in meetings, but typically don't get much guidance (if any) on how to make them effective. Here are a few best practices around note-keeping I adopted for research meetings (incl. supervision etc.) from my time as a product manager:1/6
1
17
150
It's time for ML academia to cut the cord/ our reliance on big tech. @NeurIPS and other ML conferences need to commit to and require open, reproducible science, rather than falling for PR gigs and product placements disguised as science. For better or worse the honeymoon is over.
The panel discussion at @NeurIPSConf about LLMs and beyond has just featured three panelists who were not willing to speak about the details of their work. It's secret stuff. Is this appropriate at a scientific conference?
5
14
149
29,550
Google invented the transformer and legacy auto developed the technology for early EVs. Both entities are now in "code red". Does anyone know other examples of this pattern? Also, it should have a name!
37
8
137
78,886
BBC headline: "Robot hand solves Rubik’s cube, but not the grand challenge". Also: "..OpenAI’s research paper was not peer-reviewed." Reporting on AI progress seems to be getting a lot more nuanced/accurate recently, a step in the right direction!(from:bbc.com/news/technology-5006…)
1
7
138
If you are disappointed/sad about @NeurIPSConf reviews, remember: a) Reviews are extremely noisy b) A good rebuttal can work magic c) Rejected papers have become best papers d) Look out for actionable insights, even if you disagree w/ score e) you may have been fortunate so far
1
7
135
What a well-timed Turing Award for a fantastic contribution. Great credit assignment :)
Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! bit.ly/4hpdsbD
4
137
4,809
Scientific progress is one of humanity's most impressive and impactful intellectual achievements. We introduce The AI Scientist, the first AI to carry out end-to-end science, from ideation to implementation, data analysis, struggling w/ latex, reviewing and iterative improvement!
Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery! sakana.ai/ai-scientist/ From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI Scientist opens a new era of AI-driven scientific research and accelerated discovery. Here are 4 example Machine Learning research papers generated by The AI Scientist. We published our report, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, and open-sourced our project! Paper: arxiv.org/abs/2408.06292 GitHub: github.com/SakanaAI/AI-Scien… Our system leverages LLMs to propose and implement new research directions. Here, we first apply The AI Scientist to conduct Machine Learning research. Crucially, our system is capable of executing the entire ML research lifecycle: from inventing research ideas and experiments, writing code, to executing experiments on GPUs and gathering results. It can also write an entire scientific paper, explaining, visualizing and contextualizing the results. Furthermore, while an LLM author writes entire research papers, another LLM reviewer critiques resulting manuscripts to provide feedback to improve the work, and also to select the most promising ideas to further develop in the next iteration cycle, leading to continual, open-ended discoveries, thus emulating the human scientific community. As a proof of concept, our system produced papers with novel contributions in ML research domains such as language modeling, Diffusion and Grokking. We (@_chris_lu_, @RobertTLange, @hardmaru) proudly collaborated with the @UniOfOxford (@j_foerst, @FLAIR_Ox) and @UBC (@cong_ml, @jeffclune) on this exciting project.
10
12
127
25,362
Diffusion models have revolutionised a number of areas in ML, now they are coming for offline RL. In our paper we guide the samples to be closer to our current policy, reducing the off-policy-ness of the generated data. This will unlock novel world applications of off-policy RL.
Come check out a sneak peek of our work **Policy-Guided Diffusion** today at the NeurIPS Workshop on Robot Learning! Using offline data, we generate entire trajectories that are: ✅ On-policy, ✅ Without compounding error, ✅ Without model pessimism!
4
19
131
73,662
I used to think that sharing research ideas and insights publicly and informally (e.g. here) would universally increase the likelihood of those ideas becoming a reality. However, it could also have the opposite effect of creating "scorched earth" since readers who independently had had the same idea might now assume someone else is going to do it and may no longer feel ownership of their original insight. I don't have a good answer to this, but I think it's worth thinking about. One option might be sharing those ideas with a small randomly selected subset.
11
8
126
16,392
Replying to @firstadopter
nah, realistic
2
1
110
32,058
flying back from #NeurIPS2024: Academia and open-source are starting to "feel the AGI". if we coordinate better, we have magnitudes more brain power and creativity than all of the closed labs. new coordination tools also help prepare for and align AGI. win-win. 🧵
3
9
122
11,820
I suggest a new metric: Pass@1/K. For a given "K" you only get a point if all "K" attempts were successful. So it's a continuation of the Pass@K graph to the left-hand side and intuitively measures robustness / confidence.
6
6
121
11,201
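The metric proposed in the post above can be sketched next to ordinary Pass@K. This is one possible reading, not an official definition: it assumes n recorded attempts per problem of which c succeeded, and uses the standard combinatorial estimator over size-K subsets drawn without replacement. The function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k attempts succeeds.

    The k attempts are drawn without replacement from n recorded
    attempts, c of which were successful.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_all_k(n: int, c: int, k: int) -> float:
    """Probability that all k drawn attempts succeed (the 'Pass@1/K' reading)."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    n, c = 10, 7  # illustrative numbers only
    for k in (1, 3, 5):
        print(k, round(pass_at_k(n, c, k), 4), round(pass_all_k(n, c, k), 4))
```

Both quantities coincide at K = 1, which matches the picture of extending the Pass@K curve to the left: as K grows, Pass@K rises towards 1 while the all-K score falls towards 0 unless the model is reliably correct.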
Diffusion is an extremely powerful and general purpose approach - here we combine it with _policy guidance_ to improve the distribution mismatch in offline RL, which in turn offers the chance to bring RL to the real world without having to collect online data.
🎮 Introducing the new and improved Policy-Guided Diffusion! Vastly more accurate trajectory generation than autoregressive models, with strong gains in offline RL performance! Plus a ton of new theory and results since our NeurIPS workshop paper... Check it out ⤵️
11
112
13,289
The research on Hanabi just got a lot more exciting - today we are adding search to the mix, vastly improving upon the previous SOTA 🎆🎇🤖 We are open-sourcing all code, incl. a new RL method and trained agents. A cooperative effort with @adamlerer, Hengyuan Hu, @polynoamial
To advance research on AI that can understand others’ points of view and collaborate effectively, Facebook AI has developed a bot that sets a new state of the art in Hanabi, a card game in which all players work together. ai.facebook.com/blog/buildin…
1
17
112
I am going on the record with this - when I grow up, I want to be like Geoff.
“I'd also like to acknowledge my students (…) they've gone on to do many great things. I'm particularly proud of the fact that one of my students fired Sam Altman.“ 😳🫡
1
2
110
13,802
Hello World: My team at FAIR / @metaai (AI Research Agent) is looking to hire contractors across software engineering and ML. If you are interested and based in the UK, please fill in the following short EoI form: docs.google.com/forms/d/e/1F…
3
22
114
15,851
Train and test sets for RL?! What is this, the 21st century??
Introducing the OpenAI Retro Contest — a contest where agents use their past experience to adapt to new environments: blog.openai.com/retro-contes…
17
107
I have a new example why Multi-Agent / Zero-shot coordination matters in the real world for my slides! Interestingly, this problem is going to get worse rather than better as we deploy more autonomous vehicles: Currently humans act as a robust regularizer of the system and AVs can usually play a safe "best response" (such as being passive) while the humans navigate around them. I expect that the tail of this distribution requires large scale Multi-Agent training in simulation using self-Play. Self-Play opens the door to emergent protocols and over-coordination, i.e. learning policies that are not compatible with independently trained agents. I coined this the "zero-shot coordination" challenge a few years ago and it's still wide open, while also rapidly becoming relevant to the real world of agentic AI.
San Francisco, CA (today). This is a banger. Gets better and better as the video goes on. File under ‘Non Safety Critical’ And under ‘WTF’
9
7
104
11,644
In May I missed a single email from openreview saying I'd be auto-enlisted as a reviewer. Then a few ACs missed my immediate and repeated messages on openreview saying that I won't be able to review since I'll be taking the second half of my paternity leave. Now all of my co-authors (most of them junior phd students) are getting emails that their papers are at risk of being desk rejected since I haven't submitted my reviews. @NeurIPSConf - I appreciate the intent here but this is not good.
4
8
104
16,012
The AIRA team @metaai has the ambitious goal of building/training an agent that can do frontier AI research to help the open-source ecosystem leapfrog closed source LLMs. As a relatively small team we cannot succeed in this mission without the support of the community so we'll be open-sourcing our tools, methods, and benchmarks along the way. 🚨Meet our LLM Speedrunning Benchmark,🚨 which probes the ability of LLM agents to do LLM engineering in the "GPT2 speedrun", which is fast enough for efficient, high signal evals. Crucially, past human records provide an existence proof for higher performance and allow us to test where the limiting factors for performance are (ideation vs implementation). Spoiler: both are currently a problem! Stay tuned - we are just getting started - and (even better) join the journey!
Recently, there has been a lot of talk of LLM agents automating ML research itself. If Llama 5 can create Llama 6, then surely the singularity is just around the corner. How can we get a pulse check on whether current LLMs are capable of driving this kind of total self-improvement? Well, we know humans are pretty good at improving LLMs. In the NanoGPT speedrun challenge, created by @kellerjordan0, human researchers iteratively improved @karpathy's GPT-2 replication, slashing the training time (to the same target validation loss) from 45 minutes to under 3 minutes in just under a year (!). Surely, a necessary (but not sufficient) ability for an LLM that can automatically improve frontier techniques is the ability to *reproduce* known innovations on GPT-2, a tiny language model from over 5 years ago. 🤔 So we took several of the top models and combined them with various search scaffolds to create *LLM speedrunner agents*. We then asked these agents to reproduce each of the NanoGPT speedrun records, starting from the previous record, while providing them access to different forms of hints that revealed the exact changes needed to reach the next record. The results were surprising—not because we thought these agents would ace the benchmark, but because even the best agent failed to recover even half of the speed-up of human innovators on average in the easiest hint mode, where we show the agent the full pseudocode of the changes to the next record. We believe The Automated LLM Speedrunning Benchmark provides a simple eval for measuring the lower bound of LLM agents’ ability to reproduce scientific findings close to the frontier of ML. Beyond scientific reproducibility, this benchmark can also be run without hints, transforming into an automated *scientific innovation* benchmark. When run in "innovation mode," this benchmark effectively extends the NanoGPT speedrun to AI participants! 
While initial results here indicate that current agents seriously struggle to match human innovators beyond just a couple of records, benchmarks have a tendency to fall. This one is particularly exciting to watch, as new state-of-the-art here by definition implies a form of *superhuman innovation*.
1
10
102
12,793
Moving to JAX has been a huge change (i.e. 1000x speedup) for our RL work at @FLAIR_Ox, it's really exciting to see Google Brain following suit here!! See our purejax library for sota implementations: github.com/luchris429/pureja…
Introducing MuJoCo 3.0: a major new release of our fast, powerful and open source tool for robotics research. 🤖 📈 GPU & TPU acceleration through #JAX 🖼️ Better simulation of more diverse objects - like clothes, screws, gears and donuts 💡 Find out more: mujoco.org/3
4
10
94
17,647
When you wonder whether your WiFi isn't working because #Gmail, #Youtube and #GoogleDrive aren't responding. #Googledown?
3
4
95
How do you explain LLMs to the younger generation? @UniofOxford asked me to produce a 90s explainer, targeted at a TikTok audience. I don't use TikTok, but here is my attempt - feedback welcome and happy holidays!
EXPLAINED: What is an LLM? 🤔 Associate Prof @j_foerst shares everything you need to know about LLM (large language model) in 90 seconds. #OxfordAI
1
4
91
14,366
Moving beyond self-play: Communication, cooperation and coordination with humans and other AI systems zero-shot is one of the exciting frontiers of multi-agent learning. "Other-Play" is an exciting step in this direction! Thanks to a team of fantastic collaborators 🎇🎇🤖🙎‍♀️🎇🤖!
How can we learn policies that can coordinate w/ humans (w/o human data)? 'Other-Play' (arxiv.org/abs/2003.02979 w/ @adamlerer @alex_peys @j_foerst) uses symmetries to avoid 'over-coordinating' during training. Final policies coordinate better w/ humans and bots in Hanabi🎇🙎‍♀️🤖🎇!
1
13
94
I am looking for an acronym for "Good Old Fashioned Machine Learning", i.e. supervised/RL systems etc that are trained for and good at a specific set of tasks and definitely know nothing about everything else (which is quite comforting). "GOFML" doesn't really roll off the tongue
43
10
92
I am honoured to have been awarded an Amazon Research Award for our proposal "Compute-only Scaling of Large Language Models" (i.e. Q* before it was cool!). Thanks to @AmazonScience amazon.science/research-awar… and to my amazing students @clockwk7 & @JonnyCoook! #AmazonResearchAwards
7
7
90
15,773
You think you understand why popular algorithms like PPO work? So did we @FLAIR_Ox, but then we “reflected” deeply upon it ;) Check out our @ICMLconf 2022 paper “Mirror Learning: A Unifying Framework of Policy Optimisation” (arxiv.org/pdf/2201.02373.pdf) w/ @kuba_AI, @casdewitt 1/N
2
16
90
Our benchmarks measure capabilities. What matters is the ability to learn and adapt. This disconnect is mind boggling.
9
4
95
6,818
Great to see activity on our short #HowToMLrebuttal guide -- good luck with #NeurIPS2023 rebuttals! docs.google.com/document/d/1… @HowTo_ML
2
17
90
12,563
Multi-agent interactions are the new frontier of AI and the ability to make sense of others (i.e. "theory of mind") is at the core of this 🧑‍🦰 ↔️🤖❓. Surprisingly, this is not commonly tested for in standard benchmarks. We address this with our Decrypto benchmark, which specifically focusses on ToM in a multi-turn setting, isolating it from common confounders such as symbolic reasoning or long term planning. We find LLMs do surprisingly poorly, so a lot of work needs to be done!
Theory of Mind (ToM) is crucial for next gen LLM Agents, yet current benchmarks suffer from multiple shortcomings. Enter 💽 Decrypto, an interactive benchmark for multi-agent reasoning and ToM in LLMs! Work done with @TimonWilli & @j_foerst at @AIatMeta & @FLAIR_Ox 🧵👇
3
12
92
13,976
PS: Did he reply? No -- he was not taking students at the time. But, he did forward it to @shimon8282, then incoming faculty to Oxford, and the rest is history..
3
1
86
14,061
En route to #neurips2024 after traveling to Germany so that my wonderful in-laws can help take care of our two-under-two. 2024 has felt accelerated, both at the personal and professional level. Personally, our second son was born, professionally I went 50/50 with FAIR @AIatMeta🧵
3
2
85
9,689
If you are looking for a PhD position in ML, why not apply w @FLAIR_Ox? Deadline for applications is 9th Dec, instructions and recent work are on our website: foersterlab.com/research/. I am looking in particular for strong maths skills, creativity, and willingness to work in teams
27
89
Wow - @CompSciOxford is looking to hire not 1,2 or 3 but 4 (!) professors in CS: cs.ox.ac.uk/aboutus/vacancie…. This is unprecedented (and weirdly timely..!) It's a fantastic department and (you get to collaborate with @oxengsci ;) I highly recommend applying. Deadline is 14th of Dec⏰
25
84
When I started my PhD in deep learning in 2015 I thought I was late to the party. When I bought a few nvidia shares over a beer at @NeurIPSConf 2016 I was sure I had missed the boat (given a 3x run-up) but told my peers "better late than never". which boat did you miss?
3
2
87
7,998
value functions are losing value quickly
7
84
15,584
🎲Alea iacta est 🎲I am attending my first @NeurIPSConf conference since pre-covid! Super excited to see old friends and make new ones :) I'll be around from the 12th to the 16th, so come find me if you'd like to chat. Oh, and pack your running shoes + gloves.. #runconference
6
86
6,456
Dear reviewers, please engage. Dear ACs, please remind the reviewers to engage. Thank you everyone!
2
6
83
11,887
GenAI is changing the world but struggles with decision making/ taking actions. We push towards a foundation model for 2D control using #RLatTheHyperscale and show both zero-shot generalisation and fast fine-tuning!! All code is open source and you can be the agent!
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵
3
11
85
9,331
"Complete proofs are in the appendix" (silently crosses fingers)
1
2
83
I am late to the party, but the full episode of my @MLStreetTalk is now out! Find it in the comments (pls no downranking, dear ranking system). Btw, I lost the scarf somewhere in Oxford. If you find it, please let me know - @MinqiJiang had gifted it to me and I like it a lot.
3
9
85
11,241
1/🚀 @FLAIR_Ox is coming to #icml2024 in Vienna 🎉 (I am literally posting from the train) and we are very excited to share our work with you! You can find us here ⬇️✨ see below 🔗 for clickable links
1
15
83
8,254
Come join us! @FLAIR_Ox has a long history of hosting visiting students and we are now trialling a slightly more formal process for a six months internship early 2026. The successful students will become fully-fledged members of FLAIR working on cutting edge ML in a wonderfully supportive environment featuring some of the smartest and nicest people I have ever had the chance to work with 🫶
🚨🚨Introducing the FLAIR internship program!🚨🚨 We are looking for two talented students to join us for an internship working in FLAIR for 6 months (5th January to 4th July 2026)! For details and eligibility criteria, please check: foersterlab.com/internship/
2
12
83
13,611
Replying to @animesh_garg
great pointer. Personally I'd be happy for _one_ of these long shots to _really_ land. But even that's a high bar.. !
1
75
39,573
❤️JAX meets multi-agent RL, a match made in heaven❤️ This would have made so many things faster and easier in my life. Can't wait to see the amazing things that people will build on this using _academic compute_. The frontier of the open-world just moved by orders of magnitude 🤯
Crazy times. Anyways, excited to unveil JaxMARL! JaxMARL provides popular Multi-Agent RL environments and algorithms in pure JAX, enabling an end-to-end training speed up of up to 12,500x! Co-led w/ @alexrutherford0 @benjamin_ellis3 @MatteoGallici Post: blog.foersterlab.com/jaxmarl…
1
7
77
10,359
DQN kick-started the field of deep RL 12 years ago, but Q-learning has recently taken a backseat compared to PPO and other on-policy methods. We introduce PQN, a greatly simplified version of DQN which is highly GPU compatible and theoretically supported by convergence proofs.
🚀 We're very excited to introduce Parallelised Q-Network (PQN), the result of an effort to bring Q-Learning into the world of pure-GPU training based on JAX!
What’s the issue? Pure-GPU training can accelerate RL by orders of magnitude. However, Q-Learning heavily relies on replay buffers and target networks, making training computationally slow and memory-intensive on GPUs. As a result, researchers often prefer PPO, leaving Q-Learning behind 🥲
Our solution? Eliminate Q-Learning's legacy components. PQN challenges the standard DQN paradigm by training a Q-Network without replay buffers or target networks: just online Q-Learning with vectorised exploration and network normalization (layer or batch norm). Despite its simplicity, PQN sets a new strong baseline in many single-agent and multi-agent scenarios. Check out the thread for more details 🔥
📄 Paper: arxiv.org/abs/2407.04811
⚙️ Code: github.com/mttga/purejaxql/
A fantastic collaboration with @mattiefoxcs and @benjamin_ellis3, and the support of the amazing people at @FLAIR_Ox directed by @j_foerst. Inspired by the groundbreaking work of @_chris_lu_ on compiling entire RL pipelines in GPU: github.com/luchris429/pureja…
8
82
7,954
staying away from X leads to clarity of mind
clarity of mind leads to good ideas and insights
good ideas and insights want to be shared on X
sharing on X leads to engagement on X
engagement on X leads to loss of clarity
loss of clarity means nothing else to share
nothing to share means staying away from X
4
5
82
14,209
This is a fundamental shift regarding the RL capabilities of academic research labs. At @FLAIR_Ox we have now done a number of projects on single digit GPUs that would have taken an entire data centre to run using prior approaches. 4000x speed-up is quite a big deal, it turns out 🚀
1/ 🚀 Presenting PureJaxRL: A game-changing approach to Deep Reinforcement Learning! We achieve over 4000x training speedups in RL by vectorizing agent training on GPUs with concise, accessible code. Blog post: chrislu.page/blog/meta-disco… 🧵
3
5
78
12,051
Agents learn to communicate by considering beliefs of others🤖📞🤖! Provides a way of exploring in the space of compatible encoders and decoders, getting around the "local minimum" problem of learning communication protocols. Huge thanks to a team of fantastic collaborators!🙏🙏
Bayesian Action Decoder (arxiv.org/abs/1811.01458): A new multi-agent RL method for learning to communicate via informative actions using ToM-like reasoning. Achieves the best known score for 2 players on the challenging #hanabigame
1
23
75
I watched Ex-Machina a few years ago. Looking back, the most unrealistic part of the movie is how much effort the scientists put into physically _isolating and containing_ the AI. Clearly they hadn't realised they can increase stock prices by just unleashing it on humanity ASAP.
2
72
6,280
Second session of the #runconference 🏃‍♂️ at #ICML2022 was a great success (photos below credit to @pcastr). For anyone who didn't make it today, we'll meet again tomorrow at 8am in front of the Hilton.
4
2
73
Amazing @PyTorch implementation of our 2016 "Learning to Communicate with Deep MARL" paper. DIAL and RIAL for the win!! Goodbye, @TorchML and welcome to 2018 :) Also, the deadline for our NIPS emergent communication workshop is in 8 days - perfect timing..
If you're interested in teaching deep reinforcement-learning agents to communicate with each other, check out my open-source PyTorch implementation of the classic RIAL and DIAL models by @j_foerst, @iassael, @NandoDF, and @shimon8282: github.com/minqi/learning-to…
2
22
73
How can we train RL agents that act optimally, *without* sharing any information between them through emergent conventions? "Off-Belief Learning" finally solves this! It takes the weirdness out of learning in Dec-POMDPs and is a huge leap for human-AI coordination & AI safety🤖🧑‍🔧
How can AI agents discover human-compatible policies *without requiring human data*? An important step is to develop meaningful, interpretable conventions for communicating information, rather than relying on arbitrary encodings. (1)
2
13
71
General-sum games describe many scenarios, from negotiations to autonomous driving. How should an AI act in the presence of other learning agents? Our @icmlconf 2022 paper, “Model-Free Opponent Shaping” (M-FOS) approaches this as a meta-game. @_chris_lu_ @TimonWilli @casdewitt 🧵
1
14
71
Are you looking for an RL environment that is: 1) blazing fast 2) open-ended 3) language enabled 4) easy enough to get started on and 5) super fun to play? Your wish has been fulfilled! The only thing that's missing is the multi-agent extension :)
I’m excited to announce Craftax, a new benchmark for open-ended RL! ⚔️ Extends the popular Crafter benchmark with Nethack-like dungeons ⚡Implemented entirely in Jax, achieving speedups of over 100x 1/
3
8
73
7,990
Meta-learning is great, but what distribution of environments shall we train over to enable generalization? And wouldn't curriculum discovery for meta-learning be too compute intensive for a lab in academia? Curious? Then this is for you!
Meta-learned policy optimizers have shown incredible generalization, e.g. Grid-World to Atari games. But how do we discover training environments for truly general-purpose optimizers? I'm excited to announce our #NeurIPS2023 work studying this question!
2
11
70
10,092
Talent Density X Agency = Fun @FLAIR_Ox
1
4
73
5,745
someone asked me recently what breakthrough could prevent a major AI winter in the next 5 years. I said robotics and they looked confused.
This is very impressive.
3
6
71
How can RL agents discover policies that can coordinate w/ humans w/o using human data? Why do we have to think beyond self-play and seriously consider Zero-Shot coordination? New (and improved??) 30min video on what I think is an exciting frontier for AI! piped.video/watch?v=VQ8h8kiQ…
2
7
71
Ok, it's been 24h so it's time for a resolution: This is a real video recorded by me. The fact that we genuinely can't tell whether this is real or not is really bothersome. Lastly, the audio and *Super-Human* tic-tac-toe (not a thing) were supposed to be little hints / giveaways.
7
3
65
6,518