Mikael Henaff (@HenaffMikael) | nitter

Pinned Tweet

Mikael Henaff @HenaffMikael

1 Oct 2025

Introducing Scalable Option Learning (SOL☀️), a blazingly fast hierarchical RL algorithm that makes progress on long-horizon tasks and demonstrates positive scaling trends on the largely unsolved NetHack benchmark, when trained for 30 billion samples. Details, paper and code in >

2

13

79

21,031

Mikael Henaff @HenaffMikael

9 Oct 2024

I'm looking for a PhD intern for next year! If you are interested in any combination of: intrinsic motivation, LLM/VLM-guided reward design, long-horizon tasks, hierarchical RL, NetHack, MineCraft, representation learning, I'd love to hear from you. Details below...

4

33

246

59,189

Mikael Henaff @HenaffMikael

21 Oct 2025

I'm looking for a PhD intern for next year, co-advised with Scott Fujimoto, for a project developing sample-efficient RL algorithms for long-horizon decision-making. If you've worked on off-policy/MBRL, hierarchical RL, embodied AI, we'd love to hear from you! Contact below.

5

27

221

20,557

Mikael Henaff @HenaffMikael

13 Dec 2022

Hiring a #research #intern for 2023 at FAIR (@MetaAI), if you're interested in working on exploration, generalization, imitation learning or hierarchical RL please get in touch :)

8

17

167

Mikael Henaff @HenaffMikael

18 Oct 2022

Excited to share our @NeurIPSConf paper where we propose E3B--a new algorithm for exploration in varying environments. Paper: arxiv.org/abs/2210.05805 Website: e3bagent.github.io/ E3B sets new SOTA for both MiniHack and reward-free exploration on Habitat. A thread [1/N]

3

23

106

Mikael Henaff @HenaffMikael

19 Dec 2024

Excited to share our latest work ONI, which enables learning intrinsic rewards online without pre-collected data. We do this by annotating the agent's collected experience with an asynchronously hosted LLM server. Paper: arxiv.org/abs/2410.23022 Code: github.com/facebookresearch/…

Online Intrinsic Rewards for Decision Making Agents from Large...

Automatically synthesizing dense rewards from natural language descriptions is a promising paradigm in reinforcement learning (RL), with applications to sparse reward problems, open-ended...

3

15

90

7,233

Mikael Henaff @HenaffMikael

31 Jan 2024

I am looking for an intern for 2024 to work on the Cortex project in @AIatMeta 's Embodied AI team! Relevant skills include: experience with LLMs/VLMs, EAI simulators such as Habitat, and RL. DM or email at mikaelhenaff [at] meta [dot] com ✨ #AI #InternshipOpportunity #LLM

2

17

78

37,122

Mikael Henaff @HenaffMikael

4 Nov 2023

Signed. Keeping models open is the best way to ensure high scientific standards for safety research and fair representation in AI development. open.mozilla.org/letter/ via @mozilla

1

9

63

29,397

Mikael Henaff @HenaffMikael

9 Jun 2025

A couple bits of news: 1. Happy to share my first (human) NetHack ascension; next step is RL agents :) 2. I wrote a post discussing some @NetHack_LE challenges & how they map to open problems in RL & agentic AI. Still the best RL benchmark imo. mikaelhenaff.substack.com/p/…

5

13

62

11,619

Mikael Henaff @HenaffMikael

14 Jan 2019

New paper with @alfcnz and @ylecun , which we will be presenting at #iclr2019. We learn policies from purely observational data using uncertainty-regularized forward models. #DeepLearning #autonomousdriving Paper: openreview.net/forum?id=HygQ… Project site: sites.google.com/view/model-…

Model-Predictive Policy Learning with Uncertainty Regularization...

A model-based RL approach which uses a differentiable uncertainty penalty to learn driving policies from purely observational data.

25

53

Mikael Henaff @HenaffMikael

13 Jun 2023

Exploration is well-studied for singleton MDPs, but many envs of interest change across episodes (i.e. procgen envs or embodied AI tasks). How should we explore in this case? arxiv.org/abs/2306.03236 In our upcoming @icmlconf oral, we study this question. A thread...1/N

A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs

Exploration in environments which differ across episodes has received increasing attention in recent years. Current methods use some combination of global novelty bonuses, computed using the...

1

9

47

13,950

Mikael Henaff @HenaffMikael

4 Feb 2025

Another banger led by dream team @MartinKlissarov and @proceduralia, to be presented at ICLR 2025. MaestroMotif is a hierarchical agent which zero-shot composes Motif skills using an LLM controller, reaching new depths of the NetHack dungeon. Code available!

Martin Klissarov @MartinKlissarov

4 Feb 2025

Can AI agents adapt zero-shot, to complex multi-step language instructions in open-ended environments? We present MaestroMotif, a method for AI-assisted skill design that produces highly capable and steerable hierarchical agents. To the best of our knowledge, it is the first method that, without expert labeled datasets, solves compositional tasks requiring hundreds of steps for completion. All the modules within MaestroMotif are learned from interaction: from the highest level of planning to the lowest-level of sensorimotor control. On the open-ended domain of NetHack, it surpasses existing approaches, including those that are fine-tuned specifically for each task. At the heart of MaestroMotif is the idea that decomposing a task into subtasks significantly helps decision making. MaestroMotif leverages an agent designer's intuition about a domain to identify important skills and describe them in natural language. These short descriptions then get converted into adaptable hierarchical agents through AI feedback and in-context learning. Our paper was recently published at ICLR 2025 and we open-source the whole project including the code, prompts and pre-trained models. Paper: arxiv.org/abs/2412.08542 Code: github.com/mklissa/maestromo… NotebookLM Podcast: bit.ly/4jLi6mo This work was done with the amazing @HenaffMikael, @robertarail, @shagunsodhani, Pascal Vincent, @yayitsamyzhang, @pierrelux, Doina Precup, with equal supervision by @MarlosCMachado and @proceduralia. Take a look at the following thread:

4

4

39

35,042

Mikael Henaff @HenaffMikael

24 Oct 2023

Super stoked to share this work led by @proceduralia & @MartinKlissarov. Our method Motif uses LLMs to rank pairs of observation captions and synthesize dense intrinsic rewards specified by natural language. New SOTA on NetHack while being easily steerable. Paper+code in thread!

Pierluca D'Oro

@proceduralia

24 Oct 2023

Can reinforcement learning from AI feedback unlock new capabilities in AI agents? Introducing Motif, an LLM-powered method for intrinsic motivation from AI feedback. Motif extracts reward functions from Llama 2's preferences and uses them to train agents with reinforcement learning. On the complex NetHack game, Motif solves previously unsolved tasks without needing any expert demonstrations. Surprisingly, Motif's reward leads to better game score than the one obtained by using the score itself as a reward. Given access to an event captioning mechanism, a few properties make Motif a general method: • it is entirely based on open models • the LLM doesn't need direct access to the environment dynamics (e.g., its source code) • the LLM doesn't need to understand observation and action spaces The best part? You can start using Motif right now, even on a small compute budget: the whole pipeline can take less than two GPU-days. Feel free to read our paper and try our code out. Paper: arxiv.org/abs/2310.00166 Code: github.com/facebookresearch/… Blog post: mila.quebec/en/article/motif… Work co-led by @MartinKlissarov and myself, with @shagunsodhani @robertarail @pierrelux Pascal Vincent @yayitsamyzhang @HenaffMikael Learn more in the thread 🧵

2

3

36

9,043

Mikael Henaff @HenaffMikael

16 Jan 2024

Very happy to share that our Motif work was accepted at #ICLR2024 :) come say hi in Vienna!

Pierluca D'Oro

@proceduralia

24 Oct 2023

Can reinforcement learning from AI feedback unlock new capabilities in AI agents? Introducing Motif, an LLM-powered method for intrinsic motivation from AI feedback. Motif extracts reward functions from Llama 2's preferences and uses them to train agents with reinforcement learning. On the complex NetHack game, Motif solves previously unsolved tasks without needing any expert demonstrations. Surprisingly, Motif's reward leads to better game score than the one obtained by using the score itself as a reward. Given access to an event captioning mechanism, a few properties make Motif a general method: • it is entirely based on open models • the LLM doesn't need direct access to the environment dynamics (e.g., its source code) • the LLM doesn't need to understand observation and action spaces The best part? You can start using Motif right now, even on a small compute budget: the whole pipeline can take less than two GPU-days. Feel free to read our paper and try our code out. Paper: arxiv.org/abs/2310.00166 Code: github.com/facebookresearch/… Blog post: mila.quebec/en/article/motif… Work co-led by @MartinKlissarov and myself, with @shagunsodhani @robertarail @pierrelux Pascal Vincent @yayitsamyzhang @HenaffMikael Learn more in the thread 🧵

1

28

14,165

Mikael Henaff @HenaffMikael

19 May 2020

A simple way to help with #Covid_19 and medicine generally is to donate spare computer time to biomedical researchers through projects like @foldingathome or @RosettaAtHome. Small contributions add up to make distributed peta/exaFLOP supercomputers!

9

26

Mikael Henaff @HenaffMikael

12 Feb 2023

Replying to @_aidan_clark_

Step 1 doesn't have to be random: there is a large literature on directed exploration strategies, going back at least to Kearns and Singh's 2003 E^3 work that showed you can avoid the exponential sample complexity due to random exploration.

2

1

24

5,974

Mikael Henaff @HenaffMikael

24 Nov 2024

The more I work with this env, the more its richness and complexity become apparent. Other than perception, it presents a hard challenge for nearly every other agentic capability, from long horizon planning and exploration to reasoning, memory and generalization.

Davide Paglieri @PaglieriDavide

21 Nov 2024

Replying to @PaglieriDavide

The ultimate test? NetHack 🏰 This beast remains unsolved: the best model, o1-preview, achieved just 1.5% average progression. BALROG pushes boundaries, uncovering where LLMs/VLMs struggle the most. Will your model fare better? 🤔 They’re nowhere near capable enough yet!

1

23

2,201

Mikael Henaff @HenaffMikael

16 Aug 2023

The embodied AI team I'm part of at @MetaAI has multiple Research Scientist / Research Engineer positions open, come work with us ✨

Mrinal Kalakrishnan @mkalakrishnan

16 Aug 2023

(1/6) The FAIR Embodied AI team at @MetaAI has multiple full-time openings! If you’re interested in cutting-edge research in AI for robotics, AR and VR, and sharing it with the world, read on. 🧵

3

21

4,403

Mikael Henaff @HenaffMikael

24 Jul 2023

Also feel free to reach out if you want to grab coffee and chat about RL, exploration, generalization, LLMs for decision making, or anything else :) #ICML2023

2

18

3,748

Mikael Henaff @HenaffMikael

30 Apr 2020

Excited to share some recent work in imitation learning at #iclr2020, which uses an ensemble of policies to reduce covariate shift. Joint work with @xkianteb and Wen Sun. Paper: openreview.net/pdf?id=rkgbYy… Talk: iclr.cc/virtual/poster_rkgbY…

5

17

Mikael Henaff @HenaffMikael

24 Jul 2024

Stoked about this new benchmark for long-horizon planning, intrinsic motivation, procedural generalization and memory

Michael Matthews @mitrma

28 Feb 2024

I’m excited to announce Craftax, a new benchmark for open-ended RL! ⚔️ Extends the popular Crafter benchmark with Nethack-like dungeons ⚡Implemented entirely in Jax, achieving speedups of over 100x 1/

2

17

1,485

Mikael Henaff @HenaffMikael

4 Mar 2025

Excited to share our Fast3R paper, to be presented at CVPR 2025. This recasts 3D reconstruction and camera pose estimation from video as an end-to-end learning problem, leading to ~4x-300x improvements in speed while maintaining performance. Code, model & demo in thread!

Jianing “Jed” Yang @jed_yang

4 Mar 2025

⚡️ Excited to announce Fast3R: 3D reconstruction of 1000+ images in a single forward pass! Fast3R achieves 251 FPS at its peak. 🔥 Try the demo with your images or video! 🔗 Website: fast3r-3d.github.io 🎮 Demo: fast3r.ngrok.app/ #CVPR2025 #3D @AIatMeta

5

16

4,658

Mikael Henaff @HenaffMikael

9 Oct 2024

Internship is in NYC, if interested please email me at: mikaelhenaff at meta dot com and apply here: metacareers.com/jobs/5325490… looking forward to hearing from you!

1

16

2,466

Mikael Henaff @HenaffMikael

24 Nov 2022

This is a very exciting dataset - stochastic policies/dynamics, large action space, partial observability, rich dynamics, *very* large scale while still enabling fast experiments. Can't wait to start playing with it and hope others do too!

4

15

Mikael Henaff @HenaffMikael

12 Feb 2024

Interview of @sharathraparthy discussing our recent work showing that transformers can in-context learn new sequential decision-making tasks in new environments. Check it out! arxiv.org/abs/2312.03801

Generalization to New Sequential Decision Making Tasks with...

Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning. Recently, transformers have been shown to learn new...

TalkRL Podcast

@TalkRLPodcast

12 Feb 2024

Episode 48: Sharath Chandra Raparthy @sharathraparthy (AI Resident at @AIatMeta) on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more! podcasts.apple.com/us/podcas…

1

1

14

1,415

Mikael Henaff @HenaffMikael

11 Apr 2024

Latest work where we present OpenEQA, a modern embodied Q&A benchmark which tests multiple capabilities such as spatial reasoning, object recognition and world knowledge on which SOTA VLMs like GPT4V/Claude/Gemini fail. A new challenge for embodied AI! To be presented @CVPR.

AI at Meta

@AIatMeta

11 Apr 2024

Today we’re releasing OpenEQA — the Open-Vocabulary Embodied Question Answering Benchmark. It measures an AI agent’s understanding of physical environments by probing it with open vocabulary questions like “Where did I leave my badge?” More details ➡️ go.fb.me/7vq6hm All of today’s state-of-art vision+language models (VLMs) fall well short of human performance. In fact, for questions that require spatial understanding, today’s VLMs are nearly “blind” – access to visual content provides only minor improvements over language-only models. We hope that OpenEQA motivates additional research into helping AI understand and communicate about the world it sees.

6

15

1,871

Mikael Henaff @HenaffMikael

22 Nov 2022

We are hiring a research intern for next year - if you would like to work on hierarchical RL, world models, modular networks and related topics with @shagunsodhani, myself and other researchers at FAIR please reach out! :)

Shagun Sodhani @shagunsodhani

21 Nov 2022

We are hiring a #research #intern at FAIR (@MetaAI) to work in areas related to #RL, #hierarchical RL, #modular #networks, and #world #models. Location: Montreal / New York / Remote. You can dm me your questions and resume!

14

Mikael Henaff @HenaffMikael

24 Jul 2023

In Hawaii for #ICML2023, presenting two works Tuesday: - A Study of Global and Episodic Bonuses in Contextual MDPs (poster at 2pm, oral at 6:10 pm) - Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories (poster at 11am) Hope to see you there :)

3

14

2,055

Mikael Henaff @HenaffMikael

23 Oct 2025

Replying to @tydsh

that is insane...way to shoot ourselves in the foot

14

3,203

Mikael Henaff @HenaffMikael

1 Oct 2025

All details are in our paper and code release: arxiv.org/abs/2509.00338 github.com/facebookresearch/… It was lots of fun working with Scott Fujimoto, @mitrma and Mike Rabbat on this!

Scalable Option Learning in High-Throughput Environments

Hierarchical reinforcement learning (RL) has the potential to enable effective decision-making over long timescales. Existing approaches, while promising, have yet to realize the benefits of...

15

653

Mikael Henaff @HenaffMikael

25 Oct 2025

Highly recommend

Olivier Hénaff

@olivierhenaff

23 Oct 2025

If you're an ML researcher or engineer impacted by recent events and want to continue pushing the frontier of intelligence through foundation modeling, reasoning, and large-scale RL, do get in touch!

1

13

4,762

Mikael Henaff @HenaffMikael

30 Nov 2022

Presenting our E3B work on exploration in changing environments at #NeurIPS at 11 am NOLA time in Hall J #105...come by and say hi! with @robertarail @MinqiJiang @_rockt

3

12

Mikael Henaff @HenaffMikael

15 Jan 2019

Nice article in @techreview about our paper on model-based RL with uncertainty regularization for #autonomousdriving

MIT Technology Review

@techreview

15 Jan 2019

Reinforcement learning makes mistakes as it learns. That's fine when playing a board game. It's, erm, not great in a life-or-death situation. trib.al/fOcXnPh

5

11

Mikael Henaff @HenaffMikael

24 Oct 2023

@_rockt @HeinrichKuttler @_samvelyan @erichammy you might be interested, this method is able to make progress on the Oracle task without demos (although sometimes in unexpected ways ;))

3

8

320

Mikael Henaff @HenaffMikael

4 Mar 2025

Btw, the lead author @jed_yang is graduating this year and will be on the job market. Jed is highly motivated and creative, a great engineer and researcher who gets stuff to work, and has been a pleasure to work with...if you're hiring I suggest reaching out to him!

Mikael Henaff @HenaffMikael

4 Mar 2025

Excited to share our Fast3R paper, to be presented at CVPR 2025. This recasts 3D reconstruction and camera pose estimation from video as an end-to-end learning problem, leading to ~4x-300x improvements in speed while maintaining performance. Code, model & demo in thread!

2

10

2,168

Mikael Henaff @HenaffMikael

1 Oct 2025

Hierarchy is a natural way to tackle long horizons, but until now has remained at relatively small scale. With SOL, we identify and solve bottlenecks in scaling hierarchical RL, resulting in a ~35-580x speed increase over prior hierarchical methods.

2

12

717

Mikael Henaff @HenaffMikael

9 Feb 2024

Replying to @EugeneVinitsky

The difference between algorithms that explore efficiently vs. not is essentially polynomial vs. exponential sample complexity (itself a lower bound on compute complexity). Imo more compute can crack some harder poly problems but will eventually hit a wall with exponential ones:)

3

10

629

Mikael Henaff @HenaffMikael

24 Nov 2024

Replying to @goodfellow_ian

Really sorry to hear this. I had a bad case of LC as well in 2020 and few understand how brutal it is. Are you sure POTS is the main culprit? Asking because I had that diagnosis too but it later turned out to be wrong. This ended up helping me: covidlonghaulers.com

3

1

9

9,945

Mikael Henaff @HenaffMikael

21 Oct 2025

Please email {mikaelhenaff, sfujimoto} at meta dot com with [2026 internship] in the subject line. We'd be particularly interested in any examples of papers/work around hierarchical/off-policy/model-based RL, embodied AI, etc. Looking forward to hearing from you!

11

1,733

Mikael Henaff @HenaffMikael

6 May 2023

Replying to @patrickmineault @ylecun

End to end memory networks in 2015 (arxiv.org/abs/1503.08895) by @tesatory were an important precursor in the sense that like the transformer (and unlike the NTM), they maintain the sequence structure and perform multiple layers of attention over it.

End-To-End Memory Networks

We introduce a neural network with a recurrent attention model over a possibly large external memory. The architecture is a form of Memory Network (Weston et al., 2015) but unlike the model in...

2

10

1,007

Mikael Henaff @HenaffMikael

9 Aug 2024

Takes me back to my days as a starry-eyed master's student, when Pytorch's grandparent Lush was still used in @ylecun 's lab <3 Lush was actually the first programming language I seriously learned (I'd been studying math until then). Such fond memories counting parentheses!

Alfredo Canziani

@alfcnz

9 Aug 2024

I wrote two blog posts about SN, Léon Bottou and @ylecun's 1988 Simulateur de Neurones. One is an English translation of the original paper, for which I've reproduced the figures. The other is a tutorial on how to run their code on Apple silicon. atcold.github.io/blog

1

8

1,725

Mikael Henaff @HenaffMikael

8 Dec 2023

New work led by @sharathraparthy and jointly with @robertarail @erichammy @_robertkirk showing that one can in-context learn completely *new tasks* on *new environments* via large-scale pretraining and few shot examples. To be presented at upcoming @NeurIPSConf FMDM workshop!

Sharath Raparthy

@sharathraparthy

8 Dec 2023

🚨 🚨 !!New Paper Alert!! 🚨 🚨 How can we train agents that learn new tasks (with different states, actions, dynamics and reward functions) from only a few demonstrations and no weight updates? In-context learning to the rescue! In our new paper, we show that by training transformers on large diverse datasets of sequences of demonstrations with certain properties, we can generalize to new Procgen or MiniHack tasks from only a few demonstrations and no weight updates! Paper: arxiv.org/pdf/2312.03801.pdf Work with these amazing collaborators @erichammy @_roberkirk @HenaffMikael @robertarail 1/13

2

9

1,673

Mikael Henaff @HenaffMikael

1 Oct 2025

SOL can be run on any RL problem for which we can define a few reasonable intrinsic rewards. We include some simple PointMaze and MiniHack environments to show this generality - this may also be useful for others working in HRL since they are faster to iterate on than NetHack.

1

10

700

Mikael Henaff @HenaffMikael

8 Jan 2024

Replying to @jsuarez

Procedural generation or settings where the environment changes across episodes. Exploration operates very differently in that setting and a lot of algorithms for static MDPs fail.

7

162

Mikael Henaff @HenaffMikael

1 Oct 2025

We demonstrate SOL's performance and scalability by training hierarchical agents for 30B steps on the complex game of NetHack, significantly outperforming flat agents and demonstrating promising scaling trends. Our agents still seem to be improving, even after 30B steps.

1

10

572

Mikael Henaff @HenaffMikael

14 Oct 2024

I'll be presenting a poster about some of our recent work on LLM-guided exploration and intrinsic motivation at the NY academy of sciences this coming Friday: nyas.org/shaping-science/eve… if you're in the tri-state area, it's a nice event to chat about ML in a relaxed setting.

15th Annual Machine Learning Symposium - NYAS

Machine Learning, a subfield of computer science, involves the development of mathematical algorithms that discover knowledge from specific data sets, and then "learn" from the data in an iterative...

1

7

1,063

Mikael Henaff @HenaffMikael

13 Dec 2022

Replying to @akbirkhan @MetaAI

definitely possible in NYC, would have to see about London...feel free to apply here metacareers.com/jobs/9018997… and send your cv to mikaelhenaff@meta.com :)

7

Mikael Henaff @HenaffMikael

3 Feb 2023

Nice opportunity to work with some great researchers!

Roberta Raileanu @robertarail

2 Feb 2023

Our group has multiple openings for internships at FAIR London (@MetaAI). I’m looking for someone to work on language models + decision making e.g. augmenting LMs with actions / tools / goals, interactive / open-ended learning for LMs, or RLHF. Apply at metacareers.com/jobs/8871228…

7

3,086

Mikael Henaff @HenaffMikael

17 Dec 2022

Replying to @HaqueIshfaq

Minihack is quite nice, there are lots of tasks and many of them are sparse reward, and it has the additional interesting twist of being procedurally generated. We have some code to train a variety of exploration algorithms here: github.com/facebookresearch/…

GitHub - facebookresearch/e3b: Official repo for the E3B algorithm described in the paper "Explor...

Official repo for the E3B algorithm described in the paper "Exploration via Elliptical Episodic Bonuses". - facebookresearch/e3b

1

1

6

530

Mikael Henaff @HenaffMikael

25 Sep 2024

High performing, open source VLMs and smol LLMs now available...nature is healing🌱

AI at Meta

@AIatMeta

25 Sep 2024

📣 Introducing Llama 3.2: Lightweight models for edge devices, vision models and more! What’s new? • Llama 3.2 1B & 3B models deliver state-of-the-art capabilities for their class for several on-device use cases — with support for @Arm, @MediaTek & @Qualcomm on day one. • Llama 3.2 11B & 90B vision models deliver performance competitive with leading closed models — and can be used as drop-in replacements for Llama 3.1 8B & 70B. • New Llama Guard models to support multimodal use cases and edge deployments. • The first official distro of Llama Stack simplifies and supercharges the way developers & enterprises can build around Llama to support agentic applications and more. Details in the full announcement ➡️ go.fb.me/229ug4 Download Llama 3.2 models ➡️ go.fb.me/w63yfd These models are available to download now directly from Meta and @HuggingFace — and will be available across offerings from 25+ partners that are rolling out starting today, including @accenture, @awscloud, @AMD, @azure, @Databricks, @Dell, @Deloitte, @FireworksAI_HQ, @GoogleCloud, @GroqInc, @IBMwatsonx, @Infosys, @Intel, @kaggle, @NVIDIA, @OracleCloud, @PwC, @scale_AI, @snowflakeDB, @togethercompute and more. With Llama 3.2 we’re making it possible to run Llama in even more places, with even more flexible capabilities. We’ve said it before and we’ll say it again: open source AI is how we ensure that these innovations reflect the global community they’re built for and benefit everyone. We’re continuing our drive to make open source the standard with Llama 3.2.

7

970

Mikael Henaff @HenaffMikael

24 Oct 2023

It was also a pleasure working with @shagunsodhani @robertarail @yayitsamyzhang and Pascal Vincent on this project!

5

230

Mikael Henaff @HenaffMikael

18 Oct 2022

E3B sets a new SOTA on 16 challenging sparse-reward tasks from the MiniHack suite. In particular, it does so without requiring any feature engineering or task-specific prior knowledge. [6/N]

1

2

6

Mikael Henaff @HenaffMikael

3 May 2024

Replying to @ssnl_tz @TongzhouWang

I think reorganizing information can be seen as adding new information encoded in the reorganization scheme. For example, if you are reorganizing N bits of information with program P of length K bits, you are effectively adding K bits of new information.

1

6

347
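
One hedged way to read the post above is through Kolmogorov complexity. The symbols here are assumptions, not from the post: K(·) is prefix-free Kolmogorov complexity, x is the original N-bit string, and P is the reorganizing program. The standard subadditivity bound says that the output of P on x can be described by describing x and P separately, so reorganization adds at most about K(P) bits, which is at most roughly the K bits of program length mentioned in the post:

```latex
K\big(P(x)\big) \;\le\; K(x) + K(P) + O(1)
```

Note that this is an upper bound: the reorganized string may contain fewer than K additional bits of information, but not substantially more.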

Mikael Henaff @HenaffMikael

25 Nov 2022

Replying to @_rockt @NetHack_LE

'nutritiously hard', sounds like a juicy problem and a hard nut to crack ;)

1

5

Mikael Henaff @HenaffMikael

24 Jun 2025

Replying to @jsuarez @_rockt @NetHack_LE

My experience has been that deep RL agents saturate at ~8-9% according to that metric

1

5

266

Mikael Henaff @HenaffMikael

8 Oct 2024

Replying to @FelixHill84

Sorry to hear about this Felix, but I'm glad things are starting to look up. I remember when we interned together in the early days which felt like a different world. I really admire both your scientific and human contributions to the field, wishing you well!

1

5

3,787

Mikael Henaff @HenaffMikael

18 Oct 2022

Exploration in standard MDPs is well studied, but what about contextual MDPs (CMDPs) where the environment changes each episode? This general framework captures scenarios such as procgen video games or embodied AI tasks where the agent must generalize across physical spaces.[2/N]

1

4

Mikael Henaff @HenaffMikael

17 Dec 2022

Replying to @HenaffMikael @HaqueIshfaq

It has several of the minigrid envs ported but is a lot more challenging because count-based episodic bonuses do not work, we discuss some more here: arxiv.org/abs/2210.05805

4

189

Mikael Henaff @HenaffMikael

25 Jun 2025

Replying to @jsuarez @_rockt @NetHack_LE

@jsuarez I also think that solving NLE or similar envs with smaller models will yield insights in exploration, long-horizon decision-making etc that are needed for a gpt-10 worthy of its name

1

4

137

Mikael Henaff @HenaffMikael

9 Jun 2020

New paper accepted to #icml2020 - this takes steps towards bridging the theory-practice gap in RL by providing a provably sample-efficient algorithm for block MDPs which uses contrastive learning. Long version: arxiv.org/abs/1911.05815 #ReinforcementLearning

4

Mikael Henaff @HenaffMikael

24 Nov 2024

Replying to @HenaffMikael @_rockt @jsuarez @PaglieriDavide @NetHack_LE

And for Nethack, |A| ~= 100, H >= 10000... Things change if we have a smart exploration algorithm though, which is one of the reasons RL is interesting :)

4

164

Mikael Henaff @HenaffMikael

18 Oct 2022

Big thanks to my co-authors @robertarail @MinqiJiang @_rockt! Try out our code at: github.com/facebookresearch/… [8/N, N=8]

GitHub - facebookresearch/e3b: Official repo for the E3B algorithm described in the paper "Explor...

Official repo for the E3B algorithm described in the paper "Exploration via Elliptical Episodic Bonuses". - facebookresearch/e3b

1

3

Mikael Henaff @HenaffMikael

24 May 2022

Replying to @alfcnz

When the turntable was invented, some people thought it was the end of music. Then people used it to make entirely new kinds of music (sampling, DJing etc). Human creativity always finds a way to express itself given the tools available :)

1

4

Mikael Henaff @HenaffMikael

24 Nov 2024

Replying to @_rockt @jsuarez @PaglieriDavide @NetHack_LE

I think that "naive" tabula rasa RL cannot solve tasks like Nethack even with a universe sized computer. Consider a sparse reward task with |A|=10 actions and horizon H=100. Expected num samples with naive exploration is O(|A|^H) = O(10^100), more than # atoms in the universe :)

3

4

318
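
The estimate in the post above can be checked with a few lines of arithmetic. This is a minimal sketch under a simplifying assumption that is mine, not the post's: exactly one action sequence of length H earns the sparse reward, and a uniformly random policy picks each action independently. The chance of success per episode is then |A|^-H, so the expected number of episodes is |A|^H. The function name is hypothetical, and it works in log10 to avoid huge integers.

```python
import math


def log10_expected_episodes(num_actions: int, horizon: int) -> float:
    """Return log10(|A|**H).

    Assumption (illustrative only): a single rewarding action sequence of
    length `horizon`, explored by a uniformly random policy. The success
    probability per episode is |A|**-H, so the expected number of episodes
    until the first success is |A|**H.
    """
    return horizon * math.log10(num_actions)


if __name__ == "__main__":
    # Toy setting from the post: |A| = 10, H = 100.
    print(log10_expected_episodes(10, 100))
    # Rough NetHack-scale numbers from a related reply in this feed:
    # |A| ~= 100, H >= 10000.
    print(log10_expected_episodes(100, 10_000))
```

For the toy setting this gives an exponent of 100, matching the O(10^100) figure in the post. The NetHack-scale numbers give an exponent of 20000, which is why directed exploration, rather than more compute, is the lever being argued for.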

Mikael Henaff @HenaffMikael

16 Jan 2024

Replying to @CupiaBart

Nice work, it's great to see interest in NetHack! If you're in this space you might be interested in a couple other repos: github.com/facebookresearch/… github.com/facebookresearch/… In particular, Motif makes some progress on the very challenging Oracle task and uses SF as the RL env.

GitHub - facebookresearch/motif: Intrinsic Motivation from Artificial Intelligence Feedback

Intrinsic Motivation from Artificial Intelligence Feedback - facebookresearch/motif

2

68

Mikael Henaff @HenaffMikael

18 Oct 2022

While exploration in CMDPs has recently started receiving attention, we show that existing methods critically rely on an episodic count-based bonus, and fail if this bonus is removed. This also means they fail in complex envs where each state is seen at most once. [3/N]

1

3

Mikael Henaff @HenaffMikael

12 Mar 2025

My good friend @arcanelibrary designs old-school D&D games and her latest kickstarter is up! I've had lots of fun playing Shadowdark, highly recommend if you're into RPGs :)

The Arcane Library @arcanelibrary

11 Mar 2025

Shadowdark: The Western Reaches is now live on Kickstarter and funded in two minutes! kickstarter.com/projects/sha…

3

399

Mikael Henaff @HenaffMikael

16 Dec 2022

just checked out Movetodon and it's a very easy way to automatically follow all your twitter contacts on Mastodon...i was pleasantly surprised that lots of people are there already! hope to see you there movetodon.org/

1

4

597

Mikael Henaff @HenaffMikael

13 Dec 2022

Replying to @SinghAyush2811 @MetaAI

these are for students in a PhD program, but we sometimes have AI resident spots too which do not have this requirement...will advertise if so

4

Mikael Henaff @HenaffMikael

18 Oct 2022

To address this limitation, we propose Exploration via Elliptical Episodic Bonuses (E3B). E3B uses an elliptical episodic bonus, which generalizes count-based episodic bonuses to continuous state spaces, paired with a feature extractor learned with an inverse dynamics model.[5/N]

1

2
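A rough sketch of an elliptical episodic bonus, assuming the form b = φᵀ C⁻¹ φ, where C is reset to ridge·I at the start of each episode and accumulates φφᵀ over the episode. The learned inverse-dynamics feature extractor is omitted (features are passed in directly), and the class name and ridge value are hypothetical.

```python
import numpy as np

class EllipticalEpisodicBonus:
    def __init__(self, dim, ridge=0.1):
        self.dim = dim
        self.ridge = ridge
        self.reset()

    def reset(self):
        # Episode start: C = ridge * I, so C^{-1} = I / ridge.
        self.cov_inv = np.eye(self.dim) / self.ridge

    def bonus(self, phi):
        """Return phi^T C^{-1} phi, then add phi phi^T to C."""
        phi = np.asarray(phi, dtype=float)
        u = self.cov_inv @ phi
        b = float(phi @ u)
        # Sherman-Morrison rank-1 update of the inverse (cov_inv is symmetric).
        self.cov_inv -= np.outer(u, u) / (1.0 + b)
        return b

# With one-hot features the bonus reduces to 1 / (visit_count + ridge),
# i.e. a count-based episodic bonus.
eb = EllipticalEpisodicBonus(dim=3, ridge=0.1)
first = eb.bonus([1, 0, 0])    # unseen state
second = eb.bonus([1, 0, 0])   # same state, visited once before
other = eb.bonus([0, 1, 0])    # a different, unseen state
```

The one-hot case illustrates the "generalizes count-based bonuses" claim: repeated states shrink the bonus, while unseen directions in feature space keep it high.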

Mikael Henaff @HenaffMikael

9 Jun 2025

Replying to @_rockt @nntsn @NetHack_LE

Thanks Tim! I increasingly appreciate the vision you, @HeinrichKuttler and the team had in creating this benchmark :)

3

178

Mikael Henaff @HenaffMikael

18 Oct 2022

We also evaluate E3B for reward-free exploration on Habitat, which provides photorealistic simulations of real indoor environments. Here, E3B outperforms existing methods by a wide margin. [7/N]

1

2

Mikael Henaff @HenaffMikael

18 Jun 2025

Replying to @_rockt @ylecun

I think there are several coupled points here: 1) can you correct based on past failure data? 2) prediction in discrete vs. continuous space 3) prediction at single vs. multiple levels of abstraction. All these probably involve different solutions.

1

3

196

Mikael Henaff @HenaffMikael

9 Jun 2025

Replying to @farooqsheik @NetHack_LE

Solving NetHack requires a lot of capabilities current RL/agentic systems do not have (in-context exploration, long-horizon decision-making, very long memory, combination of low-level control and leveraging high-level textual resources). These are needed for more general systems.

1

3

240

Mikael Henaff @HenaffMikael

8 Mar 2025

Replying to @teortaxesTex @kalomaze @cloneofsimo

As discussed in lecture 1 :)

3

194

Mikael Henaff @HenaffMikael

2 Apr 2025

Replying to @alfcnz

Do you mean you got scooped on the April 1st joke? ;p

1

3

203

Mikael Henaff @HenaffMikael

28 Feb 2023

My friend Kelsey (aka @arcanelibrary ) designed a new D&D system inspired by the earlier versions of the game - simple, fast and deadly. I playtested the game during development and can't recommend it enough :) it's now available on Kickstarter! kickstarter.com/projects/sha…

Shadowdark RPG: Old-School Gaming, Modernized

Classic adventure gaming for 5E and old-school players alike! One book, all you need to play.

kickstarter.com

3

414

Mikael Henaff @HenaffMikael

9 Jun 2025

Replying to @nntsn @NetHack_LE

I couldn't believe it was happening - thank the RNG I found a few amulets of life saving along the way ;p

1

3

257

Mikael Henaff @HenaffMikael

3 Jan 2025

Replying to @douwekiela @FelixHill84

So sorry to hear this. I remember lots of fun times during our internship together. Rest in peace Felix.

2

723

Mikael Henaff @HenaffMikael

13 Jun 2023

Overall, this clarifies our understanding of how different exploration algorithms operate in CMDPs and opens up a number of exciting new directions. See paper for full details: arxiv.org/abs/2306.03236 Thanks to my collaborators @MinqiJiang and @robertarail ! 15/N, N=15

A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs

Exploration in environments which differ across episodes has received increasing attention in recent years. Current methods use some combination of global novelty bonuses, computed using the...

2

236

Mikael Henaff @HenaffMikael

1 Oct 2025

Replying to @jsuarez

Thanks! The base SF agent gets ~50k SPS, SOL gets ~43k largely due to the v-trace operation (which is more complicated and slower). I pushed some code to Cython but suspect more could be done.

2

2

89

Mikael Henaff @HenaffMikael

19 Oct 2022

Replying to @HenaffMikael @cedcolas @robertarail @MinqiJiang @_rockt

...for NGU's KNN-based bonus, if one of the dimensions has a much larger scale than the others, it can dominate the bonus because Euclidean distance is used

2

Mikael Henaff @HenaffMikael

18 Jul 2023

Very nice work by @mklissar on learning long-horizon exploratory behaviors using Laplacian eigenfunctions.

Martin Klissarov @MartinKlissarov

18 Jul 2023

🎉I'm particularly excited to share this project I worked on under the guidance of @MarlosCMachado 🧙 We ask: what is the *right scaffold* for building temporal abstractions, from the ground up? Website: sites.google.com/view/dceo/ It will be presented next week at #ICML2023 🏝️

2

436

Mikael Henaff @HenaffMikael

28 Oct 2022

Replying to @NicoBohlinger

Thanks! We didn't compare to NGU but others have found it not to work well on procgen envs: openreview.net/pdf?id=j3GK3_… One conceptual difference is that the elliptical bonus automatically normalizes wrt scale, but NGU's KNN-based one doesn't, which means a few features could dominate

1

2
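A small sketch of the scale point, under simplifying assumptions: the elliptical bonus is taken as φᵀ C⁻¹ φ with C built from past features, and the KNN side is reduced to a plain Euclidean nearest-neighbor distance, not NGU's full kernel-based bonus. Function names and data are made up for illustration.

```python
import numpy as np

def elliptical_bonus(phi, past, ridge=0.0):
    # C = sum of outer products of past features (+ optional ridge).
    # With ridge > 0, as used in practice, the invariance below is only approximate.
    C = past.T @ past + ridge * np.eye(past.shape[1])
    return float(phi @ np.linalg.solve(C, phi))

def nearest_neighbor_distance(phi, past):
    # Euclidean distance to the closest past feature vector.
    return float(np.linalg.norm(past - phi, axis=1).min())

past = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
phi = np.array([0.0, 2.0])
scale = np.array([1.0, 1000.0])  # blow up the second feature dimension

ell_raw = elliptical_bonus(phi, past)
ell_scaled = elliptical_bonus(phi * scale, past * scale)
nn_raw = nearest_neighbor_distance(phi, past)
nn_scaled = nearest_neighbor_distance(phi * scale, past * scale)
```

Here the elliptical value is unchanged by the rescaling, while the Euclidean distance is driven almost entirely by the rescaled dimension.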

Mikael Henaff @HenaffMikael

7 Sep 2023

Replying to @UCL_DARK

Big congrats Dr. @MinqiJiang !!! Very well deserved and it's been a pleasure collaborating during your time at FAIR. Looking forward to seeing what you come up with next :)

1

2

196

Mikael Henaff @HenaffMikael

25 Oct 2021

Replying to @robertarail

Congrats and welcome Roberta!

2

Mikael Henaff @HenaffMikael

18 Jun 2025

Replying to @HenaffMikael @_rockt @ylecun

e.g. 1) is fixable by RL, planning or iterative relabeling (i.e. DAgger), 2) by appropriate (non-quantized) state & action space, 3) by hierarchical architecture.

1

2

97

Mikael Henaff @HenaffMikael

13 Jun 2023

Finally, we conduct a systematic comparison of global & episodic design choices across 16 MiniHack tasks. We find that combining the episodic E3B bonus with the global RND bonus sets a new SOTA on MiniHack. Multiplying is also consistently better than adding. 14/N

1

2

207

Mikael Henaff @HenaffMikael

6 Dec 2020

New paper at #NeurIPS2020 presenting PC-PG, a policy gradient algorithm that explores by growing a set of policies covering the set of possible states. Polynomial sample complexity in the linear case, and plays nice with modern deep RL methods. arxiv.org/abs/2007.08459

2

Mikael Henaff @HenaffMikael

27 Oct 2025

Replying to @AvivTamar1 @_amirbar

I think what @AvivTamar1 may be saying is (correct me if wrong): you can have a pixel-based dynamics model, but compute the cost function in a different (e.g. latent) space. This can simplify the optimization of the cost function while still having pixel-based WM.

2

108

Mikael Henaff @HenaffMikael

6 May 2024

Replying to @ssnl_tz @TongzhouWang

Oh interesting, yes that sounds quite related! Yeah algorithmic complexity leads to cool thought experiments despite not being practical unless you have a universe-sized computer ;p

2

39

Mikael Henaff @HenaffMikael

5 Dec 2024

created an account on the celestial network, hope to see you there!

2

274

Mikael Henaff @HenaffMikael

12 Feb 2023

Replying to @HenaffMikael @_aidan_clark_

Near-Optimal Reinforcement Learning in Polynomial Time - UPenn CIS cis.upenn.edu/~mkearns/paper… This is based on the idea of novelty bonuses, which has also been extended to deep RL settings (e.g. RND, ICM, pseudocounts, etc)

1

182

Mikael Henaff @HenaffMikael

13 Jun 2023

Contextual MDPs are MDPs where the environment changes each episode, and have been gathering increasing interest. For example, Procgen, NetHack/MiniHack, Minecraft/Crafter and embodied AI envs all fall within this category. How should we best explore in this setting? 2/N

1

2

408

Mikael Henaff @HenaffMikael

19 Dec 2024

We share our code - excited to see what people build with this! Many thanks to @qqyuzu @adityagrover_ @yayitsamyzhang @brandondamos for another fun collaboration.

1

2

383

Mikael Henaff @HenaffMikael

4 Nov 2021

Replying to @yayitsamyzhang

Woo congrats Amy!! UT is a great place (I did my undergrad there)

2

Mikael Henaff @HenaffMikael

20 Sep 2023

Replying to @ashkamath20 @ylecun @kchonyc @sainingxie @mengyer

Congrats Dr. Kamath!!

2

932

Mikael Henaff @HenaffMikael

19 May 2020

That's awesome, 15 years is a lot of number crunching!

1

Mikael Henaff @HenaffMikael

13 Jun 2023

We also conduct pixel-based experiments on Habitat and Montezuma's Revenge, which suggest that the tradeoffs between global & episodic bonuses we identified previously apply more broadly. The combined bonus helps, but less than before - this remains an open area of research. 13/N

1

1

134

Mikael Henaff @HenaffMikael

7 Dec 2024

Replying to @proceduralia

Congrats Dr., very well deserved!

1

215