MTS at @amilabs, prev: Meta, MSR, NYU. All views my own.

New York, USA
Introducing Scalable Option Learning (SOL☀️), a blazingly fast hierarchical RL algorithm that makes progress on long-horizon tasks and demonstrates positive scaling trends on the largely unsolved NetHack benchmark, when trained for 30 billion samples. Details, paper and code in >
I'm looking for a PhD intern for next year! If you are interested in any combination of: intrinsic motivation, LLM/VLM-guided reward design, long-horizon tasks, hierarchical RL, NetHack, MineCraft, representation learning, I'd love to hear from you. Details below...
I'm looking for a PhD intern for next year, co-advised with Scott Fujimoto, for a project developing sample-efficient RL algorithms for long-horizon decision-making. If you've worked on off-policy/MBRL, hierarchical RL, embodied AI, we'd love to hear from you! Contact below.
Hiring a #research #intern for 2023 at FAIR (@MetaAI), if you're interested in working on exploration, generalization, imitation learning or hierarchical RL please get in touch :)
Excited to share our @NeurIPSConf paper where we propose E3B--a new algorithm for exploration in varying environments. Paper: arxiv.org/abs/2210.05805 Website: e3bagent.github.io/ E3B sets new SOTA for both MiniHack and reward-free exploration on Habitat. A thread [1/N]
Excited to share our latest work ONI, which enables learning intrinsic rewards online without pre-collected data. We do this by annotating the agent's collected experience with an asynchronously hosted LLM server. Paper: arxiv.org/abs/2410.23022 Code: github.com/facebookresearch/…
I am looking for an intern for 2024 to work on the Cortex project in @AIatMeta 's Embodied AI team! Relevant skills include: experience with LLMs/VLMs, EAI simulators such as Habitat, and RL. DM or email at mikaelhenaff [at] meta [dot] com ✨ #AI #InternshipOpportunity #LLM
Signed. Keeping models open is the best way to ensure high scientific standards for safety research and fair representation in AI development. open.mozilla.org/letter/ via @mozilla
A couple bits of news: 1. Happy to share my first (human) NetHack ascension; next step is RL agents :) 2. I wrote a post discussing some @NetHack_LE challenges & how they map to open problems in RL & agentic AI. Still the best RL benchmark imo. mikaelhenaff.substack.com/p/…
Exploration is well-studied for singleton MDPs, but many envs of interest change across episodes (i.e. procgen envs or embodied AI tasks). How should we explore in this case? arxiv.org/abs/2306.03236 In our upcoming @icmlconf oral, we study this question. A thread...1/N
Another banger led by dream team @MartinKlissarov and @proceduralia, to be presented at ICLR 2025. MaestroMotif is a hierarchical agent which zero-shot composes Motif skills using an LLM controller, reaching new depths of the NetHack dungeon. Code available!
Can AI agents adapt zero-shot, to complex multi-step language instructions in open-ended environments? We present MaestroMotif, a method for AI-assisted skill design that produces highly capable and steerable hierarchical agents. To the best of our knowledge, it is the first method that, without expert labeled datasets, solves compositional tasks requiring hundreds of steps for completion. All the modules within MaestroMotif are learned from interaction: from the highest level of planning to the lowest-level of sensorimotor control. On the open-ended domain of NetHack, it surpasses existing approaches, including those that are fine-tuned specifically for each task. At the heart of MaestroMotif is the idea that decomposing a task into subtasks significantly helps decision making. MaestroMotif leverages an agent designer's intuition about a domain to identify important skills and describe them in natural language. These short descriptions then get converted into adaptable hierarchical agents through AI feedback and in-context learning. Our paper was recently published at ICLR 2025 and we open-source the whole project including the code, prompts and pre-trained models. Paper: arxiv.org/abs/2412.08542 Code: github.com/mklissa/maestromo… NotebookLM Podcast: bit.ly/4jLi6mo This work was done with the amazing @HenaffMikael, @robertarail, @shagunsodhani, Pascal Vincent, @yayitsamyzhang, @pierrelux, Doina Precup, with equal supervision by @MarlosCMachado and @proceduralia. Take a look at the following thread:
Super stoked to share this work led by @proceduralia & @MartinKlissarov. Our method Motif uses LLMs to rank pairs of observation captions and synthesize dense intrinsic rewards specified by natural language. New SOTA on NetHack while being easily steerable. Paper+code in thread!
Can reinforcement learning from AI feedback unlock new capabilities in AI agents? Introducing Motif, an LLM-powered method for intrinsic motivation from AI feedback. Motif extracts reward functions from Llama 2's preferences and uses them to train agents with reinforcement learning. On the complex NetHack game, Motif solves previously unsolved tasks without needing any expert demonstrations. Surprisingly, Motif's reward leads to better game score than the one obtained by using the score itself as a reward. Given access to an event captioning mechanism, a few properties make Motif a general method: • it is entirely based on open models • the LLM doesn't need direct access to the environment dynamics (e.g., its source code) • the LLM doesn't need to understand observation and action spaces The best part? You can start using Motif right now, even on a small compute budget: the whole pipeline can take less than two GPU-days. Feel free to read our paper and try our code out. Paper: arxiv.org/abs/2310.00166 Code: github.com/facebookresearch/… Blog post: mila.quebec/en/article/motif… Work co-led by @MartinKlissarov and myself, with @shagunsodhani @robertarail @pierrelux Pascal Vincent @yayitsamyzhang @HenaffMikael Learn more in the thread 🧵
Very happy to share that our Motif work was accepted at #ICLR2024 :) come say hi in Vienna!
A simple way to help with #Covid_19 and medicine generally is to donate spare computer time to biomedical researchers through projects like @foldingathome or @RosettaAtHome. Small contributions add up to make distributed peta/exaFLOP supercomputers!
Replying to @_aidan_clark_
Step 1 doesn't have to be random: there is a large literature on directed exploration strategies, going back at least to Kearns and Singh's 2002 E^3 work, which showed that you can avoid the exponential sample complexity of random exploration.
The more I work with this env, the more its richness and complexity become apparent. Other than perception, it presents a hard challenge for nearly every other agentic capability, from long horizon planning and exploration to reasoning, memory and generalization.
Replying to @PaglieriDavide
The ultimate test? NetHack 🏰 This beast remains unsolved: the best model, o1-preview, achieved just 1.5% average progression. BALROG pushes boundaries, uncovering where LLMs/VLMs struggle the most. Will your model fare better? 🤔 They’re nowhere near capable enough yet!
The embodied AI team I'm part of at @MetaAI has multiple Research Scientist / Research Engineer positions open, come work with us ✨
(1/6) The FAIR Embodied AI team at @MetaAI has multiple full-time openings! If you’re interested in cutting-edge research in AI for robotics, AR and VR, and sharing it with the world, read on. 🧵
Also feel free to reach out if you want to grab coffee and chat about RL, exploration, generalization, LLMs for decision making, or anything else :) #ICML2023
Excited to share some recent work in imitation learning at #iclr2020, which uses an ensemble of policies to reduce covariate shift. Joint work with @xkianteb and Wen Sun. Paper: openreview.net/pdf?id=rkgbYy… Talk: iclr.cc/virtual/poster_rkgbY…
Stoked about this new benchmark for long-horizon planning, intrinsic motivation, procedural generalization and memory
I’m excited to announce Craftax, a new benchmark for open-ended RL! ⚔️ Extends the popular Crafter benchmark with Nethack-like dungeons ⚡Implemented entirely in Jax, achieving speedups of over 100x 1/
Excited to share our Fast3R paper, to be presented at CVPR 2025. This recasts 3D reconstruction and camera pose estimation from video as an end-to-end learning problem, leading to ~4x-300x improvements in speed while maintaining performance. Code, model & demo in thread!
⚡️ Excited to announce Fast3R: 3D reconstruction of 1000+ images in a single forward pass! Fast3R achieves 251 FPS at its peak. 🔥 Try the demo with your images or video! 🔗 Website: fast3r-3d.github.io 🎮 Demo: fast3r.ngrok.app/ #CVPR2025 #3D @AIatMeta
Internship is in NYC, if interested please email me at: mikaelhenaff at meta dot com and apply here: metacareers.com/jobs/5325490… looking forward to hearing from you!
This is a very exciting dataset - stochastic policies/dynamics, large action space, partial observability, rich dynamics, *very* large scale while still enabling fast experiments. Can't wait to start playing with it and hope others do too!
Interview of @sharathraparthy discussing our recent work showing that transformers can in-context learn new sequential decision-making tasks in new environments. Check it out! arxiv.org/abs/2312.03801
Episode 48: Sharath Chandra Raparthy @sharathraparthy (AI Resident at @AIatMeta) on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more! podcasts.apple.com/us/podcas…
Latest work where we present OpenEQA, a modern embodied Q&A benchmark which tests multiple capabilities such as spatial reasoning, object recognition and world knowledge, and on which SOTA VLMs like GPT4V/Claude/Gemini fail. A new challenge for embodied AI! To be presented @CVPR.
Today we’re releasing OpenEQA — the Open-Vocabulary Embodied Question Answering Benchmark. It measures an AI agent’s understanding of physical environments by probing it with open vocabulary questions like “Where did I leave my badge?” More details ➡️ go.fb.me/7vq6hm All of today’s state-of-art vision+language models (VLMs) fall well short of human performance. In fact, for questions that require spatial understanding, today’s VLMs are nearly “blind” – access to visual content provides only minor improvements over language-only models. We hope that OpenEQA motivates additional research into helping AI understand and communicate about the world it sees.
We are hiring a research intern for next year - if you would like to work on hierarchical RL, world models, modular networks and related topics with @shagunsodhani, myself and other researchers at FAIR please reach out! :)
We are hiring a #research #intern at FAIR (@MetaAI) to work in areas related to #RL, #hierarchical RL, #modular #networks, and #world #models. Location: Montreal / New York / Remote. You can dm me your questions and resume!
In Hawaii for #ICML2023, presenting two works Tuesday: - A Study of Global and Episodic Bonuses in Contextual MDPs (poster at 2pm, oral at 6:10 pm) - Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories (poster at 11am) Hope to see you there :)
Replying to @tydsh
that is insane...way to shoot ourselves in the foot
Highly recommend
If you're an ML researcher or engineer impacted by recent events and want to continue pushing the frontier of intelligence through foundation modeling, reasoning, and large-scale RL, do get in touch!
Presenting our E3B work on exploration in changing environments at #NeurIPS at 11 am NOLA time in Hall J #105...come by and say hi! with @robertarail @MinqiJiang @_rockt
Nice article in @techreview about our paper on model-based RL with uncertainty regularization for #autonomousdriving
Reinforcement learning makes mistakes as it learns. That's fine when playing a board game. It's, erm, not great in a life-or-death situation. trib.al/fOcXnPh
@_rockt @HeinrichKuttler @_samvelyan @erichammy you might be interested, this method is able to make progress on the Oracle task without demos (although sometimes in unexpected ways ;))
Btw, the lead author @jed_yang is graduating this year and will be on the job market. Jed is highly motivated and creative, a great engineer and researcher who gets stuff to work, and has been a pleasure to work with...if you're hiring I suggest reaching out to him!
Excited to share our Fast3R paper, to be presented at CVPR 2025. This recasts 3D reconstruction and camera pose estimation from video as an end-to-end learning problem, leading to ~4x-300x improvements in speed while maintaining performance. Code, model & demo in thread!
Hierarchy is a natural way to tackle long horizons, but until now has remained at relatively small scale. With SOL, we identify and solve bottlenecks in scaling hierarchical RL, resulting in a ~35-580x speed increase over prior hierarchical methods.
Replying to @EugeneVinitsky
The difference between algorithms that explore efficiently vs. not is essentially polynomial vs. exponential sample complexity (itself a lower bound on compute complexity). Imo more compute can crack some harder poly problems but will eventually hit a wall with exponential ones:)
Replying to @goodfellow_ian
Really sorry to hear this. I had a bad case of LC as well in 2020 and few understand how brutal it is. Are you sure POTS is the main culprit? Asking because I had that diagnosis too but it later turned out to be wrong. This ended up helping me: covidlonghaulers.com
Please email {mikaelhenaff, sfujimoto} at meta dot com with [2026 internship] in the subject line. We'd be particularly interested in any examples of papers/work around hierarchical/off-policy/model-based RL, embodied AI, etc. Looking forward to hearing from you!
End to end memory networks in 2015 (arxiv.org/abs/1503.08895) by @tesatory were an important precursor in the sense that like the transformer (and unlike the NTM), they maintain the sequence structure and perform multiple layers of attention over it.
Takes me back to my days as a starry-eyed master's student, when Pytorch's grandparent Lush was still used in @ylecun 's lab <3 Lush was actually the first programming language I seriously learned (I'd been studying math until then). Such fond memories counting parentheses!
I wrote two blog posts about SN, Léon Bottou and @ylecun's 1988 Simulateur de Neurones. One is an English translation of the original paper, for which I've reproduced the figures. The other is a tutorial on how to run their code on Apple silicon. atcold.github.io/blog
New work led by @sharathraparthy and jointly with @robertarail @erichammy @_robertkirk showing that one can in-context learn completely *new tasks* on *new environments* via large-scale pretraining and few shot examples. To be presented at upcoming @NeurIPSConf FMDM workshop!
🚨 🚨 !!New Paper Alert!! 🚨 🚨 How can we train agents that learn new tasks (with different states, actions, dynamics and reward functions) from only a few demonstrations and no weight updates? In-context learning to the rescue! In our new paper, we show that by training transformers on large diverse datasets of sequences of demonstrations with certain properties, we can generalize to new Procgen or MiniHack tasks from only a few demonstrations and no weight updates! Paper: arxiv.org/pdf/2312.03801.pdf Work with these amazing collaborators @erichammy @_roberkirk @HenaffMikael @robertarail 1/13
SOL can be run on any RL problem for which we can define a few reasonable intrinsic rewards. We include some simple PointMaze and MiniHack environments to show this generality - this may also be useful for others working in HRL since they are faster to iterate on than NetHack.
Replying to @jsuarez
Procedural generation or settings where the environment changes across episodes. Exploration operates very differently in that setting and a lot of algorithms for static MDPs fail.
We demonstrate SOL's performance and scalability by training hierarchical agents for 30B steps on the complex game of NetHack, significantly outperforming flat agents and demonstrating promising scaling trends. Our agents still seem to be improving, even after 30B steps.
I'll be presenting a poster about some of our recent work on LLM-guided exploration and intrinsic motivation at the NY academy of sciences this coming Friday: nyas.org/shaping-science/eve… if you're in the tri-state area, it's a nice event to chat about ML in a relaxed setting.
Replying to @akbirkhan @MetaAI
definitely possible in NYC, would have to see about London...feel free to apply here metacareers.com/jobs/9018997… and send your cv to mikaelhenaff@meta.com :)
Nice opportunity to work with some great researchers!
Our group has multiple openings for internships at FAIR London (@MetaAI). I’m looking for someone to work on language models + decision making e.g. augmenting LMs with actions / tools / goals, interactive / open-ended learning for LMs, or RLHF. Apply at metacareers.com/jobs/8871228…
Replying to @HaqueIshfaq
Minihack is quite nice, there are lots of tasks and many of them are sparse reward, and it has the additional interesting twist of being procedurally generated. We have some code to train a variety of exploration algorithms here: github.com/facebookresearch/…
High performing, open source VLMs and smol LLMs now available...nature is healing🌱
📣 Introducing Llama 3.2: Lightweight models for edge devices, vision models and more! What’s new? • Llama 3.2 1B & 3B models deliver state-of-the-art capabilities for their class for several on-device use cases — with support for @Arm, @MediaTek & @Qualcomm on day one. • Llama 3.2 11B & 90B vision models deliver performance competitive with leading closed models — and can be used as drop-in replacements for Llama 3.1 8B & 70B. • New Llama Guard models to support multimodal use cases and edge deployments. • The first official distro of Llama Stack simplifies and supercharges the way developers & enterprises can build around Llama to support agentic applications and more. Details in the full announcement ➡️ go.fb.me/229ug4 Download Llama 3.2 models ➡️ go.fb.me/w63yfd These models are available to download now directly from Meta and @HuggingFace — and will be available across offerings from 25+ partners that are rolling out starting today, including @accenture, @awscloud, @AMD, @azure, @Databricks, @Dell, @Deloitte, @FireworksAI_HQ, @GoogleCloud, @GroqInc, @IBMwatsonx, @Infosys, @Intel, @kaggle, @NVIDIA, @OracleCloud, @PwC, @scale_AI, @snowflakeDB, @togethercompute and more. With Llama 3.2 we’re making it possible to run Llama in even more places, with even more flexible capabilities. We’ve said it before and we’ll say it again: open source AI is how we ensure that these innovations reflect the global community they’re built for and benefit everyone. We’re continuing our drive to make open source the standard with Llama 3.2.
It was also a pleasure working with @shagunsodhani @robertarail @yayitsamyzhang and Pascal Vincent on this project!
E3B sets a new SOTA on 16 challenging sparse-reward tasks from the MiniHack suite. In particular, it does so without requiring any feature engineering or task-specific prior knowledge. [6/N]
I think reorganizing information can be seen as adding new information encoded in the reorganization scheme. For example, if you are reorganizing N bits of information with program P of length K bits, you are effectively adding K bits of new information.
Replying to @_rockt @NetHack_LE
'nutritiously hard', sounds like a juicy problem and a hard nut to crack ;)
My experience has been that deep RL agents saturate at ~8-9% according to that metric
Replying to @FelixHill84
Sorry to hear about this Felix, but I'm glad things are starting to look up. I remember when we interned together in the early days which felt like a different world. I really admire both your scientific and human contributions to the field, wishing you well!
Exploration in standard MDPs is well studied, but what about contextual MDPs (CMDPs) where the environment changes each episode? This general framework captures scenarios such as procgen video games or embodied AI tasks where the agent must generalize across physical spaces.[2/N]
It has several of the minigrid envs ported but is a lot more challenging because count-based episodic bonuses do not work, we discuss some more here: arxiv.org/abs/2210.05805
@jsuarez I also think that solving NLE or similar envs with smaller models will yield insights in exploration, long-horizon decision-making etc that are needed for a gpt-10 worthy of its name
New paper accepted to #icml2020 - this takes steps towards bridging the theory-practice gap in RL by providing a provably sample-efficient algorithm for block MDPs which uses contrastive learning. Long version: arxiv.org/abs/1911.05815 #ReinforcementLearning
And for Nethack, |A| ~= 100, H >= 10000... Things change if we have a smart exploration algorithm though, which is one of the reasons RL is interesting :)
Replying to @alfcnz
When the turntable was invented, some people thought it was the end of music. Then people used it to make entirely new kinds of music (sampling, DJing etc). Human creativity always finds a way to express itself given the tools available :)
I think that "naive" tabula rasa RL cannot solve tasks like Nethack even with a universe sized computer. Consider a sparse reward task with |A|=10 actions and horizon H=100. Expected num samples with naive exploration is O(|A|^H) = O(10^100), more than # atoms in the universe :)
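The estimate above can be checked with a few lines of Python. This is a minimal sketch, assuming (this is not stated in the thread) that the reward is reached only by one specific sequence of H actions and that naive exploration picks actions uniformly at random, so a single episode succeeds with probability |A|^-H. The function name is hypothetical, the NetHack-scale figures (|A| ~= 100, H >= 10000) are the ones quoted elsewhere in this thread, and ~10^80 is the usual rough estimate for the number of atoms in the observable universe.

```python
import math

ATOMS_IN_UNIVERSE_LOG10 = 80  # common rough estimate (~10^80); an assumption


def log10_expected_episodes(num_actions: int, horizon: int) -> float:
    """Return log10 of |A|^H, the expected number of uniformly random
    episodes needed to hit one specific length-H action sequence."""
    return horizon * math.log10(num_actions)


# Work in log space: the raw counts are far too large to be meaningful as numbers.
for name, num_actions, horizon in [("toy task", 10, 100), ("NetHack-scale", 100, 10_000)]:
    exponent = log10_expected_episodes(num_actions, horizon)
    exceeds = exponent > ATOMS_IN_UNIVERSE_LOG10
    print(f"{name}: ~10^{exponent:.0f} episodes; more than atoms in the universe: {exceeds}")
```

Working with exponents rather than the counts themselves keeps the comparison readable and avoids formatting numbers with thousands of digits.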
Replying to @CupiaBart
Nice work, it's great to see interest in NetHack! If you're in this space you might be interested in a couple other repos: github.com/facebookresearch/… github.com/facebookresearch/… In particular, Motif makes some progress on the very challenging Oracle task and uses SF as the RL env.
While exploration in CMDPs has recently started receiving attention, we show that existing methods critically rely on an episodic count-based bonus, and fail if this bonus is removed. This also means they fail in complex envs where each state is seen at most once. [3/N]
My good friend @arcanelibrary designs old-school D&D games and her latest kickstarter is up! I've had lots of fun playing Shadowdark, highly recommend if you're into RPGs :)
Shadowdark: The Western Reaches is now live on Kickstarter and funded in two minutes! kickstarter.com/projects/sha…
just checked out Movetodon and it's a very easy way to automatically follow all your twitter contacts on Mastodon...i was pleasantly surprised that lots of people are there already! hope to see you there movetodon.org/
these are for students in a PhD program, but we sometimes have AI resident spots too which do not have this requirement...will advertise if so
To address this limitation, we propose Exploration via Elliptical Episodic Bonuses (E3B). E3B uses an elliptical episodic bonus, which generalizes count-based episodic bonuses to continuous state spaces, paired with a feature extractor learned with an inverse dynamics model.[5/N]
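To illustrate how an elliptical episodic bonus can generalize counts, here is a minimal pure-Python sketch. It uses the standard elliptical form b = phi^T C^{-1} phi with C = ridge * I + the sum of phi phi^T over the current episode; whether this matches the paper's exact formulation is an assumption. The class name, the `ridge` value and the toy features are hypothetical, and the learned feature extractor (the inverse dynamics model) is not shown: features are assumed to be given.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


class EllipticalEpisodicBonus:
    """Hypothetical helper computing b = phi^T C^{-1} phi within one episode."""

    def __init__(self, dim: int, ridge: float = 0.1) -> None:
        self.dim = dim
        self.ridge = ridge
        self.reset()

    def reset(self) -> None:
        # Call at the start of every episode: C = ridge * I, so C^{-1} = I / ridge.
        self.c_inv: Matrix = [
            [(1.0 / self.ridge) if i == j else 0.0 for j in range(self.dim)]
            for i in range(self.dim)
        ]

    def step(self, phi: Vector) -> float:
        # u = C^{-1} phi, computed before the current feature is added to C.
        u = [sum(row[j] * phi[j] for j in range(self.dim)) for row in self.c_inv]
        bonus = sum(p * x for p, x in zip(phi, u))
        # Sherman-Morrison rank-one update:
        # (C + phi phi^T)^{-1} = C^{-1} - u u^T / (1 + phi^T u)
        scale = 1.0 / (1.0 + bonus)
        for i in range(self.dim):
            for j in range(self.dim):
                self.c_inv[i][j] -= scale * u[i] * u[j]
        return bonus


# With one-hot features the bonus reduces to 1 / (ridge + visit count),
# i.e. a count-based episodic bonus.
tracker = EllipticalEpisodicBonus(dim=2, ridge=1.0)
for features in ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]):
    print(features, tracker.step(features))
```

With continuous features the same update applies unchanged, which is the sense in which the elliptical form extends counting to states that are never seen twice.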
Thanks Tim! I increasingly appreciate the vision you, @HeinrichKuttler and the team had in creating this benchmark :)
We also evaluate E3B for reward-free exploration on Habitat, which provides photorealistic simulations of real indoor environments. Here, E3B outperforms existing methods by a wide margin. [7/N]
Replying to @_rockt @ylecun
I think there are several coupled points here: 1) can you correct based on past failure data? 2) prediction in discrete vs. continuous space 3) prediction at single vs. multiple levels of abstraction. All these probably involve different solutions.
Solving NetHack requires a lot of capabilities current RL/agentic systems do not have (in-context exploration, long-horizon decision-making, very long memory, combination of low-level control and leveraging high-level textual resources). These are needed for more general systems.
As discussed in lecture 1 :)
Replying to @alfcnz
Do you mean you got scooped on the April 1st joke? ;p
My friend Kelsey (aka @arcanelibrary ) designed a new D&D system inspired by the earlier versions of the game - simple, fast and deadly. I playtested the game during development and can't recommend it enough :) it's now available on Kickstarter! kickstarter.com/projects/sha…
Replying to @nntsn @NetHack_LE
I couldn't believe it was happening - thank the RNG I found a few amulets of life saving along the way ;p
So sorry to hear this. I remember lots of fun times during our internship together. Rest in peace Felix.
Replying to @jsuarez
Thanks! The base SF agent gets ~50k SPS, SOL gets ~43k largely due to the v-trace operation (which is more complicated and slower). I pushed some code to Cython but suspect more could be done.
...for NGU's KNN-based bonus, if one of the dimensions has much larger scale than the others it can dominate the bonus due to euclidean distance being used
Very nice work by @mklissar on learning long-horizon exploratory behaviors using Laplacian eigenfunctions.
🎉I'm particularly excited to share this project I worked on under the guidance of @MarlosCMachado 🧙 We ask: what is the *right scaffold* for building temporal abstractions, from the ground up? Website: sites.google.com/view/dceo/ It will be presented next week at #ICML2023 🏝️
Replying to @NicoBohlinger
Thanks! We didn't compare to NGU but others have found it not to work well on procgen envs: openreview.net/pdf?id=j3GK3_… One conceptual difference is that the elliptical bonus automatically normalizes wrt scale but NGU's KNN-based one doesn't which means a few features could dominate
Replying to @UCL_DARK
Big congrats Dr. @MinqiJiang !!! Very well deserved and it's been a pleasure collaborating during your time at FAIR. Looking forward to seeing what you come up with next :)
Replying to @robertarail
Congrats and welcome Roberta!
e.g. 1) is fixable by RL, planning or iterative relabeling (i.e. DAgger), 2) by appropriate (non-quantized) state & action space, 3) by hierarchical architecture.
Finally, we conduct a systematic comparison of global & episodic design choices across 16 MiniHack tasks. We find that combining the episodic E3B bonus with the global RND bonus sets a new SOTA on MiniHack. Multiplying is also consistently better than adding. 14/N
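To make the two ways of combining bonuses concrete, here is a small Python sketch. The function names and example values are hypothetical, and it only illustrates the arithmetic of the two combinations; it makes no claim about why multiplication performed better in the experiments.

```python
def combine_multiplicative(episodic_bonus: float, global_bonus: float) -> float:
    """Intrinsic reward as the product of the episodic and global bonuses."""
    return episodic_bonus * global_bonus


def combine_additive(episodic_bonus: float, global_bonus: float) -> float:
    """Intrinsic reward as the sum of the episodic and global bonuses."""
    return episodic_bonus + global_bonus


# Hypothetical values: a state that is novel within the current episode
# (high episodic bonus) but already familiar across training (low global bonus).
episodic, global_novelty = 0.9, 0.05
print("product:", combine_multiplicative(episodic, global_novelty))
print("sum:", combine_additive(episodic, global_novelty))
```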
New paper at #NeurIPS2020 presenting PC-PG, a policy gradient algorithm that explores by growing a set of policies covering the set of possible states. Polynomial sample complexity in the linear case, and plays nice with modern deep RL methods. arxiv.org/abs/2007.08459
I think what @AvivTamar1 may be saying is (correct me if wrong): you can have a pixel-based dynamics model, but compute the cost function in a different (e.g. latent) space. This can simplify the optimization of the cost function while still having pixel-based WM.
Oh interesting, yes that sounds quite related! Yeah algorithmic complexity leads to cool thought experiments despite being not practical unless you have a universe sized computer ;p
created an account on the celestial network, hope to see you there!
Near-Optimal Reinforcement Learning in Polynomial Time - UPenn CIS cis.upenn.edu/~mkearns/paper… This is based on the idea of novelty bonuses, which has also been extended to deep RL settings (e.g. RND, ICM, pseudocounts, etc)
Contextual MDPs are MDPs where the environment changes each episode, and have been gathering increasing interest. For example, Procgen, NetHack/MiniHack, Minecraft/Crafter and embodied AI envs all fall within this category. How should we best explore in this setting? 2/N
We share our code - excited to see what people build with this! Many thanks to @qqyuzu @adityagrover_ @yayitsamyzhang @brandondamos for another fun collaboration.
Replying to @yayitsamyzhang
Woo congrats Amy!! UT is a great place (I did my undergrad there)
That's awesome, 15 years is a lot of number crunching!
We also conduct pixel-based experiments on Habitat and Montezuma's Revenge, which suggest that the tradeoffs between global & episodic bonuses we identified previously apply more broadly. The combined bonus helps, but less than before - this remains an open area of research. 13/N
Replying to @proceduralia
Congrats Dr., very well deserved!