Assistant Professor at University of Washington. I like robots, and reinforcement learning. Previously: post-doc at MIT, PhD at Berkeley

Seattle, WA
Here’s a pretty weird and surprising result - retrieval-augmented generation works unreasonably well for robot learning – but only when parameterized using difference vectors! We introduce Difference-Aware Retrieval Policies for Imitation Learning (DARP), a simple, semi-parametric RAG architecture for imitation learning that achieves gains of up to 200% over standard behavior cloning. No additional assumptions beyond BC, just a little architecture switch! The theory backing it up is pretty cool too and it works on real robots! :) Play with our website to understand better: weirdlabuw.github.io/darp-si… 🧵(1/7)
Thrilled to share that I will be starting as an assistant professor at the University of Washington @uwcse in Fall 2022! Grateful for wonderful mentors and collaborators at @berkeley_ai, especially @svlevine and @pabbeel. Looking forward to joining the wonderful folks @uwcse!
Punchline: World models == VQA (about the future)! Planning with world models can be powerful for robotics/control. But most world models are video generators trained to predict everything, including irrelevant pixels and distractions. We ask - what if a world model only predicted the semantic information necessary for decision-making? Introducing Semantic World Models (SWM). Given an observation and an action sequence, SWMs cast modeling as answering textual questions about the future outcome resulting from the actions. Recasting world modeling as a VQA problem lets us directly leverage the pretrained knowledge and machinery of VLMs for generalizable modeling. We had a lot of fun thinking about how this work helps connect these two seemingly very different fields of study - VLMs and world models! 🧵(1/6) Paper: arxiv.org/abs/2510.19818 Fun demo: weirdlabuw.github.io/swm
Anyone who knows me knows I love real world RL :) But anyone who works on real-world RL knows it’s quite a pain to get going. We tried to make everyone’s life easier by writing a software suite to get you going with real world RL out of the box, without all the pain! A 🧵(1/5)
So I hear that behavior cloning is all the rage now. What if we could do better, but with the same data? :) In CCIL, we show that imitation via BC is improved by synthesizing corrective labels to account for compounding error, without interactive oracles. Lets you do 👇! 🧵(1/9)
Imitation learning is great, but needs us to have (near) optimal data. We throw away most other data (failures, evaluation data, suboptimal data, undirected play data), even though this data can be really useful and way cheaper! In our new work - RISE, we show a simple way to *use all of this non-optimal data to robustify imitation learning* with minimal requirements beyond BC. Key idea: use non-expert data to learn how to *recover* back to expert data with a minimal frills offline RL that works under sparse data coverage. Allows usage of *all* available data, not just expert data - never throw your data away! Paper: arxiv.org/abs/2510.19495 Website: uwrobotlearning.github.io/RI… A 🧵(1/10)
I am recruiting PhD students to join us in the Washington Embodied Intelligence and Robotic Development Lab (WEIRD) weirdlab.cs.washington.edu/ at @uwcse. We work on robot learning, especially RL in the real world! Check out tinyurl.com/guptauw for details (1/3)
So you’ve trained your favorite diffusion/flow based policy, but it’s just not good enough 0-shot. Worry not, in our new work DSRL - we show how to *steer* pre-trained diffusion policies with off-policy RL, improving behavior efficiently enough for direct training in the real world! DSRL retains nice exploration from the base policy, but allows for quick improvement beyond this base policy with RL. The method is frustratingly simple, and super easy to throw on top of your favorite pretrained policy (VLA/diffusion policy, etc). diffusion-steering.github.io Let’s think about how it works, 🧵 (1/10)
So we did a bunch of projects with real world reinforcement learning - but it was often too inefficient to be practical to train tabula rasa. This suggests we need better priors, but acquiring these from on-robot data can often be expensive as well. In our recent work, we show that despite being fundamentally inaccurate, simulation can provide a cheap way to guide real-world RL finetuning to be super efficient! We propose Simulation-Guided Fine-Tuning (SGFT) - a simple paradigm for sim2real finetuning that uses simulation to provide reward shaping that accelerates real world RL finetuning *beyond* just providing an initialization. TLDR: Use value functions from sim to shape rewards for real-world RL, see large sample efficiency improvements 🧵(1/6)
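A quick sketch of the SGFT TLDR above, for anyone who wants to see it in code. It reads like classic potential-based reward shaping, with a value function learned in simulation playing the role of the potential. That reading is an assumption, as are the function name `shape_reward`, the default `gamma`, and the choice to zero the potential at terminal states; none of these come from the SGFT paper or code.

```python
def shape_reward(reward: float, v_s: float, v_next: float,
                 gamma: float = 0.99, done: bool = False) -> float:
    """Potential-based shaping: r + gamma * V(s') - V(s).

    v_s and v_next are assumed to come from a value function trained in
    simulation (hypothetical here; any callable producing floats works).
    """
    # Assumption: treat the potential of a terminal state as zero.
    next_potential = 0.0 if done else v_next
    return reward + gamma * next_potential - v_s


# Example: a sparse real-world reward of 0 still yields a learning signal
# when the sim value function says the next state is better.
shaped = shape_reward(reward=0.0, v_s=2.0, v_next=3.0, gamma=0.5)
```

The shaped reward would then replace the raw reward in whatever off-the-shelf RL algorithm is doing the real-world finetuning.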
Imagine this: you drop your robot in an environment, connect it to the internet and come back 10 hours later, and it has learned to solve tasks in the real world, autonomously, with no effort from you! We enable this in our work - Guided Exploration for Autonomous RL (GEAR)🧵(1/5)
Excited to share our work on uncertainty estimation using diffusion/score matching! The idea is simple: offline optimization (e.g. model-based RL, imitation) requires us to estimate uncertainty. Estimating uncertainty is hard - score matching provides a scalable solution. A🧵(1/5)
Excited to share our work on self-supervised RL by modeling random features. The key premise behind RaMP is to learn about environment dynamics, without learning a dynamics model! This allows for transfer, without accruing compounding error. arxiv.org/abs/2305.17250 A🧵 (1/6)
Intrigued by decision transformers, we investigated why and when we should use return-conditioned RL as an alternative to dynamic prog (DP). Our findings are neat! With data coverage, RCSL can outperform DP, but fails to "stitch" trajectories. We analyze and propose a fix. 🧵(1/N)
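For readers new to the return-conditioned setup in the post above: the usual recipe (as popularized by decision transformers) is to relabel each timestep of an offline trajectory with its return-to-go, then train a policy with plain supervised learning on (state, return-to-go) -> action. Below is a minimal sketch of just the relabeling step. The helper names and the tuple layout are illustrative assumptions, not code from the paper.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Discounted sum of rewards from each timestep to the end of the trajectory."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def relabel(states: Sequence, actions: Sequence,
            rewards: Sequence[float]) -> List[Tuple]:
    """Build (state, return_to_go, action) supervised-learning examples."""
    return list(zip(states, returns_to_go(rewards), actions))


examples = relabel(states=["s0", "s1", "s2"],
                   actions=["a0", "a1", "a2"],
                   rewards=[1.0, 2.0, 3.0])
```

Because every label comes from a single observed trajectory, nothing in this recipe combines the good half of one trajectory with the good half of another, which is the "stitching" limitation the post refers to.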
Combinatorial complexity is often the bane of imitation learning - including VLA models! @Jesse_Y_Zhang and @memmelma proposed a way around this, using VLMs to perform problem reduction for imitation. The insight is simple - 1) A high-level VLM takes a complex scene/task and reduces it to a minimal representation (via masking and path prediction) that is needed to act in the world. 2) A low-level policy then takes this reduced representation and generates actions to be executed in the world. The high-level policy absorbs all the combinatorial complexity of the problem, leaving the low-level to focus on dexterity and geometric reasoning. Super simple, works really well across policy classes and problem settings! - 41.4× sim2real improvement (3DDA) and 2–3.5× boosts for π₀ and ACT in the real world. Paper: arxiv.org/abs/2509.18282 Website: peek-robot.github.io Demo: peek.a.pinggy.link Fun collaboration led by @Jesse_Y_Zhang @memmelma with lots of collaborators! Let us know what you think 😀
Excited about @ZoeyC17's new work on real2sim for robotics! We present URDFormer, a technique to learn models that go from RGB images to full articulated scene URDFs in sim by "inverting" pre-trained generative models. These can be used to train robots for the real-world! 🧵(1/8)
So you want to do robotics tasks requiring dynamics information in the real world, but you don’t want the pain of real-world RL? In our work to be presented as an oral at ICLR 2024, @memmelma showed how we can do this via a real-to-sim-to-real policy learning approach. A 🧵 (1/7)
Constructing interactive simulated worlds has been a challenging problem, requiring considerable manual effort for asset creation and articulation, and composing assets to form full scenes. In our new work - DRAWER, we made the process of creating scenes in simulation as simple as taking a video of the scene and out comes a high-quality, fully interactive environment in simulation. No human simulation designer involved! drawer-art.github.io/ A 🧵(1/7)
World modeling and imitation learning have largely been considered two disparate worlds. In our recent work, Unified World Models, just accepted to #RSS2025, @chuning_zhu provides a dead-simple unifying solution: just train a joint diffusion model over actions and future states, but with *decoupled* diffusion time steps across these modalities. Manipulating these decoupled time steps then allows for marginalization or conditioning on actions or states; a single model can serve as a policy, forward dynamics model, video prediction model, or inverse dynamics model by simply setting diffusion timesteps carefully. The resulting model can leverage video datasets along with robot training data much more effectively, and shows improved robustness, generalization, and flexibility. This is exciting because it is frustratingly simple, scalable, and shows strong improvement on real-world robotics problems. Please refer to @chuning_zhu 's excellent thread for more details! More details/code can be found on our website and in the paper - weirdlabuw.github.io/uwm/
Scaling imitation learning has been bottlenecked by the need for high-quality robot data, which are expensive to collect. But are we utilizing existing data to the fullest extent? A thread (1/11)
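To make the decoupled-timestep idea in the Unified World Models post above concrete, here is one hedged reading of it. Setting a modality's diffusion timestep to zero feeds it in clean, which conditions on it. Setting it to the maximum feeds in pure noise, which marginalizes it out. The remaining modality is the one being denoised. The constant `T_MAX`, the function `timesteps_for`, and the mode names are hypothetical and only illustrate that convention; they are not the UWM code.

```python
from typing import Tuple

T_MAX = 1000  # assumed number of diffusion steps (hypothetical value)


def timesteps_for(mode: str, t: int) -> Tuple[int, int]:
    """Return (action_timestep, future_obs_timestep) at denoising step t.

    Assumed convention: 0 = clean input (condition on it),
    T_MAX = pure noise (marginalize it out), t = currently being denoised.
    """
    table = {
        "policy": (t, T_MAX),            # denoise actions, ignore future obs
        "forward_dynamics": (0, t),      # condition on actions, denoise future obs
        "inverse_dynamics": (t, 0),      # condition on future obs, denoise actions
        "video_prediction": (T_MAX, t),  # ignore actions, denoise future obs
    }
    if mode not in table:
        raise ValueError(f"unknown mode: {mode}")
    return table[mode]


action_t, obs_t = timesteps_for("policy", t=250)
```

Under this reading, switching between the four roles needs no retraining, only a different pair of timesteps at inference.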
Excited to share our work on reset-free fine-tuning bootstrapped by offline data. We show results in a real-world kitchen, with a robot practicing autonomously to improve for over a day with minimal intervention! Paper: arxiv.org/abs/2203.15755 Website: dbap-rl.github.io
Haven't been to a conference in a while, really excited to be at #NeurIPS2024! I'll be helping present 4 of our group's recent papers: 1. Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL arxiv.org/abs/2410.20254 2. Distributional Successor Features Enable Zero-Shot Policy Optimization arxiv.org/abs/2403.06328 3. Learning to Cooperate with Humans using Generative Agents arxiv.org/abs/2411.13934 4. Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning arxiv.org/abs/2408.10075 Find more details on each paper and where to find us in this thread (1/6)
Learned visuomotor policies are notoriously fragile: they break with changes in conditions like lighting, clutter, or object variations, among other things. In @yunchuzh's latest work, we asked whether we could get these policies to be robust and generalizable with a clever choice of visual representation! The argument we made was - we want a choice of visual representation that specifically adapts to be sufficient, yet minimal for the task at hand. We thought about it from the perspective of flexible, keypoint-based representations. The key question becomes - how do we choose a sufficient, task-specific, yet minimal set of keypoints as a representation for policy learning? Yunchu proposes a neat way of automatically selecting task-relevant keypoints using a standard supervised learning objective, and using this for robust policy learning. This is largely under the same assumptions as behavior cloning, but with huge gains on robustness. Let’s understand how, 🧵 (1/8)
Over the last few months, we’ve been thinking about how to learn from “off-domain” data - data from non-robot sources like video or simulation. These data sources are not quite good enough to learn policies (even monolithic VLA models) directly, but they still contain lots of information that can be useful for generalizable robot control. How can we develop robot learning models that are able to make use of this type of data for generalizable control? In new work, that we call HAMSTER, we show that VLMs can be useful for enabling robotic learning from off-domain data, but specifically when used through hierarchical VLA architectures. We show that this class of models can learn generalizable robot policies for the real world from large-scale, off-domain data. A 🧵 (1/10)
Very excited to be at #ICLR2025 in Singapore helping present some of the work done by our group! We'll be presenting 4 papers: 1. Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning weirdlabuw.github.io/sgft/ 2. Robot Sub-Trajectory Retrieval for Augmented Policy Learning weirdlabuw.github.io/strap/ 3. HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation hamster-robot.github.io/ 4. SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks arxiv.org/abs/2503.04538 Find more details on each paper and where to find us in this thread (1/6)
Replying to @shaneguML
Agreed in principle! But I did find the original DAgger paper a little hard to parse on first read. Some resources from our colleagues that I thought were a bit easier to approach: rail.eecs.berkeley.edu/deepr… ri.cmu.edu/pub_files/2015/3/… wensun.github.io/CS4789_data… Hope these are helpful :)
In my experience, robot 'generalists' are often jacks of all trades but masters of none. In training across multiple tasks and environments, robot policies fail to generalize robustly and effectively to each particular test setting. What if at test time, we non-parametrically *retrieved* “relevant” data from the training set and used it to significantly improve the performance of few-shot imitation learning, making it robust to various test time scenes? Notably, we are *not* collecting lots of new data, just training more on sub-components of the same training data! Now, we’re certainly not the first to suggest retrieval, but in our new work - STRAP, we show how retrieving relevant *sub-trajectories* from offline datasets can significantly increase data reuse across tasks, when paired with an appropriate metric space. A 🧵 (1/7)
“How can you enable your parents to train your robot?” We propose a system for enabling robot learning by hooking up a robot to the web, using noisy, occasional feedback from non-experts to guide exploration. Enables robot learning in sim and real w/out reward engineering!🧵(1/8)
Robot learning in the real world can be expensive and unsafe in human-centric environments. Solution: Construct simulation on the fly and train in it! Excited to share RialTo, led by @marceltornev on learning resilient policies via real-to-sim-to-real policy learning! A 🧵 (1/12)
Excited to be working with all these amazing people very soon! Exciting times ahead😀 On that note I'm also hoping to recruit students this cycle to start in Fall 22. If you like ML and robotics and want to get things to work in the real world, definitely apply to UW!, 1/3
Not even a pandemic could slow down #UWAllen faculty hiring. Over the past 2 cycles, we welcomed 15 (yes—15!) outstanding researchers and educators who have joined/will soon join us at @uwengineering @UW Seattle. Meet these new members of our community: news.cs.washington.edu/2021/…
Who doesn’t love good methods for reward inference? What if I told you that you could extract dense rewards from video, by ranking frames temporally using the BT model from RLHF (aka just doing temporal classification with cross-entropy)? Let's see how, in rank2reward - a🧵(1/10)
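The rank2reward post above compresses the recipe into one sentence, so here is a small sketch of what "ranking frames temporally with the BT model" can look like. Under a Bradley-Terry model, the probability that frame j comes later than frame i is the sigmoid of the difference of their scalar utilities, and training is cross-entropy on frame pairs whose order is known from the video. The function names are hypothetical, and the plain-float utilities stand in for a learned network's per-frame outputs. This is an illustration of the idea, not the paper's implementation.

```python
import math
from typing import Sequence


def bt_later_prob(u_i: float, u_j: float) -> float:
    """Bradley-Terry probability that frame j is later than frame i."""
    return 1.0 / (1.0 + math.exp(-(u_j - u_i)))


def temporal_ranking_loss(utilities: Sequence[float]) -> float:
    """Mean cross-entropy over all pairs (i < j) of one video's frames.

    utilities[k] is the (assumed) scalar score for frame k in temporal
    order, so the label for every pair is "j is later than i".
    """
    if len(utilities) < 2:
        raise ValueError("need at least two frames to form a pair")
    losses = [
        -math.log(bt_later_prob(utilities[i], utilities[j]))
        for i in range(len(utilities))
        for j in range(i + 1, len(utilities))
    ]
    return sum(losses) / len(losses)


# Utilities that increase with time are penalized less than ones that decrease.
good = temporal_ranking_loss([0.0, 1.0, 2.0])
bad = temporal_ranking_loss([2.0, 1.0, 0.0])
```

Once trained, the per-frame utility increases along expert videos, which is what lets it act as a dense progress signal.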
New work from my time at MIT! We introduce Distributionally Adaptive Meta-Reinforcement Learning (DiAMetR) - arxiv.org/abs/2210.03104. Meta-RL struggles when test-tasks are OOD, which arguably is most of the time! We propose an algorithm resilient to distribution shift. 🧵 (1/N)
Want to get model-based RL to work in diverse, dynamic scenes? Check out @chuning_zhu's latest work (RePo) on model-based reinforcement learning without reconstruction, where we show how to learn world models that scale to dynamic, multi-task environments. A 🧵(1/6)
So I heard we need more data for robot learning :) Purely real world teleop is expensive and slow, making large scale data collection challenging. I’ve been excited about getting more data into robot learning, going beyond just real-world teleop data. To this end, we’ve been scaling up data generation with RL in realistic simulations generated on the fly from crowdsourced videos. Enables realistic data collection, much more cheaply than purely real world teleop. Importantly, data collection becomes even *cheaper* with more environments, allowing training with over 100x more data. Transfers to real robots for generalizable manipulation. A 🧵 (1/N)
I'm truly so tired of reading reviews about "novelty". What does that even mean... #ICML2023
Most offline RL methods try to constrain policies from deviating far from the offline data distribution. In cases where the data distribution is imbalanced or suboptimal, this makes it hard to actually learn good behavior! In new work, @ZhangWeiHong9 proposes a solution 🧵 (1/5)
Over the last year, we’ve been investigating how simulation can be a useful tool for real-world reinforcement learning on a robot. While simulation captures inherently incorrect dynamics, it can still be useful for real-world learning! In our #NeurIPS2024 work, Andrew W. theoretically showed how naive sim2real transfer can be inefficient, but if you *learn how to explore* in simulation, this can be provably efficient in transferring to the real world! We then pair this theory with robot experiments to validate this for real-world settings. 🧵 (1/6)
We've been working on getting robots to learn in the real world with many hours of autonomous reset free RL! Key idea is to leverage multi-task RL to enable scalable learning with no human intervention. Allows learning of cool dexterous manipulation tasks in the real world!
After over a year of development, we're finally releasing our work on real-world dexterous manipulation: MTRF. MTRF learns complex dexterous manipulation skills *directly in the real world* via continuous and fully autonomous trial-and-error learning. Thread below ->
Sharing two recent talks from my advisor @svlevine covering much of my recent work, as well as work from many of my colleagues. I really enjoyed watching these, they give a really cool perspective on frontiers of RL piped.video/watch?v=4vK6X9Jr… piped.video/watch?v=sXQlQg7H…
New work on learning how to grasp and navigate with mobile robots using RL. What I find very exciting is the ability of the system to be trained for > 60 hrs with minimal intervention, learning in diverse scenarios. Paper: arxiv.org/pdf/2107.13545.pdf Website: sites.google.com/view/relmm
I'm sadly unable to be at #RSS2025 this year, but my students @prodarhan, @chuning_zhu and @marceltornev will be! Find them presenting some exciting work today, 6/21: 1) @chuning_zhu will present Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets Spotlight talk: 4:30-5:30 pm (Bovard auditorium) Poster: 6:30-8:00pm, poster #50 (Associates park) Paper: arxiv.org/abs/2504.02792 Website: weirdlabuw.github.io/uwm/ 2) @prodarhan and @marceltornev will present Robot Learning with Super-Linear Scaling Spotlight talk: 5:30-6:30 pm (Bovard auditorium) Poster: 6:30-8:00pm, poster #58 (Associates park) Paper: arxiv.org/abs/2412.01770 Website: casher-robot-learning.github… Hope y'all can make it!
Excited to share the first of several papers toward leveraging generative models as data sources for RL! RL sees minimal data, gen models see lots of data. We show that gen models (here LLMs) can provide background info for RL common sense (here exploration)! Thread by @d_yuqing!
How can we encourage RL agents to explore human-meaningful behaviors *without* a human in the loop? @OliviaGWatkins2 and I are excited to share “Guiding Pretraining in Reinforcement Learning with LLMs”! 📜arxiv.org/abs/2302.06692 🧵1/
How can we enable transferable decision-making for *any* reward zero-shot? MBRL is task-agnostic but suffers from compounding error, while MFRL is task-specific. We propose a new class of world models that transfers across tasks zero-shot and avoids compounding error! A 🧵 (1/9)
Fun blog post on our work on unsupervised meta-reinforcement learning, for doing meta-reinforcement learning without explicit human provided task distributions! bair.berkeley.edu/blog/2020/… blog.ml.cmu.edu/2020/05/01/u… And associated paper arxiv.org/abs/1806.04640
Excited about our work on understanding the benefits of reward shaping! Reward shaping is critical in a large portion of practical RL problems and this paper tries to understand when and why it helps. Terrific collaboration with @aldopacchiano Simon Zhai @svlevine @ShamKakade6!
In theory RL is intractable w/o exploration bonuses. In practice, we rarely use them. What's up with that? Critical to practical RL is reward shaping, but there is little theory about it. Our new paper analyzes sample complexity w/ shaped rewards: arxiv.org/abs/2210.09579 Thread:
Very very exciting to have @Jesse_Y_Zhang join us at UW soon! He's done some incredible work - I'd recommend reading rewind-reward.github.io/! Congratulations on a fantastic Ph.D. @Jesse_Y_Zhang 🎉
Yes, I’ll be working with @fox_dieter17849 and @abhishekunique7 on enabling real world autonomous learning; super excited!!
While investigating RLHF methods last year, @sriyash__ and @yanming_wan noted that human annotators in a population often display diverse and conflicting preferences. While typical RLHF methods struggle with this diversity, we developed new techniques for pluralistic RLHF! 🧵(1/7)
Tried to share some tips on faculty applications, do take a listen if you're thinking of applying. Hope it can be helpful! Thanks for having me @talkingrobotics!
"Start writing your research statement in the summer." Abhishek Gupta provided the BEST ADVICE if you are preparing for the #academic #job #market. This talk has TONS OF TIPS from his own experience in the job market last year. Listen now (links below). @uw @uw_robotics @uwcse
I aspire to give talks like this; piped.video/TN1M6vg4CsQ?feature… Yay @RussTedrake and TRI for helping inject some rigor into an all too confusing field! :)
Many in the robotics community have had a hunch that more is going on with diffusion policies than just multimodality. @max_simchowitz and colleagues with another extremely insightful paper on why :) Really enjoyable read!
There’s a lot of awesome research about LLM reasoning right now. But how is learning in the physical world 🤖different than in language 📚? In a new paper, we show that imitation learning in continuous spaces can be exponentially harder than for discrete state spaces, even when the underlying dynamics are seemingly benign and insensitive to perturbations. (1/n)🧵
I’m very very excited about work led by @avivnet at #ICLR2023 on learning deep control policies that can extrapolate using a transductive approach. We show how we can get neural network policies to extrapolate without significant domain-specific assumptions. A 🧵 to explain how: (1/6)
Excited to share a new large-scale dataset for in-the-wild robotic learning! It was an honestly eye-opening experience for our whole group to be a part of this. Thanks to @SashaKhazatsky, @KarlPertsch and the rest of the team for putting together an amazing dataset! 🤖
After two years, it is my pleasure to introduce “DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset” DROID is the most diverse robotic interaction dataset ever released, including 385 hours of data collected across 564 diverse scenes in real-world households and offices
Presenting "Ingredients of Real World Robotic RL" at ICLR 2020, 4/26 10pm-12am PST & 4/27 5am-7am PST. Blog: bair.berkeley.edu/blog/2020/… Paper: openreview.net/forum?id=rJe2… Descriptive Video: piped.video/watch… Poster livestream: iclr.cc/virtual/poster_rJe2s…
Some cool new updated results for offline pre-training followed by online fine-tuning with AWAC (advantage-weighted actor-critic). Offline RL does cool things on robots!
How can we get robots to solve complex tasks with RL? Pretrain with *offline* RL using prior data, and then finetune with *online* RL! In our updated paper on AWAC (advantage-weighted actor-critic), we describe a new set of robot experiments: awacrl.github.io/ thread ->
Excited to share our work on benchmarking reset free RL. We hope this presents a way to go beyond the standard episodic assumptions made in robotic RL, making it practical for the real world!
Embodied agents such as humans and robots live in a continual non-episodic world. Why do we continue to develop RL algorithms in episodic settings? This discrepancy also presents a practical challenge -- algorithms rely on extrinsic interventions (often humans) to learn ..
Big win for JHU! Go do cool stuff with @mangahomanga!
I'll be joining the faculty @JohnsHopkins late next year as a tenure-track assistant professor in @JHUCompSci Looking for PhD students to join me tackling fun problems in robot manipulation, learning from human data, understanding+predicting physical interactions, and beyond!
Exciting to see what @pabbeel, Anusha Nagabandi, @clavera_i, @CarlosFlorensa, Nikhil Mishra and other friends at covariant have been up to!
Today, we are introducing RFM-1, our Robotics Foundation Model giving robots human-like reasoning capabilities.
Replying to @harshit_sikchi
When one starts to feel the AGI😁
Excited to share a new blog post on our work on learning informative rewards for RL! By considering a more tractable class of outcome driven RL problems and a particular choice of uncertainty aware classifier, we learn more informative reward functions bair.berkeley.edu/blog/2021/…
I will be on an island in the Puget Sound this weekend, so sadly I will be missing #CoRL2025tv! But luckily the amazing students who did all the work anyways, will be 😄 Here's what the WEIRD lab at the University of Washington has going on at CoRL this time We'll be presenting 3 papers at the main conference: 1. Steering Your Diffusion Policy with Latent Space Reinforcement Learning diffusion-steering.github.io… (Oral, Nominated for Best Paper) 2. ATK: Automatic Task-driven Keypoint Selection for Robust Policy Learning yunchuzhang.github.io/ATK/ 3. RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies robo-arena.github.io/ (Oral) I will be giving a talk at the RemembeRL workshop rememberl-corl25.github.io/ Plus we have several more at the workshops! Find more details on each paper below 🧵 (1/9)
I remember when I was first starting to work on dexterous hands, we were thinking about how to find and grasp objects in the dark with touch sensing. Here are our initial attempts at this problem taochenshh.github.io/project… arxiv.org/abs/2303.13482
Check out @yunchuzh's new work on automatically selecting keypoints as a representation for super robust policy learning!
How should a robot perceive the world? What kind of visual representation leads to robust visuomotor policy learning for robotics? Policies trained on raw images are often fragile—easily broken by lighting, clutter, or object variations—making it challenging to deploy policies learned via imitation learning in high variability test conditions. This same fragility is also reflected in the difficulty in transferring visuomotor policies from simulation to reality for robotic manipulation. Introducing ATK yunchuzhang.github.io/ATK/: an automatic task-driven method for selecting flexible keypoint-based visual representations that enables robust, generalizable robotic manipulation with minimal human effort.(1/8)👇
Check out some of our new work on distributed robot evaluation led by @KarlPertsch, @pranav_atreya and @tonyh_lee! Hopefully folks can contribute, and help us take a step towards systematic and standardized empiricism in robot learning! :) Also check out some of the fun sim eval tools contributed by @prodarhan!
We’re releasing the RoboArena today!🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵
MIT covering some of our work! Led by @marceltornev along with @pulkitology @anthonysimeono_ @taochenshh and others. Give it a read :)
To automate time-consuming tasks like household chores, robots must be precise & robust for very specific environments. With MIT’s “RialTo” method, users can scan their surroundings w/their phone so a robot can practice in a digital twin environment. This novel real-to-sim-to-real approach allows the machines to train much faster & safer than they would in the real world: bit.ly/4dqf0QL
Yay! Very well deserved @pabbeel!
Congratulations to @UCBerkeley’s Pieter Abbeel (@pabbeel) on receiving the 2022 @IEEEorg Kiyo Tomiyasu Award, sponsored by the late Dr. Kiyo Tomiyasu, @IEEE_GRSS, and @IEEEMTT, for contributions to #DeepLearning for #Robotics: bit.ly/IEEEAwards2022-TFAs #IEEEAwards2022 #IEEETFAs
Some of our most exciting work on new ways to do world modeling and zero-shot transfer! This work is important in reimagining what a generalizable world model looks like beyond autoregressive prediction. Check out @chuning_zhu's thread for details.
How can we train RL agents that transfer to any reward? In our @NeurIPSConf paper DiSPO, we propose to learn the distribution of successor features of a stationary dataset, which enables zero-shot transfer to arbitrary rewards without additional training! A thread 🧵(1/9)
Check out RoboHive - our new unified robot learning framework, tons of cool new environments, tasks, platforms. We hope this can be a helpful tool for folks in robot learning and beyond!
📢#𝗥𝗼𝗯𝗼𝗛𝗶𝘃𝗲 - a unified robot learning framework ✅Designed for genralizn first robot-learning era ✅Diverse (500 envs, 8 domain) ✅Single flag for Sim<>Real ✅TeleOper Support ✅Multi-(Skill x Task) realworld dataset ✅pip install robohive tinyurl.com/robohive 🧵👇
I'm unfortunately not at @iclr_conf, but our group and collaborators are presenting 4 papers this year! Come meet the awesome students presenting this work :) A 🧵 (1/5)
We hope this can be a useful tool to help use RL on your robots! Happy RL-ing. Website: serl-robot.github.io Code: github.com/rail-berkeley/ser… w/ @jianlanluo,@real_ZheyuanHu, Charles Xu, @youliangtan, @archit_sharma97, Stefan Schaal, @chelseabfinn, @svlevine (5/5)
Exciting work from @marceltornev and friends!
Giving history to our robot policies is crucial to solve a variety of daily tasks. However, diffusion policies get worse when adding history. 🤖 In our recent work we learn how adding an auxiliary loss that we name Past-Token Prediction (PTP) together with cached embeddings enables us to reliably add longer history context to our robot policies! 🧠 We also show how PTP enables some test-time scaling techniques for robotics! 🚀
Replying to @Ar_Douillard
I wouldn’t call out authors publicly - it can be immensely demoralizing, especially for junior authors. If you’re keen on providing them feedback, I’d send it to them privately and constructively and they can choose to use it to improve their work :)
Hell yea real world RL :)
Real-world RL, where robots learn directly from physical interactions, is extremely challenging — especially for high-DoF systems like mobile manipulators. 1⃣ Long-horizon tasks and large action spaces lead to difficult policy optimization. 2⃣ Real-world exploration with whole-body contact raises serious safety concerns. 🚀 Introducing SLAC, a framework that brings safety and efficiency to whole-body real-world RL. Paper: arxiv.org/abs/2506.04147 Video: piped.video/watch?v=bj5GhjZb… 🧵
1
2
22
2,623
Excited to share work led by Max Simchowitz on principled ways to approach combinatorial generalization using bilinear embeddings. Useful under “combinatorial” distribution shift - e.g., you’ve seen blue mugs, red mugs, and blue cups; what happens when you see red cups? A 🧵 (1/3)
1
2
21
4,127
Gave a talk on dirty laundry in RL, ala advice from @Ken_Goldberg. Situated this in some dexterous manipulation work. Recordings should be up soon, y’all might enjoy it :) thanks @notmahi and the other organizers!
The first workshop on Learning Dexterous Manipulation at @RoboticsSciSys is starting now! Check out our speaker lineup at learn-dex-hand.github.io/rss… or tune in via zoom at learn-dex-hand.github.io/zoo… if you are not in person.
1
21
3,511
Check out our new work on learning human-AI cooperation agents using generative models. Led by @liangyanchenggg and @Daphne__Chen, to be presented at #NeurIPS2024 The overcooked game in the browser is fun to play :) sites.google.com/view/human-…
🎉 Excited to release our #NeurIPS2024 paper on zero-shot human-AI cooperation. For the first time, we use generative models to sample infinite human-like training partners to train a Cooperator agent. 🔥Experience it! 🚀Check out our 𝐥𝐢𝐯𝐞 𝐝𝐞𝐦𝐨 👉 sites.google.com/view/human-…
1
4
20
2,561
Max is 100% one of the smartest people I know and a fantastic mentor, go work with him!
A very exciting personal update: In January, I’ll be joining @CMUMLD as a tenure-track assistant professor! My lab will focus on the mathematical foundations of, and new algorithms for, decision making. This includes everything from reinforcement learning in the physical world (diffusion-ppo.github.io/), to world modeling (boyuan.space/diffusion-forci…), to statistical guarantees for robotic agents (arxiv.org/abs/2307.14619). To learn more about my world, check out my personal webpage: msimchowitz.github.io/ To prospective students, stay tuned for a thread about PhD and Masters hiring!
1
21
2,587
If you're at #NeurIPS2023, check out @badsethcohen 's work on generative BC! A cool look into how to realize stability guarantees for imitation learning, in theory and practice Poster: Thu 14 Dec 10:45 a.m. CST — 12:45 p.m. CST, #1427 Paper: arxiv.org/abs/2307.14619
2
19
2,238
Real2Sim is great, exciting to see this 👏
Scalable, reproducible, and reliable robotic evaluation remains an open challenge, especially in the age of generalist robot foundation models. Can *simulation* effectively predict *real-world* robot policy performance & behavior? Presenting SIMPLER!👇 simpler-env.github.io/
4
19
4,376
These videos are incredible, congrats to @hausman_k, @TianheYu, and the team! Really exciting to see generative models provide big improvements in real micro kitchen environments. Looking forward to what's next!
Our most recent work showing bitter lesson 2.0 in action: using diffusion models to augment robot data. Introducing ROSIE: diffusion-rosie.github.io/ Our robots can imagine new environments, objects and backgrounds! 🧵
1
18
3,091
Some new insights on the problem of offline pretraining with online finetuning. Seems to work pretty well! Code is out too. @ashvinair @svlevine @mihdalal bair.berkeley.edu/blog/2020/… awacrl.github.io/ arxiv.org/pdf/2006.09359.pdf
2
18
I'm unfortunately not at @NeurIPSConf #NeurIPS2023 this year, but luckily my excellent students and collaborators who actually did the work are! Please do visit their posters and talks and ask them very hard questions 😀 A 🧵 (1/9)
1
19
3,195
The key thing I took away from here is - simulation is inherently wrong, but can still be very useful! Value functions from simulation can make the job of real-world RL *much* easier, making it far more practical as a solution. This was work conceptualized and led by @patrickhyin and Tyler Westenbroek, along with a great set of collaborators - Simran Bagaria, Kevin Huang, @chinganc_rl, @Andrey__Kolobov between UW and MSR Website: weirdlabuw.github.io/sgft/ Paper: arxiv.org/abs/2502.02705 We will be presenting this paper at #ICLR2025 this April 😃
5
20
1,399
Excited about work led by @xkelym @Zyc199539Chu @ab_deshpande! I was skeptical that we could solve these problems with RL, but they totally proved me wrong! 😄 Super interesting both from the perspective of system design and algorithmic choices! See @xkelym's🧵 with details
Let’s do 🍒 Cherry Picking with Reinforcement Learning goodcherrybot.github.io/ - 🥢 Dynamic fine manipulation with chopsticks - 🤖 Only 30 minutes of real world interactions - ⛔️ Too lazy for parameter tuning = off-the-shelf RL algo + default params + 3 seeds in real world
1
18
1,978
Replying to @ChongZzZhang
I think it's valid to say MuJoCo benchmarks shouldn't be trusted, I think most practitioners feel that way anyways. But saying something shouldn't be trusted without really suggesting a viable alternative leaves the community in a tough spot because we have no meaningful metric to measure progress. And then we're all just vibe researching :)
1
19
1,434
A little video I made explaining our ICRA 2021 work on reset-free reinforcement learning for dexterous manipulation. Paper at arxiv.org/abs/2104.11203
Want to know how robots can learn to give you a hand with your NeurIPS submissions? So do I. In the meantime, you can check out @abhishekunique7's ICRA 2021 talk, how to train robotic hands to do lots of other stuff🙂from scratch, in the real world piped.video/watch?v=UG1wJPAC…
5
14
Excited to share our work on leveraging text2image generative models for data augmentation for robot learning! We leverage these models to generate a huge diversity of realistic scenes from very minimal on-robot data, which enables pretty cool generalization! Thread by @ZoeyC17
Need more data to train your robot in the real-world? Introducing GenAug, a semantic data augmentation framework to enable broad robot generalization by leveraging pre-trained text-to-image generative models. 🧵(1/N) Paper arxiv.org/pdf/2302.06671.pdf Website genaug.github.io/
1
2
18
5,776
Don’t miss a chance to work with @aviral_kumar2 :) he’s an incredible advisor already and I’m looking forward to his upcoming lab!
Thrilled to share that I will be joining Carnegie Mellon @SCSatCMU as an Assistant Professor of CS and ML @CSDatCMU @mldcmu in Fall 2024. Extremely thankful to my mentors & collaborators, especially @svlevine! Looking forward to working with amazing students & colleagues at CMU!
1
18
3,852
Our work on continual reinforcement learning that gets more and more efficient as it encounters more tasks is at CoRL 2023 this year. Come check out our poster on Nov 9, from 2:45-3:30 pm!
Check out work led by Zheyuan Hu and Aaron Rovinsky on how robot learning can get *more* efficient as it encounters more tasks! This was a pretty awesome exercise in system building and we learned a lot about making continual learning systems for real world dexterous robots
1
17
3,858
Check out work led by Zheyuan Hu and Aaron Rovinsky on how robot learning can get *more* efficient as it encounters more tasks! This was a pretty awesome exercise in system building and we learned a lot about making continual learning systems for real world dexterous robots
Can we get dexterous hands to learn efficiently from images entirely in the real world? With a combo of learned rewards, sample-efficient RL, and initialization from data of other tasks, robots can learn skills autonomously in a matter of hours: sites.google.com/view/reboot… A 🧵👇
2
17
7,601
People perform things at varying levels of suboptimality, typically because of constrained computational budgets. Most modeling frameworks don't account for this. We model agents with varying levels of rationality using latent inference budgets! See @apjacob03's 🧵for more!
⭐️ New Paper ⭐️ We introduce latent inference budget models (L-IBMs), a family of approaches for modeling how agents plan subject to computational constraints. Paper: arxiv.org/pdf/2312.04030.pdf 🧵👇(1/11)
1
2
15
3,430
Exciting news, cannot think of anyone more deserving! Congratulations :)
Super excited to announce that I've started as an Adjunct Professor @Stanford! I'll continue to work @GoogleAI but I'll also be spending some time at Stanford, where I'll be co-advising a few students and continue co-teaching CS 330 (cs330.stanford.edu) 🧑‍🏫
1
16
In the meanwhile, I will be spending a year at @MIT_CSAIL as a post-doc working with Russ Tedrake and @pulkitology. Looking forward to a fun collaboration!
16
Yay reset free RL :) love this task setup!
Don't Start From Scratch: good advice for ML with big models! Also good advice for robots with reset-free training: sites.google.com/view/ariel-… ARIEL allows robots to learn a new task with offline RL pretraining + online RL w/ forward and backward policy to automate resets. Thread:
15
Pre-trained visual representations are effective features, but @ZCCZHANG shows that they can also be used for identification of subgoals directly from long-horizon video behavior. Allows for improvements in both imitation and RL in sim and on robots. 🧵by @ZCCZHANG for more!
How can pre-trained visual representations help solve long-horizon manipulation? 🤔 Introducing Universal Visual Decomposer (UVD), an off-the-shelf method for identifying subgoals from videos - NO extra data, training, cost, or task knowledge required. (🧵1/n)
1
4
15
3,991
So what’s the key idea: while policies may not transfer directly from sim2real due to dynamics mismatch, value functions in simulation capture the approximate geometry of the problem that *does* transfer approximately from sim2real. The ordering of states defined by a sim-learned value function (V_sim) captures successful behaviors that are invariant between sim and real, even if the low-level dynamics differ somewhat. SGFT uses this insight to accelerate real-world finetuning by *using V_sim to perform potential-based reward shaping for real-world RL*. We show both theoretically and empirically that doing so effectively shortens the learning horizon, making learning far more efficient! (3/6)
1
2
14
878
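For readers who want to see the mechanics of the (3/6) idea above, here is a minimal sketch of standard potential-based reward shaping with a simulation-learned value function as the potential. It assumes the textbook form r' = r + γ·Φ(s') − Φ(s) with Φ = V_sim; the exact SGFT formulation may differ, and the names `shaped_reward` and `toy_v_sim`, the 1-D state, and the goal position are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Sequence

State = Sequence[float]


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    v_sim: Callable[[State], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return reward + gamma * V_sim(next_state) - V_sim(state).

    The potential of a terminal next state is taken to be 0, a common
    convention that keeps the shaping term from changing optimal policies.
    """
    next_potential = 0.0 if done else v_sim(next_state)
    return reward + gamma * next_potential - v_sim(state)


def toy_v_sim(state: State) -> float:
    """Hypothetical stand-in for a sim-learned value function.

    Higher value means closer to an assumed goal at position 3.0.
    """
    return -abs(state[0] - 3.0)


# Moving toward the goal earns a positive shaping bonus even though the
# environment reward is zero; moving away earns a penalty.
toward = shaped_reward(0.0, [0.0], [1.0], toy_v_sim, gamma=1.0)
away = shaped_reward(0.0, [1.0], [0.0], toy_v_sim, gamma=1.0)
```

The point of the sketch is that only the ordering of states under V_sim matters for the bonus: a sparse real-world reward becomes a dense signal that tells the learner at every step whether it moved toward states the simulator rated highly.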
Incredible projects from @sanjibac and the whole team! Massive respect for pulling this off :)
Cooking in kitchens is fun. BUT doing it collaboratively with two robots is even more satisfying! We introduce MOSAIC, a modular framework that coordinates multiple robots to closely collaborate and cook with humans via natural language interaction and a repository of skills.
1
14
2,355
Read the paper to see what makes it tick; lots of little details in there. Fun work led by @xkelym, @yunchuzh, @ab_deshpande, Quinn Pfeifer, with @siddhss5! Paper: arxiv.org/pdf/2405.19307 (robotics), arxiv.org/abs/2310.12972 (algorithmic) Website: personalrobotics.github.io/C… (9/9)
1
1
14
2,314
Replying to @natolambert
Although in my experience things that are high visibility on Twitter have a somewhat loose correlation to high quality research :) and so yes you get signal, but it is often misleading. Just my 2 cents
1
14
1,218