Abhishek Gupta · Jun 9, 2026 · 5:29 PM UTC

Pinned Tweet

Abhishek Gupta

@abhishekunique7

Jun 9

Here’s a pretty weird and surprising result - retrieval-augmented generation works unreasonably well for robot learning – but only when parameterized using difference vectors! We introduce Difference-Aware Retrieval Policies for Imitation Learning (DARP), a simple, semi-parametric RAG architecture for imitation learning that achieves gains of up to 200% over standard behavior cloning. No additional assumptions beyond BC, just a little architecture switch! The theory backing it up is pretty cool too and it works on real robots! :) Play with our website to understand better: weirdlabuw.github.io/darp-si… 🧵(1/7)

165

20,449

Abhishek Gupta · Jul 23, 2021 · 9:48 PM UTC

Abhishek Gupta

@abhishekunique7

23 Jul 2021

Thrilled to share that I will be starting as an assistant professor at the University of Washington @uwcse in Fall 2022! Grateful for wonderful mentors and collaborators at @berkeley_ai, especially @svlevine and @pabbeel. Looking forward to joining the wonderful folks @uwcse!

453

Abhishek Gupta · Oct 24, 2025 · 7:17 PM UTC

Abhishek Gupta

@abhishekunique7

24 Oct 2025

Punchline: World models == VQA (about the future)! Planning with world models can be powerful for robotics/control. But most world models are video generators trained to predict everything, including irrelevant pixels and distractions. We ask - what if a world model only predicted the semantic information necessary for decision-making? Introducing Semantic World Models (SWM). Given an observation and an action sequence, SWMs cast modeling as answering textual questions about the future outcome resulting from the actions. Recasting world modeling as a VQA problem lets us directly leverage the pretrained knowledge and machinery of VLMs for generalizable modeling. We had a lot of fun thinking about how this work helps connect these two seemingly very different fields of study - VLMs and world models! 🧵(1/6) Paper: arxiv.org/abs/2510.19818 Fun demo: weirdlabuw.github.io/swm

411

61,167

Abhishek Gupta · Jan 30, 2024 · 7:52 PM UTC

Abhishek Gupta

@abhishekunique7

30 Jan 2024

Anyone who knows me knows I love real world RL :) But anyone who works on real-world RL knows it’s quite a pain to get going. We tried to make everyone’s life easier by writing a software suite to get you going with real world RL out of the box, without all the pain! A 🧵(1/5)

245

33,323

Abhishek Gupta · Jun 10, 2024 · 6:34 PM UTC

Abhishek Gupta

@abhishekunique7

10 Jun 2024

So I hear that behavior cloning is all the rage now. What if we could do better, but with the same data? :) In CCIL, we show that imitation via BC is improved by synthesizing corrective labels to account for compounding error, without interactive oracles. Lets you do 👇! 🧵(1/9)

256

53,912

Abhishek Gupta · Nov 3, 2025 · 9:51 PM UTC

Abhishek Gupta

@abhishekunique7

3 Nov 2025

Imitation learning is great, but needs us to have (near) optimal data. We throw away most other data (failures, evaluation data, suboptimal data, undirected play data), even though this data can be really useful and way cheaper! In our new work - RISE, we show a simple way to *use all of this non-optimal data to robustify imitation learning* with minimal requirements beyond BC. Key idea: use non-expert data to learn how to *recover* back to expert data with a minimal-frills offline RL method that works under sparse data coverage. Allows usage of *all* available data, not just expert data - never throw your data away! Paper: arxiv.org/abs/2510.19495 Website: uwrobotlearning.github.io/RI… A 🧵(1/10)

234

20,578

Abhishek Gupta · Dec 7, 2022 · 11:09 PM UTC

Abhishek Gupta

@abhishekunique7

7 Dec 2022

I am recruiting PhD students to join us in the Washington Embodied Intelligence and Robotic Development Lab (WEIRD) weirdlab.cs.washington.edu/ at @uwcse. We work on robot learning, especially RL in the real world! Check out tinyurl.com/guptauw for details (1/3)

212

Abhishek Gupta · Jun 25, 2025 · 5:47 PM UTC

Abhishek Gupta

@abhishekunique7

25 Jun 2025

So you’ve trained your favorite diffusion/flow based policy, but it’s just not good enough 0-shot. Worry not, in our new work DSRL - we show how to *steer* pre-trained diffusion policies with off-policy RL, improving behavior efficiently enough for direct training in the real world! DSRL retains nice exploration from the base policy, but allows for quick improvement beyond this base policy with RL. The method is frustratingly simple, and super easy to throw on top of your favorite pretrained policy (VLA/diffusion policy, etc). diffusion-steering.github.io Let’s think about how it works, 🧵 (1/10)

194

19,072

Abhishek Gupta · Feb 13, 2025 · 9:46 PM UTC

Abhishek Gupta

@abhishekunique7

13 Feb 2025

So we did a bunch of projects with real world reinforcement learning - but it was often too inefficient to be practical to train tabula rasa. This suggests we need better priors, but acquiring these from on-robot data can often be expensive as well. In our recent work, we show that despite being fundamentally inaccurate, simulation can provide a cheap way to guide real-world RL finetuning to be super efficient! We propose Simulation-Guided Fine-Tuning (SGFT) - a simple paradigm for sim2real finetuning that uses simulation to provide reward shaping that accelerates real world RL finetuning *beyond* just providing an initialization. TLDR: Use value functions from sim to shape rewards for real-world RL, see large sample efficiency improvements 🧵(1/6)
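One common way to read "use value functions from sim to shape rewards" is potential-based reward shaping, with a value estimate learned in simulation acting as the potential. The sketch below assumes that reading; `shaped_reward`, `v_sim`, and the toy values are hypothetical names for illustration and are not taken from the SGFT paper, whose exact formulation may differ.

```python
from typing import Callable, Hashable


def shaped_reward(
    reward: float,
    state: Hashable,
    next_state: Hashable,
    v_sim: Callable[[Hashable], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Potential-based shaping: r' = r + gamma * V_sim(s') - V_sim(s).

    v_sim is a (hypothetical) value estimate obtained in simulation.
    At terminal transitions the next-state potential is treated as zero.
    """
    next_value = 0.0 if done else v_sim(next_state)
    return reward + gamma * next_value - v_sim(state)


# Toy potential on a 3-state chain (made-up numbers, illustration only).
toy_values = {0: 0.0, 1: 0.5, 2: 1.0}


def toy_v_sim(state: Hashable) -> float:
    return toy_values[state]


# A sparse real-world reward of 0 still yields a positive learning signal
# when the transition moves toward states the simulator rates highly.
bonus = shaped_reward(0.0, state=0, next_state=1, v_sim=toy_v_sim, gamma=0.9)
```

With this form, a transition that makes progress according to the simulator's value estimate gets a positive bonus even when the real reward is sparse.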

189

13,631

Abhishek Gupta · Nov 1, 2023 · 2:01 AM UTC

Abhishek Gupta

@abhishekunique7

1 Nov 2023

Imagine this: you drop your robot in an environment, connect it to the internet and come back 10 hours later, and it has learned to solve tasks in the real world, autonomously, with no effort from you! We enable this in our work -Guided Exploration for Autonomous RL (GEAR)🧵(1/5)

178

24,732

Abhishek Gupta · Jun 27, 2023 · 7:01 PM UTC

Abhishek Gupta

@abhishekunique7

27 Jun 2023

Excited to share our work on uncertainty estimation using diffusion/score matching! The idea is simple: offline optimization (e.g. model-based RL, imitation) requires us to estimate uncertainty. Estimating uncertainty is hard - score matching provides a scalable solution. A🧵(1/5)

167

32,169

Abhishek Gupta · May 30, 2023 · 4:23 AM UTC

Abhishek Gupta

@abhishekunique7

30 May 2023

Excited to share our work on self-supervised RL by modeling random features. The key premise behind RaMP is to learn about environment dynamics, without learning a dynamics model! This allows for transfer, without accruing compounding error. arxiv.org/abs/2305.17250 A🧵 (1/6)

Self-Supervised Reinforcement Learning that Transfers using Random Features

Model-free reinforcement learning algorithms have exhibited great potential in solving single-task sequential decision-making problems with high-dimensional observations and long horizons, but are...

arxiv.org

155

28,997

Abhishek Gupta · Dec 1, 2023 · 6:48 PM UTC

Abhishek Gupta

@abhishekunique7

1 Dec 2023

Intrigued by decision transformers, we investigated why and when we should use return-conditioned RL as an alternative to dynamic prog (DP). Our findings are neat! With data coverage, RCSL can outperform DP, but fails to "stitch" trajectories. We analyze and propose a fix. 🧵(1/N)

160

42,237

Abhishek Gupta · Oct 13, 2025 · 3:44 PM UTC

Abhishek Gupta

@abhishekunique7

13 Oct 2025

Combinatorial complexity is often the bane of imitation learning - including VLA models! @Jesse_Y_Zhang and @memmelma proposed a way around this, using VLMs to perform problem reduction for imitation. The insight is simple - 1) A high-level VLM takes a complex scene/task and reduces it to a minimal representation (via masking and path prediction) that is needed to act in the world. 2) A low-level policy then takes this reduced representation and generates actions to be executed in the world. The high-level policy absorbs all the combinatorial complexity of the problem, leaving the low-level to focus on dexterity and geometric reasoning. Super simple, works really well across policy classes and problem settings! - 41.4× sim2real improvement (3DDA) and 2–3.5× boosts for π₀ and ACT in the real world. Paper: arxiv.org/abs/2509.18282 Website: peek-robot.github.io Demo: peek.a.pinggy.link Fun collaboration led by @Jesse_Y_Zhang @memmelma with lots of collaborators! Let us know what you think 😀

145

22,375

Abhishek Gupta · May 24, 2024 · 7:07 PM UTC

Abhishek Gupta

@abhishekunique7

24 May 2024

Excited about @ZoeyC17's new work on real2sim for robotics! We present URDFormer, a technique to learn models that go from RGB images to full articulated scene URDFs in sim by "inverting" pre-trained generative models. These can be used to train robots for the real-world! 🧵(1/8)

143

23,320

Abhishek Gupta · Apr 24, 2024 · 6:25 PM UTC

Abhishek Gupta

@abhishekunique7

24 Apr 2024

So you want to do robotics tasks requiring dynamics information in the real world, but you don’t want the pain of real-world RL? In our work to be presented as an oral at ICLR 2024, @memmelma showed how we can do this via a real-to-sim-to-real policy learning approach. A 🧵 (1/7)

136

19,984

Abhishek Gupta · Apr 28, 2025 · 8:33 PM UTC

Abhishek Gupta

@abhishekunique7

28 Apr 2025

Constructing interactive simulated worlds has been a challenging problem, requiring considerable manual effort for asset creation and articulation, and composing assets to form full scenes. In our new work - DRAWER, we made the process of creating scenes in simulation as simple as taking a video of the scene and out comes a high-quality, fully interactive environment in simulation. No human simulation designer involved! drawer-art.github.io/ A 🧵(1/7)

136

12,068

Abhishek Gupta · Apr 11, 2025 · 4:05 PM UTC

Abhishek Gupta

@abhishekunique7

11 Apr 2025

World modeling and imitation learning have largely been considered two disparate worlds. In our recent work, Unified World Models, just accepted to #RSS2025, @chuning_zhu provides a dead-simple unifying solution: just train a joint diffusion model over actions and future states, but with *decoupled* diffusion time steps across these modalities. Manipulating these decoupled time steps then allows for marginalization or conditioning on actions or states; a single model can serve as a policy, forward dynamics model, video prediction model, or inverse dynamics model by simply setting diffusion timesteps carefully. The resulting model can leverage video datasets along with robot training data much more effectively, and shows improved robustness, generalization, and flexibility. This is exciting because it is frustratingly simple, scalable, and shows strong improvement on real-world robotics problems. Please refer to @chuning_zhu 's excellent thread for more details! More details/code can be found on our website and in the paper - weirdlabuw.github.io/uwm/

Chuning Zhu @chuning_zhu

10 Apr 2025

Scaling imitation learning has been bottlenecked by the need for high-quality robot data, which are expensive to collect. But are we utilizing existing data to the fullest extent? A thread (1/11)

134

11,429

Abhishek Gupta · Mar 30, 2022 · 4:17 AM UTC

Abhishek Gupta

@abhishekunique7

30 Mar 2022

Excited to share our work on reset-free fine-tuning bootstrapped by offline data. We show results in a real-world kitchen, with a robot practicing autonomously to improve for over a day with minimal intervention! Paper: arxiv.org/abs/2203.15755 Website: dbap-rl.github.io

121

Abhishek Gupta · Dec 7, 2024 · 1:06 AM UTC

Abhishek Gupta

@abhishekunique7

7 Dec 2024

Haven't been to a conference in a while, really excited to be at #NeurIPS2024! I'll be helping present 4 of our group's recent papers: 1. Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL arxiv.org/abs/2410.20254 2. Distributional Successor Features Enable Zero-Shot Policy Optimization arxiv.org/abs/2403.06328 3. Learning to Cooperate with Humans using Generative Agents arxiv.org/abs/2411.13934 4. Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning arxiv.org/abs/2408.10075 Find more details on each paper and where to find us in this thread (1/6)

125

10,801

Abhishek Gupta · Jun 19, 2025 · 8:55 PM UTC

Abhishek Gupta

@abhishekunique7

19 Jun 2025

Learned visuomotor policies are notoriously fragile, they break with changes in conditions like lighting, clutter, or object variations amongst other things. In @yunchuzh's latest work, we asked whether we could get these policies to be robust and generalizable with a clever choice of visual representation! The argument we made was - we want a choice of visual representation that specifically adapts to be sufficient, yet minimal for the task at hand. We thought about it from the perspective of flexible, key-point based representations. The key question becomes - how do we choose a sufficient, task-specific, yet minimal set of keypoints as a representation for policy learning. Yunchu proposes a neat way of automatically selecting task-relevant keypoints using a standard supervised learning objective, and using this for robust policy learning. This is largely under the same assumptions as behavior cloning, but with huge gains on robustness. Let’s understand how, 🧵 (1/8)

122

11,543

Abhishek Gupta · Feb 24, 2025 · 10:59 PM UTC

Abhishek Gupta

@abhishekunique7

24 Feb 2025

Over the last few months, we’ve been thinking about how to learn from “off-domain” data - data from non-robot sources like video or simulation. These data sources are not quite good enough to learn policies (even monolithic VLA models) directly, but they still contain lots of information that can be useful for generalizable robot control. How can we develop robot learning models that are able to make use of this type of data for generalizable control? In new work, that we call HAMSTER, we show that VLMs can be useful for enabling robotic learning from off-domain data, but specifically when used through hierarchical VLA architectures. We show that this class of models can learn generalizable robot policies for the real world from large-scale, off-domain data. A 🧵 (1/10)

117

12,040

Abhishek Gupta · Apr 24, 2025 · 12:34 AM UTC

Abhishek Gupta

@abhishekunique7

24 Apr 2025

Very excited to be at #ICLR2025 in Singapore helping present some of the work done by our group! We'll be presenting 4 papers: 1. Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning weirdlabuw.github.io/sgft/ 2. Robot Sub-Trajectory Retrieval for Augmented Policy Learning weirdlabuw.github.io/strap/ 3. HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation hamster-robot.github.io/ 4. SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks arxiv.org/abs/2503.04538 Find more details on each paper and where to find us in this thread (1/6)

118

8,397

Abhishek Gupta · Jun 17, 2020 · 3:07 PM UTC

Abhishek Gupta

@abhishekunique7

17 Jun 2020

Reinforcement learning can be significantly accelerated by using offline datasets with a simple, but carefully designed actor critic algorithm! Solves dexterous manipulation tasks in <1 hour arxiv.org/abs/2006.09359 awacrl.github.io/ @ashvinair M Dalal @svlevine

AWAC: Accelerating Online Reinforcement Learning with Offline Datasets

Reinforcement learning (RL) provides an appealing formalism for learning control policies from experience. However, the classic active formulation of RL necessitates a lengthy active exploration...

arxiv.org

114

Abhishek Gupta · Oct 31, 2025 · 5:59 PM UTC

Abhishek Gupta

@abhishekunique7

31 Oct 2025

Replying to @shaneguML

Agreed in principle! But I did find the original DAgger paper a little hard to parse on first read. Some resources from our colleagues that I thought were a bit easier to approach: rail.eecs.berkeley.edu/deepr… ri.cmu.edu/pub_files/2015/3/… wensun.github.io/CS4789_data… Hope these are helpful :)

120

8,187

Abhishek Gupta · Dec 20, 2024 · 7:39 PM UTC

Abhishek Gupta

@abhishekunique7

20 Dec 2024

In my experience, robot 'generalists' are often jacks of all trades but masters of none. In training across multiple tasks and environments, robot policies fail to generalize robustly and effectively to each particular test setting. What if at test time, we non-parametrically *retrieved* “relevant” data from the training set and used it to significantly improve the performance of few-shot imitation learning to be robust to various test time scenes. Notably, we are *not* collecting lots of new data, just training more on sub-components of the same training data! Now, we’re certainly not the first to suggest retrieval, but in our new work - STRAP, we show how retrieving relevant *sub-trajectories* from offline datasets can significantly increase data reuse across tasks, when paired with an appropriate metric space. A 🧵 (1/7)

115

12,062

Abhishek Gupta · Jul 21, 2023 · 9:54 PM UTC

Abhishek Gupta

@abhishekunique7

21 Jul 2023

“How can you enable your parents to train your robot?” We propose a system for enabling robot learning by hooking up a robot to the web, using noisy, occasional feedback from non-experts to guide exploration. Enables robot learning in sim and real w/out reward engineering!🧵(1/8)

109

27,523

Abhishek Gupta · Mar 8, 2024 · 12:24 AM UTC

Abhishek Gupta

@abhishekunique7

8 Mar 2024

Robot learning in the real world can be expensive and unsafe in human-centric environments. Solution: Construct simulation on the fly and train in it! Excited to share RialTo, led by @marceltornev on learning resilient policies via real-to-sim-to-real policy learning! A 🧵 (1/12)

113

38,065

Abhishek Gupta · Nov 15, 2021 · 10:41 PM UTC

Abhishek Gupta

@abhishekunique7

15 Nov 2021

Excited to be working with all these amazing people very soon! Exciting times ahead😀 On that note I'm also hoping to recruit students this cycle to start in Fall 22. If you like ML and robotics and want to get things to work in the real world, definitely apply to UW!, 1/3

Allen School @uwcse

15 Nov 2021

Not even a pandemic could slow down #UWAllen faculty hiring. Over the past 2 cycles, we welcomed 15 (yes—15!) outstanding researchers and educators who have joined/will soon join us at @uwengineering @UW Seattle. Meet these new members of our community: news.cs.washington.edu/2021/…

ALT Collage of 15 people's portraits in a grid, three rows of five photos across

110

Abhishek Gupta · May 2, 2024 · 12:39 AM UTC

Abhishek Gupta

@abhishekunique7

2 May 2024

Who doesn’t love good methods for reward inference? What if I told you that you could extract dense rewards from video, by ranking frames temporally using the BT model from RLHF (aka just doing temporal classification with cross-entropy)? Let's see how, in rank2reward - a🧵(1/10)
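The parenthetical above can be made concrete. Under a Bradley-Terry (BT) model, the probability that frame j comes later than frame i is a sigmoid of the difference between their scalar scores, so training reduces to binary cross-entropy over frame pairs. A rough sketch in plain Python; `bt_later_prob`, `bt_ranking_loss`, and the toy scores are illustrative assumptions, not the rank2reward implementation.

```python
import math
from typing import Sequence


def bt_later_prob(score_i: float, score_j: float) -> float:
    """Bradley-Terry probability that frame j is later than frame i."""
    return 1.0 / (1.0 + math.exp(-(score_j - score_i)))


def bt_ranking_loss(scores: Sequence[float]) -> float:
    """Mean cross-entropy over all pairs i < j, with label 'j is later'.

    scores[k] is a (hypothetical) learned scalar score for the k-th frame
    of one video, listed in true temporal order.
    """
    if len(scores) < 2:
        raise ValueError("need at least two frames to form a pair")
    losses = []
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            losses.append(-math.log(bt_later_prob(scores[i], scores[j])))
    return sum(losses) / len(losses)


# Scores that increase with time fit the temporal ordering well;
# scores that ignore the ordering are penalized.
increasing = bt_ranking_loss([0.0, 1.0, 2.0, 3.0])
shuffled = bt_ranking_loss([3.0, 0.0, 2.0, 1.0])
```

Minimizing this loss pushes scores to grow monotonically along a trajectory, which is what lets the learned score act as a dense progress signal.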

13,793

Abhishek Gupta · Oct 7, 2022 · 8:59 PM UTC

Abhishek Gupta

@abhishekunique7

7 Oct 2022

New work from my time at MIT! We introduce Distributionally Adaptive Meta-Reinforcement Learning (DiAMetR) - arxiv.org/abs/2210.03104. Meta-RL struggles when test-tasks are OOD, which arguably is most of the time! We propose an algorithm resilient to distribution shift. 🧵 (1/N)

100

Abhishek Gupta · Sep 6, 2023 · 3:28 AM UTC

Abhishek Gupta

@abhishekunique7

6 Sep 2023

Want to get model-based RL to work in diverse, dynamic scenes? Check out @chuning_zhu's latest work (RePo) on model-based reinforcement learning without reconstruction, where we show how to learn world models that scale to dynamic, multi-task environments. A 🧵(1/6)

20,417

Abhishek Gupta · Dec 4, 2024 · 7:17 PM UTC

Abhishek Gupta

@abhishekunique7

4 Dec 2024

So I heard we need more data for robot learning :) Purely real world teleop is expensive and slow, making large scale data collection challenging. I’ve been excited about getting more data into robot learning, going beyond just real-world teleop data. To this end, we’ve been scaling up data generation with RL in realistic simulations generated on the fly from crowdsourced videos. Enables realistic data collection, much more cheaply than purely real world teleop. Importantly, data collection becomes even *cheaper* with more environments, allowing training with over 100x more data. Transfers to real robots for generalizable manipulation. A 🧵 (1/N)

13,345

Abhishek Gupta · Mar 14, 2023 · 8:38 PM UTC

Abhishek Gupta

@abhishekunique7

14 Mar 2023

I'm truly so tired of reading reviews about "novelty". What does that even mean... #ICML2023

19,257

Abhishek Gupta · Oct 10, 2023 · 10:16 PM UTC

Abhishek Gupta

@abhishekunique7

10 Oct 2023

Most offline RL methods try to constrain policies from deviating far from the offline data distribution. In cases where the data distribution is imbalanced or suboptimal, this makes it hard to actually learn good behavior! In new work, @ZhangWeiHong9 proposes a solution 🧵 (1/5)

13,373

Abhishek Gupta · Dec 5, 2024 · 7:26 PM UTC

Abhishek Gupta

@abhishekunique7

5 Dec 2024

Over the last year, we’ve been investigating how simulation can be a useful tool for real-world reinforcement learning on a robot. While simulation captures inherently incorrect dynamics, it can still be useful for real-world learning! In our #NeurIPS2024 work, Andrew W. theoretically showed how naive sim2real transfer can be inefficient, but if you *learn how to explore* in simulation, this can be provably efficient in transferring to the real world! We then pair this theory with robot experiments to validate this for real-world settings. 🧵 (1/6)

5,733

Abhishek Gupta · Apr 23, 2021 · 3:51 AM UTC

Abhishek Gupta

@abhishekunique7

23 Apr 2021

We've been working on getting robots to learn in the real world with many hours of autonomous reset free RL! Key idea is to leverage multi-task RL to enable scalable learning with no human intervention. Allows learning of cool dexterous manipulation tasks in the real world!

Sergey Levine

@svlevine

23 Apr 2021

After over a year of development, we're finally releasing our work on real-world dexterous manipulation: MTRF. MTRF learns complex dexterous manipulation skills *directly in the real world* via continuous and fully autonomous trial-and-error learning. Thread below ->

Abhishek Gupta · May 11, 2020 · 8:04 PM UTC

Abhishek Gupta

@abhishekunique7

11 May 2020

Sharing two recent talks from my advisor @svlevine covering much of my recent work, as well as work from many of my colleagues. I really enjoyed watching these, they give a really cool perspective on frontiers of RL piped.video/watch?v=4vK6X9Jr… piped.video/watch?v=sXQlQg7H…

Unsupervised Reinforcement Learning

Lecture on unsupervised reinforcement learning by Sergey Levine. Or...

youtube.com

Abhishek Gupta · Aug 5, 2021 · 1:10 AM UTC

Abhishek Gupta

@abhishekunique7

5 Aug 2021

New work on learning how to grasp and navigate with mobile robots using RL. What I find very exciting is the ability of the system to be trained for > 60 hrs with minimal intervention, learning in diverse scenarios. Paper: arxiv.org/pdf/2107.13545.pdf Website: sites.google.com/view/relmm

Abhishek Gupta · Apr 23, 2021 · 2:55 PM UTC

Abhishek Gupta

@abhishekunique7

23 Apr 2021

I did a podcast thing! Here's a recent interview on Applying RL to Real-World Robotics with @samcharrington for the @twimlai podcast. Check it out! twimlai.com/go/466 via @twimlai

Applying RL to Real-World Robotics | TWIML - The Voice of Machine Learning & AI

Today we're joined by Abhishek Gupta, a Ph.D. Student at UC Berkeley. Abhishek, a member of the BAIR Lab, joined us to talk about his recent robotics and reinforcement...

twimlai.com

Abhishek Gupta · Jun 21, 2025 · 6:18 PM UTC

Abhishek Gupta

@abhishekunique7

21 Jun 2025

I'm sadly unable to be at #RSS2025 this year, but my students @prodarhan, @chuning_zhu and @marceltornev will be! Find them presenting some exciting work today, 6/21: 1) @chuning_zhu will present Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets Spotlight talk: 4:30-5:30 pm (Bovard auditorium) Poster: 6:30-8:00pm, poster #50 (Associates park) Paper: arxiv.org/abs/2504.02792 Website: weirdlabuw.github.io/uwm/ 2) @prodarhan and @marceltornev will present Robot Learning with Super-Linear Scaling Spotlight talk: 5:30-6:30 pm (Bovard auditorium) Poster: 6:30-8:00pm, poster #58 (Associates park) Paper: arxiv.org/abs/2412.01770 Website: casher-robot-learning.github… Hope y'all can make it!

3,097

Abhishek Gupta · Feb 16, 2023 · 1:59 AM UTC

Abhishek Gupta

@abhishekunique7

16 Feb 2023

Excited to share the first of several papers toward leveraging generative models as data sources for RL! RL sees minimal data, gen models see lots of data. We show that gen models (here LLMs) can provide background info for RL common sense (here exploration)! Thread by @d_yuqing!

Yuqing Du @d_yuqing

15 Feb 2023

How can we encourage RL agents to explore human-meaningful behaviors *without* a human in the loop? @OliviaGWatkins2 and I are excited to share “Guiding Pretraining in Reinforcement Learning with LLMs”! 📜arxiv.org/abs/2302.06692 🧵1/

11,450

Abhishek Gupta · Nov 12, 2024 · 7:04 PM UTC

Abhishek Gupta

@abhishekunique7

12 Nov 2024

How can we enable transferable decision-making for *any* reward zero-shot? MBRL is task-agnostic but suffers from compounding error, while MFRL is task-specific. We propose a new class of world models that transfers across tasks zero-shot and avoids compounding error! A 🧵 (1/9)

3,604

Abhishek Gupta · May 1, 2020 · 7:34 PM UTC

Abhishek Gupta

@abhishekunique7

1 May 2020

Fun blog post on our work on unsupervised meta-reinforcement learning, for doing meta-reinforcement learning without explicit human provided task distributions! bair.berkeley.edu/blog/2020/… blog.ml.cmu.edu/2020/05/01/u… And associated paper arxiv.org/abs/1806.04640

Abhishek Gupta · Oct 19, 2022 · 11:38 PM UTC

Abhishek Gupta

@abhishekunique7

19 Oct 2022

Excited about our work on understanding the benefits of reward shaping! Reward shaping is critical in a large portion of practical RL problems and this paper tries to understand when and why it helps. Terrific collaboration with @aldopacchiano Simon Zhai @svlevine @ShamKakade6!

Sergey Levine

@svlevine

19 Oct 2022

In theory RL is intractable w/o exploration bonuses. In practice, we rarely use them. What's up with that? Critical to practical RL is reward shaping, but there is little theory about it. Our new paper analyzes sample complexity w/ shaped rewards: arxiv.org/abs/2210.09579 Thread:

Abhishek Gupta · May 20, 2025 · 11:35 AM UTC

Abhishek Gupta

@abhishekunique7

20 May 2025

Very very exciting to have @Jesse_Y_Zhang join us at UW soon! He's done some incredible work - I'd recommend reading rewind-reward.github.io/! Congratulations on a fantastic Ph.D. @Jesse_Y_Zhang 🎉

Jesse Zhang

@Jesse_Y_Zhang

20 May 2025

Yes, I’ll be working with @fox_dieter17849 and @abhishekunique7 on enabling real world autonomous learning; super excited!!

4,664

Abhishek Gupta · Aug 26, 2024 · 5:23 PM UTC

Abhishek Gupta

@abhishekunique7

26 Aug 2024

While investigating RLHF methods last year, @sriyash__ and @yanming_wan noted that human annotators in a population often display diverse and conflicting preferences. While typical RLHF methods struggle with this diversity, we developed new techniques for pluralistic RLHF! 🧵(1/7)

5,066

Abhishek Gupta · Jun 23, 2022 · 6:36 PM UTC

Abhishek Gupta

@abhishekunique7

23 Jun 2022

Tried to share some tips on faculty applications, do take a listen if you're thinking of applying. Hope it can be helpful! Thanks for having me @talkingrobotics!

Talking Robotics @talkingrobotics

22 Jun 2022

"Start writing your research statement in the summer." Abhishek Gupta provided the BEST ADVICE if you are preparing for the #academic #job #market. This talk has TONS OF TIPS from his own experience in the job market last year. Listen now (links below). @uw @uw_robotics @uwcse

Abhishek Gupta · Jun 19, 2025 · 1:59 AM UTC

Abhishek Gupta

@abhishekunique7

19 Jun 2025

I aspire to give talks like this; piped.video/TN1M6vg4CsQ?feature… Yay @RussTedrake and TRI for helping inject some rigor into an all too confusing field! :)

3,528

Abhishek Gupta · Apr 16, 2025 · 8:34 PM UTC

Abhishek Gupta

@abhishekunique7

16 Apr 2025

Many in the robotics community have had a hunch that more is going on with diffusion policies than just multimodality. @max_simchowitz and colleagues with another extremely insightful paper on why :) Really enjoyable read!

Max Simchowitz

@max_simchowitz

16 Apr 2025

There’s a lot of awesome research about LLM reasoning right now. But how is learning in the physical world 🤖different than in language 📚? In a new paper, we show that imitation learning in continuous spaces can be exponentially harder than for discrete state spaces, even when the underlying dynamics are seemingly benign and insensitive to perturbations. (1/n)🧵

3,311

Abhishek Gupta · May 1, 2023 · 5:03 PM UTC

Abhishek Gupta

@abhishekunique7

1 May 2023

I’m very very excited about work led by @avivnet at #ICLR2023 on learning deep control policies that can extrapolate using a transductive approach. We show how we can get neural network policies to extrapolate without significant domain-specific assumptions. A 🧵 to explain how: (1/6)

8,168

Abhishek Gupta · Mar 19, 2024 · 7:18 PM UTC

Abhishek Gupta

@abhishekunique7

19 Mar 2024

Excited to share a new large-scale dataset for in-the-wild robotic learning! It was an honestly eye-opening experience for our whole group to be a part of this. Thanks to @SashaKhazatsky, @KarlPertsch and the rest of the team for putting together an amazing dataset! 🤖

Alexander Khazatsky @SashaKhazatsky

19 Mar 2024

After two years, it is my pleasure to introduce “DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset” DROID is the most diverse robotic interaction dataset ever released, including 385 hours of data collected across 564 diverse scenes in real-world households and offices

2,509

Abhishek Gupta · Apr 26, 2020 · 11:19 PM UTC

Abhishek Gupta

@abhishekunique7

26 Apr 2020

Presenting "Ingredients of Real World Robotic RL" at ICLR 2020, 4/26 10pm-12am PST & 4/27 5am-7am PST. Blog: bair.berkeley.edu/blog/2020/… Paper: openreview.net/forum?id=rJe2… Descriptive Video: piped.video/watch… Poster livestream: iclr.cc/virtual/poster_rJe2s…

Abhishek Gupta · Apr 27, 2021 · 8:59 PM UTC

Abhishek Gupta

@abhishekunique7

27 Apr 2021

Some cool new updated results for offline pre-training followed by online fine-tuning with AWAC (advantage-weighted actor-critic). Offline RL does cool things on robots!

Sergey Levine

@svlevine

27 Apr 2021

How can we get robots to solve complex tasks with RL? Pretrain with *offline* RL using prior data, and then finetune with *online* RL! In our updated paper on AWAC (advantage-weighted actor-critic), we describe a new set of robot experiments: awacrl.github.io/ thread ->

Abhishek Gupta · Dec 26, 2021 · 3:30 AM UTC

Abhishek Gupta

@abhishekunique7

26 Dec 2021

Excited to share our work on benchmarking reset free RL. We hope this presents a way to go beyond the standard episodic assumptions made in robotic RL, making it practical for the real world!

Archit Sharma @archit_sharma97

22 Dec 2021

Embodied agents such as humans and robots live in a continual non-episodic world. Why do we continue to develop RL algorithms in episodic settings? This discrepancy also presents a practical challenge -- algorithms rely on extrinsic interventions (often humans) to learn ..

Abhishek Gupta · Oct 20, 2025 · 1:34 PM UTC

Abhishek Gupta

@abhishekunique7

20 Oct 2025

Big win for JHU! Go do cool stuff with @mangahomanga!

Homanga Bharadhwaj

@mangahomanga

20 Oct 2025

I'll be joining the faculty @JohnsHopkins late next year as a tenure-track assistant professor in @JHUCompSci Looking for PhD students to join me tackling fun problems in robot manipulation, learning from human data, understanding+predicting physical interactions, and beyond!

7,399

Abhishek Gupta · Mar 11, 2024 · 3:10 PM UTC

Abhishek Gupta

@abhishekunique7

11 Mar 2024

Exciting to see what @pabbeel, Anusha Nagabandi, @clavera_i, @CarlosFlorensa, Nikhil Mishra and other friends at covariant have been up to!

Covariant

@CovariantAI

11 Mar 2024

Today, we are introducing RFM-1, our Robotics Foundation Model giving robots human-like reasoning capabilities.

8,658

Abhishek Gupta · Feb 1, 2025 · 12:22 AM UTC

Abhishek Gupta

@abhishekunique7

1 Feb 2025

Replying to @harshit_sikchi

When one starts to feel the AGI😁

1,003

Abhishek Gupta · Oct 25, 2021 · 3:35 PM UTC

Abhishek Gupta

@abhishekunique7

25 Oct 2021

Excited to share a new blog post on our work on learning informative rewards for RL! By considering a more tractable class of outcome driven RL problems and a particular choice of uncertainty aware classifier, we learn more informative reward functions bair.berkeley.edu/blog/2021/…

Abhishek Gupta · Sep 26, 2025 · 9:32 PM UTC

Abhishek Gupta

@abhishekunique7

26 Sep 2025

I will be on an island in the Puget Sound this weekend, so sadly I will be missing #CoRL2025tv! But luckily the amazing students who did all the work anyways, will be 😄 Here's what the WEIRD lab at the University of Washington has going on at CoRL this time

We'll be presenting 3 papers at the main conference:
1. Steering Your Diffusion Policy with Latent Space Reinforcement Learning diffusion-steering.github.io… (Oral, Nominated for Best Paper)
2. ATK: Automatic Task-driven Keypoint Selection for Robust Policy Learning yunchuzhang.github.io/ATK/
3. RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies robo-arena.github.io/ (Oral)

I will be giving a talk at the RemembeRL workshop rememberl-corl25.github.io/

Plus we have several more at the workshops! Find more details on each paper below 🧵 (1/9)

3,455

Abhishek Gupta · Mar 24, 2023 · 6:01 PM UTC

Abhishek Gupta

@abhishekunique7

24 Mar 2023

I remember when I was first starting to work on dexterous hands, we were thinking about how to find and grasp objects in the dark with touch sensing. Here are our initial attempts at this problem taochenshh.github.io/project… arxiv.org/abs/2303.13482

2,885

Abhishek Gupta · Jun 19, 2025 · 8:57 PM UTC

Abhishek Gupta

@abhishekunique7

19 Jun 2025

Check out @yunchuzh's new work on automatically selecting keypoints as a representation for super robust policy learning!

Yunchu Zhang @yunchuzh

19 Jun 2025

How should a robot perceive the world? What kind of visual representation leads to robust visuomotor policy learning for robotics? Policies trained on raw images are often fragile—easily broken by lighting, clutter, or object variations—making it challenging to deploy policies learned via imitation learning in high variability test conditions. This same fragility is also reflected in the difficulty in transferring visuomotor policies from simulation to reality for robotic manipulation. Introducing ATK yunchuzhang.github.io/ATK/: an automatic task-driven method for selecting flexible keypoint-based visual representations that enables robust, generalizable robotic manipulation with minimal human effort.(1/8)👇

3,342

Abhishek Gupta · Jun 20, 2025 · 9:47 PM UTC

Abhishek Gupta

@abhishekunique7

20 Jun 2025

Check out some of our new work on distributed robot evaluation led by @KarlPertsch, @pranav_atreya and @tonyh_lee! Hopefully folks can contribute, and help us take a step towards systematic and standardized empiricism in robot learning! :) Also check out some of the fun sim eval tools contributed by @prodarhan!

Karl Pertsch

@KarlPertsch

20 Jun 2025

We’re releasing the RoboArena today!🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵

3,363

Abhishek Gupta · Aug 1, 2024 · 10:38 PM UTC

Abhishek Gupta

@abhishekunique7

1 Aug 2024

MIT covering some of our work! Led by @marceltornev along with @pulkitology @anthonysimeono_ @taochenshh and others. Give it a read :)

MIT CSAIL

@MIT_CSAIL

1 Aug 2024

To automate time-consuming tasks like household chores, robots must be precise & robust for very specific environments. With MIT’s “RialTo” method, users can scan their surroundings w/their phone so a robot can practice in a digital twin environment. This novel real-to-sim-to-real approach allows the machines to train much faster & safer than they would in the real world: bit.ly/4dqf0QL

3,064

Abhishek Gupta · Feb 15, 2022 · 1:18 AM UTC

Abhishek Gupta

@abhishekunique7

15 Feb 2022

Yay! Very well deserved @pabbeel!

IEEE Awards

@IEEEAwards

14 Feb 2022

Congratulations to @UCBerkeley’s Pieter Abbeel (@pabbeel) on receiving the 2022 @IEEEorg Kiyo Tomiyasu Award, sponsored by the late Dr. Kiyo Tomiyasu, @IEEE_GRSS, and @IEEEMTT, for contributions to #DeepLearning for #Robotics: bit.ly/IEEEAwards2022-TFAs #IEEEAwards2022 #IEEETFAs

Abhishek Gupta · Nov 12, 2024 · 6:27 PM UTC

Abhishek Gupta

@abhishekunique7

12 Nov 2024

Some of our most exciting work on new ways to do world modeling and zero-shot transfer! This work is important in reimagining what a generalizable world model looks like beyond autoregressive prediction. Check out @chuning_zhu's thread for details.

Chuning Zhu @chuning_zhu

12 Nov 2024

How can we train RL agents that transfer to any reward? In our @NeurIPSConf paper DiSPO, we propose to learn the distribution of successor features of a stationary dataset, which enables zero-shot transfer to arbitrary rewards without additional training! A thread 🧵(1/9)

2,154
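
For readers wondering why successor features allow transfer across rewards, here is a minimal sketch of the standard identity behind it. It is not DiSPO itself, which per the tweet models a distribution over successor features. It assumes the reward is linear in some feature vector; the toy features, weights, and function names below are all made up for illustration.

```python
import numpy as np

def successor_features(features: np.ndarray, gamma: float) -> np.ndarray:
    """Discounted sum of per-step feature vectors along one trajectory."""
    discounts = gamma ** np.arange(len(features))
    return (discounts[:, None] * features).sum(axis=0)

def value_for_reward(psi: np.ndarray, w: np.ndarray) -> float:
    """Return under reward r_t = features_t @ w, computed from psi alone."""
    return float(psi @ w)

# Toy trajectory: three steps, two-dimensional features (illustrative only).
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
gamma = 0.5
psi = successor_features(features, gamma)

# Two different reward weightings evaluated without touching the trajectory again.
w_a = np.array([1.0, 0.0])
w_b = np.array([0.0, 2.0])
value_a = value_for_reward(psi, w_a)
value_b = value_for_reward(psi, w_b)

# Sanity check against the directly computed discounted return for w_b.
direct_b = float(sum(gamma**t * (features[t] @ w_b) for t in range(len(features))))
```

Once `psi` is computed, a new reward only needs a new `w`, which is the sense in which no additional training is required under the linear-reward assumption.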

Abhishek Gupta · Oct 12, 2023 · 7:39 PM UTC

Abhishek Gupta

@abhishekunique7

12 Oct 2023

Check out RoboHive - our new unified robot learning framework, tons of cool new environments, tasks, platforms. We hope this can be a helpful tool for folks in robot learning and beyond!

Vikash Kumar

@Vikashplus

12 Oct 2023

📢#𝗥𝗼𝗯𝗼𝗛𝗶𝘃𝗲 - a unified robot learning framework
✅Designed for genralizn first robot-learning era
✅Diverse (500 envs, 8 domain)
✅Single flag for Sim<>Real
✅TeleOper Support
✅Multi-(Skill x Task) realworld dataset
✅pip install robohive
tinyurl.com/robohive 🧵👇

5,092

Abhishek Gupta · May 7, 2024 · 4:49 AM UTC

Abhishek Gupta

@abhishekunique7

7 May 2024

I'm unfortunately not at @iclr_conf, but our group and collaborators are presenting 4 papers this year! Come meet the awesome students presenting this work :) A 🧵 (1/5)

4,884

Abhishek Gupta · Jan 30, 2024 · 7:55 PM UTC

Abhishek Gupta

@abhishekunique7

30 Jan 2024

We hope this can be a useful tool to help use RL on your robots! Happy RL-ing. Website: serl-robot.github.io Code: github.com/rail-berkeley/ser… w/ @jianlanluo,@real_ZheyuanHu, Charles Xu, @youliangtan, @archit_sharma97, Stefan Schaal, @chelseabfinn, @svlevine (5/5)

1,960

Abhishek Gupta · May 15, 2025 · 9:16 PM UTC

Abhishek Gupta

@abhishekunique7

15 May 2025

Exciting work from @marceltornev and friends!

Marcel Torné @marceltornev

15 May 2025

Giving history to our robot policies is crucial to solve a variety of daily tasks. However, diffusion policies get worse when adding history. 🤖 In our recent work we learn how adding an auxiliary loss that we name Past-Token Prediction (PTP) together with cached embeddings enables us to reliably add longer history context to our robot policies! 🧠 We also show how PTP enables some test-time scaling techniques for robotics! 🚀

2,361

Abhishek Gupta · Feb 21, 2025 · 4:43 PM UTC

Abhishek Gupta

@abhishekunique7

21 Feb 2025

Replying to @Ar_Douillard

I wouldn’t call out authors publicly - it can be immensely demoralizing, especially for junior authors. If you’re keen on providing them feedback, I’d send it to them privately and constructively and they can choose to use it to improve their work :)

580

Abhishek Gupta · Jun 5, 2025 · 5:52 PM UTC

Abhishek Gupta

@abhishekunique7

5 Jun 2025

Hell yea real world RL :)

Jiaheng Hu @JiahengHu1

5 Jun 2025

Real-world RL, where robots learn directly from physical interactions, is extremely challenging — especially for high-DoF systems like mobile manipulators. 1⃣ Long-horizon tasks and large action spaces lead to difficult policy optimization. 2⃣ Real-world exploration with whole-body contact raises serious safety concerns. 🚀 Introducing SLAC, a framework that brings safety and efficiency to whole-body real-world RL. Paper: arxiv.org/abs/2506.04147 Video: piped.video/watch?v=bj5GhjZb… 🧵

2,623

Abhishek Gupta · Jul 18, 2023 · 9:33 PM UTC

Abhishek Gupta

@abhishekunique7

18 Jul 2023

Excited to share work led by Max Simchowitz on principled ways to approach combinatorial generalization using bilinear embeddings. Useful under “combinatorial” distribution shift - eg you’ve seen blue mugs, red mugs and blue cups, what happens when you see red cups? A 🧵 (1/3)

4,127

Abhishek Gupta · Jul 14, 2023 · 8:27 AM UTC

Abhishek Gupta

@abhishekunique7

14 Jul 2023

Gave a talk on dirty laundry in RL, ala advice from @Ken_Goldberg. Situated this in some dexterous manipulation work. Recordings should be up soon, y’all might enjoy it :) thanks @notmahi and the other organizers!

Mahi Shafiullah 🏠🤖

@notmahi

14 Jul 2023

The first workshop on Learning Dexterous Manipulation at @RoboticsSciSys is starting now! Check out our speaker lineup at learn-dex-hand.github.io/rss… or tune in via zoom at learn-dex-hand.github.io/zoo… if you are not in person.

3,511

Abhishek Gupta · Nov 22, 2024 · 6:42 PM UTC

Abhishek Gupta

@abhishekunique7

22 Nov 2024

Check out our new work on learning human-AI cooperation agents using generative models. Led by @liangyanchenggg and @Daphne__Chen, to be presented at #NeurIPS2024 The overcooked game in the browser is fun to play :) sites.google.com/view/human-…

GAMMA

TLDR: We use generative models to sample infinite human-like partner agents to train a coordinator agent. These agents cooperate well with real human players, achieving better performance compared to...

sites.google.com

Yancheng Liang @liangyanchenggg

22 Nov 2024

🎉 Excited to release our #NeurIPS2024 paper on zero-shot human-AI cooperation. For the first time, we use generative models to sample infinite human-like training partners to train a Cooperator agent. 🔥Experience it! 🚀Check out our 𝐥𝐢𝐯𝐞 𝐝𝐞𝐦𝐨 👉 sites.google.com/view/human-…

2,561

Abhishek Gupta · Nov 22, 2024 · 7:07 AM UTC

Abhishek Gupta

@abhishekunique7

22 Nov 2024

Max is 100% one of the smartest people I know and a fantastic mentor, go work with him!

Max Simchowitz

@max_simchowitz

21 Nov 2024

A very exciting personal update: In January, I’ll be joining @CMUMLD as tenure-track assistant professor! My lab will focus on the mathematical foundations of, and new algorithms for, decision making. This includes everything from reinforcement learning in the physical world (diffusion-ppo.github.io/), to world modeling (boyuan.space/diffusion-forci…), to statistical guarantees for robotic agents (arxiv.org/abs/2307.14619). To learn more about my world, check out my personal webpage: msimchowitz.github.io/ To prospective students, stay tuned for a thread about PhD and Masters hiring!

2,587

Abhishek Gupta · Dec 14, 2023 · 3:41 AM UTC

Abhishek Gupta

@abhishekunique7

14 Dec 2023

If you're at #NeurIPS2023, check out @badsethcohen 's work on generative BC! A cool look into how to realize stability guarantees for imitation learning, in theory and practice Poster: Thu 14 Dec 10:45 a.m. CST — 12:45 p.m. CST, #1427 Paper: arxiv.org/abs/2307.14619

2,238

Abhishek Gupta · May 10, 2024 · 3:11 AM UTC

Abhishek Gupta

@abhishekunique7

10 May 2024

Real2Sim is great, exciting to see this 👏

Xuanlin Li (Simon)@XuanlinLi2

9 May 2024

Scalable, reproducible, and reliable robotic evaluation remains an open challenge, especially in the age of generalist robot foundation models. Can *simulation* effectively predict *real-world* robot policy performance & behavior? Presenting SIMPLER!👇 simpler-env.github.io/

4,376

Abhishek Gupta · Feb 23, 2023 · 12:51 AM UTC

Abhishek Gupta

@abhishekunique7

23 Feb 2023

These videos are incredible, congrats to @hausman_k, @TianheYu, and the team! Really exciting to see generative models provide big improvements in real micro kitchen environments. Looking forward to what's next!

Karol Hausman

@hausman_k

22 Feb 2023

Our most recent work showing bitter lesson 2.0 in action: using diffusion models to augment robot data. Introducing ROSIE: diffusion-rosie.github.io/ Our robots can imagine new environments, objects and backgrounds! 🧵

3,091

Abhishek Gupta · Sep 10, 2020 · 4:11 PM UTC

Abhishek Gupta

@abhishekunique7

10 Sep 2020

Some new insights on the problem of offline pretraining with online finetuning. Seems to work pretty well! Code is out too. @ashvinair @svlevine @mihdalal bair.berkeley.edu/blog/2020/… awacrl.github.io/ arxiv.org/pdf/2006.09359.pdf

Abhishek Gupta · Aug 12, 2024 · 5:34 PM UTC

Abhishek Gupta

@abhishekunique7

12 Aug 2024

An excellent piece by @stefan_milne covering some of our recent work pushing the paradigm of real-to-sim-to-real for scalable robot training, led by @ZoeyC17 @marceltornev and many others across @uwcse and @MITCSAIL! washington.edu/news/2024/08/… Give it a read :)

Using photos or videos, these AI systems can conjure simulations that train robots to function in...

Two new studies introduce AI systems that use either video or photos to create simulations that can train robots to function in the real world. This could significantly lower the costs of training...

washington.edu

2,330

Abhishek Gupta · Dec 11, 2023 · 8:18 PM UTC

Abhishek Gupta

@abhishekunique7

11 Dec 2023

I'm unfortunately not at @NeurIPSConf #NeurIPS2023 this year, but luckily my excellent students and collaborators who actually did the work are! Please do visit their posters and talks and ask them very hard questions 😀 A 🧵 (1/9)

3,195

Abhishek Gupta · Feb 13, 2025 · 9:46 PM UTC

Abhishek Gupta

@abhishekunique7

13 Feb 2025

The key thing I took away from here is - simulation is inherently wrong, but can still be very useful! Value functions from simulation can make the job of real-world RL *much* easier, making it far more practical as a solution. This was work conceptualized and led by @patrickhyin and Tyler Westenbroek, along with a great set of collaborators - Simran Bagaria, Kevin Huang, @chinganc_rl, @Andrey__Kolobov between UW and MSR Website: weirdlabuw.github.io/sgft/ Paper: arxiv.org/abs/2502.02705 We will be presenting this paper at #ICLR2025 this April 😃

1,399

Abhishek Gupta · Mar 10, 2023 · 7:16 PM UTC

Abhishek Gupta

@abhishekunique7

10 Mar 2023

Excited about work led by @xkelym @Zyc199539Chu @ab_deshpande! I was skeptical that we could solve these problems with RL, but they totally proved me wrong! 😄 Super interesting both from the perspective of system design and algorithmic choices! See @xkelym's🧵 with details

Kay - Liyiming Ke @xkelym

10 Mar 2023

Let’s do 🍒 Cherry Picking with Reinforcement Learning goodcherrybot.github.io/
- 🥢 Dynamic fine manipulation with chopsticks
- 🤖 Only 30 minutes of real world interactions
- ⛔️ Too lazy for parameter tuning = off-the-shelf RL algo + default params + 3 seeds in real world

1,978

Abhishek Gupta · Mar 28, 2025 · 1:58 AM UTC

Abhishek Gupta

@abhishekunique7

28 Mar 2025

Replying to @ChongZzZhang

I think it's valid to say MuJoCo benchmarks shouldn't be trusted, I think most practitioners feel that way anyways. But saying something shouldn't be trusted without really suggesting a viable alternative leaves the community in a tough spot because we have no meaningful metric to measure progress. And then we're all just vibe researching :)

1,434

Abhishek Gupta · May 25, 2021 · 11:25 PM UTC

Abhishek Gupta

@abhishekunique7

25 May 2021

A little video I made explaining our ICRA 2021 work on reset-free reinforcement learning for dexterous manipulation. Paper at arxiv.org/abs/2104.11203

Sergey Levine

@svlevine

25 May 2021

Want to know how robots can learn to give you a hand with your NeurIPS submissions? So do I. In the meantime, you can check out @abhishekunique7's ICRA 2021 talk, how to train robotic hands to do lots of other stuff🙂from scratch, in the real world piped.video/watch?v=UG1wJPAC…

Abhishek Gupta · Feb 16, 2023 · 2:10 AM UTC

Abhishek Gupta

@abhishekunique7

16 Feb 2023

Excited to share our work on leveraging text2image generative models for data augmentation for robot learning! We leverage these models to generate a huge diversity of realistic scenes from very minimal on-robot data, which enables pretty cool generalization! Thread by @ZoeyC17

Zoey Chen

@ZoeyC17

16 Feb 2023

Need more data to train your robot in the real-world? Introducing GenAug, a semantic data augmentation framework to enable broad robot generalization by leveraging pre-trained text-to-image generative models. 🧵(1/N) Paper arxiv.org/pdf/2302.06671.pdf Website genaug.github.io/

5,776

Abhishek Gupta · Jul 17, 2023 · 8:10 PM UTC

Abhishek Gupta

@abhishekunique7

17 Jul 2023

Don’t miss a chance to work with @aviral_kumar2 :) he’s an incredible advisor already and I’m looking forward to his upcoming lab!

Aviral Kumar

@aviral_kumar2

17 Jul 2023

Thrilled to share that I will be joining Carnegie Mellon @SCSatCMU as an Assistant Professor of CS and ML @CSDatCMU @mldcmu in Fall 2024. Extremely thankful to my mentors & collaborators, especially @svlevine! Looking forward to working with amazing students & colleagues at CMU!

3,852

Abhishek Gupta · Nov 7, 2023 · 5:00 AM UTC

Abhishek Gupta

@abhishekunique7

7 Nov 2023

Our work on continual reinforcement learning that gets more and more efficient as it encounters more tasks is at CoRL 2023 this year. Come check out our poster on Nov 9, from 2:45-3:30 pm!

Abhishek Gupta

@abhishekunique7

20 Sep 2023

Check out work led by Zheyuan Hu and Aaron Rovinsky on how robot learning can get *more* efficient as it encounters more tasks! This was a pretty awesome exercise in system building and we learned a lot about making continual learning systems for real world dexterous robots

3,858

Abhishek Gupta · Aug 8, 2025 · 8:01 PM UTC

Abhishek Gupta

@abhishekunique7

8 Aug 2025

Replying to @_tonytao_

worry not, they already did - arxiv.org/abs/2101.04882 :)

Asymmetric self-play for automatic goal discovery in robotic manipulation

We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. We rely on asymmetric self-play for goal...

arxiv.org

1,150

Abhishek Gupta · Sep 20, 2023 · 3:37 AM UTC

Abhishek Gupta

@abhishekunique7

20 Sep 2023

Sergey Levine

@svlevine

9 Sep 2023

Can we get dexterous hands to learn efficiently from images entirely in the real world? With a combo of learned rewards, sample-efficient RL, and initialization from data of other tasks, robots can learn skills autonomously in a matter of hours: sites.google.com/view/reboot… A 🧵👇

7,601

Abhishek Gupta · Dec 11, 2023 · 12:10 AM UTC

Abhishek Gupta

@abhishekunique7

11 Dec 2023

People perform things at varying levels of suboptimality, typically because of constrained computational budgets. Most modeling frameworks don't account for this. We model agents with varying levels of rationality using latent inference budgets! See @apjacob03's 🧵for more!

Athul Paul Jacob

@apjacob03

9 Dec 2023

⭐️ New Paper ⭐️ We introduce latent inference budget models (L-IBMs), a family of approaches for modeling how agents plan subject to computational constraints. Paper: arxiv.org/pdf/2312.04030.pdf 🧵👇(1/11)

3,430

Abhishek Gupta · May 28, 2021 · 3:17 AM UTC

Abhishek Gupta

@abhishekunique7

28 May 2021

Exciting news, cannot think of anyone more deserving! Congratulations :)

Karol Hausman

@hausman_k

27 May 2021

Super excited to announce that I've started as an Adjunct Professor @Stanford! I'll continue to work @GoogleAI but I'll also be spending some time at Stanford, where I'll be co-advising a few students and continue co-teaching CS 330 (cs330.stanford.edu) 🧑‍🏫

Abhishek Gupta · Jul 23, 2021 · 9:52 PM UTC

Abhishek Gupta

@abhishekunique7

23 Jul 2021

Replying to @abhishekunique7 @uwcse @berkeley_ai @svlevine @pabbeel

In the meantime, I will be spending a year at @MIT_CSAIL as a post-doc working with Russ Tedrake and @pulkitology. Looking forward to a fun collaboration!

Abhishek Gupta · Jul 14, 2022 · 9:20 PM UTC

Abhishek Gupta

@abhishekunique7

14 Jul 2022

Yay reset free RL :) love this task setup!

Sergey Levine

@svlevine

14 Jul 2022

Don't Start From Scratch: good advice for ML with big models! Also good advice for robots with reset-free training: sites.google.com/view/ariel-… ARIEL allows robots to learn a new task with offline RL pretraining + online RL w/ forward and backward policy to automate resets. Thread:

Abhishek Gupta · Oct 18, 2023 · 3:30 AM UTC

Abhishek Gupta

@abhishekunique7

18 Oct 2023

Pre-trained visual representations are effective features, but @ZCCZHANG shows that they can also be used for identification of subgoals directly from long-horizon video behavior. Allows for improvements in both imitation and RL in sim and on robots. 🧵by @ZCCZHANG for more!

Zichen "Charles" Zhang @ZCCZHANG

17 Oct 2023

How can pre-trained visual representations help solve long-horizon manipulation? 🤔 Introducing Universal Visual Decomposer (UVD), an off-the-shelf method for identifying subgoals from videos - NO extra data, training, cost, or task knowledge required. (🧵1/n)

3,991

Abhishek Gupta · Feb 13, 2025 · 9:46 PM UTC

Abhishek Gupta

@abhishekunique7

13 Feb 2025

So what’s the key idea: while policies may not transfer directly from sim2real due to dynamics mismatch, value functions in simulation capture the approximate geometry of the problem that *does* transfer approximately from sim2real. The ordering of states defined by a sim-learned value function (V_sim) captures successful behaviors that are invariant between sim and real, even if the low-level dynamics differ somewhat. SGFT uses this insight to accelerate real-world finetuning by *using V_sim to perform potential-based reward shaping for real-world RL*. We show both theoretically and empirically that doing so effectively shortens the learning horizon, making learning far more efficient! (3/6)

878
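
The potential-based reward shaping mentioned in the tweet above can be sketched in a few lines. This is only the textbook shaping formula, r + γ·V(s') − V(s), with a simulation-learned value function plugged in as the potential; it is not the SGFT implementation. The function names, the toy value function, and the choice to treat terminal states as having zero potential are all assumptions made for illustration.

```python
from typing import Callable

def shaped_reward(
    reward: float,
    state: float,
    next_state: float,
    v_sim: Callable[[float], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Potential-based shaping: reward + gamma * V(next_state) - V(state)."""
    # Assumption: a terminal next state contributes zero potential.
    next_potential = 0.0 if done else v_sim(next_state)
    return reward + gamma * next_potential - v_sim(state)

def toy_v_sim(state: float) -> float:
    """Hypothetical sim-learned value: negative distance to a goal at 5.0."""
    return -abs(5.0 - state)

# Moving toward the goal earns a shaping bonus; moving away is penalized,
# even though the raw environment reward is zero in both cases.
r_toward = shaped_reward(0.0, state=2.0, next_state=3.0, v_sim=toy_v_sim, gamma=1.0)
r_away = shaped_reward(0.0, state=2.0, next_state=1.0, v_sim=toy_v_sim, gamma=1.0)
```

In this toy setting the shaped reward gives a learning signal at every step rather than only at the goal, which is one way to read the tweet's point about shortening the learning horizon.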

Abhishek Gupta · Mar 6, 2024 · 5:21 PM UTC

Abhishek Gupta

@abhishekunique7

6 Mar 2024

Incredible projects from @sanjibac and the whole team! Massive respect for pulling this off :)

Sanjiban Choudhury @sanjibac

4 Mar 2024

Cooking in kitchens is fun. BUT doing it collaboratively with two robots is even more satisfying! We introduce MOSAIC, a modular framework that coordinates multiple robots to closely collaborate and cook with humans via natural language interaction and a repository of skills.

2,355

Abhishek Gupta · Jun 10, 2024 · 6:44 PM UTC

Abhishek Gupta

@abhishekunique7

10 Jun 2024

Read the paper to see what makes it tick: lots of little details in there. Fun work led by @xkelym, @yunchuzh, @ab_deshpande, Quinn Pfeifer, with @siddhss5! Paper: arxiv.org/pdf/2405.19307 (robotics), arxiv.org/abs/2310.12972 (algorithmic) Website: personalrobotics.github.io/C… (9/9)

2,314

Abhishek Gupta · Feb 29, 2024 · 4:28 AM UTC

Abhishek Gupta

@abhishekunique7

29 Feb 2024

Replying to @natolambert

Although in my experience things that are high visibility on Twitter have a somewhat loose correlation to high quality research :) and so yes you get signal, but it is often misleading. Just my 2 cents

1,218