Introducing Dynamism v1 (DYNA-1) by @DynaRobotics – the first robot foundation model built for round-the-clock, high-throughput dexterous autonomy. Here is a time-lapse video of our model autonomously folding 850+ napkins in a span of 24 hours with • 99.4% success rate — zero human intervention • 60% human throughput speed • 4.3/5 quality ratings (set by the client) A thread on our motivation, insights and results:
64
125
945
518,740
Introducing DrEureka🎓, our latest effort pushing the frontier of robot learning using LLMs! DrEureka uses LLMs to automatically design reward functions and tune physics parameters to enable sim-to-real robot learning. DrEureka can propose effective sim-to-real configurations for several robots and tasks, and we even got a bit creative with it: Let’s make a robot dog walk and balance on a yoga ball! Check out these fun videos, and follow the thread for a deep dive!
23
112
586
247,256
Excited to finally share Generative Value Learning (GVL), my @GoogleDeepMind project on extracting universal value functions from long-context VLMs via in-context learning! We discovered a simple method to generate zero-shot and few-shot values for 300+ robot tasks and 50+ datasets using SOTA VLMs like Gemini (Try out the demo on our website on your robot video today!) I worked a lot on leveraging foundation models as guidance for robots in my PhD, and to me, this result forges a new frontier in how we can use foundation models for robot learning, given its broad applicability independent of embodiment and task types. Quite excited about how we can build on this work as a community!
9
113
587
98,175
We just did the world’s first on-stage autonomous demo of a long-horizon dexterous VLA 🚨 No training. No setup. Performance out of the box. Live demos are hard and unpredictable, but we felt great about our model’s generalization, and it went pretty well! 💯 Zero-shot. 100% success.
18
60
507
63,623
We have raised $120M to accelerate our mission of building and delivering high-performance general-purpose robots to the physical world. Within one year, we have made research breakthroughs, showing that it is possible to achieve real-world reliability with large VLAs, and demonstrated commercial and deployment traction, with our DYNA-1 models running live at sites in SF, LA, and Sacramento. This is just the beginning, and I have never felt more optimistic about a future where AI-powered robots can positively impact human productivity. From when I started my PhD, when robot policies barely worked even in highly controlled settings, to now deploying DYNA robots with the confidence of out-of-the-box model performance, when I think about the trajectory of robotics, it’s astonishing how quickly we’ve gone from “if it works in the lab” to “it works in the world.” The next frontier isn’t about proving robots can move—it’s about proving they can reliably help in real-world environments, at scale, across industries. That’s what we’re building at @DynaRobotics The impact of this shift will be massive: - Unlocking productivity across logistics, manufacturing, and beyond. - Expanding what small teams and businesses can achieve. - Freeing humans to focus on higher-level creativity, problem-solving, and connection. The mission is bigger than any single deployment. It’s about ushering in an era where general-purpose robots are as ubiquitous and trusted as computers or smartphones. We’re just getting started, and I couldn’t be more excited for what’s ahead. Join us!🚀🤖
Excited to announce that we have raised $120M in our Series A to advance the frontier of general-purpose high-performance robots. 🤖 The new funding will accelerate progress towards our mission of bringing foundation-model powered robots to everyone, everywhere. Read more 👇
41
30
427
68,289
I recently gave a talk at MIT's Embodied Intelligence seminar, covering some of my recent work and perspectives on "Foundation Model Supervision for Robot Learning". The recording is up on YouTube: piped.video/watch?v=JfZYtpEi… Hope you like it and find it useful!
6
45
345
29,632
Sharing an exciting DYNA-1 result: zero-shot environment generalization. We put DYNA-1 to the test in a completely different environment from our training distribution – with an entirely different background (@DynaRobotics banner) and a metal table. The table has a reflective and smooth surface, creating a wildly different visual appearance as well as different interaction dynamics. The model is able to proceed as usual, adeptly folding and recovering from its own mistakes. By focusing on task mastery, we achieve robust generalization out of the box.
12
33
303
49,104
Excited to launch @DynaRobotics with a team of incredible researchers, engineers and company builders! At Dyna, our mission is to bring affordable general-purpose AI robots to real production environments.
34
27
280
30,431
Super excited to share Eureka, our "spin" on how to use LLMs to teach low-level dexterity skills! Eureka is an open-ended reward design agent that can write and evolve superhuman reward functions for a large suite of robots and tasks, including challenging pen spinning tricks!
Can GPT-4 teach a robot hand to do pen spinning tricks better than you do? I'm excited to announce Eureka, an open-ended agent that designs reward functions for robot dexterity at super-human level. It’s like Voyager in the space of a physics simulator API! Eureka bridges the gap between high-level reasoning (coding) and low-level motor control. It is a “hybrid-gradient architecture”: a black box, inference-only LLM instructs a white box, learnable neural network. The outer loop runs GPT-4 to refine the reward function (gradient-free), while the inner loop runs reinforcement learning to train a robot controller (gradient-based). We are able to scale up Eureka thanks to IsaacGym, a GPU-accelerated physics simulator that speeds up reality by 1000x. On a benchmark suite of 29 tasks across 10 robots, Eureka rewards outperform expert human-written ones on 83% of the tasks, by an average improvement margin of 52%. We are surprised that Eureka is able to learn pen spinning tricks, which are very difficult even for CGI artists to animate frame by frame! Eureka also enables a new form of in-context RLHF, which is able to incorporate a human operator’s feedback in natural language to steer and align the reward functions. It can serve as a powerful co-pilot for robot engineers to design sophisticated motor behaviors. As usual, we open-source everything! You are all welcome to check out our video gallery and try the codebase today: eureka-research.github.io/ Paper: arxiv.org/abs/2310.12931 Code: github.com/eureka-research/E… Deep dive with me: 🧵
13
16
173
49,726
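The outer-loop/inner-loop structure described in the Eureka announcement above can be illustrated with a toy sketch. This is not the Eureka codebase: the LLM call is replaced by a random perturbation of reward weights, and the RL inner loop is replaced by a closed-form score. The function names, the three-weight reward, and the target values are all hypothetical and exist only to show the shape of a gradient-free search wrapped around an evaluation step.

```python
import random

def propose_rewards(best_weights, n_candidates, rng):
    # Stand-in for the LLM outer loop (assumption): instead of asking a model
    # to rewrite reward code, perturb the best reward weights found so far.
    return [[w + rng.gauss(0.0, 0.3) for w in best_weights]
            for _ in range(n_candidates)]

def train_and_score(weights):
    # Stand-in for the RL inner loop (assumption): a real system would train
    # a policy under this reward and measure task success. Here the score is
    # just the negative squared distance to a made-up "ideal" weight vector.
    target = [1.0, -0.5, 2.0]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def reward_search(n_iters=20, n_candidates=8, seed=0):
    rng = random.Random(seed)
    best = [0.0, 0.0, 0.0]
    best_score = train_and_score(best)
    for _ in range(n_iters):
        # Outer loop: propose candidates, keep whichever scores best.
        for cand in propose_rewards(best, n_candidates, rng):
            score = train_and_score(cand)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score

best, best_score = reward_search()
print(best, best_score)
```

The only property the sketch guarantees is that the kept score never gets worse across iterations; how quickly it improves depends entirely on the stand-in proposal step.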
Excited to share VIP, a self-supervised visual reward and representation pre-trained on diverse human videos! VIP’s frozen reward and rep. can solve diverse unseen robot tasks using TrajOpt, online RL, and enables real-world few-shot offline RL! sites.google.com/view/vip-rl 🧵:
2
38
164
Excited to share our #ICML2023 paper ✨LIV✨! Extending VIP, LIV is at once a pre-training, fine-tuning, and (zero-shot!) multi-modal reward method for (real-world!) language-conditioned robotic control. Project: penn-pal-lab.github.io/LIV Code & Model: github.com/penn-pal-lab/LIV 🧵:
1
44
159
55,047
Humbled to share that I was selected as an Apple Scholar in AIML PhD Fellowship! Very grateful to Apple, my advisors @dineshjayaraman @obastani as well as all my mentors and collaborators for their support! machinelearning.apple.com/up…
22
5
154
11,271
Giving a talk tomorrow at the Foundation Models for Interactive Robot Learning workshop (lfmrss2025.weebly.com/) at RSS! Will cover some @DynaRobotics results too! What would people like to see?
3
8
139
7,884
Excited to share my first paper as an "advisor" :D We show that pre-trained visual representations enable a simple, fast, no-training subgoal decomposition method for long-horizon robotic manipulation! Paper: arxiv.org/abs/2310.08581 Website: zcczhang.github.io/UVD/ (🧵1/n)
3
18
124
18,985
I am attending #CORL2023 and presenting two new papers at various workshops! Excited to make new friends and catch up! Please reach out if you are attending and would like to chat about anything robot learning :)
4
15
112
31,482
We have been stress testing our model flywheel and seeing strong results in many challenging tasks! Task 1: chopping veggies🧑‍🍳 A test of coordinated tool use that involves an asymmetric task and dynamic feedback to make consistent cuts. We used just 70 trajectories to get the results you see in the video. Task 2: cup stacking 🎉 A test of high-precision control, which requires precise and delicate positioning at every step. Mistakes at any step are catastrophic! The arms we use have high control error, but the model makes up for it. These are quite distinct tasks that are difficult in different ways than some of our earlier demos like laundry/napkin folding. Very glad to see that by injecting dexterous control into pre-training in an integrated way, we get a substantial boost in post-training robustness and efficiency!
Excited to share our latest progress on DYNA-1 pre-training! 🤖 The base model now can perform diverse, dexterous tasks (laundry folding, package sorting, …) without any post-training, even in unseen environments. This powerful base also allows extremely efficient fine-tuning to ~100% success on challenging new tasks with as little as 1 hour of data! 🤯 Watch it master two of them: cup stacking & celery chopping on repeat, no failures. 👇
3
10
130
16,995
It's so satisfying to watch our models just do the task on demand; human-like and no downtime
We have started taking DYNA-1, our dexterous robust VLA model, to conferences and showcasing it for hours on end! The model ran for 3 days, 8 hours each day, at #HITEC2025 3 weeks ago with a 99.9% overall success rate (dropped 1 towel on day 2). No intervention, it just works :)
2
5
116
7,637
I am very excited about our new paper (CoRL 2024 Oral) on using LLM code generation to automate environment curricula. It is often hypothesized that intelligent motor control is driven by the need to habituate and adapt to varied environments. While there's a beautiful literature on unsupervised env design, scaling such ideas to robotics has been very difficult so far given the high-dimensional env configuration space. Instead, robotics engineers still use their domain knowledge to manually design environments and their curricula. In Eurekaverse, we demonstrated the possibility of automating all that using LLMs and showed a compelling use case on challenging robot parkour! @willjhliang did an amazing job leading the work, and he's presenting it this week at #CoRL2024! Make sure to check out the poster and our oral presentation!
Introducing Eurekaverse 🌎, a path toward training robots in infinite simulated worlds! Eurekaverse is a framework for automatic environment and curriculum design using LLMs. This iterative method creates useful environments designed to progressively challenge the policy during training, enabling the learning of complex skills. We applied it to train quadruped parkour—check out these fun videos! 👟 We’ll be presenting Eurekaverse at #CoRL2024 this week and would love to see you at our oral talk and poster! And, of course, all results, details, and code are released here: Website: eureka-research.github.io/eu… Paper: eureka-research.github.io/eu… Code: github.com/eureka-research/e… Deep dive with me… 🧵
3
12
114
12,479
😲 Even I am sometimes surprised by what our models are capable of
@DynaRobotics successfully zero-shot folded a new @corl_conf shirt -- even with the sleeve tucked in awkwardly! Amazing stuff. @JasonMa2020
2
7
110
10,169
We did a fun and timely Halloween experiment benchmarking our VLA models' robust reasoning capabilities! 🎃 There's a lot of interest in reasoning for VLA models, but I personally feel that most tasks the community benchmarks on (1) do not require meaningful reasoning capabilities, or (2) are somewhat unrealistic and do not represent tasks in real-world scenarios. So we decided to use object counting and manipulation as a real benchmark; it's quite common and realistic, but I haven't seen much work in this area. End-to-end imitation learning would fail because of the combinatorially many requests you can make of the robot. Our VLA model can count and follow language commands fairly robustly -- all in an end-to-end architecture without external memory modules or counting logic. The model also robustly handles external disturbances to the scene (like shuffling the candy baskets). It's a small, cute experiment we did to benchmark reasoning, but it's pretty fun so we thought we'd share!
🎃 Halloween is coming. Our hardworking team is lining up for sweet treats, of course, served by Dynasaur! DYNA VLA model now has robust agentic reasoning capability, allowing it to serve arbitrary combinations and counts of candies! Pure imitation learning can’t work given the combinatorially many possibilities. No video edits. Uninterrupted, real-life, as always 🤖 Happy Halloween from DYNA!🍬
1
9
105
12,205
the best part of this is that the model is totally operating *zero-shot*; we didn't prep anything particular for corl. we just brought a robot (a new one that we never ran models on), loaded the same model i demo'd live on stage during actuate, and let it run for the entire conf
Everyone keeps talking about this demo, if anyone won corl it was these guys
8
5
98
14,164
It's really unfortunate to hear the FAIR layoff news.. Meta friends: if you are interested in working on frontier real-world robotics, @DynaRobotics is hiring! Please DM me!
2
18
99
12,213
I'm attending #ICRA2025 this week! Happy to chat about Dyna and all things embodied AI. DMs are open!
3
5
93
9,421
“Don’t practice until you get it right. Practice until you can’t get it wrong.” We have developed a general recipe for robust and autonomous robot foundation models for real-world applications. The linchpin in our recipe is an accurate reward model (RM) that scores every robot interaction with precision. Building on our prior research, we have delivered the first scalable foundation reward model for robotics. This model outperforms previous approaches and can reliably estimate task progress on challenging dexterity tasks, like napkin folding. This capability unlocks a host of production-critical capabilities, such as (1) autonomous exploration, (2) intentional error recovery, (3) high-quality dataset creation and curation, and much more.
4
6
86
22,839
Sharing some exciting @DynaRobotics results tomorrow! Stay tuned :D
2
3
75
7,307
Excited to start my internship at @MetaAI in their Menlo Park, CA office! I will be working with @yayitsamyzhang, @shagunsodhani, and @Vikashplus on RL and robot learning topics. If you are in the area and want to chat/hang out, please let me know!!
4
73
Thanks for having me! I gave a talk recently at MIT's Embodied Intelligence seminar. Recording should be up soon!
Welcome to MIT @JasonMa2020! Amazing talk and work, keep it up!
1
4
67
8,928
Thanks Chris! Dyna’s core philosophy: ship general and robust models and let everyone see. No hype, no cherry picking, just results
Replying to @avizurlo
My favorite is still the Dyna Robotics 24 hours of folding video; short videos like this are too easy to cherry pick
1
3
65
6,735
Autonomous deployment in real production sites is the real litmus test of general purpose robots
1
5
62
8,245
When we founded Dyna, we began with a first-principles question that shaped our focus: What single technical hurdle must we clear to unlock unlimited demand for robots? After talking to hundreds of customers, the answer is so clear yet so underemphasized in current robotics discourse: PERFORMANCE — high throughput and high-quality output, consistently. Delivering robot performance has been our singular north star ever since; no one wants a robot that only kind of works, slowly. We have tested DYNA-1 in several distinct 24-hr trials under different natural environment variations and found that DYNA-1 robustly completed 700+ napkins in all trials. My favorite part of these timelapse videos is the 6-7am period when the sun rises and the model just continues performing the task as usual!
1
6
60
9,347
Honored to be part of the cohort!
List of 33 #RSSPioneer2025 is out! Their research interests cover fundamental robot design, modelling and control, robot perception and learning, localisation and mapping, human-robot interaction, healthcare and medical robotics, and soft robots! sites.google.com/view/rsspio…
1
53
4,722
Our cute Dynasaurs running around the clock; no downtime, they just work🦖🤖 Kudos to my teammates who did all the work here; it's a testament to how robust our models are. I didn't have to do any work to get this up and running for days!
We brought Dynasaur to The Clean Show and ran it live, 8 hours a day. Each deployment fuels our robotics foundation model — scaling data, accelerating iteration, and enabling real-world generalization. Embodied intelligence won’t come from lab demos, but from deployment-first robotics running continuously in the wild.
2
6
53
7,249
This is so impressive! I can't imagine the amount of progress we will unlock as a community with low-cost, highly capable robots. Congrats to the Unitree Team!
Unitree Introducing | Unitree G1 Humanoid Agent | AI Avatar Price from $16K 🤩 Unlock unlimited sports potential(Extra large joint movement angle, 23~34 joints) Force control of dexterous hands, manipulation of all things Imitation & reinforcement learning driven #Unitree #AI
1
6
52
9,513
Learning policies in simulation and transferring to the real world (or Sim-To-Real in short) is a promising strategy for robots to learn complex skills. However, humans need to tune the simulator carefully so that the policies work robustly in the real world: this is difficult, slow, and tedious. DrEureka aims to automate this by having LLMs design crucial components of sim-to-real transfer: Reward Design and Domain Randomization. Here are a few bonus several-minute-long uncut videos of DrEureka yoga ball walking policy in-the-wild on Penn’s campus, enjoy!
2
5
46
8,435
I'll be hanging out at the LeRobot Hackathon this weekend, come say hi!
The impact of accessible low-cost robot arms and the community that’s built up around @LeRobotHF has been so awesome to see! 🤖 🚀 I am honored to be a guest judge this weekend at the Global LeRobot hackathon’s SF location. Thanks to @BitRobotNetwork for hosting.
2
4
51
10,056
Due to popular requests, we have now uploaded our pre-trained LIV model on HuggingFace for easier downloads! This is my first time doing it, and the experience was quite smooth @_akhaliq huggingface.co/jasonyma/LIV
Excited to share our #ICML2023 paper ✨LIV✨! Extending VIP, LIV is at once a pre-training, fine-tuning, and (zero-shot!) multi-modal reward method for (real-world!) language-conditioned robotic control. Project: penn-pal-lab.github.io/LIV Code & Model: github.com/penn-pal-lab/LIV 🧵:
1
17
48
20,337
Using LLMs to program a robot dog to play with balls.. sounds familiar ;) jokes aside, AI assisted development of robot capabilities will be the future. We are still so early
New Anthropic research: Project Fetch. We asked two teams of Anthropic researchers to program a robot dog. Neither team had any robotics expertise—but we let only one team use Claude. How did they do?
3
6
48
7,532
Delighted to announce that our work on a unified framework for offline imitation from observations and examples has been accepted to ICML 2022! #icml #ICML2022
SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching abs: arxiv.org/abs/2202.02433 project page: sites.google.com/view/smodic… A single algorithm for offline IL from observations, mismatched experts, and examples. Sota results in all settings. @JasonMa2020
2
5
45
DYNA-1 now folds napkins for paying customers, and we’re unlocking more skills to ship into more commercial environments in the coming weeks & months. Mastering napkin folding won’t transform daily life, but it’s a pivotal step toward making embodied AI commercially viable. This is a dream come true. After years of PhD research aimed at making robots genuinely useful in the real world, nothing has felt as close as what we accomplished in the last few months at Dyna. As we embark on this journey, we are excited to start sharing some of our results and research more widely with the robotics community! Alongside real-world robustness, we are also pushing the boundaries of cutting-edge large-scale robot learning. Join us in building robots for the real world. We look forward to hearing from you! Check out our blog post with our results: dyna.co/research
3
3
44
4,206
Agreed; it's my first time attending a developer conference instead of an academic conference. Really got me thinking deeper about deployment-in-the-loop model iteration. Also thanks for the photo feature :D
Actuate 2025 was a very cool developer conference for the new era of robotics, a short blog post: itcanthink.substack.com/p/my…
1
5
42
9,524
We are presenting LIV today at #ICML2023! Exhibit Hall 1, #827 2:00pm - 3:30pm HST The future of robotics is multi-modal, and LIV demonstrates how multi-modal value pre-training from diverse human videos can bootstrap language-conditioned robot skill learning. See you there!
Excited to share our #ICML2023 paper ✨LIV✨! Extending VIP, LIV is at once a pre-training, fine-tuning, and (zero-shot!) multi-modal reward method for (real-world!) language-conditioned robotic control. Project: penn-pal-lab.github.io/LIV Code & Model: github.com/penn-pal-lab/LIV 🧵:
1
6
42
6,409
A unique challenge we run into at Dyna is how to make the best use of the large amount of data autonomously collected by DYNA-1 during deployment. In continuous deployment settings, robot data does not naturally come with episodic boundaries. We have developed an approach that can automatically segment the streaming data and provide accurate progress estimation and language labeling to enhance the model's task understanding.
1
2
43
6,388
Super excited to share that GoFAR has been accepted to #NeurIPS2022 and flagged for an award! This is a new foundation for the theory and practice of (offline) goal-conditioned RL, check it out!
How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression abs: arxiv.org/abs/2206.03023 project page: jasonma2016.github.io/GoFAR/
1
2
41
In contrast, we find that standard recipes for training robot foundation models are insufficient for real-world PERFORMANCE. On a demanding production task like restaurant-grade napkin folding, state-of-the-art models saturate at ≈ 80% single-episode success even after hundreds of hours of domain-specific data. At that level, the chance of 30 flawless consecutive executions is (0.8)³⁰ ≈ 0.1%—functionally zero for 24/7 operations. We observe the same failure mode in-house: after one to two hours the policy drifts into unfamiliar states and cannot self-recover. The time-lapse below shows our strongest base model collapsing despite an initially perfect run. Flashy demos hide this brittleness; sustained autonomy demands ≥ 99% step-level reliability and robust fault-recovery, not just high single-episode accuracy. So what is our approach?
1
2
42
7,715
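The streak arithmetic in the post above is easy to check. The sketch below assumes each episode succeeds independently with the same probability, which is a simplification (the post itself describes drift that makes failures correlated). The function name is hypothetical, and the 99% and 99.4% rows are included only for comparison with figures quoted elsewhere in this feed.

```python
def streak_probability(p_success: float, n_episodes: int) -> float:
    """Chance of n consecutive successes, assuming independent episodes."""
    return p_success ** n_episodes

# Compare the ~80% baseline with higher per-episode success rates.
for p in (0.80, 0.99, 0.994):
    print(f"p = {p:.3f}: P(30 in a row) = {streak_probability(p, 30):.4f}")
```

At 80% the 30-episode streak probability is roughly 0.12%, consistent with the "≈ 0.1%" quoted above; at 99% it rises to roughly 74%.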
Excited to give this talk! Also lmk if you are going to be there; DMs are open
Our cofounder @JasonMa2020 speaks tomorrow at #Actuate2025: "Foundation Reward Models for Robot Learning"! If you’re around, don’t miss it! And yes, we’re HIRING across research & robotics → dyna.co/careers
1
1
40
7,857
Honored to see Eureka on this list alongside many amazing works! eureka-research.github.io/
👀 Discover the top 10 #NVIDIAresearch projects of the year. ✨ From Neuralangelo's high-fidelity neural surface reconstruction to Magic3D's text-to-3D content creation, these projects push the boundaries of innovation in #AI. nvda.ws/3RlypJr
1
2
40
6,308
Looking forward to sharing some of my thoughts on how programmatic outputs from large foundation models can accelerate robot learning!
Our #ICML2025 Programmatic Representations for Agent Learning workshop will take place tomorrow, July 18th, at the West Meeting Room 301-305, exploring how programmatic representations can make agent learning more interpretable, generalizable, efficient, and safe! Come join us!
2
1
38
3,091
We (@dineshjayaraman , @akrishna42 , and I) recently went on the Economist podcast to discuss DrEureka (eureka-research.github.io/dr…) and other recent works from our lab as well as trends on robot foundation models! Give it a listen if you are interested!
Why are robots suddenly getting cleverer? This week on “Babbage” @alokjha explores how advances in AI are bringing about a renaissance in robotics: econ.st/3KNARoV 🎧
7
38
5,641
By scaling our RM-in-the-loop training, DYNA-1 has leapt forward in just a few weeks: • Week 1: Base model can complete a single success, but falls apart after 5 minutes • Week 2: Ran 1 hour unaided, but compounding errors make recovery impossible • Week 3: Ran 8 hours, but executed only 6-7 napkins per hour (~10 mins per fold) • Week 4: Completed our first 24-hour run—but ~200 folds at low quality and speed • Week 5: Completed 24+ hours with ~350 folds at decent production-grade quality • Week 6: Sustained 24+ hours with ~850 folds and high production-grade quality From stop-and-go to round-the-clock excellence, our continual learning recipe drives rapid, tangible gains.
3
2
37
5,565
I am attending #RSS this week! Participating in the Pioneers workshop as well as giving a talk at the Foundation Models for Interactive Robot Learning workshop. Happy to meet up and chat if you are around!
1
2
36
2,711
Excited to be speaking at #Actuate2025!
Robots that actually work 24/7, no babysitting, are the holy grail of robotics. @JasonMa2020 and the team at @DynaRobotics realized the standard recipes for training robot foundation models lacked real-world 'performance' — high throughput, high quality, every time. Enter DYNA-1, the robot foundation model that autonomously folded 850+ napkins in 24 hours with a 99.4% success rate—zero human intervention at 60% human speed. How? An accurate reward model (RM) that scores every robot interaction. We are excited to have Jason Ma on stage at #Actuate2025 to talk more about the breakthroughs his team is achieving at Dyna Robotics. Tickets for Actuate are available now!: hubs.li/Q03tH3N_0
1
35
3,451
I am attending #ICML2023 next week in Hawaii! Excited to make new friends and re-connect with old ones! Please reach out if you are attending and would like to chat about anything related to research or ML! My particular interests include foundation models, RL, and robotics!
34
4,369
We are organizing the Workshop on Goal-Conditioned Reinforcement Learning (GCRL) at #NeurIPS 2023! Submission Deadline: October 4th, 2023 Website: goal-conditioned-rl.github.i…
1
4
32
7,127
It was super fun catching up with @micoolcho and @chris_j_paxton and talking about our Generative Value Learning (GVL) work on @RoboPapers! We are presenting this work next month at ICLR 2025 as a Spotlight paper!
Full episode dropping soon! Geeking out with @JasonMa2020 on generative-value-learning.gi… (Vision Language Models are In-Context Value Learners) Co-hosted by @chris_j_paxton & @micoolcho
4
32
3,633
Big congrats to my mentors and collaborators @DrJimFan and @yukez on the new group! Embodied AI and robotics research just kicked up a gear ;)
Career update: I am co-founding a new research group called "GEAR" at NVIDIA, with my long-time friend and collaborator Prof. @yukez. GEAR stands for Generalist Embodied Agent Research. We believe in a future where every machine that moves will be autonomous, and robots and simulated agents will be as ubiquitous as iPhones. We are building the Foundation Agent — a generally capable AI that learns to act skillfully in many worlds, virtual and real. 2024 is the Year of Robotics, the Year of Gaming AI, and the Year of Simulation. We are setting out on a moon-landing mission, and getting there will spin off mountains of learnings and breakthroughs. Join us on the journey: research.nvidia.com/labs/gea…
1
1
30
4,271
At a technical level, DrEureka, following our prior work Eureka (eureka-research.github.io/), uses LLM-guided evolutionary search to generate safety-aware reward functions in code that can be used to train policies in sim. Then, leveraging LLMs’ capability as hypothesis generators, DrEureka uses the LLM to choose (1) which physics parameters to randomize, and (2) what ranges they should be randomized over based on reward-aware physics prior (RAPP) over DR parameters. Finally, using the synthesized reward and DR parameters, it trains policies for real-world deployment.
3
3
31
4,920
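The domain randomization step described in the DrEureka post above (choose which physics parameters to randomize, and over what ranges) can be illustrated with a short sketch. The parameter names and ranges below are made up for illustration and are not taken from the DrEureka paper or code; in the described system an LLM would propose them, whereas here they are hard-coded.

```python
import random

# Hypothetical randomization ranges (assumption): stand-ins for what an LLM
# might propose. Neither the names nor the bounds come from DrEureka itself.
DR_RANGES = {
    "friction": (0.3, 1.2),
    "restitution": (0.0, 0.5),
    "payload_mass_kg": (0.0, 2.0),
}

def sample_physics(ranges, rng):
    # Draw one physics configuration uniformly from each parameter's range.
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

rng = random.Random(0)
# One fresh configuration per simulated training episode.
episode_configs = [sample_physics(DR_RANGES, rng) for _ in range(3)]
for cfg in episode_configs:
    print(cfg)
```

Uniform sampling is a design choice made here for simplicity; other distributions over the same ranges would fit the same interface.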
happy 4th! it was really fun to play with this human robot interaction🤖🇺🇸
Happy July 4th from Dyna! 🇺🇸🇺🇸
28
2,554
First, check out our project website for the paper, interactive demos, and a way to get your robot video labeled by GVL today! You can even listen to an AI podcast about our paper, or ask Gemini questions about it! We (especially @xf1280) put a lot of effort into getting these demos up. Let us know how you find these new ways to engage with the paper! generative-value-learning.gi…
1
7
27
5,083
DYNA-1 achieved an unprecedented level of robustness for robot foundation models. But at Dyna, we hold ourselves to an even higher standard: production-grade quality (grade 4 or 5 on a 5-point scale). While 98% of folds reach near-perfect quality (grade ≥3), only 75% hit our rigorous quality bar. What’s the difference? Less than ⅓ inch of precision on the initial fold separates perfection (grade 5) from near-perfection (grade 3). Our customers demand perfection—not near perfection—and we deliver. Tiny differences define commercial-grade quality at Dyna. This level of precision also raises the bar for our research, as every research idea is rigorously vetted to ensure measurable and significant real-world performance improvement.
1
1
28
4,459
Excited to share some of my recent works on pre-training for robotics with the MILA community!
Hello! We have @JasonMa2020 from UPenn giving a talk at this week's robot learning seminar (Thursday 11:30am EST online). Hope to see you all there! Title: Foundation Reward Models for General Robot Skill Acquisition piped.video/@MontrealRobotic… #Robotics #MachineLearning
3
27
3,977
Over this learning process, DYNA-1 iteratively becomes much better at handling extremely difficult and out-of-distribution situations. Napkin folding is particularly challenging because: Single-pull precision: Extracting exactly one napkin from a tall stack demands fine control and rapid feedback; otherwise the gripper drags out multiple napkins, causing misfolds and chaos (as you can see in the attached videos). Flattening: When a multi-pull leaves napkins crumpled, the policy must (1) detect that multiple sheets were removed, (2) locate corners folded inward, and (3) separate & flatten overlapped layers before refolding. All of these are nontrivial dexterous endeavors. Rapid self-recovery: Once in an out-of-distribution state, the robot must untangle the mess and resume folding fast enough to keep throughput intact. Every extra second spent on edge cases erodes throughput, so the policy needs to find the quickest remedy. DYNA-1’s ability to handle chaotic scenarios surprised even us, and is the fundamental reason why it can go on for 24 hours with a 99+% completion rate. There are too many robustness snippets to list, but here are a few of our favorites:
2
1
27
5,107
…and what about task generalization? By focusing on dexterity and real-world robustness, we’re seeing strong positive transfer to other tough commercial tasks, such as laundry folding and, at a client’s request, cup-filling. DYNA-1 can autonomously fold many shirts of different sizes and materials in a row and also fill ingredient cups with utmost precision. Cup-filling is perhaps the hardest “no-reset” task we’ve encountered: delicate pickup, precise placement, handover, tool use—one slip ends the run. Though not perfect yet, DYNA-1 can clear every step while our internal baselines fail to move beyond the first step consistently.
3
1
24
4,443
Thanks @_akhaliq! LIV is now on arXiv: arxiv.org/abs/2306.00958 Check it out if you are interested in the space of (RL-based) vision-language pre-training for robotics! Happy to answer any questions about the paper :)
LIV: Language-Image Representations and Rewards for Robotic Control paper page: huggingface.co/papers/2306.0… Language-Image Value (LIV) is a unified pre-training, fine-tuning, and reward learning algorithm for language-conditioned visual manipulation. LIV can perform zero-shot multi-modal reward prediction on unseen robot videos and is an effective vision-language encoder for real-world robotic control.
7
24
17,134
Happy to announce that UVD won the best paper award at the CORL LEAP Workshop!
I am attending #CORL2023 and presenting two new papers at various workshops! Excited to make new friends and catch up! Please reach out if you are attending and would like to chat about anything robot learning :)
2
2
25
4,155
DrEureka is co-led by @willjhliang and me with collaborators from @Penn and @NVIDIA: @johnnywang_16, @sam_wang23, and our advisors @yukez @DrJimFan @obastani @dineshjayaraman Check out our project website for the paper and more videos: eureka-research.github.io/dr… Code: github.com/eureka-research/D…
2
2
24
3,309
Check out what I have been working on the past few months! SMODICE is a simple and versatile offline IL algorithm that is compatible with learning from observations, mismatched experts, and even just examples! Detailed tweet coming soon 📅
SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching abs: arxiv.org/abs/2202.02433 project page: sites.google.com/view/smodic… A single algorithm for offline IL from observations, mismatched experts, and examples. Sota results in all settings. @JasonMa2020
4
22
Attending #CVPR2024 this week in Seattle! Looking forward to making new friends and catching up with everyone!
1
24
3,209
We have added example code for visualizing an **animated** zero-shot VIP reward curve on robot videos! Try it out (on your own robot videos) here: github.com/facebookresearch/…
Excited to share VIP, a self-supervised visual reward and representation pre-trained on diverse human videos! VIP’s frozen reward and representation can solve diverse unseen robot tasks using TrajOpt and online RL, and enable real-world few-shot offline RL! sites.google.com/view/vip-rl 🧵:
9
21
4,778
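To make the idea of a zero-shot reward curve concrete, here is a minimal sketch. It assumes that each video frame has already been mapped to an embedding vector by some frozen encoder, that the last frame serves as the goal, and that the value of a frame is the negative Euclidean distance to the goal embedding. The function name `goal_distance_curve` and the toy embeddings are hypothetical and are not taken from the released VIP code.

```python
import math
from typing import List, Sequence, Tuple


def goal_distance_curve(
    embeddings: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Compute a goal-conditioned value curve and per-step rewards.

    Assumption: the last embedding is the goal, and the value of a frame is
    the negative Euclidean distance between its embedding and the goal's.
    """
    goal = embeddings[-1]
    # Value rises toward 0 as the frame embedding approaches the goal.
    values = [-math.dist(e, goal) for e in embeddings]
    # Reward for each step is the change in value between consecutive frames.
    rewards = [nxt - cur for cur, nxt in zip(values, values[1:])]
    return values, rewards


# Toy embeddings that move steadily toward the goal (illustrative only).
toy_embeddings = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
values, rewards = goal_distance_curve(toy_embeddings)
print(values)
print(rewards)
```

With real data, the embeddings would come from a pre-trained encoder applied to each frame, and plotting `values` against the frame index gives the reward curve for that video.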
Check out our #L4DC paper on learning a policy-aware dynamics model for reinforcement learning! The idea is very simple: focus model learning on the current policy’s visitation distribution. We theoretically show why this is desirable and extend the dual RL paradigm to MBRL, resulting in a simple, practical algorithm, TOM!
Excited to share our #L4DC2023 paper that introduces "Transition Occupancy Matching" (TOM). TOM learns a dynamics model that keeps up with the improving policy, facilitating continued progress. Paper 📰: arxiv.org/abs/2305.12663 Code 💻: github.com/kausiksivakumar/T… 🧵
1
21
1,992
We found that DYNA-1 can achieve zero-shot environment generalization for long-horizon dexterity. While we have seen foundation models that can generalize to new environments for simple pick-and-place skills by training on diverse environments and objects, such results remain elusive for bi-manual fine-grained dexterity. The video below shows DYNA-1 folding napkins at a customer site with no additional training, but it’s worth noting that we did observe noticeable performance loss (a topic we will be conducting more research on). With additional on-site training, DYNA-1 quickly improves and becomes adept at continuous folding at the customer site. This milestone represents a significant step towards our vision of delivering performance, out of the box.
4
1
21
4,356
I am attending #NeurIPS2022 next week to present several works on offline RL and pre-training for robotics! Would love to meet people and discuss anything, in particular, RL and robot learning topics! DM me if you are attending and want to chat or grab coffee 😀
1
20
We at @GRASPlab hosted @ericjang11 last week for a thought-provoking talk on Humanoid robots and 1X! The full recording is up now, check it out!
Nice talk on the technical approach to intelligent humanoids at 1X! grasp.upenn.edu/events/sprin… As usual for @ericjang11’s spicy takes, I agree strongly with 60%, ambivalent on 20%, and disagree with 20%. Highly recommend a watch! 💯
1
1
20
3,950
Come to our workshop on Goal-Conditioned RL tomorrow at #NeurIPS2023!
Check out the #NeurIPS2023 workshop on Goal Conditioned Reinforcement Learning: * Tomorrow (Friday) 900 -- 1830 CST * Great speakers: @jeffclune @r_mirsky @olexandr @ybisk @SusanMurphylab1 * Program: goal-conditioned-rl.github.i…
2
19
2,645
I won't be at #RSS2024, but @willjhliang will be presenting our DrEureka work! Go talk to him to get the latest scoop on using foundation models for dexterous robot skill learning!
I'll be presenting DrEureka at @RoboticsSciSys on Thursday (July 18), 8:30-9:30 am! Happy to chat at the poster session right afterwards (Commissiekamer 2) or any time during the conference! Looking forward to making new friends! #RSS2024 eureka-research.github.io/dr…
1
3
18
2,539
This was a very fun project and we learned a lot about how to use LLMs to enable robot skill learning! There are many challenges and potential future directions. For example, how to combine DrEureka with real-world execution feedback, and how to use vision to provide feedback on reward and DR generation. And of course, not having to use a leash in the real world would be nice, but safety was important (no robot was hurt in the demos!). To cap off, here are some failures and blooper videos :)
2
19
3,531
Jim has been a fantastic mentor for me! Apply if you are interested in foundation models for decision making and AI Agents!
1
18
9,519
Reward learning is a fundamental challenge in RL. In VIP, we address this by pre-training a value function on action-free human videos, and the pre-trained VIP value function can zero-shot transfer to unseen robot tasks! Find out more about VIP at the Deep RL workshop tomorrow!
Deep RL workshop @NeurIPSConf starts on Friday! Hear invited talks from @tobigerstenberg on counterfactual simulation of causal judgments, @j_foerst on opponent-shaping in games, @IMordatch on sequence modeling, @yayitsamyzhang on learning generalist agents. Don't miss out! 🧵
1
16
sim2real for manipulation tasks is really hard! Great to see this work come out
Does your sim2real robot falter at critical moments 🤯? Want to help but unsure how, all you can do is reward tuning in sim 😮‍💨? Introduce 𝐓𝐑𝐀𝐍𝐒𝐈𝐂 for manipulation sim2real. Robots learned in sim can accomplish complex tasks in real, such as furniture assembly. 🤿🧵
1
2
16
5,416
The robotics industry has received much excitement lately. However, current technical challenges significantly hinder the widespread viability of robots. Most robotic foundation models today operate at just 10-30% of human-level speeds, impacting productivity on high-speed production lines. They also lack generalization capabilities, requiring extensive data collection even for minor environmental or task variations. Additionally, robot hardware remains expensive and insufficiently durable, limiting deployments to scenarios where ROI is easily justified... And these are just a few of the issues.
2
17
1,632
We are hiring, and my DMs are open! dyna.co/careers
1
15
1,270
I'd like to thank all my collaborators for making this a super fun and rewarding project: @JoeyHejna @ayzwah @ChuyuanFu @shahdhruv_ @jackyliang42 @drzhuoxu @SeanKirmani @sippeyxp @DannyDriess @xiao_ted @JonathanTompson @obastani @dineshjayaraman @Stacormed @tingnan1986 @DorsaSadigh @xf1280 . Many of them are currently at @corl_conf , make sure to talk to them about our paper! I am particularly grateful to @xf1280 for his mentorship and guidance throughout this project; I benefited a lot from his expertise and insights on frontier VLMs for robotics!
15
1,671
Papers covered: Generative Value Learning (GVL): arxiv.org/abs/2411.04549 Language-Image Value Learning (LIV): arxiv.org/abs/2306.00958 Value Implicit-Pretraining (VIP): arxiv.org/abs/2210.00030 Eurekaverse: arxiv.org/abs/2411.01775 DrEureka: arxiv.org/abs/2406.01967 Eureka: arxiv.org/abs/2310.12931 Universal Visual Decomposer (UVD): arxiv.org/abs/2310.08581 Goal-Contrastive Rewards (GCR): arxiv.org/abs/2410.19989
1
14
1,657
If our mission excites you, please reach out! We are looking for incredible talents in both AI and robotics! DM me or visit our site for open roles! dyna.co
1
14
1,056
Excited to have @ericjang11 visit @GRASPlab @PennEngineers!!
I'll be giving a talk at @GRASPlab on Wednesday, 3/27 on the robot learning we're doing at @1x_tech. If you're a researcher at Penn working on similar things I'd love to visit your labs and see what you're working on as well! Please DM grasp.upenn.edu/events/sprin…
15
3,699
Giving contributed talks on VIP at the Offline RL, Foundation Model for Decision Making, and Deep RL workshops at #NeurIPS2022. Come check out how we can pre-train a value function on passive human data and zero-shot transfer to robotics manipulation!
Replying to @Vikashplus
#VIP: Self-supervised pre-trained visual reward and representation for robotics 🎯 DeepRL, OfflineRL, SSL workshops 🔗sites.google.com/view/vip-rl @shagunsodhani @dineshjayaraman @obastani @Vikashplus @yayitsamyzhang @JasonMa2020 nitter.app/JasonMa2020/status/157… ⏯️⏯️
15
We highlight quadruped yoga ball walking. This task is particularly hard because (1) it is a novel task for which the LLM could not have seen human-generated reward functions or DR, and (2) simulation cannot model the deformable surface of an air-inflated ball, making good sim-to-real absolutely necessary. It was not clear at all when we started that this could work, so we are very excited about the result! Here is a comparison against other controllers and we clearly see that alternatives quickly fail and are quite unsafe to deploy.
3
15
3,774
UVD code is open-source now: github.com/zcczhang/UVD/ Get your long-horizon demonstrations segmented in a few seconds! We support various SOTA pre-trained reps (VIP, R3M, LIV, VC-1, ...) as well as policy backbones (MLP, GPT).
Excited to share my first paper as an "advisor" :D We show that pre-trained visual representations enable a simple, fast, no-training subgoal decomposition method for long-horizon robotic manipulation! Paper: arxiv.org/abs/2310.08581 Website: zcczhang.github.io/UVD/ (🧵1/n)
3
14
3,487
Replying to @HaqueIshfaq
We are organizing a workshop on goal-conditioned RL! Please do consider submitting/participating if relevant :) more details to follow.
1
1
15
890
We believe mastery of one beats mediocrity in many. We’re laser-focused on achieving general-purpose capabilities by perfecting one task at a time. Today, we’re bringing cost-effective, easy-to-deploy robotics solutions to businesses of all sizes.
1
14
3,198
How do you know you have product-market fit? When your robot looks like crap but delivers so much value that customers happily pay for it.
1
14
1,075
This is my internship project at @NVIDIAAI! I had a blast working on it and learned a lot from the experience. I am really grateful to my mentors @AnimaAnandkumar @DrJimFan @yukez for their guidance on the project!
2
13
1,039
Congrats Tony!! Very impressive :)
1
11
7,478
The answer is yes, and the method is simple yet intriguing! We propose formulating value learning as an autoregressive prediction task over a *shuffled* sequence of the input video frames. Why? Think about a standard video showing a task unfolding in chronological order. We empirically find that this actually makes it harder for the VLM to estimate progress because it might just latch onto the order of the frames instead of the underlying changes that signify actual progress towards completing the task. By shuffling, we force the VLM to work “harder” to figure out the correct order based on the visual cues of task progress, and doing so significantly improves the faithfulness of the value predictions! In a way, GVL poses value prediction as a “temporal unshuffling” puzzle to the VLM; it has all the pieces, but it has to figure out how those pieces fit together in a way that makes sense based on progress towards a goal.
2
14
2,963
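The shuffling idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: `predict_progress` is a hypothetical stand-in for a VLM call that returns one progress estimate per frame it is shown, and the function name `shuffled_value_prediction` is made up for this example. The sketch shows just the bookkeeping: shuffle the frames, query the predictor, then map the predictions back to chronological order.

```python
import random
from typing import Callable, List, Sequence


def shuffled_value_prediction(
    frames: Sequence[object],
    predict_progress: Callable[[Sequence[object]], List[float]],
    seed: int = 0,
) -> List[float]:
    """Return per-frame progress estimates in chronological order."""
    rng = random.Random(seed)
    # order[k] is the original (chronological) index of the k-th frame shown.
    order = list(range(len(frames)))
    rng.shuffle(order)
    shuffled = [frames[i] for i in order]

    # Hypothetical predictor: one progress value per shuffled frame.
    preds = predict_progress(shuffled)
    if len(preds) != len(frames):
        raise ValueError("predictor must return one value per frame")

    # Undo the shuffle so values line up with the original video timeline.
    values = [0.0] * len(frames)
    for k, original_index in enumerate(order):
        values[original_index] = preds[k]
    return values


# Toy stand-in: each "frame" is just its true progress step out of 4.
def fake_vlm(shown):
    return [frame / 4 for frame in shown]


print(shuffled_value_prediction(list(range(5)), fake_vlm))
# -> [0.0, 0.25, 0.5, 0.75, 1.0]
```

Because the predictor never sees the frames in chronological order, any correct output has to come from the content of each frame rather than from its position in the sequence.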
After conversations with hundreds of potential customers, we discovered that people don’t care about fancy designs—they just want robots to perform reliably and deliver clear ROI. Our approach? Start simple and excel.
1
14
1,230
Nice work on how to go beyond BC by learning to search using world and reward models! The nice thing is that you don't need to collect any additional data.
Say ahoy to 𝚂𝙰𝙸𝙻𝙾𝚁⛵: a new paradigm of *learning to search* from demonstrations, enabling test-time reasoning about how to recover from mistakes w/o any additional human feedback! 𝚂𝙰𝙸𝙻𝙾𝚁 ⛵ out-performs Diffusion Policies trained via behavioral cloning on 5-10x data!
2
13
2,845
Another video of our VLA's robust reasoning capabilities against real-time disturbances:
1
13
937
We are organizing a workshop on task specification at #RSS2024! Consider submitting your latest work to our workshop and attending!
Submit to our #RSS2024 workshop on “Robotic Tasks and How to Specify Them? Task Specification for General-Purpose Intelligent Robots” by June 12th. Join our discussion on what constitutes various task specifications for robots, in what scenarios they are most effective and more!
2
13
3,286