Pete Florence · Apr 7, 2026 · 2:51 PM UTC

Pinned Tweet

Pete Florence

@peteflorence

Apr 7

x.com/i/article/204137153848…

Going Beyond World Models & VLAs

In GEN-1, approximately 99% of the parameters are trained from scratch. Previously, this might be considered wild. For Generalist, it’s a deliberate choice. It follows our strong conviction — pursued

Pete Florence · Oct 13, 2022 · 1:55 AM UTC

"Interactive Language: Talking to Robots in Real Time" interactive-language.github.… - Real-time, interactive, open-vocabulary, language+pixels -> actions - A new scale (~600,000 traj.) for language-conditioned behavior - Dataset, sim, models, code all to be released! (1/n)...

Pete Florence · Jan 17, 2019 · 3:14 PM UTC

Excited to share some work with colleagues last summer at Facebook Reality Labs that is now up on arXiv! “DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation” Here’s a snippet of some fun interpolations in the shape latent space.

Pete Florence · Oct 28, 2021 · 7:29 PM UTC

It may be time to settle this. Poll in next tweet.

Pete Florence · Jun 17, 2025 · 5:06 PM UTC

Last Spring I took off from Google DeepMind, and I've been heads-down building since with an amazing team. Excited to share more today -- introducing Generalist. It's felt to me for a couple years, since we started bringing multimodal LLMs into robotics, that a subset of the ingredients for creating truly general purpose robot intelligence seem to be falling into place. But what's been needed is a new focus at the intersection of data, models, and hardware. No amount of downloading data from the internet, by itself, will create the level of fast, fluid, precise, reactive layer of intelligence in being able to interact with the physical world. In due time we'll be excited to share more, but what we're sharing today is about what the models have grown to be capable of. We think we've hit a new point on the frontier of general purpose real world intelligence – new levels of simultaneously fast, smooth, precise, reactive, bi-manual coordinated dexterity. Looking forward to sharing even more. Super proud of the team we've put together, and where we're headed. Reach out if you'd like to chat about working together!

Generalist

@GeneralistAI

17 Jun 2025

Today we're excited to share a glimpse of what we're building at Generalist. As a first step towards our mission of making general-purpose robots a reality, we're pushing the frontiers of what end-to-end AI models can achieve in the real world. Here's a preview of our early results in autonomous general-purpose dexterous capabilities – fast, reactive, smooth, precise, bi-manual coordinated sensorimotor control.

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Today we share more on PaLM-E! (palm-e.github.io) Thread 🧵with blog post link at the end. PaLM-E can do a lot of things across robotics, vision, and language… but let’s look at a few capabilities in detail, step by step 😉 👇

PaLM-E: An Embodied Multimodal Language Model

Project page for PaLM-E: An Embodied Multimodal Language Model.

palm-e.github.io

Danny Driess

@DannyDriess

7 Mar 2023

What happens when we train the largest vision-language model and add in robot experiences? The result is PaLM-E 🌴🤖, a 562-billion parameter, general-purpose, embodied visual-language generalist - across robotics, vision, and language. Website: palm-e.github.io

Pete Florence · Jun 28, 2020 · 11:23 PM UTC

From Scott Kuindersma's @BostonDynamics talk on Friday -- Atlas jumping between boxes, now with computer vision in the loop. From the Robotics Today seminar -- see @RoboticsSeminar or roboticstoday.github.io/ for more.

Pete Florence · Nov 8, 2021 · 2:29 AM UTC

Excited to share more about our "Implicit Behavioral Cloning" work! ✅*code* just released: github.com/google-research/i… ✅*videos*: implicitbc.github.io/ Will be sharing more this week at #CoRL2021. I'll also maybe write a TL;DR thread soon, meanwhile, check out the website!

Pete Florence · Apr 7, 2022 · 3:51 PM UTC

You may have seen some pretty powerful large "foundational" models this week (e.g., PaLM, DALL-E 2). With "Socratic Models" we look into combining such models... composing them zero-shot to do various new tasks, including across modalities. A couple more thoughts below 🧵

Andy Zeng

@andyzengineer

7 Apr 2022

With multiple foundation models “talking to each other”, we can combine commonsense across domains, to do multimodal tasks like zero-shot video Q&A or image captioning, no finetuning needed. Socratic Models: website + code: socraticmodels.github.io paper: arxiv.org/abs/2204.00598

Pete Florence · Aug 2, 2023 · 11:07 PM UTC

A comparison of the largest model sizes used for real-robot control:

Pete Florence · Sep 15, 2020 · 1:36 AM UTC

Can robots model the world with keypoints, and learn how to see, predict, and control them into the future? "Keypoints into the Future: Self-Supervised Correspondence in Model-Based Reinforcement Learning" @lucas_manuelli, @YunzhuLiYZ, me, @rtedrake arxiv.org/pdf/2009.05085.pdf (1/n)

Pete Florence · Mar 10, 2022 · 1:52 AM UTC

TL;DR: “How can NeRF be useful for robotics?” One option: train precise correspondence models, made possible by generating training data from NeRF’s beautiful geometry. @yen_chen_lin did an amazing job leading this project.

Yen-Chen Lin @yen_chen_lin

4 Mar 2022

Hi everyone, I'm happy to share our new #ICRA2022 paper on 𝐦𝐚𝐤𝐢𝐧𝐠 𝐍𝐞𝐑𝐅 𝐮𝐬𝐞𝐟𝐮𝐥 𝐟𝐨𝐫 𝐫𝐨𝐛𝐨𝐭𝐬! NeRF-Supervision is a method that learns dense visual descriptors from NeRF for category-level robotic pick and place. yenchenlin.me/nerf-supervisi…

Pete Florence · Mar 11, 2022 · 9:36 PM UTC

Very nice real-time reactive robot manipulation demo from @MarcToussaint17's group.

Marc Toussaint @Marc__Toussaint

11 Mar 2022

Finally a step from Logic-Geometric Programming to a reactive robotic manipulation framework: "Sequence-of-Constraints MPC: Reactive Timing-Optimal Control of Sequential Manipulation" Paper & Videos: user.tu-berlin.de/mtoussai/2… Thanks to all collaborators! @DannyDriess

Pete Florence · Oct 26, 2021 · 4:45 PM UTC

New xArm robot (the "Lite 6"), and they're selling some for $1,199. kickstarter.com/projects/ufa… I've really enjoyed using the bigger xArm 6 for robot research. They're simple but pretty high quality for the price point. Exciting to see prices jump even lower.

Pete Florence · Mar 8, 2023 · 3:38 AM UTC

Very nice! Was hoping somebody would get Diffusion working really well for real-world robot policy learning. Comprehensive display of results (see website), nice visualizations and tasks. 👏 @chichengcc and @SongShuran's lab together with Siyuan and Eric (TRI) and Yilun (MIT) !

@_akhaliq

8 Mar 2023

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion abs: arxiv.org/abs/2303.04137 project page: diffusion-policy.cs.columbia…

Pete Florence · Nov 23, 2023 · 8:35 PM UTC

The number of people trying to claim they already did Q* without even knowing what Q* is... this is hilarious

Pete Florence · Jan 5, 2024 · 6:05 PM UTC

One way of thinking about these results — this is the widest diversity of complex tasks I’ve *ever* seen performed by *any* robot. Finally, something actually exceeds the ~2010 PR1 videos :) Also to clarify the video below is teleop, but they have autonomous results for a smaller set but still impressive mix of tasks. Amazing work @zipengfu @tonyzzhao @chelseabfinn

Zipeng Fu

@zipengfu

4 Jan 2024

Mobile ALOHA's hardware is very capable. We brought it home yesterday and tried more tasks! It can: - do laundry👔👖 - self-charge⚡️ - use a vacuum - water plants🌳 - load and unload a dishwasher - use a coffee machine☕️ - obtain drinks from the fridge and open a beer🍺 - open doors🚪 - play with pets🐱 - throw away trash - turn on/off a lamp💡 Project website: mobile-aloha.github.io/ Co-lead @tonyzzhao, advised by @chelseabfinn (amazing photographing from @qingqing_zhao_ )

Pete Florence · Jul 13, 2022 · 9:49 PM UTC

Excited to have this paper come out, it studies a lot of ideas under one roof! Melts together ideas/models from: "LMs as Zero-Shot Planners", SayCan, Socratic Models, PaLM, Chain-of-thought.

Karol Hausman

@hausman_k

13 Jul 2022

Have you ever “heard” yourself talk in your head? Turns out it's a useful tool for robots too! Introducing Inner Monologue: feeding continual textual feedback into LLMs allows robots to articulate a grounded “thought process” to execute long, abstract instructions 🧵👇

Pete Florence · Jun 29, 2020 · 12:00 AM UTC

Another recent video showing more computer vision in the loop for @BostonDynamics robots, this one extracted from a short talk by Marc Raibert here -- piped.video/watch?v=C8-w9eF2… @VentureBeat In this one, Spot is picking up clothes.

Pete Florence · Feb 8, 2024 · 12:33 AM UTC

🤙 Hardware improvements make life much better in unexpected ways. Example: I used to think there was a software bug causing gripper latency… nope, just static friction in the original grippers!! + new sim 💯 👏 team for @tonyzzhao’s Google DeepMind internship!

Tony Zhao

@tonyzzhao

7 Feb 2024

Led by @GoogleDeepMind, we present ALOHA 2 🤙: An Enhanced Low-Cost Hardware for Bimanual Teleoperation. ALOHA 2 🤙 significantly improves the durability of the original ALOHA 🏖️, enabling fleet-scale data collection on more complex tasks. As usual, everything is open-sourced!

Pete Florence · May 9, 2020 · 11:18 PM UTC

Replying to @ericjang11

Love this. The key trend it turns out is that the year is monotonically increasing.

Pete Florence · Sep 17, 2019 · 1:29 AM UTC

Having robots learn dexterous tasks requiring real-time hand-eye coordination is hard. Can learning visual correspondence make it easier? New paper: “Self-Supervised Correspondence in Visuomotor Policy Learning” Pdf: arxiv.org/abs/1909.06933 Video: piped.video/watch?v=nDRBKb4A…

Pete Florence · Feb 3, 2022 · 7:34 PM UTC

New 🤖 paper led by the awesome @WiYoungsun! arxiv.org/pdf/2202.00868.pdf The paper is essentially "using the Force* to deform neural fields" (In this case, DeepSDF-style representations.) A cool thing here is that robots can have tactile (e.g., force-torque) sensing...

Pete Florence · Jun 19, 2022 · 7:57 PM UTC

Happening tomorrow — join us online or in New Orleans!

Andy Zeng

@andyzengineer

16 Jun 2022

Join us next week at the CVPR Tutorial on Vision-Based Robot Learning! We’ll distribute Colabs that show you how to run Socratic Models for language-driven robot pick & place right in your browser (in person, or online!) sites.google.com/view/cvpr20…

Pete Florence · Mar 10, 2022 · 12:51 AM UTC

First Question: “Which is the best action space for learning?” 🤔... Second Question: “Can we just *not* choose any one specific action space, and let the model figure it out?” 🙋‍♂️🎉 One step closer to action spaces that *just work* :)

Andy Zeng

@andyzengineer

8 Mar 2022

For end-to-end robot learning: pixels to joint angles? or to cartesian poses? IKP uses Implicit BC + (differentiable) kinematics to learn inductive patterns in both action spaces. arxiv.org/abs/2203.01983 w/ @AdityaGanapathi @peteflorence Jake Varley @kaylburns @Ken_Goldberg

Pete Florence · Apr 5, 2022 · 1:43 AM UTC

You may have noticed, even earlier today, that Large Language Models are getting better. Now, this work from colleagues on our team at Google shows how to use Large Language Models to make robots work better at planning in the real world. LLMs -> 🤖👍🏻

Karol Hausman

@hausman_k

5 Apr 2022

Super excited to introduce SayCan (say-can.github.io): 1st publication of a large effort we've been working on for 1+ years Robots ground large language models in reality by acting as their eyes and hands while LLMs help robots execute long, abstract language instructions

Pete Florence · Jan 31, 2024 · 5:23 PM UTC

In your head, when you *plan* into the future - how much planning do you do in "language" in your head? - if not using language, do you visualize? If you visualize, is it photoreal from an ego view, or something else? - is there a 3rd, not language or visualizing, way you plan?

Pete Florence · Jan 9, 2024 · 5:18 AM UTC

Recent talk at MIT by Toyota Research Institute on their work scaling diffusion models and dexterous data collection piped.video/live/fwBbj6UmK-I… @Ben_Burchfiel, Siyuan, @eacousineau, @naveenoid, Russ, and co.

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Here’s another multimodal reasoning question addressed with chain-of-thought, this time doing visual math questions, no OCR required despite needing spatial-textual context, just does everything all in one model. This prompt by @xf1280!

Pete Florence · Jan 18, 2022 · 4:36 PM UTC

🎙We podcasted! Thanks for putting it together @kevin_zakka and it was great as always chatting with @ericjang11. I think we covered a bunch of topics. For example I didn’t expect to learn what stigmergy is (thanks @ericjang11!).

Kevin Zakka @kevin_zakka

17 Jan 2022

Super stoked to release the very first episode of my Casual Robotics podcast: "Progress Towards General Purpose Robots" ft. the brilliant @ericjang11 and @peteflorence. casualrobotics.ai/

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

For one, “Let’s think step by step” comes to multimodal models! Zero-shot chain-of-thought has been one of these emergent behaviors that has caught considerable interest in researching LLM capabilities… With PaLM-E-562B, zero-shot visual chain-of-thought comes “included”.

Pete Florence · Mar 31, 2023 · 4:12 PM UTC

Why RoboPianist? - Full bi-manual anthropomorphic hands, contact-rich manipulation. - CV/NLP have had ambitious high-quality quantitative benchmarks, this helps add more in robotics. - Tons of expansion opportunity: learning from humans / multimodal input / generative music..

Kevin Zakka @kevin_zakka

30 Mar 2023

Introducing 𝗥𝗼𝗯𝗼𝗣𝗶𝗮𝗻𝗶𝘀𝘁 🎹🤖, a new benchmark for high-dimensional robot control! Solving it requires mastering the piano with two anthropomorphic hands. This has been one year in the making, and I couldn’t be happier to release it today! Some highlights below:

Pete Florence · Oct 13, 2022 · 2:07 AM UTC

Much more in the paper and in the narrated video. Have to run to dinner but might add more comments later. - paper: arxiv.org/abs/2210.06407 - video: piped.video/5vfVhNMf3GQ Especially looking forward to sharing the data, models, etc., with the community soon!

Interactive Language: Talking to Robots in Real Time

We present a framework for building interactive, real-time, natural language-instructable robots in the real world, and we open source related assets (dataset, environment, benchmark, and...

arxiv.org

Pete Florence · Feb 3, 2024 · 1:37 AM UTC

One of the bigger bimanual teleop demos I’ve seen 🤣

HOW THINGS WORK

@HowThingsWork_

2 Feb 2024

Excavator team work 👷‍♂️🤝🏼👷‍♂️

Pete Florence · Mar 13, 2024 · 3:59 PM UTC

Replying to @coreylynch

@coreylynch 👏 very nice bimanual policies!

Pete Florence · Jul 27, 2023 · 3:49 AM UTC

One of the largest areas for impact of *generalist multimodal models* may be in the medical domain. 🩺🩻👩‍⚕️(radiology, dermatology, genomics…) With this new step, Med-PaLM becomes multimodal: a generalist biomedical AI. And it’s finetuned from PaLM-E! 🌴-🤖-> 🥼

Vivek Natarajan

@vivnat

27 Jul 2023

Medicine is inherently multimodal. Thrilled to share Med-PaLM M, the first demonstration of a generalist multimodal biomedical AI system with a stellar team @GoogleAI @GoogleDeepMind @GoogleHealth Paper: arxiv.org/pdf/2307.14334.pdf

Pete Florence · Jul 25, 2023 · 8:48 PM UTC

In Honolulu to present PaLM-E! 2 time slots: - today (Tuesday) 2:00-3:30 pm poster session, Exhibit Hall 1, #237 - tomorrow (Wednesday) 10:30 am - 12:00 noon, Google DeepMind booth Several authors here and looking forward to chatting with folks!

Pete Florence · Jan 17, 2019 · 4:38 PM UTC

Full video for DeepSDF is here for reference: piped.video/LILRJzMQw5o

DeepSDF: Learning Continuous Signed Distance Functions for Shape...

youtube.com

Pete Florence · Jun 29, 2020 · 3:07 AM UTC

One more -- onboard view & faster.

Pete Florence · Nov 2, 2021 · 1:24 AM UTC

I'm looking forward to sharing more on our Implicit BC work, and we should have our own implementations out soon. Meanwhile though, Kevin did a very nice PyTorch implementation here of one of the results!

Kevin Zakka @kevin_zakka

27 Oct 2021

Always fun to use homework as an excuse to implement friends/collaborators' newest work :) Learned a lot taking a stab at @peteflorence and @andyzengtweets's newest Implicit Behavior Cloning in @PyTorch. github.com/kevinzakka/ibc

Pete Florence · Jul 28, 2023 · 4:45 PM UTC

In the language of Moravec’s paradox: Training on billions of the easy problems seems to be making some of the hard problems more tractable.

Pete Florence · Nov 19, 2021 · 11:45 PM UTC

Check out our new blog post! We talk a bit more about our research process and the questions in our recent Implicit BC work (implicitbc.github.io/).

Google AI

@GoogleAI

19 Nov 2021

It can be challenging for robots to imitate precise and decisive behaviors. Introducing Implicit Behavioral Cloning, a simple method that scales to difficult real-world tasks and achieves state-of-the-art performance on human-expert offline RL benchmarks→ goo.gle/3FurkP6

Pete Florence · Apr 7, 2020 · 3:52 PM UTC

If you or anybody you know has time at home and has ever wanted to learn 3D CAD, here's a step-by-step tutorial with GIFs at every step. From a 16 y.o. we taught last summer: "I learned more in a couple hours than I did in a year poking at CAD". stageoneeducation.com/cad-tu…

dailySTEM Chris Woods

@dailystem

6 Apr 2020

Hey teachers...challenge your kids to learn #CAD with the great GIF tutorials at stageoneeducation.com/ #RemoteLearning #PBL #CTE #Rockets

Pete Florence · Jan 5, 2024 · 8:58 PM UTC

Sonic The Hedgehog robot! A solution to the classic “legs vs wheels” debate? Six legs *and* a single omnidirectional wheel. Also a different take on legs+wheels than “wheels at the bottom of legs”, for example Boston Dynamics’ ~2017 Handle bot, Ascento, etc. Also relevant: see “Ballbot” (piped.video/8BtDuzu2WeI?si=9a0T…) and other ball-balancing robots, including BB-8, but add legs.

T.Yamazaki @ZappyZappy7

5 Jan 2024

A six-legged robot that transforms into a ball shape, and it can also roll in any direction. piped.video/yn3FWb-vQQ4 #DIY #handmade #robot #robotics #Biomimicry #生物ロボット #生物模倣 #バイオミミクリー #Armadillo #hexapod #MorpHex #ZentaRobotics

Pete Florence · Dec 20, 2022 · 9:23 PM UTC

In addition to the paper, I wanted to highlight that @simonlc_ made a beautifully distilled, narrated, and animated explainer video, intro-ing key topics in simulating contact, which is a pillar of robotics. See Simon's tweet for full YouTube link. Some snippets:

Simon LC @simonlc_

20 Dec 2022

I'm excited to present Single-Level Differentiable Contact Simulation. It's a novel formulation that unifies contact dynamics and collision detection in a single optimization problem. paper: arxiv.org/abs/2212.06764 code: github.com/simon-lc/Silico.j… video: piped.video/oaGLTR13iF8

Pete Florence · Oct 8, 2019 · 4:44 PM UTC

Excited to be starting this week as a Research Scientist @GoogleAI working with many talented folks on the Brain Robotics team! In SF Bay Area — looking forward to spending time with old friends and new ones too.

Pete Florence · Feb 5, 2024 · 8:17 PM UTC

Always such a joy seeing new Atlas videos :) Looks like a hard task, especially with the contact mode switching into/from sliding, and the constraints involved, on both grabbing and stowing. Also looks like a heavy widget. Interesting new fingers too!

Boston Dynamics

@BostonDynamics

5 Feb 2024

Can't trip Atlas up! Our humanoid robot gets ready for real work combining strength, perception, and mobility.

Pete Florence · Apr 7, 2022 · 4:22 PM UTC

And one more callout: Low-level physical control skills ("manipulation") that are 1. highly capable and 2. broadly general remain very challenging. This is not much addressed by these works from our team this week. Tons more work to do there. Moravec's Paradox continues.

Pete Florence · Jan 8, 2024 · 9:32 PM UTC

Replying to @Stone_Tao

The hardest solved problem in robotics is camera calibration. The hardest unsolved problem in robotics research is communication to the broader public about what is/isn’t hard.

Pete Florence · Oct 13, 2022 · 2:01 AM UTC

Excited to have this come out. A large effort with a lot of folks behind this. Note these videos (previous one and this one below) are "1x speed" (real time)! Here are rollouts for one of the ~87,000 strings the robot can do, "push the yellow star between the green blocks"

Pete Florence · Jan 6, 2024 · 8:19 PM UTC

Now that I have a kiddo of my own, when it’s my own birthday, my main thought is, *wow* thank you mom and dad!

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Here’s a many-step zero-shot CoT example (prompt by @ayzwah!). Note large VQA training datasets (VQAv2, OKVQA, etc.) typically only have 1-, 2-, 3-word answers, so these many-step answers are considerably out-of-distribution.

Pete Florence · Jan 17, 2024 · 5:00 AM UTC

If you work on robot learning -- What do you think would have higher impact on robotics over a 1-year horizon: 100 million diverse high-quality dexterous demonstrations, or the entirety of the rest of robotics research for the year? Context in replies

57% 100 M demonstrations

43% Rest of robot research

308 votes • Final results

Pete Florence · Jul 28, 2023 · 5:01 PM UTC

Huy Ha from @SongShuran’s lab is brand new to Twitter/X today, currently at 3 followers. Give him a follow? Amazing work by him on this project. Addresses scalability and LLMs and diffusion policies. And check out that website! Also, code is all available :)

Huy Ha @haqhuy

28 Jul 2023

How can we put robotics on the same scaling trend as large language models while not compromising on rich low-level manipulation and control?

Pete Florence · Mar 7, 2023 · 12:47 AM UTC

🌴🤖: 🦾👀✍️

Danny Driess

@DannyDriess

7 Mar 2023

Pete Florence · Feb 16, 2021 · 7:26 PM UTC

New blog post is out on Transporter Nets! @andyzengtweets made us a new Blendered explainer visual, and I love it. Code is open source now too and major kudos to @ayzwah for help with the code release. github.com/google-research/r…

GitHub - google-research/ravens: Train robotic agents to learn pick and place with deep learning...

Train robotic agents to learn pick and place with deep learning for vision-based manipulation in PyBullet. Transporter Nets, CoRL 2020. - google-research/ravens

github.com

Google AI

@GoogleAI

16 Feb 2021

Can models more efficiently learn rearrangement tasks by overlaying 3D space instead of using object-centric representations? Check out Transporter Nets, an open-source framework for sample-efficient robot manipulation, with related benchmark tasks. See ↓ goo.gle/37k9KOW

Pete Florence · Mar 17, 2020 · 1:15 AM UTC

Do you want to help? Open-source project for low-cost, Arduino-based, partially-3D-printed ventilator: github.com/jcl5m1/ventilator This is to address the potential case of COVID-19 hospitalizations depleting all FDA approved ventilators. Started by Johnny Lee. Plenty of help needed.

Pete Florence · Jan 18, 2019 · 9:27 PM UTC

Also, here is the link for the paper; I didn’t have it in the last tweet! arxiv.org/pdf/1901.05103v1.p… DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, Steven Lovegrove.

Pete Florence · Apr 7, 2022 · 4:00 PM UTC

It is pretty remarkable to me how quickly just some creative programming can combine models in this style. Our open-source example for image captioning is about 40 lines of non-boiler-plate code. (colab.research.google.com/dr…)

SocraticModels-ImageCaptioning.ipynb

Colaboratory notebook

colab.research.google.com

Pete Florence · Feb 23, 2024 · 3:44 PM UTC

Congrats @DrJimFan!! And @yukez !!

Jim Fan

@DrJimFan

23 Feb 2024

Career update: I am co-founding a new research group called "GEAR" at NVIDIA, with my long-time friend and collaborator Prof. @yukez. GEAR stands for Generalist Embodied Agent Research. We believe in a future where every machine that moves will be autonomous, and robots and simulated agents will be as ubiquitous as iPhones. We are building the Foundation Agent — a generally capable AI that learns to act skillfully in many worlds, virtual and real. 2024 is the Year of Robotics, the Year of Gaming AI, and the Year of Simulation. We are setting out on a moon-landing mission, and getting there will spin off mountains of learnings and breakthroughs. Join us on the journey: research.nvidia.com/labs/gea…

Pete Florence · Apr 16, 2022 · 7:44 PM UTC

Blog post by Vincent Vanhoucke which weaves together: - New Yorker cartoons by @BobMankoff - new work in AI from last week including from our team, and - “language as the connective tissue of AI”. vanhoucke.medium.com/bob-man…

Pete Florence · Oct 28, 2021 · 8:58 PM UTC

Replying to @snikolov

Nice, this is a great point. Should we call B "multimodality models" then? I think I like that. Here's a reference of folks calling B "multimodal", the 4th MULA workshop: mula-workshop.github.io/ Maybe they could call all of that multimodality learning.

Pete Florence · Jan 23, 2024 · 5:16 PM UTC

Introduction by way of a massively oversimplified Haiku --
VLM problem:
Suck at 3D reasoning
Generate data :)
Actually getting this done, at scale, comes with a very creative pipeline and lots of analysis. Awesome work led by @BoyuanChen0 and amazing hosting by @xf1280!

Boyuan Chen

@BoyuanChen0

23 Jan 2024

Introducing Spatial VLM, a Vision-Language Model with 3D Spatial Reasoning Capabilities by @GoogleDeepmind. We investigate to what extent synthetic data can help VLMs learn - 3D relationship - quantitative distance - CoT spatial reasoning - RL reward spatial-vlm.github.io (1/6)

Pete Florence · Oct 13, 2022 · 2:05 AM UTC

One capability we study is *interactive language guidance*, in which the robot can be iteratively guided by a human to accomplish long-horizon complex tasks requiring multiple minutes of coordinated actions. (These videos are long, so they are sped up to 4x.)

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Multimodal chain-of-thought can be very helpful to get a sense of what the model is picking up on. While the question here is only a 1-bit (yes/no) answer, the chain-of-thought provides much more than 1 bit of information on what the model sees.

Pete Florence · Oct 28, 2020 · 8:47 PM UTC

If you know anybody who’d be interested, encourage them to apply! I signed up to work with students interested in robotics, computer vision, ML. What it is: mentorship in the intangibles of navigating the research world, intended for students from under-represented groups.

Google AI

@GoogleAI

27 Oct 2020

Applications are open for our CS Research Mentorship Program — CSRMP. Students from underrepresented groups in computing are paired with @Google mentors to support their pursuit of research pathways. Learn more & apply by Nov 18 ➡️ research.google/outreach/csr…

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Moving on from chain-of-thought, another capability of PaLM-E that “just comes included” is the ability to do multi-image reasoning… despite only ever being trained on single-image examples.

Pete Florence · Apr 7, 2022 · 4:17 PM UTC

I also want to call out another effort, "SayCan", released this week from colleagues on our team. Clearly, trying some "multi-foundation-model" (Socratic) approach + "use LLM for robot planning" (SayCan) will be on the docket for things to try next :)

Karol Hausman

@hausman_k

5 Apr 2022

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Extending multi-image further, we can do more than just 2 images... For this, let’s look at a capability we showed last year with Socratic Models (socraticmodels.github.io/, led by @andyzengtweets), where we could do long-form egocentric video understanding, some examples here:

Pete Florence · Oct 28, 2021 · 7:35 PM UTC

The most practical resolution I can think of is for A to become "non-unimodal". But unfortunately that's kind of a mouthful.

Pete Florence · Jan 14, 2022 · 6:01 PM UTC

So… it trains about as fast as you can say “Neural Radiance Field” 🤯

@_akhaliq

14 Jan 2022

Instant Neural Graphics Primitives with a Multiresolution Hash Encoding paper: nvlabs.github.io/instant-ngp… project page: nvlabs.github.io/instant-ngp… github: github.com/NVlabs/instant-ng…

Pete Florence · Jan 7, 2024 · 7:15 PM UTC

Congrats Figure team, nice to see the hands and manipulation learning! 👏@coreylynch oscar @adcock_brett and co!

Brett Adcock

@adcock_brett

7 Jan 2024

Figure-01 has learned to make coffee ☕️ Our AI learned this after watching humans make coffee This is end-to-end AI: our neural networks are taking video in, trajectories out Join us to train our robot fleet: figure.ai/careers

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Pete Florence

@peteflorence

10 Mar 2023

But of course, avoiding forgetting is a low bar :) I’ve been glad to see that folks are picking up on the **transfer** story of PaLM-E – for example see r/MachineLearning: teddit.net/r/MachineLearning…

Pete Florence · Oct 21, 2022 · 12:52 AM UTC

Pete Florence

@peteflorence

21 Oct 2022

Very nice @kanjun ! Love this: rich interactable 3D worlds, 10k steps/sec, single-GPU trainable, 215 hours of human data… all open source with great docs (github.com/Avalon-Benchmark/…), excited to see where this heads!

GitHub - Avalon-Benchmark/avalon: A 3D video game environment and benchmark designed from scratch...

A 3D video game environment and benchmark designed from scratch for reinforcement learning research - Avalon-Benchmark/avalon

github.com

Kanjun 🐙

@kanjun

20 Oct 2022

Replying to @kanjun

Avalon is the world's fastest 3D simulator for RL agents. All baselines train on 1 GPU in ~1 day. We want academic researchers to be able to study aspects of intelligence missing from today’s models, even w/o access to large-scale compute. Get started: generallyintelligent.com/ava…

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Pete Florence

@peteflorence

10 Mar 2023

With PaLM-E, we can do this end-to-end, all in one model, with no explicit textual intermediate stage. A wide set of temporal/visual reasoning capabilities are in scope. Lots of potential AR & Robotics applications here.

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Pete Florence

@peteflorence

10 Mar 2023

Another capability of PaLM-E-562B is that it's, quantitatively, an excellent language model. Roughly as good as PaLM-540B. Notable that scaling the model significantly reduces catastrophic language forgetting 🤔

Danny Driess

@DannyDriess

7 Mar 2023

Replying to @DannyDriess

We observe a notable trend with model scale: the larger the language model, the more it maintains its language capabilities when training on visual-language and robotics tasks – quantitatively, the 562B PaLM-E model nearly retains all of its language capabilities.

Pete Florence · Jan 19, 2024 · 10:59 PM UTC

Pete Florence

@peteflorence

19 Jan 2024

Replying to @peteflorence @simonkalouche @jackyliang42

Interesting to see where people landed here. In the 2nd poll, the cumulative total is that 71% voted for 1 B demos, and 29% still held out! Here’s the talk from CoRL where I originally posed this type of question (timestamped link); I simplified it a bit for this Twitter poll: piped.video/aeCZ7DY8KHw?si=oGVH…

In the talk there’s a crude back-of-the-envelope estimate that if, for 1 year, instead of working on CoRL papers everybody just collected demos, we’d have ~230 million demos. As I say in the talk, I’m not actually saying we should do that… we need a diversity of ideas, and CoRL is a great conference full of great ideas, but the point of the question is to get people to think.

Phrasing the question this way is elucidating because (a) with this group the “people time” is already paid for, and (b) we know that robot researchers typically collect high-quality demo data 🙂. Maybe this makes you realize that collecting a lot of data is indeed feasible… ideally in a way where people are also doing other research too. And maybe it helps you think about how, if at all, you might adjust your own work for a future world where we have robot demonstration data at such a large scale.

Rather than thinking about whether or not 100 M demos would “solve” any subset of robotics, I think it’s more helpful to think about how the robotics landscape would change in such a world. NLP and Computer Vision have become quite different since LLMs, CLIP-style models, large-scale text-to-image diffusion models, and multimodal LLMs have existed. What shifts in importance? What then becomes possible, and what doesn't change? What is perhaps even more important than before?

Re: the poll’s question, my own view is that both would be immensely valuable. I am gung-ho both on getting lots of demos and on research into lots of other, potentially unrelated things. In the talk I originally said there was a “nonzero probability” that 230 million demos might be more valuable than everything else combined. That’s a weaker statement than the poll’s question. Wild guess: the probability is maybe around 50%. If forced to pick as the poll asked, we’ll say maybe over 50%, so I’d take the 100 M demos. Also keep in mind that the impact of 100 M demos might be pretty immediate, in under 1 year, whereas methodological research ideas typically take longer to bear fruit in terms of impact. Thoughts?

Pete Florence · Sep 21, 2019 · 7:39 PM UTC

Pete Florence

@peteflorence

21 Sep 2019

Replying to @rasbt

My personal favorites to recommend:
- Great for any level of background (including zero): codecademy.com/learn/learn-p…
- Great refresher: cs231n.github.io/python-nump…

Learn Python 3 | Codecademy

Develop your Python 3 skills in our comprehensive course. Start coding and build versatile applications.

codecademy.com

Pete Florence · Oct 23, 2023 · 11:16 PM UTC

Pete Florence

@peteflorence

23 Oct 2023

The importance of context in communication: blinking headlights while driving can mean any of
- Go f yourself
- Thank you
- Get out of my way
- Go ahead
- Your lights are off
A single bit, but potentially many bits of context. :)

Pete Florence · Apr 8, 2022 · 5:04 PM UTC

Pete Florence

@peteflorence

8 Apr 2022

Here are some examples (see thread) from @maxbraun, running our Socratic-Models-based image captioning in our open-source colab (colab.research.google.com/dr…). If anybody would prefer, we can also provide a "request-result-over-Twitter" API :) -- just send some images.

SocraticModels-ImageCaptioning.ipynb

Colaboratory notebook

colab.research.google.com

Max Braun @maxbraun

7 Apr 2022

Replying to @maxbraun

0.2773 A high-tech cafeteria where robots serve delicious food without a single human in sight.
0.2362 An empty room with only a cleaning robot to keep it company.
0.2253 A robotic future?

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Pete Florence

@peteflorence

10 Mar 2023

In a recent-ish podcast (recorded in October, released in January), I had a few comments on where large-scale multimodal models are headed and “one big model” approach... (see around 42 minutes here)

The Gradient

@gradientpub

6 Jan 2023

Check out our interview with Google's @peteflorence! We chat about how robotics can benefit from dense visual representations, neural radiance fields, and large language models. It's an exciting time for robotics, take a listen! 👇 thegradientpub.substack.com/…

Pete Florence · Jun 9, 2022 · 1:56 AM UTC

Pete Florence

@peteflorence

9 Jun 2022

Replying to @pfau

Yitang Zhang bounded gaps between primes at age ~58, in ~2013

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Pete Florence

@peteflorence

10 Mar 2023

In Socratic Models, this worked by writing out a language-based world-state history – a timestamped log of textually-represented events:
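To make the idea of a "language-based world-state history" concrete, here is a minimal sketch of such a timestamped text log and how it could be flattened into a prompt for a text-only language model. Everything here is hypothetical and illustrative: the class names `Event` and `WorldStateLog`, the `MM:SS` timestamp format, the `Q:`/`A:` prompt layout, and the example events are assumptions, not the Socratic Models implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One textually-represented observation (hypothetical; e.g. a caption from a VLM)."""
    timestamp_s: float  # seconds since the start of the video
    description: str


@dataclass
class WorldStateLog:
    """Hypothetical timestamped log of text events, not the Socratic Models code."""
    events: list[Event] = field(default_factory=list)

    def record(self, timestamp_s: float, description: str) -> None:
        """Append an observation; events may arrive out of order."""
        self.events.append(Event(timestamp_s, description))

    def render(self) -> str:
        """Render events in time order as plain text, one 'MM:SS: text' line each."""
        lines = []
        for ev in sorted(self.events, key=lambda e: e.timestamp_s):
            minutes, seconds = divmod(int(ev.timestamp_s), 60)
            lines.append(f"{minutes:02d}:{seconds:02d}: {ev.description}")
        return "\n".join(lines)

    def to_prompt(self, question: str) -> str:
        """Prepend the text log to a question so a text-only LLM can answer it."""
        return f"{self.render()}\nQ: {question}\nA:"


# Usage with made-up example events.
log = WorldStateLog()
log.record(75, "I picked up my keys from the kitchen counter.")
log.record(12, "I walked into the kitchen.")
print(log.to_prompt("Where did I pick up my keys?"))
```

The text log is the explicit intermediate stage that the PaLM-E tweets in this thread contrast with; the sketch only shows the Socratic-Models-style approach.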

Pete Florence · Oct 28, 2021 · 7:34 PM UTC

Pete Florence

@peteflorence

28 Oct 2021

For context, with more and more research happening on B (right), it can be hard to search for things related to A (left). Maybe I'm the only person that runs into this though :) Poll

24% Use different name for A

50% Use different name for B

27% Doesn't bother me

1,568 votes • Final results

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Pete Florence

@peteflorence

10 Mar 2023

For robotics, PaLM-E is a rapid learner of new planning tasks, requiring only a handful of samples to start generalizing well in a given domain. Here we plot PaLM-E sample complexity relative to baseline – the difference is solely transfer learning. (Subset of Table 2)

Pete Florence · Jun 7, 2022 · 9:19 PM UTC

Pete Florence

@peteflorence

7 Jun 2022

Can NeRF help reinforcement learning? See Danny’s (dannydriess@) thread on “NeRF-RL”! A few more comments in this thread too.

Danny Driess

@DannyDriess

7 Jun 2022

New preprint on Reinforcement Learning with Neural Radiance Fields Paper: arxiv.org/abs/2206.01634 Video: dannydriess.github.io/nerf-r… Amazing collaboration between @DannyDriess, @IngmarSchubert, @peteflorence, @YunzhuLiYZ, @Marc__Toussaint (1/6)

Pete Florence · Feb 8, 2024 · 4:29 PM UTC

Pete Florence

@peteflorence

8 Feb 2024

Replying to @ankurhandos @tonyzzhao

Rough math: $27k per ALOHA2 x 9 ALOHA2s = $243k… Equivalently, you would only have needed to hold onto about $13k of NVIDIA stock from 5 yrs ago 😉 (+1802%)
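The "rough math" can be checked directly: a +1802% gain multiplies the principal by 1 + 18.02 = 19.02, so the required starting stake is the fleet cost divided by 19.02. This is a sketch that uses only the figures quoted in the tweet; the function names are illustrative.

```python
ALOHA2_UNIT_COST = 27_000  # USD per ALOHA2, as quoted in the tweet
NUM_ALOHA2 = 9             # number of ALOHA2s, as quoted
NVDA_GAIN_PCT = 1802       # "+1802%" over ~5 years, as quoted


def fleet_cost(unit_cost: int, count: int) -> int:
    """Total hardware cost for `count` identical robots."""
    return unit_cost * count


def required_principal(target: float, gain_pct: float) -> float:
    """Initial investment that grows to `target` after a `gain_pct` percent gain.

    A +g% gain multiplies the principal by (1 + g / 100).
    """
    return target / (1 + gain_pct / 100)


total = fleet_cost(ALOHA2_UNIT_COST, NUM_ALOHA2)
principal = required_principal(total, NVDA_GAIN_PCT)
print(f"Fleet cost: ${total:,}")                 # Fleet cost: $243,000
print(f"Required principal: ${principal:,.0f}")  # Required principal: $12,776
```

About $12.8k, which rounds to the tweet's "about $13k".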

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Pete Florence

@peteflorence

10 Mar 2023

Here is the link for the blog post:

Google AI

@GoogleAI

10 Mar 2023

Today we share PaLM-E, a generalist, embodied language model for robotics. The largest instantiation, 562 billion parameters, is also a state-of-the-art visual-language model, has PaLM’s language skills, and can be successfully applied across robot types →goo.gle/3JsszmK

Pete Florence · Jan 19, 2024 · 11:01 PM UTC

Pete Florence

@peteflorence

19 Jan 2024

Replying to @peteflorence @chris_j_paxton

That talk is public now! piped.video/aeCZ7DY8KHw?si=86Xh…

Robot Neonatology - talk at CoRL 2023 Workshop (NeuRL-RMW)

Sorry for the audio/video quality dropping in a couple spots! I cu...

youtube.com

Pete Florence · Jan 12, 2022 · 4:46 PM UTC

Pete Florence

@peteflorence

12 Jan 2022

Has anyone figured out what the optimal first Wordle guess is? There’s no information available before the first guess, so the optimal first guess should be the same every time.

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Pete Florence

@peteflorence

10 Mar 2023

For this multi-image reasoning, since PaLM-E flexibly supports multimodal sentences, it can answer questions about specific relationships between images. While the previous example was a “what matches?” question, this one is a “what’s different?” question.

Pete Florence · Apr 7, 2022 · 4:13 PM UTC

Pete Florence

@peteflorence

7 Apr 2022

But certainly "learning" / "meta-learning" the form of the interaction itself seems possible.

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Pete Florence

@peteflorence

10 Mar 2023

And I want to close with a Haiku. Prompt in gray by @brian_ichter, and the completion written by PaLM-E-562B:

Pete Florence · Jan 23, 2022 · 9:51 PM UTC

Pete Florence

@peteflorence

23 Jan 2022

Replying to @MikkoMononen

Cool. A couple of references you might find interesting, to broaden your rabbit hole :)
1. groups.csail.mit.edu/robotic… (says it’s about UAVs, but I think you’ll see it could be applied to any motion planning really.)
2. arxiv.org/pdf/2101.11565.pdf (also a talk on YouTube: piped.video/wciDaoNSwwk)

Pete Florence · Mar 10, 2023 · 6:15 PM UTC

Pete Florence

@peteflorence

10 Mar 2023

Interesting to look back at that interview now – finishing out the results of PaLM-E has definitely shifted my perspective! (btw, thanks @gradientpub + @andrey_kurenkov for having me on!)

Pete Florence · Sep 19, 2019 · 4:59 PM UTC

Pete Florence

@peteflorence

19 Sep 2019

OpenAI’s “Dactyl” work is one of the most discussed recent results in robotics, but AFAIK it was an arXiv-only submission. PDF: arxiv.org/abs/1808.00177 Blog: openai.com/blog/learning-dex…

Learning Dexterous In-Hand Manipulation

We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is...

arxiv.org

Pete Florence · Jul 28, 2023 · 5:23 PM UTC

Pete Florence

@peteflorence

28 Jul 2023

This morning's Hard Fork podcast is a nice intro to RT-2. @hausman_k thread for more links nitter.app/hausman_k/status/16849… Also see @haqhuy thread on "Scaling up and Distilling Down" nitter.app/haqhuy/status/16849671… And background on Moravec's paradox by @chelseabfinn: piped.video/watch?v=raHM3k-u…

Huy Ha @haqhuy

28 Jul 2023

How can we put robotics on the same scaling trend as large language models while not compromising on rich low-level manipulation and control?

Pete Florence · Aug 2, 2023 · 11:07 PM UTC

Pete Florence

@peteflorence

2 Aug 2023

Model size is certainly not everything, but I think the comparisons are notable. RoboCat/Gato are pretty large, so are MVP/VC-1 (both ViT-L). RT-2 is a considerable step up. Everything else is pretty small. "Largest over time" view, log scale:

Pete Florence · Feb 8, 2024 · 3:38 AM UTC

Pete Florence

@peteflorence

8 Feb 2024

V curious to see this hand in action! @ericjang11 great show btw piped.video/X7HmltUWXgs?si=Z6pD…

Rick and Morty - You pass Butter

What is my purpose?You pass butter.

youtube.com

Bernt Bornich

@BerntBornich

7 Feb 2024

NEO just picked its first cup, excited to finally share some hands-on details🧵

Pete Florence · Jan 16, 2024 · 1:32 AM UTC

Pete Florence

@peteflorence

16 Jan 2024

Great task to show off the new Tesla hardware, very nice, looking forward to more! 👏@julianibarz @aelluswamy

Elon Musk

@elonmusk

15 Jan 2024

Optimus folds a shirt