Prof, EECS, UC Berkeley. VP & Distinguished Scientist, Amazon. people.eecs.berkeley.edu/~ma…

Angjoo Kanazawa @akanazawa and I taught CS 280, graduate computer vision, this semester at UC Berkeley. We found a combination of classical and modern CV material that worked well, and are happy to share our lecture material from the class. cs280-berkeley.github.io/ Enjoy!
6
98
748
65,098
Our robot dog can go up and down stairs, walk on stepping stones where even a single bad foot placement would lead to a disastrous fall, and traverse rough terrain. All with just a single onboard RGBD camera & no maps. arxiv.org/pdf/2211.07638.pdf
12
64
583
We can now reconstruct 3D humans from single images (forthcoming, ICCV 2023). Model available to try on your images / videos at shubham-goel.github.io/4dhum…
5
43
339
105,203
I delivered the 110th Annual Martin Meyerson UC Berkeley Faculty Research Lecture on March 20, 2023. piped.video/watch?v=f6fDpKDx…
10
25
264
94,308
Happy to present LVM (Large Vision Model). Scalable and tasks can be specified via prompts. Enjoy!
1
29
224
59,330
We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. Check out our robot walking in San Francisco (Ilija Radosavovic et al) humanoid-next-token-predicti…
1
25
190
36,650
Enjoy watching a humanoid walking around UC Berkeley. It only looks inebriated :-)
Our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy (w/ @redstone_hong @junyi42 @davidrmcall)
1
3
93
9,596
Autoregressive modeling is not just for language; it can equally be used to model human behavior. This paper shows how.
Replying to @vonekels
Please see the website for more details. synNsync🪩is joint work with my awesome ✨co-authors✨: @LeaMue27 @jathushan @geopavlakos @shiryginosar @akanazawa @JitendraMalikCV Website🖥️: von31.github.io/synNsync/ Data💾: github.com/Von31/swing_dance… Arxiv📜: arxiv.org/abs/2409.04440 🧵6/6
3
85
13,283
I am honored to be accepted into the robotics community. Many of the central problems on the way to "AGI" lie in the space of robotics or "embodied AI" & there is much that researchers with backgrounds in computer vision and ML can offer. We are part of the same family!
1
3
74
7,230
Happy to share these exciting new results on video synthesis of humans in movement. Arguably, these establish the power of having explicit 3D representations. Popular video generation models like Sora don't do that, making it hard for the resulting video to be 4D consistent.
I’ve dreamt of creating a tool that could animate anyone with any motion from just ONE image… and now it’s a reality! 🎉 Super excited to introduce updated 3DHM: Synthesizing Moving People with 3D Control. 🕺💃3DHM can generate human videos from a single real or synthetic human image. #Animation #GenAI #AI #3DHM ✨ The magic of 3D control? Turning 2D pixels into lifelike, animated humans. 🎥 Check out our demo (and Merry Christmas)! piped.video/watch?v=obABCBP-… Paper: arxiv.org/abs/2401.10889 Github: github.com/Boyiliee/3DHM Webpage: boyiliee.github.io/3DHM.gith… Proudly working with the great @JunmingChenleo, @jathushan, @YGandelsman, Alyosha Efros and @JitendraMalikCV😃 Kindly note: This video is intended solely for research purposes and is not authorized for commercial use.
8
72
12,119
Another success of sim-to-real for training robot policies! This task, using two multi-fingered hands, requires considerable dexterity, and is hopefully representative of other household tasks that we wish to solve in the future.
Achieving bimanual dexterity with RL + Sim2Real! toruowo.github.io/bimanual-t… TLDR - We train two robot hands to twist bottle lids using deep RL followed by sim-to-real. A single policy trained with simple simulated bottles can generalize to drastically different real-world objects.
5
63
14,596
Again the power of tactile sensing and multi-finger hands comes through. This is the future of dexterous manipulation!
🤖 What if a humanoid robot could make a hamburger from raw ingredients—all the way to your plate? 🔥 Excited to announce ViTacFormer: our new pipeline for next-level dexterous manipulation with active vision + high-resolution touch. 🎯 For the first time ever, we demonstrate ~2.5 minutes of continuous, autonomous control—combining active vision, high-res touch, and high-DoF robot hands SharpaWave — to complete complex, real-world tasks. Code is fully released; check out our: Homepage: roboverseorg.github.io/ViTac… Paper link: arxiv.org/abs/2506.15953 Github: github.com/RoboVerseOrg/ViTa…
1
8
74
9,679
Touché, Sergey!
Lots of memorable quotes from @JitendraMalikCV at CoRL, the most significant one of course is: “I believe that Physical Intelligence is essential to AI” :) I did warn you Jitendra that out of context quotes are fair game. Some liberties taken wrt capitalization.
56
10,754
Want to make your photorealistic 3D avatar dance like your favorite actor? Check this out!
Super excited to announce our new work: Synthesizing Moving People with 3D Control (3DHM)💡 Why is 3DHM unique? With 3D Control, 3DHM can animate a 𝗿𝗮𝗻𝗱𝗼𝗺 human photo with 𝗮𝗻𝘆 poses in a 𝟯𝟲𝟬-𝗱𝗲𝗴𝗿𝗲𝗲 camera view and 𝗮𝗻𝘆 camera azimuths from 𝗮𝗻𝘆 video!
3
35
7,700
Replying to @prfsanjeevarora
Congrats, OpenAI Team! And yes, you win our bet, Sanjeev. FYI for the curious: I had bet against "Some AI model will be able to perform at IMO gold level by May 1 2026".
2
1
36
1,731
Just to be clear, this is a classic problem in computer vision. The novelty is in the high quality of the 3D reconstructions, even in the presence of occlusions and for unusual poses.
26
5,182
Frank, that was a tongue-in-cheek remark, not to be taken seriously. I do seriously believe that CV people should get into robotics because the perception-action loop provides meaning to visual processing. Else we might be wasting time on artificial problems.
1
2
21
1,163
Replying to @ToruO_O
There have been impressive recent results using tele-op trajectories for training robot policies. These are typically for parallel-jaw grippers; work led by @ToruO_O in our lab at BAIR shows that one can now do this for multifingered hands with vision and touch.
1
15
2,458
Replying to @deedydas
Having 3D representations of the humans may be the way to fix this problem. Check out
I’ve dreamt of creating a tool that could animate anyone with any motion from just ONE image… and now it’s a reality! 🎉 Super excited to introduce updated 3DHM: Synthesizing Moving People with 3D Control. 🕺💃3DHM can generate human videos from a single real or synthetic human image. #Animation #GenAI #AI #3DHM ✨ The magic of 3D control? Turning 2D pixels into lifelike, animated humans. 🎥 Check out our demo (and Merry Christmas)! piped.video/watch?v=obABCBP-… Paper: arxiv.org/abs/2401.10889 Github: github.com/Boyiliee/3DHM Webpage: boyiliee.github.io/3DHM.gith… Proudly working with the great @JunmingChenleo, @jathushan, @YGandelsman, Alyosha Efros and @JitendraMalikCV😃 Kindly note: This video is intended solely for research purposes and is not authorized for commercial use.
1
10
1,481
Replying to @svlevine
Touché, Sergey!
11
1,824
Exciting results on the ability to 3dfy hands and objects given only a single image. This could pave the way for training robots to perform manipulation tasks by watching Internet videos.
7
778
Replying to @ir413
Signal boosting Ilija's post. More at learning-humanoid-locomotion…
4
801