Senior Research Scientist @GoogleDeepMind, core contributor to Gemini Pretraining and Omni Post-training; Prev: PhD @CornellCIS, BS @Tsinghua_Uni

New York City
First day as a research scientist @GoogleDeepMind in NYC 😆 Looking forward to working with old friends and making new ones!
I’m on the job market! Looking for a research scientist/engineer or post-doc position. Some cool stuff I have worked on: RealFill, Diffusion Features, Magic3D, and Visual Prompt Tuning. More details at lumingtang.info. Feel free to reach out if you have any openings!
I’ll defend my PhD thesis entitled “Mining Visual Knowledge from Pre-trained Models” tomorrow (June 4th) at 10am EST. Feel free to DM me for the zoom link if you are interested!
Thanks to everyone for attending my talk yesterday! Here's the recording if you are interested: piped.video/watch?v=MDhNiPwI…
First time attending @siggraph and excited to present our work RealFill on Thursday morning! Also first time attending a conference with crutches and a walking boot 😂 looking forward to meeting more friends!
Finally arrived in Seattle 😂 Excited to meet old friends and make new ones at #CVPR2024 Hope I’m not too late to the party 🥲
Heading to #NeurIPS2023 now and gonna present our work “Emergent Correspondence from Image Diffusion” on Tuesday morning. Excited to meet old friends and make new friends! I’m also looking for full-time jobs. Feel free to DM me if you’re interested in meeting up!
We'll present RealFill at the Controllable Image Generation session in Mile High 1 at 9:40 tomorrow (Thursday) morning. Come to our talk to learn more about reference-based image completion! #SIGGRAPH2024
RealFill: Reference-Driven Generation for Authentic Image Completion paper page: huggingface.co/papers/2309.1… Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions, but the content these models hallucinate is necessarily inauthentic, since the models lack sufficient context about the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin.
Hi @CVPR, I reviewed 2 papers this year, but both were withdrawn during the rebuttal stage without posting any author feedback, and the entries also disappeared from the CMT portal. Just curious: in this case, will I still be considered for the top reviewers list?🤔
Really cool work on city-level 3D generation using only google street-view data!
Thought about generating realistic 3D urban neighbourhoods from maps, dawn to dusk, rain or shine? Putting heavy snow on the streets of Barcelona? Or making Paris look like NYC? We built a Streetscapes system that does all these. See boyangdeng.com/streetscapes. (Showreel w/ 🔊 ↓)
Truly mindblowing quality and speed!🤯 Still cannot believe it has been barely over a whole year since the release of DreamFusion, and the field is moving soooo fast!
Introducing Tripo, the advanced 3D Foundation Model. 👣 Tripo can generate textured 3D mesh models in 8 seconds. Moreover, refinement only takes 5 minutes, giving you 3D models that rival the quality of handcrafted ones with an impressive 95% success rate and beyond. 🚀
thanks a lot to the great organizers! really glad to meet all the vision folks in NYC and hear the wonderful talks!
Organizing the first NYC vision workshop was super fun! Shout out to other organizers @elliottszwu @Haian_Jin and especially @Jimantha for the generous support!
Really cool work on scene generation conditioned on a single image!
Can generative AI imagine what Alice saw in her journey in the Wonderland 🏞️🚶‍♀️? Introducing WonderJourney: Create a journey (a long sequence of diverse yet connected 3D scenes) from a single image or text! 🧵1/N Web: kovenyu.com/wonderjourney/ arxiv: arxiv.org/abs/2312.03884
Really cool work on rethinking VLMs’ image classification capabilities!
🧐Can VLMs recognize the species of the mushroom you're eating? Probably not. Our latest work shows that current VLMs struggle with the most basic image classification task. To understand why, we investigate six hypotheses regarding VLMs' training, inference, and data, and find that data determines the VLM's classification performance. Based on the findings, we enhance a VLM by integrating classification data. So the VLM can accurately classify the mushroom and know whether it's poisonous. arxiv.org/abs/2405.18415 🧵
Really cool work on diffusion guidance!
Guidance on top of diffusion models can now be used to drag and manipulate images, create pose-conditioned images, and so much more! Check out Readout Guidance: readout-guidance.github.io Work w/ @trevordarrell, @oliver_wang2, @danbgoldman, @holynski_. More in thread 🧵.
amazing work! the shaking rose looks so real!
Excited to share our work on Generative Image Dynamics! We learn a generative image-space prior for scene dynamics, which can turn a still photo into a seamless looping video or let you interact with objects in the picture. Check out the interactive demo: generative-dynamics.github.i…
This is so cool!
Have you noticed colored glints on human hair and animal fur under sunlight? At #SIGGRAPH2023 next week, we will present our new wave optics based fiber model (with Bruce Walter, Christophe Hery, Olivier Maury, Eric Michielssen, and Steve Marschner).😃mandyxmq.github.io/research/…
Really cool work on applying 2D diffusion features to 3D part segmentation!
#ECCV2024 Large 2D visual foundation models like DINOv2 and Stable Diffusion have shown impressive capabilities in understanding object semantics. How can we leverage these features for 3D object part segmentation? Paper: arxiv.org/abs/2407.09648
Super cool work! Finally we have a scalable, human-aligned way to evaluate and benchmark text-to-3D models!
Looking for a way to evaluate your text-to-3D model? We found that GPT-4V can be prompted to be a human-aligned and versatile 3D evaluator! Arxiv: arxiv.org/abs/2401.04092 Code: github.com/3DTopia/GPTEval3D Page: gpteval3d.github.io/
It’s me when trying to find a summer lease in the Bay Area
this looks so cool!
4K4D: Real-Time 4D View Synthesis at 4K Resolution paper page: huggingface.co/papers/2310.1… This paper targets high-fidelity and real-time view synthesis of dynamic 3D scenes at 4K resolution. Recently, some methods on dynamic view synthesis have shown impressive rendering quality. However, their speed is still limited when rendering high-resolution images. To overcome this problem, we propose 4K4D, a 4D point cloud representation that supports hardware rasterization and enables unprecedented rendering speed. Our representation is built on a 4D feature grid so that the points are naturally regularized and can be robustly optimized. In addition, we design a novel hybrid appearance model that significantly boosts the rendering quality while preserving efficiency. Moreover, we develop a differentiable depth peeling algorithm to effectively learn the proposed model from RGB videos. Experiments show that our representation can be rendered at over 400 FPS on the DNA-Rendering dataset at 1080p resolution and 80 FPS on the ENeRF-Outdoor dataset at 4K resolution using an RTX 4090 GPU, which is 30x faster than previous methods and achieves the state-of-the-art rendering quality. We will release the code for reproducibility.
Cannot wait for the flight to CVPR tmr morning 😝 #CVPR2022
Really cool work on using simple corresponding points as visual prompts to improve MLLMs' 3D space-time understanding capabilities!
🌟Introducing 𝐂𝐨𝐚𝐫𝐬𝐞 𝐂𝐨𝐫𝐫𝐞𝐬𝐩𝐨𝐧𝐝𝐞𝐧𝐜𝐞 (coarse-correspondence.github…), a 𝘀𝗶𝗺𝗽𝗹𝗲, 𝗴𝗲𝗻𝗲𝗿𝗮𝗹, 𝗮𝗻𝗱 𝗲𝗳𝗳𝗲𝗰𝘁𝗶𝘃𝗲 visual prompting method that elicits multimodal LLMs’ 𝐮𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠 𝐨𝐟 𝟑𝐃 𝐬𝐩𝐚𝐜𝐞𝐭𝐢𝐦𝐞! We believe AI’s understanding of the world should also be a joint understanding of 3D space and time, achieving the Spatial Intelligence @drfeifei envisions. For the first time, we demonstrate that 𝐚 𝐠𝐞𝐧𝐞𝐫𝐚𝐥-𝐩𝐮𝐫𝐩𝐨𝐬𝐞 𝐌𝐋𝐋𝐌 𝐟𝐨𝐫 𝟐𝐃 𝐢𝐦𝐚𝐠𝐞𝐬 can also develop a strong understanding of 𝟑𝐃 𝐬𝐜𝐞𝐧𝐞𝐬 𝐚𝐧𝐝 𝐥𝐨𝐧𝐠 𝐯𝐢𝐝𝐞𝐨𝐬, achieving 𝐒𝐎𝐓𝐀 results without task-specific model design or fine-tuning. And all this stems from the traditional wisdom of computer vision: correspondence. Details below🧵: (1/n)
This looks so cool! 🤩
Introducing “FlowMap”, the first self-supervised, differentiable structure-from-motion method that is competitive with conventional SfM like Colmap! cameronosmith.github.io/flow… IMO this solves a major missing piece for internet-scale training of 3D Deep Learning methods. 1/n
Replying to @unixpickle
DreamSim is a good option with synthetic data: dreamsim-nights.github.io/
really amazing work! 🤯🤯 huge congrats! @ChrisWu6080
Excited to share ReconFusion! 3D reconstruction of real-world scenes from only a few photos, powered by diffusion priors: reconfusion.github.io w/ amazing team @ChrisWu6080 @BenMildenhall @philipphenzler @KeunhongP @RuiqiGao @watson_nn @_pratul_ @dorverbin @jon_barron @poolio
Really cool work on training-free personalized generation!
Today, with collaborators at Google and UT Austin, we're announcing 🤖 RB-Modulation 🤖! It's a whole new training-free framework for conditioning on reference images (for style or subject) without adapters (!) with an elegant formulation 🔥 web: rb-modulation.github.io/
really cool work!
Happy to announce StyleAligned – our new work from @GoogleAI: Style Aligned Image Generation via Shared Attention 📜 arXiv: arxiv.org/abs/2312.02133 👀 project page: style-aligned-gen.github.io/ (with quiz game!) 💻 code: github.com/google/style-alig… Details below! ⬇️
Replying to @ftm_guney @ftmguney
sounds interesting! but the math in Max's notes on volumetric rendering looks reasonable to me: courses.cs.duke.edu/spring03…. Could u please elaborate on why u think there shouldn't be an alpha multiplier in the integration?
Replying to @chrisoffner3d
thanks a lot for the nice words, and really glad to know you enjoy our work! your recent findings on diffusion attentions also look cool! and I'll definitely dig more into it haha. the arxiv paper you mentioned is also a great work!
This looks so cool!
🌎 𝕤𝕒𝕪 𝕙𝕖𝕝𝕝𝕠 𝕥𝕠 𝕧𝕚𝕣𝕝 🌏 virl-platform.github.io
This looks so cool!
With collaborators @Google we're announcing 💫 ZipLora 💫! Merging LoRAs has been a big thing in the community, but tuning can be an onerous process. ZipLora allows us to easily combine any subject LoRA with any style LoRA! Easy to reimplement 🥳 link: ziplora.github.io/
Huge congrats Prof. Cai now! 🎉 @CaiQizhe
I am SO proud of @CaiQizhe, my second PhD student to graduate! Qizhe will soon start at @CS_UVA as an assistant professor. Qizhe's thesis explores an extremely important and hard problem---is it possible to enable software-based network stacks...
Amazing work!
Excited to announce 🛠️Neuralangelo🛠️! Neuralangelo turns images into 3D meshes with extremely high fidelity and scales up in large environments. Come and chat with us during #CVPR2023, or check out our project page: research.nvidia.com/labs/dir… More details in thread below (1/4)
Super cool work! @tianyuanzhang99
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant
Super cool work! MegaDepth -> MegaScenes
Introducing MegaScenes—a scene-level dataset containing 100K SfM reconstructions and 2M images with open content licenses. We validate its effectiveness in training large-scale, generalizable models on the task of novel view synthesis. 1/N project page: megascenes.github.io
A message from the past?🧐@Twitter @TwitterSupport
This is so cool!
Proud to present threestudio: A unified framework for 3D content generation! With the recently released powerful T2I model DeepFloyd IF, we can finally reproduce the stunning results in the DreamFusion and Magic3D papers 🥳🥳🥳 Try it out now 👉github.com/threestudio-proje… #3D #AIGC
This looks so cool!
Given a single 3D asset, can we generate its variations without relying on prior knowledge? Introducing Sin3DM ✨, a diffusion model that learns from a single 3D asset and generates high-quality variations with fine geometry and texture details. sin3dm.github.io/ [1/4]
This is so cool! Even my own eyes can hardly tell the wrong pairs apart 👁️
Check out our #ICCV2023 paper called Doppelgangers. We train a classifier to detect distinct but visually similar image pairs ("doppelgangers") and apply it to SfM disambiguation, enabling COLMAP to create correct 3D models in hard cases. Project page: doppelgangers-3d.github.io/
Super cool work!
Today, with collaborators at @Google , we're excited to announce 🥳🥳HyperDreamBooth🥳 🥳! It's like DreamBooth, but smaller, faster and better. 25x faster. Think of 30 minutes vs. 14 hours for 100 models. And works on a single image! (Thread 👇) webpage: hyperdreambooth.github.io
this is so cool! huge congrats! @mli0603
We’re honored @TIME recognized #NVIDIAResearch's Neuralangelo as one of the 🏆 Best Inventions of 2023. We are excited to see how our community will explore this technology to build 3D renders and scenes. ➡️nvda.ws/3rWT4ef
after dropping the learning rate, my model finally beats the baseline!
really interesting test!
Hey Gemini, is this a pigeon?
Replying to @YiifeiWang18
I got rejected after leetcode coding interview, if this could make u feel better lol
Cool work on separating 3D objects in a scene!
Excited to share our work, ObjectCarver! Given multiview images and click points on one image, ObjectCarver decomposes scenes into separate objects, providing high-quality 3D surfaces while handling occlusion and close-contact objects. (1/6) website: objectcarver.github.io
Cannot wait to try this on my cat!
Today, along with collaborators at @GoogleAI, we’re excited to announce StyleDrop! It allows a user to generate new images that follow a specific style of their choice given only a single style reference image 🤯 (Thread 👇) webpage: styledrop.github.io/
Super cool work! @stevenygd
Huge fan of this piece of research (i.e. wish it was mine). Time to bring geometry processing to the new era! And _plenty_ of ways to extend this paper... "Geometry Processing with Neural Fields" vladlen.info/papers/neural-f…
Really cool work on reference-based object insertion. Hope we could have this feature in more image editing software!
With friends at @Google we announce 💜 Magic Insert 💜 - a generative AI method that allows you to drag-and-drop a subject into an image with a vastly different style achieving a style-harmonized and realistic insertion of the subject (Thread 🧵) web: magicinsert.github.io/
wow huge congrats! @Songwei_Ge looking forward to more of your great works!
Congrats!! Songwei @Songwei_Ge for winning the NVIDIA Graduate Fellowship! 🥳🎉 Very proud of your achievements! Appreciate NVIDIA for the support and recognition! blogs.nvidia.com/blog/gradua…
amazing!
IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric Images abs: arxiv.org/abs/2204.02232 project page: kai-46.github.io/IRON-websit…
Replying to @natanielruizg
Thank you for all the help man! And looking forward to future collaborations!
Replying to @yongyuanxi
wow Towaki, long time no see! thanks a lot for your nice words! It was a real pleasure to work with you last year 😇 (cannot believe it has been one year already lol). looking forward to seeing more of your exciting works in the near future!
This is sooo coool!
We now release BIMT codes and welcome you to train your own modular & interpretable networks, like growing a "brain"! This thread 🧵will cover the basic idea & beyond (1/N) 📃Paper: arxiv.org/abs/2305.08746 📷Code: github.com/KindXiaoming/BIMT 🔗Demo:colab.research.google.com/dr…
Thanks for suggesting the good example pair! I just downloaded two random images with different pose from unsplash and gave it a try. Here's the video. (Feel free to play with the interactive demo, it's a lot of fun! github.com/Tsingularity/dift…)
Very cool work! Adding a "siu" at the end could be more fun 😀
I can’t help but feel bad for the little guy
this looks so cool!
AI will enable us to film a scene and completely replace the scenery and characters in it. How? DynVideo-E utilizes dynamic NeRFs to edit human-centric videos in 3D space and propagate the changes to the entire video. The 👏 results 👏 look 👏 stunning. showlab.github.io/DynVideo-E
the trophy looks really cool 😮 @SkyLi0n
Cornell Tech researcher and PhD student Aaron Gokaslan has received the @PyTorch Award for 2023: bit.ly/3N2cOUM #PyTorch #Code #AI #Research #PhD #CornellTech #OnlyAtCornellTech #EngineeredToMatter
Replying to @hexiang
Thanks Frank! Looking forward to working with you at GDM in the near future!
this is so cool! huge congrats! @yuchengthekid
We are excited to release RedPajama-Data-v2: 30 trillion filtered & de-duplicated tokens from 84 CommonCrawl dumps, 25x larger than our first dataset. It exposes a diverse range of quality annotations so you can slice & weight the data for LLM training. together.ai/blog/redpajama-d…
Really cool work! @NathanYan2012
As with LMs, modern Diffusion models rely heavily on Attention. This improves quality but requires patching to scale. Working with Apple, we designed a model without attention that matches top imagenet accuracy and removes this resolution bottleneck. arxiv.org/abs/2311.18257
@yuchengthekid looks so cool!
Amazing team 🤘🏽
Replying to @RchalYang
What a goal! 😍
Replying to @haotiant1998
Thanks Haotian!
Replying to @natanielruizg
Thank u so much man!
Replying to @vincent_jh_cho
Thanks a lot Vincent! It's been quite a while, so let's catch up in the Bay Area this summer!
Just curious: when exactly is the deadline for the NeurIPS rebuttal? The previous email said Aug 9th, 4 pm EDT, while OpenReview and the NeurIPS website say Aug 9th, 10 pm PST. Any clarification would be appreciated, thanks! @NeurIPSConf
Replying to @YuanqiD
Hi Yuanqi, thank you so much for attending as well as the nice words. Really glad to know u enjoyed my talk. It means a lot to me!
How Haaland becomes Haaland 😂
In Norway, during the winter, parents leave their babies to sleep outside
Replying to @ruoshi_liu
Thank you so much, Ruoshi!
Replying to @zhenjun_zhao
Thank you so much. Please continue sharing good papers on X. I really love it!
Replying to @nanliuuu
Thanks man! Let's catch up more in NYC!
Replying to @thuanz123
Thanks Thuan! Looking forward to meeting you in-person at future conferences!
Replying to @Haoxiang__Wang
Thank u, Haoxiang! Wish you all the best at NVIDIA!
Replying to @sstj389
thanks for being the second AK on my twitter, bro!😉
Stuck on NJ Transit for over 2 hrs, just curious: how can I get a refund for my ticket and wasted time? @NJTRANSIT It's funny that the last time I took NJ Transit, it was also delayed over 2 hrs and made me miss the Euro Cup final last year 😅
Replying to @xiaolonw
Big Congrats!🥳
Replying to @esx2ve
thanks a lot for reaching out! and just DMed
super cool work!
Check out @xiaojuan_wang7's new project! 🔎Generative Powers of Ten🔍 Use a pre-trained text-to-image model to generate deeeeep zoom videos! (Excuse Twitter's terrible compression, check the webpage instead: powers-of-10.github.io/)
Replying to @JingxiangSun42
Thanks Jingxiang! And looking forward to more of ur great works on 3D!
Replying to @_vztu
Thank you so much, Prof. Tu!
Replying to @hila_chefer
Thanks Hila! Let’s catch up more while u r at google!
Replying to @kusichan
Thanks Andrey! Looking forward to chatting with you at Google again in the near future!
Looks so cool! Just curious: is there a live stream or recording of the talks? Thanks!
Replying to @mli0603
amazing profile photo 😆
Replying to @zizhang_li
Thanks Zizhang! Wish you all the best at Stanford!
Replying to @natanielruizg
Thank u so much man!
Replying to @LiuBenlin
Thanks man! 😆
Super cool!
Introducing GANcraft, a method to convert user-created semantic 3D block worlds, like those from Minecraft, to realistic-looking worlds, without paired training data! arxiv: arxiv.org/abs/2104.07659 webpage: nvlabs.github.io/GANcraft/ by @zekunhao19951, @SergeBelongie, @liu_mingyu
Replying to @LiuBenlin
Haha really excited to see all the cool stuff happening in the field especially the ones from my friends!
Replying to @shrutirij
Thanks Shruti! And let's catch up more in NYC!
@Thereisnocat_ 😼 there is no cat in this image