Senior Staff Research Scientist / Manager @ Google DeepMind

Mountain View, CA
Our team at Google DeepMind Foundational Research is hiring full-time Research Scientists and Research Interns! Multimodal, Reasoning, self-improving agents, Video Understanding. Looking for candidates with strong papers at top ML and CV conferences. Email: af_hiring@google.com
13
63
614
61,548
Our team at Google DeepMind is seeking a Research Scientist with a strong publication record (multiple first-author papers) on multi-modal LLMs in top ML venues like NeurIPS, ICLR, CVPR. Email me at af_hiring@google.com @CordeliaSchmid
4
46
377
53,311
✨ Our team at Google DeepMind is hiring Research Interns (Summer 2025)! Multimodal, text-to-3D, Personalized LLMs, Video Understanding and Generation. Looking for candidates with multiple first-author papers in top ML conferences. Email: af_hiring@google.com @CordeliaSchmid
4
40
351
43,787
Our team at Google DeepMind Foundational Research has an opening for a full-time Research Scientist! Areas of Interest are Multimodal, 3D and Spatial Reasoning, Self-improving Agents. Looking for candidates with strong publications at top ML and CV conferences. Email: af_hiring@google.com
2
28
350
37,627
Robotics at Google has released a very high-quality dataset of scanned objects. It could enable interesting research in 3D shape modeling. app.ignitionrobotics.org/Goo…
2
78
294
Jitendra Malik's thoughts on Foundation Models, in the Stanford HAI workshop piped.video/watch?v=dG628PEN…
3
38
177
We have released TensorFlow 3D!
Announcing the release of TensorFlow 3D, a set of training and evaluation pipelines for state-of-the-art 3D semantic segmentation, object detection and instance segmentation, with support for distributed training. Check it out and download the code at goo.gle/3pchcSG
2
19
75
Most previous work on 3D object detection uses only one frame of data. In our #eccv2020 paper, we present a 3D sparse LSTM model that achieves more accurate results when applied to a sequence of point clouds. arxiv.org/abs/2007.12392
5
30
Our recent work on object-centric neural rendering. Our new formulation makes it possible to move the objects around in the scene and still be able to render high quality images from different views.
We made NeRF compositional! By learning object-centric neural scattering functions (OSFs), we can now compose dynamic scenes from captured images of objects. Website: shellguo.com/osf Joint work with @alirezafathi @jiajunwu_cs Thomas Funkhouser
1
2
29
I am glad that our #cvpr2020 reviews are very positive, but at the same time I am very worried that the quality of the reviews has significantly degraded compared to a few years ago.
1
25
Congratulations to Yue Wang (research intern), Rui Huang (AI resident), Wanyue Zhang (AI resident) and @_abhijit_kundu_ for getting their papers accepted to #eccv2020.
1
24
Today marks my 7th year at Google! How time flies! Thank you, Google, for giving me the opportunity to work on what I enjoy...
4
20
2,767
Tesla’s event did a great job showing how far ahead Waymo is compared to everyone else!
1
20
1,745
Here is our Google AI blog post on AVIS, a Large Language Model Agent that achieves state-of-the-art results on visual information seeking tasks. @acbuller @ahmetius @jesu9 @CordeliaSchmid
Today on the blog, read all about AVIS — Autonomous Visual Information Seeking with Large Language Models — a novel method that iteratively employs a planner and reasoner to achieve state-of-the-art results on visual information seeking tasks → goo.gle/3P2y2mY
3
17
5,720
Our ECCV paper, "Pillar-based Object Detection for Autonomous Driving", achieves state-of-the-art results on 3D object detection on the Waymo Open Dataset. arxiv.org/abs/2007.10323
2
15
REVEAL will be a highlight at @CVPR. Looking forward to discussing it in more detail there with @acbuller, @ahmetius, @jesu9, @CordeliaSchmid
Learn how REVEAL, an end-to-end retrieval-augmented visual-language model that learns to use multi-source multi-modal data to answer knowledge-intensive queries, achieves state-of-the-art results on visual question answering and image caption tasks. goo.gle/3qcZwwc
1
2
16
2,166
Another CVPR2020 paper by our group on detecting 3d objects and predicting their 3d shapes arxiv.org/abs/2004.01170
12
Neural Networks seem to follow a puzzlingly simple strategy to classify images medium.com/bethgelab/neural-…
2
12
We are gonna be able to go back to the office starting July 12th! Never thought I would be this excited to go back to work in person :)
1
12
Having to shelter in place, I have been spending some time on gardening! Here is what our sour cherry tree looks like today!
12
Looking forward to presenting our work on 3d scene understanding in the Deep Learning 2.0 Virtual Summit.
I am looking forward to Alireza Fathi presenting his research advancements at the Deep Learning 2.0 Virtual Summit, Jan 2021. Alireza is currently working on object detection and segmentation in 3D. Join us, and Alireza in January: re-work.co/summits/deep-lear… #computervision
12
That is why you need Lidar! It is not enough to detect the event eventually! Detecting such an event even a hundred milliseconds late will result in a catastrophic crash!
Kids chasing dogs, chasing balls on the streets of LA… once again, @Waymo AI with advanced sensing making our roads safer.
1
12
1,258
Great work Francis Engelman! Our CVPR 2020 paper achieves state-of-the-art results on 3D instance segmentation in ScanNet and S3DIS :) arxiv.org/abs/2003.13867
"3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation" #CVPR2020 piped.video/ifL8yTbRFDk We perform SemInstSeg by proposal aggregation using a GraphConvNet to model higher-order proposal interactions! Great results on ScanNet and S3DIS :) @FrancisEngelman
11
Vote for CVPR 2023 in Vancouver if you are at #CVPR2019
It’s hard to think of a better place than #Vancouver for #CVPR 2023. Beyond our strong team, it’s fitting that a conference on vision should take place in one of the most beautiful spots on earth. Check out our awesome bid #AINorth #AI #computervision cs.sfu.ca/~mori/cvpr2023_van…
3
11
I am sorry to see colleagues and friends getting affected by mass layoffs in recent days. Please reach out and I will try my best to help with any resources I can think of. Hopefully things will bounce back soon.
10
One of the sad things during this pandemic is to observe the ugly gap between the rich and the poor. While the rich stay home and order groceries online to avoid exposure, the poor shop for those groceries in stores and deliver them to make a living.
1
10
Check out our CVPR paper on generative retrieval for web-scale entity recognition!
Happy to introduce GERALD - our new VLM that recognizes 6M+ entities, an exciting step towards Web-scale visual entity recognition! Predictions are simply made by auto-regressively decoding a code representing the entity name. Check out our CVPR24 paper: arxiv.org/abs/2403.02041
4
9
920
Our new #Neurips2024 paper explores the power of multimodal LLMs for building better datasets. We demonstrate significant improvements on visual entity recognition with a novel approach to label verification and data enrichment.
Our new #NeurIPS2024 paper tackles web-scale visual entity recognition by automatically curating a training dataset with a multimodal LLM, achieving SOTA results (+6.9% on OVEN)! Learn how we use multimodal LLMs for label verification and data enrichment: arxiv.org/abs/2410.23676
1
10
1,311
🚀Introducing AVIS: a groundbreaking system that couples #LLM powered planning & reasoning with external tools, resulting in #StateOfTheArt performance on VQA datasets that demand external knowledge! 🧠🔍
AVIS: Autonomous Visual Information Seeking with Large Language Models paper page: huggingface.co/papers/2306.0… In this paper, we propose an autonomous information seeking visual question answering framework, AVIS. Our method leverages a Large Language Model (LLM) to dynamically strategize the utilization of external tools and to investigate their outputs, thereby acquiring the indispensable knowledge needed to provide answers to the posed questions. Responding to visual questions that necessitate external knowledge, such as "What event is commemorated by the building depicted in this image?", is a complex task. This task presents a combinatorial search space that demands a sequence of actions, including invoking APIs, analyzing their responses, and making informed decisions. We conduct a user study to collect a variety of instances of human decision-making when faced with this task. This data is then used to design a system comprised of three components: an LLM-powered planner that dynamically determines which tool to use next, an LLM-powered reasoner that analyzes and extracts key information from the tool outputs, and a working memory component that retains the acquired information throughout the process. The collected user behavior serves as a guide for our system in two key ways. First, we create a transition graph by analyzing the sequence of decisions made by users. This graph delineates distinct states and confines the set of actions available at each state. Second, we use examples of user decision-making to provide our LLM-powered planner and reasoner with relevant contextual instances, enhancing their capacity to make informed decisions. We show that AVIS achieves state-of-the-art results on knowledge-intensive visual question answering benchmarks such as Infoseek and OK-VQA.
3
10
1,891
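The planner / reasoner / working-memory loop described in the AVIS abstract above can be sketched roughly as follows. This is only an illustrative toy, not the authors' implementation: the transition graph, the tool names (`image_search`, `object_detect`, `web_search`), and the rule-based `plan` and `reason` functions are all hypothetical stand-ins for the LLM-powered components and real tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical transition graph: each state maps to the actions allowed next.
# In the paper this graph is derived from user-study data; here it is made up.
TRANSITIONS: Dict[str, List[str]] = {
    "start": ["image_search", "object_detect"],
    "image_search": ["web_search", "answer"],
    "object_detect": ["image_search", "answer"],
    "web_search": ["answer"],
}


@dataclass
class WorkingMemory:
    """Retains the information acquired across steps."""
    facts: List[str] = field(default_factory=list)


def plan(state: str, memory: WorkingMemory) -> str:
    """Stand-in for the LLM planner: choose the next action from those
    the transition graph allows in the current state."""
    allowed = TRANSITIONS[state]
    if "answer" in allowed and len(memory.facts) >= 2:
        return "answer"
    return allowed[0]


def reason(tool_output: str) -> str:
    """Stand-in for the LLM reasoner: extract key information from a tool output."""
    return tool_output.strip()


def run(question: str, tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> List[str]:
    """Alternate planning, tool use, and reasoning until the planner answers."""
    memory = WorkingMemory()
    state = "start"
    for _ in range(max_steps):
        action = plan(state, memory)
        if action == "answer":
            break
        memory.facts.append(reason(tools[action](question)))
        state = action
    return memory.facts
```

The transition graph is what keeps the search space manageable in this sketch: the planner can only pick among the actions permitted from the current state, rather than from every tool at every step.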
Random thought! No company is as undervalued as Waymo! If its self-driving technology is used in 10% of the ~100M global annual car sales (recent Toyota deal) resulting in $5k per vehicle, that’s $50bn in annual revenue! And pair that with Android car OS and the rest of the Google ecosystem!
2
1
9
1,612
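Spelled out, the back-of-the-envelope estimate in the post above works out as follows. The inputs (10% share, ~100M vehicles per year, $5k per vehicle) are the post's own assumptions, not verified figures.

```latex
\[
0.10 \times 100{,}000{,}000\ \tfrac{\text{vehicles}}{\text{year}}
\times \$5{,}000\ \text{per vehicle}
= \$50 \times 10^{9}\ \text{per year}
\]
```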
We have just released instance segmentation support for the TensorFlow Object Detection API. #TensorFlow #ObjectDetection #Google #API #Segmentation #InstanceSegmentation github.com/tensorflow/models…
5
9
Google has launched its best thing for everything guide. No need for a Consumer Reports subscription anymore! shopping.google.com/m/bestth…
1
8
Sundar Pichai is now the CEO of Alphabet... blog.google/inside-google/co…
7
Something interesting that I just learned today! Are green, red, yellow and orange bell peppers different or the same? bbc.co.uk/newsround/45522834
8
Great job Steven. A network for predicting surface normals running in real time on a Pixel 2 phone @StevenDHickson @aCromulentName Kevin Murphy @irrfaan arxiv.org/abs/1906.06792
1
8
After almost a decade and billions in outside investment, Magic Leap's first product is finally on sale for $2,295. Here's what it's like. cnbc.com/2018/08/08/magic-le… #MagicLeap
1
6
An interesting blog post on using Unity for creating synthetic data for object detection and beyond blogs.unity3d.com/2020/06/10…
1
6
In this work led by @ahmetius we show that image recognition can benefit when retrieving similar images from a web-scale corpus of image-text pairs.
New #CVPR2023 paper "Improving Image Recognition by Retrieving from Web-Scale Image-Text Data". arxiv.org/abs/2304.05173 We improve the recognition capabilities of the model by retrieving images/texts from large-scale memory. Joint work with @alirezafathi and @CordeliaSchmid .
7
1,278
Here is the link if you are interested in applying for the Google Summer Research Internship :) careers.google.com/jobs/resu…
7
Great course for learning deep reinforcement learning!
Want to learn deep RL? My deep RL course now has a permanent course number (CS285) and is being offered this semester: rail.eecs.berkeley.edu/deepr… Lecture videos here (so far, we've gotten through most of model-free RL, model-based RL coming up next): piped.video/playlist?list=PL…
1
6
I have a #TensorFlow joke but I need to be in eager mode!
6
This would be a great resource for software engineers and researchers outside Google
Google's software engineering best practices facilitate consistency & productivity. All code is peer reviewed for clarity, correctness, and adherence to standards. We've just published these practices. Highly recommended for any lab, academic or otherwise. google.github.io/eng-practic…
5
OpenAI's new model fine-tuned from GPT-3 for summarizing books! openai.com/blog/summarizing-…
1
1
6
Happy 25th birthday Google 🎉
Happy 25th Birthday Google! 🎉 I have gotten incredible enjoyment from being along for the ride for 24+ of these years. When I joined, we were a handful of people wedged into a small office area in downtown Palo Alto above what is now a T-Mobile store. 1/
1
6
1,167
These short NeurIPS reviews could be done by LLMs! Probably we don't need reviewers anymore... An LLM would write the review and the AC would make the decision by looking at the review and the paper!
2
1
6
3,013
Moore's law vs. reality animation. Very cool.
Fascinating: Moore’s Law predictions vs actual growth in transistor count. by @datagrapha teddit.net/r/dataisbeautiful…
1
5
Replying to @elonmusk
One keeps a car for 5 years on average. I promise u there won't be self-driving cars on the streets five years from now :)
1
5
An interesting blog post on transformers in deep learning models
New blogpost! Transformers from scratch. Modern transformers are super simple, so we can explain them in a really straightforward manner. Includes pytorch code. peterbloem.nl/blog/transform…
5
Replying to @fdellaert
So you submitted HiNeRF to CVPR? :D
5
An interesting podcast with Jitendra Malik on challenges in computer vision piped.video/watch?v=LRYkH-fA…
1
4
"We continue to see overall query growth in Search. That includes an increase in total queries coming from Apple’s devices and platforms. More generally, as we enhance Search with new features, people are seeing that Google Search is more useful for more of their queries — and they’re accessing it for new things and in new ways, whether from browsers or the Google app, using their voice or Google Lens. We’re excited to continue this innovation and look forward to sharing more at Google I/O." - blog.google/products/search/…
5
593
Spread between 2-year and 30-year U.S. Treasury securities over time!
1
4
428
'3D' is the most frequently used keyword after 'detection' in CVPR 2019 towardsdatascience.com/lates…
1
5
In CVPR, some reviewers came up with excuses like "I just got sick" or "I just realized this paper is not related to my expertise". Hopefully you will find a way to handle those cases too.
1
3
590
Replying to @m__dehghani
Somehow there is a very large jump from step 1.5 to 1.75 :)
1
5
1,635
Replying to @JeffDean
Maximum possible distance on earth is about 19,000km. So this one is probably very unlikely to beat :) en.m.wikipedia.org/wiki/Extr…
3
I was thinking the LLM would mostly do a summarization and comparison to previous work, not necessarily score the paper. This would make the AC's job much easier, but the AC would make the final decision by looking at both the summary and the paper itself.
1
2
172
3D object detection and segmentation for self driving cars / robotics, augmented reality, etc.
4
Interesting to know! Number of deaths by risk factor ourworldindata.org/grapher/n…
1
3
Replying to @docmilanfar
That probably is right. But raising $90M in the current environment where most startups are having a hard time raising any money is a very strong signal
1
4
772
Replying to @_akhaliq
Everything is now "Everything Everywhere All at Once"!
3
1,180
GPipe, an Open Source Library for Efficiently Training Large-scale Neural Network Models ai.googleblog.com/2019/03/in…
4
This is how betting odds changed after last night's debate realclearpolitics.com/electi…
4
Google's plan to build 6,600 houses in Mountain View realestate.withgoogle.com/no…
4
Which company has best AI model end of August? polymarket.com/event/which-c…
1
5
1,330
Folks in our team have released the TensorFlow 2.0 version of the Object Detection API #tensorflow #ObjectDetection blog.tensorflow.org/2020/07/…
1
1
4
"Model the world, not the data"!
1
4
510
Rumors that Apple is apparently buying Drive.ai engadget.com/2019/06/06/appl…
2
This might be a useful idea for last minute researchers like myself :)
I have a system to plan writing papers for conference deadlines. My students and some collaborators know about it. With the ICLR 2020 deadline coming up, I thought this might be a good time to share this with a wider audience. link.medium.com/XASmjK6ftZ
3
Congratulations @yukez and @drfeifei. Have been lucky to work with both of you
3
Ego is the anesthesia that deadens the pain of stupidity #famousquotes
2
Google just publicly released its DeepFakes dataset so all researchers can work on it.
Detecting deepfakes is one of the most important challenges ahead of us. Following our release of a synthetic audio dataset in Jan, we're releasing a large dataset of visual deepfakes to support researchers working on synthetic video detection #GoogleAI ai.googleblog.com/2019/09/co…
3
The Waymo Open Dataset is publicly released. Orders of magnitude larger than KITTI
Today, we're launching our Waymo Open Dataset. This high resolution lidar and camera data has been collected by our self-driving cars across a diverse range of situations. We're excited to share it directly with the research community. Download now: waymo.com/open
3
It is true 🙂
This might be the perfect overhyped #AI meme. Courtesy of @c_russl
3
Working from Google SF today! Look at the view... #sf #working #Google #googlesf
3
I feel so out of touch with the people around me and what they care about. I thought I would look at Google Trends to see what people are thinking about politics or the economic situation, but I realized the main thing they care about at this moment is #NFL
3
200 Billion galaxies in the observable universe, and each galaxy has on average 100 Million stars! Don't take your life so seriously, stressing out over things that do not even matter on a multi-galaxy level!
3
416
Amazing photos from Pixel 4 show how computer vision and machine learning can give a strong boost to the camera hardware cnet.com/google-amp/news/16-…
3
NeurIPS2019 Competition tracks are released, including a 20K competition on 3D object detection organized by Lyft #NeurIPS #NeurIPS2019 nips.cc/Conferences/2019/Com…
3
Fill in the blanks! What is your prediction on where this curve is going? #NASDAQ
3
More than 17 million Americans have more than 1 million dollars in assets! en.m.wikipedia.org/wiki/Mill…
3
Wow...Go Man U...What a comeback...
3
Replying to @HesamAslan
Try this prompt on a video generation model: “white ball hits other balls and scatters them around on a pool table” and see how good the model is at physics :)
1
3
267
Interesting deep learning research at hardware level phys.org/news/2019-08-all-op…
2
These last few months feel like a dream. One weird part of this dream is that every day I wake up, I see stocks going up! #ShelterInPlace
2
🔥
🔥 Calling all #CVPR2024 attendees! 🔥 Join us for the 1st Tool-Augmented VIsion (TAVI) Workshop on Monday morning in Summit 321! 💡 5 inspiring keynote talks 🎨 5 invited posters from the main conference Don't miss out! ➡️ More info: sites.google.com/corp/view/t…
2
565