Research Scientist at Google DeepMind, Berlin. stronglyconvex.com

Berlin, DE
Introducing SMERF: a streamable, memory-efficient method for real-time exploration of large, multi-room scenes on everyday devices. Our method brings the realism of Zip-NeRF to your phone or laptop! Project page: smerf-3d.github.io ArXiv: arxiv.org/abs/2312.07541 (1/n)
21
189
842
188,085
Our paper, “NeRF in the Wild”, is out! NeRF-W is a method for reconstructing 3D scenes from internet photography. We apply it to the kinds of photos you might take on vacation: tourists, poor lighting, filters, and all. nerf-w.github.io (1/n)
72
1,306
5,465
For lighting and image post-processing, we introduce a low-dimensional embedding space controlling NeRF’s radiance field. This not only gives NeRF-W the capacity to model photo-specific lighting, it enables us to “relight” a scene from new angles. (3/n)
5
31
214
This project wouldn’t have been possible without my amazing coauthors: @rmbrualla, Noha Radwan, Mehdi S. M. Sajjadi, @jon_barron, and Alexey Dosovitskiy. Check out our paper: arxiv.org/abs/2008.02268
6
7
94
We build on NeRF, a method for learning a volumetric radiance field from a posed photo collection. We introduce two extensions to soften NeRF’s “static world” assumption: one for lighting/post-processing, the other for transient objects. (2/n)
1
8
80
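For readers new to the term, here is a minimal sketch of how a volumetric radiance field turns samples along a camera ray into a pixel color, using the standard NeRF quadrature rule. The function name `render_ray` and its inputs are illustrative assumptions, not code from the NeRF-W release; a real model would predict the densities and colors with a neural network.

```python
import math

def render_ray(densities, colors, deltas):
    """Composite samples along one ray (hypothetical helper, front to back).

    densities: volume density sigma_i at each sample
    colors:    RGB triple c_i at each sample
    deltas:    distance delta_i between consecutive samples
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution to the pixel
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

# Empty space followed by an effectively opaque red surface.
pixel, remaining = render_ray(
    densities=[0.0, 1e9],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    deltas=[1.0, 1.0],
)
```

In this toy ray the first sample has zero density and contributes nothing, so the pixel takes the color of the opaque sample behind it.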
NeRF-W improves on the SOTA by >5dB in PSNR and reduces error on other metrics by 20-50%. Qualitatively, NeRF-W produces consistent, crisp 3D geometry without fog or checkerboard artifacts. Check out the project website for more videos and the paper. nerf-w.github.io (5/n)
2
8
78
For transient objects, we introduce a secondary volumetric radiance field combined with an uncertainty field. The former explicitly captures transient objects; the latter captures uncertainty about the color of a pixel whose ray passes through that part of the 3D space. (4/n)
3
5
69
SMERF has the best of both worlds: we produce renders nearly indistinguishable from Zip-NeRF while rendering at 60 fps or more on desktops, laptops, and even recent smartphones, all while scaling to scenes as big as a house! (3/n)
2
5
45
12,135
How does one trade off sample quality and diversity in a language model? Which decoding method is best? We introduce a multi-objective framework maximizing human judgement score subject to a constraint on diversity (entropy). arxiv.org/abs/2004.10450 (1/7)
2
8
35
Infinite! We don't use polygons. We learn a "volumetric radiance field" scene representation.
3
30
We used anywhere from a few hundred to a few thousand. Based on the results of NeRF, if you capture your images in a controlled environment, you might be able to get away with as few as one hundred! matthewtancik.com/nerf
3
4
35
How do we achieve this? We distill a teacher model into a family of MERF-like student submodels, each of which specializes to a different part of the scene. Each submodel captures the entire scene, so rendering stays fast and GPU memory consumption stays low. (4/n)
1
22
2,490
Indeed, we were heavily inspired by Photosynth! I remember being in awe when I first saw it back in high school.
2
20
Only a single submodel needs to be in memory at a time, and while the user explores the space, we swap out old submodels and stream in new ones. We train submodels to be mutually consistent, making transitions barely noticeable. (6/n)
1
20
2,121
Proximal Gradient Descent? ADMM? They're more similar than you think! stronglyconvex.com/blog/admm…
1
3
3
We also modify MERF to significantly improve visual fidelity on small-to-medium size scenes. Our submodels capture thin geometry, high-resolution textures, and specular highlights better than ever before. (5/n)
1
1
17
2,292
The result: a set of compact, streaming-ready submodels ready to run at up to 60 fps in your browser. The best part: you can try it out yourself: smerf-3d.github.io (7/n)
1
1
16
3,087
Like Photosynth, we require one to run a registration pipeline such as COLMAP to derive camera parameters (position, direction, focal length, etc). Once that's done, we learn the scene representation.
2
14
Existing approaches for view-synthesis are torn between two conflicting goals: high quality and fast rendering. Most methods only achieve one or the other. (2/n)
1
1
14
3,605
You'll have to read the paper for all of the technical details, but in short: clever use of OpenGL texture lookups + lots of model training magic.
1
2
93
For all -1 of you reading this, I started a blog @ stronglyconvex.com. It's almost entirely optimization proofs. LAGRANGIANS TO THE FACE
4
7
Proud to have my first arXiv publication with Sam, Jascha, and Quoc!
Stochastic natural gradient descent corresponds to Bayesian training of neural networks, with a modified prior. This equivalence holds *even away from local minima*. Very proud of this work with Sam Smith, Daniel Duckworth, and Quoc Le. arxiv.org/abs/1806.09597
5
I'm stoked to be a contributor on Object SRT, a new method for unsupervised, posed-images-to-3D-scene representation and segmentation! It's crazy fast and, while far from perfect, is leaps and bounds better than anything I've seen yet :)
So excited to share Object Scene Representation Transformer (OSRT): OSRT learns about complex 3D scenes & decomposes them into objects w/o supervision, while rendering novel views up to 3000x faster than prior methods! 🖥️ osrt-paper.github.io 📜 arxiv.org/abs/2206.06922 1/7
7
I'm proud to announce the release of our new paper relating Whitening, Newton's Method, and Generalization! tl;dr whitening w/o regularization significantly reduces a model's ability to generalize. Work with @negative_result @sschoenholz @ethansdy @jaschasd
Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible: arxiv.org/abs/2008.07545 We examine what information is usable for training neural networks, and how second order methods destroy exactly that information.
3
6
Super proud of my ACL publication with @daphneipp! tl;dr we find that the decoding methods that produce the most "human-like" text are also the easiest for BERT-style classifiers to identify. We humans and our models don't see text the same way!
1
4
Replying to @RadianceFields
Want to learn more? Check out my OG explainer thread!
Introducing SMERF: a streamable, memory-efficient method for real-time exploration of large, multi-room scenes on everyday devices. Our method brings the realism of Zip-NeRF to your phone or laptop! Project page: smerf-3d.github.io ArXiv: arxiv.org/abs/2312.07541 (1/n)
1
5
278
I had the pleasure last month of giving a slightly provocative talk at Bliss AI, a wonderful student-run organization here in Berlin. The best part: the talk is online for everyone to enjoy! Behold: "NeRF is dead" piped.video/FtaC5lh8hxs?si=dnJ0…
1
5
3,255
Don't forget our amazing coauthors, @jfthibert and @RSzeliski!
3
133
Replying to @Snosixtytwo
Thank you, Bernhard :). It's an honor to hear it from you!
3
77
Replying to @GiorgioPatrini
Obligatory "This Video Does Not Exist", haha
1
4
None of this would be possible without my amazing collaborators! @negative_result, @sschoenholz, @ethansdyer, and @jaschasd.
3
Replying to @w4nderlus7
I believe arxiv.org/pdf/2001.09977.pdf and this work aren't in conflict! Two points: (1) The Meena paper says, "given two models, the one with better perplexity produces better samples." This work says, "given two samples, the more likely isn't always better."
2
3
Replying to @gkopanas
Thanks for the kind words, Georgios! I look forward to the next generation of 3DGS work as well. It's just a matter of time till 3D capture & presentation is as accessible as 2D is today.
3
109
But all is not lost! We also find that *regularized* second-order optimization leads to better generalization than un-regularized second-order optimization or gradient descent.
2
3
Replying to @MusingsOfSamu
@Samu_tweetz I'm living in SF, but I'll be in Europe till October. Let's have a reunion when I get back!
2
Replying to @tanay
@tanay @DOOMTREE fuck yeah, saw them do a mini-set in Berkeley and a full show in San Francisco. Totally reminded me of you.
2
Replying to @duck @w4nderlus7
On the decoding side, the Meena paper also advocates for a sample-and-rank method, N=20. I hypothesize that the method doesn't surface decodes on the "too likely" side of the Likelihood Trap. We didn't compare sample-and-rank as...
1
2
Key takeaways: (i) very high likelihood samples are bad, (ii) compare decoding methods fairly by controlling entropy, and (iii) there's more to decoding methods than favoring high-likelihood samples. (6/7)
1
2
Have you wondered how effective social distancing is? Or quarantining? What happens if a few people ignore social distancing? How bad is it to go to the grocery store? In short, everything helps -- especially early testing and quarantine! We're all in this together.
New video: Simulating an epidemic. What happens when people avoid each other for the most part but still go to a common central location like a store? What if you can track and isolate cases, but 20% slip through the cracks? 50%? And much more. piped.video/gxAaO2rsdIs
1
1
Replying to @johnmyleswhite
@johnmyleswhite I'm happy to switch as long as I'm not fighting (too many) compiler bugs! I'll give it a shot, it can't hurt.
1
2
Replying to @duck @w4nderlus7
...we didn't know how to measure its entropy. If you know how, let me know :)
1
2
Here’s a head-to-head comparison of nucleus, top-k, and temperature sampling, and our newly proposed decoding method. Sampling directly from the model is by far the worst and nucleus p=0.3 is best according to human judgement. (7/7)
1
1
2
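For anyone unfamiliar with the three baseline methods compared above, here is a minimal sketch of how each reshapes a single next-token distribution. It uses only the Python standard library, every function name is an illustrative assumption, and it does not implement the newly proposed method from the paper. The per-step `entropy` helper is a simplification too: the study controls entropy over whole sequences.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Temperature sampling: divide logits by T before normalizing."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def _renormalize(probs, keep):
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

def top_k_filter(probs, k):
    """Top-k: keep the k most likely tokens, zero out the rest."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, set(order[:k]))

def nucleus_filter(probs, p):
    """Nucleus (top-p): keep the smallest top set with mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return _renormalize(probs, keep)

def entropy(probs):
    """Shannon entropy in nats of one next-token distribution."""
    return -sum(q * math.log(q) for q in probs if q > 0.0)

def sample(probs, rng):
    """Draw one token index from the (filtered) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]            # toy vocabulary of four tokens
base = softmax(logits)
token = sample(nucleus_filter(base, 0.3), random.Random(0))
```

Lowering the temperature, shrinking k, or shrinking p all reduce entropy, which is why comparing the methods at matched entropy is the fair comparison.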
Replying to @josephreisinger
Thanks Joe :). It warms my heart to hear your kind words!
2
A new day, a new proof. That's right kiddies, Accelerated Proximal Gradient Descent works. stronglyconvex.com/blog/acce…
2
This and other large scenes are captured with a DSLR camera and a fisheye lens. Approximately 1500 photos are used. Capture takes 30-60 min.
1
2
23
Replying to @Kyrannio
Don't forget my amazing collaborators at Google Research, Google Inc, and Tübingen! This was very much a team effort. More info here:
Introducing SMERF: a streamable, memory-efficient method for real-time exploration of large, multi-room scenes on everyday devices. Our method brings the realism of Zip-NeRF to your phone or laptop! Project page: smerf-3d.github.io ArXiv: arxiv.org/abs/2312.07541 (1/n)
1
1
2
125
Replying to @conroydave
Thanks for the heads up. It looks like it's running to me, except for a few dropped images. Will fix ASAP.
2
2
1,157
@weargustin I missed your Kickstarter! Any way to get an order in? If not, any friends you recommend in your place?
1
2
My first blog post in over a year: ADMM revisited stronglyconvex.com/blog/admm…
1
1
2
No GSplats were harmed in the making of SMERF :)
32
When using log likelihood as a proxy for human judgement ("quality"), we obtain "Global Temperature Sampling", a globally-normalized decoding method that optimally traverses the quality-diversity curve. (2/7)
1
1
@bumptech Do you guys have a public API?
1
Replying to @RadianceFields
Big thanks to @RadianceFields for the quick write-up! Fantastic work :)
2
337
Spot on article on the state of AI and the Mind. Definitely worth the read! "Despite the remarkable commercial success of current AI systems...we still have a long way to go in mimicking truly human like intelligence." hai.stanford.edu/news/the_in…
1
1
While this blog post may only have two authors, the project itself is the hard work of a number of amazing teammates. Take a peek at the "Acknowledgments" section -- you may spot a few familiar names :)
2
2,435
We perform the first large-scale human study (>38,000 ratings) comparing decoding method/hyperparameter combinations against each other. When controlling for entropy, we find nucleus > top-k > temperature sampling in low-entropy regimes. (4/7)
1
2
Just published a Kalman Filter library for Python. Go check it out! documentation: pykalman.github.com/ , and source: github.com/pykalman/pykalman
1
2
@kyledoherty list comprehensions are a higher form of happiness, directly followed by grilled cheese sandwiches.
1
Goodbye, @rxbofficial . It's been a wonderful 5 years to know you, and I look forward to whatever comes next.
1
@DataKind I'd like to come to the DataDive, but will be traveling from MA. Can you hook me up with anyone else to split a place to stay?
1
1
1
Replying to @mmalex
Thank you! We're all super happy with how this turned out :)
56
Surprisingly, this method is *worse* than token-by-token decoding methods according to human raters! We discover this is a consequence of the "Likelihood Trap", wherein samples with exceptionally high likelihood receive low human judgement scores. (3/7)
1
1
Replying to @noahlt
@noahlt I DON'T KNOW WHAT YOU'RE TALKING ABOUT
1
Replying to @syhw
@syhw I've seen that picture many times and know what it means, but somehow the intuition doesn't pop out to me. Still, I should include it.
Replying to @johnmyleswhite
*Fidgets excitedly in my seat* Oh Boyd Oh Boyd Oh Boyd Oh Boyd.
1
Is this not the most beautiful thing you've ever seen? theoatmeal.com/blog/bearodac…
1
1
Thanks for noticing! Fixed.
1
Replying to @heykushan
Not just DeepMind! This wouldn't have been possible without my amazing colleagues in Google Research, Google Inc, and Tübingen.
1
1
95
@ritikm @cuttlewig free ice cream in Cory Courtyard!
1
1
Replying to @wxswxs
The original NeRF folks produced depth maps and meshes learned from those maps on their project website. You can see our depth map of Trevi Fountain in the overview video @ 2:25 piped.video/watch?v=yPKIxoN2…
1
1
Replying to @bengeliscious
If your observations can be expressed as integrals a la CT scans, I don't see why not :)
1
1
Replying to @ali_thespaceguy
We accomplished this using a couple thousand photos from the Image Matching Challenge 2020 dataset. vision.uvic.ca/image-matchin…
1
We fit a small CNN, MLPs, and a linear model w/ and w/o Natural Gradient Descent (a second order optimizer) and find that the models trained w/ it all generalize more poorly than those trained w/ gradient descent.
1
1
1
Thanks to @jdanbrown, Mirai is cleaner and safer than ever. Mirai: multithreading done right for Python! github.com/duckworthd/mirai
1
Lookout, Sims just dropped his new EP #wildlife for free! sims.bandcamp.com/album/wild…
1
Replying to @rezart
@rezcubed man, that's what I get for going to a school that starts early!
Replying to @Alpa
@Alpa @echen Very nice! How do you select and manage your custom worker pool?
1
Replying to @MusingsOfSamu
@Samu_tweetz I had no idea you were a metalhead, lol
1
1
Replying to @cuttlewig
@cuttlewig as measured in opportunities to visit the donutdonut shop?
1
1
There is a God, and he wrote XStream. Shit is magic I swear.
1
Happiness is Mochi + Strawberry twitpic.com/6h6rg0
1
1
This is a highly unintuitive result! In linear regression, training on a whitened dataset w/ fewer data points than dimensions results in a model that is *incapable* of doing better than random chance on a validation set!
1
1
Further, applying gradient descent on a whitened dataset is *exactly* equivalent to applying Newton's Method on the original dataset. This suggests that models trained w/ second order methods may not generalize as well as those trained w/ SGD.
1
1
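The whitening/Newton equivalence can be sketched for the simplest case. This is an illustration under stated assumptions rather than the paper's general argument: least-squares linear regression, a data matrix $X \in \mathbb{R}^{n \times d}$ whose second-moment matrix $\Sigma = \tfrac{1}{n} X^\top X$ is invertible, and whitening defined as $\tilde{X} = X \Sigma^{-1/2}$. The symbols $\eta$ (step size), $w$, and $\tilde{w}$ are notation introduced here.

```latex
% Loss on the original data and its derivatives
L(w) = \frac{1}{2n} \lVert X w - y \rVert^2, \qquad
\nabla L(w) = \frac{1}{n} X^\top (X w - y), \qquad
\nabla^2 L(w) = \Sigma .

% Gradient descent on the whitened data, with predictions
% \tilde{X} \tilde{w} = X w, i.e. w = \Sigma^{-1/2} \tilde{w}
\tilde{w}_{t+1}
  = \tilde{w}_t - \frac{\eta}{n} \tilde{X}^\top (\tilde{X} \tilde{w}_t - y)
  = \tilde{w}_t - \eta \, \Sigma^{-1/2} \nabla L(w_t) .

% Map back to the original parameters by multiplying with \Sigma^{-1/2}
w_{t+1}
  = w_t - \eta \, \Sigma^{-1} \nabla L(w_t)
  = w_t - \eta \, \bigl( \nabla^2 L(w_t) \bigr)^{-1} \nabla L(w_t) .
```

The last line is a Newton step with step size $\eta$ on the original dataset, so both procedures make identical predictions at every iteration.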
We further find that, when pairing samples from decoding methods with random samples from the model *with equal likelihood*, temperature sampling is preferred to nucleus and top-k sampling by human raters. (5/7)
1
1
@DOOMTREE I'm digging the Summer Tour, but is there no hope of a West Coast stop?
Thanks! My collaborators and I are super proud of our work :)
1
27
Replying to @wholemars
Want to learn more? Check out my OG explainer thread:
Introducing SMERF: a streamable, memory-efficient method for real-time exploration of large, multi-room scenes on everyday devices. Our method brings the realism of Zip-NeRF to your phone or laptop! Project page: smerf-3d.github.io ArXiv: arxiv.org/abs/2312.07541 (1/n)
1
53
From the moment NeRF was first published, the research community knew it would be something game-changing. I'm proud to be part of the team turning this amazing line of work into a real product experience!
Immersive View gives users a virtual, close-up look at indoor spaces in 3D! Learn how it uses neural radiance fields to seamlessly fuse photos to produce realistic, multidimensional reconstructions of your favorite businesses and public spaces → goo.gle/3X6L9G8

[Image: Reconstruction of The Seafood Bar in Amsterdam in Immersive View.]

2
1
5,606
Replying to @MartinNebelong
This is absolutely wild! The future is now.
1
183