Building AI that learns by interacting with the world. Associate Professor @ MIT, leading the Scene Representation Group (scenerepresentations.org).

Cambridge, Massachusetts
Introducing MilliVid, our new method for long-context video generation! MilliVid creates videos that are consistent over long time spans, without using retrieval heuristics or 3D maps! (1/n) davidcharatan.com/millivid/#
In personal news, I’m thrilled to announce that I’ll be joining @MIT as a tenure-track assistant professor in July 2022! My lab will investigate neural scene representations, inverse graphics, neural rendering, and their applications in vision, graphics, robotics, and AI! (1/n)
Excited to share our work on "Implicit Neural Representations with Periodic Activations" vsitzmann.github.io/siren We show how to fit complex signals, such as room-scale SDFs, video, & audio, and supervise implicit reps via their gradients to solve boundary value problems! (1/n)
We released the code for SIREN! vsitzmann.github.io/siren We also wrote a comprehensive Colab notebook with a no-frills implementation that reproduces image, audio, and Poisson experiments, and explores initialization- and shift-invariance properties! colab.research.google.com/gi…
Introducing “FlowMap”, the first self-supervised, differentiable structure-from-motion method that is competitive with conventional SfM like COLMAP! cameronosmith.github.io/flow… IMO this solves a major missing piece for internet-scale training of 3D Deep Learning methods. 1/n
Introducing “Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation”! yilundu.github.io/ndf/ (w/ video!) NDFs are an object representation for robotic manipulation enabling imitation of pick-and-place tasks with pose generalization guarantees (1/n)
Implicit neural representations have recently gotten a lot of attention. I have compiled a reading list that I give students to get started in this area, inspired by the awesome-computer-vision list with extra commentary & notes. Check it out! github.com/vsitzmann/awesome…
Introducing "Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering"! vsitzmann.github.io/lfns (w/ video!) LFNs are the first fully implicit neural scene representation with real-time rendering, without post-processing / hybrid data-structures! (1/n)
I am hiring graduate students for my new lab at MIT, where I will start as faculty in July 2022! If you want to push what's possible with neural scene representations & inverse graphics please apply under: gradapply.mit.edu/eecs/apply… Deadline is Dec 15th!
Introducing Diffusion Forcing, a new way of training sequence generative models that unifies next-token prediction (think LLM) and full-sequence diffusion (think video diffusion models)! I’m super excited about this - it has a number of unique skills! (1/n)
Introducing Diffusion Forcing, which unifies next-token prediction (eg LLMs) and full-seq. diffusion (eg SORA)! It offers improved performance & new sampling strategies in vision and robotics, such as stable, infinite video generation, better diffusion planning, and more! (1/8)
Introducing Neural Jacobian Fields, robot 3D kinematic models learned only from vision! They can model & control robots from just a single RGB camera, even those w/ intractable kinematics & no embedded sensors such as soft, 3D-printed pneumatic hands! sizhe-li.github.io/publicati… 1/n
Introducing “FlowCam: Training Generalizable 3D Radiance Fields w/o Camera Poses via Pixel-Aligned Scene Flow”! We train a generalizable 3D scene representation self-supervised on datasets of raw videos, without any pre-computed camera poses or SFM! cameronosmith.github.io/flow… 1/n
Introducing “Diffusion with Forward Models”, a model that can generate diverse, real 3D scenes from a single image, trained with images w/o any 3D data! diffusion-with-forward-model… 1/n
NeRFs will transform computer graphics. But we need to be able to edit them! In “Decomposing NeRF for Editing via Feature Field Distillation” we use Image and Image/Language foundation models for easy, query-based editing via language- and patch queries! pfnet-research.github.io/dis…
In personal news, I graduated from Stanford with my thesis on "Self-supervised Scene Representation Learning"! My next stop will be a postdoc at MIT, with Josh Tenenbaum, Fredo Durand, and Bill Freeman, starting in August. If you're around & wanna talk neural scene reps, reach out!
For folks prepping their grad school documents right now, I published my SoPs for my PhD, my MSc, and my MSc scholarship here: github.com/vsitzmann/phd-mas… Definitely don't copy it (I'd probably write it differently today), but maybe it'll give you some inspiration for your app docs!
I am looking to hire a PhD student with a background in representation theory and an interest in geometric deep learning *for vision*. If that is you, please apply to MIT (deadline Dec 15th) and mention me in your application, I would love to chat!
Introducing pixelSplat: feed-forward Gaussian splats from image pairs! Led by @DavidCharatan and @sizhe_lester_li, collaborating with @taiyasaki! We propose a memory-efficient, fast and editable alternative to pixelNeRF based on 3D Gaussian Splatting! davidcharatan.com/pixelsplat… 1/n
Considering a PhD and interested in differentiable rendering, self-supervised representation learning in vision, and 3D scene representations? Apply to MIT and consider my research group! scenerepresentations.org/ Application under gradapply.mit.edu/eecs/apply…!
Many people have asked us how SIREN is different from positional encodings (ReLU P.E.). First, SIREN fits complex signals, such as images, audio, video, etc. better - see video, website, paper! (1/n)
Methods such as PixelNeRF can synthesize novel views given few input images. However, they are limited to simple scenes and small baselines. In our CVPR paper, we present a method for high-quality novel view synthesis given only two distant observations: yilundu.github.io/wide_basel…
Our NeurIPS spotlight on using neural light fields instead of 3D neural fields for scene representation has been covered by MIT news! Neural light fields allow rendering without ray-marching. IMO, light fields are the way to go for learning at scale! news.mit.edu/2021/3-d-image-…
Wow - we are honored that pixelSplat wins the "Best Paper Runner-Up" at CVPR 2024! Congratulations to @DavidCharatan and @sizhe_lester_li who made this happen in their first year of their PhD - you guys rock!! Also thanks @taiyasaki for the fun collab :)
Introducing pixelSplat: feed-forward Gaussian splats from image pairs! Led by @DavidCharatan and @sizhe_lester_li, collaborating with @taiyasaki! We propose a memory-efficient, fast and editable alternative to pixelNeRF based on 3D Gaussian Splatting! davidcharatan.com/pixelsplat… 1/n
Welcome Ayush Tewari (@_atewari) and Krishna Murthy (@_krishna_murthy), who have joined MIT as post-docs! Thrilled to have you as colleagues :) Lots more cool work on Neural Rendering, Scene Representations, and ML for Physics to come :D
Sometime in the next few weeks, we will do an explainer video on world models, video gen models, and embodied intelligence. If you have any questions you'd like me to discuss, please post them in the replies!! First time I'm doing something like that, I hope it'll be interesting!
Ask us your questions about embodied intelligence or AI systems that interact w/the world. We’re featuring a few in an upcoming explainer w/MIT prof. Vincent Sitzmann (@vincesitzmann). For more on his work: vincentsitzmann.com/
Our paper on learning controllable 3D robot models from vision is published in Nature! Huge congrats to Lester and the team, @annan__zhang, @BoyuanChen0, Hanna Matusik, Chao Liu, and Daniela Rus!! Learning joint world models for the environment & the agent is super exciting :)
Now in Nature! 🚀 Our method learns a controllable 3D model of any robot from vision, enabling single-camera closed-loop control at test time! This includes robots previously uncontrollable, soft, and bio-inspired, potentially lowering the barrier of entry to automation! Paper: nature.com/articles/s41586-0… (1/n)
If you have experience in generative modeling and differentiable rendering and are looking to join a fun team, I've recently co-founded a stealth startup in this space and we're looking for 1-2 ML experts still. Reach out w/ email & summary of what you've worked on in the past!
My student @DavidCharatan published all the code to make figures for both pixelSplat and FlowMap in the respective github repositories, and just published the Figma file that generates the figures in the paper as well: figma.com/file/WLHx9d6qDRol9…
Our new method for diffusion stitching allows us to generate ultra-long video sequences that follow a long, pre-defined camera trajectory! All segments are generated in parallel (not auto-regressive) and so the model never generates walls that it has to later step through!
Introducing Generative View Stitching (GVS), a non-autoregressive sampling method for length extrapolation of video diffusion models. GVS enables collision-free camera-guided video generation for predefined trajectories, including Oscar Reutersvärd's Impossible Staircase (1/9).
How can we learn to generate 3D scenes directly with diffusion models if we only have images, no ground-truth 3D scenes? Ayush, Tianwei and George will tell you at our poster “Diffusion with Forward Models”, #202!
In other news, I have a first version of the website for my future MIT research group! scenerepresentations.org/ The name I have decided on for now is the "Scene Representation Group". There's even a logo! Thanks to the amazing @ludwigschubert for opinions & hands-on help :)
Excited to introduce DittoGym @ ICLR, in which we study the control of a neat new kind of robot: soft shape-shifters! This is work done by @SuningHuan44558 during his visit at my group at MIT, jointly with my student @BoyuanChen0! Project page: dittogym.github.io/ 1/n
My student Boyuan always joked he would open a Boba shop someday - he actually made it happen, and now students on our floor are always supplied with (free) Boba 😀
I quit PhD (for a day) and opened a boba shop at @MIT - Generative Boba! It’s a huge success - right next to our office so all the AI researchers are enjoying it. Check out our boba diffusion algorithm in the poster to understand why boba generation is so important to @MIT_CSAIL !
Introducing XFactor: the first pose- and geometry-free method capable of true Novel View Synthesis (NVS). We re-think NVS and the concept of camera poses completely without concepts from multi-view geometry as a pure representation learning problem! mitchel.computer/xfactor/ (1/n)
Recordings of our CVPR 2022 workshop on neural fields are now public - check it out!
Neural fields are emerging as useful signal representations in computer vision & beyond. Our full-day introductory @CVPR tutorial on the topic is now public. Video: piped.video/PeRRp1cFuH4 Slides: drive.google.com/drive/folde… Web: neuralfields.cs.brown.edu/cv…
Thrilled to share that this paper was accepted to NeurIPS as a spotlight! Code coming soon as well!
Introducing “Diffusion with Forward Models”, a model that can generate diverse, real 3D scenes from a single image, trained with images w/o any 3D data! diffusion-with-forward-model… 1/n
📢📢📢 Code for our 3D generative model that learns to generate 3D appearance and geometry from just a single image is out now! It's trained just from real-world multi-view images, and generates scenes directly w/o score distillation! github.com/ayushtewari/DFM/
We have released the code for Video Diffusion via 3D UNet and Temporal Attention trained with Diffusion Forcing! The results are really cool - the model can roll out far beyond the training horizon :) Thanks to our UROP @kiwhansong0 who is really outstanding!!
Diffusion Forcing Update: code & ckpt for 3D-UNet + Temporal Attention version is released thanks to my amazing undergrad mentee @kiwhansong0! See project github for more info. I also added suggested future directions to our website boyuan.space/diffusion-forci…. Check them out!
At CVPR and interested in INRs / Neural Fields / Neural Scene Representations? Come to our tutorial on Monday! We'll cover fundamental techniques and latest advances in Neural Fields, and reflect on what's next together with exciting invited speakers! neuralfields.cs.brown.edu/cv… 1/4
Our review article on neural fields is out! arxiv.org/abs/2111.11426 W/ @yiheng @yongyuanxi @psyth91 @orlitany Shiqin Yan (bit.ly/3l2obhK) Numair Khan (bit.ly/30VuRY6) @fedassa Sunny Li (bit.ly/3cJtrCa) @jtompkin @drsrinathsridha (equal advising). (1/n)
(1/4) Announcing the 2nd 3DReps Workshop at ICCV on "Learning 3D Representations for Shape and Appearance"! We will bring together researchers across disciplines, from vision to graphics, neuroscience, robotics, and geometry! ivl.cs.brown.edu/3DReps/
We wrote a new video diffusion paper! @kiwhansong0 and @BoyuanChen0 and co-authors did absolutely amazing work here. Apart from really working, the method of "variable-length history guidance" is really cool and based on some deep truths about sequence generative modeling....
Announcing Diffusion Forcing Transformer (DFoT), our new video diffusion algorithm that generates ultra-long videos of 800+ frames. DFoT enables History Guidance, a simple add-on to any existing video diffusion models for a quality boost. Website: boyuan.space/history-guidanc… (1/7)
Finally get to tweeting about this: @BoyuanChen0, my student co-advised with @RussTedrake, graduated recently! Boyuan has done groundbreaking work on video generative modeling and video as the "language" of robotics. He is off to OpenAI where I am sure he will do amazing things!
Great results - congrats to the team! I think that this is already very close to outperforming any differentiable-rendering based NVS method. While NVS has always been somewhat of a toy problem, I think pose-conditioned diffusion models plus large data have essentially solved it.
Hi there, 🎉 We are thrilled to introduce Stable Virtual Camera, a generalist diffusion model designed to address the exciting challenge of Novel View Synthesis (NVS). With just one or a few images, it allows you to create a smooth trajectory video from any viewpoint you desire. We’re naming this model in tribute to the Virtual Camera cinematography technology. @StabilityAI 🏠 Project Page: stable-virtual-camera.github… 📄 Paper: stable-virtual-camera.github… 📃 Blog: stability.ai/news/introducin… 💻 Code: github.com/Stability-AI/stab… 🤗 Model Card: huggingface.co/stabilityai/s… 🚀 Gradio Demo: huggingface.co/spaces/stabil… 🎬 Video: piped.video/channel/UCLLlVDc…
David & Lester have just updated the pixelSplat code, now supporting 3-view training in addition to 2-view and with some improvements suggested to us by reviewers, including new pre-trained checkpoints! davidcharatan.com/pixelsplat… More exciting work coming very soon, stay tuned!
Concerned about terminology of neural implicit reps / neural coordinate-based reps / etc.? We have a review article forthcoming (on arXiv soon!) that argues that the appropriate term is "neural field", i.e., neural light field, neural signed distance field, etc. (1/n)
Introducing Neural Isometries where we show how to exploit equivariant ML even for transformations that are “nasty”, e.g. non-compact, projective, nonlinear, or not even a group action! arxiv.org/abs/2405.19296 Collab w/ the amazing Tommy Mitchel @twmitchel and Mike Taylor! 1/n
I have made a slideshare account and will start uploading slides for some of my presentations / talks / courses, starting with the slides for the introduction to novel view synthesis @siggraph 2021. Feel free to re-use them! Find them here: slideshare.net/VincentSitzma…
Strongly agree with Jon - thanks for putting it together so concisely. IMO there is no point in either this form or the NeurIPS checklist - they come from a good place but ultimately serve only to make the whole experience even more cumbersome and stressful, with no upside.
It looks like @CVPR has implemented a new mandatory "Compute Reporting Form" that must be submitted alongside any paper submission. Though I am sympathetic to the motivations for this change, I am opposed to it for a variety of reasons:
This paper was accepted to NeurIPS! More stuff coming soon :)
Introducing “FlowCam: Training Generalizable 3D Radiance Fields w/o Camera Poses via Pixel-Aligned Scene Flow”! We train a generalizable 3D scene representation self-supervised on datasets of raw videos, without any pre-computed camera poses or SFM! cameronosmith.github.io/flow… 1/n
Really happy to see this study! Always wanted to do something like this myself, if only to support calming words to grad students: current-gen generative models have nothing to do with intelligence, and AI research remains fascinating and unsolved!
Curious whether video generation models (like #SORA) qualify as world models? We conduct a systematic study to answer this question by investigating whether a video gen model is able to learn physical laws. There are three key messages to take home: 1⃣The model generalises perfectly for in-distribution data, but fails to do out-of-distribution generalization. For combinatorial scenarios, a scaling law is observed. 2⃣The models fail to abstract general rules and instead try to mimic the closest training example. 3⃣The model prioritizes different attributes when referencing training data: color > size > velocity > shape. This work is a joint effort with our outstanding intern @YangYue_THU. Paper: arxiv.org/abs/2411.02385 Webpage: phyworld.github.io/
For anyone at NeurIPS: at our startup for 3D asset generation for gaming, we are still looking for one Research Scientist and one ML Engineer with a background in 3D generation, mesh processing, animation, etc. Please reach out and let's chat if that is you!
Our NeurIPS 2020 paper "MetaSDF" shows how to rapidly fit neural implicit representations with gradient-based meta-learning. We wrote a Colab with a stand-alone implementation of MAML, Siren, and ReLU MLPs so you can jump right in and easily extend it! colab.research.google.com/gi…
If you are looking to do a PhD on inverse graphics, 3D computer vision, differentiable rendering, etc, please apply to Ayush's lab at the University of Cambridge! He is brilliant, very patient, and a kind human :)
I am looking for graduate students for my new lab at the University of Cambridge! Join me to understand and build models of visual perception.
This looks great! I think in general, it seems that any reconstruction problem that is reasonably well-determined given the input data can be solved with supervised learning for in-distribution test data (where it's of course interesting to ask what constitutes in-distribution)!
Excited to share MonST3R! -- a simple way to estimate geometry from unposed video of dynamic scenes. We achieve competitive results on several downstream tasks (video depth, camera pose) and believe this is a promising step toward feed-forward 4D reconstruction. monst3r-project.github.io
Me and some members of my research group (@_atewari, @GCazenavette, @omcamsmith) will be at NeurIPS - talk to us about our work on training 3D diffusion models only from images (diffusion-with-forward-model…) and pixelNeRF without camera poses (scenerepresentations.org/pub…)! #NeurIPS2023
Come to our ICCV workshop on 3D representations - lots of fantastic talks and panels, and a poster session with some of the coolest recent work on 3D reps and neural rendering!
We are organizing the Second #3DReps workshop at @ICCV_2021 to bring together researchers working on learned 3D representations. We have an amazing lineup of invited speakers and posters. Please join us tomorrow (Oct 17)! Full schedule here: ivl.cs.brown.edu/3DReps
Amazing work by Ben Mildenhall et al: NeRF! They clearly demonstrate that implicit representations can achieve photorealism, by using a volume renderer instead of a raymarcher and with a smart positional encoding. Let's tackle generalization next! matthewtancik.com/nerf
Super exciting work by friends at MIT! Auto-regressive video generative models are the way to go :) Stay tuned, we are cooking, too - if you're at NeurIPS, make sure to come to our Diffusion Forcing poster!
Video diffusion models generate high-quality videos but are too slow for interactive applications. We @MIT_CSAIL @AdobeResearch introduce CausVid, a fast autoregressive video diffusion model that starts playing the moment you hit "Generate"! A thread 🧵
Wish I could be at ICCV but alas can't make it due to a wedding :( Have fun everyone and see you at @NeurIPSConf 2025!
At the Dagstuhl Seminar on morphable models and beyond - looking forward to finally meeting graphics folks in person again! @dagstuhl
Really cool! Loved the below paragraph in particular, a cool way to think about gradients!
I ran this experiment to show that duality-based optimizers like Muon are not only *fast* but also *numerically different* to vanilla gradient descent. In particular, the weights move a qualitatively different amount in the same number of training steps. (1/4)
We are excited to have Andreessen Horowitz on board, who funded our seed round with $5M!
.@yellow_3d_ has raised $5M from @a16zGames to further develop its Gen AI-powered 3D character modeling tool venturebeat.com/games/yellow…
If you're at Eurographics and interested in inverse graphics, neural fields, and neural rendering, check out the STAR presentations happening tomorrow: 1. Neural Fields in Visual Computing and Beyond 2. Advances in Neural Rendering eg2022.univ-reims.fr/pr-star…
Fantastic neural rendering results leveraging SIREN! They leverage a SIREN in a NeRF-like neural rendering framework. The SIREN is conditioned on a random latent code z, and a HoloGAN-like adversarial loss provides the training signal.
Introducing pi-GAN: Periodic Implicit GANs for 3D-Aware Image Synthesis. Trained on unlabeled images, π-GAN generates 3D representations and synthesizes images from arbitrary poses. Website: marcoamonteiro.github.io/pi-… @monteiroamarco Petr Kellnhofer @GordonWetzstein @jiajunwu_cs
Check out Scene Representation Networks: piped.video/6vMEBWD8O20 Our new continuous 3D-aware scene representation reconstructs appearance and geometry just from posed images, generalizes across scenes for single-shot reconstruction, and naturally handles non-rigid deformation!
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations. The idea of differentiable ray-marching looks promising! arxiv.org/abs/1906.01618 #computervision
This project, led by UBC's Daniel Rebain, was a lot of fun - we propose a novel parameterization of 3D scene geometry via the medial field, which intuitively parameterizes the local "thickness" of a 3D shape. This has a variety of cool applications in physics & graphics!
Deep Medial Fields pdf: arxiv.org/pdf/2106.03804.pdf abs: arxiv.org/abs/2106.03804 an implicit representation of the local thickness, that expands the capacity of implicit representations for 3D geometry
We're beginning the beta-testing phase of our first product, a text-to-3D-character mesh generator. More cool stuff coming soon :)
Announcing the AI Character Shape Generator by @yellow_3d_ 🟡 This handy Daz Studio plug-in allows you to generate character mesh shapes from simple text prompts, transforming the character creation process.
Find us at poster 226 where Cameron is presenting his cool work “FlowCam”!
Very cool work by friends at Google on diffusion models for novel view synthesis! Goes to show, no ray-marching or volume rendering necessary... It's all light fields ;)
Excited to announce our work on novel view synthesis with diffusion models! Our model can lift a single 2d image into 3d. 3d-diffusion.github.io Joint work w/ @wchan212 @rmbrualla @hojonathanho @taiyasaki @mo_norouzi
I'm very excited to be part of a new NSF center: HAND, which is focused on developing the next generation of dexterous robots! A key motivation for vision for me has always been embodied, intelligent agents, and I think that vision & robots are now closer than ever before!
The @NSF announced the Human AugmentatioN via Dexterity (HAND) Engineering Research Center (ERC)! Our Center will revolutionize how #robots augment human labor using dexterous robot hands with AI-powered skills and intuitive interfaces. new.nsf.gov/news/nsf-announc…
I'm thrilled that many members of the Scene Representation Group will be at CVPR! Catch them and chat about their work :) I shout them out below in no particular order:
In this project, led by Yunzhu and Shuang Li, we explored the application of 3D implicit representations to visuomotor control. The idea is to use the latent of a conditional 3D implicit representation as a state representation, and use the resulting state space for control (1/n)
Introducing “3D Neural Scene Representations for Visuomotor Control”! bit.ly/2VoHt71 (w/ video!) We combine implicit neural scene representations with intuitive physics models, enabling visuomotor control of dynamic 3D scenes from out-of-distribution viewpoints. (1/7)
This is a great example of how neural implicit representations enable principled implementations of symmetries - here, rotation and translation equivariance. Cool work!
Alias-Free GAN pdf: nvlabs-fi-cdn.nvidia.com/ali… project page: nvlabs.github.io/alias-free-… networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales
I will be at @NeurIPSConf in New Orleans - hit me up if you'd like to chat about differentiable rendering, 3D representation learning, neural scene representation, etc :)
Some students have approached me for guidance on how to deal with a review of their NeurIPS submission that was clearly LLM generated. Apart from sending a message to the AC, I was just looking through all the NeurIPS guidelines (Reviewer Guide, AC Guide), and couldn't find...
Check out our NeurIPS paper! This was a cool little project. Two key ideas are generative modeling of multi-modal data via neural implicit representations / neural fields, and auto-decoder based manifold learning instead of GANs/VAEs. @du_yilun and @katie_m_collins killed it!
Check out generative manifold learning! yilundu.github.io/gem/ We show how to capture the manifold of any signal modality (including cross-modal ones), by representing each signal as a neural field. We can then traverse the latent space between signals and generate new samples!
Very cool work - it is remarkable how much more robust normal estimation is compared to depth estimation, I think there is more to be done here!
Introducing IronDepth, a framework that uses surface normal and its uncertainty to iteratively refine the predicted depth map (to appear in #bmvc2022). Visit baegwangbin.github.io/IronDe… for more detail. Joint work with @IgnasBud and @robertocipolla.
Last week, I was at the #ICVSS2022 summer school in Sicily. It was my first summer school, and it was so much fun and very inspiring! It was amazing to chat with students, and also to meet, exchange ideas, and receive great advice from fellow faculty...
Another one on *generalizing* implicit representations: “MetaSDF: Meta-Learning Signed Distance Functions” vsitzmann.github.io/metasdf We identify a key connection between learning of implicit function spaces and meta-learning, and reconstruct SDFs faster & more accurately! (1/n)
I've always been a big fan of the work of Danijar and co-authors on the "Dreamer" line of publications. I'm hence hyped that Diffusion Forcing plays a part in their most recent paper - exciting work, as always!
▶️ Shortcut forcing builds on diffusion forcing and shortcut models, training a sequence model with both the noise level and requested step size as inputs This enables much faster frame-by-frame generations than diffusion forcing, without needing a distillation phase ⏱️
If you’re looking for me at NeurIPS, I am sitting in my AirBnB with a sore throat and blocked sinuses… of course I get sick immediately upon traveling again 😭
Meanwhile, I am already at MIT, and will be looking for incoming graduate students and postdocs who want to work on the big questions in this area :) You can find more on my research under vsitzmann.github.io - if that's you, get in touch, and let’s chat! (2/n)
We have released the code for our paper "Neural Isometries"! All the links here: scenerepresentations.org/pub… Amazing work by @twmitchel :)
Peter's paper on intrinsic image est. with diffusion models was accepted to CVPR! The core insight is that the task of intrinsic image decomposition is inherently uncertain, and that hence, diffusion models yield significant improvements. Fun collab! Web: peter-kocsis.github.io/Intri…
Our group has fourteen papers accepted at #CVPR'2024! Exciting topics: lots of diffusion & transformers focusing on generative AI for image synthesis, geometry generation, and many more - check it out! I'm so proud of everyone involved - let's go 🚀🚀 niessnerlab.org/publications…
This is really cool work!
Symmetries are everywhere — from butterfly’s wings to Greek temples. But detecting them in noisy data? That’s a challenge. 🦋🏛 Our #SIGGRAPHAsia2024 paper, Robust Symmetry Detection via Riemannian Langevin Dynamics, tackles this: symmetry-langevin.github.io/ 🧵(1/n)
Really cool work from Anagh, with David & Gordon from my old lab - amazing to see light propagate in 3D :)
📢📢📢 A pulse of light takes ~3ns to pass through a Coke bottle—100 million times less than it takes you to blink. Our work lets you fly around this 3D scene at the speed of light, revealing propagating wavefronts of light that are invisible to the naked eye—from any viewpoint! Flying with Photons: Rendering Novel Views of Propagating Light 🌐 Website: anaghmalik.com/FlyingWithPho… ⌨️ Code: github.com/anaghmalik/Flying… w/ @NoahJuravsky, @Po_lhr, @GordonWetzstein, @kyroskutulakos and @DaveLindell
Thrilled to share the first milestone of our startup @yellow_3d_! We are set on dramatically lowering the barrier-to-entry of making video games and telling stories using 3D content in general! 1/n
Cool paper by Ana, co-advised with Justin Solomon - her first MIT paper, and a great one at that :)
Can neural fields help unlock new understandings of an old geometry problem? Excited to announce our latest SIGGRAPH Asia work: Variational Barycentric Coordinates! 🧵 - with @OdedStein @vincesitzmann @JustinMSolomon
Check out the Huggingface demo that @kiwhansong0 and @BoyuanChen0 made - you can play with Diffusion Forcing and history guidance to get a feeling for how, together, they allow generating ultra-long, consistent video! huggingface.co/spaces/kiwhan…
We’ve put a lot of time into this review paper - it’s likely useful to you if you’re writing on neural fields!
If you're working on a neural field/coordinate-based neural net paper for @eccvconf, you may want to use our review to help with your related work section. #eccv2022 #neuralfields @neural_fields
A thread and video by MIT CSAIL about our Diffusion Forcing paper!
Sequence models have skyrocketed in popularity for their ability to analyze data & predict what to do next. MIT’s "Diffusion Forcing" method combines the strengths of next-token prediction (like w/ChatGPT) & video diffusion (like w/Sora), training neural networks to handle corrupted data while predicting the next steps. This flexible, reliable sequence model helps produce higher-quality artificial videos and guides more precise decision-making for robots & AI agents: bit.ly/3BK2wWC
Amazing work, congrats, Congyue!! We love it :)
Our paper "Zero-Shot Image Feature Consensus with Deep Functional Maps" is accepted at #ECCV2024! @eccvconf Want better image correspondences with noisy and inaccurate features? Let's go to the spectral space with Laplacian eigenfunctions! ArXiv: arxiv.org/abs/2403.12038
Neural Rendering offers new approaches to computer graphics, inverse graphics, and computer vision. In our state-of-the-art report, we compiled an overview of this emerging field - check it out! arxiv.org/pdf/2004.03805.pdf #EGEV2020 #NeuralRendering
Come join me in the gather hall 6: representation learning at poster A0 to chat about Light Field Networks / neural light fields! I just joined, and it's basically just me... nips.cc/virtual/2021/poster/…
If you are interested in geometry processing, I highly recommend applying to Silvia's lab - she is brilliant and kind and will be an amazing adviser :)
I'm planning my trip to NeurIPS right now - excited to meet everyone again after (sadly) having to miss ECCV for visa reasons...
Scene Representation Networks receive an honorable mention in the “Outstanding New Directions” category! I’m super happy and grateful that folks find this line of work promising.
Want to know which NeurIPS papers were selected for an award, and how the selection was done? Check out our latest blog post on the subject: medium.com/@NeurIPSConf/neur…
Awesome work! It’s really great to do such a principled study to generate a reliable empirical insight that others can build on 😊 Also wonder if we’ll see more work like this going forward following the establishment of @TmlrOrg !
📢📢📢 Attention Beats Concatenation for Conditioning Neural Fields We ran A LOT of experiments to find the best way to make neural fields generalize... so you don’t have to! arxiv.org/abs/2209.10684
Congrats again, Sergey - awesome to collaborate w/ you, Stanford, and the other folks at TRI!
Proud to announce that our paper “Single-Shot Scene Reconstruction” is accepted to #CoRL2021! We use transformers and implicit representations to infer a fully editable 3D scene from a single image. Collaboration between @ToyotaResearch, @Stanford and @MIT.
Really great work - equivariance in implicit representations has been a long-standing question for me, and this is significant progress!
📢📢📢 Introducing "Vector Neurons" Want a network (and latent space) that act by construction in an equivariant way w.r.t. SO(3) transformations? All you need to do is generalize the scalar non-linearity to a vector one (e.g. Vector ReLU) cs.stanford.edu/~congyue/vnn…