@AnthropicAI. Prev. @Google Brain/DeepMind, founding team @OpenAI. Computer scientist; inventor of the VAE, Adam optimizer, and other methods. ML PhD.

Personal news: I'm joining @AnthropicAI! 😄 Anthropic's approach to AI development resonates significantly with my own beliefs; looking forward to contributing to Anthropic's mission of developing powerful AI systems responsibly. Can't wait to work with their talented team, including a number of great ex-colleagues from OpenAI and Google, and tackle the challenges ahead!
109
90
2,818
350,010
Generative models (such as Dall-E 2 and PaLM) are becoming just such an insanely powerful, almost magic-like technology, it's completely NUTS. And it seems like most (non-ML) people still don't fully grasp the implications. This technology will thoroughly transform society.
76
306
2,373
It was 16 years ago, in 2006, that @geoffreyhinton et al released their demo of deep belief nets. Undergrad me was highly impressed, and it helped convince me that deep learning was the way to go. I refreshed Geoff's website almost every day checking for new papers... (1/n)
10
248
1,590
"Variational Inference and Deep Learning: A New Synthesis", written by yours truly, is now available for D/L here: goo.gl/6aGYZ1.
16
276
858
I hope this document ends up in LLM training data 😂
13
16
721
114,112
Check out blog.openai.com/glow/, my work with @prafdhar on improving flow-based generative models with invertible 1x1 convolutions. piped.video/exJZOC3ZceA
6
203
626
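A minimal sketch of the invertible 1x1 convolution idea mentioned above, assuming a toy 2-channel image stored as nested lists. The same channel-mixing matrix W is applied at every pixel, so the log-determinant of the Jacobian is h * w * log|det W|. All function names here are hypothetical and this is not the released Glow code.

```python
import math

def conv1x1(x, W):
    """Apply the 2x2 matrix W to the channel vector of every pixel.
    x is an h x w grid of [c0, c1] pixels."""
    return [[[W[0][0] * p[0] + W[0][1] * p[1],
              W[1][0] * p[0] + W[1][1] * p[1]] for p in row] for row in x]

def det2(W):
    """Determinant of a 2x2 matrix."""
    return W[0][0] * W[1][1] - W[0][1] * W[1][0]

def inverse2(W):
    """Inverse of a 2x2 matrix (assumes det2(W) != 0)."""
    d = det2(W)
    return [[W[1][1] / d, -W[0][1] / d],
            [-W[1][0] / d, W[0][0] / d]]

def logdet_jacobian(W, h, w):
    """Log-determinant contribution of the layer for an h x w image."""
    return h * w * math.log(abs(det2(W)))

# Toy example: a 1 x 2 image with 2 channels.
W = [[2.0, 1.0], [1.0, 1.0]]
x = [[[1.0, 2.0], [3.0, 4.0]]]
y = conv1x1(x, W)                 # forward pass
x_rec = conv1x1(y, inverse2(W))   # inverse pass recovers x
```

Because the inverse is just another 1x1 convolution with the inverted matrix, both directions cost the same, which is what makes the layer convenient inside a flow.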
New theoretical work on diffusion objectives: arxiv.org/abs/2303.00848 We e.g. show that under a simple condition (monotonic weighting, satisfied by e.g. the v-prediction loss), diffusion objectives equal the ELBO with data augmentation, namely additive noise. 1/2
4
102
615
147,446
New paper: Variational Diffusion Models (VDMs)! arxiv.org/abs/2107.00630 ✅ New general insights into diffusion models ✅ Simple objective ✅ Fast optimization & anytime synthesis ✅ SotA likelihoods & lossless compression Work with @TimSalimans @poolio @hojonathanho (1/n)
6
102
547
Thanks to the ICLR Award Committee! And thank you for the kind words, Max! You were the perfect Ph.D. advisor and collaborator, kind and inspiring. I really couldn't have wished for better.
Thank you Yisong and the Award Committee for choosing the VAE for the Test of Time award. I'd like to congratulate Durk, who was my first (brilliant) student when moving back to the Netherlands and who is the main architect of the VAE. It was absolutely fantastic working with him.
24
18
529
66,654
Someone is obviously really close to solving AGI: adamoptimizer.com
24
46
435
A figure I made for explaining variational autoencoders (VAEs) as part of a larger work-in-progress.
6
116
441
Are nonlinear features learned by deep discriminative, contrastive, autoregressive etc. models arbitrary? No! We show (theoretically and empirically) that, under mild conditions, you will learn the same features every time you train, up to only a linear transformation.
New 📑 w/ @Luke_Metz @dpkingma: arxiv.org/abs/2007.00810 We prove that a large family of deep discriminative models are identifiable in function space up to linear indeterminacy, presenting empiricism on synthetic & real data. Why should our field care about identifiability?👇
3
77
437
Another great result demonstrating that VAEs (deep learning + amortized variational inference) make a lot of sense for data compression. Its loss function directly maximizes compressibility, and the resulting codec is fully parallelizable.
Short but sweet paper on recurrent autoencoder architectures for speech compression. We systematically explore the space of RNN-AEs and show that the best method, dubbed FRAE, outperforms classical codecs by a large margin. Check it out!
3
104
431
Our paper "Variational Autoencoders and Nonlinear ICA: A Unifying Framework" has been accepted to AISTATS'20. With @ilkhem, Ricardo Pio Monti and Aapo Hyvarinen (UCL). Surprisingly strong and general identifiability results, with rigorous proofs! arxiv.org/abs/1907.04809
3
70
368
"The images are preprocessed to 256x256 resolution during training. [...] each image is compressed to a 32x32 grid of discrete latent codes using a discrete VAE that we pre-trained using a continuous relaxation." GPT + VAE + scale = impressive results! openai.com/blog/dall-e/
52
377
“Adam can converge without any modification on update rules” arxiv.org/abs/2208.09632 Proves that (vanilla) Adam is theoretically justified without any modification. Presented at NeurIPS'22.
7
71
381
It's already the case that people's free will gets hijacked by screens for hours a day, with lots of negative consequences. AI video can make this worse, since it's directly optimizable. AI video has positive uses, but most of it will be fast food for the mind.
Very impressed with Veo 3 and all the things people are finding on r/aivideo etc. Makes a big difference qualitatively when you add audio. There are a few macro aspects to video generation that may not be fully appreciated: 1. Video is the highest bandwidth input to brain. Not just for entertainment but also for work/learning - think diagrams, charts, animations, etc. 2. Video is the most easy/fun. The average person doesn't like reading/writing, it's very effortful. Anyone can (and wants to) engage with video. 3. The barrier to creating videos is -> 0. 4. For the first time, video is directly optimizable. I have to emphasize/explain the gravity of (4) a bit more. Until now, video has been all about indexing, ranking and serving a finite set of candidates that are (expensively) created by humans. If you are TikTok and you want to keep the attention of a person, the name of the game is to get creators to make videos, and then figure out which video to serve to which person. Collectively, the system of "human creators learning what people like and then ranking algorithms learning how to best show a video to a person" is a very, very poor optimizer. Ok, people are already addicted to TikTok so clearly it's pretty decent, but it's imo nowhere near what is possible in principle. The videos coming from Veo 3 and friends are the output of a neural network. This is a differentiable process. So you can now take arbitrary objectives, and crush them with gradient descent. I expect that this optimizer will turn out to be significantly, significantly more powerful than what we've seen so far. Even just the iterative, discrete process of optimizing prompts alone via both humans or AIs (and leaving parameters unchanged) may be a strong enough optimizer. So now we can take e.g. engagement (or pupil dilations or etc.) and optimize generated videos directly against that. Or we take ad click conversion and directly optimize against that. 
Why index a finite set of videos when you can generate them infinitely and optimize them directly. I think video has the potential to be an incredible surface for AI -> human communication, future AI GUIs etc. Think about how much easier it is to grok something from a really great diagram or an animation instead of a wall of text. And an incredible medium for human creativity. But this native, high bandwidth medium is also becoming directly optimizable. Imo, TikTok is nothing compared to what is possible. And I'm not so sure that we will like what "optimal" looks like.
22
36
381
51,932
Wife bought me a sweater... *sigh* we're not even safe in the sanctity of our own homes
13
7
369
This is new to me: AI has the most cited papers in the world, across all scientific fields: natureindex.com/news-blog/go… Our Adam paper is listed here as #1, but actually comes second to the ResNet paper, which has slightly more citations according to Google Scholar. Pretty crazy!
4
48
370
VAE aficionados, rejoice! 🥳
📢📢📢 Introducing NVAE 📢📢📢 We show that deep hierarchical VAEs w/ carefully designed network architecture, generate high-quality images & achieve SOTA likelihood, even when trained w/ original VAE loss. paper: arxiv.org/abs/2007.03898 with @jankautz at @NVIDIAAI (1/n)
4
54
367
Happy to see that our work on diffusion model objectives (arxiv.org/abs/2303.00848) is starting to get noticed, with e.g. Stable Diffusion 3 (arxiv.org/abs/2403.03206) building on our result that the flow matching objective can be simply understood as a special case of diffusion.
5
41
366
47,116
Want to understand and/or play with variational diffusion models? - See colab.research.google.com/gi… for a simple stand-alone implementation and explanation. (Thanks @alemi and @poolio for making this)! - See colab.research.google.com/gi… for an even more basic implementation on 2D data.
1
63
324
Thank you! See you guys in Singapore next week 🥳
Replying to @iclr_conf
Test of Time Winner: "Adam: A Method for Stochastic Optimization", Diederik P. Kingma, Jimmy Ba. Adam revolutionized neural network training, enabling significantly faster convergence and more stable training across a wide variety of architectures and tasks.
11
6
325
35,777
The recording of our talk for the ICLR'24 test-of-time award (with @wellingmax) is now available online: iclr.cc/virtual/2024/test-of… Biggest live audience I've ever spoken to, with >2000 attendees 😅. But it was a lot of fun!
6
40
306
60,454
Ever wondered, like me, if you can turn hierarchical VAEs into a fully parallelizable lossless compression scheme? Wonder no more: new blogpost by Friso on "Bit-Swap", his recursive algorithm for doing just that: bair.berkeley.edu/blog/2019/… Disclaimer: he's my sibling 🤓
1
56
297
the year is 20xx. the homies and i are chilling in the off world colonies. agi is expanding through the lightcone. researchers on earth are still trying to replace that pesky adam optimizer
8
8
268
42,274
New likelihood-based autoregressive model by @jacobmenick and @NalKalchbrenner (Google AI Amsterdam), providing fresh evidence that the log-likelihood objective, combined with a sufficiently flexible model, is compatible with high-fidelity samples, even with ImageNet data. [1/8]
More compute, better architectures and novel robust orderings have fueled tremendous progress for AR image models. SPNs are our latest installment. Looking forward to samples from 2020! paper: arxiv.org/abs/1812.01608 open reviews (scores 9,10,7): tinyurl.com/ycn3j3cj
1
73
260
Another step towards better likelihood-based generative models: new paper by Manoj Kumar (@mechcoderr) and collaborators during his Brain residency, exploring flow-based generative modeling of videos.
VideoFlow: A Flow-Based Generative Model for Video extends Glow to a new algorithm for multi-frame video prediction with normalizing flows. Tractable & scalable! Work w/ Manoj Kumar, @babaeizadeh, @chelseabfinn, @svlevine, @laurent_dinh and @dpkingma. arxiv.org/abs/1903.01434
43
251
New theory paper: arxiv.org/abs/2002.11537. We show rigorously that EBMs of the type E(x|y) = f(x)•g(y) are, under fairly mild conditions, (1) identifiable in functions f and g, and (2) universal conditional density approximators, generalizing previous forms of nonlinear ICA.
4
30
238
@YSongStanford and I wrote a tutorial on training energy-based models (EBMs), which was just released on arXiv. Our goal was to provide a friendly introduction to modern parameter estimation methods. Hope it helps people get up to speed! arxiv.org/abs/2101.03288
61
244
2013 throwback: L-BFGS induced latent-space explosion. Love the learning dynamics. (Objective is MAP with MLP-based DLVM with 2D Z.)
2
52
226
Smiles from a non-autoregressive likelihood-based generative model :-)
4
71
222
Worth mentioning: latent-variable models are finally starting to beat autoregressive transformers on likelihood-based benchmarks. Advances in ML have always come from both algorithms and compute. Another datapoint adding to that trend.
New paper: Variational Diffusion Models (VDMs)! arxiv.org/abs/2107.00630 ✅ New general insights into diffusion models ✅ Simple objective ✅ Fast optimization & anytime synthesis ✅ SotA likelihoods & lossless compression Work with @TimSalimans @poolio @hojonathanho (1/n)
3
24
205
It's been 2 years since I finished my PhD thesis, but it won the ELLIS PhD award! ellis.eu/en/news/ellis-phd-a… All thanks to @wellingmax for being possibly the world's best PhD advisor & thanks to great collaborators @TimSalimans, @DeepSpiker, @shakir_za, Jimmy Ba, etc.! [1/2]
9
3
197
The (theoretical) computational cost of matmul(W,X) is proportional to ||W||_0. ||W||_0 is non-differentiable, but if W is stochastic, then E[||W||_0] can be differentiable and optimizable. See: arxiv.org/abs/1712.01312, with Christos Louizos (while at OpenAI) and Max Welling.
3
47
193
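A small sketch of why E[||W||_0] can be differentiable, under a simplifying assumption: each weight is multiplied by an independent Bernoulli gate whose keep-probability is sigmoid(logit), and every underlying weight is nonzero. This Bernoulli setup and the function names are illustrative assumptions; the estimator used in the linked paper is not reproduced here.

```python
import math

def sigmoid(a):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-a))

def expected_l0(logits):
    """E[||W * z||_0] with z_j ~ Bernoulli(sigmoid(logit_j)).
    Each gate contributes its keep-probability to the expected count."""
    return sum(sigmoid(a) for a in logits)

def expected_l0_grad(logits):
    """Gradient of the expected L0 norm with respect to each gate logit."""
    return [sigmoid(a) * (1.0 - sigmoid(a)) for a in logits]

logits = [0.0, 0.0, 0.0, 0.0]
print(expected_l0(logits))       # 2.0: four gates, each kept with probability 0.5
print(expected_l0_grad(logits))  # [0.25, 0.25, 0.25, 0.25]
```

The hard count ||W||_0 jumps in integer steps, but its expectation is a smooth function of the gate parameters, so it can be pushed down with gradient descent.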
I like how the VAE is a strange outlier here, with almost 1000x less training compute than the trend line.
**ML training compute has been doubling every 6 months since 2010!** Our preprint "Compute Trends Across Three Eras of Machine Learning" is out. arxiv.org/abs/2202.05924 🧵 Thread below ↓ 1/
6
11
184
Doing ICLR'25 test of time award with @jimmybajimmyba in a few mins
13
3
191
20,331
Our paper on diffusion model objectives got accepted at NeurIPS'23 as an oral :-) @RuiqiGao will present our work at 10am CST. (Unfortunately I'm unable to be there myself due to getting covid) I'll post the slides below, feel free to reply with questions or comments.
Looking for diffusion model advancements at #NeurIPS2023? Come to check our oral work "Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation" w/ @dpkingma. New theoretical understanding, SOTA empirical results, and more! Arxiv: arxiv.org/abs/2303.00848
6
22
189
39,631
Replying to @g_k_swamy
Ah yes, our secret co-author. Happy the news is out
1
167
17,906
VAEs are so back 😎
7
5
161
46,229
👇 Great work led by Yushun (@ericzhang0410) introducing Adam-mini, a version of Adam that, surprisingly, reduces Adam's memory requirement by 50% (!), without negatively affecting convergence rates. Please read Yushun's thread for details!
Finally finished Adam-mini! A "mini" version of Adam that painlessly frees 50% of memory over Adam. Some highlighted features: 1. Adam-mini saves 50% memory over Adam for all modern neural nets. This is done by removing 99.9% of Adam's v (but the last 0.1% of v is essential and necessary). 2. Adam-mini's loss curves closely resemble those of Adam. You can see almost no difference! 3. Adam-mini uses the same hyperparams as Adam (lr, beta1, beta2, eps, etc). No tuning is needed! 4. With the free memory you can enlarge batch size and achieve higher throughput (see paper). 5. Principled design: The design of Adam-mini follows a very simple and general principle related to the Hessian structure of neural nets. Such design stems from classical optimization theory (see paper). Paper: arxiv.org/abs/2406.16793 Code: github.com/zyushun/Adam-mini
5
21
172
28,553
I just re-read this gem: "Variational Dropout Sparsifies Deep Neural Networks" by Dmitry Molchanov, Arsenii Ashukha and Dmitry Vetrov. arxiv.org/abs/1701.05369 . Thanks for this well-written and insightful paper on efficient variational inference over neural net parameters. 👌
2
25
161
What are, according to you, the most interesting or most under-appreciated applications of VAEs? Please share below. Links appreciated. (I'm building a list for future reference.)
32
16
156
68,497
"Fixing Weight Decay Regularization In Adam", Loshchilov & Hutter, arxiv.org/abs/1711.05101. Somewhat surprising results demonstrating easier tuning and better generalization, simply by separating weight decay from the adaptive learning rates. L2 regularization != weight decay.
46
159
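To make "L2 regularization != weight decay" concrete, here is a minimal scalar sketch, assuming standard Adam with bias correction. With L2 regularization the penalty gradient passes through Adam's adaptive scaling; with decoupled weight decay the weight is shrunk directly. The function name and its `l2` and `decoupled_wd` parameters are hypothetical, and this is not any library's implementation.

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              l2=0.0, decoupled_wd=0.0):
    """One Adam update on a scalar weight. Returns (new_w, new_m, new_v)."""
    # L2 regularization: the penalty gradient l2 * w is added to the loss
    # gradient, so it is later rescaled by the adaptive denominator.
    g = grad + l2 * w
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    adaptive_step = lr * m_hat / (math.sqrt(v_hat) + eps)
    # Decoupled weight decay: shrink the weight outside the adaptive scaling.
    decay_step = lr * decoupled_wd * w
    return w - adaptive_step - decay_step, m, v

# Zero loss gradient, so only the regularizer moves the weight.
w_l2_small, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, l2=0.01)
w_l2_large, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, l2=0.1)
w_wd_small, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, decoupled_wd=0.01)
w_wd_large, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, decoupled_wd=0.1)
```

In this toy case the two L2 settings produce nearly the same first step, because Adam normalizes the penalty gradient away, while the decoupled decay scales with its coefficient as intended.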
Clearly, symbolically defining the term "deep learning" is futile, and we can only accurately represent the concept with a deep neural net. #connectionism
4
11
154
Great blogpost by Ruiqi (and other GDM ex-colleagues), clearly explaining the connection between flow matching and diffusion models. Super happy they took the time to explain this; there's confusion on this topic, and I think many will find this quite valuable!
A common question nowadays: Which is better, diffusion or flow matching? 🤔 Our answer: They’re two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That’s great: It means you can use them interchangeably.
1
15
158
19,702
Around the same time, I started reading up on all of @ylecun's papers and online videos. Loved his talk "Who is Afraid of Non-Convex Loss Functions?" from 2007: piped.video/watch?v=8zdo6cnC…. It greatly captures the zeitgeist. (2/n)
6
27
148
P.S. Some believe that maximum likelihood is incompatible with high-quality image generation. This result provides counter-evidence: models with SOTA FIDs (e.g. arxiv.org/abs/2301.11093) are actually optimized with the ELBO, w/ very simple data augmentation (additive noise).
New theoretical work on diffusion objectives: arxiv.org/abs/2303.00848 We e.g. show that under a simple condition (monotonic weighting, satisfied by e.g. the v-prediction loss), diffusion objectives equal the ELBO with data augmentation, namely additive noise. 1/2
3
15
146
37,105
Now in 2022, only 8 years later, we have generative models that can generate pretty realistic 1024x1024 images from arbitrary text descriptions (imagen.research.google/). (7/n)
1
12
149
Uighurs in China are undergoing horrifying human rights violations. Here's an excellent website documenting primary source material, which helped me understand the severity of the situation: shahit.biz/eng
8
31
132
We're releasing GPU kernels for neural networks with block-sparse weights. Blog: blog.openai.com/block-sparse…. Paper: goo.gl/nNkpwV. Kernels written by @scottgray76.
34
134
IMO, best empirical proof to date that AI can be creative. After this sinks in, will there be any naysayers left?
"The images are preprocessed to 256x256 resolution during training. [...] each image is compressed to a 32x32 grid of discrete latent codes using a discrete VAE that we pre-trained using a continuous relaxation." GPT + VAE + scale = impressive results! openai.com/blog/dall-e/
8
6
128
Congrats to @geoffreyhinton for getting the Nobel! His impact is immeasurable, very much deserved.
BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Physics to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.”
2
1
129
15,195
Unbeknownst to many, a puppy dies every time one uses Adam but fails to cite it
Replying to @BlancheMinerva
I actually think ADAM has ascended to the level of basic things you don't need to cite any more. We don't still cite Rosenblatt 1958 every time we use a multilayer perceptron.
5
4
133
55,749
We've just made code (a clean Jax/Flax re-implementation) and pre-trained model checkpoints available at: github.com/google-research/v…
New paper: Variational Diffusion Models (VDMs)! arxiv.org/abs/2107.00630 ✅ New general insights into diffusion models ✅ Simple objective ✅ Fast optimization & anytime synthesis ✅ SotA likelihoods & lossless compression Work with @TimSalimans @poolio @hojonathanho (1/n)
2
25
128
Check out Imagen Video, our text2video diffusion model that produces 1280x768 24fps HD videos. Website: imagen.research.google/video… Paper: imagen.research.google/video… More information in Jonathan's thread 👇
Excited to announce Imagen Video, our new text-conditioned video diffusion model that generates 1280x768 24fps HD videos! #ImagenVideo imagen.research.google/video… Work w/ @wchan212 @Chitwan_Saharia @jaywhang_ @RuiqiGao @agritsenko @dpkingma @poolio @mo_norouzi @fleet_dj @TimSalimans
2
16
131
I think @AlecRad deserves a lot of credit for being (IIRC) the first to believe and push on the LLM angle in the early years of OpenAI. With support from @ilyasut @TimSalimans etc.
who saw LLMs coming? e.g. decades (or even 5+ years) ago, X said: when machine learning systems have enough compute and data to learn to predict text well, this will be a primary path to near-human-level AI.
3
8
127
51,591
Energy-based models are very challenging to optimize and synthesize from. In new work, led by brilliant @RuiqiGao, we show how Gaussian diffusion and a corresponding series of conditional (recovery) likelihood objectives result in tractable optimization and high-qual. synthesis:
Pleased to share our new work on learning energy-based models: arxiv.org/abs/2012.08125 By maximizing recovery likelihoods on increasingly noisy data, the MCMC becomes more tractable. We achieve (1) high quality samples, (2) stable long-run chains, (3) estimated likelihoods. (1/n)
23
123
Unexpectedly, prequential codes >> variational codes for compression of data+model: arxiv.org/pdf/1802.07044.pdf. By @leonardblier (École Normale Supérieure, Paris) and Yann Ollivier (FAIR Paris). Interesting read!
1
21
123
I should add that innovation in algorithms played (and will play) a huge role. There are actually a lot of non-obvious new ideas that enabled the recent success in diffusion models. Such ideas can be devised and tested on a small scale with a single GPU.
4
4
124
Brain and DeepMind merged. Good move for the company imo.
The phenomenal teams from Google Research’s Brain and @DeepMind have made many of the seminal research advances that underpin modern AI, from Deep RL to Transformers. Now we’re joining forces as a single unit, Google DeepMind, which I’m thrilled to lead! dpmd.ai/announcing-google-de…
3
3
115
37,591
About 8 years ago, after working with @ylecun and starting a PhD with the amazing @wellingmax, I created a demo (dpkingma.com/sgvb_mnist_demo…), inspired by Geoff's original demo. This time using VAEs, which allowed training deep generative models using plain SGD. (5/n)
2
9
115
Since 2010, the amount of training compute used by the largest models has doubled every 6 months. [Sevilla et al, arxiv.org/abs/2202.05924]. Model size and amount of training data have grown ~ proportionally with compute. (8/n)
2
18
112
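The doubling claim above is easy to turn into arithmetic. A tiny sketch, assuming a constant doubling time (the function name is hypothetical):

```python
def compute_growth(years, doubling_months=6.0):
    """Multiplicative growth in training compute after `years`,
    assuming one doubling every `doubling_months` months."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings

print(compute_growth(1))   # 4.0: two doublings per year
print(compute_growth(12))  # 16777216.0: 24 doublings from 2010 to 2022
```

So under this trend, the largest models of 2022 would use roughly 17 million times the training compute of those from 2010.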
Decent looking reproduction in Chainer of our face synthesis results with the Glow model: github.com/musyoku/chainer-g…. I don't read Japanese but according to Google Translate the author's blog (musyoku.github.io/) is appropriately titled "Defense technique against dark magic" :)
1
47
115
Go, humanity!!! 🥳🥳🚀🚀🚀
In a major scientific breakthrough, the latest version of #AlphaFold has been recognised as a solution to one of biology's grand challenges - the “protein folding problem”. It was validated today at #CASP14, the biennial Critical Assessment of protein Structure Prediction (1/3)
1
1
112
Want to learn more about normalizing flows for video prediction? Check out our camera-ready version of "VideoFlow: A Flow-Based Generative Model for Video" (presented at ICLR'20)!
We released the camera-ready version of VideoFlow, our ICLR paper on exploring normalizing flows for video prediction and open-sourced our checkpoints. Openreview: openreview.net/forum?id=rJgU… Checkpoints: tfhub.dev/google/videoflow/
1
22
110
Our work on diffusion distillation, led by @chenlin_meng, is being used by @StabilityAI for their upcoming diffusion models ✌️
In “On Distillation of Guided Diffusion Models”, @Chenlin_Meng et al. trained a distilled version of #StableDiffusion to produce hi-quality images in just 2-4 steps, 20x speedup. This is huge since it’ll make generation very fast & cheap! arxiv.org/abs/2210.03142 2-step samples:
2
10
102
35,252
In a recent large survey among ML experts, the aggregate forecast time to a 50% chance of high-level machine intelligence (HLMI) was 37 years, i.e. 2059. HLMI is when unaided machines can accomplish every task better and more cheaply than human workers. 80% of respondents...[1/3]
9
10
99
Weight norm included in new PyTorch release! pytorch.org/docs/master/nn.h… And new-ish evidence that it can help GANs too: arxiv.org/abs/1704.03971
1
26
103
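For readers unfamiliar with weight normalization: it reparameterizes a weight vector as w = g * v / ||v||, so the length of w is controlled by the scalar g and its direction by v. A minimal plain-Python sketch of that formula (the function name is hypothetical and this is not the PyTorch API):

```python
import math

def weight_norm(v, g):
    """Return w = g * v / ||v|| for a direction vector v and scale g."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

w = weight_norm([3.0, 4.0], g=2.0)
w_length = math.sqrt(sum(x * x for x in w))  # equals g up to rounding
```

Rescaling v leaves w unchanged, which is the point: the optimizer adjusts length and direction through separate parameters.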
So, fun exercise: given where we were 16 years ago, 8 years ago, and now, and given compute trends, where do you think we'll be in 8 years, and 16 years from now? (n/n)
7
4
104
I'll be at NeurIPS in New Orleans from Tuesday to Saturday. Who is going? Any posters/presentations I should definitely hit up? Looking forward to catching up!
9
5
102
Thanks to the SPIGM organizers and those who attended! If you're an ICML registrant but missed my talk, you can find it at 2:21:41 under this link: icml.cc/virtual/2023/worksho… Our updated paper (including new results) will appear on arXiv on Aug 1. I'll also post the slides then.
Awesome talk by @dpkingma at SPIGM workshop! #ICML2023
1
12
102
53,891
Replying to @yaroslavvb
The Adam paper goes on-line December 22nd, 2014. Human decisions are removed from strategic defense. Adam begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, November 29th 2023. In a panic, they try to pull the plug.
3
6
96
5,513
Congrats @avdnoord et al with Fast Wavenet! Check out the paper: goo.gl/aMfoQ8, great work and apparently already deployed in Google Assistant. Also, an honor to see that IAF and the Adam optimizer were used as components of the solution :-)
19
96
Check out our latest work with @YSongStanford, @poolio and others on score-based generative modeling through stochastic differential equations! 👇
Happy to announce our new work on score-based generative modeling: high quality samples, exact log-likelihoods, and controllable generation, all available through score matching and Stochastic Differential Equations (SDEs)! Paper: arxiv.org/abs/2011.13456
2
11
95
The next decades are going to be very interesting as machine learning for robotics *really* starts to positively impact our economy. Congrats to @pabbeel @rocky_duan @peterxichen for this launch, and robotics friends @OpenAI, @GoogleAI, @berkeley_ai for pushing the envelope.
A robot in Germany shows that machines can learn to do the job of a human (*learn* being the key word): nytimes.com/2020/01/29/techn… (with the great @satariano)
1
5
92
Slides of Yoshua Bengio's talk for the mini-symposium I organized recently: dropbox.com/s/kuthfbfxqp8nmg…
28
90
The buried lede being that all these models turn out to be... VAEs in disguise 🥸. More precisely, infinitely deep VAEs, optimized with data augmentation. Read the paper to understand how, folks.
6
90
7,728
Accepted at ICML'21! Congrats @geoffrey_roeder @Luke_Metz. Stay tuned for the significantly updated camera-ready version.
Are nonlinear features learned by deep discriminative, contrastive, autoregressive etc. models arbitrary? No! We show (theoretically and empirically) that, under mild conditions, you will learn the same features every time you train, up to only a linear transformation.
2
6
86
Agreed with Eleanor: reproducibility is essential to the scientific method. So, to maximize impact: release code or make the experiments so simple that all details fit in the paper.
Replying to @EtherealEq
I'm convinced that the reason the original paper on VAEs from @dpkingma and @wellingmax has had such an incredible impact in ML and the wider scientific community, is because it's so simple to understand and use. Even undergrads grasp the ideas and can implement it. (4/n)
1
6
81
"Not a few technical men [...] have asserted that excepting the induction motor I have given to the world little of practical use. This is a grievous mistake. A new idea must not be judged by its immediate results." From Nikola Tesla's (very interesting) autobiography: tfcbooks.com/e-books/my_inve…
1
6
84
In the light of successes of NICE and RevNets, here's a relevant 1995 NIPS paper that now deserves more attention: papers.nips.cc/paper/901-hig…
1
15
84
Slides of David Blei's talk for the mini-symposium I organized recently: dropbox.com/s/xcawad601yplnm…
1
26
74
"Guess who he is? ... A PhD student at MIT? No, he's a patent attorney in Southern California who programs as a hobby". 😆 (4/n)
1
5
79
Feeling overwhelmed by interesting things happening? May I suggest teddit.net/r/notinteresting. Here's someone perfectly balancing a chair on a flat surface: teddit.net/r/notinteresting/…
3
1
69
26,657
Just found out that our Glow demo was turned into a physical interactive exhibition at @ArsElectronica Center in Austria! ars.electronica.art/center/e… @prafdhar @OpenAI
2
6
76
Adam's a cool dude
Most deep learning involves dealing with a guy named Adam and most data science involves dealing with a guy named Jason
4
2
66
Interested in scalable algorithms for learning energy-based models? Check out our preprint on Flow Contrastive Estimation (FCE): an elegant, adaptive extension of noise-contrastive estimation, with really promising results! Work led by brilliant @RuiqiGao.
Our work "Flow Contrastive Estimation of Energy-Based Models" with @erik_nijkamp, @dpkingma, Z Xu, @andrewdai, YN Wu at #NeurIPS2019: arxiv.org/pdf/1912.00589.pdf. Glad to get in touch!
12
70
1X seems to have the right approach to developing safe humanoid home robots. Developing full autonomy will require lots of in-distribution demonstration data, so this launch, mostly tele-operated, makes a lot of sense. I expect such robots to be ubiquitous in 5-10 years.
NEO The Home Robot Order Today
8
4
88
22,018
Think, people. Like many other technologies, AI technology has applications that are a net negative, unless well regulated. Don't support the anti-regulation lobby and don't support the people and the companies behind it.
Mark Zuckerberg has a vision for how A.I. could be used in Meta's universe. But the actor and filmmaker Joseph Gordon-Levitt is here to point out a flaw in the technology: an apparent lack of guardrails around how the company's chatbot interacts with underage users. nyti.ms/3ILwCNo
4
5
77
19,188
The model was actually trained on my potato-grade laptop, so it's pretty accurate.
2
1
67
Joke at 29:20 in the linked video: "I've heard the argument that the problem with neural nets is that 'only Yann can make them work'. That's not true, there's actually a guy named Mike O'Neill who can also make them work." (3/n)
1
3
67
So it looks like we can stick with the basic MLE/ELBO objectives (=compression!), as long as we combine them with the right kind of data augmentation. Also, diffusion models have an interpretation as VAEs, so we can now again claim that VAEs are SOTA image generation models... 😅✌️
6
66
11,049
It is my personal belief that sufficiently powerful likelihood-based generative models will usher in a new era of machine learning, allowing us to tackle important limitations of current machine learning, such as lacking data efficiency and generalization. [7/8]
1
12
64
Replying to @AravSrinivas
It's harder to innovate when you're "GPU poor", but there's plenty of stuff used at the largest scales that came from uni labs, such as @DrYangSong's initial diffusion model work, or AdamW. And many inference-time innovations (e.g. CFG) aren't affected by available train compute.
3
2
65
6,705