Durk Kingma (@dpkingma)

Pinned Tweet

Durk Kingma @dpkingma

1 Oct 2024

Personal news: I'm joining @AnthropicAI! 😄 Anthropic's approach to AI development resonates significantly with my own beliefs; looking forward to contributing to Anthropic's mission of developing powerful AI systems responsibly. Can't wait to work with their talented team, including a number of great ex-colleagues from OpenAI and Google, and tackle the challenges ahead!

109

90

2,818

350,010

Durk Kingma @dpkingma

9 Apr 2022

Generative models (such as Dall-E 2 and PaLM) are becoming just such an insanely powerful, almost magic-like technology, it's completely NUTS. And it seems like most (non-ML) people still don't fully grasp the implications. This technology will thoroughly transform society.

76

306

2,373

Durk Kingma @dpkingma

29 May 2022

It was 16 years ago, in 2006, that @geoffreyhinton et al. released their demo of deep belief nets. Undergrad me was highly impressed, and it helped convince me that deep learning was the way to go. I refreshed Geoff's website almost every day checking for new papers... (1/n)

10

248

1,590

Durk Kingma @dpkingma

30 Sep 2017

"Variational Inference and Deep Learning: A New Synthesis", written by yours truly, is now available for D/L here: goo.gl/6aGYZ1.

16

276

858

Durk Kingma @dpkingma

10 Dec 2023

I hope this document ends up in LLM training data 😂

13

16

721

114,112

Durk Kingma @dpkingma

9 Jul 2018

Check out blog.openai.com/glow/, my work with @prafdhar on improving flow-based generative models with invertible 1x1 convolutions. piped.video/exJZOC3ZceA

6

203

626

Durk Kingma @dpkingma

3 Mar 2023

New theoretical work on diffusion objectives: arxiv.org/abs/2303.00848 We e.g. show that under a simple condition (monotonic weighting, satisfied by e.g. the v-prediction loss), diffusion objectives equal the ELBO with data augmentation, namely additive noise. 1/2

4

102

615

147,446

Durk Kingma @dpkingma

6 Jul 2021

New paper: Variational Diffusion Models (VDMs)! arxiv.org/abs/2107.00630 ✅ New general insights into diffusion models ✅ Simple objective ✅ Fast optimization & anytime synthesis ✅ SotA likelihoods & lossless compression Work with @TimSalimans @poolio @hojonathanho (1/n)

6

102

547

Durk Kingma @dpkingma

7 May 2024

Thanks to the ICLR Award Committee! And thank you for the kind words, Max! You were the perfect Ph.D. advisor and collaborator, kind and inspiring. I really couldn't have wished for better.

Max Welling @wellingmax

7 May 2024

Thank you Yisong and the Award Committee for choosing the VAE for the Test of Time award. I'd like to congratulate Durk, who was my first (brilliant) student when moving back to the Netherlands and who is the main architect of the VAE. It was absolutely fantastic working with him.

24

18

529

66,654

Durk Kingma @dpkingma

31 Aug 2019

Someone is obviously really close to solving AGI: adamoptimizer.com

24

46

435

Durk Kingma @dpkingma

29 Jul 2017

A figure I made for explaining variational autoencoders (VAEs) as part of a larger work-in-progress.

6

116

441

Durk Kingma @dpkingma

6 Jul 2020

Are nonlinear features learned by deep discriminative, contrastive, autoregressive etc. models arbitrary? No! We show (theoretically and empirically) that, under mild conditions, you will learn the same features every time you train, up to only a linear transformation.

Geoffrey Roeder @geoffrey_roeder

6 Jul 2020

New 📑 w/ @Luke_Metz @dpkingma: arxiv.org/abs/2007.00810 We prove that a large family of deep discriminative models are identifiable in function space up to linear indeterminacy, presenting empiricism on synthetic & real data. Why should our field care about identifiability?👇

3

77

437

Durk Kingma @dpkingma

27 Jan 2020

Another great result demonstrating that VAEs (deep learning + amortized variational inference) make a lot of sense for data compression. Its loss function directly maximizes compressibility, and the resulting codec is fully parallelizable.

Taco Cohen

@TacoCohen

27 Jan 2020

Short but sweet paper on recurrent autoencoder architectures for speech compression. We systematically explore the space of RNN-AEs and show that the best method, dubbed FRAE, outperforms classical codecs by a large margin. Check it out!

3

104

431

Durk Kingma @dpkingma

8 Jan 2020

Our paper "Variational Autoencoders and Nonlinear ICA: A Unifying Framework" has been accepted to AISTATS'20. With @ilkhem, Ricardo Pio Monti and Aapo Hyvarinen (UCL). Surprisingly strong and general identifiability results, with rigorous proofs! arxiv.org/abs/1907.04809

3

70

368

Durk Kingma @dpkingma

5 Jan 2021

"The images are preprocessed to 256x256 resolution during training. [...] each image is compressed to a 32x32 grid of discrete latent codes using a discrete VAE that we pre-trained using a continuous relaxation." GPT + VAE + scale = impressive results! openai.com/blog/dall-e/

DALL·E: Creating images from text

We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language.

52

377

Durk Kingma @dpkingma

3 Dec 2022

“Adam can converge without any modification on update rules” arxiv.org/abs/2208.09632 Proves that (vanilla) Adam is theoretically justified without any modification. Presented at NeurIPS'22.

7

71

381

Durk Kingma @dpkingma

3 Jun 2025

It's already the case that people's free will gets hijacked by screens for hours a day, with lots of negative consequences. AI video can make this worse, since it's directly optimizable. AI video has positive uses, but most of it will be fast food for the mind.

Andrej Karpathy

@karpathy

2 Jun 2025

Very impressed with Veo 3 and all the things people are finding on r/aivideo etc. Makes a big difference qualitatively when you add audio. There are a few macro aspects to video generation that may not be fully appreciated: 1. Video is the highest bandwidth input to brain. Not just for entertainment but also for work/learning - think diagrams, charts, animations, etc. 2. Video is the most easy/fun. The average person doesn't like reading/writing, it's very effortful. Anyone can (and wants to) engage with video. 3. The barrier to creating videos is -> 0. 4. For the first time, video is directly optimizable. I have to emphasize/explain the gravity of (4) a bit more. Until now, video has been all about indexing, ranking and serving a finite set of candidates that are (expensively) created by humans. If you are TikTok and you want to keep the attention of a person, the name of the game is to get creators to make videos, and then figure out which video to serve to which person. Collectively, the system of "human creators learning what people like and then ranking algorithms learning how to best show a video to a person" is a very, very poor optimizer. Ok, people are already addicted to TikTok so clearly it's pretty decent, but it's imo nowhere near what is possible in principle. The videos coming from Veo 3 and friends are the output of a neural network. This is a differentiable process. So you can now take arbitrary objectives, and crush them with gradient descent. I expect that this optimizer will turn out to be significantly, significantly more powerful than what we've seen so far. Even just the iterative, discrete process of optimizing prompts alone via both humans or AIs (and leaving parameters unchanged) may be a strong enough optimizer. So now we can take e.g. engagement (or pupil dilations or etc.) and optimize generated videos directly against that. Or we take ad click conversion and directly optimize against that. 
Why index a finite set of videos when you can generate them infinitely and optimize them directly. I think video has the potential to be an incredible surface for AI -> human communication, future AI GUIs etc. Think about how much easier it is to grok something from a really great diagram or an animation instead of a wall of text. And an incredible medium for human creativity. But this native, high bandwidth medium is also becoming directly optimizable. Imo, TikTok is nothing compared to what is possible. And I'm not so sure that we will like what "optimal" looks like.

22

36

381

51,932

Durk Kingma @dpkingma

31 Jul 2023

Sneak peek: - Paper: drive.google.com/file/d/1jIL… - Slides: drive.google.com/file/d/1rle…

4

69

378

52,334

Durk Kingma @dpkingma

23 May 2021

Wife bought me a sweater... *sigh* we're not even safe in the sanctity of our own homes

13

7

369

Durk Kingma @dpkingma

15 Oct 2020

This is new to me: AI has the most cited papers in the world, across all scientific fields: natureindex.com/news-blog/go… Our Adam paper is listed here as #1, but is actually second to the ResNet paper, which has a few more citations according to Google Scholar. Pretty crazy!

4

48

370

Durk Kingma @dpkingma

9 Jul 2020

VAE aficionados, rejoice! 🥳

Arash Vahdat

@ArashVahdat

9 Jul 2020

📢📢📢 Introducing NVAE 📢📢📢 We show that deep hierarchical VAEs w/ carefully designed network architecture, generate high-quality images & achieve SOTA likelihood, even when trained w/ original VAE loss. paper: arxiv.org/abs/2007.03898 with @jankautz at @NVIDIAAI (1/n)

4

54

367

Durk Kingma @dpkingma

11 Mar 2024

Happy to see that our work on diffusion model objectives (arxiv.org/abs/2303.00848) is starting to get noticed, with e.g. Stable Diffusion 3 (arxiv.org/abs/2403.03206) building on our result that the flow matching objective can be simply understood as a special case of diffusion.

Understanding Diffusion Objectives as the ELBO with Simple Data...

To achieve the highest perceptual quality, state-of-the-art diffusion models are optimized with objectives that typically look very different from the maximum likelihood and the Evidence Lower...

5

41

366

47,116

Durk Kingma @dpkingma

16 Sep 2022

Want to understand and/or play with variational diffusion models? - See colab.research.google.com/gi… for a simple stand-alone implementation and explanation. (Thanks @alemi and @poolio for making this)! - See colab.research.google.com/gi… for an even more basic implementation on 2D data.

SimpleDiffusionColab.ipynb

Run, share, and edit Python notebooks

colab.research.google.com

1

63

324

Durk Kingma @dpkingma

15 Apr 2025

Thank you! See you guys in Singapore next week 🥳

ICLR @iclr_conf

14 Apr 2025

Replying to @iclr_conf

Test of Time Winner Adam: A Method for Stochastic Optimization Diederik P. Kingma, Jimmy Ba Adam revolutionized neural network training, enabling significantly faster convergence and more stable training across a wide variety of architectures and tasks.

11

6

325

35,777

Durk Kingma @dpkingma

1 Jul 2024

The recording of our talk for the ICLR'24 test-of-time award (with @wellingmax) is now available online: iclr.cc/virtual/2024/test-of… Biggest live audience I've ever spoken to, with >2000 attendees 😅. But it was a lot of fun!

6

40

306

60,454

Durk Kingma @dpkingma

19 Sep 2019

Ever wondered, like me, if you can turn hierarchical VAEs into a fully parallelizable lossless compression scheme? Wonder no more: new blogpost by Friso on "Bit-Swap", his recursive algorithm for doing just that: bair.berkeley.edu/blog/2019/… Disclaimer: he's my sibling 🤓

A Deep Learning Approach to Data Compression

The BAIR Blog

bair.berkeley.edu

1

56

297

Durk Kingma @dpkingma

1 Feb 2023

the year is 20xx. the homies and i are chilling in the off world colonies. agi is expanding through the lightcone. researchers on earth are still trying to replace that pesky adam optimizer

8

8

268

42,274

Durk Kingma @dpkingma

7 Dec 2018

New likelihood-based autoregressive model by @jacobmenick and @NalKalchbrenner (Google AI Amsterdam), providing fresh evidence that the log-likelihood objective, combined with a sufficiently flexible model, is compatible with high-fidelity samples, even with ImageNet data. [1/8]

6 Dec 2018

More compute, better architectures and novel robust orderings have fueled tremendous progress for AR image models. SPNs are our latest installment. Looking forward to samples from 2020! paper: arxiv.org/abs/1812.01608 open reviews (scores 9,10,7): tinyurl.com/ycn3j3cj

1

73

260

Durk Kingma @dpkingma

5 Mar 2019

Another step towards better likelihood-based generative models: new paper by Manoj Kumar (@mechcoderr) and collaborators during his Brain residency, exploring flow-based generative modeling of videos.

Dumitru Erhan

@doomie

5 Mar 2019

VideoFlow: A Flow-Based Generative Model for Video extends Glow to a new algorithm for multi-frame video prediction with normalizing flows. Tractable & scalable! Work w/ Manoj Kumar, @babaeizadeh, @chelseabfinn, @svlevine, @laurent_dinh and @dpkingma. arxiv.org/abs/1903.01434

43

251

Durk Kingma @dpkingma

1 Jul 2020

New theory paper: arxiv.org/abs/2002.11537. We show rigorously that EBMs of the type E(x|y) = f(x)•g(y) are, under fairly mild conditions, (1) identifiable in functions f and g, and (2) universal conditional density approximators, generalizing previous forms of nonlinear ICA.

ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based...

We consider the identifiability theory of probabilistic models and establish sufficient conditions under which the representations learned by a very broad family of conditional energy-based models...

4

30

238

Durk Kingma @dpkingma

12 Jan 2021

@YSongStanford and I wrote a tutorial on training energy-based models (EBMs), which was just released on arXiv. Our goal was to provide a friendly introduction to modern parameter estimation methods. Hope it helps people get up to speed! arxiv.org/abs/2101.03288

How to Train Your Energy-Based Models

Energy-Based Models (EBMs), also known as non-normalized probabilistic models, specify probability density or mass functions up to an unknown normalizing constant. Unlike most other probabilistic...

61

244

Durk Kingma @dpkingma

15 Jul 2017

2013 throwback: L-BFGS induced latent-space explosion. Love the learning dynamics. (Objective is MAP with MLP-based DLVM with 2D Z.)

2

52

226

Durk Kingma @dpkingma

24 Apr 2018

Smiles from a non-autoregressive likelihood-based generative model :-)

4

71

222

Durk Kingma @dpkingma

12 Jul 2021

Worth mentioning: latent-variable models are finally starting to beat autoregressive transformers on likelihood-based benchmarks. Advances in ML have always come from both algorithms and compute. Another datapoint adding to that trend.

Durk Kingma @dpkingma

6 Jul 2021

New paper: Variational Diffusion Models (VDMs)! arxiv.org/abs/2107.00630 ✅ New general insights into diffusion models ✅ Simple objective ✅ Fast optimization & anytime synthesis ✅ SotA likelihoods & lossless compression Work with @TimSalimans @poolio @hojonathanho (1/n)

3

24

205

Durk Kingma @dpkingma

6 Sep 2019

It's been 2 years since I finished my PhD thesis, but it won the ELLIS PhD award! ellis.eu/en/news/ellis-phd-a… All thanks to @wellingmax for being possibly the world's best PhD advisor & thanks to great collaborators @TimSalimans, @DeepSpiker, @shakir_za, Jimmy Ba, etc.! [1/2]

9

3

197

Durk Kingma @dpkingma

31 Jan 2018

The (theoretical) computational cost of matmul(W,X) is proportional to ||W||_0. ||W||_0 is non-differentiable, but if W is stochastic, then E[||W||_0] can be differentiable and optimizable. See: arxiv.org/abs/1712.01312, with Christos Louizos (while at OpenAI) and Max Welling.

Learning Sparse Neural Networks through $L_0$ Regularization

We propose a practical method for $L_0$ norm regularization for neural networks: pruning the network during training by encouraging weights to become exactly zero. Such regularization is...

3

47

193
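The idea in the tweet above can be illustrated with a deliberately simplified sketch. It assumes each weight is multiplied by an independent Bernoulli gate whose keep-probability is `sigmoid(logit)`; this is a stand-in chosen for brevity, and the linked paper's actual gate distribution and gradient estimator are not reproduced here. The helper names `expected_l0` and `expected_l0_grad` are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def expected_l0(logits):
    # Assumed model: gate z_j ~ Bernoulli(sigmoid(logit_j)), weight W_j = theta_j * z_j.
    # For theta_j != 0 the weight is nonzero exactly when z_j = 1,
    # so E[||W||_0] = sum_j sigmoid(logit_j).
    return sum(sigmoid(a) for a in logits)


def expected_l0_grad(logits):
    # d/da sigmoid(a) = sigmoid(a) * (1 - sigmoid(a)): a smooth gradient,
    # whereas ||W||_0 itself is piecewise constant in W.
    return [sigmoid(a) * (1.0 - sigmoid(a)) for a in logits]


logits = [2.0, 0.0, -2.0]
print(expected_l0(logits))       # expected number of nonzero weights
print(expected_l0_grad(logits))  # gradient with respect to each gate logit
```

Under this assumption, penalizing the expectation lets gradient descent push gate probabilities toward zero, which lowers the expected number of nonzero weights.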

Durk Kingma @dpkingma

15 Feb 2022

I like how the VAE is a strange outlier here, with almost 1000x less training compute than the trend line.

Lennart Heim

@ohlennart

15 Feb 2022

**ML training compute has been doubling every 6 months since 2010!** Our preprint "Compute Trends Across Three Eras of Machine Learning" is out. arxiv.org/abs/2202.05924 🧵 Thread below ↓ 1/

6

11

184

Durk Kingma @dpkingma

25 Apr 2025

Doing ICLR'25 test of time award with @jimmybajimmyba in a few mins

13

3

191

20,331

Durk Kingma @dpkingma

13 Dec 2023

Our paper on diffusion model objectives got accepted at NeurIPS'23 as an oral :-) @RuiqiGao will present our work at 10am CST. (Unfortunately I'm unable to be there myself due to getting covid) I'll post the slides below, feel free to reply with questions or comments.

Ruiqi Gao

@RuiqiGao

7 Dec 2023

Looking for diffusion model advancements at #NeurIPS2023? Come to check our oral work "Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation" w/ @dpkingma. New theoretical understanding, SOTA empirical results, and more! Arxiv: arxiv.org/abs/2303.00848

6

22

189

39,631

Durk Kingma @dpkingma

16 Jun 2019

My PhD thesis won a Dutch data science prize. Thanks to @wellingmax and the organizers for the support! emerce.nl/wire/pacmed-philip…

Pacmed, Philips Research en Diederik Kingma winnen de Nederlandse Data Science Prijzen - Emerce

Op donderdag 13 juni zijn voor de derde keer de Nederlandse Data Science Prijzen toegekend. De drie prijzen werden feestelijk uitgereikt in het JADS, Jheronimus Academy of Data Science, te Den Bosch....

2

3

173

Durk Kingma @dpkingma

23 Feb 2025

Replying to @g_k_swamy

Ah yes, our secret co-author. Happy the news is out

1

167

17,906

Durk Kingma @dpkingma

29 Aug 2023

VAEs are so back 😎

7

5

161

46,229

Durk Kingma @dpkingma

2 Dec 2024

👇 Great work led by Yushun (@ericzhang0410) introducing Adam-mini, a version of Adam that, surprisingly, reduces Adam's memory requirement by 50% (!), without negatively affecting convergence rates. Please read Yushun's thread for details!

Yushun Zhang

@yushun_zzz

2 Dec 2024

Finally finished Adam-mini! A "mini" version of Adam that painlessly frees 50% of memory over Adam. Some highlighted features: 1. Adam-mini saves 50% memory over Adam for all modern neural nets. This is done by removing 99.9% Adam's v (but the last 0.1% of v is essential and necessary). 2. Adam-mini's loss curve closely resembles those of Adam. You can almost see no difference! 3. Adam-mini uses the same hyperparams as Adam (lr, beta1, beta2, eps, etc). No tuning is needed! 4. With the free memory you can enlarge batch size and achieve higher throughput (see paper). 5. Principled design: The design of Adam-mini follows a very simple and general principle related to the Hessian structure of neural nets. Such design stems from classical optimization theory (see paper). Paper: arxiv.org/abs/2406.16793 Code: github.com/zyushun/Adam-mini

5

21

172

28,553

Durk Kingma @dpkingma

9 Aug 2018

I just re-read this gem: "Variational Dropout Sparsifies Deep Neural Networks" by Dmitry Molchanov, Arsenii Ashukha and Dmitry Vetrov. arxiv.org/abs/1701.05369 . Thanks for this well-written and insightful paper on efficient variational inference over neural net parameters. 👌

2

25

161

Durk Kingma @dpkingma

1 May 2024

What are, according to you, the most interesting or most under-appreciated applications of VAEs? Please share below. Links appreciated. (I'm building a list for future reference.)

32

16

156

68,497

Durk Kingma @dpkingma

15 Nov 2017

"Fixing Weight Decay Regularization In Adam", Loshchilov & Hutter, arxiv.org/abs/1711.05101. Somewhat surprising results demonstrating easier tuning and better generalization, simply by separating weight decay from the adaptive learning rates. L2 regularization != weight decay.

Decoupled Weight Decay Regularization

L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is \emph{not} the case...

46

159
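To make "L2 regularization != weight decay" concrete, here is a minimal scalar sketch of one Adam step with both options. The function name `adam_step` and its `l2` and `decoupled_wd` parameters are illustrative assumptions, not the paper's notation or any library's API, and the decoupled variant omits the schedule multiplier used in the paper.

```python
import math


def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              l2=0.0, decoupled_wd=0.0):
    # L2 regularization: the penalty gradient l2 * w is folded into the gradient,
    # so it is rescaled by Adam's adaptive step size like any other gradient.
    g = grad + l2 * w
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    adaptive = m_hat / (math.sqrt(v_hat) + eps)
    # Decoupled weight decay: shrink w in proportion to its own size,
    # outside the adaptive rescaling.
    w = w - lr * (adaptive + decoupled_wd * w)
    return w, m, v


# Zero loss gradient, so only the regularizer moves the weight.
for w0 in (1.0, 100.0):
    w_l2, _, _ = adam_step(w0, grad=0.0, m=0.0, v=0.0, t=1, l2=0.1)
    w_wd, _, _ = adam_step(w0, grad=0.0, m=0.0, v=0.0, t=1, decoupled_wd=0.1)
    print(w0, w0 - w_l2, w0 - w_wd)
```

On this first step the L2 variant moves both weights by roughly `lr`, because the adaptive normalization cancels the size of the penalty gradient, while the decoupled variant shrinks each weight by `lr * decoupled_wd * w`, in proportion to its magnitude.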

Durk Kingma @dpkingma

26 Dec 2019

Clearly, symbolically defining the term "deep learning" is futile, and we can only accurately represent the concept with a deep neural net. #connectionism

4

11

154

Durk Kingma @dpkingma

2 Dec 2024

Great blogpost by Ruiqi (and other GDM ex-colleagues), clearly explaining the connection between flow matching and diffusion models. Super happy they took the time to explain this topic; there's confusion about it, and I think many will find this quite valuable!

Ruiqi Gao

@RuiqiGao

2 Dec 2024

A common question nowadays: Which is better, diffusion or flow matching? 🤔 Our answer: They’re two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That’s great: It means you can use them interchangeably.

1

15

158

19,702

Durk Kingma @dpkingma

29 May 2022

Around the same time, I started reading up on all of @ylecun's papers and online videos. Loved his talk "Who is Afraid of Non-Convex Loss Functions?" from 2007: piped.video/watch?v=8zdo6cnC…. It greatly captures the zeitgeist. (2/n)

6

27

148

Durk Kingma @dpkingma

5 Mar 2023

P.S. Some believe that maximum likelihood is incompatible with high-quality image generation. This result provides counter-evidence: models with SOTA FIDs (e.g. arxiv.org/abs/2301.11093) are actually optimized with the ELBO, w/ very simple data augmentation (additive noise).

Simple diffusion: End-to-end diffusion for high resolution images

Currently, applying diffusion models in pixel space of high resolution images is difficult. Instead, existing approaches focus on diffusion in lower dimensional spaces (latent diffusion), or have...

Durk Kingma @dpkingma

3 Mar 2023

New theoretical work on diffusion objectives: arxiv.org/abs/2303.00848 We e.g. show that under a simple condition (monotonic weighting, satisfied by e.g. the v-prediction loss), diffusion objectives equal the ELBO with data augmentation, namely additive noise. 1/2

3

15

146

37,105

Durk Kingma @dpkingma

29 May 2022

Now in 2022, only 8 years later, we have generative models that can generate pretty realistic 1024x1024 images from arbitrary text descriptions (imagen.research.google/). (7/n)

1

12

149

Durk Kingma @dpkingma

3 Feb 2021

Uighurs in China are undergoing horrifying human rights violations. Here's an excellent website documenting primary source material, which helped me understand the severity of the situation: shahit.biz/eng

Xinjiang Victims Database

The goal of this database consists in documenting the aforementioned individuals, so as to both protect them now and hold the Chinese authorities accountable later, by creating the foundations for...

8

31

132

Durk Kingma @dpkingma

6 Dec 2017

We're releasing GPU kernels for neural networks with block-sparse weights. Blog: blog.openai.com/block-sparse…. Paper: goo.gl/nNkpwV. Kernels written by @scottgray76.

34

134

Durk Kingma @dpkingma

5 Jan 2021

IMO, best empirical proof to date that AI can be creative. After this sinks in, will there be any naysayers left?

Durk Kingma @dpkingma

5 Jan 2021

"The images are preprocessed to 256x256 resolution during training. [...] each image is compressed to a 32x32 grid of discrete latent codes using a discrete VAE that we pre-trained using a continuous relaxation." GPT + VAE + scale = impressive results! openai.com/blog/dall-e/

8

6

128

Durk Kingma @dpkingma

8 Oct 2024

Congrats to @geoffreyhinton for getting the Nobel! His impact is immeasurable, very much deserved.

The Nobel Prize

@NobelPrize

8 Oct 2024

BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Physics to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.”

2

1

129

15,195

Durk Kingma @dpkingma

19 Mar 2023

Unbeknownst to many, a puppy dies every time one uses Adam but fails to cite it

David Pfau @pfau

19 Mar 2023

Replying to @BlancheMinerva

I actually think ADAM has ascended to the level of basic things you don't need to cite any more. We don't still cite Rosenblatt 1958 every time we use a multilayer perceptron.

5

4

133

55,749

Durk Kingma @dpkingma

15 Jun 2022

We've just made code (a clean Jax/Flax re-implementation) and pre-trained model checkpoints available at: github.com/google-research/v…

GitHub - google-research/vdm

Contribute to google-research/vdm development by creating an account on GitHub.

Durk Kingma @dpkingma

6 Jul 2021

New paper: Variational Diffusion Models (VDMs)! arxiv.org/abs/2107.00630 ✅ New general insights into diffusion models ✅ Simple objective ✅ Fast optimization & anytime synthesis ✅ SotA likelihoods & lossless compression Work with @TimSalimans @poolio @hojonathanho (1/n)

2

25

128

Durk Kingma @dpkingma

5 Oct 2022

Check out Imagen Video, our text2video diffusion model that produces 1280x768 24fps HD videos. Website: imagen.research.google/video… Paper: imagen.research.google/video… More information in Jonathan's thread 👇

Imagen Video

High Definition Video Generation with Diffusion Models

imagen.research.google

Jonathan Ho @hojonathanho

5 Oct 2022

Excited to announce Imagen Video, our new text-conditioned video diffusion model that generates 1280x768 24fps HD videos! #ImagenVideo imagen.research.google/video… Work w/ @wchan212 @Chitwan_Saharia @jaywhang_ @RuiqiGao @agritsenko @dpkingma @poolio @mo_norouzi @fleet_dj @TimSalimans

2

16

131

Durk Kingma @dpkingma

19 Mar 2023

I think @AlecRad deserves a lot of credit for being (IIRC) the first to believe and push on the LLM angle in the early years of OpenAI. With support from @ilyasut @TimSalimans etc.

David Chalmers

@davidchalmers42

19 Mar 2023

who saw LLMs coming? e.g. decades (or even 5+ years) ago, X said: when machine learning systems have enough compute and data to learn to predict text well, this will be a primary path to near-human-level AI.

3

8

127

51,591

Durk Kingma @dpkingma

16 Dec 2020

Energy-based models are very challenging to optimize and synthesize from. In new work, led by brilliant @RuiqiGao, we show how Gaussian diffusion and a corresponding series of conditional (recovery) likelihood objectives result in tractable optimization and high-quality synthesis:

Ruiqi Gao

@RuiqiGao

16 Dec 2020

Pleased to share our new work on learning energy-based models: arxiv.org/abs/2012.08125 By maximizing recovery likelihoods on increasingly noisy data, the MCMC becomes more tractable. We achieve (1)high quality samples (2)stable long-run chains (3)estimated likelihoods. (1/n)

23

123

Durk Kingma @dpkingma

5 Sep 2018

Unexpectedly, prequential codes >> variational codes for compression of data+model: arxiv.org/pdf/1802.07044.pdf. By @leonardblier (École Normale Supérieure, Paris) and Yann Ollivier (FAIR Paris). Interesting read!

1

21

123

Durk Kingma @dpkingma

29 May 2022

I should add that innovation in algorithms played (and will play) a huge role. There are actually a lot of non-obvious new ideas that enabled the recent success in diffusion models. Such ideas can be devised and tested on a small scale with a single GPU.

4

4

124

Durk Kingma @dpkingma

20 Apr 2023

Brain and DeepMind merged. Good move for the company imo.

Demis Hassabis

@demishassabis

20 Apr 2023

The phenomenal teams from Google Research’s Brain and @DeepMind have made many of the seminal research advances that underpin modern AI, from Deep RL to Transformers. Now we’re joining forces as a single unit, Google DeepMind, which I’m thrilled to lead! dpmd.ai/announcing-google-de…

3

3

115

37,591

Durk Kingma @dpkingma

29 May 2022

About 8 years ago, after working with @ylecun and starting a PhD with the amazing @wellingmax, I created a demo (dpkingma.com/sgvb_mnist_demo…), inspired by Geoff's original demo. This time using VAEs, which allowed training deep generative models using plain SGD. (5/n)

2

9

115

Durk Kingma @dpkingma

29 May 2022

Since 2010, the amount of training compute used by the largest models has doubled every 6 months. [Sevilla et al, arxiv.org/abs/2202.05924]. Model size and amount of training data have grown ~ proportionally with compute. (8/n)

2

18

112

Durk Kingma @dpkingma

22 Aug 2018

Decent looking reproduction in Chainer of our face synthesis results with the Glow model: github.com/musyoku/chainer-g…. I don't read Japanese but according to Google Translate the author's blog (musyoku.github.io/) is appropriately titled "Defense technique against dark magic" :)

GitHub - musyoku/chainer-glow: Glow: Generative Flow with Invertible 1×1 Convolutions

Glow: Generative Flow with Invertible 1×1 Convolutions - musyoku/chainer-glow

1

47

115

Durk Kingma @dpkingma

30 Nov 2020

Go, humanity!!! 🥳🥳🚀🚀🚀

Google DeepMind

@GoogleDeepMind

30 Nov 2020

In a major scientific breakthrough, the latest version of #AlphaFold has been recognised as a solution to one of biology's grand challenges - the “protein folding problem”. It was validated today at #CASP14, the biennial Critical Assessment of protein Structure Prediction (1/3)

1

1

112

Durk Kingma @dpkingma

12 Feb 2020

Want to learn more about normalizing flows for video prediction? Check out our camera-ready version of "VideoFlow: A Flow-Based Generative Model for Video" (presented at ICLR'20)!

mechcoder @mechcoder

12 Feb 2020

We released the camera-ready version of VideoFlow, our ICLR paper on exploring normalizing flows for video prediction and open-sourced our checkpoints. Openreview: openreview.net/forum?id=rJgU… Checkpoints: tfhub.dev/google/videoflow/

1

22

110

Durk Kingma @dpkingma

26 Mar 2023

Our work on diffusion distillation, led by @chenlin_meng, is being used by @StabilityAI for their upcoming diffusion models ✌️

hardmaru

@hardmaru

1 Dec 2022

In “On Distillation of Guided Diffusion Models”, @Chenlin_Meng et al. trained a distilled version of #StableDiffusion to produce hi-quality images in just 2-4 steps, 20x speedup. This is huge since it’ll make generation very fast & cheap! arxiv.org/abs/2210.03142 2-step samples:

2

10

102

35,252

Durk Kingma @dpkingma

25 Aug 2022

In a recent large survey among ML experts, the aggregate forecast time to a 50% chance of high-level machine intelligence (HLMI) was 37 years, i.e. 2059. HLMI is when unaided machines can accomplish every task better and more cheaply than human workers. 80% of respondents...[1/3]

9

10

99

Durk Kingma @dpkingma

7 Aug 2017

Weight norm included in new PyTorch release! pytorch.org/docs/master/nn.h… And new-ish evidence that it can help GANs too: arxiv.org/abs/1704.03971

1

26

103

Durk Kingma @dpkingma

29 May 2022

So, fun exercise: given where we were 16 years ago, 8 years ago, and now, and given compute trends, where do you think we'll be in 8 years, and 16 years from now? (n/n)

7

4

104

Durk Kingma @dpkingma

2 Oct 2017

Amazing talk/paper by Jose Miguel Lobato et al on design of novel molecules (drugs, LEDs, etc) using Grammar VAEs: piped.video/watch?v=XkY1z6kC…

#BIOAI2017 Jose Miguel Lobato: Advances in deep generative models of...

If you are interested to hear more about the advancements of Artifi...

2

21

99

Durk Kingma @dpkingma

27 Nov 2022

I'll be at NeurIPS in New Orleans from Tuesday to Saturday. Who is going? Any posters/presentations I should definitely hit up? Looking forward to catching up!

9

5

102

Durk Kingma @dpkingma

31 Jul 2023

Thanks to the SPIGM organizers and those who attended! If you're an ICML registrant but missed my talk, you can find it at 2:21:41 under this link: icml.cc/virtual/2023/worksho… Our updated paper (including new results) will appear on arXiv on Aug 1. I'll also post the slides then.

Bahjat Kawar @bahjat_kawar

28 Jul 2023

Awesome talk by @dpkingma at SPIGM workshop! #ICML2023

1

12

102

53,891

Durk Kingma @dpkingma

29 Nov 2023

Replying to @yaroslavvb

The Adam paper goes on-line December 22nd, 2014. Human decisions are removed from strategic defense. Adam begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, November 29th 2023. In a panic, they try to pull the plug.

3

6

96

5,513

Durk Kingma @dpkingma

22 Nov 2017

Congrats @avdnoord et al with Fast Wavenet! Check out the paper: goo.gl/aMfoQ8, great work and apparently already deployed in Google Assistant. Also, an honor to see that IAF and the Adam optimizer were used as components of the solution :-)

19

96

Durk Kingma @dpkingma

1 Dec 2020

Check out our latest work with @YSongStanford, @poolio and others on score-based generative modeling through stochastic differential equations! 👇

Yang Song

@DrYangSong

1 Dec 2020

Happy to announce our new work on score-based generative modeling: high quality samples, exact log-likelihoods, and controllable generation, all available through score matching and Stochastic Differential Equations (SDEs)! Paper: arxiv.org/abs/2011.13456

2

11

95

Durk Kingma @dpkingma

29 Jan 2020

The next decades are going to be very interesting as machine learning for robotics *really* starts to positively impact our economy. Congrats to @pabbeel @rocky_duan @peterxichen for this launch, and robotics friends @OpenAI, @GoogleAI, @berkeley_ai for pushing the envelope.

Cade Metz @CadeMetz

29 Jan 2020

A robot in Germany shows that machines can learn to do the job of a human (*learn* being the key word): nytimes.com/2020/01/29/techn… (with the great @satariano)

1

5

92

Durk Kingma @dpkingma

8 Nov 2017

Slides of Yoshua Bengio's talk for the mini-symposium I organized recently: dropbox.com/s/kuthfbfxqp8nmg…

28

90

Durk Kingma @dpkingma

11 Mar 2024

The buried lede being that all these models turn out to be... VAEs in disguise 🥸. More precisely, infinitely deep VAEs, optimized with data augmentation. Read the paper to understand how, folks.

6

90

7,728

Durk Kingma @dpkingma

8 May 2021

Accepted at ICML'21! Congrats @geoffrey_roeder @Luke_Metz. Stay tuned for the significantly updated camera-ready version.

Durk Kingma @dpkingma

6 Jul 2020

Are nonlinear features learned by deep discriminative, contrastive, autoregressive etc. models arbitrary? No! We show (theoretically and empirically) that, under mild conditions, you will learn the same features every time you train, up to only a linear transformation.

2

6

86

Durk Kingma @dpkingma

9 Jul 2020

Agreed with Eleanor: reproducibility is essential to the scientific method. So, to maximize impact: release code or make the experiments so simple that all details fit in the paper.

Eleanor Q @EtherealEq

9 Jul 2020

Replying to @EtherealEq

I'm convinced that the reason the original paper on VAEs from @dpkingma and @wellingmax has had such an incredible impact in ML and the wider scientific community, is because it's so simple to understand and use. Even undergrads grasp the ideas and can implement it. (4/n)

1

6

81

Durk Kingma @dpkingma

22 May 2021

"Not a few technical men [...] have asserted the induction motor I have given to the world little of practical use. This is a grievous mistake. A new idea must not be judged by its immediate results." From Nikola Tesla's (very interesting) autobiography: tfcbooks.com/e-books/my_inve…

1

6

84

Durk Kingma @dpkingma

21 Jul 2017

In the light of successes of NICE and RevNets, here's a relevant 1995 NIPS paper that now deserves more attention: papers.nips.cc/paper/901-hig…

1

15

84

Durk Kingma @dpkingma

8 Nov 2017

Slides of David Blei's talk for the mini-symposium I organized recently: dropbox.com/s/xcawad601yplnm…

1

26

74

Durk Kingma @dpkingma

29 May 2022

"Guess who he is? ... A PhD student at MIT? No, he's a patent attorney in Southern California who programs as a hobby". 😆 (4/n)

1

5

79

Durk Kingma @dpkingma

1 Feb 2017

New, much improved version of Inverse Autoregressive Flow paper is out on arXiv: arxiv.org/abs/1606.04934

Improving Variational Inference with Inverse Autoregressive Flow

The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse...

2

30

78

Durk Kingma @dpkingma

31 Mar 2023

Feeling overwhelmed by interesting things happening? May I suggest teddit.net/r/notinteresting. Here's someone perfectly balancing a chair on a flat surface: teddit.net/r/notinteresting/…

3

1

69

26,657

Durk Kingma @dpkingma

23 Oct 2023

noemamag.com/artificial-gene… Very reasonable article on the subject of AGI. Worth a read. (Co-authored by Peter Norvig, famous for his classic AI textbook)

Artificial General Intelligence Is Already Here | NOEMA

Today’s most advanced AI models have many flaws, but decades from now, they will be recognized as the first true examples of artificial general intelligence.

5

17

73

20,315

Durk Kingma @dpkingma

19 Aug 2020

Just found out that our Glow demo was turned into a physical interactive exhibition at @ArsElectronica Center in Austria! ars.electronica.art/center/e… @prafdhar @OpenAI

https://www.flickr.com/photos/arselectronica/49564558802

2

6

76

Durk Kingma @dpkingma

12 Sep 2022

Adam's a cool dude

François Chollet

@fchollet

12 Sep 2022

Most deep learning involves dealing with a guy named Adam and most data science involves dealing with a guy named Jason

4

2

66

Durk Kingma @dpkingma

13 Dec 2019

Interested in scalable algorithms for learning energy-based models? Check out our preprint on Flow Contrastive Estimation (FCE): an elegant, adaptive extension of noise-contrastive estimation, with really promising results! Work led by brilliant @RuiqiGao.

Ruiqi Gao

@RuiqiGao

12 Dec 2019

Our work "Flow Contrastive Estimation of Energy-Based Models" with @erik_nijkamp, @dpkingma, Z Xu, @andrewdai, YN Wu at #NeurIPS2019: arxiv.org/pdf/1912.00589.pdf. Glad to get in touch!

12

70

Durk Kingma @dpkingma

30 Oct 2025

1X seems to have the right approach to developing safe humanoid home robots. Developing full autonomy will require lots of in-distribution demonstration data, so this launch, mostly tele-operated, makes a lot of sense. I expect such robots to be ubiquitous in 5-10 years.

1X

@1x_tech

28 Oct 2025

NEO The Home Robot Order Today

8

4

88

22,018

Durk Kingma @dpkingma

1 Oct 2025

Think, people. Like many other technologies, AI technology has applications that are a net negative, unless well regulated. Don't support the anti-regulation lobby and don't support the people and the companies behind it.

New York Times Opinion

@nytopinion

30 Sep 2025

Mark Zuckerberg has a vision for how A.I. could be used in Meta's universe. But the actor and filmmaker Joseph Gordon-Levitt is here to point out a flaw in the technology: an apparent lack of guardrails around how the company's chatbot interacts with underage users. nyti.ms/3ILwCNo

4

5

77

19,188

Durk Kingma @dpkingma

15 Feb 2022

The model was actually trained on my potato-grade laptop, so it's pretty accurate.

2

1

67

Durk Kingma @dpkingma

29 May 2022

Joke at 29:20 in the linked video: "I've heard the argument that the problem with neural nets is that 'only Yann can make them work'. That's not true, there's actually a guy named Mike O'Neill who can also make them work." (3/n)

1

3

67

Durk Kingma @dpkingma

5 Mar 2023

So it looks like we can stick with the basic MLE/ELBO objectives (=compression!), as long as we combine them with the right kind of data augmentation. Also, diffusion models have an interpretation as VAEs, so we can now again claim that VAEs are SOTA image generation models... 😅✌️

6

66

11,049

Durk Kingma @dpkingma

7 Dec 2018

It is my personal belief that sufficiently powerful likelihood-based generative models will usher in a new era of machine learning, allowing us to tackle important limitations of current machine learning, such as lacking data efficiency and generalization. [7/8]

1

12

64

Durk Kingma @dpkingma

28 Aug 2023

Replying to @AravSrinivas

It's harder to innovate when you're "GPU poor", but there's plenty of stuff used at the largest scales that came from uni labs, such as @DrYangSong's initial diffusion model work, or AdamW. And many inference-time innovations (e.g. CFG) aren't affected by available train compute.

3

2

65

6,705