Over the last few months I have spent a lot of time sampling from this model. Some tips: 1) You can generate videos even with small GPUs (just decrease the number of frames you decode at a time, as this eats the most VRAM). 14 frames (decoding one at a time) should need less than 20GB VRAM
Stability releases Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets model: huggingface.co/stabilityai/s… present Stable Video Diffusion — a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into generative video models by inserting temporal layers and finetuning them on small, high-quality video datasets. However, training methods in the literature vary widely, and the field has yet to agree on a unified strategy for curating video data. In this paper, we identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
11
61
276
117,357
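A minimal sketch of tip 1), assuming the decoder can be called on a few latent frames at a time. `decode_in_chunks`, `toy_decoder`, and `frames_per_chunk` are hypothetical names for illustration, not the repo's API; the real script exposes this via `--decoding_t`. The toy decoder treats frames independently, which a temporal video decoder would not.

```python
from typing import Callable, List, Sequence, TypeVar

Latent = TypeVar("Latent")
Frame = TypeVar("Frame")


def decode_in_chunks(
    latents: Sequence[Latent],
    decode_fn: Callable[[Sequence[Latent]], List[Frame]],
    frames_per_chunk: int = 1,
) -> List[Frame]:
    """Decode latents a few frames at a time so peak memory scales with the chunk size."""
    if frames_per_chunk < 1:
        raise ValueError("frames_per_chunk must be at least 1")
    frames: List[Frame] = []
    for start in range(0, len(latents), frames_per_chunk):
        chunk = latents[start:start + frames_per_chunk]
        # Only this chunk is passed to the decoder at once.
        frames.extend(decode_fn(chunk))
    return frames


def toy_decoder(chunk: Sequence[int]) -> List[str]:
    # Hypothetical stand-in for a real VAE decoder (assumption, per-frame only).
    return [f"frame({z})" for z in chunk]


# 14 latent frames, decoded one at a time.
frames = decode_in_chunks(list(range(14)), toy_decoder, frames_per_chunk=1)
```

Smaller chunks trade speed for memory: the loop runs more often, but each decoder call holds fewer frames.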
Stable Video Diffusion: code & checkpoints out! On my way to get some vacation now 🌴🌻
3
10
123
18,457
come and build the best models with us
We are actively hiring across several roles, check out our website or job board for detailed job descriptions.
6
8
122
10,899
FLUX.1 🤝 Grok-2
Huge thank you to the @bfl_ml team, who scaled up their FLUX.1 inference API to support the Grok-2 release today!
5
3
119
8,816
📢📢 Introducing GENIE: Higher-Order Denoising Diffusion Solvers. nv-tlabs.github.io/GENIE/ GENIE distills higher-order score terms into a small neural network and uses them for accelerated diffusion model sampling. 💨 Fun project with @karsten_kreis & @ArashVahdat! (1/6)
GENIE: Higher-Order Denoising Diffusion Solvers abs: buff.ly/3yAaZaq project page: buff.ly/3yAb3aa Higher-Order Denoising Diffusion Solvers: Based on truncated Taylor methods, we derive a novel higher-order solver that significantly accelerates synthesis
2
25
114
so this happened
We are excited to announce the launch of Black Forest Labs. Our mission is to develop and advance state-of-the-art generative deep learning models for media and to push the boundaries of creativity, efficiency and diversity.
9
3
102
4,359
📢 Excited to announce Differentially Private Diffusion Models! 🔒 nv-tlabs.github.io/DPDM We train diffusion models with strict differential privacy guarantees and outperform previous methods by large margins. w/ @tianshi_cao, @ArashVahdat, @karsten_kreis (1/n)
Differentially Private Diffusion Models abs: arxiv.org/abs/2210.09929 project page: nv-tlabs.github.io/DPDM
4
25
95
Two examples of how lower motion score can give you more object motion (left 255, right 31):
3
13
83
30,865
📢📢 Glad we can finally share our work on (text-to-)video generation. TL;DR: Take Stable Diffusion, insert additional temporal layers and fine-tune them on video data while keeping the spatial layers fixed. w/ @andi_blatt, @robrombach, @HuanLing6, @FidlerSanja, @karsten_kreis
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280 x 2048 abs: arxiv.org/abs/2304.08818 project page: research.nvidia.com/labs/tor…
3
11
61
20,411
After two failed conference attempts, DPDM has been accepted at TMLR (openreview.net/forum?id=ZPpQ…). I am super happy about the discussions around DPDMs and follow-up works despite "only being an arxiv paper" for almost a year.
📢 Excited to announce Differentially Private Diffusion Models! 🔒 nv-tlabs.github.io/DPDM We train diffusion models with strict differential privacy guarantees and outperform previous methods by large margins. w/ @tianshi_cao, @ArashVahdat, @karsten_kreis (1/n)
3
6
58
15,221
Bored of overdamped Langevin dynamics in diffusion models? Why not introduce velocity variables and speed-up the diffusion process with a Hamiltonian component. That's exactly what we did in our #iclr2022 (spotlight) paper (w/ @karsten_kreis,@ArashVahdat): nv-tlabs.github.io/CLD-SGM/
1
6
46
Thank you for the kind words @thegautamkamath, and thank you to all the committee members, in particular Yaoliang for all his support during my PhD and for being an amazing supervisor. I am also very grateful to @driainmurray for his consistent guidance from afar.
Congrats to Dr. Tim Dockhorn (@timudk) who defended his PhD thesis yesterday! Tim is a world expert in diffusion models (DMs), and is going on to work at @StabilityAI. His work on privatizing DMs is an incredible leap forward (arxiv.org/abs/2210.09929). timudk.github.io/
5
2
45
10,152
Super excited to finally reveal what I have been working on with @karsten_kreis and @ArashVahdat during my internship at nvidia. I am also very happy to announce that I will stay on this amazing team led by @FidlerSanja as an intern, and push score-based models even further.
📢 Score-based Generative Modeling with Critically-Damped Langevin Diffusion! nv-tlabs.github.io/CLD-SGM/ We propose a novel diffusion using auxiliary velocity variables for more efficient denoising and higher quality generative models. w/ the amazing @timudk & @ArashVahdat! (1/n)
6
31
📢 The code and checkpoints for our Critically-Damped Diffusion paper have been released: github.com/nv-tlabs/CLD-SGM We also made some colabs so you can play 🎮 with sampling and likelihood computation.
4
34
Awesome work showing that you can scale DPDMs to CIFAR-10 using public pre-training. Congrats to @SGhalebikesabi and the team!
Differentially Private Diffusion Models Generate Useful Synthetic Images By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, obtain SOTA results on CIFAR-10 and Camelyon17 abs: arxiv.org/abs/2302.13861
6
30
13,573
I am in SF for a few days: - Sep 18 / Sep 19 @PyTorch conference - Sep 18 @bfl_ml x @FAL dinner event - Sep 19 @huggingface Flare Party - Sep 21 CUDA MODE IRL We are also hiring at @bfl_ml for several roles, please come and chat with us job-boards.greenhouse.io/bla…
1
2
30
1,983
Looking for yet another application of normalizing flows? We show that you can fit them given only noisy observations and the statistics of the noise distribution. This is joint work with @jmsrtch, @driainmurray and Yaoliang Yu. arxiv.org/abs/2006.09396
2
6
29
6) Lastly I want to say that this is just the beginning and we have a lot of ideas on how to improve video models. I am also super excited to see what finetunes/inference tricks the community can come up with; that's the best part about releasing weights!
1
27
1,874
We released code & models for our Differentially Private Diffusion Models (DPDMs): github.com/nv-tlabs/DPDM Check it out and train your own DPDMs.
Replying to @timudk
🔒 Despite DP generative modeling being incredibly challenging, we hope that our results can stimulate future work in this important field. Project page: nv-tlabs.github.io/DPDM/ arXiv: arxiv.org/abs/2210.09929 Code will be released soon! Stay tuned! (12/12)
3
25
3,548
🫐🫐🫐
1
1
24
844
The diffusion tutorial dream team is back. Don't miss it.
📢 Planning your NeurIPS'23 trip? Interested in *Latent* Diffusion Models? @RuiqiGao, @ArashVahdat and I will present the tutorial "Latent Diffusion Models: Is the Generative AI Revolution Happening in Latent Space?" Monday, Dec 11, New Orleans. neurips2023-ldm-tutorial.git… (1/n)
4
22
5,038
I will be at @NeurIPSConf from Saturday-Saturday. 📨 DM if you want to chat about Diffusion models or if you need a buddy to watch the world cup ⚽️
3
19
2) The fps conditioning and motion conditioning can greatly influence results. You don't necessarily need to choose the fps conditioning = fps rendering! I have gotten very good results with high fps / high motion conditionings rendered at lower fps.
2
18
3,049
On my way to New Orleans for NeurIPS! Excited to chat about all things generative modeling, especially efficient and scalable video generation 📽️🎞️
1
18
2,134
TAing this class was super fun and I learned lots. My favorite parts were learning about convergence of proximal descent for non-convex functions (Remark 4.22) and the connection between dual averaging and the generalized conditional gradient (HW5).
Want to learn optimization? Start with my @UWCheritonCS colleague Yaoliang Yu's course "Optimization for Data Science"! 20 excellent lectures, starting from the basics. cs.uwaterloo.ca/~y328yu/myco…
1
16
If I am not mistaken, one can recover OT flow matching (your (21) + (9)) exactly using the diffusion v-prediction (SNR+1 loss) from arxiv.org/abs/2202.00512 with alpha_t = 1 and sigma_t = 1 - t. Credits to @RiversHaveWings who originally found this.
3
3
14
2,832
📢📢 Presenting: Latent Space Diffusion Models of Cryo-EM Structures We are training diffusion models in the latent space of a cryo-EM autoencoder. Huge potential for downstream applications such as protein generative modeling from cryo-EM data. 🔥
In a fantastic collaboration with @karsten_kreis, @timudk, and Zihao Li, we extend cryoDRGN ❄️🐉 for generative sampling of cryo-EM structures via latent diffusion models. We'll be presenting this work @workshopmlsb @NeurIPSConf Sat, 9am! #EZlab Paper: arxiv.org/abs/2211.14169 1/
2
13
I will give a talk on score-based models and our CLD-SGM model on Thursday 4pm EST. Tune in by registering here vectorinstitute.zoom.us/meet…
📢 Score-based Generative Modeling with Critically-Damped Langevin Diffusion! nv-tlabs.github.io/CLD-SGM/ We propose a novel diffusion using auxiliary velocity variables for more efficient denoising and higher quality generative models. w/ the amazing @timudk & @ArashVahdat! (1/n)
1
2
11
What do people think about this comparison between generative models? Source: arxiv.org/abs/2103.04922
2
4
13
7) This was a great collaborative project and I am deeply grateful for co-leads @andi_blatt @sumith1896 and the rest of the team
1
13
1,795
3) The guidance scale can also have a big impact on results. We actually increase the guidance scale linearly from w_min to w_max over the frame axis. More guidance will lead to better consistency but may result in oversaturation. For best results play with w_min/w_max.
1
12
2,011
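A small sketch of the linear guidance schedule from tip 3), assuming the usual classifier-free guidance form `uncond + w * (cond - uncond)` applied per frame. The function names and the values `w_min=1.0`, `w_max=3.0` are illustrative assumptions, and scalars stand in for per-frame model outputs.

```python
from typing import List, Sequence


def linear_guidance_scales(w_min: float, w_max: float, num_frames: int) -> List[float]:
    """Guidance scale per frame, rising linearly from w_min (first) to w_max (last)."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return [w_min]
    step = (w_max - w_min) / (num_frames - 1)
    return [w_min + i * step for i in range(num_frames)]


def apply_guidance(
    uncond: Sequence[float], cond: Sequence[float], scales: Sequence[float]
) -> List[float]:
    """Per-frame classifier-free guidance: uncond + w * (cond - uncond)."""
    return [u + w * (c - u) for u, c, w in zip(uncond, cond, scales)]


# Illustrative values only (assumption): five frames, w from 1.0 to 3.0.
scales = linear_guidance_scales(1.0, 3.0, 5)
guided = apply_guidance([0.0] * 5, [1.0] * 5, scales)
```

Later frames get more guidance, so widening the gap between `w_min` and `w_max` pushes the consistency/oversaturation trade-off harder toward the end of the clip.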
🔥🔥🔥
Flux 1.1 pro ultra is now on Replicate. 4 megapixels (2096x2096) in 10 seconds 🔥 replicate.com/black-forest-l…
1
1
12
1,195
4) The model was only trained for resolution 576x1024 and you will likely observe artifacts when changing the aspect ratio considerably. If you still want to try, it may help to increase the conditioning augmentation noise.
1
10
1,894
Replying to @arankomatsuzaki
" Unlike previous methods, our approach can remove concepts from a diffusion model permanently rather than modifying the output at the inference time, so it cannot be circumvented even if a user has access to model weights." - are weights released though?
1
9
1,465
Replying to @iScienceLuvr
This is a great summary Tanishq
1
9
1,590
Working with @karsten_kreis and @ArashVahdat on score-based generative models has been nothing but great. Excited to share what we have come up with soon. If you like to push SOTA generative models and their applications consider applying!
This was an exciting project - I continue to be amazed by the capabilities of modern deep generative models! If you are interested in working with us on generative models and their applications, please reach out. We are looking for exceptional interns at NVIDIA's Toronto AI Lab.
2
8
Replying to @sp_monte_carlo
Looking for a textbook on SDEs mostly for their application in probability theory/mcmc (fokker-planck eq, langevin dynamics)? Any recommendations?
2
7
Replying to @rbhuta95
Increasing motion_bucket_id should lead to more overall motion in the generated video
2
8
941
You should be able to get it below 20GB VRAM by decoding one frame at a time
2
1
8
944
GENIE is a higher-order solver that is based on the second truncated Taylor method (TTM). Intuitively, the higher-order terms in GENIE capture the local curvature of the ODE and enable larger step sizes when compared to DDIM (first TTM). (2/6)
2
7
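To make the first vs. second truncated Taylor method comparison concrete, here is a toy sketch on the scalar ODE dy/dt = -y. This is not GENIE itself: the ODE, step count, and function names are assumptions chosen so that the second-order term can be written by hand rather than distilled into a network.

```python
import math


def f(y: float) -> float:
    # Toy ODE right-hand side: dy/dt = -y (assumption, not a diffusion ODE).
    return -y


def df_dt(y: float) -> float:
    # Total time derivative of f along the solution: f'(y) * f(y) = (-1) * (-y) = y.
    return y


def solve(y0: float, t_end: float, num_steps: int, order: int) -> float:
    """Integrate with the first (Euler) or second truncated Taylor method."""
    h = t_end / num_steps
    y = y0
    for _ in range(num_steps):
        update = h * f(y)
        if order == 2:
            # Second-order term captures the local curvature of the trajectory.
            update += 0.5 * h * h * df_dt(y)
        y += update
    return y


exact = math.exp(-1.0)
err_first = abs(solve(1.0, 1.0, 10, order=1) - exact)
err_second = abs(solve(1.0, 1.0, 10, order=2) - exact)
```

With the same ten steps, the second-order solver lands much closer to the exact solution, which is the sense in which the curvature term allows larger step sizes.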
Replying to @sedielem
I guess it's time for yet another diffusion circle ⛱️
1
7
422
Personal update: I am very excited to start my PhD (tomorrow!) with Yaoliang Yu at @UWaterloo and @VectorInst. I will broadly be working on combining machine learning and probabilistic modeling.
6
Given a dataset {(x_i, y_i)}_{i=1}^m ⊂ R^2 how many parameters does a (deep) neural network need to achieve training error below epsilon (given the optimal solution is found by a magical optimizer and we are ok with overfitting)? Can we do better than O(m)? @roydanroy @mpd37
2
7
Project page: nv-tlabs.github.io/GENIE/ arXiv: arxiv.org/abs/2210.05475 Code will be released soon! Stay tuned! (6/6)
3
3
7
i love this
6
2,351
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" - SD-XL 0.9
Introducing the latest release from Stability AI: Breaking barriers with #SDXL 0.9! SDXL 0.9 produces massively improved text-to-image and composition detail over the beta release and provides a leap in use cases for generative AI imagery. #StabilityAI Unleash your creativity today! → bit.ly/3Xn12bI
1
6
1,802
5) Increasing conditioning augmentation noise is also necessary when applying the model to images that have heavy compression artifacts.
1
6
1,713
(1/6) My master's thesis is now available online: Generative Modeling with Neural Ordinary Differential Equations uwspace.uwaterloo.ca/handle/…
1
5
Replying to @sp_monte_carlo
Seems like you have had a look at quite a few books. Do you have recommendations/tips on how to read them? I presume you did not have time to read them end to end, taking elaborate notes throughout? I am struggling to find an efficient strategy.
1
4
Would be interesting to see how popular it would have become if it was published by some less popular researchers. "Just" publishing on arxiv when you are not well-known is difficult, i.e., you might not get much attention.
5
Replying to @jm_alexia
Thank you for tweeting about something other than covid!!
3
Replying to @zacharylipton
Similar pattern: new adversarial attack vs new adversarial defense
4
Replying to @lxuechen
In my Master's thesis, I derived the adjoint method for Neural ODEs and for Continuous normalizing flows from a constrained optimization framework. I didn't know about LeCun's paper when I derived the results but fortunately was pointed there before submitting the final version.
2
3
python scripts/sampling/simple_video_sample.py --decoding_t 1 --version svd_image_decoder
2
4
215
Thanks Chin-Wei :) Your Augmented Normalizing Flow paper was part of the motivation for our work.
4
Spotted in a high schooler's NeurIPS 2021 checklist: Q: Did you include the total amount of compute and the type of resources used? A: Our models were trained for a total of 1378 CPU-years on a TI-84 Plus.
Texas Instruments just released the TI-84 Plus CE Color Graphing Calculator. The crazy thing is it now supports #Python! Reminiscent of high school years, the TI has come a long way. amzn.to/2VBxyv7
3
I wanted to get my feet wet with Schrödinger Bridge GMs for a while. This work made the journey quite comfortable by neatly connecting to SGMs (via Forward-Backward SDEs). IMO the major advantage compared to SGMs is that you don't have to craft the forward process yourself.
Score-based generative models are implicit optimal transport models; lifting them to accept fully nonlinear diffusion yields Schrödinger Bridge generative models. Check out our latest work on log-likelihood training of Schrödinger Bridge 🌉! arxiv.org/pdf/2110.11291.pdf (1/3)
3
Come by to hear about our work on density deconvolution with flows invertibleworkshop.github.io…
Please join us *Saturday at #ICML2020 for the INNF+ workshop for invited talks by @wellingmax, @eric_nalisnick, @emidup, Cheng Zhang, @adjiboussodieng, @KyleCranmer and Martin Jankowiak. Starts 5:25 EDT / 11:25 CET / 18:25 JST invertibleworkshop.github.io… icml.cc/virtual/2020/worksho…
3
Cascaded DM pipelines and DM-based super-resolution have become crucial ingredients in large-scale image generation. We also explore the applicability of GENIE in this setting. Our GENIE upsampler only uses five function evaluations to generate the cats below. (5/6)
1
3
Replying to @jm_alexia
I guess it ultimately depends on what your goal is. I would have liked to see an actual application (or motivation) where this is useful.
3
Replying to @seungkim0123
Great stuff as always from you guys
1
2
180
Very excited to dig into this. I have been thinking about this problem for a while and I am very glad somebody did the math for me.
Diffusion models go Riemannian arxiv.org/abs/2202.02763 - Time reversal + score-matching on compact manifolds - Sampling and likelihood computation with SOTA results - Solves Schrodinger bridges on manifolds @ValentinDeBort1 @MathieuEmile @MHutchinson141 @JamesTThorn @yeewhye
2
That's definitely how most people in Germany would pronounce it.
1
2
Excited to try out this beast!! Great work as always @RiversHaveWings
My 602M parameter CLIP conditioned diffusion model trained on Conceptual 12M is out at github.com/crowsonkb/v-diffu…! It can generate images matching the prompt quickly using its CLIP conditioning, but still requires CLIP guidance for best results.
2
Replying to @fofrAI
Awesome - btw you can even generate more than 25 frames. Depending on the input, I could sometimes get good results for up to 40 frames - with even more frames, quality will deteriorate.
1
2
132
During training, we propose to extract the necessary higher-order terms from the diffusion model (DM) via automatic differentiation. The higher-order terms are then distilled into a small neural network on top of the DM, allowing for efficient inference. (3/6)
1
2
@gaetan_hadjeres the revised version is now on arxiv
2
python scripts/sampling/simple_video_sample.py --decoding_t 1 --version svd_image_decoder (If you decode one frame at a time you may as well use the standard image decoder)
1
2
217
tweet format stolen from @danielhanchen 🤫
1
2
292
Replying to @gjzhang1
We are still early in video generative modeling. Generating 20s-1min videos should be the next goal.
2
180
Replying to @RiversHaveWings
Last picture looks like an ocean wave crashing against the W Barcelona.
2
Just reread arxiv.org/abs/1904.12083 by @daibond_alpha et al. on training EBMs by learning a "dual sampler". Their work generalizes more or less all EBM training methods. Very impressive work.
2
🔒 (ii) Results! Training CNN classifiers with synthesized data from our DMs performs on par with CNN classifiers trained directly w/ DP-SGD. This is initial proof that DP generative models can eventually be used as effective data sharing media of sensitive data. (11/n)
1
2
I really liked @carlhenrikek's explanation of variational inference (piped.video/watch?v=qLyIGnS-…). Most often, VI is motivated by finding a good approximation q to the posterior, but what we actually want is a lower bound of the marginal likelihood.
1
1
2
Congrats and very good choice !!
1
2
346
I haven’t really tried anything but Adam and SGD for NN training, and I don’t plan to do so. Seems like that’s ok.
📣 #ICML 2021 Paper 📣 Overwhelmed by the flood of optimizers for deep learning? We felt the same and performed an extensive benchmark. Joint work with @robinschmidt_ & @PhilippHennig5. Paper: arxiv.org/abs/2007.01547 Results: github.com/SirRob1997/Crowde… Video: piped.video/cz9RzlstFdE
2
🔒 (iii) Why DMs? GANs are currently predominantly used in DP generative modeling. They are difficult to optimize and prone to mode collapse which is problematic during noisy DP-SGD training. In contrast, DMs are trained with a robust and scalable regression-like loss. (8/n)
1
2
Replying to @_arohan_ @_akhaliq
Thanks Rohan :) We average the gradients for different noise levels of the same training example *before* clipping the gradients. This induces no additional privacy cost. Our approach is motivated by "augmentation multiplicity" in arxiv.org/abs/2204.13650.
2
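A rough sketch of the "average over noise levels, then clip" step described in the reply above, in a DP-SGD-style update. Gradients are plain lists of floats, and `private_batch_gradient`, `max_norm`, and `noise_multiplier` are hypothetical names; this illustrates the ordering of operations and is not the DPDM code.

```python
import math
import random
from typing import List, Sequence


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def average(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of several gradient vectors."""
    k = len(vectors)
    return [sum(col) / k for col in zip(*vectors)]


def clip(v: Sequence[float], max_norm: float) -> List[float]:
    """Scale v down so its L2 norm is at most max_norm."""
    norm = l2_norm(v)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in v]


def private_batch_gradient(
    grads_per_example: Sequence[Sequence[Sequence[float]]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """grads_per_example[i][k] is the gradient of example i at noise level k."""
    # Average over noise levels *before* clipping: each example is still
    # clipped exactly once, so its contribution stays bounded by max_norm.
    clipped = [clip(average(g), max_norm) for g in grads_per_example]
    summed = [sum(col) for col in zip(*clipped)]
    std = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, std) for s in summed]
    return [x / len(clipped) for x in noisy]


# Toy gradients (assumption): two examples with two and one noise levels.
toy_grads = [[[3.0, 0.0], [0.0, 4.0]], [[0.1, 0.0]]]
update = private_batch_gradient(toy_grads, max_norm=1.0, noise_multiplier=0.0, rng=random.Random(0))
```

Averaging first reduces the variance of each per-example gradient, while the single clip per example keeps the sensitivity bound unchanged.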
The authors show here exact representation using O(m) parameters; I am more interested in getting some error bounds when we use fewer parameters. Results like this exist for polynomial interpolation: en.wikipedia.org/wiki/Polyno… see Chapter 7.
2
Replying to @isidentical
hahaha
1
1
60
🔒 Despite DP generative modeling being incredibly challenging, we hope that our results can stimulate future work in this important field. Project page: nv-tlabs.github.io/DPDM/ arXiv: arxiv.org/abs/2210.09929 Code will be released soon! Stay tuned! (12/12)
2
python scripts/sampling/simple_video_sample.py --decoding_t 1 --version svd_image_decoder Runs with VRAM spike of 20GB on A100 (PyTorch 2.0.1+cu117)
2
171
@yubai01 et al. pointed out the similarity of BC to Dual Averaging (DA). Our main contribution is a refinement thereof: BC is a nonconvex counterpart of DA, and more importantly, DA itself is the generalized conditional gradient algorithm applied to a smoothened dual problem.
1
1
Thanks for everything Iain!!
1
28
(3/6) I show how to make training of CNFs more efficient by scheduling the numerical solver tolerances. The inspiration for this comes from inexact Newton methods. I am currently working on a paper that will extend this work to "adaptive tolerance schedulers". Stay tuned!
1
1
Replying to @fofrAI
Yes, should decrease gradually up to some limit. Too many frames generally lead to repetition/back-and-forth movements.
1
44
🔒 (i) Why DMs? In DP-SGD, the amount of injected noise also depends on the model size: more parameters, more noise! The denoiser, the learnable component in DMs, is less complex than the network learned by a GAN or the end-to-end sampling process of the DM itself. (6/n)
1
1
Btw, GENIE and other recent DM acceleration works build on the DDIM ODE, which is considerably easier to solve than the Probability Flow ODE. We discuss this in our paper but also see the excellent arxiv.org/abs/2206.00364 (3/n)
1