1/ Our new @reve image model is now #2 on the @arena text-to-image leaderboard, behind only GPT Image 2 and ahead of Nano Banana Pro, Microsoft, xAI and everyone else. It's also a 125-point jump over Reve 1.5 from just 3 months ago. The research story behind it 🧵👇
Today, we’re launching Reve 2.0, the best 4K image model in the world. We invented a new way to generate and edit any image using precise layouts. For the first time, it’s possible to create images you can touch.
Video diffusion models generate high-quality videos but are too slow for interactive applications. We @MIT_CSAIL @AdobeResearch introduce CausVid, a fast autoregressive video diffusion model that starts playing the moment you hit "Generate"! A thread 🧵
For more details, please visit the project website at causvid.github.io. We plan to release an implementation based on an open-source model soon. I am incredibly grateful to all my collaborators at Adobe and MIT, including @qiangz_ai, @xunhuang1995, @rzhang88, @elishechtman, Bill Freeman, and @fredodurand. Excited to explore a wide range of new applications in content creation, virtual reality, and robotics made possible by our approach.
CausVid trains a four-step autoregressive diffusion model to generate videos. Unlike previous bidirectional diffusion models that denoise all frames simultaneously, CausVid generates videos frame by frame. This approach enables users to watch the video while it is being generated.
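A minimal sketch of the frame-by-frame idea, assuming a toy stand-in for the denoiser. `toy_denoise` and `generate_frames` are hypothetical names for illustration, not the CausVid implementation. The point is only the control flow: each frame is refined in 4 steps using past frames alone, then handed to the viewer before the next frame starts.

```python
import random


def toy_denoise(frame, prev_frame, step, num_steps):
    """Hypothetical stand-in for one denoiser call (not a real model).

    Blends the noisy frame toward a target that depends only on the
    previous frame, so no future frame is ever consulted.
    """
    n = len(frame)
    if prev_frame is None:
        target = [i / n for i in range(n)]        # first frame: a fixed ramp
    else:
        target = prev_frame[1:] + prev_frame[:1]  # previous frame shifted by one
    w = (step + 1) / num_steps                    # w reaches 1.0 on the last step
    return [(1 - w) * x + w * t for x, t in zip(frame, target)]


def generate_frames(num_frames, num_steps=4, frame_size=8, seed=0):
    """Yield frames one at a time, each refined in `num_steps` denoising steps."""
    rng = random.Random(seed)
    prev = None
    for _ in range(num_frames):
        frame = [rng.gauss(0.0, 1.0) for _ in range(frame_size)]  # start from noise
        for step in range(num_steps):
            frame = toy_denoise(frame, prev, step, num_steps)
        prev = frame
        yield frame  # the caller can display this frame immediately


# Usage: frames arrive one by one instead of all at the end.
for index, frame in enumerate(generate_frames(3)):
    print(index, frame)
```

Because `generate_frames` is a generator, the first frame is available after 4 denoiser calls, regardless of how long the full video is.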
To perform diffusion generation in just 4 steps instead of 50, we apply distribution matching distillation (DMD) to videos. For an excellent overview of DMD, see the following thread.
Diffusion models generate high-quality images but require hundreds of forward passes. @MIT_CSAIL and @AdobeResearch introduce Distribution Matching Distillation (DMD), a distillation approach that converts costly multi-step diffusion models into fast one-step generators. A thread 🧵
A bidirectional teacher with privileged future information during training proves surprisingly effective in reducing error accumulation in the causal student (see video below). This form of asymmetric distillation, where the student and teacher use different architectures, is only feasible with DMD-style distillation. Other methods, such as progressive distillation or consistency models, require identical architectures for both the student and teacher.
To address this, we propose an asymmetric distillation strategy where we supervise a causal student model with a bidirectional teacher.
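One way to picture the causal/bidirectional difference is through attention masks over frames. This is a sketch under the assumption that masking is applied at frame granularity, which the thread does not specify; `bidirectional_mask` and `causal_mask` are hypothetical helper names.

```python
def bidirectional_mask(num_frames):
    """Teacher-style mask: every frame may attend to every other frame."""
    return [[True] * num_frames for _ in range(num_frames)]


def causal_mask(num_frames):
    """Student-style mask: frame i may attend only to frames 0..i."""
    return [[j <= i for j in range(num_frames)] for i in range(num_frames)]


# Usage: print both masks for a 4-frame clip (1 = may attend, 0 = blocked).
for name, mask in [("bidirectional", bidirectional_mask(4)), ("causal", causal_mask(4))]:
    print(name)
    for row in mask:
        print(" ".join("1" if allowed else "0" for allowed in row))
```

The upper triangle that the causal mask blocks is the "future information" the teacher sees during training and the student never does.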
One crucial issue with previous autoregressive diffusion approaches is error accumulation: as the model generates future frames conditioned on previously generated ones, any imperfections in earlier frames compound over time, causing the video to drift off track. This eventually leads to visible artifacts and complete failure.
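A toy recurrence that illustrates the compounding, not an analysis from the paper. It assumes each frame inherits the previous frame's error, scaled by a hypothetical `gain`, and adds a fresh `per_frame_error` of its own. Both parameters and the function name `accumulated_error` are made up for illustration.

```python
def accumulated_error(num_frames, per_frame_error=0.01, gain=1.1):
    """Return the error at each frame under err_t = gain * err_{t-1} + per_frame_error."""
    errors = []
    err = 0.0
    for _ in range(num_frames):
        err = gain * err + per_frame_error  # inherit past error, add new error
        errors.append(err)
    return errors


# Usage: compare the error at the first and last frame of a 50-frame rollout.
errors = accumulated_error(50)
print(f"frame 1: {errors[0]:.4f}, frame 50: {errors[-1]:.4f}")
```

With `gain` above 1 the error grows geometrically with the frame index, so a small per-frame imperfection can dominate a long rollout.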
the most typical mode collapse example in generative model training ...
Replying to @oahzxl
There’s little difference in quality, but the distilled bidirectional model may handle local details better, while the distilled causal model offers much lower latency.
Replying to @BoyuanChen0
Thank you! Building on the shoulders of giants, particularly previous autoregressive diffusion works like Diffusion Forcing!
Replying to @parasjain
Congrats! Really Impressive!
Replying to @thuanz123
Thank you! It's great to share similar ideas!
Replying to @dreamingtulpa
Working hard but no clue 😅