Dan Fu (@realDanFu)

Sam Altman

@sama

25 Mar 2023

we thought we wanted flying cars and not 140/280 characters, but really we wanted 32000 tokens

234

92,156

Dan Fu · Jan 11, 2024 · 6:06 PM UTC

Dan Fu

@realDanFu

11 Jan 2024

New year, new model drop! w/ @JonSaadFalcon, @simran_s_arora, excited to release new long-context retrieval models with Monarch Mixer, up to 32K sequence length! First step 2 long-context retrieval, outperforming Mistral, BGE, OpenAI on long-context document retrieval. 1/

226

53,786

Dan Fu · Jun 23, 2022 · 5:38 PM UTC

Dan Fu

@realDanFu

23 Jun 2022

S4 is an amazing sequence model - but has seemed mysterious. It doesn't have to be! In this blog (originally an internal explainer for our group), @HazyResearch looks at S4 from first principles that are familiar to most sophomore engineering students. hazyresearch.stanford.edu/bl…

Simplifying S4

Explaining S4 from the first principles of signal processing.

187

Dan Fu · Feb 15, 2023 · 8:28 PM UTC

Dan Fu

@realDanFu

15 Feb 2023

What's the simplest model that can get the job done? New paper and blog post on how the answer for sequence modeling (including language) may be convolutions... with a touch of regularization. 📜 arxiv.org/abs/2302.06646 🖥️ github.com/HazyResearch/safa… ⌨️ hazyresearch.stanford.edu/bl… 1/n

Simple Hardware-Efficient Long Convolutions for Sequence Modeling

State space models (SSMs) have high performance on long sequence modeling but require sophisticated initialization techniques and specialized implementations for high quality and runtime...

arxiv.org

164

30,293
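A rough sketch of the long-convolution idea in the tweet above, in plain Python. It assumes a kernel as long as the input and a power-of-two FFT size. The function names are hypothetical, and this is not the paper's implementation (which is a fused GPU kernel). It only shows that a causal convolution over the whole sequence can be computed in O(N log N) with an FFT and gives the same result as the O(N^2) direct sum.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def causal_conv_direct(u, kernel):
    """O(N^2) reference: y[t] = sum over s <= t of kernel[s] * u[t - s]."""
    return [sum(kernel[s] * u[t - s] for s in range(t + 1)) for t in range(len(u))]


def causal_conv_fft(u, kernel):
    """O(N log N) version: zero-pad to avoid wrap-around, multiply spectra."""
    n = len(u)
    size = 1
    while size < 2 * n:
        size *= 2
    u_hat = fft([complex(x) for x in u] + [0j] * (size - n))
    k_hat = fft([complex(x) for x in kernel] + [0j] * (size - len(kernel)))
    product = [a * b for a, b in zip(u_hat, k_hat)]
    y = fft(product, invert=True)
    # The inverse transform above is unnormalized, so divide by the FFT size.
    return [(value / size).real for value in y[:n]]
```

The zero-padding to at least twice the sequence length is what keeps the circular FFT convolution from wrapping later inputs back onto earlier outputs.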

Dan Fu · Jul 25, 2023 · 10:21 PM UTC

Dan Fu

@realDanFu

25 Jul 2023

You've heard of models that are sub-quadratic in sequence length, but what if they were sub-quadratic in model *dimension* too? Announcing a preview of Monarch Mixer - a fully sub-quadratic & hardware-efficient architecture that matches BERT in quality! w @simran_s_arora 1/

155

61,452
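One way to see how a layer can be sub-quadratic in model dimension: replace a dense n x n matrix with a product of block-diagonal matrices interleaved with a fixed permutation. The sketch below assumes n = m * m and one common Monarch-style factorization (block-diagonal, transpose-permute, block-diagonal, transpose-permute). The exact parameterization used in Monarch Mixer may differ, and all function names here are hypothetical.

```python
def block_diag_matvec(blocks, x):
    """Multiply a block-diagonal matrix (a list of m x m blocks) by vector x."""
    m = len(blocks[0])
    out = []
    for i, block in enumerate(blocks):
        chunk = x[i * m:(i + 1) * m]
        out.extend(sum(w * v for w, v in zip(row, chunk)) for row in block)
    return out


def transpose_permute(x, m):
    """View x as an m x m row-major grid, transpose it, and flatten again."""
    return [x[col * m + row] for row in range(m) for col in range(m)]


def monarch_matvec(left_blocks, right_blocks, x):
    """Apply right blocks, permute, apply left blocks, permute back."""
    m = len(left_blocks)
    y = block_diag_matvec(right_blocks, x)
    y = transpose_permute(y, m)
    y = block_diag_matvec(left_blocks, y)
    return transpose_permute(y, m)
```

Each block-diagonal multiply costs m blocks times m * m multiplications, so the whole product uses about 2 * n^1.5 multiplications instead of the n^2 of a dense matrix-vector product.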

Dan Fu · Mar 5, 2025 · 9:36 PM UTC

Dan Fu

@realDanFu

5 Mar 2025

Super excited to announce ThunderMLA: fast MLA decode in ThunderKittens ⚡️🐱! Up to 35% faster than FlashMLA. Where does that speedup come from? It's all in the scheduling! 1/

Benjamin F Spector

@bfspector

5 Mar 2025

(1/7) Inspired by DeepSeek's FlashMLA, we're releasing ThunderMLA—a fused megakernel optimized for variable-prompt decoding! ⚡️🐱ThunderMLA is up to 35% faster than FlashMLA and just 400 LoC. Blog: bit.ly/4kubAAK With @AaryanSinghal4, @realDanFu, and @hazyresearch!

133

25,921

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

One key point: SSMs are *linear* in sequence length instead of quadratic, and have no fixed context length. Long context for everyone! We're super excited, so we're releasing our code and model weights today - up to 2.7B parameters! github.com/HazyResearch/H3 2/n

GitHub - HazyResearch/H3: Language Modeling with the H3 State Space Model

Language Modeling with the H3 State Space Model. Contribute to HazyResearch/H3 development by creating an account on GitHub.

126

14,769
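To illustrate the "linear in sequence length, no fixed context length" point, here is a minimal sketch of a discrete linear state space model in plain Python. It assumes a diagonal state matrix for brevity, and the names `ssm_scan` and `ssm_kernel` are hypothetical. This is not the H3 layer itself, only the underlying recurrence x_k = A x_{k-1} + B u_k, y_k = C x_k.

```python
def ssm_scan(a_diag, b, c, inputs):
    """Run the recurrence with a diagonal A, one step per input token.

    Work per step depends only on the state size, so total cost is linear
    in len(inputs), and the state never grows with sequence length.
    """
    state = [0.0] * len(a_diag)
    outputs = []
    for u in inputs:
        state = [a * x + bi * u for a, x, bi in zip(a_diag, state, b)]
        outputs.append(sum(ci * x for ci, x in zip(c, state)))
    return outputs


def ssm_kernel(a_diag, b, c, length):
    """Equivalent convolution kernel K[j] = C A^j B for the same system."""
    return [
        sum(ci * (a ** j) * bi for a, bi, ci in zip(a_diag, b, c))
        for j in range(length)
    ]
```

Feeding a unit impulse through `ssm_scan` reproduces `ssm_kernel`, which is the sense in which the same model can be run as a recurrence (cheap token-by-token inference) or as a long convolution (parallel training).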

Dan Fu · Jan 10, 2022 · 6:56 PM UTC

Dan Fu

@realDanFu

10 Jan 2022

The Stanford MLSys Seminar is now available in podcast form on Apple Podcasts, Spotify, Google, and more! We release new podcasts every Monday and Friday (new episodes on Fridays, old episodes from the backlog on Mondays). Check us out on your favorite platform below! (1/n)

126

Dan Fu · Apr 19, 2022 · 6:28 PM UTC

Dan Fu

@realDanFu

19 Apr 2022

Blog alert! 📣 How does contrastive learning work? How can we apply it effectively? New *3-part series* covering *2 new papers* on getting better transfer & robustness, and how to apply contrastive w types to improve entity retrieval. Part 1: hazyresearch.stanford.edu/bl… 👇 (1/n)

Advances in Understanding, Improving, and Applying Contrastive Learning

Part 1 of a 3-part blog series on advances in contrastive learning.

112

Dan Fu · Apr 22, 2025 · 5:20 PM UTC

Dan Fu

@realDanFu

22 Apr 2025

I’ll be at #ICLR2025! 🛫🇸🇬 - ThunderKittens (spotlight) w @bfspector Thu 3pm - I’ll be at the @togethercompute booth Fri afternoon - we’re hiring aggressively for kernels! Please reach out if you’d like to chat kernels🌽, TK⚡️🐱, Chipmunk🐿️, or anything performance! DMs open!

114

5,793

Dan Fu · Mar 6, 2025 · 7:11 PM UTC

Dan Fu

@realDanFu

6 Mar 2025

And we're not done - excited to announce ThunderGQA ⚡️🐱! Fast fused decode, applied to GQA for Llama & QWEN family models, and 20+% faster than FA3! We'll be shipping more updates to ThunderMLA in the coming days, watch this space! w/ @bfspector @AaryanSinghal4 @HazyResearch

Dan Fu

@realDanFu

5 Mar 2025

Super excited to announce ThunderMLA: fast MLA decode in ThunderKittens ⚡️🐱! Up to 35% faster than FlashMLA. Where does that speedup come from? It's all in the scheduling! 1/

105

10,294

Dan Fu · Jul 23, 2022 · 10:23 PM UTC

Dan Fu

@realDanFu

23 Jul 2022

Thrilled that FlashAttention won the best paper award at the Hardware Aware Efficient Training workshop at ICML - really excited to meet so many like-minded folks at the workshop. Thanks to the organizers (and NVIDIA) for the GPU!

Tri Dao

@tri_dao

31 May 2022

Announcing FlashAttention, a fast and memory-efficient attention algorithm with no approximation! 📣 w/ @realDanFu By reducing GPU memory reads/writes, FlashAttention runs 2-4x faster & requires 5-20x less memory than PyTorch standard attention, & scales to seq. length 64K. 1/

102

Dan Fu · Mar 28, 2022 · 5:46 PM UTC

Dan Fu

@realDanFu

28 Mar 2022

New preprint alert! 📣 How do we fuse foundation models with weak supervision? Liger (🐯 +🦁) is a new weak supervision framework that fuses FMs + WS using *local smoothness* -- outperforming both FMs and WS on their own. 📜 arxiv.org/abs/2203.13270 More below 👇 (1/n)

Dan Fu · Apr 17, 2023 · 4:19 PM UTC

Dan Fu

@realDanFu

17 Apr 2023

Super excited to release the RedPajama dataset - a new, fully open *1.2 trillion token* dataset following the LLaMA recipe. A first step towards creating leading, fully open-source large language models. together.xyz/blog/redpajama

RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training...

together.ai

Together AI

@togethercompute

17 Apr 2023

Announcing RedPajama — a project to create leading, fully open-source large language models, beginning with the release of a 1.2 trillion token dataset that follows the LLaMA recipe, available today! together.xyz/blog/redpajama More in 🧵 …

32,706

Dan Fu · Mar 15, 2025 · 3:44 PM UTC

Dan Fu

@realDanFu

15 Mar 2025

A little pre-GTC present for everyone... new Blackwell kernels, all written in ThunderKittens! ⚡️🐱 BF16 & FP8 GEMMs, attention forwards & backwards - fast (competitive with cuDNN and cuBLAS) and open-source! w/ @bfspector @AaryanSinghal4 @HazyResearch @togethercompute 1/

6,754

Dan Fu · May 27, 2025 · 7:08 PM UTC

Dan Fu

@realDanFu

27 May 2025

An entire model... in a single kernel! The H100 number is crazy - at 1000 toks/s on 1xH100, the Llama-1B is running at 72% memory bandwidth util for the entire model. ⚡️

Benjamin F Spector

@bfspector

27 May 2025

(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces. So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel. Megakernels are faster & more humane. Here’s how to treat your Llamas ethically: (Joint with @jordanjuravsky, @stuart_sul, @OwenDugan, @dylan__lim, @realDanFu, @simran_s_arora, and @HazyResearch)

7,571

Dan Fu · May 8, 2025 · 3:59 PM UTC

Dan Fu

@realDanFu

8 May 2025

See everyone at MLSys 2025 in Santa Clara next week! Super excited to organize the Young Professional Symposium program on day one (May 12) and our invited speakers @soumithchintala (keynote), @Tim_Dettmers, @infwinston, @simran_s_arora, @BeidiChen!

9,857

Dan Fu · Dec 16, 2023 · 2:17 PM UTC

Dan Fu

@realDanFu

16 Dec 2023

Today I'm talking about FlashFFTConv at the ENLSP workshop (Efficient Natural Language and Speech Processing)! The talk is at 9:48 AM, and the poster session is from 1:00 to 2:00!

Dan Fu

@realDanFu

11 Dec 2023

I'm flying out to #NeurIPS2023 @NeurIPSConf! I'll be presenting an oral on Monarch Mixer tomorrow at 3:40 in the Oral 2A session, and I'll be presenting FlashFFTConv Saturday at the ENLSP workshop! Monarch Mixer: arxiv.org/abs/2310.12109 FlashFFTConv: arxiv.org/abs/2311.05908

23,301

Dan Fu · Oct 3, 2024 · 4:23 PM UTC

Dan Fu

@realDanFu

3 Oct 2024

A little taste of what we've been working on... super excited to launch support for FLUX models on @togethercompute! Some highlights:
* With @tri_dao and the TKC team, we built the fastest FLUX engine anywhere - 315ms inference time for FLUX.1 [schnell] on our turbo engine
* We're running a promotion with three months free support for FLUX.1 [schnell] on our free endpoint - have fun prototyping!
* We're one of the exclusive launch partners for @bfl_ml's new FLUX1.1 [pro] model - the new state-of-the-art diffusion model by ELO score

If you're excited about faster and better diffusion models (or you have a diffusion model you want to speed up, video anyone? 👀), please reach out - let's make diffusion faster for everyone!

Together AI

@togethercompute

3 Oct 2024

FLUX has arrived on Together AI, and it's faster and more powerful than ever. We’re one of the exclusive launch partners for FLUX1.1 [pro], @bfl_ml latest premium high-performance model. Plus we’re giving developers 3 months free access to FLUX.1 [schnell] via our FLUX-schnell-Free endpoint. Start building with state-of-the-art image generation today. Read more: together.ai/blog/flux-api-is…

15,403

Dan Fu · Jul 9, 2024 · 8:56 PM UTC

Dan Fu

@realDanFu

9 Jul 2024

This is one of the coolest papers I've read this year. Efficient attention-free LLMs (SSMs, Mamba, etc) are cool - but you lose something from going to a fixed state. E.g., much harder to use long-context docs in QA. The problem is that you don't know what to put in the state.

Simran Arora

@simran_s_arora

9 Jul 2024

Excited to share Just read twice: going beyond causal language modeling to close quality gaps between efficient recurrent models and attention-based models!! There’s so much recent progress on recurrent architectures, which are dramatically more memory efficient and asymptotically faster than attention 💨 But there’s no free lunch 🥪 these models can’t fit all the information from long contexts into the limited memory, degrading in-context learning quality. Is all lost?

10,184

Dan Fu · Dec 8, 2023 · 8:22 PM UTC

Dan Fu

@realDanFu

8 Dec 2023

Super excited for this model to see the light of day! 7B model, hybrid gated conv/SSM + attention architecture, trained for long context and running FlashFFTConv everywhere. You can chat with it now on the Together API!

Together AI

@togethercompute

8 Dec 2023

Announcing StripedHyena 7B — an open source model using an architecture that goes beyond Transformers achieving faster performance and longer context. It builds on the lessons learned in the past year designing efficient sequence modeling architectures. together.ai/blog/stripedhyen…

18,733

Dan Fu · Jan 10, 2023 · 5:03 PM UTC

Dan Fu

@realDanFu

10 Jan 2023

After a short hiatus, the Stanford MLSys Seminar is coming back this quarter with a special series of episodes on foundation models! Our first talk (ep 67!!) will be @tri_dao, who'll be talking about FlashAttention. Catch us *TOMORROW* at 3:30 PT: piped.video/watch?v=gMOAud7h…

FlashAttention - Tri Dao | Stanford MLSys #67

Episode 67 of the Stanford MLSys Seminar “Foundation Models Limited...

10,814

Dan Fu · Feb 8, 2024 · 7:15 PM UTC

Dan Fu

@realDanFu

8 Feb 2024

ChatGPT's 1700-token system prompt got you down? Led by @jordanjuravsky, @brad19brown, introducing Hydragen, a simple technique for Transformer LLM inference with shared prefixes! Up to 30x improvement in throughput with no custom CUDA! A few things I love in this project: 1/

Jordan Juravsky

@jordanjuravsky

8 Feb 2024

Excited to share my first PhD project! TLDR: Hydragen is an exact, simple (no custom CUDA) implementation of attention for large batches with shared prefixes. We can improve LLM throughput by over 30x for CodeLlama-13b. Also, adding lots more shared context becomes cheap: growing a prefix from 1k to 16k tokens causes less than a 15% drop in throughput.

Details: Large-batch inference with shared prefixes is a common use case for LLMs. Chatbots can have long system instructions that are shared across users, few-shot examples can be reused across multiple problems, or many candidate solutions can be sampled from a single prompt (e.g. self-consistency, AlphaCode).

Shared prefixes enable special optimizations because the attention keys and values for the prefix tokens are identical across sequences. Libraries like vLLM are great at avoiding redundant storage of the prefix, enabling a much larger batch size. We show that in addition to saving memory, shared prefixes can also be used to significantly improve the speed of computing attention.

Existing attention implementations operate independently on every sequence in the batch without considering sharing. When sequences do in fact share a prefix, this means that the same prefix keys and values are read from GPU memory many times, regardless of whether they are redundantly stored or not. Moreover, during decoding these approaches involve computing many matrix-vector products, preventing the use of fast tensor cores. Overall, this leads to attention having a low hardware utilization that can bottleneck end-to-end decoding with big batches or long sequences.

With Hydragen, we can improve the utilization and speed of attention by taking advantage of shared prefixes. Hydragen is a combination of two techniques:

1. Attention Decomposition: We split attention over the full sequence (which has partial KV overlap across the batch) into prefix attention (which has full overlap) and suffix attention (which has no overlap). As long as we store the softmax denominators from each sub-computation, we can cheaply combine them to obtain the full attention result.

2. Inter-Sequence Batching: Now that the shared prefix has been split into its own attention op, we can compute it efficiently by batching attention queries together across sequences. This converts many matrix-vector products into fewer matrix-matrix products, reducing redundant reads and leveraging tensor cores.

Both of these techniques can be easily implemented in PyTorch, as long as you have access to a fast attention primitive that returns softmax denominators.

Hydragen can dramatically improve end-to-end LLM throughput over baselines that only avoid redundant prefix storage. The speedups are biggest when attention is expensive relative to the rest of the model (e.g. large batch sizes, long sequence lengths, smaller models, no MQA/GQA), and when the ratio of prefix length to suffix length is high.

A key takeaway for LLM users is that prefix attention is so fast that adding more shared tokens is cheap. With a large batch size, expanding the prefix length from 1k to 16k tokens for Hydragen only results in a 15% drop in throughput, while for vLLM throughput drops by over 90%.

15,541
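The attention-decomposition step described in the thread above can be sketched for a single query in plain Python. The assumptions are loud ones: the function names are hypothetical, the log-sum-exp is used as a numerically stable stand-in for the stored softmax denominator, and the inter-sequence batching step (where the actual speedup comes from) is not shown. This is not the Hydragen code, only a check that prefix and suffix attention can be merged exactly.

```python
import math


def attention_with_lse(query, keys, values):
    """Softmax attention for one query; also returns the log-sum-exp of scores."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom for d in range(dim)]
    return out, top + math.log(denom)


def combine(out_a, lse_a, out_b, lse_b):
    """Merge two partial attention results, weighting each by its denominator."""
    top = max(lse_a, lse_b)
    weight_a = math.exp(lse_a - top)
    weight_b = math.exp(lse_b - top)
    total = weight_a + weight_b
    return [(weight_a * a + weight_b * b) / total for a, b in zip(out_a, out_b)]


# Usage: attend to the shared prefix and the per-sequence suffix separately.
query = [0.3, -0.2]
prefix_keys, prefix_values = [[1.0, 0.0], [0.5, 0.5]], [[1.0, 2.0], [3.0, 4.0]]
suffix_keys, suffix_values = [[-1.0, 2.0]], [[5.0, 6.0]]
prefix_out, prefix_lse = attention_with_lse(query, prefix_keys, prefix_values)
suffix_out, suffix_lse = attention_with_lse(query, suffix_keys, suffix_values)
merged = combine(prefix_out, prefix_lse, suffix_out, suffix_lse)
```

Because the merge needs only each part's output and denominator, the prefix result could in principle be computed once for a whole batch of queries and then combined with each sequence's own suffix result.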

Dan Fu · Dec 16, 2023 · 10:28 PM UTC

Dan Fu

@realDanFu

16 Dec 2023

Thrilled to win the Best Poster award at the ENLSP workshop!

Dan Fu

@realDanFu

16 Dec 2023

Today I'm talking about FlashFFTConv at the ENLSP workshop (Efficient Natural Language and Speech Processing)! The talk is at 9:48 AM, and the poster session is from 1:00 to 2:00!

31,990

Dan Fu · Apr 21, 2025 · 5:51 PM UTC

Dan Fu

@realDanFu

21 Apr 2025

Super excited to share Chipmunk 🐿️- training-free acceleration of diffusion transformers (video, image generation) with dynamic attention & MLP sparsity! Led by @austinsilveria, @SohamGovande - 3.7x faster video gen, 1.6x faster image gen. Kernels written in TK ⚡️🐱 1/

Austin Silveria

@austinsilveria

21 Apr 2025

Training-free acceleration of Diffusion Transformers with dynamic sparsity and cross-step attention/MLP deltas--collaboration with @SohamGovande and @realDanFu! ⚡️ 3.7x faster video and 1.6x faster image generation while preserving quality! 🧵 Open-source code & CUDA kernels!

8,350

Dan Fu · Nov 28, 2022 · 8:43 PM UTC

Dan Fu

@realDanFu

28 Nov 2022

I'll be at #NeurIPS2022 this week! @tri_dao and I will be presenting FlashAttention (arxiv.org/abs/2205.14135) at Poster Session 4 Hall J #917, Wednesday 4-6 PM. Super excited to talk all things performance, ML+systems, and breaking down scaling bottlenecks!

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to...

arxiv.org

Dan Fu · Feb 28, 2020 · 4:37 PM UTC

Dan Fu

@realDanFu

28 Feb 2020

Super excited to share some thoughts with @laurel_orr1 on lessons learned from the past four years with @HazyResearch and @SnorkelML, and what's next for the ways that machine learning is changing how we build software: hazyresearch.stanford.edu/so…

Dan Fu · Jul 6, 2021 · 3:00 PM UTC

Dan Fu

@realDanFu

6 Jul 2021

This Thursday, @srush_nlp from @cornell_tech will be talking to us about going beyond softmax in NLP. As always, 30 minute talk + 30 minute podcast with live audience questions, be sure to tune in! Livestream link: piped.video/watch?v=8nx4KfK3… #Stanford #MachineLearning

NLP Beyond Softmax feat. Sasha Rush | Stanford MLSys Seminar Episode...

Episode 33 of the Stanford MLSys Seminar Series! Beyond Softmax: S...

Dan Fu · May 16, 2022 · 5:02 PM UTC

Dan Fu

@realDanFu

16 May 2022

Our paper got accepted to #ICML2022 - excited to talk about this work in Baltimore!

Mayee Chen

@MayeeChen

19 Apr 2022

New preprint alert! 📣 How do we produce transferable and robust representations with supervised contrastive learning? We need *geometric spread* and an inductive bias towards *latent subclass clustering* in representation space. 📜 arxiv.org/abs/2204.07596 👇 (1/n)

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

We're super excited about these advances, so we're releasing our code and model weights today: github.com/HazyResearch/H3 13/n

GitHub - HazyResearch/H3: Language Modeling with the H3 State Space Model

Language Modeling with the H3 State Space Model. Contribute to HazyResearch/H3 development by creating an account on GitHub.

4,035

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

In H3, we replace attention with a new layer based on state space models (SSMs) - with the right modifications, we find that it can outperform Transformers. Two key ideas:
* Adapting SSMs to be able to do *comparison*
* Making SSMs as hardware-efficient as attention

3/n

8,230

Dan Fu · Oct 13, 2022 · 7:27 PM UTC

Dan Fu

@realDanFu

13 Oct 2022

We built off the super-optimized version of Diffusers that @Nouamanetazi / @huggingface released last week, and we're 33% faster than it - the diff is pretty small, 68 LOC: github.com/HazyResearch/diff…

Add FlashAttention · HazyResearch/diffusers@fd45ca2

Update README Update with example Check for FlashAttention install Update README REmove breakpoint Remove new line

Dan Fu · Apr 24, 2023 · 8:16 PM UTC

Dan Fu

@realDanFu

24 Apr 2023

We’ve been hard at work training RedPajama 7B! GPUs go brrr :)

Together AI

@togethercompute

24 Apr 2023

Training our first RedPajama 7B model is going well! Less than half way through training (after 440 billion tokens) the model achieves better results on HELM benchmarks than the well-regarded Pythia-7B trained on the Pile. Details at together.xyz/blog/redpajama-…

5,889

Dan Fu · Aug 15, 2022 · 9:52 PM UTC

Dan Fu

@realDanFu

15 Aug 2022

A bit late, but super honored to receive the best student paper runner up at @UncertaintyInAI #UAI2022! This project has been 2+ years in the making (we started *before COVID*), so super grateful to see it recognized! w @MayeeChen, @dyhadila, @fredsala, @kayvonf, @HazyResearch!

Dan Fu

@realDanFu

28 Mar 2022

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

Overall, really excited about new models/architectures like this. What happens if we don't need attention to get the magic we've been seeing, and we can get the same quality with a linear operator? No more fixed context windows, long context for everyone! 16/n

3,631

Dan Fu · May 31, 2022 · 12:50 AM UTC

Dan Fu

@realDanFu

31 May 2022

Super excited by this work. Making attention IO-aware makes it run way faster - and enables much longer sequences, since memory footprint becomes linear in sequence length. Really excited to see how this gets used, and where it goes next - IO-aware transformers?

Tri Dao

@tri_dao

31 May 2022

Dan Fu · Oct 13, 2022 · 7:27 PM UTC

Dan Fu

@realDanFu

13 Oct 2022

FlashAttention speeds up attention and reduces its memory footprint - without any approximation. Our key insight is that attention is bottlenecked by GPU memory *reads/writes*. FlashAttention speeds up attention by reducing the R/W. Same FLOPs, 3-4x faster!
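A minimal sketch of the arithmetic that makes this possible, for a single query in plain Python. It assumes a hypothetical `tiled_attention` helper and an arbitrary block size. It processes keys and values one block at a time with a running softmax, so the full row of attention scores is never stored. The real speedup comes from keeping each block in fast on-chip memory inside a fused GPU kernel, which this sketch does not model.

```python
import math


def tiled_attention(query, keys, values, block_size=2):
    """Exact softmax attention computed block by block with a running softmax."""
    scale = 1.0 / math.sqrt(len(query))
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim  # unnormalized output accumulated so far
    for start in range(0, len(keys), block_size):
        k_block = keys[start:start + block_size]
        v_block = values[start:start + block_size]
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in k_block]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (0.0 on the first block).
        correction = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * correction + sum(weights)
        acc = [
            a * correction + sum(w * v[d] for w, v in zip(weights, v_block))
            for d, a in enumerate(acc)
        ]
        running_max = new_max
    return [a / running_sum for a in acc]
```

The result matches ordinary softmax attention up to floating-point rounding. Only the order of the computation changes, which is consistent with the "same FLOPs, no approximation" claim in the tweet.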

Dan Fu · Jun 16, 2022 · 7:37 PM UTC

Dan Fu

@realDanFu

16 Jun 2022

Friends don't let friends run XGBoost on tabular data without trying foundation models first Great work by some awesome labmates!

Avanika Narayan

@Avanika15

16 Jun 2022

Can Foundation Models (FMs) clean and integrate your data? We explore the efficacy of FMs on these hard classical data tasks (1/7)

Dan Fu · Mar 10, 2023 · 11:10 PM UTC

Dan Fu

@realDanFu

10 Mar 2023

Build your own ChatGPT! Super excited by this open-source release - even more exciting that it was trained 100% carbon-negative. Happy to play a (minuscule) part in putting it together and helping serve it faster. Looking forward to seeing what folks build on top of this!

Together AI

@togethercompute

10 Mar 2023

Introducing OpenChatKit. A powerful, open-source base to create chatbots for various applications. Details in 🧵 together.xyz/blog/openchatki…

10,183

Dan Fu · Oct 9, 2019 · 4:02 PM UTC

Dan Fu

@realDanFu

9 Oct 2019

Why Train What You Can Code? Excited to share Rekall - using programmatic composition to find new events in video! Paper on arXiv, and code available on Github! Blog: dawn.cs.stanford.edu/2019/10…

Dan Fu · Nov 13, 2023 · 5:31 AM UTC

Dan Fu

@realDanFu

13 Nov 2023

Replying to @main_horse @arankomatsuzaki

We were going to wait until the morning, but now is as good a time as any: github.com/HazyResearch/flas…

GitHub - HazyResearch/flash-fft-conv: FlashFFTConv: Efficient Convolutions for Long Sequences with...

FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores - HazyResearch/flash-fft-conv

Aaryan Singhal @AaryanSinghal4

1,671

Dan Fu · Mar 6, 2025 · 6:08 PM UTC

Dan Fu

@realDanFu

6 Mar 2025

I’m building up the kernels team @togethercompute! If you’d like to contribute to kernels like ThunderMLA for production workloads, please reach out!

Together AI

@togethercompute

6 Mar 2025

Replying to @togethercompute

At Together AI, we are thrilled to be building a world-class kernels team. If you’d like to come build with us, please reach out!

5,060

Dan Fu · Oct 3, 2022 · 8:38 PM UTC

Dan Fu

@realDanFu

3 Oct 2022

Meta uses FlashAttention to speed up inference in AITemplate - really cool work, super excited to see folks pick it up!

AI at Meta

@AIatMeta

3 Oct 2022

Get faster, more flexible inference on GPUs using our newly open-sourced AITemplate, a revolutionary new inference engine that delivers up to 12X performance improvements on NVIDIA GPUs & 4X on AMD GPUs compared to eager-mode within Pytorch. Learn more: bit.ly/3rl8F3b nitter.app/MetaAI/status/15769745…

Dan Fu · Mar 1, 2022 · 2:36 AM UTC

Dan Fu

@realDanFu

1 Mar 2022

Absolutely thrilled to receive the best paper award w @MayeeChen for our work on supervised contrastive learning at the AI with Biased/Scarce Data Workshop at @RealAAAI today! Check out the paper on the workshop website: drive.google.com/file/d/1LX7… Short 🧵👇 - more soon! (1/n)

Dan Fu · Mar 9, 2025 · 6:02 PM UTC

Dan Fu

@realDanFu

9 Mar 2025

Happy Sunday! ThunderMLA -> ThunderGQA -> ThunderMHA! What’s next? 👀

9 Mar 2025

Wrapping up our trio of decode kernels, we’re excited to announce ThunderMHA! Our fused decode kernels pack now supports Multi-Head Attention (MHA), powering even faster inference for day-1 architectures like Transformers, GPT, and BERT. 10%+ faster than FA3 on H100s, we’re excited to keep on pushing perf 🚀. Play with ThunderMHA here: github.com/HazyResearch/Thun… w/ @bfspector @realDanFu @HazyResearch

2,480

Dan Fu · Jul 29, 2023 · 4:06 PM UTC

Dan Fu

@realDanFu

29 Jul 2023

Join us today for our workshop on efficient systems for foundation models - we’ve got a killer lineup of speakers and posters!

ES-FoMo@ICML2025 @ESFoMo

25 Jul 2023

Attending #ICML2023? Join us Saturday at our workshop on Efficient Systems for Foundation Models! 🔥 Large-Scale Distributed Training 🚀 Efficient Inference ⚙️ Deep Optimization 📈 Over 50 posters and 4 orals spanning from RL to efficient finetuning! gpusgobrrr.com

38,105

Dan Fu · Dec 6, 2024 · 1:37 AM UTC

Dan Fu

@realDanFu

6 Dec 2024

I'm heading to #NeurIPS next week Wed-Fri! I'll be at a couple things:
- Wed 1-2pm: talking Transformer killers with @picocreator at @swyx @latentspacepod live!
- Wed 11am: RedPajama poster (spotlight) with @mauriceweberq

I'm also recruiting for my lab at UCSD this cycle and for @togethercompute! Please reach out if you're interested in:
- CUDA kernels/ThunderKittens
- Faster diffusion models
- SSMs/architectures

DM me if you'd like to meet up! 👋

5,020

Dan Fu · May 8, 2025 · 3:59 PM UTC

Dan Fu

@realDanFu

8 May 2025

Then we'll have an invited talk from @Tim_Dettmers (10:45-11:05) on "Lessons Learned from Successful PhD Students" - where Tim will tell us a bit about his PhD journey and how to have a satisfying and successful PhD. I'm sure it will be great advice for all of us!

15,107

Dan Fu · Jan 17, 2023 · 5:00 PM UTC

Dan Fu

@realDanFu

17 Jan 2023

Ce Zhang (@DS3Lab and @togethercompute) has done some crazy stuff in distributed training. In this talk, he goes over the magic behind distributed training and inference on a GLOBAL scale over slow networks! Tune in tomorrow at 3:30 pm Pacific! piped.video/watch?v=e7o2C0lP…

Distributed and Decentralized Learning - Ce Zhang | Stanford MLSys #68

Episode 68 of the Stanford MLSys Seminar “Foundation Models Limited...

5,327

Dan Fu · May 30, 2023 · 8:06 PM UTC

Dan Fu

@realDanFu

30 May 2023

The deadline for our #ICML2023 workshop Efficient Systems for Foundation Models is tomorrow, May 31 AOE! Submit your best papers on training, inference or anything FM systems and efficiency - then join us for a great day of speakers & panel in Hawaii! es-fomo.com

13,320

Dan Fu · Oct 13, 2022 · 7:27 PM UTC

Dan Fu

@realDanFu

13 Oct 2022

Check out our fork of @huggingface Diffusers on GitHub and our blog post to try it out yourselves and read more! Code: github.com/HazyResearch/diff… Blog: hazyresearch.stanford.edu/bl…

Dan Fu · May 3, 2023 · 6:50 AM UTC

Dan Fu

@realDanFu

3 May 2023

If you're at ICLR, catch my talk on our paper Hungry Hungry Hippos: Towards Language Modeling with State Space Models today at 10 AM in room AD12! Featuring photos of actual Rwandan hippos :) (+poster from 11:30-1:30 at board 80!)

Dan Fu

@realDanFu

28 Apr 2023

🛫 to Rwanda for #ICLR2023! I’ll be giving a talk about H3 on Wednesday, and talking about some newer work on long convs at the ME-FoMo workshop on Thursday. Please reach out if you’ll be there and want to chat! Happy to talk about Hyenas, Red Pajamas, or anything else!

4,539

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

The upshot: we can scale H3 up to *2.7B* parameter models. And because of the state passing, we can run inference blazing fast -- up to *2.4x* faster than highly-optimized Transformers. Up to 1,980 tokens/second! 12/n

4,204

Dan Fu · Apr 3, 2023 · 4:52 AM UTC

Dan Fu

@realDanFu

3 Apr 2023

Replying to @typedfemale

Thanks for bringing this to our attention. We've updated the blog in light of this new and important information: 🙏🙏🙏 hazyresearch.stanford.edu/bl…

From Deep to Long Learning?

2,467

Dan Fu · Jan 22, 2025 · 6:30 PM UTC

Dan Fu

@realDanFu

22 Jan 2025

This is cool - a generalization of attention, SSMs, RNNs through the view of associative recall and what is solvable by each class. Nice work @heyyalexwang!

Alex Wang

@heyyalexwang

22 Jan 2025

did you know you've been doing test-time learning this whole time? transformers, SSMs, RNNs, are all test-time regressors but with different design choices we present a unifying framework that derives sequence layers (and higher-order attention👀) from a *single* equation 🧵

3,507

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

The H3 layer closes the gap on our synthetics, and the gains translate to strong downstream performance on language modeling. We replaced almost all the attention blocks in a Transformer with H3 layers, and trained on the PILE. Our model *outperforms* GPT-Neo in PPL! 7/n

5,339

Dan Fu · Oct 13, 2022 · 7:27 PM UTC

Dan Fu

@realDanFu

13 Oct 2022

We were actually a bit late to the game here - when we saw a couple folks on Reddit and elsewhere who beat us to the punch, we decided to give it a try ourselves :) PhotoRoom: photoroom.com/tech/stable-di… u/hnipun: teddit.net/r/StableDiffusion…

Make stable diffusion up to 100% faster with Memory Efficient Attention

Dive into optimizing the Stable Diffusion pipeline for photo editing apps at Photoroom by leveraging memory-efficient attention mechanisms from the xformers library, resulting in significant speed...

photoroom.com

Dan Fu · Jun 24, 2025 · 8:28 PM UTC

Dan Fu

@realDanFu

24 Jun 2025

What a throwback to weak supervision! Great work @JonSaadFalcon @ekellbuch @MayeeChen!

Jon Saad-Falcon

@JonSaadFalcon

24 Jun 2025

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning models like Llama 3.3 70B Instruct! 🧵(1 / N)

5,374

Dan Fu · Jun 10, 2025 · 5:44 PM UTC

Dan Fu

@realDanFu

10 Jun 2025

And to close out a trio of diffusion papers… Super excited to announce Grafting - a method for distilling pretrained diffusion transformers into *new architectures*, led by @keshigeyan! Swap attention for new primitives for 2% pretraining cost, exciting for modeling research!

Keshigeyan Chandrasegaran

@keshigeyan

10 Jun 2025

1/ Model architectures have been mostly treated as fixed post-training. 🌱 Introducing Grafting: A new way to edit pretrained diffusion transformers, allowing us to customize architectural designs on a small compute budget. 🌎 grafting.stanford.edu/ Co-led with @MichaelPoli6

6,283

Dan Fu · Dec 11, 2023 · 6:30 PM UTC

Dan Fu

@realDanFu

11 Dec 2023

One final plug: Oral 2A Efficient Learning tomorrow is absolutely **packed** with great work from @Tim_Dettmers and @srush_nlp - super excited to hear what they have to say!

3,924

Dan Fu · Jun 3, 2024 · 5:38 PM UTC

Dan Fu

@realDanFu

3 Jun 2024

Mambas go brr with tensor cores!

Tri Dao

@tri_dao

3 Jun 2024

With @_albertgu, we’ve built a rich theoretical framework of state-space duality, showing that many linear attn variants and SSMs are equivalent! The resulting model, Mamba-2 is better & faster than Mamba-1, and still matching strong Transformer arch on language modeling. 1/

3,607

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

With FlashConv, we can make SSMs outperform attention for almost all sequence lengths -- up to 35x faster than FlashAttention for long sequences! 11/n

4,386

Dan Fu · Jan 17, 2022 · 10:07 PM UTC

Dan Fu

@realDanFu

17 Jan 2022

(1/n) This week we have @fredsala on the Stanford MLSys Seminar, live on Thursday at 1:30 PM! Fred was a postdoc at @StanfordAILab, and is now a professor at @WisconsinCS and a research scientist at @SnorkelAI -- so he knows a thing or two about MLSys. piped.video/watch?v=XbnAYeSJ…

Dan Fu · Oct 30, 2023 · 5:27 PM UTC

Dan Fu

@realDanFu

30 Oct 2023

RedPajama-v2 - 30 trillion tokens, 84 CC dumps, 5 languages! Excited to see what people do with it :)

Together AI

@togethercompute

30 Oct 2023

We are excited to release RedPajama-Data-v2: 30 trillion filtered & de-duplicated tokens from 84 CommonCrawl dumps, 25x larger than our first dataset. It exposes a diverse range of quality annotations so you can slice & weight the data for LLM training. together.ai/blog/redpajama-d…

3,159

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

And we've got two blog posts up on our work -- first, read about our synthetic languages and how we developed H3: hazyresearch.stanford.edu/bl… 14/n

H3: Language Modeling with State Space Models and (Almost) No Attention

Replacing attention with SSMs in language modeling.

3,761

Dan Fu · Jul 17, 2025 · 6:36 PM UTC

Dan Fu

@realDanFu

17 Jul 2025

I’m off to #ICML2025 in Vancouver! (After an unusually eventful first flight - our plane had a wing problem, so we had to make an emergency landing back at SFO & switch planes) Reach out if you’d like to chat about (mega)kernels, @togethercompute, or anything MLSys! 1/

1,000

Dan Fu · Jan 3, 2022 · 9:56 PM UTC

Dan Fu

@realDanFu

3 Jan 2022

The MLSys Seminar is back this week with our very own @BeidiChen! Tune in Thursday, 1:30 PM on YouTube to hear about her great work on sparsity in deep learning. Livestream link: piped.video/watch?v=aGPzuwox… #Stanford #MachineLearning

Pixelated Butterfly: Fast Machine Learning with Sparsity - Beidi Chen...

Episode 49 of the Stanford MLSys Seminar Series! Pixelated Butterf...

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

These synthetic languages (inspired by great work like transformer-circuits.pub/202…) test how well SSMs can do in-context learning compared to attention. We find a critical missing capability -- SSMs have trouble *comparing tokens* across the sequence. 5/n

6,173

Dan Fu · Oct 13, 2022 · 7:27 PM UTC

Dan Fu

@realDanFu

13 Oct 2022

We sped up Stable Diffusion by replacing the self-attention/cross-attention blocks in the U-Net with FlashAttention. FlashAttention doesn't do any approximation, so you get the *exact same image* at the end.
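To illustrate why a tiled attention kernel can be exact rather than approximate, here is a minimal pure-Python sketch for a single query vector. It is not the FlashAttention kernel itself: the function names, the block size, and the toy inputs are all assumptions made for illustration. It only shows the running-max / running-sum (online softmax) bookkeeping that lets keys and values be processed one block at a time while matching ordinary softmax attention up to floating-point rounding.

```python
import math

def naive_attention(q, keys, values):
    """Ordinary softmax attention for one query, materializing all scores."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def tiled_attention(q, keys, values, block_size=2):
    """Same quantity, computed one block of keys/values at a time."""
    dim = len(values[0])
    running_max = float("-inf")  # largest score seen so far
    running_sum = 0.0            # softmax denominator, relative to running_max
    acc = [0.0] * dim            # weighted sum of values, relative to running_max
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old max (exp(-inf) is 0.0
        # on the first block, so nothing is carried over).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Calling both functions on the same inputs gives the same output vector up to rounding, whatever `block_size` is chosen; the tiling changes the memory access pattern, not the math.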

Dan Fu · Jul 18, 2022 · 11:37 PM UTC

Dan Fu

@realDanFu

18 Jul 2022

I'm at #ICML2022 this week! Let's chat if you're also in person! I'm presenting two papers:
- Improving Transfer, Robustness of Supervised Contrastive Learning arxiv.org/abs/2204.07596
- FlashAttention: Fast & Memory-Efficient Exact Attention arxiv.org/abs/2205.14135
⏱below!

Dan Fu · Apr 30, 2023 · 6:08 PM UTC

Dan Fu

@realDanFu

30 Apr 2023

The power of data - RedPajama-2.8B matches Pythia-7B in HELM score after being trained on 2x the tokens! Excited to see these models continue to improve as they see more tokens :)

Together AI

@togethercompute

30 Apr 2023

In addition to RedPajama 7B, we’ve also been training a 2.8B model. After 600B tokens it is exciting to see the model has higher HELM scores than the excellent Pythia-2.8B & GPT-Neo 2.7B. In fact, trained with twice the tokens, RedPajama-2.8B has comparable quality to Pythia-7B!

2,698

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

In response, we designed the H3 layer (Hungry Hungry Hippos) to plug this gap. The H3 layer stacks two SSMs, and uses some simple multiplicative interactions between them (gating) to do comparisons. 6/n
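A toy sketch of the idea in the tweet above: two SSMs combined with elementwise multiplication. This is a scalar, single-channel simplification and not the released H3 implementation. The function names are hypothetical, the learned projections that would produce `q`, `k`, and `v` are omitted, and the choice of a one-step shift and a decaying accumulator as the two SSMs is an assumption made for illustration.

```python
def shift_ssm(x):
    """Toy 'shift' SSM: the output at step t is the input from step t-1."""
    return [0.0] + list(x[:-1])

def decay_ssm(x, a=0.9):
    """Toy diagonal SSM: h_t = a * h_{t-1} + x_t, with output y_t = h_t."""
    h, out = 0.0, []
    for x_t in x:
        h = a * h + x_t
        out.append(h)
    return out

def h3_like(q, k, v, a=0.9):
    """Gate, remember, gate again: q * decay_ssm(shift_ssm(k) * v)."""
    # First gate: keep v_t only where the previous step's k was active.
    kv = [ks * vt for ks, vt in zip(shift_ssm(k), v)]
    # The second SSM carries those gated values forward in time.
    memory = decay_ssm(kv, a)
    # Second gate: q decides at which positions the memory is read out.
    return [qt * m for qt, m in zip(q, memory)]
```

For example, `h3_like([1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], a=0.5)` returns `[0.0, 2.0, 1.0]`: the value that follows the active key is stored, then decays by half at the next step. The multiplications are what let one position's content be compared against another's, which a single linear SSM cannot do.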

6,248

Dan Fu · Sep 29, 2022 · 11:21 PM UTC

Dan Fu

@realDanFu

29 Sep 2022

Wow, great to see FlashAttention being adopted by folks in industry - excited to see where else it can make training faster!

Databricks AI Research

@DbrxMosaicAI

29 Sep 2022

We have exciting news! In our latest and greatest LLM blog, we show how MosaicML Cloud can help you train LLMs from 1B - 70B parameters, and for the first time, publish transparent times + costs for doing so. It's a lot cheaper than you think! (1/9) mosaicml.com/blog/gpt-3-qual…

Dan Fu · Oct 13, 2020 · 10:13 PM UTC

Dan Fu

@realDanFu

13 Oct 2020

Super excited for our new seminar series on ML and systems -- how does ML change the modern programming stack, and what does it mean for how people will build and deploy applications in the future? Live on YouTube every Thursday, 3-4 PM PT. Check out links below for more!

Hazy Research: Strip Mall AI Research Club

@HazyResearch

13 Oct 2020

Announcing the new live-streamed Stanford MLSys Seminar Series, in which we will explore the frontier of machine learning and systems. Read the full announcement: hazyresearch.stanford.edu/ml… Schedule: mlsys.stanford.edu Intro video: piped.video/OEiNnfdxBRE

Dan Fu · Feb 14, 2022 · 10:58 PM UTC

Dan Fu

@realDanFu

14 Feb 2022

(1/n) This week @dorisjlee from @ucbrise and @BerkeleyISchool will be joining us on the Stanford MLSys Seminar to talk about her fantastic work on @lux_api. You can catch us live on YouTube this Thursday at 1:30 PT! Deets in 🧵👇: piped.video/watch?v=yrmSoU8j…

Lux: Visualization for Data Science - Doris Lee | Stanford MLSys #55

Episode 55 of the Stanford MLSys Seminar Series! Always-on Datafra...

Dan Fu · Jun 9, 2025 · 7:49 PM UTC

Dan Fu

@realDanFu

9 Jun 2025

Announcing HMAR - Efficient Hierarchical Masked Auto-Regressive Image Generation, led by @KumbongHermann! HMAR is hardware-efficient, reformulates autoregressive image generation in a way that can take advantage of tensor cores. Hermann is presenting it at CVPR this week!

Hermann Kumbong

@KumbongHermann

9 Jun 2025

Excited to be presenting our new work, HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation, at #CVPR2025 this week. VAR (Visual Autoregressive Modelling) introduced a very nice way to formulate autoregressive image generation as a next-scale prediction task (from low-res to high-res) as opposed to next-token prediction. HMAR builds on VAR to make it even better. We make changes that:
✅ Improve quality: FID, Inception Score (up to 30 pts) & qualitatively.
✅ Speed up training by up to 2.5x, inference by up to 1.75x, and reduce inference memory footprint by up to 3x.
✅ Enable adjustable sampling schedules to trade off quality/speed without retraining from scratch.

5,005

Dan Fu · Nov 11, 2024 · 7:19 PM UTC

Dan Fu

@realDanFu

11 Nov 2024

One thing I've been wondering about as the next generation of GPUs comes online is how much further we can push quantization and scale the number of bits down... this paper takes some of the first steps towards answering this question!

Tanishq Kumar

@tanishqkumar07

11 Nov 2024

[1/7] New paper alert! Heard about the BitNet hype or that Llama-3 is harder to quantize? Our new work studies both! We formulate scaling laws for precision, across both pre- and post-training: arxiv.org/pdf/2411.04330
TL;DR:
- Models become harder to post-train quantize as they are overtrained on lots of data, so that eventually more pretraining data can be actively harmful if quantizing post-training!
- The effects of putting weights, activations, or attention in varying precisions during pretraining are consistent and predictable, and fitting a scaling law suggests that pretraining at high (BF16) and next-generation (FP4) precisions may both be suboptimal design choices!
Joint work with @ZackAnkner @bfspector @blake__bordelon @Muennighoff @mansiege @CPehlevan @HazyResearch @AdtRaghunathan.

3,123

Dan Fu · Nov 13, 2023 · 6:12 PM UTC

Dan Fu

@realDanFu

13 Nov 2023

Replying to @realDanFu @arankomatsuzaki @_akhaliq

A few fun bits I couldn't fit into the original tweet:
1. We also have the fastest implementation of a short depthwise 1D convolution, which doesn't use the FFT but is up to 7x faster than PyTorch Conv1D, check out our repo to try it out: github.com/HazyResearch/flas…
2. During development of this project, we found a bug in the backward pass of the FFT in PyTorch... that was a fun one to debug :) github.com/pytorch/pytorch/i…
3. End-to-end speedup numbers and comparison against FA-v2 (Twitter only allows 4 images per post now?) 62% MFU end-to-end!

1,889

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

What's the problem? Long convolutions require multiple FFT calls, which introduce expensive GPU memory reads/writes. We develop FlashConv to address this problem. FlashConv uses a block FFT algorithm to increase FLOP util, and uses state passing to scale to long sequences. 10/n
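For context on the "multiple FFT calls" mentioned above, here is a minimal sketch of the standard FFT-based long convolution that FlashConv sets out to speed up. It is the plain baseline, not FlashConv's block FFT or state passing. It is written in pure Python with a textbook recursive radix-2 FFT so that it is self-contained; the function names are hypothetical, and a real implementation would call an optimized FFT library instead.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Unnormalized."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for i in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * i / n) * odd[i]
        out[i] = even[i] + w
        out[i + n // 2] = even[i] - w
    return out

def fft_conv(u, k):
    """Causal convolution of input u with a kernel k no longer than u."""
    L = len(u)
    n = 1
    while n < 2 * L:  # zero-pad so the circular convolution does not wrap
        n *= 2
    U = fft([complex(x) for x in u] + [0j] * (n - L))       # FFT call 1
    K = fft([complex(x) for x in k] + [0j] * (n - len(k)))  # FFT call 2
    y = fft([a * b for a, b in zip(U, K)], invert=True)     # FFT call 3 (inverse)
    return [val.real / n for val in y[:L]]

def direct_conv(u, k):
    """Reference O(L^2) causal convolution, used only to check fft_conv."""
    return [sum(k[j] * u[t - j] for j in range(min(t + 1, len(k))))
            for t in range(len(u))]
```

`fft_conv` and `direct_conv` agree up to floating-point rounding. Each of the three transforms reads and writes the full padded sequence, which is the memory traffic the tweet refers to.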

4,772

Dan Fu · Feb 28, 2020 · 10:30 PM UTC

Dan Fu

@realDanFu

28 Feb 2020

Announcing FlyingSquid - fast weak supervision with triplet methods. We speed up weak supervision by orders of magnitude, allowing weakly-supervised video analysis and online learning! Blog: hazyresearch.stanford.edu/fl… w/ @MayeeChen, @fredsala, Sarah Hooper, @kayvonf, @HazyResearch

Dan Fu · Jul 11, 2025 · 5:52 PM UTC

Dan Fu

@realDanFu

11 Jul 2025

This is really cool! There’s a ton of places where a dynamic differentiable hierarchy makes a ton of sense. Awesome to see progress here!

Albert Gu

@_albertgu

11 Jul 2025

Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.

1,137

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

Part 1: the quality gap. SSMs have achieved impressive results on sequence modeling (30+ points over Transformers on Long Range Arena), but have underperformed attention in language modeling. In our paper, we use *synthetic languages* to probe this gap. 4/n

6,847

Dan Fu · Jul 25, 2023 · 10:21 PM UTC

Dan Fu

@realDanFu

25 Jul 2023

Blog post: hazyresearch.stanford.edu/bl… Code: github.com/HazyResearch/m2 I'll be at #ICML2023 in Honolulu this week through Saturday - come chat if you're interested! 2/

Monarch Mixer: Revisiting BERT, Without Attention or MLPs

1,601

Dan Fu · Apr 11, 2022 · 8:56 PM UTC

Dan Fu

@realDanFu

11 Apr 2022

This week we're excited to have @kexinrong (@Stanford, @VMware, and @gtcomputing) on the MLSys Seminar. Kexin will talk about improving query performance on big-data analytics. Be there or be square! Watch us live on YouTube this Thursday at 1:30 PT: piped.video/watch?v=sHmpMoao…

Big Data Analytics - Kexin Rong | Stanford MLSys #61

Episode 61 of the Stanford MLSys Seminar Series! Learned Indexing ...

Dan Fu · Oct 23, 2023 · 8:45 PM UTC

Dan Fu

@realDanFu

23 Oct 2023

Thanks Tri! And yes, I'm on the academic job market this year :)

Tri Dao

@tri_dao

23 Oct 2023

As much as I like attention, I'm also fond of attention-free architectures for long context. @realDanFu and others have been pushing in this direction, with deep theory and compelling empirical results! And @realDanFu is on the academic job market this year!

5,192

Dan Fu · Dec 11, 2024 · 10:19 PM UTC

Dan Fu

@realDanFu

11 Dec 2024

Thanks for having me on! It was a really fun, really great event, and really well-run!

swyx @aiDotEngineer WF Day 1

@swyx

11 Dec 2024

Replying to @swyx @roboflow @vikhyatk @moondreamai @soldni @natolambert @NousResearch @sophiamyang @DynamicWebPaige @picocreator @Alibaba_Qwen

and then there’s @realDanFu presenting all the frontier architecture work, starting with the adorably named ThunderKittens!!

5,335

Dan Fu · Jan 11, 2024 · 6:06 PM UTC

Dan Fu

@realDanFu

11 Jan 2024

Check out the blog for more details on the technical bits, and check out our GitHub for instructions on how to play with the model! Blog: hazyresearch.stanford.edu/bl… Github: github.com/HazyResearch/m2/b… 7/

Long-Context Retrieval Models with Monarch Mixer

1,243

Dan Fu · Jan 23, 2023 · 7:31 PM UTC

Dan Fu

@realDanFu

23 Jan 2023

With @tri_dao (co-first), @KhaledSaab11, @ai_with_brains, Atri Rudra, and @HazyResearch! Thanks to @StanfordAILab, @StanfordHAI, @StanfordCRFM, and @togethercompute for helping provide us with the compute necessary to train these models! 17/17

3,567

Dan Fu · Jan 11, 2024 · 6:06 PM UTC

Dan Fu

@realDanFu

11 Jan 2024

Model weights available on HuggingFace, and AutoModel compatible. Download them with just two lines of code! 32k model: huggingface.co/togethercompu… 8k model: huggingface.co/togethercompu… 2k model: huggingface.co/togethercompu… 4/

1,431

Dan Fu · May 19, 2025 · 8:39 PM UTC

Dan Fu

@realDanFu

19 May 2025

ES-FoMo is back at #ICML2025 this year! Submissions open until May 26, come join us for a great day of talks and posters in Vancouver!

ES-FoMo@ICML2025 @ESFoMo

19 May 2025

ES-FoMo is back for round three at #ICML2025! Join us in Vancouver on Saturday July 19 for a day dedicated to Efficient Systems for Foundation Models: from 💬reasoning models to🖼️scalable multimodality, 🧱efficient architectures, and more! Submissions due May 26! More below 👇

1,481

Dan Fu · Jan 11, 2024 · 6:06 PM UTC

Dan Fu

@realDanFu

11 Jan 2024

This project has been a great collaboration with @togethercompute. Thanks to them, these models are already integrated into @MongoDB Atlas, @langchain, and @llama_index. Check out their tweet thread for more details! 9/

Together AI

@togethercompute

11 Jan 2024

We are thrilled to announce the Together Embeddings endpoint! 🚀 Higher quality than OpenAI or Cohere in the MTEB benchmark. ✅ State of the art M2-Retrieval models with up to 32k context length. ✅ Up to 4x lower price. ✅ together.ai/blog/embeddings-… Details👇

4,448

Dan Fu · Feb 22, 2022 · 10:30 PM UTC

Dan Fu

@realDanFu

22 Feb 2022

(1/n) This week we're delighted to have @faitpoms (@Stanford, @SnorkelAI) on the MLSys Seminar Series! Fait will be talking about a vision for interactive model development, so you won't want to miss it. Catch us live on YouTube Thursday at 1:30 PM! piped.video/watch?v=-9LbJBzK… 🧵👇

Interactive Model Development - Fait Poms | Stanford MLSys #56

Episode 56 of the Stanford MLSys Seminar Series! A vision for inte...