Quanquan Gu · Jun 29, 2025 · 6:18 AM UTC

Quanquan Gu

@QuanquanGu

29 Jun 2025

You don’t need a PhD to be a great AI researcher, as long as you’re standing on the shoulders of 100 who have one.

Noam Brown

@polynoamial

28 Jun 2025

You don’t need a PhD to be a great AI researcher. Even @OpenAI’s Chief Research Officer doesn’t have a PhD.

135

2,335

288,968

Quanquan Gu · Dec 6, 2023 · 7:35 PM UTC

Quanquan Gu

@QuanquanGu

6 Dec 2023

Uncertain about GPT-5, but a super-strong model (more powerful than Gemini) is expected to arrive anytime now.

anton

@abacaj

6 Dec 2023

The odds of OpenAI shipping GPT-5 just went up

921

424,773

Quanquan Gu · Jul 28, 2024 · 6:07 AM UTC

Quanquan Gu

@QuanquanGu

28 Jul 2024

Whether right or wrong, the true value of X is just increasing. I’ve noticed that many in AI/ML are joining X, making it the best platform for staying updated on the latest research and for fostering discussion.

This Post is from an account that no longer exists.

801

136,258

Quanquan Gu · Feb 9, 2024 · 5:58 PM UTC

Quanquan Gu

@QuanquanGu

9 Feb 2024

We've just open-sourced the code and data for Self-play Fine-Tuning (SPIN)! Time to SPIN every model out there! 🚀🚀🚀 Code: github.com/uclaml/SPIN Data: huggingface.co/collections/U… Models: huggingface.co/collections/U… Project Page: uclaml.github.io/SPIN/ Many thanks to @Yihe__Deng, @HuizhuoY, and @Kaixuan_Ji_19 for their tremendous efforts in preparing these.

GitHub - uclaml/SPIN: The official implementation of Self-Play Fine-Tuning (SPIN)

The official implementation of Self-Play Fine-Tuning (SPIN) - uclaml/SPIN

github.com

Quanquan Gu

@QuanquanGu

3 Jan 2024

Give someone a fish, and you feed them for a day; teach someone to fish, and you feed them for a lifetime. Elevating from Weak to Strong with Self-Play Fine-Tuning (SPIN) for All LLMs. Empower, Evolve, SPIN! arxiv.org/pdf/2401.01335.pdf

171

779

168,951

Quanquan Gu · Oct 20, 2024 · 2:53 AM UTC

Quanquan Gu

@QuanquanGu

20 Oct 2024

This suggests that transformers are capable of reasoning and planning.

Deedy

@deedydas

18 Oct 2024

DeepMind just trained a 270M transformer that can play like a grandmaster without searching through moves (MCTS). It was trained on Stockfish’s assessment of ~10M games, so it didn’t really learn the game from scratch. Impressive that transformers generalize to a “logic” task

716

228,550

Quanquan Gu · Jun 10, 2025 · 11:18 PM UTC

Quanquan Gu

@QuanquanGu

10 Jun 2025

Are there still any AI experts who think we won’t achieve AGI soon?

258

728

210,224

Quanquan Gu · Nov 18, 2024 · 8:05 PM UTC

Quanquan Gu

@QuanquanGu

18 Nov 2024

Today’s the day to launch! Introducing MARS (Make vAriance Reduction Shine): the ultimate LLM optimizer. Let’s unite, innovate, and take our shot at MARS! 🚀🚀🚀 Paper: arxiv.org/pdf/2411.10438 Code: github.com/AGI-Arena/MARS

393

117,136

Quanquan Gu · Feb 22, 2025 · 11:31 PM UTC

Quanquan Gu

@QuanquanGu

22 Feb 2025

Spent several hours with Grok-3, and it’s absolutely incredible. @elonmusk and xAI are pushing the boundaries once again. The future of AGI is here! 🚀🔥

Elon Musk

@elonmusk

22 Feb 2025

🚀🚀

521

49,855

Quanquan Gu · Jul 11, 2025 · 5:33 PM UTC

Quanquan Gu

@QuanquanGu

11 Jul 2025

This explains why LLaMA 4 failed. The tokens per parameter (TPP) is way off. You can’t defy scaling laws and expect miracles.
===
Llama 4 Maverick was 400B (17B active) and >30T tokens, TPP = 1764
Llama 4 Behemoth was 2T (288B active) and >30T tokens, TPP = 104
DeepSeek v3 is 671B (37B active) and 14.8T tokens, TPP = 400
Kimi K2 is 1T (32B active) and 15.5T tokens, TPP = 484

@nrehiew_

11 Jul 2025

Llama 4 Maverick was 400B (70B active) and >30T tokens = 429 tokens / active param
Llama 4 Behemoth was 2T (288B active) and >30T tokens = 104 tokens / active param
DeepSeek v3 is 671B (37B active) and 14.8T tokens = 400 tokens / active param
Kimi K2 is 1T (32B active) and 15.5T tokens = 484 tokens / active param
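The tokens-per-active-parameter figures in this exchange are plain ratios of training tokens to active parameters. A minimal sketch that recomputes them from the numbers quoted in the two posts, assuming the lower bound ">30T" is exactly 30T (the dictionary and function names are illustrative, not from the posts):

```python
# Tokens per active parameter (TPP) = training tokens / active parameters.
# All figures are the ones quoted in the posts; ">30T" is treated as 30T (assumption).
MODELS = {
    "Llama 4 Maverick (17B active)": (30e12, 17e9),
    "Llama 4 Maverick (70B active, as in the quoted post)": (30e12, 70e9),
    "Llama 4 Behemoth (288B active)": (30e12, 288e9),
    "DeepSeek v3 (37B active)": (14.8e12, 37e9),
    "Kimi K2 (32B active)": (15.5e12, 32e9),
}


def tokens_per_active_param(tokens: float, active_params: float) -> float:
    """Return the ratio of training tokens to active parameters."""
    return tokens / active_params


for name, (tokens, active) in MODELS.items():
    print(f"{name}: TPP ~ {tokens_per_active_param(tokens, active):.1f}")
```

The two posts disagree only on Maverick's active parameter count (17B versus 70B), which is what moves its ratio from roughly 429 to roughly 1764.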

526

93,663

Quanquan Gu · Feb 17, 2025 · 9:27 PM UTC

Quanquan Gu

@QuanquanGu

17 Feb 2025

I hope I can feel the AGI tonight. @elonmusk @xai

Sam Altman

@sama

17 Feb 2025

trying GPT-4.5 has been much more of a "feel the AGI" moment among high-taste testers than i expected!

486

53,642

Quanquan Gu · Jul 21, 2024 · 8:18 AM UTC

Quanquan Gu

@QuanquanGu

21 Jul 2024

Stunned to see the AI guru lose his composure.

Yann LeCun

@ylecun

21 Jul 2024

X is a $44 billion propaganda machine. Yet it attempts to disguise itself as a defender of unfettered free speech, a source of factual information, and a substitute for professional journalism.

451

102,745

Quanquan Gu · May 2, 2024 · 5:24 AM UTC

Quanquan Gu

@QuanquanGu

2 May 2024

Another triumph for Self-Play! Self-Play Preference Optimization (SPPO) has surpassed (iterative) DPO, IPO, Self-Rewarding LMs, and others on AlpacaEval, MT-Bench, and the Open LLM Leaderboard. Remarkably, Mistral-7B-instruct-v0.2 fine-tuned by SPPO achieves superior performance to GPT-4 0613 without relying on any GPT-4 responses.
Explore the roadmap of LLM fine-tuning techniques:
Supervised Fine-tuning: SFT --> SPIN
Preference Fine-tuning: PPO --> DPO --> SPPO
Paper: arxiv.org/pdf/2405.00675

452

165,280

Quanquan Gu · Dec 31, 2024 · 1:48 AM UTC

Quanquan Gu

@QuanquanGu

31 Dec 2024

When Google's "Attention is All You Need" paper came out, I was busy proving bounds.

405

65,033

Quanquan Gu · Jan 20, 2024 · 11:16 PM UTC

Quanquan Gu

@QuanquanGu

20 Jan 2024

Not surprising. This phenomenon is known as 'benign overfitting' in machine learning. It suggests that even with a minimal loss, there's still room for improving the 'margin' or some 'implicit bias' that boosts generalization ability. There are a bunch of works studying this phenomenon. Here is an incomplete list of references including our own works: [1] arxiv.org/abs/1906.11300 [2] jmlr.org/papers/volume24/22-… [3] jmlr.org/papers/volume24/21-… [4] proceedings.mlr.press/v178/f… [5] openreview.net/pdf?id=pF8btd… [6] proceedings.mlr.press/v202/k… [7] openreview.net/pdf?id=JpbLyE… [8] openreview.net/pdf?id=G560qr…

anton

@abacaj

20 Jan 2024

Reminder instruct-gpt (OpenAI) was trained on 16 epochs (even after overfitting on just 1). SFT for LLMs is wild

414

98,574

Quanquan Gu · Jan 3, 2024 · 2:17 AM UTC

Quanquan Gu

@QuanquanGu

3 Jan 2024

400

346,021

Quanquan Gu · Jul 29, 2024 · 7:50 AM UTC

Quanquan Gu

@QuanquanGu

29 Jul 2024

Replying to @ylecun @elonmusk

I’m afraid that it will end up like San Francisco, grappling with high crime rates and a lot of other significant problems.

387

35,011

Quanquan Gu · Nov 27, 2023 · 1:35 AM UTC

Quanquan Gu

@QuanquanGu

27 Nov 2023

I agree with Yann's view that current auto-regressive LLMs may not constitute the ultimate solution for achieving AGI. However, it's important to acknowledge the following.
Can LLMs do reasoning? Yes. LLMs can implement first-order logic, see e.g., openreview.net/pdf?id=qFVVBz… by Abulhair Saparov and He He. For example, if X implies Y, and Y implies X, then X and Y are equivalent. This fundamental principle in logic is known as the biconditional or 'if and only if' (iff) statement. It asserts that if the truth of X guarantees the truth of Y, and vice versa, then X and Y are logically equivalent. LLMs have the capacity to implement this principle along with other reasoning principles.
Can LLMs do planning? Yes. LLMs can effectively implement Markov Decision Processes (MDPs) through what are known as decision transformers, see e.g., openreview.net/pdf?id=a7APmM… and arxiv.org/pdf/2202.05607.pdf by @aravindr93 @adityagrover_ @pabbeel @yayitsamyzhang @qqyuzu et al. MDPs are a fundamental framework for modeling decision-making in stochastic environments. By implementing MDPs, LLMs demonstrate the capability to understand and execute complex planning tasks.
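The biconditional principle mentioned in this post can be checked exhaustively with a truth table. The sketch below verifies only the logical rule itself. It says nothing about whether any given LLM implements it, and the function names are illustrative:

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    # Material implication: "a implies b" is false only when a is true and b is false.
    return (not a) or b


def biconditional_holds() -> bool:
    # For every truth assignment, (X -> Y) and (Y -> X) must agree with X == Y.
    return all(
        (implies(x, y) and implies(y, x)) == (x == y)
        for x, y in product([False, True], repeat=2)
    )


print(biconditional_holds())
```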

Yann LeCun

@ylecun

26 Nov 2023

Popping this up: a response to a question about what I consider reasoning & planning, why current Auto-Regressive LLMs can't do it, why that would require AI systems with world models, and why we still have a lot of progress to do towards AI systems that can learn and reason.

393

223,737

Quanquan Gu · Sep 27, 2025 · 11:58 PM UTC

Quanquan Gu

@QuanquanGu

27 Sep 2025

Agreed. GRPO is technically wrong.

This tweet is unavailable

371

139,675

Quanquan Gu · Oct 28, 2024 · 10:29 PM UTC

Quanquan Gu

@QuanquanGu

28 Oct 2024

I'm afraid that’s not true. Transformers are far from being the "optimal" architecture.

Simo Ryu

@cloneofsimo

28 Oct 2024

This is yet another reminder that if you are like a freshman in ML, before you question: "hmm... maybe this part of transformer is useless/can be replaced?" just read these papers. arxiv.org/abs/2109.08668v2 arxiv.org/abs/2001.08361 arxiv.org/abs/2002.05202 arxiv.org/abs/2404.05405 and realize there is practically no 'fundamental' aspect of current transformer that can give you dramatic benefit. You can speed up training time but arch probably aint it.

355

100,453

Quanquan Gu · Jul 19, 2025 · 2:42 PM UTC

Quanquan Gu

@QuanquanGu

19 Jul 2025

Wait… this model didn’t even use Lean? That’s insane. Big congrats to the @OpenAI team. That’s incredible work!

Mark Chen

@markchen90

19 Jul 2025

We achieved gold medal level performance on this year's IMO! Our model thinks and writes proofs in clear, plain‑English - no formal code required. Unlike the narrower systems used in past competitions, our model is built to reason broadly, far beyond contest problems.

356

39,931

Quanquan Gu · Jan 26, 2025 · 8:50 PM UTC

Quanquan Gu

@QuanquanGu

26 Jan 2025

Ranked No. 1 now!

Derya Unutmaz, MD

@DeryaTR_

26 Jan 2025

DeepSeek has now moved up to the second spot from third since this morning among all of the top iPhone apps!

324

61,263

Quanquan Gu · Dec 22, 2024 · 11:09 PM UTC

Quanquan Gu

@QuanquanGu

22 Dec 2024

He isn’t entirely wrong. LLMs do generate a lot of incorrect answers as they produce more tokens. The key is to leverage strategies that reject the wrong ones and find the correct ones. Essentially, you’re trading compute for accuracy: the more tokens you generate, the better chances of finding the right answer, as long as you have mechanisms to filter effectively.
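The compute-for-accuracy trade described in this reply can be made concrete under strong simplifying assumptions: each sampled answer is independently correct with the same probability, and the filter always recognizes a correct answer. Neither assumption comes from the post, and the function name is illustrative:

```python
def success_probability(p_correct: float, num_samples: int) -> float:
    # Chance that at least one of num_samples independent samples is correct,
    # assuming a perfect filter that always picks out a correct answer.
    return 1.0 - (1.0 - p_correct) ** num_samples


# More samples (more generated tokens) raise the chance of finding a correct answer.
for n in (1, 4, 16, 64):
    print(n, round(success_probability(0.1, n), 3))
```

With an imperfect filter the gain is smaller, which is why the reply stresses having mechanisms that filter effectively.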

Air Katakana

@airkatakana

22 Dec 2024

what yann lecun said: "the more tokens an llm generates, the more likely it is to go off the rails and get everything wrong" what actually happened: "we get extremely high accuracy on arc-agi by generating billions of tokens, the more tokens we throw at it the better it gets"

340

73,238

Quanquan Gu · Jan 16, 2025 · 10:22 PM UTC

Quanquan Gu

@QuanquanGu

16 Jan 2025

MHA-->GQA-->MLA-->TPA🚀🚀🚀 Introducing Tensor Product Attention (TPA). To reduce KV cache size, various Multi-Head Attention (MHA) variants have been developed, including Multi-Query Attention (MQA), Grouped-Query Attention (GQA), and Multi-Head Latent Attention (MLA). GQA has been adopted in LLaMA 2 and 3, while MLA is utilized in DeepSeek V2 and V3. Although MLA achieves greater KV cache reduction than GQA, it is incompatible with Rotary Positional Encoding (RoPE) and requires ad hoc modifications for integration. Tensor Product Attention (TPA) addresses these limitations by offering full compatibility with RoPE while further improving KV cache efficiency through context-aware tensor decomposition. With all these distinct advantages, TPA is the obvious choice! Paper: arxiv.org/pdf/2501.06425 Code: github.com/tensorgi/T6 Website: tensorgi.github.io/T6/
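To see why reducing the number of cached key/value heads matters, here is a rough sketch of per-sequence KV cache size for MHA, GQA and MQA. The model configuration is hypothetical and chosen only for illustration. MLA and TPA are not modeled here because their cache layouts depend on details not given in the post:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    # Keys and values are both cached, hence the leading factor of 2.
    # bytes_per_value=2 corresponds to 16-bit storage (assumption).
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration, not taken from any specific model.
LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 128, 8192

for label, kv_heads in (("MHA", QUERY_HEADS), ("GQA, 8 KV heads", 8), ("MQA", 1)):
    size_gib = kv_cache_bytes(LAYERS, kv_heads, HEAD_DIM, SEQ_LEN) / 2**30
    print(f"{label}: {size_gib:.3f} GiB per sequence")
```

The cache shrinks linearly with the number of key/value heads, which is the quantity GQA and MQA reduce.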

Yifan Zhang

@yifanzhang_

14 Jan 2025

1/ Introducing “Tensor Product Attention Is All You Need” (TPA) and Tensor ProducT ATTenTion Transformer (T6)! 🚀 Ever wondered if there’s a more memory-efficient way to handle long contexts in LLMs? Homepage: tensorgi.github.io/T6

315

54,879

Quanquan Gu · Sep 29, 2024 · 9:05 PM UTC

Quanquan Gu

@QuanquanGu

29 Sep 2024

🚀To celebrate the veto of SB 1047, we will drop the best-ever optimizer for deep learning and LLMs once this post hits 1k likes!

Quanquan Gu

@QuanquanGu

29 Sep 2024

SB 1047 has been vetoed in California by @GavinNewsom—Thank you! This is a huge relief and a win for the future of tech innovation in the state.

305

33,616

Quanquan Gu · Jan 25, 2025 · 9:46 PM UTC

Quanquan Gu

@QuanquanGu

25 Jan 2025

With the stellar performance of DeepSeek V3 and R1, along with their open-sourcing, it’s time to adopt these models as judges instead of the OpenAI API in benchmarks like AlpacaEval, MTBench, and beyond. This move will democratize open-source AI research, accelerate progress, and significantly reduce costs for academic labs, saving anywhere from thousands to hundreds of thousands of dollars per year. The saved cost can then be invested in training the next generation of AI researchers and practitioners.

298

32,849

Quanquan Gu · Nov 8, 2023 · 9:23 PM UTC

Quanquan Gu

@QuanquanGu

8 Nov 2023

📢 Excited to share our latest research on improving human-AI communication! 🤖💬 We introduce 'Rephrase and Respond' (RaR), a simple yet effective method that enhances LLMs’ understanding of human questions. Check out how RaR improves #GPT4 performance by resolving ambiguities & can be integrated with Chain-of-Thought (#CoT) for more robust AI responses.
🌟This work is led by @Yihe__Deng and an exceptional team of students @WeitongZhang @_zxchen_
Paper: arxiv.org/pdf/2311.04205.pdf
Project: uclaml.github.io/Rephrase-an…
Code: github.com/uclaml/Rephrase-a…
HuggingFace: huggingface.co/papers/2311.0…
Key Insights:
👉 Human input is key to LLM response quality. Crafting clear, detailed questions is crucial as our different thought frames may lead to AI misunderstandings.
👉 To tackle the disparity between human and LLM thought frames, we introduce RaR, which prompts the LLM to rearticulate the given question, and respond.
👉 Our experiments demonstrate that RaR significantly improves the performance of various GPT models across a wide range of tasks.
👉 We introduce formal mathematical formulations for both CoT and RaR and show that RaR is different and complementary to CoT. Through empirical analysis, we illustrate the importance of question quality—it should be prioritized before enhancing the model’s reasoning capabilities!
🧵1/N

299

105,809

Quanquan Gu · Jun 25, 2024 · 6:52 PM UTC

Quanquan Gu

@QuanquanGu

25 Jun 2024

We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀 ⭐ code: github.com/uclaml/SPPO 🤗models: huggingface.co/collections/U…

Quanquan Gu

@QuanquanGu

2 May 2024

309

96,218

Quanquan Gu · Oct 22, 2024 · 8:04 PM UTC

Quanquan Gu

@QuanquanGu

22 Oct 2024

To become a better researcher, the key is to consistently read high-quality papers and critically think about them every day.

Eugene Vinitsky 🦋

@EugeneVinitsky

22 Oct 2024

Becoming a better researcher is easy. Just spend your time like so: 40% reading papers 40% writing code 40% coming up with new ideas 40% studying textbooks 40% talking to other researchers

300

40,663

Quanquan Gu · Dec 27, 2024 · 8:26 PM UTC

Quanquan Gu

@QuanquanGu

27 Dec 2024

Check out our work, Direct Q Optimization (DQO), which is the ‘true RL’ version of RLHF. Let’s make RLHF true RL again! Paper: arxiv.org/abs/2410.09302

Enhancing Multi-Step Reasoning Abilities of Language Models...

Reinforcement Learning (RL) plays a crucial role in aligning large language models (LLMs) with human preferences and improving their ability to perform complex tasks. However, current approaches...

arxiv.org

Denny Zhou

@denny_zhou

27 Dec 2024

In the old days, the term “RL” by default meant the “true RL” used in AlphaZero. Now, the term “RL” by default means the “fake RL” used in RLHF (nothing is negative about RLHF here. RLHF is a great innovation).

294

91,719

Quanquan Gu · Oct 6, 2024 · 5:50 PM UTC

Quanquan Gu

@QuanquanGu

6 Oct 2024

1/n 🚀 Introducing General Preference representation Model (GPM) and General Preference Optimization (GPO) for RLHF! 🎯 Reward modeling plays a central role in RLHF. Most existing reward models are based on the classical Bradley-Terry (BT) reward model. However, the BT model has limitations in handling intransitivity and complex human preferences. 💡 We introduce the GPM model, which lifts the BT model from scalar-valued space to vector-valued space using preference embedding, retaining the simplicity of BT model training while adding greater flexibility! Notably, our GPM achieves a query complexity of O(K) for evaluating preferences among K responses, a significant improvement over the O(K^2) complexity of traditional supervised preference models that rely on pairwise inputs. 💡 Building on GPM, we propose GPO, which takes self-play preference optimization (SPPO) to new heights! Paper: arxiv.org/pdf/2410.02197
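The O(K) versus O(K^2) claim in this post can be illustrated by counting model forward passes. This sketch assumes that an embedding-based preference model needs one pass per response, with pairwise scores then computed from cached embeddings, while a pairwise model needs one pass per ordered pair. Both functions are hypothetical counters, not the GPM implementation:

```python
from itertools import permutations


def embedding_model_queries(num_responses: int) -> int:
    # One forward pass per response to obtain its preference embedding;
    # pairwise scores are then computed from the cached embeddings.
    return num_responses


def pairwise_model_queries(num_responses: int) -> int:
    # One forward pass per ordered pair (i, j) with i != j.
    return sum(1 for _ in permutations(range(num_responses), 2))


for k in (2, 4, 8, 16):
    print(k, embedding_model_queries(k), pairwise_model_queries(k))
```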

Yifan Zhang

@yifanzhang_

4 Oct 2024

1/8 ⭐General Preference Modeling with Preference Representations for Aligning Language Models⭐ arxiv.org/abs/2410.02197 As Huggingface Daily Papers: huggingface.co/papers/2410.0… We just dropped our latest research on General Preference Modeling (GPM)! 🚀

290

80,131

Quanquan Gu · May 26, 2025 · 3:28 AM UTC

Quanquan Gu

@QuanquanGu

26 May 2025

The RPG is out. Make KL-regularized Policy Gradient Correct Again! No more GRPO or Reinforce++ — their objectives and KL regularization are inherently inconsistent.

YIFENG LIU

@YIFENGLIU_AI

26 May 2025

1/6 We introduce RPG, a principled framework for deriving and analyzing KL-regularized policy gradient methods, unifying GRPO/k3-estimator and REINFORCE++ under this framework and discovering better RL objectives than GRPO: Paper: arxiv.org/abs/2505.17508 Code: github.com/complex-reasoning… Webpage: complex-reasoning.github.io/… @yifanzhang_, @HuiZhuoY, @QuanquanGu

206

53,977

Quanquan Gu · Feb 15, 2022 · 7:42 PM UTC

Quanquan Gu

@QuanquanGu

15 Feb 2022

I am tremendously excited and humbled to receive the Sloan Research Fellowship. Special thanks to my amazing mentors, collaborators, students, and of course @UCLAengineering @UCLAComSci who made all the work happen! Thanks to @SloanFoundation for the recognition and support!

Sloan Foundation

@SloanFoundation

15 Feb 2022

Introducing… the winners of this year’s Sloan Research Fellowship! These extraordinary researchers represent some of the most exciting young minds working today—and we are thrilled to support them. Meet the winners here: sloan.org/fellowships/2022-F… #SloanFellow

288

Quanquan Gu · Aug 10, 2025 · 4:12 AM UTC

Quanquan Gu

@QuanquanGu

10 Aug 2025

You’re only half right. Yes, GPT-5 failed twice. No, scaling laws aren’t over. Do them right, and the game goes on.

Yuchen Jin

@Yuchenj_UW

10 Aug 2025

GPT-5 failed twice. Scaling laws are coming to an end. Open-source AI will have the Mandate of Heaven.

267

38,002

Quanquan Gu · Dec 14, 2024 · 10:35 PM UTC

Quanquan Gu

@QuanquanGu

14 Dec 2024

This is deeply concerning and offensive. Using a specific racial group as an example, even with a disclaimer that most individuals from that group are honest and morally upright, does not make it appropriate or acceptable.

Jiao Sun

@sunjiao123sun_

14 Dec 2024

Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡

267

25,141

Quanquan Gu · Dec 1, 2023 · 10:27 PM UTC

Quanquan Gu

@QuanquanGu

1 Dec 2023

Expected. Get ready for super strong models from an exceptional team—arriving soon!🚄

anton

@abacaj

1 Dec 2023

No, these 70B models are not better than GPT-4

250

132,631

Quanquan Gu · Mar 22, 2025 · 5:04 PM UTC

Quanquan Gu

@QuanquanGu

22 Mar 2025

In RL we trust.

235

21,269

Quanquan Gu · Jun 29, 2025 · 9:08 AM UTC

Quanquan Gu

@QuanquanGu

29 Jun 2025

It’s not that people think calculus or math is useless in AI. They’re just tired of theory folks who never touch code, never scale a model, and still argue they’re solving problems in AI:) If theory becomes detached from practice, the world will treat it like noise and that’s on us.

Jelani Nelson

@minilek

28 Jun 2025

See below on what Zuckerberg is looking for in star recruits worth $100m pay packages for Meta’s plans in Artificial Intelligence. But weren’t some people saying calculus is no longer useful in the AI age? 🤔

226

38,453

Quanquan Gu · Sep 29, 2025 · 8:23 PM UTC

Quanquan Gu

@QuanquanGu

29 Sep 2025

Nice blog post! Essentially, this shows that μP + LoRA, when done right, makes the optimal learning rate transferable and nearly matches full fine-tuning performance. One subtle but important point worth mentioning is that there is an additional dimension of scaling to consider: the rank of the LoRA (r), along with several other multipliers.
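To make the role of the LoRA rank r concrete, here is a small sketch of how the trainable parameter count and the commonly used alpha / r multiplier of the original LoRA formulation depend on it. The layer sizes are hypothetical, and the sketch does not reproduce the μP analysis or the blog post's experiments:

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    # A dense weight matrix of shape d_out x d_in.
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA trains two factors: A (rank x d_in) and B (d_out x rank).
    return rank * (d_in + d_out)


def lora_scale(alpha: float, rank: int) -> float:
    # In the original LoRA formulation the update B @ A is scaled by alpha / rank.
    return alpha / rank


# Hypothetical 4096 x 4096 projection layer.
for r in (1, 8, 64):
    print(r, lora_params(4096, 4096, r), full_finetune_params(4096, 4096), lora_scale(16.0, r))
```

Because the multiplier changes with r, the rank interacts with the effective step size, which is one reason it acts as an additional scaling dimension.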

Thinking Machines

@thinkymachines

29 Sep 2025

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lor…

229

27,236

Quanquan Gu · Jul 5, 2024 · 11:29 PM UTC

Quanquan Gu

@QuanquanGu

5 Jul 2024

Due to visa issues, only one person from our group can attend #icml2024. This is beyond frustrating. 😱

214

68,426

Quanquan Gu · Nov 6, 2023 · 2:09 AM UTC

Quanquan Gu

@QuanquanGu

6 Nov 2023

I believe the message in the paper is straightforward and uncontroversial. However, it seems there might be a misunderstanding in @abacaj's interpretation. Pre-trained transformers can effectively acquire in-context knowledge for tasks related to their pre-training data and generalize to those tasks, but they cannot generalize to tasks significantly distinct from their pre-training contexts. In fact, the generalization ability of transformer-based in-context learning is discussed in terms of task diversity in arxiv.org/pdf/2306.15063.pdf by @AllanRaventos @SuryaGanguli et al., and task complexity in arxiv.org/pdf/2310.08391.pdf by @uuujingfeng et al.

anton

@abacaj

5 Nov 2023

New paper by Google provides evidence that transformers (GPT, etc) cannot generalize beyond their training data

214

122,560

Quanquan Gu · Sep 1, 2024 · 1:13 AM UTC

Quanquan Gu

@QuanquanGu

1 Sep 2024

AdamW remains the leading optimizer for training LLMs.

Quanquan Gu

@QuanquanGu

30 Aug 2024

What's your preferred optimizer for LLMs?

201

31,808

Quanquan Gu · Dec 26, 2024 · 10:47 PM UTC

Quanquan Gu

@QuanquanGu

26 Dec 2024

Very impressive work! It appears that Multi-Token Prediction (MTP) has a substantial impact on both pre-training and inference.

DeepSeek

@deepseek_ai

26 Dec 2024

🚀 Introducing DeepSeek-V3! Biggest leap forward yet: ⚡ 60 tokens/second (3x faster than V2!) 💪 Enhanced capabilities 🛠 API compatibility intact 🌍 Fully open-source models & papers 🐋 1/n

207

36,257

Quanquan Gu · Jan 19, 2024 · 8:27 PM UTC

Quanquan Gu

@QuanquanGu

19 Jan 2024

We have uploaded the model weights of SPIN for zephyr-7b-sft-full at iterations 0-3 to @huggingface : huggingface.co/UCLA-AGI In our paper, we used the latest version (v.0.4.0) of Eleuther AI Harness for evaluation, resulting in slightly different numbers from the @huggingface OpenLLM leaderboard, which uses an older version. The overall trend of performance improvement remains the same, and notably the improvement after 4 iterations from the SFT checkpoint is even more significant (>6%)!

Quanquan Gu

@QuanquanGu

3 Jan 2024

186

73,513

Quanquan Gu · Sep 17, 2024 · 7:41 AM UTC

Quanquan Gu

@QuanquanGu

17 Sep 2024

Replying to @ylecun

Yann, we acknowledge your support for the Democratic Party and your stance against Trump, and there is nothing wrong with that. However, why do you often quote misinterpreted tweets and seem to struggle with distinguishing between legal, illegal, and criminal immigrants? Additionally, you often compare the U.S. with Europe or other regions. Many people come to the U.S. for specific reasons and perhaps choose it over Europe or other countries. Therefore, we focus on American values and the future of the U.S., not on what happens elsewhere.

199

27,148

Quanquan Gu · Sep 28, 2025 · 3:15 AM UTC

Quanquan Gu

@QuanquanGu

28 Sep 2025

Replying to @zjasper

The original GRPO is an off-policy RL algorithm, but its KL regularization isn't done right. Specifically, the k3 estimator for the unnormalized reverse KL is missing the importance weight. The correct formulation should be:
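The formula that followed this post is not preserved here. As a generic illustration of why an importance weight matters, the sketch below uses the standard k3 estimator term, (r - 1) - log r with r = p(x)/q(x). Its expectation equals KL(q || p) only when samples come from q. If samples come from a different behavior distribution b, the term has to be reweighted by q(x)/b(x). The distributions are made-up normalized examples, and this is not a reconstruction of the post's formulation:

```python
import math


def kl(q, p):
    # Exact KL(q || p) for discrete distributions given as probability lists.
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))


def k3(qi, pi):
    # k3 estimator term: (r - 1) - log r, with r = p(x) / q(x).
    r = pi / qi
    return (r - 1.0) - math.log(r)


def expected_k3(q, p, sampler, importance_weighted):
    # Exact expectation of the k3 term when x is drawn from `sampler`.
    total = 0.0
    for qi, pi, bi in zip(q, p, sampler):
        weight = qi / bi if importance_weighted else 1.0
        total += bi * weight * k3(qi, pi)
    return total


q = [0.7, 0.2, 0.1]  # current policy (hypothetical numbers)
p = [0.5, 0.3, 0.2]  # reference policy (hypothetical numbers)
b = [0.3, 0.3, 0.4]  # behavior (old) policy that generated the samples

print("exact KL(q || p):        ", kl(q, p))
print("on-policy k3:            ", expected_k3(q, p, q, importance_weighted=False))
print("off-policy, unweighted:  ", expected_k3(q, p, b, importance_weighted=False))
print("off-policy, weighted:    ", expected_k3(q, p, b, importance_weighted=True))
```

In this toy setting the on-policy and importance-weighted expectations match the exact KL, while the unweighted off-policy expectation does not.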

207

113,417

Quanquan Gu · Sep 3, 2025 · 10:27 PM UTC

Quanquan Gu

@QuanquanGu

3 Sep 2025

Another fantastic benchmark of optimizers. Key takeaways: 1. Variance-reduced Adam variants (e.g., MARS) achieve significant speedups over the AdamW baseline. 2. Matrix-based optimizers (e.g., Muon, SOAP) consistently outperform their scalar-based counterparts (e.g., Lion). Note: MARS unifies both scalar- and matrix-based optimizers. The code for the matrix-based version of MARS will be released soon.

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

3 Sep 2025

Fantastic Pretraining Optimizers and Where to Find Them
"we conduct a systematic study of ten deep learning optimizers across four model scales (0.1B-1.2B parameters) and data-to-model ratios (1–8× the Chinchilla optimum)."
"we find that all the fastest optimizers such as Muon and Soap, use matrices as preconditioners"
"However, the speedup of matrix-based optimizers is inversely proportional to model scale, decreasing from 1.4× over AdamW for 0.1B parameter models to merely 1.1× for 1.2B parameter models."
Observations made in the paper:
1. Hyperparameter transfer between optimizers is non-trivial.
2. The speedup of new optimizers is lower than claimed and diminishes with model size.
3. Early-stage loss curves can mislead significantly.
4. Matrix-based optimizers consistently outperform scalar-based optimizers for small models.
5. Optimal choice of optimizer shifts depending on data-to-model ratios.

180

22,840

Quanquan Gu · Oct 14, 2024 · 1:33 AM UTC

Quanquan Gu

@QuanquanGu

14 Oct 2024

🚨We introduce Accelerated Preference Optimization (APO) for language model alignment! 💡Key takeaway: DPO and other preference optimization algorithms (e.g., IPO & SPPO) are just fancy proximal point methods in disguise! This opens the door for using Nesterov’s momentum to turbocharge them!🚀🚀🚀

Jiafan He

@JiafanHe

14 Oct 2024

(1/n)🚀 Introducing Accelerated Preference Optimization (APO) for Reinforcement Learning from Human Feedback (RLHF)!
🎯 RLHF plays a central role in aligning large language models (LLMs) with human preferences. Direct Preference Optimization (DPO), one of the most popular approaches, simplifies this process by avoiding explicit reward estimation. But what if we could make it even faster and more efficient? The answer: momentum! ⚡
💡 APO Framework: We introduce Accelerated Preference Optimization (APO), a general framework that unifies several preference optimization methods and incorporates Nesterov's momentum to accelerate the convergence process. By framing preference optimization as a proximal point method, we demonstrate APO's ability to speed up the RLHF process.
📊 Theoretical Insights: APO achieves a faster convergence rate compared to standard methods like iterative DPO and Self-Play Preference Optimization (SPPO), which is a significant improvement in the efficiency of training LLMs.
🚀 Empirical Results: On the AlpacaEval 2.0 benchmark, APO outperforms DPO, iterative DPO, and other strong baselines in aligning LLMs with human preferences.
🔗 Paper: arxiv.org/pdf/2410.02197
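For readers unfamiliar with Nesterov's momentum, here is a generic sketch on a toy ill-conditioned quadratic, comparing plain gradient descent with the accelerated iteration. This is not the APO update rule, which operates on policies. The objective, step size, and momentum value (the textbook choice for a condition number of 100) are assumptions for illustration:

```python
def gradient(x, curvatures):
    # Gradient of f(x) = 0.5 * sum(a_i * x_i**2).
    return [a * xi for a, xi in zip(curvatures, x)]


def gradient_descent(x0, curvatures, lr, steps):
    x = list(x0)
    for _ in range(steps):
        g = gradient(x, curvatures)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def nesterov(x0, curvatures, lr, beta, steps):
    x = list(x0)
    y = list(x0)  # extrapolated ("look-ahead") point
    for _ in range(steps):
        g = gradient(y, curvatures)
        x_next = [yi - lr * gi for yi, gi in zip(y, g)]
        # Momentum step: extrapolate along the most recent displacement.
        y = [xn + beta * (xn - xi) for xn, xi in zip(x_next, x)]
        x = x_next
    return x


CURVATURES = [1.0, 0.01]  # condition number 100
START = [1.0, 1.0]
plain = gradient_descent(START, CURVATURES, lr=1.0, steps=100)
fast = nesterov(START, CURVATURES, lr=1.0, beta=9 / 11, steps=100)
print("distance to optimum, gradient descent:", max(abs(v) for v in plain))
print("distance to optimum, Nesterov:        ", max(abs(v) for v in fast))
```

On this toy problem the accelerated iterate ends much closer to the minimizer after the same number of gradient evaluations.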

189

31,878

Quanquan Gu · Dec 6, 2023 · 9:17 PM UTC

Quanquan Gu

@QuanquanGu

6 Dec 2023

Replying to @noot_ippi

Open model weights.

174

24,508

Quanquan Gu · Aug 23, 2025 · 8:55 PM UTC

Quanquan Gu

@QuanquanGu

23 Aug 2025

So many multipliers! Great to see that Grok2 was trained using μP. huggingface.co/xai-org/grok-…

177

38,114

Quanquan Gu · Nov 3, 2024 · 7:19 PM UTC

Quanquan Gu

@QuanquanGu

3 Nov 2024

🎬Trailer for the ultimate LLM optimizer: Faster and more token-efficient. Coming soon!

173

32,395

Quanquan Gu · Sep 8, 2020 · 6:48 PM UTC

Quanquan Gu

@QuanquanGu

8 Sep 2020

Excited to announce the @NeurIPSConf OPT2020 workshop on optimization for machine learning. Please submit your papers to our workshop to explore the intimate relationship between OPTIMIZATION and MACHINE LEARNING. Deadline: Oct 8, 2020. Website: opt-ml.org/index.html (1/2)

163

Quanquan Gu · Nov 6, 2023 · 9:53 PM UTC

Quanquan Gu

@QuanquanGu

6 Nov 2023

Very excited to be at the helm of the AI for Science initiative at ByteDance Research. Our unwavering commitment to reshaping the scientific discovery landscape using AI is a journey to watch. Today, we release CryoStar, a state-of-the-art open-source tool for Cryo-EM heterogeneous reconstruction. It's the epitome of innovation in merging AI with the world of structural biology. Project: bytedance.github.io/cryostar… Code: github.com/bytedance/cryosta… Paper: biorxiv.org/content/10.1101/… Join us on this remarkable journey of exploration, from AI breakthroughs to groundbreaking discoveries in science. Stay connected, as more models and open-sourced tools are on the horizon!

Yilai Li

@li_yilai

6 Nov 2023

1/ Very excited to present cryoSTAR, a novel approach for continuous heterogeneity in SPA cryo-EM. In brief, cryoSTAR leverages the prior knowledge from a user-given atomic model to better find dynamics in the final reconstruction. bytedance.github.io/cryostar… biorxiv.org/content/10.1101/…

153

41,255

Quanquan Gu · Oct 29, 2025 · 10:03 PM UTC

Quanquan Gu

@QuanquanGu

29 Oct 2025

No joke. Most people haven’t yet realized how powerful machine learning theory actually is. I’m speaking from the perspective of someone directly building AGI: it stabilizes both pretraining and RL, and it provides the blueprint for scaling all the way to AGI.

157

65,676

Quanquan Gu · Nov 6, 2024 · 5:57 AM UTC

Quanquan Gu

@QuanquanGu

6 Nov 2024

🎬Trailer 2 for the ultimate LLM optimizer: Faster, more token-efficient, and highly scalable! The final countdown begins!

151

25,478

Quanquan Gu · Jul 13, 2025 · 6:40 PM UTC

Quanquan Gu

@QuanquanGu

13 Jul 2025

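Moving from theory to large models has not been a popular path. Some people can't adjust to your change, and some don't want you to actually succeed. Losers and haters make noise. Builders build. Feel the AGI!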

145

11,492

Quanquan Gu · Oct 29, 2025 · 9:51 PM UTC

Quanquan Gu

@QuanquanGu

29 Oct 2025

Machine learning theory.

Wenting Zhao

@wzhao_nlp

29 Oct 2025

The question I got asked most frequently during COLM this year was what research questions can be studied in academia that will also be relevant to frontier labs. So I’m making a talk for this. What topics / areas should I cover? RL/eval/pretraining,?

139

30,164

Quanquan Gu · Oct 27, 2025 · 9:07 PM UTC

Quanquan Gu

@QuanquanGu

27 Oct 2025

Here is another compelling case highlighting why KL-regularized RL is indispensable.

Thinking Machines

@thinkymachines

27 Oct 2025

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other approaches for a fraction of the cost. thinkingmachines.ai/blog/on-…

134

23,399

Quanquan Gu · May 29, 2021 · 1:33 AM UTC

Quanquan Gu

@QuanquanGu

29 May 2021

Congrats to Dr. Jinghui Chen @netfox001 on his successful Ph.D. defense today! Jinghui has done excellent work in optimization for deep learning and robust machine learning. He will join @penn_state as an assistant professor in the Fall of 2021. Thanks to the amazing committee!

137

Quanquan Gu · Dec 15, 2024 · 7:33 AM UTC

Quanquan Gu

@QuanquanGu

15 Dec 2024

I fully respect everyone’s right to freedom of speech. If the invited speaker has prior experiences that have led to biases against Chinese students or scholars, I am open to engaging in a personal or public dialogue to address these biases. However, delivering such an offensive message during a keynote talk at NeurIPS, a highly regarded research forum for our community, is both unnecessary and inappropriate.

Pedro Domingos

@pmddomingos

15 Dec 2024

If you don’t think conference codes of conduct are bad for free speech, read this.

130

19,457

Quanquan Gu · Jan 12, 2024 · 7:40 PM UTC

Quanquan Gu

@QuanquanGu

12 Jan 2024

Excited to contribute to the initiative aimed at benchmarking the trustworthiness of Large Language Models (LLMs) across 6 dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. Explore the leaderboard showcasing the trustworthiness of 16 open-source (e.g., Llama2 and Mistral) and proprietary LLMs (ChatGPT, GPT-4) at: trustllmbenchmark.github.io/…

@_akhaliq

12 Jan 2024

TrustLLM: Trustworthiness in Large Language Models paper page: huggingface.co/papers/2401.0… Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. 
Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

128

25,699

Quanquan Gu · May 17, 2025 · 7:09 PM UTC

Quanquan Gu

@QuanquanGu

17 May 2025

Modern scaling law research often feels like this: 1. Train a few models 2. Plot metrics on a log-log scale 3. Fit a line 4. Call it a new law Maybe it’s time to ask: are we uncovering principles, or just describing artifacts?🤔

131

12,248
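
The four-step recipe in the 17 May 2025 post above (train a few models, plot on a log-log scale, fit a line, call it a law) can be sketched in a few lines. This is a minimal illustration only: the data points are synthetic and made up, and the names `fit_power_law`, `compute`, and `loss` are hypothetical, not taken from any paper or library mentioned in this feed.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # Slope of the straight line in log-log space is the exponent b.
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    # Intercept in log space is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic "training runs": four compute budgets and losses generated
# from an assumed power law, purely for illustration.
compute = [1e18, 1e19, 1e20, 1e21]
loss = [50.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)
print(f"fitted prefactor a = {a:.3f}, exponent b = {b:.4f}")
```

A clean fit on a handful of points says little on its own about whether the exponent reflects a principle or an artifact of the setup, which is the question the post raises.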

Quanquan Gu · Aug 23, 2025 · 7:08 PM UTC

Quanquan Gu

@QuanquanGu

23 Aug 2025

Math is key to LLMs and AI, and mathematicians can do amazing work. But watching them melt down over AI… just unbelievable.

Edward Frenkel

@edfrenkel

22 Aug 2025

This is an unwise statement that can only make people confused about what LLMs can or cannot do. Let me tell you something: Math is NOT about solving this kind of ad hoc optimization problems. Yeah, by scraping available data and then clustering it, LLMs can sometimes solve some very minor math problems. It's an achievement, and I applaud you for that. But let's be honest: this is NOT the REAL Math. Not by 10,000 miles. REAL Math is about concepts and ideas - things like "schemes" introduced by the great Alexander Grothendieck, who revolutionized algebraic geometry; the Atiyah-Singer Index Theorem; or the Langlands Program, tying together Number Theory, Analysis, Geometry, and Quantum Physics. That's the REAL Math. Can LLMs do that? Of course not. So, please, STOP confusing people - especially, given the atrocious state of our math education. LLMs give us great tools, which I appreciate very much. Useful stuff! Go ahead and use them AS TOOLS (just as we use calculators to crunch numbers or cameras to render portraits and landscapes), an enhancement of human abilities, and STOP pretending that LLMs are somehow capable of replicating everything that human beings can do. In this one area, mathematics, LLMs are no match to human mathematicians. Period. Not to mention many other areas. Calling on my friend @ericweinstein and @GaryMarcus, who has been one of the few sane expert voices on these matters lately. 🙏 h/t @hellheff

130

20,720

Quanquan Gu · Feb 13, 2024 · 12:47 AM UTC

Quanquan Gu

@QuanquanGu

13 Feb 2024

SPIN (github.com/uclaml/SPIN) is trending on GitHub! We firmly believe open source efforts will be the main driving force for advancing research in LLMs. github.com/trending

Quanquan Gu

@QuanquanGu

9 Feb 2024

127

16,480

Quanquan Gu · Feb 15, 2024 · 11:57 PM UTC

Quanquan Gu

@QuanquanGu

15 Feb 2024

🚨SPIN vs. DPO: SPIN uses the SFT dataset, while DPO demands human preferences with additional labeling overhead. Can we make DPO more label efficient? Introducing Active DPO (ADPO) - streamline DPO with a 50% reduction in labeled human preference data! 🚀 arxiv.org/abs/2402.09401 Joint work w/ @Kaixuan_Ji_19 @JiafanHe

Kaixuan Ji @Kaixuan_Ji_19

15 Feb 2024

🔥Excited to share our recent research on query-efficient RLHF! Introducing Active Direct Preference Optimization (ADPO), a new approach that improves DPO performance on Open-LLM-Benchmark with just half the queries. Discover how ADPO eliminates the significant demand for querying preference labels.🚀[1/4] Paper: arxiv.org/pdf/2402.09401.pdf A joint work with @JiafanHe , and @QuanquanGu 👏

128

17,666

Quanquan Gu · May 21, 2025 · 1:21 AM UTC

Quanquan Gu

@QuanquanGu

21 May 2025

Parallelized reasoning is the path to exceeding human-level intelligence.

Google DeepMind

@GoogleDeepMind

20 May 2025

Deep Think in 2.5 Pro has landed. 🤯 It’s a new enhanced reasoning mode using our research in parallel thinking techniques - meaning it explores multiple hypotheses before responding. This enables it to handle incredibly complex math and coding problems more effectively.

126

16,411

Quanquan Gu · Dec 11, 2020 · 4:58 AM UTC

Quanquan Gu

@QuanquanGu

11 Dec 2020

1/4 Want to learn more about optimization for ML after the @NeurIPSConf main conference? Welcome to our OPT2020 workshop on optimization for machine learning on Friday (in 6 hours!). Friday, December 11th, 2020, 06:00 to 19:00 EST. Schedule: neurips.cc/virtual/2020/publ…

115

Quanquan Gu · Jun 22, 2020 · 7:16 PM UTC

Quanquan Gu

@QuanquanGu

22 Jun 2020

Is the upper-confidence bound (UCB) strategy provably efficient in neural contextual bandits? The answer is affirmative. Check out our paper (arxiv.org/abs/1911.04462) to be presented at #ICML2020 @icmlconf, where we propose the NeuralUCB algorithm, which attains an O(\sqrt{T}) regret.

116

Quanquan Gu · Jun 16, 2021 · 7:27 PM UTC

Quanquan Gu

@QuanquanGu

16 Jun 2021

Very proud of my student Dr. Pan Xu for his successful Ph.D. journey and amazing research work! It's a privilege to be his advisor. Pan's work spans nonconvex optimization, MCMC, and policy gradient methods, all pushing the frontier of sample-efficient ML. Congratulations!

Pan Xu @iampanxu

16 Jun 2021

Personal news: I have graduated from UCLA and officially become Dr. Xu. I will join Duke University @DukeU as a tenure track assistant professor in the Department of Biostatistics and Bioinformatics in August 2022. In the meantime, I will be a CAST Postdoctoral Fellow at Caltech.

117

Quanquan Gu · Nov 5, 2025 · 6:44 AM UTC

Quanquan Gu

@QuanquanGu

5 Nov 2025

🔥 Learning rate transfer under μP is now proven!

Soufiane Hayou

@hayou_soufiane

4 Nov 2025

🎯 Just released a new preprint that proves LR transfer under μP. -> The Problem: When training large neural networks, one of the trickiest questions is: what learning rate should I use? [1/n]🧵 Link: arxiv.org/abs/2511.01734

118

18,312

Quanquan Gu · Mar 9, 2024 · 11:36 PM UTC

Quanquan Gu

@QuanquanGu

9 Mar 2024

Experiment is up and running, awaiting a major outcome.

108

34,671

Quanquan Gu · May 16, 2025 · 11:00 PM UTC

Quanquan Gu

@QuanquanGu

16 May 2025

GRPO has served its purpose. It's time to move on.

115

12,520

Quanquan Gu · Jan 9, 2021 · 12:34 AM UTC

Quanquan Gu

@QuanquanGu

9 Jan 2021

1/3 Can RL with linear function approximation achieve minimax optimality? For linear mixture/kernel MDPs, yes! Our recent work arxiv.org/pdf/2012.08507.pdf provides nearly matching upper and lower bounds for both episodic & discounted settings. Joint work w/ @DongruoZ @CsabaSzepesvari

109

Quanquan Gu · Jan 14, 2025 · 7:18 PM UTC

Quanquan Gu

@QuanquanGu

14 Jan 2025

We're the architects now. 🏗️📐.

Yifan Zhang

@yifanzhang_

14 Jan 2025

107

19,359

Quanquan Gu · Sep 14, 2024 · 10:35 PM UTC

Quanquan Gu

@QuanquanGu

14 Sep 2024

I believe this was proved in a more rigorous way some time ago in this paper: arxiv.org/pdf/2311.14648

Rohan Paul

@rohanpaul_ai

14 Sep 2024

Paper - "LLMs Will Always Hallucinate, and We Need to Live With This" Podcast format generated with Google's new illuminate tool (illuminate is trained to produce short podcast from research papers)

103

18,136

Quanquan Gu · Apr 22, 2024 · 4:37 PM UTC

Quanquan Gu

@QuanquanGu

22 Apr 2024

In our latest research, we've leveraged Rosetta energy as a reward in residual-level DPO tailored for antibody design. Check out our paper at: arxiv.org/html/2403.16576v1

Jason Yim @json_yim

22 Apr 2024

My prediction for the next bio/ML trend at NeurIPS. DPO and RLHF for protein design. Protein language models in particular. 😉

101

23,330

Quanquan Gu · Feb 24, 2025 · 2:23 AM UTC

Quanquan Gu

@QuanquanGu

24 Feb 2025

Very cool! Who’d like to use FlashTPA? Drop a like if you want us to release it! MHA → GQA → MLA → TPA 🚀🚀 Paper: arxiv.org/pdf/2501.06425

DeepSeek

@deepseek_ai

24 Feb 2025

🚀 Day 1 of #OpenSourceWeek: FlashMLA Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production. ✅ BF16 support ✅ Paged KV cache (block size 64) ⚡ 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800 🔗 Explore on GitHub: github.com/deepseek-ai/Flash…

103

18,021

Quanquan Gu · Nov 16, 2025 · 1:28 AM UTC

Quanquan Gu

@QuanquanGu

16 Nov 2025

Imagine a Cursor for theorem proving: branches for ideas, PRs for proofs, reviews for rigor. That’s the future of theoretical research.

Sebastien Bubeck

@SebastienBubeck

15 Nov 2025

Doing research in a ChatGPT Group Chat with friends*, alternating between proofs and simulations, so so much fun ... *I refrain from using the term "vibes research", because obviously every step eventually needs to be very carefully checked!

109

31,599

Quanquan Gu · Dec 24, 2024 · 10:17 PM UTC

Quanquan Gu

@QuanquanGu

24 Dec 2024

Fair point, but Terence Tao is a once-in-a-generation genius. The rest of us are just trying to make ML theory great again, maybe with a little salary boost!🤣

the Rich

@Duderichy

24 Dec 2024

Terence Tao makes $700k a year, what’s your excuse?

24,200

Quanquan Gu · Jan 9, 2024 · 12:34 AM UTC

Quanquan Gu

@QuanquanGu

9 Jan 2024

Is linear attention sufficient to reproduce the success of transformers? We have a paper (arxiv.org/pdf/2310.08391.pdf) on analyzing in-context learning with linear attention, and I know there is another paper (arxiv.org/pdf/2310.01082.pdf) showing that linear attention can replicate key aspects of soft-max attention's training dynamics. Despite this, I remain skeptical about whether linear attention alone is enough to replicate the success of transformers.

Jacob Buckman @jacobmbuckman

8 Jan 2024

Anyone who has trained a Transformer has viscerally felt its O(T^2) cost. It is not tractable to train Transformers end-to-end on long contexts. Here's a writeup of the research direction I believe is most likely to solve this: linear transformers. manifestai.com/blogposts/fas… 1/7

100

32,740

Quanquan Gu · Jun 26, 2025 · 5:45 PM UTC

Quanquan Gu

@QuanquanGu

26 Jun 2025

The hottest thing a woman can be is training models at 3am.

Lauren Self

@laurenlself

23 Jun 2025

The hottest thing a man can be is exceptionally good at math

12,324

Quanquan Gu · Jul 13, 2025 · 12:58 PM UTC

Quanquan Gu

@QuanquanGu

13 Jul 2025

Can’t make it to #ICML2025 this year. People ask why I’m so obsessed with pretraining and scaling. Simple: the AGI era is here. I refuse to be irrelevant.

13,083

Quanquan Gu · Feb 22, 2024 · 8:58 PM UTC

Quanquan Gu

@QuanquanGu

22 Feb 2024

Using a diffusion transformer (DiT) instead of a U-Net is an evident trend. However, what is the advantage of flow matching over diffusion? Is it merely its sampling speed advantage, or are there more distinctive features to consider?

Stability AI

@StabilityAI

22 Feb 2024

Announcing Stable Diffusion 3, our most capable text-to-image model, utilizing a diffusion transformer architecture for greatly improved performance in multi-subject prompts, image quality, and spelling abilities. Today, we are opening the waitlist for early preview. This phase is crucial for gathering insights to improve its performance and safety ahead of open release. You can sign up to join the waitlist and learn more here: bit.ly/3OR2qQF #stablediffusion3 Prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy

21,095

Quanquan Gu · Feb 21, 2024 · 8:22 PM UTC

Quanquan Gu

@QuanquanGu

21 Feb 2024

It's intriguing to observe the use of REINFORCE in RLHF. REINFORCE is a classical algorithm utilized to estimate policy gradients for episodic Markov Decision Processes (MDPs). Another notable method is GPOMDP. While both are effective estimators, it's worth noting that neither of them is new. proceedings.mlr.press/v115/x…

Google DeepMind

@GoogleDeepMind

21 Feb 2024

Introducing Gemma: a family of lightweight, state-of-the-art open models for developers and researchers to build with AI. 🌐 We’re also releasing tools to support innovation and collaboration - as well as to guide responsible use. Get started now. → dpmd.ai/3UJu1Y1

[Image: The word “Gemma” and a spark icon with blueprint styling appears in a blue gradient against a black background.]

36,627

Quanquan Gu · Jun 25, 2020 · 6:28 PM UTC

Quanquan Gu

@QuanquanGu

25 Jun 2020

Check out our new paper arxiv.org/abs/2003.01803 We propose MOTS, the first Thompson sampling type algorithm that achieves the minimax optimality for multi-armed bandits! Our result removes the logarithmic factor in existing regret bounds. @ashipra @LihongLi20

Quanquan Gu · Feb 21, 2025 · 4:08 AM UTC

Quanquan Gu

@QuanquanGu

21 Feb 2025

Make AGI Open Again!

DeepSeek

@deepseek_ai

21 Feb 2025

🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented, deployed and battle-tested in production. As part of the open-source community, we believe that every line shared becomes collective momentum that accelerates the journey. Daily unlocks are coming soon. No ivory towers - just pure garage-energy and community-driven innovation.

7,311

Quanquan Gu · Sep 28, 2025 · 3:15 AM UTC

Quanquan Gu

@QuanquanGu

28 Sep 2025

Replying to @samsja19

9,851

Quanquan Gu · Aug 1, 2022 · 5:17 PM UTC

Quanquan Gu

@QuanquanGu

1 Aug 2022

(1/4) Want to learn the challenges and recent progress in representation learning? Welcome to our workshop on representation learning theory this week sponsored and hosted by @TTIC_Connect. Time: August 4 & 5th, 2022, 9:00 AM - 5:30 PM CT Registration: shorturl.at/lnpQY

Quanquan Gu · Mar 1, 2024 · 7:08 PM UTC

Quanquan Gu

@QuanquanGu

1 Mar 2024

🚀 Introducing Diffusion Protein Language Models (DPLM), a new suite of discrete diffusion-based protein language models! 🧬 With versatility in both generative and predictive tasks, DPLM is poised to set the new SOTA in protein language models, excelling across a spectrum of benchmark tasks. Congrats to the team!

11,765

Quanquan Gu · May 6, 2024 · 6:36 PM UTC

Quanquan Gu

@QuanquanGu

6 May 2024

🚀We have released the Mistral 7B models fine-tuned by SPPO on @huggingface: huggingface.co/collections/U…

SPPO - a UCLA-AGI Collection

Self-Play Preference Optimization

huggingface.co

Quanquan Gu

@QuanquanGu

2 May 2024

19,770

Quanquan Gu · Mar 16, 2024 · 5:46 PM UTC

Quanquan Gu

@QuanquanGu

16 Mar 2024

I've noticed a trend in the LLM community. Whenever something XYZ is perceived as cool, there's almost immediately an OpenXYZ. Is this truly in the spirit of open research, or is it just a form of mimicry or copying?

27,565

Quanquan Gu · Nov 11, 2025 · 4:24 AM UTC

Quanquan Gu

@QuanquanGu

11 Nov 2025

This is fantastic. Every university should consider doing the same. Teaching undergraduate students how modern AI systems actually work under the hood and making the fundamentals accessible early on is exactly what the field needs.

Zico Kolter

@zicokolter

10 Nov 2025

I'm teaching a new "Intro to Modern AI" course at CMU this Spring: modernaicourse.org. It's an early-undergrad course on how to build a chatbot from scratch (well, from PyTorch). The course name has bothered some people – "AI" usually means something much broader in academic contexts – but I think the time has come where the first thing that many students interested in AI should see is how the AI they are familiar with actually works (because it's really simple!) The more people who understand it the better. I'll be trying to put as much material as I can that we develop online (assignments + autograding, hopefully lecture videos), though as a first-time course there are also likely to be some bumps along the way. Hopefully it becomes a good resource over time, though. Feedback welcome.

18,117

Quanquan Gu · Jun 29, 2025 · 12:39 PM UTC

Quanquan Gu

@QuanquanGu

29 Jun 2025

Microsoft: “Textbooks are all you need.” Tencent: “WizardLM team is all we need.”

Jeremy Howard

@jeremyphoward

29 Jun 2025

Imagine where Microsoft would be now if they hadn't destroyed the amazing WizardLM team, causing them to leave and go to Tencent… …where they're now releasing models far beyond anything Microsoft has created.

13,109

Quanquan Gu · Jun 29, 2025 · 6:50 AM UTC

Quanquan Gu

@QuanquanGu

29 Jun 2025

Replying to @greyfedora0

I’m not blindly defending PhDs in AI; a lot of PhD work contributes little to real progress. But a strong PhD, done right, builds the foundation that many great AI researchers stand on, whether they hold the degree or not.

8,211

Quanquan Gu · Sep 5, 2024 · 9:40 PM UTC

Quanquan Gu

@QuanquanGu

5 Sep 2024

Great advice! I’d like to add one more thought: Mathematical thinking is key to conducting exceptional research.

Omar Khattab

@lateinteraction

4 Sep 2024

🔗 Thoughts on Research Impact in AI. Grad students often ask: how do I do research that makes a difference in the current, crowded AI space? This is a blogpost that summarizes my perspective in six guidelines for making research impact via open-source artifacts. Link below.

17,728

Quanquan Gu · Apr 28, 2025 · 10:25 PM UTC

Quanquan Gu

@QuanquanGu

28 Apr 2025

MoE is just cool. Fine-grained scaling is the way forward.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

@teortaxesTex

28 Apr 2025

Qwen3-30B-A3B is de facto on par with Qwen3-32B dense and the greatest vindication of finegrained MoEs the world has seen in the open.

9,265

Quanquan Gu · Sep 19, 2024 · 6:31 PM UTC

Quanquan Gu

@QuanquanGu

19 Sep 2024

Claiming that only one side cares about facts undermines the diversity of thought and approach that is fundamental to the scientific community.

Yann LeCun

@ylecun

19 Sep 2024

People studying misinformation lean left for two reasons: 1. scientists lean left, regardless of specialty, because they care about facts. 2. misinformation today primarily comes from the Right ("they're eating the dawwwgs!") which makes it worth studying and fighting against for people leaning left.

12,230

Quanquan Gu · Jun 21, 2025 · 5:30 AM UTC

Quanquan Gu

@QuanquanGu

21 Jun 2025

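They asked me which part of large models I work on. “Training?” No. “Inference?” No. “Multimodal?” No. “Fine-tuning?” Not that either. They frowned: “Then what exactly do you do?” My eyes reddened and my voice trembled: “I... I do Agent development...” The room fell silent, and then came everyone's knowing snickers.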

NIK

@ns123abc

20 Jun 2025

🚨BREAKING; APPLE HELD INTERNAL TALKS ABOUT BUYING PERPLEXITY

15,910