Professor @UCLA, Ex-ByteDance Seed | Recent work: Seed2.0, SeedFold, SeedProteo | Opinions are my own

Los Angeles, CA
You don’t need a PhD to be a great AI researcher, as long as you’re standing on the shoulders of 100 who have one.
You don’t need a PhD to be a great AI researcher. Even @OpenAI’s Chief Research Officer doesn’t have a PhD.
53
135
2,335
288,968
Uncertain about GPT-5, but a super-strong model (more powerful than Gemini) is expected to arrive anytime now.
The odds of OpenAI shipping GPT-5 just went up
39
87
921
424,773
Whether right or wrong, the true value of X is just increasing. I’ve noticed that many in AI/ML are joining X, making it the best platform for staying updated on the latest research and for fostering discussion.
69
19
801
136,258
We've just open-sourced the code and data for Self-play Fine-Tuning (SPIN)! Time to SPIN every model out there! 🚀🚀🚀 Code: github.com/uclaml/SPIN Data: huggingface.co/collections/U… Models: huggingface.co/collections/U… Project Page: uclaml.github.io/SPIN/ Many thanks to @Yihe__Deng, @HuizhuoY, and @Kaixuan_Ji_19 for their tremendous efforts in preparing these.
Give someone a fish, and you feed them for a day; teach someone to fish, and you feed them for a lifetime. Elevating from Weak to Strong with Self-Play Fine-Tuning (SPIN) for All LLMs. Empower, Evolve, SPIN! arxiv.org/pdf/2401.01335.pdf
20
171
779
168,951
This suggests that transformers are capable of reasoning and planning.
DeepMind just trained a 270M transformer that can play like a grandmaster without searching through moves (MCTS). It was trained on Stockfish’s assessment of ~10M games so it didn’t really learn the game from scratch. Impressive that transformers generalize to a “logic” task
86
60
716
228,550
Are there still any AI experts who think we won’t achieve AGI soon?
258
30
728
210,224
Today’s the day to launch! Introducing MARS (Make vAriance Reduction Shine): the ultimate LLM optimizer. Let’s unite, innovate, and take our shot at MARS! 🚀🚀🚀 Paper: arxiv.org/pdf/2411.10438 Code: github.com/AGI-Arena/MARS
16
37
393
117,136
Spent several hours with Grok-3, and it’s absolutely incredible. @elonmusk and xAI are pushing the boundaries once again. The future of AGI is here! 🚀🔥
17
23
521
49,855
This explains why LLaMA 4 failed. The tokens per active parameter (TPP) is way off. You can’t defy scaling laws and expect miracles. === Llama 4 Maverick was 400B (17B active) and >30T tokens, TPP = 1764; Llama 4 Behemoth was 2T (288B active) and >30T tokens, TPP = 104; DeepSeek V3 is 671B (37B active) and 14.8T tokens, TPP = 400; Kimi K2 is 1T (32B active) and 15.5T tokens, TPP = 484.
Llama 4 Maverick was 400B(70B active) and >30T tokens = 429 tokens / active param Llama 4 Behemoth was 2T(288B active) and > 30T tokens = 104 tokens / active param DeepSeek v3 is 671B(37B active) and 14.8T tokens = 400 tokens / active param Kimi K2 is 1T(32B active) and 15.5T tokens = 484 tokens / active param
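The TPP figures in these two posts are plain division of training tokens by active parameters. A minimal sketch that reproduces them, assuming every ">30T" token count is treated as exactly 30T (the dictionary keys are illustrative labels, not official model identifiers):

```python
# Tokens per active parameter (TPP) from the figures quoted in the posts.
# Assumption: token counts given as ">30T" are treated as exactly 30T.
MODELS = {
    # label: (active parameters, training tokens)
    "maverick_17b": (17e9, 30e12),          # active size used in the post
    "maverick_70b_quoted": (70e9, 30e12),   # active size used in the quoted post
    "behemoth": (288e9, 30e12),
    "deepseek_v3": (37e9, 14.8e12),
    "kimi_k2": (32e9, 15.5e12),
}


def tokens_per_active_param(active_params: float, tokens: float) -> float:
    """Training tokens divided by the number of active parameters."""
    return tokens / active_params


tpp = {name: tokens_per_active_param(*cfg) for name, cfg in MODELS.items()}

for name, value in tpp.items():
    print(f"{name}: {value:.1f} tokens per active parameter")
```

The only disagreement between the two posts is Maverick's active parameter count (17B versus 70B), which moves its ratio from roughly 1765 to roughly 429.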
28
61
526
93,663
I hope I can feel the AGI tonight. @elonmusk @xai
trying GPT-4.5 has been much more of a "feel the AGI" moment among high-taste testers than i expected!
13
9
486
53,642
Stunned to see the AI guru lose his composure.
X is a $44 billion propaganda machine. Yet it attempts to disguise itself as a defender of unfettered free speech, a source of factual information, and a substitute for professional journalism.
81
11
451
102,745
Another triumph for Self-Play! Self-Play Preference Optimization (SPPO) has surpassed (iterative) DPO, IPO, Self-Rewarding LMs, and others on AlpacaEval, MT-Bench, and the Open LLM Leaderboard. Remarkably, Mistral-7B-instruct-v0.2 fine-tuned by SPPO achieves superior performance to GPT-4 0613 without relying on any GPT-4 responses. Explore the roadmap of LLM fine-tuning techniques: Supervised Fine-tuning: SFT --> SPIN Preference Fine-tuning: PPO --> DPO --> SPPO Paper: arxiv.org/pdf/2405.00675
14
78
452
165,280
When Google's "Attention is All You Need" paper came out, I was busy proving bounds.
14
10
405
65,033
Not surprising. This phenomenon is known as 'benign overfitting' in machine learning. It suggests that even with a minimal loss, there's still room for improving the 'margin' or some 'implicit bias' that boosts generalization ability. There are a bunch of works studying this phenomenon. Here is an incomplete list of references including our own works: [1] arxiv.org/abs/1906.11300 [2] jmlr.org/papers/volume24/22-… [3] jmlr.org/papers/volume24/21-… [4] proceedings.mlr.press/v178/f… [5] openreview.net/pdf?id=pF8btd… [6] proceedings.mlr.press/v202/k… [7] openreview.net/pdf?id=JpbLyE… [8] openreview.net/pdf?id=G560qr…
Reminder instruct-gpt (OpenAI) was trained on 16 epochs (even after overfitting on just 1). SFT for LLMs is wild
3
63
414
98,574
Give someone a fish, and you feed them for a day; teach someone to fish, and you feed them for a lifetime. Elevating from Weak to Strong with Self-Play Fine-Tuning (SPIN) for All LLMs. Empower, Evolve, SPIN! arxiv.org/pdf/2401.01335.pdf
15
63
400
346,021
Replying to @ylecun @elonmusk
I’m afraid that it will end up like San Francisco, grappling with high crime rates and a lot of other significant problems.
16
5
387
35,011
I agree with Yann's view that current auto-regressive LLMs may not constitute the ultimate solution for achieving AGI. However, it's important to acknowledge the following. Can LLMs do reasoning? Yes. LLMs can implement first-order logic, see e.g., openreview.net/pdf?id=qFVVBz… by Abulhair Saparov and He He. For example, if X implies Y, and Y implies X, then X and Y are equivalent. This fundamental principle in logic is known as the biconditional or 'if and only if' (iff) statement. It asserts that if the truth of X guarantees the truth of Y, and vice versa, then X and Y are logically equivalent. LLMs have the capacity to implement this principle along with other reasoning principles. Can LLMs do planning? Yes. LLMs can effectively implement decision-making in Markov Decision Processes (MDPs), as decision transformers do, see e.g., openreview.net/pdf?id=a7APmM… and arxiv.org/pdf/2202.05607.pdf by @aravindr93 @adityagrover_ @pabbeel @yayitsamyzhang @qqyuzu et al. MDPs are a fundamental framework for modeling decision-making in stochastic environments. By doing so, LLMs demonstrate the capability to understand and execute complex planning tasks.
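The biconditional principle cited in the post can be stated formally. A minimal Lean 4 sketch (the theorem name is made up for illustration; it only formalizes the logical rule and says nothing about how an LLM would carry it out):

```lean
-- If X implies Y and Y implies X, then X and Y are equivalent.
theorem iff_of_mutual_implication (X Y : Prop)
    (hxy : X → Y) (hyx : Y → X) : X ↔ Y :=
  ⟨hxy, hyx⟩
```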
Popping this up: a response to a question about what I consider reasoning & planning, why current Auto-Regressive LLMs can't do it, why that would require AI systems with world models, and why we still have a lot of progress to do towards AI systems that can learn and reason.
15
45
393
223,737
Agreed. GRPO is technically wrong.
12
18
371
139,675
I'm afraid that’s not true. Transformers are far from being the "optimal" architecture.
This is yet another reminder that if you are, like, a freshman in ML, before you question: "hmm... maybe this part of transformer is useless/can be replaced?" just read these papers. arxiv.org/abs/2109.08668v2 arxiv.org/abs/2001.08361 arxiv.org/abs/2002.05202 arxiv.org/abs/2404.05405 and realize there is practically no 'fundamental' aspect of current transformer that can give you dramatic benefit. You can speed up training time but arch probably ain't it.
27
9
355
100,453
Wait… this model didn’t even use Lean? That’s insane. Big congrats to the @OpenAI team. That’s incredible work!
We achieved gold medal level performance on this year's IMO! Our model thinks and writes proofs in clear, plain‑English - no formal code required. Unlike the narrower systems used in past competitions, our model is built to reason broadly, far beyond contest problems.
4
11
356
39,931
Ranked No. 1 now!
DeepSeek has now moved up to the second spot from third since this morning among all of the top iPhone apps!
8
12
324
61,263
He isn’t entirely wrong. LLMs do generate a lot of incorrect answers as they produce more tokens. The key is to leverage strategies that reject the wrong ones and find the correct ones. Essentially, you’re trading compute for accuracy: the more tokens you generate, the better your chances of finding the right answer, as long as you have mechanisms to filter effectively.
what yann lecun said: "the more tokens an llm generates, the more likely it is to go off the rails and get everything wrong" what actually happened: "we get extremely high accuracy on arc-agi by generating billions of tokens, the more tokens we throw at it the better it gets"
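The compute-for-accuracy trade described above can be made concrete with a toy model. The sketch below assumes independent samples, a fixed per-sample success rate, and a perfect filter that always recognizes a correct answer; none of these assumptions come from the posts, and real verifiers are imperfect.

```python
def success_probability(p_correct: float, n_samples: int) -> float:
    """Chance that at least one of n independent samples is correct.

    Assumes a perfect filter: a correct sample, if present, is always selected.
    """
    return 1.0 - (1.0 - p_correct) ** n_samples


# Hypothetical per-sample accuracy of 10%.
for n in (1, 4, 16, 64):
    print(f"n={n:>2}: {success_probability(0.1, n):.3f}")
```

Under these assumptions the failure probability shrinks geometrically with the number of samples, which is why filtering quality, not raw generation, becomes the bottleneck.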
15
32
340
73,238
MHA-->GQA-->MLA--->TPA🚀🚀🚀 Introducing Tensor Product Attention (TPA). To reduce KV cache size, various Multi-Head Attention (MHA) variants have been developed, including Multi-Query Attention (MQA), Grouped-Query Attention (GQA), and Multi-Head Latent Attention (MLA). GQA has been adopted in LLaMA 2 and 3, while MLA is utilized in DeepSeek V2 and V3. Although MLA achieves greater KV cache reduction than GQA, it is incompatible with Rotary Positional Encoding (RoPE) and requires ad hoc modifications for integration. Tensor Product Attention (TPA) addresses these limitations by offering full compatibility with RoPE while further improving KV cache efficiency through context-aware tensor decomposition. With all these distinct advantages, TPA is the obvious choice! Paper: arxiv.org/pdf/2501.06425 Code: github.com/tensorgi/T6 Website: tensorgi.github.io/T6/
1/ Introducing “Tensor Product Attention Is All You Need” (TPA) and Tensor ProducT ATTenTion Transformer (T6)! 🚀 Ever wondered if there’s a more memory-efficient way to handle long contexts in LLMs? Homepage: tensorgi.github.io/T6
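To see why the number of key/value heads matters for the KV cache, here is a rough per-token size calculation for MHA, GQA, and MQA. The layer count, head counts, and head dimension are hypothetical, and MLA and TPA are deliberately left out because their cache layouts depend on details in the respective papers.

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of KV cache stored per generated token.

    The factor 2 counts one key vector and one value vector
    per KV head per layer; bytes_per_value=2 assumes fp16/bf16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Hypothetical model: 32 layers, 32 query heads, head dimension 128.
N_LAYERS, N_QUERY_HEADS, HEAD_DIM = 32, 32, 128

mha = kv_cache_bytes_per_token(N_LAYERS, N_QUERY_HEADS, HEAD_DIM)  # one KV head per query head
gqa = kv_cache_bytes_per_token(N_LAYERS, 8, HEAD_DIM)              # 8 shared KV groups
mqa = kv_cache_bytes_per_token(N_LAYERS, 1, HEAD_DIM)              # a single shared KV head

for name, size in (("MHA", mha), ("GQA", gqa), ("MQA", mqa)):
    print(f"{name}: {size / 1024:.0f} KiB per token")
```

In this toy configuration GQA with 8 groups stores a quarter of what MHA stores, and MQA stores 1/32.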
17
53
315
54,879
🚀To celebrate the veto of SB 1047, we will drop the best-ever optimizer for deep learning and LLMs once this post hits 1k likes!
SB 1047 has been vetoed in California by @GavinNewsom—Thank you! This is a huge relief and a win for the future of tech innovation in the state.
10
15
305
33,616
With the stellar performance of DeepSeek V3 and R1, along with their open-sourcing, it’s time to adopt these models as judges instead of the OpenAI API in benchmarks like AlpacaEval, MT-Bench, and beyond. This move will democratize open-source AI research, accelerate progress, and significantly reduce costs for academic labs, saving anywhere from thousands to hundreds of thousands of dollars per year. The saved cost can then be invested in training the next generation of researchers and practitioners in AI.
15
30
298
32,849
📢 Excited to share our latest research on improving human-AI communication! 🤖💬 We introduce 'Rephrase and Respond' (RaR), a simple yet effective method that enhances LLMs’ understanding of human questions. Check out how RaR improves #GPT4 performance by resolving ambiguities & can be integrated with Chain-of-Thought (#CoT) for more robust AI responses. 🌟This work is led by @Yihe__Deng and an exceptional team of students @WeitongZhang @_zxchen_ Paper: arxiv.org/pdf/2311.04205.pdf Project: uclaml.github.io/Rephrase-an… Code: github.com/uclaml/Rephrase-a… HuggingFace: huggingface.co/papers/2311.0… Key Insights: 👉 Human input is key to LLM response quality. Crafting clear, detailed questions is crucial as our different thought frames may lead to AI misunderstandings. 👉 To tackle the disparity between human and LLM thought frames, we introduce RaR, which prompts the LLM to rearticulate the given question, and respond. 👉 Our experiments demonstrate that RaR significantly improves the performance of various GPT models across a wide range of tasks. 👉 We introduce formal mathematical formulations for both CoT and RaR and show that RaR is different and complementary to CoT. Through empirical analysis, we illustrate the importance of question quality—it should be prioritized before enhancing the model’s reasoning capabilities! 🧵1/N
9
85
299
105,809
We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀 ⭐ code: github.com/uclaml/SPPO 🤗models: huggingface.co/collections/U…
Another triumph for Self-Play! Self-Play Preference Optimization (SPPO) has surpassed (iterative) DPO, IPO, Self-Rewarding LMs, and others on AlpacaEval, MT-Bench, and the Open LLM Leaderboard. Remarkably, Mistral-7B-instruct-v0.2 fine-tuned by SPPO achieves superior performance to GPT-4 0613 without relying on any GPT-4 responses. Explore the roadmap of LLM fine-tuning techniques: Supervised Fine-tuning: SFT --> SPIN Preference Fine-tuning: PPO --> DPO --> SPPO Paper: arxiv.org/pdf/2405.00675
6
69
309
96,218
To become a better researcher, the key is to consistently read high-quality papers and critically think about them every day.
Becoming a better researcher is easy. Just spend your time like so: 40% reading papers 40% writing code 40% coming up with new ideas 40% studying textbooks 40% talking to other researchers
6
20
300
40,663
Check out our work, Direct Q Optimization (DQO), which is the ‘true RL’ version of RLHF. Let’s make RLHF true RL again! Paper: arxiv.org/abs/2410.09302
In the old days, the term “RL” by default meant the “true RL” used in AlphaZero. Now, the term “RL” by default means the “fake RL” used in RLHF (nothing is negative about RLHF here. RLHF is a great innovation).
6
38
294
91,719
1/n 🚀 Introducing General Preference representation Model (GPM) and General Preference Optimization (GPO) for RLHF! 🎯 Reward modeling plays a central role in RLHF. Most existing reward models are based on the classical Bradley-Terry (BT) reward model. However, the BT model has limitations in handling intransitivity and complex human preferences. 💡 We introduce the GPM model, which lifts the BT model from scalar-valued space to vector-valued space using preference embedding, retaining the simplicity of BT model training while adding greater flexibility! Notably, our GPM achieves a query complexity of O(K) for evaluating preferences among K responses, a significant improvement over the O(K^2) complexity of traditional supervised preference models that rely on pairwise inputs. 💡 Building on GPM, we propose GPO, which takes self-play preference optimization (SPPO) to new heights! Paper: arxiv.org/pdf/2410.02197
1/8 ⭐General Preference Modeling with Preference Representations for Aligning Language Models⭐ arxiv.org/abs/2410.02197 As Huggingface Daily Papers: huggingface.co/papers/2410.0… We just dropped our latest research on General Preference Modeling (GPM)! 🚀
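To illustrate why a vector-valued preference embedding can express intransitive (cyclic) preferences that a scalar Bradley-Terry reward cannot, here is a toy sketch. It assumes a 2D embedding per response and a skew-symmetric bilinear score; the embeddings are hand-picked rather than learned, and the exact form used in the paper may differ.

```python
import math


def embed(angle_deg: float) -> tuple:
    """Hand-picked unit embedding on the circle (a stand-in for a learned one)."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))


def score(v_i: tuple, v_j: tuple) -> float:
    """Skew-symmetric score v_i^T R v_j with R = [[0, -1], [1, 0]].

    Swapping the arguments flips the sign, so score(i, j) = -score(j, i).
    """
    return v_i[0] * (-v_j[1]) + v_i[1] * v_j[0]


def prefer_prob(v_i: tuple, v_j: tuple) -> float:
    """P(response i is preferred over response j) as a sigmoid of the score."""
    return 1.0 / (1.0 + math.exp(-score(v_i, v_j)))


# One embedding per response: K responses need only K embedding computations.
A, B, C = embed(0), embed(120), embed(240)

print(f"P(A > C) = {prefer_prob(A, C):.3f}")
print(f"P(C > B) = {prefer_prob(C, B):.3f}")
print(f"P(B > A) = {prefer_prob(B, A):.3f}")
```

All three probabilities exceed 0.5, so A beats C, C beats B, and B beats A. No assignment of scalar rewards can produce such a cycle under a Bradley-Terry model.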
3
53
290
80,131
The RPG is out. Make KL-regularized Policy Gradient Correct Again! No more GRPO or Reinforce++ — their objectives and KL regularization are inherently inconsistent.
1/6 We introduce RPG, a principled framework for deriving and analyzing KL-regularized policy gradient methods, unifying GRPO/k3-estimator and REINFORCE++ under this framework and discovering better RL objectives than GRPO: Paper: arxiv.org/abs/2505.17508 Code: github.com/complex-reasoning… Webpage: complex-reasoning.github.io/… @yifanzhang_, @HuiZhuoY, @QuanquanGu
2
16
206
53,977
I am tremendously excited and humbled to receive the Sloan Research Fellowship. Special thanks to my amazing mentors, collaborators, students, and of course @UCLAengineering @UCLAComSci who made all the work happen! Thanks to @SloanFoundation for the recognition and support!
Introducing… the winners of this year’s Sloan Research Fellowship! These extraordinary researchers represent some of the most exciting young minds working today—and we are thrilled to support them. Meet the winners here: sloan.org/fellowships/2022-F… #SloanFellow
47
6
288
You’re only half right. Yes, GPT-5 failed twice. No, scaling laws aren’t over. Do them right, and the game goes on.
GPT-5 failed twice. Scaling laws are coming to an end. Open-source AI will have the Mandate of Heaven.
16
6
267
38,002
This is deeply concerning and offensive. Using a specific racial group as an example, even with a disclaimer that most individuals from that group are honest and morally upright, does not make it appropriate or acceptable.
Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡
8
5
267
25,141
Expected. Get ready for super strong models from an exceptional team—arriving soon!🚄
No, these 70B models are not better than GPT-4
10
12
250
132,631
In RL we trust.
8
15
235
21,269
It’s not that people think calculus or math is useless in AI. They’re just tired of theory folks who never touch code, never scale a model, and still argue they’re solving problems in AI:) If theory becomes detached from practice, the world will treat it like noise and that’s on us.
See below on what Zuckerberg is looking for in star recruits worth $100m pay packages for Meta’s plans in Artificial Intelligence. But weren’t some people saying calculus is no longer useful in the AI age? 🤔
6
22
226
38,453
Nice blog post! Essentially, this shows that μP + LoRA, when done right, makes the optimal learning rate transferable and nearly matches full fine-tuning performance. One subtle but important point worth mentioning is that there is an additional dimension of scaling to consider: the rank of the LoRA (r), along with several other multipliers.
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lor…
5
17
229
27,236
Due to visa issues, only one person from our group can attend #icml2024. This is beyond frustrating. 😱
19
3
214
68,426
I believe the message in the paper is straightforward and uncontroversial. However, it seems there might be a misunderstanding in @abacaj 's interpretation. Pre-trained transformers can effectively acquire in-context knowledge for tasks related to their pre-training data and generalize to those tasks, but they cannot generalize to tasks significantly distinct from their pre-training contexts. In fact, the generalization ability of transformer-based in-context learning is discussed in terms of task diversity in arxiv.org/pdf/2306.15063.pdf by @AllanRaventos @SuryaGanguli et al., and task complexity in arxiv.org/pdf/2310.08391.pdf by @uuujingfeng et al.
New paper by Google provides evidence that transformers (GPT, etc) cannot generalize beyond their training data
6
32
214
122,560
AdamW remains the leading optimizer for training LLMs.
What's your preferred optimizer for LLMs?
10
11
201
31,808
Very impressive work! It appears that Multi-Token Prediction (MTP) has a substantial impact on both pre-training and inference.
🚀 Introducing DeepSeek-V3! Biggest leap forward yet: ⚡ 60 tokens/second (3x faster than V2!) 💪 Enhanced capabilities 🛠 API compatibility intact 🌍 Fully open-source models & papers 🐋 1/n
5
20
207
36,257
We have uploaded the model weights of SPIN for zephyr-7b-sft-full at iterations 0-3 to @huggingface : huggingface.co/UCLA-AGI In our paper, we used the latest version (v0.4.0) of the Eleuther AI Harness for evaluation, resulting in a slight difference in numbers compared with the @huggingface OpenLLM leaderboard, which uses an older version. The overall trend of performance improvement remains the same, and notably the improvement after 4 iterations from the SFT checkpoint is even more significant (>6%)!
Give someone a fish, and you feed them for a day; teach someone to fish, and you feed them for a lifetime. Elevating from Weak to Strong with Self-Play Fine-Tuning (SPIN) for All LLMs. Empower, Evolve, SPIN! arxiv.org/pdf/2401.01335.pdf
6
32
186
73,513
Replying to @ylecun
Yann, we acknowledge your support for the Democratic Party and your stance against Trump, and there is nothing wrong with that. However, why do you often quote misinterpreted tweets and seem to struggle with distinguishing between legal, illegal, and criminal immigrants? Additionally, you often compare the U.S. with Europe or other regions. Many people come to the U.S. for specific reasons and perhaps choose it over Europe or other countries. Therefore, we focus on American values and the future of the U.S., not on what happens elsewhere.
13
2
199
27,148
Replying to @zjasper
The original GRPO is an off-policy RL algorithm, but its KL regularization isn't done right. Specifically, the k3 estimator for the unnormalized reverse KL is missing the importance weight. The correct formulation should be:
7
23
207
113,417
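The formula attached to the reply above is not part of this text, so the following is only a generic illustration of the importance-weighting point, not the author's formulation. It assumes small, normalized discrete distributions (all values hypothetical) and computes expectations exactly rather than by sampling: the k3 term averaged under the sampling policy pi_old recovers KL(pi_theta || pi_ref) only when each term is weighted by pi_theta / pi_old.

```python
import math


def k3(ratio: float) -> float:
    """k3 term (r - 1) - log r, with r = pi_ref(x) / pi_theta(x)."""
    return (ratio - 1.0) - math.log(ratio)


# Hypothetical distributions over three outcomes.
pi_theta = [0.5, 0.3, 0.2]   # current policy
pi_ref = [0.4, 0.4, 0.2]     # reference policy
pi_old = [0.2, 0.3, 0.5]     # behavior policy that generated the samples

exact_kl = sum(t * math.log(t / r) for t, r in zip(pi_theta, pi_ref))

# Expectation of k3 under pi_old, with no correction for the policy mismatch.
unweighted = sum(o * k3(r / t) for o, t, r in zip(pi_old, pi_theta, pi_ref))

# Same expectation with the importance weight pi_theta / pi_old applied.
weighted = sum(o * (t / o) * k3(r / t) for o, t, r in zip(pi_old, pi_theta, pi_ref))

print(f"exact KL   : {exact_kl:.6f}")
print(f"unweighted : {unweighted:.6f}")
print(f"weighted   : {weighted:.6f}")
```

The weighted expectation matches the exact KL, while the unweighted one does not whenever pi_old differs from pi_theta.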
Another fantastic benchmark of optimizers. Key takeaways: 1. Variance-reduced Adam variants (e.g., MARS) achieve significant speedups over the AdamW baseline. 2. Matrix-based optimizers (e.g., Muon, SOAP) consistently outperform their scalar-based counterparts (e.g., Lion). Note: MARS unifies both scalar- and matrix-based optimizers. The code for the matrix-based version of MARS will be released soon.
Fantastic Pretraining Optimizers and Where to Find Them "we conduct a systematic study of ten deep learning optimizers across four model scales (0.1B-1.2B parameters) and data-to-model ratios (1–8× the Chinchilla optimum)." "we find that all the fastest optimizers such as Muon and Soap, use matrices as preconditioners" "However, the speedup of matrix-based optimizers is inversely proportional to model scale, decreasing from 1.4× over AdamW for 0.1B parameter models to merely 1.1× for 1.2B parameter models." Observations made in the paper: 1. Hyperparameter transfer between optimizers is non-trivial. 2. The speedup of new optimizers is lower than claimed and diminishes with model size. 3. Early-stage loss curves can mislead significantly. 4. Matrix-based optimizers consistently outperform scalar-based optimizers for small models. 5. The optimal choice of optimizer shifts depending on data-to-model ratios.
5
21
180
22,840
🚨We introduce Accelerated Preference Optimization (APO) for language model alignment! 💡Key takeaway: DPO and other preference optimization algorithms (e.g., IPO & SPPO) are just fancy proximal point methods in disguise! This opens the door for using Nesterov’s momentum to turbocharge them!🚀🚀🚀
(1/n)🚀 Introducing Accelerated Preference Optimization (APO) for Reinforcement Learning from Human Feedback (RLHF)! 🎯 RLHF plays a central role in aligning large language models (LLMs) with human preferences. Direct Preference Optimization (DPO), one of the most popular approaches, simplifies this process by avoiding explicit reward estimation. But what if we could make it even faster and more efficient? The answer: momentum! ⚡ 💡 APO Framework: We introduce Accelerated Preference Optimization (APO), a general framework that unifies several preference optimization methods and incorporates Nesterov's momentum to accelerate the convergence process. By framing preference optimization as a proximal point method, we demonstrate APO's ability to speed up the RLHF process. 📊 Theoretical Insights: APO achieves a faster convergence rate compared to standard methods like iterative DPO and Self-Play Preference Optimization (SPPO), which is a significant improvement in the efficiency of training LLMs. 🚀 Empirical Results: On the AlpacaEval 2.0 benchmark, APO outperforms DPO, iterative DPO, and other strong baselines in aligning LLMs with human preferences. 🔗 Paper: arxiv.org/pdf/2410.02197
4
26
189
31,878
Replying to @noot_ippi
Open model weights.
10
15
174
24,508
So many multipliers! Great to see that Grok2 was trained using μP. huggingface.co/xai-org/grok-…
7
22
177
38,114
🎬Trailer for the ultimate LLM optimizer: Faster and more token-efficient. Coming soon!
7
11
173
32,395
Excited to announce the @NeurIPSConf OPT2020 workshop on optimization for machine learning. Please submit your papers to our workshop to explore the intimate relationship between OPTIMIZATION and MACHINE LEARNING. Deadline: Oct 8, 2020. Website: opt-ml.org/index.html (1/2)
1
39
163
Very excited to be at the helm of the AI for Science initiative at ByteDance Research. Our unwavering commitment to reshaping the scientific discovery landscape using AI is a journey to watch. Today, we release CryoStar, a state-of-the-art open-source tool for Cryo-EM heterogeneous reconstruction. It's the epitome of innovation in merging AI with the world of structural biology. Project: bytedance.github.io/cryostar… Code: github.com/bytedance/cryosta… Paper: biorxiv.org/content/10.1101/… Join us on this remarkable journey of exploration, from AI breakthroughs to groundbreaking discoveries in science. Stay connected, as more models and open-sourced tools are on the horizon!
1/ Very excited to present cryoSTAR, a novel approach for continuous heterogeneity in SPA cryo-EM. In brief, cryoSTAR leverages the prior knowledge from a user-given atomic model to better find dynamics in the final reconstruction. bytedance.github.io/cryostar… biorxiv.org/content/10.1101/…
3
27
153
41,255
No joke. Most people haven’t yet realized how powerful machine learning theory actually is. I’m speaking from the perspective of someone directly building AGI: it stabilizes both pretraining and RL, and it provides the blueprint for scaling all the way to AGI.
12
5
157
65,676
🎬Trailer 2 for the ultimate LLM optimizer: Faster, more token-efficient, and highly scalable! The final countdown begins!
10
10
151
25,478
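Moving from theory to large models has not been a crowd-pleasing path. Some people can't adjust to your change; others don't want you to actually succeed. Losers and haters make noise. Builders build. Feel the AGI!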
6
2
145
11,492
Machine learning theory.
The question I got asked most frequently during COLM this year was what research questions can be studied in academia that will also be relevant to frontier labs. So I’m making a talk for this. What topics / areas should I cover? RL/eval/pretraining,?
3
6
139
30,164
Here is another compelling case highlighting why KL-regularized RL is indispensable.
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other approaches for a fraction of the cost. thinkingmachines.ai/blog/on-…
7
9
134
23,399
Congrats to Dr. Jinghui Chen @netfox001 on his successful Ph.D. defense today! Jinghui has done excellent work in optimization for deep learning and robust machine learning. He will join @penn_state as an assistant professor in the Fall of 2021. Thanks to the amazing committee!
8
3
137
I fully respect everyone’s right to freedom of speech. If the invited speaker has prior experiences that have led to biases against Chinese students or scholars, I am open to engaging in a personal or public dialogue to address these biases. However, delivering such an offensive message during a keynote talk at NeurIPS, a highly regarded research forum for our community, is both unnecessary and inappropriate.
If you don’t think conference codes of conduct are bad for free speech, read this.
3
5
130
19,457
Excited to contribute to the initiative aimed at benchmarking the trustworthiness of Large Language Models (LLMs) across 6 dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. Explore the leaderboard showcasing the trustworthiness of 16 open-source (e.g., Llama2 and Mistral) and proprietary LLMs (ChatGPT, GPT-4) at: trustllmbenchmark.github.io/…
TrustLLM: Trustworthiness in Large Language Models paper page: huggingface.co/papers/2401.0… Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. 
Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.
23
128
25,699
Modern scaling law research often feels like this: 1. Train a few models 2. Plot metrics on a log-log scale 3. Fit a line 4. Call it a new law Maybe it’s time to ask: are we uncovering principles, or just describing artifacts?🤔
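Steps 2 and 3 of the recipe above amount to ordinary least squares in log-log space. A minimal sketch on synthetic data (the model sizes, coefficient, and exponent are made up for illustration and come from no real training run):

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on log y = log a + b * log x."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    slope = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
             / sum((u - mean_x) ** 2 for u in lx))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


# Synthetic "losses" generated from an exact power law: loss = 5.0 * N**-0.1.
sizes = [1e7, 1e8, 1e9, 1e10]
losses = [5.0 * n ** -0.1 for n in sizes]

a, b = fit_power_law(sizes, losses)
print(f"fitted: loss = {a:.3f} * N^{b:.3f}")
```

With only a handful of points, almost any smooth trend looks linear on a log-log plot, which is the concern the post raises: a clean fit alone does not show that the exponent reflects a principle rather than an artifact of the setup.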
7
7
131
12,248
Math is key to LLMs and AI, and mathematicians can do amazing work. But watching them melt down over AI… just unbelievable.
This is an unwise statement that can only make people confused about what LLMs can or cannot do. Let me tell you something: Math is NOT about solving this kind of ad hoc optimization problems. Yeah, by scraping available data and then clustering it, LLMs can sometimes solve some very minor math problems. It's an achievement, and I applaud you for that. But let's be honest: this is NOT the REAL Math. Not by 10,000 miles. REAL Math is about concepts and ideas - things like "schemes" introduced by the great Alexander Grothendieck, who revolutionized algebraic geometry; the Atiyah-Singer Index Theorem; or the Langlands Program, tying together Number Theory, Analysis, Geometry, and Quantum Physics. That's the REAL Math. Can LLMs do that? Of course not. So, please, STOP confusing people - especially, given the atrocious state of our math education. LLMs give us great tools, which I appreciate very much. Useful stuff! Go ahead and use them AS TOOLS (just as we use calculators to crunch numbers or cameras to render portraits and landscapes), an enhancement of human abilities, and STOP pretending that LLMs are somehow capable of replicating everything that human beings can do. In this one area, mathematics, LLMs are no match to human mathematicians. Period. Not to mention many other areas. Calling on my friend @ericweinstein and @GaryMarcus, who has been one of the few sane expert voices on these matters lately. 🙏 h/t @hellheff
12
7
130
20,720
SPIN (github.com/uclaml/SPIN) is trending on GitHub! We firmly believe open source efforts will be the main driving force for advancing research in LLMs. github.com/trending
We've just open-sourced the code and data for Self-play Fine-Tuning (SPIN)! Time to SPIN every model out there! 🚀🚀🚀 Code: github.com/uclaml/SPIN Data: huggingface.co/collections/U… Models: huggingface.co/collections/U… Project Page: uclaml.github.io/SPIN/ Many thanks to @Yihe__Deng, @HuizhuoY, and @Kaixuan_Ji_19 for their tremendous efforts in preparing these.
2
22
127
16,480
🚨SPIN vs. DPO: SPIN uses the SFT dataset, while DPO demands human preferences with additional labeling overhead. Can we make DPO more label efficient? Introducing Active DPO (ADPO) - streamline DPO with a 50% reduction in labeled human preference data! 🚀 arxiv.org/abs/2402.09401 Joint work w/ @Kaixuan_Ji_19 @JiafanHe
🔥Excited to share our recent research on query-efficient RLHF! Introducing Active Direct Preference Optimization (ADPO), a new approach that improves DPO performance on Open-LLM-Benchmark with just half the queries. Discover how ADPO eliminates the significant demand for querying preference labels.🚀[1/4] Paper: arxiv.org/pdf/2402.09401.pdf A joint work with @JiafanHe , and @QuanquanGu 👏
2
26
128
17,666
Parallelized reasoning is the path to exceeding human-level intelligence.
Deep Think in 2.5 Pro has landed. 🤯 It’s a new enhanced reasoning mode using our research in parallel thinking techniques - meaning it explores multiple hypotheses before responding. This enables it to handle incredibly complex math and coding problems more effectively.
7
6
126
16,411
1/4 Want to learn more about optimization for ML after the @NeurIPSConf main conference? Welcome to our OPT2020 workshop on optimization for machine learning on Friday (in 6 hours!). Friday, December 11th, 2020, 6:00 AM - 7:00 PM EST. Schedule: neurips.cc/virtual/2020/publ…
1
22
115
Is the upper confidence bound (UCB) strategy provably efficient in neural contextual bandits? The answer is affirmative. Check out our paper (arxiv.org/abs/1911.04462), to be presented at #ICML2020 @icmlconf, where we propose the NeuralUCB algorithm, which attains an O(\sqrt{T}) regret.
2
14
116
Very proud of my student Dr. Pan Xu for his successful Ph.D. journey and amazing research work! It's a privilege to be his advisor. Pan's work spans nonconvex optimization, MCMC, and policy gradient methods, all pushing the frontier of sample-efficient ML. Congratulations!
Personal news: I have graduated from UCLA and officially become Dr. Xu. I will join Duke University @DukeU as a tenure track assistant professor in the Department of Biostatistics and Bioinformatics in August 2022. In the meantime, I will be a CAST Postdoctoral Fellow at Caltech.
4
117
🔥 Learning rate transfer under μP is now proven!
🎯 Just released a new preprint that proves LR transfer under μP. -> The Problem: When training large neural networks, one of the trickiest questions is: what learning rate should I use? [1/n]🧵 Link: arxiv.org/abs/2511.01734
3
8
118
18,312
Experiment is up and running, awaiting a major outcome.
6
3
108
34,671
GRPO has served its purpose. It's time to move on.
7
4
115
12,520
1/3 Can RL with linear function approximation achieve minimax optimality? For linear mixture/kernel MDPs, yes! Our recent work arxiv.org/pdf/2012.08507.pdf provides nearly matching upper and lower bounds for both episodic & discounted settings. Joint work w/ @DongruoZ @CsabaSzepesvari
3
17
109
We're the architects now. 🏗️📐
1/ Introducing “Tensor Product Attention Is All You Need” (TPA) and Tensor ProducT ATTenTion Transformer (T6)! 🚀 Ever wondered if there’s a more memory-efficient way to handle long contexts in LLMs? Homepage: tensorgi.github.io/T6
8
10
107
19,359
I believe this was proved in a more rigorous way some time ago in this paper: arxiv.org/pdf/2311.14648
Paper - "LLMs Will Always Hallucinate, and We Need to Live With This" Podcast format generated with Google's new illuminate tool (illuminate is trained to produce short podcasts from research papers)
5
22
103
18,136
In our latest research, we've leveraged Rosetta energy as a reward in residual-level DPO tailored for antibody design. Check out our paper at: arxiv.org/html/2403.16576v1
My prediction for the next bio/ML trend at NeurIPS. DPO and RLHF for protein design. Protein language models in particular. 😉
1
23
101
23,330
Very cool! Who’d like to use FlashTPA? Drop a like if you want us to release it! MHA --> GQA --> MLA --> TPA 🚀🚀 Paper: arxiv.org/pdf/2501.06425
🚀 Day 1 of #OpenSourceWeek: FlashMLA Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production. ✅ BF16 support ✅ Paged KV cache (block size 64) ⚡ 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800 🔗 Explore on GitHub: github.com/deepseek-ai/Flash…
4
13
103
18,021
Imagine a Cursor for theorem proving: branches for ideas, PRs for proofs, reviews for rigor. That’s the future of theoretical research.
Doing research in a ChatGPT Group Chat with friends*, alternating between proofs and simulations, so so much fun ... *I refrain from using the term "vibes research", because obviously every step eventually needs to be very carefully checked!
7
10
109
31,599
Fair point, but Terence Tao is a once-in-a-generation genius. The rest of us are just trying to make ML theory great again, maybe with a little salary boost!🤣
Terence Tao makes $700k a year, what’s your excuse?
5
3
98
24,200
Is linear attention sufficient to reproduce the success of transformers? We have a paper (arxiv.org/pdf/2310.08391.pdf) on analyzing in-context learning with linear attention, and I know there is another paper (arxiv.org/pdf/2310.01082.pdf) showing that linear attention can replicate key aspects of soft-max attention's training dynamics. Despite this, I remain skeptical about whether linear attention alone is enough to replicate the success of transformers.
Anyone who has trained a Transformer has viscerally felt its O(T^2) cost. It is not tractable to train Transformers end-to-end on long contexts. Here's a writeup of the research direction I believe is most likely to solve this: linear transformers. manifestai.com/blogposts/fas… 1/7
3
8
100
32,740
The hottest thing a woman can be is training models at 3am.
The hottest thing a man can be is exceptionally good at math
5
3
90
12,324
Can’t make it to #ICML2025 this year. People ask why I’m so obsessed with pretraining and scaling. Simple: the AGI era is here. I refuse to be irrelevant.
2
4
94
13,083
Using a diffusion transformer (DiT) instead of a U-Net is an evident trend. However, what is the advantage of flow matching over diffusion? Is it merely the sampling-speed advantage, or are there other distinctive features to consider?
Announcing Stable Diffusion 3, our most capable text-to-image model, utilizing a diffusion transformer architecture for greatly improved performance in multi-subject prompts, image quality, and spelling abilities. Today, we are opening the waitlist for early preview. This phase is crucial for gathering insights to improve its performance and safety ahead of open release. You can sign up to join the waitlist and learn more here: bit.ly/3OR2qQF #stablediffusion3 Prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy
3
3
90
21,095
It's intriguing to observe the use of REINFORCE in RLHF. REINFORCE is a classical algorithm utilized to estimate policy gradients for episodic Markov Decision Processes (MDPs). Another notable method is GPOMDP. While both are effective estimators, it's worth noting that neither of them is new. proceedings.mlr.press/v115/x…
Introducing Gemma: a family of lightweight, state-of-the-art open models for developers and researchers to build with AI. 🌐 We’re also releasing tools to support innovation and collaboration - as well as to guide responsible use. Get started now. → dpmd.ai/3UJu1Y1
3
20
89
36,627
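For readers unfamiliar with the two estimators named in the REINFORCE post above, here is a toy comparison. It assumes a stateless two-step episodic problem with a softmax policy shared across steps; `theta`, `rewards`, and the helper names are illustrative and not taken from the linked paper. Enumerating all trajectories gives the exact mean of each estimator.

```python
import itertools
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    """Gradient of log softmax(theta)[a] with respect to theta."""
    g = -softmax(theta)
    g[a] += 1.0
    return g

theta = np.array([0.2, -0.1])      # hypothetical policy parameters
rewards = np.array([[1.0, 0.0],    # rewards[t, a]: reward at step t for action a
                    [0.0, 2.0]])
H = rewards.shape[0]               # horizon (number of steps per episode)
pi = softmax(theta)

mean_reinforce = np.zeros(2)
mean_gpomdp = np.zeros(2)
for traj in itertools.product(range(2), repeat=H):
    prob = np.prod([pi[a] for a in traj])
    r = np.array([rewards[t, a] for t, a in enumerate(traj)])
    scores = np.array([grad_log_pi(theta, a) for a in traj])
    # REINFORCE: sum of all score terms times the total return.
    g_reinforce = scores.sum(axis=0) * r.sum()
    # GPOMDP: each reward r_t is paired only with scores up to step t.
    g_gpomdp = sum(scores[: t + 1].sum(axis=0) * r[t] for t in range(H))
    mean_reinforce += prob * g_reinforce
    mean_gpomdp += prob * g_gpomdp

print(mean_reinforce, mean_gpomdp)
```

In this toy setting both estimators have the same mean. GPOMDP simply drops the terms that pair a reward with the scores of later actions, which is what typically lowers its variance.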
Check out our new paper arxiv.org/abs/2003.01803 We propose MOTS, the first Thompson-sampling-type algorithm that achieves minimax optimality for multi-armed bandits! Our result removes the logarithmic factor in existing regret bounds. @ashipra @LihongLi20
2
12
85
Make AGI Open Again!
🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented, deployed and battle-tested in production. As part of the open-source community, we believe that every line shared becomes collective momentum that accelerates the journey. Daily unlocks are coming soon. No ivory towers - just pure garage-energy and community-driven innovation.
1
4
87
7,311
Replying to @samsja19
The original GRPO is an off-policy RL algorithm, but its KL regularization isn't done right. Specifically, the k3 estimator for the unnormalized reverse KL is missing the importance weight. The correct formulation should be:
5
10
88
9,851
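The formula that followed the GRPO reply above is not preserved in the text here, so the following is only a sketch of the general idea, not the author's exact formulation. It assumes samples are drawn from an old policy `pi_old`, uses the k3 term r - 1 - log r with r = pi_ref / pi_theta, and multiplies it by the importance weight pi_theta / pi_old. All distributions and names are hypothetical toy values, and expectations are computed exactly rather than by sampling.

```python
import numpy as np

def k3(log_ratio):
    """k3 term r - 1 - log r, given log_ratio = log r."""
    return np.exp(log_ratio) - 1.0 - log_ratio

# Toy categorical distributions over 4 tokens (hypothetical numbers).
pi_old = np.array([0.40, 0.30, 0.20, 0.10])    # policy that generated the samples
pi_theta = np.array([0.25, 0.25, 0.25, 0.25])  # current policy being optimized
pi_ref = np.array([0.10, 0.20, 0.30, 0.40])    # reference policy

log_r = np.log(pi_ref) - np.log(pi_theta)      # log(pi_ref / pi_theta)
weight = pi_theta / pi_old                     # importance weight

# Exact expectations under pi_old, so there is no Monte Carlo noise.
unweighted = float(np.sum(pi_old * k3(log_r)))
weighted = float(np.sum(pi_old * weight * k3(log_r)))

# Reverse KL(pi_theta || pi_ref), the quantity the regularizer targets.
true_kl = float(np.sum(pi_theta * (np.log(pi_theta) - np.log(pi_ref))))

print(unweighted, weighted, true_kl)
```

With these toy numbers the importance-weighted expectation matches KL(pi_theta || pi_ref), while the unweighted one does not, because the samples come from `pi_old` rather than `pi_theta`.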
(1/4) Want to learn about the challenges and recent progress in representation learning? Welcome to our workshop on representation learning theory this week, sponsored and hosted by @TTIC_Connect. Time: August 4-5, 2022, 9:00 AM - 5:30 PM CT Registration: shorturl.at/lnpQY
1
15
83
🚀 Introducing Diffusion Protein Language Models (DPLM), a new suite of discrete diffusion-based protein language models! 🧬 With versatility in both generative and predictive tasks, DPLM is poised to set the new SOTA in protein language models, excelling across a spectrum of benchmark tasks. Congrats to the team!
2
11
84
11,765
🚀We have released the Mistral 7B models fine-tuned by SPPO on @huggingface: huggingface.co/collections/U…
Another triumph for Self-Play! Self-Play Preference Optimization (SPPO) has surpassed (iterative) DPO, IPO, Self-Rewarding LMs, and others on AlpacaEval, MT-Bench, and the Open LLM Leaderboard. Remarkably, Mistral-7B-instruct-v0.2 fine-tuned by SPPO achieves superior performance to GPT-4 0613 without relying on any GPT-4 responses. Explore the roadmap of LLM fine-tuning techniques: Supervised Fine-tuning: SFT --> SPIN Preference Fine-tuning: PPO --> DPO --> SPPO Paper: arxiv.org/pdf/2405.00675
6
10
86
19,770
I've noticed a trend in the LLM community. Whenever something XYZ is perceived as cool, there's almost immediately an OpenXYZ. Is this truly in the spirit of open research, or is it just a form of mimicry or copying?
17
6
79
27,565
This is fantastic. Every university should consider doing the same. Teaching undergraduate students how modern AI systems actually work under the hood and making the fundamentals accessible early on is exactly what the field needs.
I'm teaching a new "Intro to Modern AI" course at CMU this Spring: modernaicourse.org. It's an early-undergrad course on how to build a chatbot from scratch (well, from PyTorch). The course name has bothered some people – "AI" usually means something much broader in academic contexts – but I think the time has come where the first thing that many students interested in AI should see is how the AI they are familiar with actually works (because it's really simple!) The more people who understand it the better. I'll be trying to put as much material as I can that we develop online (assignments + autograding, hopefully lecture videos), though as a first-time course there are also likely to be some bumps along the way. Hopefully it becomes a good resource over time, though. Feedback welcome.
1
2
80
18,117
Microsoft: “Textbooks are all you need.” Tencent: “WizardLM team is all we need.”
Imagine where Microsoft would be now if they hadn't destroyed the amazing WizardLM team, causing them to leave and go to Tencent… …where they're now releasing models far beyond anything Microsoft has created.
2
1
76
13,109
Replying to @greyfedora0
I’m not blindly defending PhDs in AI; a lot of PhD work contributes little to real progress. But a strong PhD, done right, builds the foundation that many great AI researchers stand on, whether they hold the degree or not.
4
2
74
8,211
Great advice! I’d like to add one more thought: Mathematical thinking is key to conducting exceptional research.
🔗 Thoughts on Research Impact in AI. Grad students often ask: how do I do research that makes a difference in the current, crowded AI space? This is a blogpost that summarizes my perspective in six guidelines for making research impact via open-source artifacts. Link below.
4
6
76
17,728
MoE is just cool. Fine-grained scaling is the way forward.
Qwen3-30B-A3B is de facto on par with Qwen3-32B dense and the greatest vindication of finegrained MoEs the world has seen in the open.
1
6
76
9,265
Claiming that only one side cares about facts undermines the diversity of thought and approach that is fundamental to the scientific community.
People studying misinformation lean left for two reasons: 1. scientists lean left, regardless of specialty, because they care about facts. 2. misinformation today primarily comes from the Right ("they're eating the dawwwgs!") which makes it worth studying and fighting against for people leaning left.
5
7
75
12,230
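They asked me which part of large models I work on. “Training?” No. “Inference?” No. “Multimodal?” No. “Finetune?” Not that either. They frowned: “Then what exactly do you do?” My eyes reddened and my voice trembled: “I... I do Agent development...” The whole room fell silent, followed by everyone's knowing snickers.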
🚨BREAKING; APPLE HELD INTERNAL TALKS ABOUT BUYING PERPLEXITY
5
3
75
15,910