Assistant Professor at CMU. Marathoner, @thesisreview.

Excited to teach Advanced NLP at CMU this semester! Slides are posted on the course page as the course proceeds: cmu-l3.github.io/anlp-spring… Lectures will be uploaded to YouTube: piped.video/playlist?list=PL…
12
175
1,023
759,449
Lecture 11: Reinforcement Learning piped.video/disWB7qwcOk - RL basics - Reward functions for NLP - Optimizing rewards (policy gradient) - Stabilizing learning (e.g., KL penalty, PPO)
4
93
783
85,230
Slides for my recent talk on: "Reasoning with inference-time compute" wellecks.com/data/welleck202… Papers: - Lean-STaR: arxiv.org/abs/2407.10040 - Easy-to-hard: arxiv.org/abs/2403.09472 - Compute-optimal inference: arxiv.org/abs/2408.00724 - Meta-generation: arxiv.org/abs/2406.16838
6
111
710
64,517
Lecture 20: Advanced Post-Training piped.video/yuJUkR2vvJM - Supervised Fine-tuning - Reward Modeling - Reinforcement Learning - Direct Preference Optimization
3
76
664
72,504
Lecture 16: Parallelism and Scaling piped.video/Mpg1YJfAEH0 - Basics of training on one device - Parallelization on multiple devices (e.g., data, tensor, pipeline parallel) - Combining and comparing strategies
4
71
638
68,428
Lecture 5: Transformers - Attention - Transformers - Improved transformers piped.video/bN6YylvZCzM
3
60
524
42,175
What do nucleus sampling, tree-of-thought, and PagedAttention have in common? They're all part of our new survey: "From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models" arxiv.org/abs/2406.16838
10
112
530
69,826
Announcing the L3 Lab at CMU! cmu-l3.github.io/ We focus on Learning, Language, and Logic, including: - Principles of ML for language - ML in high-trust areas, such as verifying math and programs - ML systems that improve over time Recruiting PhD students for fall 2024!
9
91
512
70,917
Lecture 19: Efficient Inference piped.video/jbHgzU4r7yU - Basics of efficient LLM inference - Speeding up single-token and sequence generation - Speeding up meta-generation strategies
55
509
43,130
Lecture 15: Quantization (Guest lecture by @Tim_Dettmers) piped.video/YXZZaje76r4 - Quantization basics - Quantized foundation models: LLM.int8() - Finetuning foundation models: QLoRA - Quantization and users
3
60
486
58,144
Lecture 9: Fine-tuning - Fine-tuning basics - Instruction tuning - Knowledge distillation - Efficient fine-tuning piped.video/watch?v=3qW996ux…
2
56
448
41,705
Lecture 18: Advanced Inference Strategies piped.video/jNpeYvZtJkw - Parallel, tree search, refinement strategies - Long chain-of-thought - Inference scaling laws
3
50
439
41,369
Interested in LLMs and Lean? Check out LLMLean, a tool for using LLMs to suggest proof steps and complete proofs in Lean: github.com/cmu-l3/llmlean Here's an example of using LLMLean with GPT-4o to solve problems from Mathematics in Lean:
6
72
262
28,935
Teaching a new course on Neural Code Generation with @dan_fried! cmu-codegen.github.io/s2024/ Here is the lecture on pretraining and scaling laws: cmu-codegen.github.io/s2024/…
3
71
391
37,822
Lecture 14: Agents piped.video/4_kbc0_J_U0 - What is an agent? - Agent environments - Agent patterns
2
36
375
38,255
Had a fun time giving the tutorial at @SimonsInstitute! Here are the materials: Transformers for Mathematics Tutorial - Slides: wellecks.com/transformers4ma… - Code/exercises: github.com/wellecks/transfor…
Excited to give a tutorial on Transformers for Mathematics at @SimonsInstitute tomorrow! Part of the wonderful Workshop on AI for Mathematics and Theoretical Computer Science simons.berkeley.edu/workshop…
2
53
374
34,686
New paper by Andre He: Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening arxiv.org/abs/2506.02355 Tired of sharpening the distribution? Try unlikeliness reward to learn new things from the roads less traveled
4
52
358
32,044
I was honored to give a talk at Simons Institute on inference-time algorithms and meta-generation! simons.berkeley.edu/talks/se… It was a sneak-preview subset of our NeurIPS tutorial: cmu-l3.github.io/neurips2024…
5
37
345
27,371
Excited to give a tutorial on Transformers for Mathematics at @SimonsInstitute tomorrow! Part of the wonderful Workshop on AI for Mathematics and Theoretical Computer Science simons.berkeley.edu/workshop…
3
39
328
47,422
And to finish off, Lectures 21 - 23: - AI for Mathematics: piped.video/ToY57HgQKXA - Multimodal I (CLIP / Llava): piped.video/5uI5WOpq8LQ - Multimodal II (VQVAE / Chameleon): piped.video/VismiXpCs_Y
1
47
317
28,160
Curious about inference-time scaling, the #1 trending topic in LLMs? Come to our NeurIPS tutorial: Beyond Decoding: Meta-Generation Algorithms for LLMs (Tue. @ 1:30)! cmu-l3.github.io/neurips2024…
4
46
307
73,624
A tutorial on neural theorem proving: github.com/wellecks/ntptutor… Interactive notebooks for learning about combining neural language models with formal proof assistants. Part I) Build and evaluate a next-step suggestion tool Part II) LLM cascades and Draft, Sketch, Prove
7
54
295
65,247
I was honored to give a talk at UW Mathematics on "Language models and formal mathematics", covering: - Neural theorem proving tutorial: github.com/wellecks/ntptutor… - LLMstep: mathai2023.github.io/papers/… - Llemma: arxiv.org/abs/2310.10631 Slides are here! wellecks.com/data/welleck202…
7
52
272
41,015
Lecture 6: Pretraining - Pretraining objectives - Data: quantity, quality, coverage - Compute and scaling laws piped.video/qUAkjz3-VFg
32
272
19,175
Successfully defended my PhD dissertation! Thank you to the committee members (@kchonyc, @hhexiy, @jaseweston, @zz_aws_nyush, Keith Ross) and all of those who made this possible. Excited to join the University of Washington as a postdoc with @YejinChoinka's group in early 2021
15
8
238
New paper by @PranjalAggarw16: L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning arxiv.org/abs/2503.04697 We train L1: a reasoning model with controllable thinking length, allowing for precisely trading off test-time compute for improved reasoning
2
45
228
14,885
our new paper: "Neural Text d̶e̶Generation with Unlikelihood Training" is now on arxiv! (w/ @uralik1, @stephenroller, Emily Dinan, @kchonyc, @jaseweston) arxiv.org/pdf/1908.04319.pdf A step towards solving the case of neural text degeneration 🔎
7
54
216
I was honored to give a talk on AI for theorem proving for the Berkeley Advanced LLM Agents course! "Bridging Informal and Formal Mathematical Reasoning with AI" YouTube: piped.video/live/Gy5Nm17l9oo Slides: wellecks.com/data/welleck202… It covers three themes from our recent work: - Informal thoughts: Lean-STaR - Informal proofs: Draft-Sketch-Prove, LeanHammer 👀 - Research-level math: miniCTX, LLMLean
📣 Today 4/14 at 4:10 PM PT, join us for the 10th Advanced LLM Agents MOOC lecture on Advanced Topics in Neural Theorem Proving by @wellecks @CarnegieMellon. 🌐 Join the thriving community of the LLM Agents MOOC series, with 23K+ registered learners & 10K+ members on Discord! 🚀 Register NOW for the AgentX Competition by @BerkeleyRDI @UCBerkeley, w. sponsors @Amazon @huggingface @LambdaAPI @MistralAI @Google @GroqInc @schmidtsciences, and VC partners @Accel @BainCapVC @BessemerVP @lightspeedvp @MayfieldFund @NEA! Exciting announcements on prizes/credits/resources and more coming soon!
3
37
223
18,868
new paper: "NaturalProofs: Mathematical Theorem Proving in Natural Language" As a step towards systems that understand and use natural mathematical language, we develop a dataset of mathematical statements+proofs and a reference retrieval task. wellecks.github.io/naturalpr… (1/7)
6
44
211
How can informal reasoning improve formal theorem proving? New paper: "Lean-STaR: Learning to Interleave Thinking and Proving" arxiv.org/abs/2407.10040 We introduce a framework for learning to interleave informal thoughts with steps of formal proving. 46.3% on miniF2F 🔥
5
45
215
25,032
New paper: arxiv.org/abs/2205.12910 Theorem proving in natural mathematical language (the mix of symbolic and natural language used by humans) tests reasoning and plays a central role in mathematical education. Can language models prove theorems & help us when we're stuck? 1/N
2
55
205
New paper by Weihua Du (@StigLidu): Optimizing Temperature for Language Models with Multi-Sample Inference arxiv.org/abs/2502.05234 We develop TURN, which automatically finds the optimal temperature for inference strategies such as best-of-N or majority voting (1/5)
7
41
204
20,538
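The two multi-sample strategies named in the TURN post above, best-of-N and majority voting, can be sketched in a few lines. This is a minimal illustration of those strategies and of how temperature reshapes the sampling distribution, not the TURN method itself; the toy logits, the answer strings, and every function name here are hypothetical.

```python
import math
import random
from collections import Counter


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_answers(answers, logits, temperature, n, rng):
    """Draw n candidate answers from a toy 'model' defined by fixed logits."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(answers, weights=probs, k=n)


def majority_vote(samples):
    """Return the most frequent sample (ties go to the one seen first)."""
    return Counter(samples).most_common(1)[0][0]


def best_of_n(samples, score):
    """Return the sample with the highest score under a caller-supplied scorer."""
    return max(samples, key=score)


if __name__ == "__main__":
    rng = random.Random(0)
    answers = ["4", "5", "6"]      # hypothetical candidate answers
    logits = [2.0, 1.0, 0.0]       # hypothetical model preferences
    for temperature in (0.5, 1.0, 2.0):
        samples = sample_answers(answers, logits, temperature, 16, rng)
        print(temperature, majority_vote(samples))
```

Higher temperatures make the 16 samples more diverse, which can help or hurt the aggregated answer; choosing that temperature automatically is the problem the post describes.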
Lecture 8: Prompting - Prompting basics - Few-shot prompting - Prompt engineering - Prompting patterns (e.g., chain-of-thought, prompt chaining) piped.video/watch?v=hq5kld3k…
4
27
198
17,353
Our Inference Scaling Laws paper received an Outstanding Paper Award at NeurIPS Math-AI! Congrats to Yangzhen (@WYZ0402), Zhiqing (@EdwardSun0909), Shanda (@Shanda_Li_2000) and Yiming!
Replying to @wellecks
2. Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models arxiv.org/abs/2408.00724 Oral presentation at Math-AI! Saturday, West Meeting Room 118-120
5
15
199
22,672
My PhD thesis, "Order and Learning in Sequential Neural Structured Prediction" is now online at cs.nyu.edu/media/publication…
6
20
185
Excited to give a NeurIPS tutorial on LLM inference strategies, inference-time scaling laws & more with @mattf1n and @haileysch__ ! "Beyond Decoding: Meta-Generation Algorithms for Large Language Models" More details soon, check out arxiv.org/abs/2406.16838 in the meantime!
We’re excited to share the list of accepted tutorials for @NeurIPSConf ! Thanks to everyone who put in the time to submit a proposal. Check out the lineup and let us know which tutorials you’re most looking forward to! blog.neurips.cc/2024/10/17/i… with @irenetrampoline & @GalChechik
1
18
178
20,909
Excited about CMU's new Institute for Computer-Aided Reasoning in Mathematics (ICARM), a new NSF Mathematical Sciences Research Institute. I'm honored to serve as an Assistant Director focusing on machine learning and mathematics.
A new federally funded national institute at CMU will help mathematicians use AI to make mathematical reasoning faster and more reliable in solving pressing challenges across science, security and the economy. Read more, and scroll for further details: cmu.is/NSF-institute
8
22
172
25,204
New paper by Pranjal Aggarwal (@PranjalAggarw16): Programming with Pixels: Computer-Use Meets Software Engineering arxiv.org/abs/2502.18525 We reframe agentic software engineering as interacting with an IDE using visual observations and simple actions like clicking and typing
4
10
54
6,887
5 papers upcoming at NeurIPS: 1. Easy-to-Hard Generalization (TODAY 4:30-7:30, East Exhibit Hall A-C #2806) arxiv.org/abs/2403.09472 2. Inference Scaling Laws (Oral @ Math-AI) arxiv.org/abs/2408.00724 3. Lean-STaR @ Math-AI arxiv.org/abs/2407.10040 4. miniCTX @ Math-AI arxiv.org/abs/2408.03350 5. miniCodeProps @ Safe Generative AI arxiv.org/abs/2406.11915
2
12
166
18,390
Code generation graduated from self-contained problems to complex codebases. Neural theorem proving should too! Introducing miniCTX, a new benchmark that tests a model's ability to prove theorems from complex, real Lean projects cmu-l3.github.io/minictx/ arxiv.org/abs/2408.03350
3
28
158
24,229
Thank you for coming to the tutorial! The recording is already up for those with NeurIPS registrations: - neurips.cc/virtual/2024/tuto… All of the materials are here for further reference/study: - cmu-l3.github.io/neurips2024… Also check out our code examples: - github.com/cmu-l3/neurips202…
3
19
151
10,946
Can LLMs prove that code is correct? New paper: "miniCodeProps: a Minimal Benchmark for Proving Code Properties" arxiv.org/abs/2406.11915 miniCodeProps tests LLMs' ability to prove properties of simple Lean programs. Despite its simplicity, it's challenging! Led by Evan Lohn
5
32
145
20,757
code and pre-trained models for "Neural Text Generation with Unlikelihood Training" now available! - Train and fine-tune LMs with unlikelihood - 🚨fine-tune a GPT-2 model from pytorch-transformers with unlikelihood github.com/facebookresearch/…
40
142
Lecture 7: Decoding algorithms (guest lecture by @abertsch72) - decoding as optimization - sampling algorithms - constrained generation piped.video/cN8yX_ZZWJw
20
142
13,418
new paper "Consistency of a Recurrent Language Model With Respect to Incomplete Decoding" arxiv.org/pdf/2002.02492.pdf we show that common decoding algorithms can yield infinite-length, zero-probability strings from neural LMs ♾ w/ @uralik1, Jaedeok Kim, Richard Pang, @kchonyc (1/6)
3
22
138
new paper w/ @kchonyc: “MLE-guided parameter search for task loss minimization in neural sequence modeling” arxiv.org/pdf/2006.03158.pdf Sequence-level training based on random search around the current parameters and the MLE gradient
1
20
131
Our paper “Neural Text Generation with Unlikelihood Training” was accepted to #ICLR2020! w/ @uralik1 @stephenroller @em_dinan @kchonyc @jaseweston
3
16
125
In Vancouver for NeurIPS but don't have Taylor Swift tickets? You can still spend the day going through our tutorial reading list: - cmu-l3.github.io/neurips2024… Tuesday December 10, 1:30-4:00pm @ West Exhibition Hall C, NeurIPS
2
16
128
15,150
Lecture 2: Neural Text Representation and Classification piped.video/watch?v=2eJ3S1gX… Includes: - tokenization - token embeddings - minimizing cross entropy loss with neural networks
13
124
9,107
Version II of the tutorial on neural theorem proving: github.com/cmu-l3/ntptutoria… Some new additions - Train a model that gets 29.5% on miniF2F - Data extraction in Lean, based on lean-training-data - LLMLean tool (github.com/cmu-l3/llmlean)
1
26
117
21,302
Lecture 3: Language Modeling Fundamentals - What is a language model? - Bigram, ngram, feedforward neural language model - Connecting maximum likelihood, KL divergence, and cross entropy loss piped.video/9JuMXy-5Y0E?si=bZAH…
19
114
8,182
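The Lecture 3 post above lists the connection between maximum likelihood, KL divergence, and cross-entropy loss. One standard way to write that connection is sketched below. It is a textbook identity rather than something taken from the lecture slides, and the notation is assumed here: p is the data distribution, q_θ the model, and x_1, …, x_N samples from p.

```latex
\begin{align*}
\mathrm{KL}(p \,\|\, q_\theta)
  &= \mathbb{E}_{x \sim p}\left[\log p(x)\right]
   - \mathbb{E}_{x \sim p}\left[\log q_\theta(x)\right] \\
  &= H(p, q_\theta) - H(p), \\
\arg\min_\theta \mathrm{KL}(p \,\|\, q_\theta)
  &= \arg\min_\theta H(p, q_\theta)
   \approx \arg\max_\theta \frac{1}{N} \sum_{i=1}^{N} \log q_\theta(x_i).
\end{align*}
```

Because the entropy H(p) does not depend on θ, minimizing the KL divergence, minimizing the cross entropy, and maximizing the average log-likelihood of the samples all select the same parameters.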
Excited to announce “Math AI for Education: Bridging the Gap Between Research and Smart Education" (MathAI4Ed) A NeurIPS 2021 workshop on the intersection of AI, mathematics, and education. mathai4ed.github.io/ Now accepting submissions! (due Oct 06, 2021) (1/6)
2
28
103
Thanks to @chelseabfinn for the great conversation about her work on meta-learning and robotics -- check it out below!
Episode 10 of The Thesis Review: Chelsea Finn (@chelseabfinn), "Learning to Learn with Gradients" We discuss meta-learning, her work on MAML and its applications, and the future of robotics research soundcloud.com/thesis-review…
13
97
We present AlphaVerus, which enables LLMs to generate provably correct Rust code via a new tree search and self-improvement loop Very excited about AlphaVerus as a starting point for truly trustworthy code generation. Amazing work by @PranjalAggarw16! alphaverus.github.io/
LLMs often generate incorrect code. Instead, what if they can generate provably correct code? Presenting AlphaVerus: A self-reinforcing method that automatically learns to generate mathematically correct code using inference-time search and verifier feedback. 🧵
1
16
99
16,690
Our CMU-MATH team placed 2nd in the AIMO progress prize! (1st academic team 😎) The solution combines inference algorithms, clever reward model training, and program aided reasoning Code: github.com/AIMO-CMU-MATH/CMU… Models: huggingface.co/AIMO-CMU-MATH Blog: blog.ml.cmu.edu/2024/07/29/c…
🔥Our CMU-MATH team proudly clinched 2nd place in the Artificial Intelligence Mathematical Olympiad (AIMO) out of 1,161 participating teams with the best performance of an academic team! Dive into our blog to discover our winning formula: blog.ml.cmu.edu/2024/07/29/c…
1
18
97
12,560
Cool to see our L1 (arxiv.org/abs/2503.04697) methodology used here! And a nice insight about using the controllable reasoning budget to enable more efficient use of inference hardware
Replying to @PrimeIntellect
With INTELLECT-2 we aim for frontier reasoning performance with a controllable thinking budget. By incorporating length rewards into our training run, users can specify how long the model should reason for a given task. primeintellect.ai/blog/intel…
3
10
98
11,365
Our LLM inference tutorial is happening TODAY! cmu-l3.github.io/neurips2024… Tuesday December 10, 1:30-4:00pm @ West Exhibition Hall C, NeurIPS See you there!
2
11
96
10,770
Three papers accepted at #NeurIPS2022 - looking forward to chatting about reasoning & generation in New Orleans! 1. NaturalProver, neural (informal) theorem proving with language models w/ @liujc1998, @GXiming, @HannaHajishirzi, @YejinChoinka
2
12
92
new paper: "Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics" Sequence models show amazing performance on many tasks. Does perfect test accuracy tell the full story? w/ @PeterWestTM, @JizeCao, @YejinChoinka arxiv.org/abs/2109.13986
2
23
87
Lecture 4: Recurrent Neural Networks - Recurrent neural networks - Vanishing gradients and other recurrent architectures - Encoder-decoder - Attention piped.video/MDYywCo3-rM?si=toXN…
1
8
86
5,625
New COLM workshop on test-time scaling and reasoning models! Submit your papers by June 23, more info: scalr-workshop.github.io/
🚨Announcing SCALR @ COLM 2025 — Call for Papers!🚨 The 1st Workshop on Test-Time Scaling and Reasoning Models (SCALR) is coming to @COLM_conf in Montreal this October! This is the first workshop dedicated to this growing research area. 🌐 scalr-workshop.github.io
6
90
9,822
It's often said that "evaluation is easier than generation"... We go one step further: strong evaluators enable generalizing to harder problems! New paper led by @EdwardSun0909 and @scut_longhui Using supervision only on easy problems, 52.5 on MATH with Llemma-34b + re-ranking
🌟Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision 🌟 arxiv.org/abs/2403.09472 How can we keep improving AI systems when their capabilities surpass those of human supervisors? (1/n)
2
16
88
19,537
How do we optimally use compute at inference time? New paper led by @WYZ0402: "An Empirical Analysis of Compute-Optimal Inference with LMs" arxiv.org/abs/2408.00724 We study scaling laws of inference, finding that smaller models with sophisticated inference are compute-optimal.
3
19
89
10,499
MAUVE has received an Outstanding Paper Award at NeurIPS 2021! Honored to be part of a great team -- and an extra congrats to first author @KrishnaPillutla
Replying to @thegautamkamath
Outstanding Paper Award 4. MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers, by @KrishnaPillutla, @swabhz, @rown, @jwthickstun, @wellecks, @YejinChoinka, Zaid Harchaoui arxiv.org/abs/2102.01454 (4/n)
9
86
AlphaVerus has been accepted at #ICML2025! alphaverus.github.io/ arxiv.org/abs/2412.06176 We've seen in math that good verification (e.g., Lean) unlocks surprising capabilities–why not for code too? AlphaVerus puts LLMs & Rust’s Verus verifier into a self-improving loop–lots of untapped potential and open problems in this direction!
5
6
82
6,885
Llemma: open language models for mathematics We train 7B and 34B models on Proofpile II, a 55B token dataset of code, web text, and papers. We make everything publicly available: models, code, data, and evaluation. Excited to have a new platform for research in AI+math!
We release Llemma: open LMs for math trained on up to 200B tokens of mathematical text. The performance of Llemma 34B approaches Google's Minerva 62B despite having half the parameters. Models/data/code: github.com/EleutherAI/math-l… Paper: arxiv.org/abs/2310.10631 More ⬇️
1
15
80
17,008
TURN has been accepted at ICML! arxiv.org/abs/2502.05234 Automatically select the temperature for inference strategies like best-of-N and majority voting
2
3
80
6,095
The recent Claude 3.7 model from Anthropic lets you control the budget for thinking—how might this work? Check out L1, our fully open recipe for training reasoning models with controllable thinking budgets!
What if you could control how long a reasoning model “thinks”? Presenting L1-1.5B, an RL-trained reasoning model with: - controllable thinking length via a prompt - better performance per token than S1 - better short CoT performance than GPT-4o cmu-l3.github.io/l1 🧵
4
6
74
10,035
new paper: Draft, Sketch, and Prove arxiv.org/abs/2210.12283 A step towards bridging informal and formal mathematical reasoning via language models! LLMs can be used to draft natural mathematical proofs and autoformalize them into high-level sketches that guide a formal prover.
Large language models can write informal proofs, translate them into formal ones, and achieve SoTA performance in proving competition-level maths problems! LM-generated informal proofs are sometimes more useful than the human ground truth 🤯 Preprint: arxiv.org/abs/2210.12283 🧵
1
10
71
🚨TODAY🚨: Jiewen (@Jiewenhu02) and Thomas (@hanwen_zhu) are presenting miniCTX as an oral presentation at ICLR! miniCTX: Neural Theorem Proving with (Long-)Contexts arxiv.org/abs/2408.03350 Theorem proving beyond competition problems: research-level mathematics, scientific projects, and beyond
1
12
72
4,831
I was honored to give a talk and a tutorial at the 2nd Conference on Foundation Models and AI Agents for Science (SciFM 2025)! - Talk: Bridging Informal and Formal Mathematical Reasoning - Tutorial: Test-Time Scaling for Mathematical Reasoning Slide links are below
2
2
61
3,468
Llama 3 70B in LLMLean! Suggests proofs or next steps that are checked in Lean Try it out with a @togethercompute API key: github.com/cmu-l3/llmlean
3
9
59
8,224
New paper on scaling evaluation-time compute — thinking longer leads to better evaluation. Led by @seungonekim! Excited about this new dimension for taking advantage of inference-time compute and reasoning models. arxiv.org/abs/2503.19877
#NLProc New paper on "evaluation-time scaling", a new dimension to leverage test-time compute! We replicate the test-time scaling behaviors observed in generators (e.g., o1, r1, s1) with evaluators by forcing them to generate additional reasoning tokens. arxiv.org/abs/2503.19877
5
58
5,076
Easy-to-Hard Generalization was accepted to NeurIPS! Congrats to @EdwardSun0909 and @scut_longhui! Check out the updated camera-ready version here: openreview.net/pdf?id=qwgfh2…
2
8
60
34,646
Excited about our new ICLR workshop on AI + Verification! In the age of increasingly capable models, trusting outputs and getting high-quality feedback to improve models are becoming central bottlenecks. Our workshop explores how AI can be combined with formal systems (e.g. program verifiers and proof assistants) or other kinds of verification to bring correctness and high-quality learning signals to code generation, mathematical reasoning, and beyond. Open for submissions! verifai-workshop.github.io/
📣Announcing VerifAI: AI Verification in the Wild, a workshop at #ICLR2025 VerifAI will gather researchers to explore topics at the intersection of genAI/trustworthyML and verification: verifai-workshop.github.io/ @celine_ylee @theo_olausson @ameeshsh @wellecks @taoyds
1
9
57
9,890
Very exciting course on LLM Agents!! Looking forward to giving a lecture for the course in April
Really excited to announce our Advanced LLM Agents MOOC (Spring 2025)! Building on the success of our LLM Agents MOOC from Fall 2024 (15K+ registered learners, ~9K Discord members, 200K+ lecture views on YouTube), we are excited to extend the MOOC this semester to cover some more advanced topics: → Reasoning & planning → Multimodal Agents → Coding agents, web agents → AI for mathematics and theorem proving → Agent safety & security, and more 🎥 LIVE every Monday @ 4:10PM PT ✨ Whether you're a student, researcher, developer, practitioner, or AI enthusiast, join us on this exciting journey of shaping the future of LLM Agents!
6
50
3,622
Will future SWE agents be computer-use agents? We explore this shift in Programming with Pixels: an agent environment where agents learn to use an IDE's existing functionality rather than relying on hand-designed tool APIs programmingwithpixels.com/
What if AI agents did software engineering like humans—seeing the screen & using any developer tool? Introducing Programming with Pixels: an SWE environment where agents control VSCode via screen perception, typing & clicking to tackle diverse tasks. programmingwithpixels.com 🧵
1
3
48
4,549
Interested in reasoning, scientific discovery, and/or the intersection of NLP & mathematics? Vote for 𝐌𝐚𝐭𝐡𝐍𝐋𝐏: 𝟏𝐬𝐭 𝐖𝐨𝐫𝐤𝐬𝐡𝐨𝐩 𝐨𝐧 𝐌𝐚𝐭𝐡𝐞𝐦𝐚𝐭𝐢𝐜𝐚𝐥 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐏𝐫𝐨𝐜𝐞𝐬𝐬𝐢𝐧𝐠 to appear at a 2022 *CL conference! docs.google.com/forms/d/e/1F…
1
8
46
I had a great time talking with @seb_ruder about his work on transfer learning - check out Episode 3 of the Thesis Review below!
Episode 3 of The Thesis Review: Sebastian Ruder (@seb_ruder), "Neural Transfer Learning for Natural Language Processing" We discuss transfer learning, including cross-lingual learning & sequential transfer learning, and advice for researchers cs.nyu.edu/~welleck/episode3…
8
44
New multi-domain NaturalProofs for theorem proving in natural mathematical language: Statements+proofs from broad (ProofWiki), in-depth (Stacks), real-world (textbook) sources New retrieval baselines and generation task arxiv.org/abs/2104.01112 github.com/wellecks/naturalp… (1/8)
3
14
42
Pleased to see our tutorial featured on the Institute for Foundations of Machine Learning (IFML, @MLFoundations) webpage! ifml.institute/events/neurip… - Tutorial: cmu-l3.github.io/neurips2024… Tuesday December 10, 1:30-4:00pm @ West Exhibition Hall C, NeurIPS
9
44
6,706
new project: "The Thesis Review Podcast" I'll interview researchers from around the field, focusing on their PhD thesis work and how their research and perspective have evolved since. Follow along at @thesisreview, I hope you enjoy the conversations!
New podcast🎙️ The Thesis Review brings you interviews with machine learning researchers, with each conversation centered around their PhD thesis. We've got a great set of guests, from newly minted PhDs to senior researchers!
our paper on Consistency of a Recurrent LM was accepted to #emnlp2020! (w/ @uralik1, Jaedeok Kim, @yzpang97, @kchonyc) Stay tuned for the updated version ♾
new paper "Consistency of a Recurrent Language Model With Respect to Incomplete Decoding" arxiv.org/pdf/2002.02492.pdf we show that common decoding algorithms can yield infinite-length, zero-probability strings from neural LMs ♾ w/ @uralik1, Jaedeok Kim, Richard Pang, @kchonyc (1/6)
MAUVE -- an automatic evaluation metric for open-ended generation -- will appear at NeurIPS as an oral presentation! Check out our new paper, code, and the summary below 👇 arxiv.org/abs/2102.01454 w/ @KrishnaPillutla, @swabhz, @rown, @jwthickstun, @YejinChoinka, Zaid Harchaoui
How can we measure the gap between machine text and human text? We introduce MAUVE, a new comparison measure for open-ended text generation, in our upcoming oral presentation at NeurIPS 2021. Paper: arxiv.org/abs/2102.01454 Pip package: github.com/krishnap25/mauve 1/n
check out Quark, our new [un]learning algorithm for adjusting & aligning language models!
Quark: Controllable Text Generation with Reinforced Unlearning abs: arxiv.org/abs/2205.13636 introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model
"Stolen Probability: A Structural Weakness of Neural Language Models" arxiv.org/pdf/2005.02433.pdf Embedding norms influence token probabilities due to pre-softmax dot product Tokens inside the convex hull of the token embeddings receive smaller probabilities by David Demeter et al
"How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks" #ICLR by @KeyuluXu et al. Proves MLPs quickly converge to linear functions outside of training data range Can extrapolate when the task is linear & training data is diverse openreview.net/forum?id=UH-c…
"TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation" Investigates the effect of hard vs. easy tokens on repetition with a variant of the focal loss by @Shaojie_Jiang, @Thom_Wolf, @c_monz, @mdr arxiv.org/pdf/2003.11963.pdf
Replying to @srush_nlp @gneubig
We also covered this in neural code generation: cmu-codegen.github.io/s2024/…
We'll also have a NeurIPS 2024 tutorial based on the survey! Stay tuned for more details 👀
The Information reports that OpenAI's new "strawberry" product will arrive in ~2 weeks, using 10-20 seconds of inference-time compute: theinformation.com/articles/… If you want to study up on methods for inference-time compute, our survey could be useful! arxiv.org/abs/2406.16838
"An Empirical Study of Generation Order for Machine Translation" (arxiv.org/pdf/1910.13437.pdf) Nice paper studying effects of varying the generation order used to train an Insertion Transformer by William Chan, Mitchell Stern, Jamie Kiros, Jakob Uszkoreit
Check out our NeurIPS workshop on AI for math & reasoning! "Math-AI: Toward Human-Level Mathematical Reasoning" mathai2022.github.io/ excited to co-organize with @lupantech @Swarooprm7 @Yuhu_ai_ @HannaHajishirzi @percyliang
🚨We are organizing the 2nd MATHAI workshop at NeurIPS! Check it out if you're interested in AI for math, and machine reasoning in general🤯! We have a great lineup of speakers & panelists! See more in call for papers: 👇 mathai2022.github.io/cfp/
We're back! New Thesis Review episode with @niloofar_mire on privacy and LLMs
Episode 47 of The Thesis Review: Niloofar Mireshghallah (@niloofar_mire), "Auditing and Mitigating Safety Risks in Large Language Models" We discuss her journey into research, PhD work on privacy in LLMs, and memorization vs generalization. soundcloud.com/thesis-review…
NeurIPS tutorial on "Imitation Learning and its Application to Natural Language Generation" by @haldaume3 and @kchonyc slideslive.com/38921527/imit…
our paper on generalization in symbolic mathematics was accepted to #AAAI2022!
new paper: "Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics" Sequence models show amazing performance on many tasks. Does perfect test accuracy tell the full story? w/ @PeterWestTM, @JizeCao, @YejinChoinka arxiv.org/abs/2109.13986
nice papers on set generation/modeling at the ICML Object-Oriented Learning workshop Conditional Set Generation with Transformers slideslive.com/38930876/cond… arxiv.org/pdf/2006.16841.pdf Generative Adversarial Set Transformers slideslive.com/38930872/gene… github.com/oolworkshop/oolwo…
Thanks to @adjiboussodieng for the great conversation about her work on deep probabilistic models -- listen below!
Episode 13 of The Thesis Review: Adji Bousso Dieng (@adjiboussodieng), "Deep Probabilistic Graphical Modeling" We discuss models and algorithms for deep PGMs, interpretability & applications, and having an impact through research. soundcloud.com/thesis-review…