Aran Komatsuzaki · Feb 8, 2023 · 12:20 AM UTC

Aran Komatsuzaki

@arankomatsuzaki

8 Feb 2023

OpenAI did what used to be considered impossible. They made people want to use Bing.

195

1,541

16,642

1,112,000

Aran Komatsuzaki · Feb 1, 2025 · 8:01 PM UTC

750

7,503

528,788

Aran Komatsuzaki · Jan 30, 2025 · 5:59 PM UTC

The leap from o1 to o3 is exponential, completely bypassing o2. If this pattern holds, o3 won’t lead to o4—it’ll jump straight to o9.

349

259

6,729

371,400

Aran Komatsuzaki · May 14, 2025 · 2:58 PM UTC

Don't forget to check your DMs

3,133

243,839

Aran Komatsuzaki · Sep 9, 2025 · 3:19 AM UTC

Google presents an AI system to write expert-level scientific software. Using LLMs + tree search, it invented novel methods in bioinformatics, epidemiology, geospatial analysis & more, often surpassing human SOTA. (1/4)

504

3,097

534,825

Aran Komatsuzaki · May 31, 2021 · 9:02 PM UTC

When you generate images with VQGAN + CLIP, the image quality dramatically improves if you add "unreal engine" to your prompt. People are now calling this the "unreal engine trick" lol, e.g. "the angel of air. unreal engine"

354

2,446

Aran Komatsuzaki · Mar 31, 2023 · 12:35 AM UTC

BloombergGPT: A Large Language Model for Finance
Presents BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data.
arxiv.org/abs/2303.17564

429

2,393

503,698

Aran Komatsuzaki · May 25, 2022 · 1:50 AM UTC

Large Language Models are Zero-Shot Reasoners
Simply adding “Let’s think step by step” before each answer increases the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with GPT-3.
arxiv.org/abs/2205.11916

534

2,407
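Zero-shot CoT uses two prompts. The first appends the trigger phrase to draw out reasoning. The second feeds that reasoning back with a cue asking for the final answer. The sketch below builds the prompt strings only and calls no model. The function names and the exact answer-extraction wording are assumptions modeled on a "Q: ... A: ..." format.

```python
def build_reasoning_prompt(question: str) -> str:
    """Stage 1: place the zero-shot CoT trigger right after the answer prefix."""
    return f"Q: {question}\nA: Let's think step by step."


def build_answer_prompt(question: str, reasoning: str) -> str:
    """Stage 2: feed the model's reasoning back and ask for the final answer.

    The extraction cue is an assumed template for arithmetic-style tasks.
    """
    return (
        build_reasoning_prompt(question)
        + " "
        + reasoning.strip()
        + "\nTherefore, the answer (arabic numerals) is"
    )


if __name__ == "__main__":
    q = "A juggler has 16 balls. Half are golf balls. How many golf balls are there?"
    print(build_reasoning_prompt(q))
    # In practice the reasoning string is the model's stage-1 completion.
    print(build_answer_prompt(q, "Half of 16 is 8."))
```

Only the trigger phrase changes between the standard and the zero-shot CoT setting. No task-specific exemplars are needed.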

Aran Komatsuzaki · Mar 21, 2021 · 8:26 PM UTC

We've released the weights (1.3B and 2.7B) of our replication of GPT-3 🥳 Using the updated Colab notebook in the repo you should be able to finetune the models on your own data as well as run inference. github.com/EleutherAI/gpt-ne…

GitHub - EleutherAI/gpt-neo: An implementation of model parallel GPT-2 and GPT-3-style models using...

An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library. - EleutherAI/gpt-neo

github.com

517

2,349

Aran Komatsuzaki · Sep 17, 2025 · 4:14 AM UTC

Pattern of my 20s: “This idea’s great, but others are better positioned. I’m late and lack domain expertise. I’ll find something new.” → Later: someone even less qualified makes it work.

2,036

93,928

Aran Komatsuzaki · Nov 30, 2023 · 3:16 AM UTC

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
proj: humanaigc.github.io/animate-…
abs: arxiv.org/abs/2311.17117

589

2,026

771,738

Aran Komatsuzaki · Feb 12, 2025 · 3:52 AM UTC

OpenAI presents: Competitive Programming with Large Reasoning Models
- Competed live at IOI 2024
- o3 achieved gold
- General-purpose o3 surpasses o1 w/ hand-crafted pipelines specialized for coding results

197

1,869

624,612

Aran Komatsuzaki · Jan 31, 2025 · 3:23 PM UTC

1,802

84,851

Aran Komatsuzaki · Feb 18, 2025 · 3:53 PM UTC

If you think you have regrets, here are mine:
- I turned down an early invitation from Noam to join CharacterAI, and a similar one from Igor to join xAI.
- I stumbled through a coding interview for OpenAI when Wojciech asked if I wanted to work on GPT-4.
- I was once trying to train a large image diffusion model on the LAION dataset for Emad (Stability) before Rombach. I switched projects because I wasn't patient enough to wait for the compute to be delivered. I also couldn’t join StabilityAI due to my student visa.
- I was so absorbed in research that I came very late to the startup world, which led to my current struggle, because many AI startup ideas have already been taken. I also never took full advantage of my network (esp. Twitter) for my career. It's honestly a goldmine that is beyond what I can handle.
I may be good at spotting promising research ideas early, but when it comes to making career decisions, I’m pretty terrible at that lol

1,673

220,385

Aran Komatsuzaki · Apr 5, 2024 · 1:57 PM UTC

Google presents Transformer 2
- Unifies attention, recurrence, retrieval, FFN into a single module
- Performs on par with Transformer w/ 20x better compute efficiency
- Efficiently processes 100M context length
proj: tinyurl.com/59upc7v6
abs: tinyurl.com/3nw25nz2

Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)

The official video for “Never Gonna Give You Up” by Rick Astley. ...

youtube.com

248

1,558

274,797

Aran Komatsuzaki · Feb 18, 2022 · 1:45 AM UTC

Gradients without Backpropagation
Presents a method to compute gradients based solely on the directional derivative that one can compute exactly and efficiently via the forward mode, entirely eliminating the need for backpropagation in gradient descent.
arxiv.org/abs/2202.08587

211

1,442
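The forward gradient is g(θ) = (∇f(θ)·v) v, with v drawn from a standard normal. A single forward-mode pass (a Jacobian-vector product) gives the directional derivative ∇f(θ)·v, and g is an unbiased estimate of ∇f. The sketch below implements forward mode with a minimal dual-number class on a toy objective. `Dual`, `f` and `forward_gradient` are illustrative names, not the paper's code.

```python
import random


class Dual:
    """Minimal dual number val + tan*eps (eps**2 == 0) for forward-mode AD."""

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def f(xs):
    """Toy objective: sum_i (i + 1) * x_i**2, whose gradient is 2 (i + 1) x_i."""
    total = Dual(0.0)
    for i, x in enumerate(xs):
        total = total + (i + 1) * x * x
    return total


def forward_gradient(fn, theta, rng):
    """Seed the tangents with v, read off (grad f . v) in one pass, then scale v."""
    v = [rng.gauss(0.0, 1.0) for _ in theta]
    out = fn([Dual(t, vi) for t, vi in zip(theta, v)])
    return [out.tan * vi for vi in v]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [1.0, -2.0, 0.5]
    n = 20000
    avg = [0.0] * len(theta)
    for _ in range(n):
        g = forward_gradient(f, theta, rng)
        avg = [a + gi / n for a, gi in zip(avg, g)]
    print(avg)  # averages toward the true gradient [2.0, -8.0, 3.0]
```

Any single estimate is noisy, but no backward pass is ever run. The paper uses such estimates directly as the update in gradient descent.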

Aran Komatsuzaki · Feb 14, 2023 · 3:34 PM UTC

1,314

138,025

Aran Komatsuzaki · Feb 10, 2023 · 1:44 AM UTC

Toolformer: Language Models Can Teach Themselves to Use Tools
Presents Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction.
abs: arxiv.org/abs/2302.04761

230

1,331

206,938
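Toolformer keeps a sampled API call only if a loss-based filter approves it. The call plus its result must lower the LM loss on the following tokens by at least a threshold τ. The baseline is the better of two alternatives: no call at all, or the call inserted without its result. The sketch below shows that criterion and assumes the three losses are already computed. The function name and the numbers are illustrative.

```python
def keep_api_call(loss_no_call: float,
                  loss_call_no_result: float,
                  loss_call_with_result: float,
                  tau: float) -> bool:
    """Toolformer-style filtering criterion (sketch).

    Each loss is a weighted LM loss over the tokens after the call position.
    l_minus: best loss achievable without the API's result.
    l_plus:  loss with the call and its result inserted as text.
    """
    l_minus = min(loss_no_call, loss_call_no_result)
    l_plus = loss_call_with_result
    return l_minus - l_plus >= tau


if __name__ == "__main__":
    # A result that makes the next tokens much easier to predict is kept.
    print(keep_api_call(2.3, 2.1, 0.9, tau=1.0))  # True
    # A call whose result barely helps is filtered out.
    print(keep_api_call(2.3, 2.2, 1.9, tau=1.0))  # False
```

Taking the minimum over the two no-result variants stops a call from passing just because the API syntax itself happens to help prediction.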

Aran Komatsuzaki · Feb 13, 2025 · 3:06 AM UTC

Apple presents: Distillation Scaling Laws
Presents a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher

199

1,327

134,781

Aran Komatsuzaki · Jul 6, 2023 · 1:28 AM UTC

LongNet: Scaling Transformers to 1,000,000,000 Tokens
Presents LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences
abs: arxiv.org/abs/2307.02486
repo: github.com/microsoft/torchsc…

262

1,268

752,797

Aran Komatsuzaki · Jun 19, 2024 · 1:41 PM UTC

On behalf of the arXiv CV dataset and evaluation committee, I'd like to announce that we will ask authors to discontinue the use of the Lena Forsén image. Instead, we encourage the use of the image of Frieren eating a gigantic hamburger. Thank you for your understanding.

242

1,267

138,215

Aran Komatsuzaki · Feb 3, 2025 · 3:07 AM UTC

Stanford presents: s1: Simple test-time scaling
- Seeks the simplest approach to achieve test-time scaling and strong reasoning performance
- Exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24)
- Model, data, and code are open-source

157

1,280

135,633

Aran Komatsuzaki · May 5, 2023 · 2:27 PM UTC

Millenials vs. Gen Z

143

1,219

162,775

Aran Komatsuzaki · Jun 29, 2022 · 6:37 PM UTC

Google just released AGI github.com/google/agi

GitHub - google/agi: Android GPU Inspector

Android GPU Inspector. Contribute to google/agi development by creating an account on GitHub.

github.com

117

1,215

Aran Komatsuzaki · May 26, 2023 · 1:33 AM UTC

The False Promise of Imitating Proprietary LLMs
Open-sourced LLMs are adept at mimicking ChatGPT’s style but not its factuality. There exists a substantial capabilities gap, and closing it requires better base LMs.
arxiv.org/abs/2305.15717

245

1,241

728,110

Aran Komatsuzaki · Apr 23, 2025 · 5:58 AM UTC

Nvidia just open-sourced Describe Anything! It can generate detailed descriptions for user-specified regions in images and videos, marked by points, boxes, scribbles, or masks

145

1,211

76,564

Aran Komatsuzaki · Jun 24, 2024 · 2:08 PM UTC

GPT series still hasn't shown any signs of saturation 😲

1,174

119,590

Aran Komatsuzaki · Feb 3, 2023 · 1:36 AM UTC

Dreamix: Video Diffusion Models are General Video Editors proj: dreamix-video-editing.github… abs: arxiv.org/abs/2302.01329

226

1,143

207,610

Aran Komatsuzaki · Apr 11, 2024 · 1:15 AM UTC

Google presents Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
A 1B model fine-tuned on passkey instances of up to 5K sequence length solves the 1M-length problem
arxiv.org/abs/2404.07143

239

1,131

205,677

Aran Komatsuzaki · Jun 11, 2020 · 7:28 AM UTC

GPT-3: Money is All You Need

112

1,088

Aran Komatsuzaki · Jun 9, 2021 · 2:06 AM UTC

Ben and I have released GPT-J, a 6B JAX-based Transformer LM 🥳
- Performs on par with 6.7B GPT-3
- Performs better and decodes faster than GPT-Neo
- repo + colab + free web demo
article: bit.ly/2TH8yl0
repo: bit.ly/3eszQ6C

232

1,043

Aran Komatsuzaki · Apr 3, 2023 · 12:31 AM UTC

A Survey of Large Language Models arxiv.org/abs/2303.18223

207

993

194,387

Aran Komatsuzaki · Jan 27, 2023 · 1:29 AM UTC

MusicLM: Generating Music From Text
Presents MusicLM, a model for generating high-fidelity music from text. MusicLM generates music at 24 kHz that remains consistent over several minutes.
proj: google-research.github.io/se…
abs: arxiv.org/abs/2301.11325
data: kaggle.com/datasets/googleai…

214

963

163,084

Aran Komatsuzaki · Jan 9, 2025 · 3:28 AM UTC

Agent Laboratory: Using LLM Agents as Research Assistants
Enables you to focus on ideation and critical thinking while automating repetitive and time-intensive tasks like coding and documentation

197

985

88,981

Aran Komatsuzaki · Oct 18, 2022 · 1:00 AM UTC

Imagic: Text-Based Real Image Editing with Diffusion Models
Demonstrates, for the very first time, the ability to apply complex (e.g., non-rigid) text-guided semantic edits to a single real image using Imagen.
arxiv.org/abs/2210.09276

179

969

Aran Komatsuzaki · Jul 18, 2023 · 1:29 AM UTC

Retentive Network: A Successor to Transformer for Large Language Models
Proposes RetNet as a foundation architecture for LLMs, simultaneously achieving training parallelism, low-cost inference, and good performance.
arxiv.org/abs/2307.08621

208

950

199,868

Aran Komatsuzaki · May 1, 2023 · 12:36 AM UTC

Are Emergent Abilities of Large Language Models a Mirage?
Presents an alternative explanation for emergent abilities: one can choose a metric which leads to the inference of an emergent ability or another metric which does not.
arxiv.org/abs/2304.15004

176

915

626,767
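Here is one way the choice of metric can manufacture an apparent jump. Suppose each token of an L-token answer is independently correct with probability p. Exact match then scores about p^L, so a smooth rise in per-token accuracy can look like a sudden capability. The toy illustration below relies on that independence assumption, and the accuracies are made-up values rather than numbers from the paper.

```python
def exact_match_rate(per_token_acc: float, answer_len: int) -> float:
    """Probability that an L-token answer is entirely correct, assuming independent tokens."""
    return per_token_acc ** answer_len


if __name__ == "__main__":
    # Hypothetical per-token accuracies for a family of increasingly large models.
    per_token = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
    for p in per_token:
        em = exact_match_rate(p, answer_len=10)
        print(f"per-token acc {p:.2f} -> exact match {em:.4f}")
```

The per-token column rises steadily. The exact-match column sits near zero and then climbs steeply, even though the models and their underlying behavior are identical.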

Aran Komatsuzaki · Nov 7, 2023 · 2:40 AM UTC

CogVLM: Visual Expert for Pretrained Language Models
Presents CogVLM, a powerful open-source visual language foundation model that achieves SotA perf on 10 classic cross-modal benchmarks
repo: github.com/THUDM/CogVLM
abs: arxiv.org/abs/2311.03079

163

909

263,246

Aran Komatsuzaki · Mar 12, 2024 · 2:23 AM UTC

Google presents: Stealing Part of a Production Language Model
- Extracts the projection matrix of OpenAI’s ada and babbage LMs for <$20
- Confirms that their hidden dim is 1024 and 2048, respectively
- Also recovers the exact hidden dim size of gpt-3.5-turbo
arxiv.org/abs/2403.06634

140

922

244,153
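The hidden-dimension result rests on a linear-algebra fact. Final logits are W·h for a vocab-by-hidden projection W, so any stack of logit vectors has rank at most the hidden size. The sketch below recovers that rank from a simulated model. It assumes exact integer logits and uses exact Gaussian elimination. The attack described in the paper instead works with noisy real-valued logits from an API and inspects singular values. All sizes here are hypothetical.

```python
import random
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank via Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    if not m:
        return 0
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def simulated_logits(w, hidden):
    """Logits = W @ h for one query's final hidden state h."""
    return [sum(wi * hi for wi, hi in zip(row, hidden)) for row in w]


if __name__ == "__main__":
    rng = random.Random(0)
    vocab, hidden_dim, n_queries = 40, 6, 15  # hypothetical sizes, n_queries > hidden_dim
    w = [[rng.randint(-5, 5) for _ in range(hidden_dim)] for _ in range(vocab)]
    stacked = [
        simulated_logits(w, [rng.randint(-5, 5) for _ in range(hidden_dim)])
        for _ in range(n_queries)
    ]
    print(matrix_rank(stacked))  # equals hidden_dim for generic W and h
```

Each query adds one logit vector. Once the number of queries exceeds the hidden size, the rank stops growing, and that plateau reveals the dimension.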

Aran Komatsuzaki · Apr 30, 2025 · 12:35 AM UTC

Why are some people still using 4o when we have o3?

403

911

252,742

Aran Komatsuzaki · Mar 15, 2024 · 1:18 AM UTC

Apple presents MM1, a family of multimodal LLMs up to 30B parameters, that are SoTA in pre-training metrics and perform competitively after fine-tuning arxiv.org/abs/2403.09611

176

910

222,109

Aran Komatsuzaki · Jan 29, 2025 · 3:42 AM UTC

Microsoft presents: Optimizing Large Language Model Training Using FP4 Quantization
- Presents the first FP4 training framework for LLMs
- Achieves accuracy comparable to BF16 with minimal degradation
- Scales effectively to 13B LLMs trained on 100B tokens

121

920

220,850

Aran Komatsuzaki · Oct 3, 2020 · 2:35 AM UTC

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
When pre-trained and transferred to CV tasks, Vision Transformer attains excellent results compared to SOTA CNNs while requiring substantially fewer computational resources to train.
openreview.net/forum?id=Yicb…

197

916

Aran Komatsuzaki · Jan 29, 2025 · 3:35 AM UTC

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Shows that:
- RL generalizes in rule-based envs, esp. when trained with an outcome-based reward
- SFT tends to memorize the training data and struggles to generalize OOD

143

920

76,349

Aran Komatsuzaki · Oct 4, 2023 · 12:57 AM UTC

Think before you speak: Training Language Models With Pause Tokens
- Performs training and inference on LMs with a learnable pause token appended to the input prefix
- Gains on 8 tasks, e.g., +18% on SQuAD
arxiv.org/abs/2310.02226

165

890

369,951

Aran Komatsuzaki · Apr 15, 2023 · 1:57 PM UTC

Yann LeCun (@ylecun) · 14 Apr 2023

Before we can get to "God-like AI" we'll need to get through "Dog-like AI".

855

123,363

Aran Komatsuzaki · Apr 25, 2023 · 1:08 AM UTC

Track Anything: Segment Anything Meets Videos repo: github.com/gaomingqi/Track-A… abs: arxiv.org/abs/2304.11968

160

862

125,698

Aran Komatsuzaki · May 1, 2024 · 1:49 AM UTC

Meta presents Better & Faster Large Language Models via Multi-token Prediction
- Training language models to predict multiple future tokens at once results in higher sample efficiency
- Up to 3x faster at inference
arxiv.org/abs/2404.19737

125

866

182,602

Aran Komatsuzaki · Aug 19, 2020 · 1:00 AM UTC

Life of a paper:
1. Appears on arXiv
1.001. @ak92501 and I tweet
1.002. @lucidrains makes a repo
2. The author tweets
3. Appears on ML subreddit
4. @hardmaru tweets
5. @ykilcher makes a video
Aleph-0. Rejected by reviewers for "lack of novelty"
0. Conceived by Jurgen in 90s

862

Aran Komatsuzaki · Feb 4, 2025 · 5:48 AM UTC

NVIDIA and CMU present ASAP, which enables highly agile motions that were previously difficult to achieve! @Cristiano Siuuuuuuu!

116

842

104,040

Aran Komatsuzaki · Feb 6, 2023 · 6:39 PM UTC

Actually, gradient descent can be seen as attention that applies beyond the model's context length! Let me explain why 🧵 👇 (1/N) Ref: arxiv.org/abs/2202.05798 arxiv.org/abs/2212.10559

131

831

171,113
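The thread builds on a dual-form argument. Consider a linear layer trained by SGD on squared error. Each update adds an outer product e_i x_iᵀ, where e_i = −η(W x_i − y_i) is the scaled error at that step. The trained layer applied to a new input x therefore equals W₀x + Σ_i e_i (x_iᵀ x). That is unnormalized linear attention, with past training inputs as keys and scaled errors as values. The pure-Python check below confirms the identity on random data. The sizes, learning rate and function names are illustrative assumptions.

```python
import random


def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_and_record(w0, data, lr):
    """Plain SGD on 0.5 * ||W x - y||^2, recording each step's key x_i and value e_i."""
    w = [row[:] for row in w0]
    keys, values = [], []
    for x, y in data:
        err = [p - t for p, t in zip(matvec(w, x), y)]  # W x_i - y_i
        e = [-lr * d for d in err]                       # value vector e_i
        # Gradient step: W <- W + e_i x_i^T
        w = [[wij + ei * xj for wij, xj in zip(row, x)] for row, ei in zip(w, e)]
        keys.append(x)
        values.append(e)
    return w, keys, values


def attention_form(w0, keys, values, query):
    """W0 q + sum_i e_i (x_i . q): linear attention over the training history."""
    out = matvec(w0, query)
    for k, v in zip(keys, values):
        score = dot(k, query)
        out = [o + vi * score for o, vi in zip(out, v)]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    d_in, d_out = 3, 2
    w0 = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
    data = [([rng.uniform(-1, 1) for _ in range(d_in)],
             [rng.uniform(-1, 1) for _ in range(d_out)]) for _ in range(5)]
    w, keys, values = train_and_record(w0, data, lr=0.1)
    q = [0.2, -0.4, 0.7]
    print(matvec(w, q))
    print(attention_form(w0, keys, values, q))  # same numbers up to rounding
```

The attention view never needs the full training set in the context window. Every past example is folded into the weights, which is the sense in which this "attention" reaches beyond the context length.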

Aran Komatsuzaki · Oct 20, 2020 · 12:00 AM UTC

My first blog post was released 🥳 I have aggregated ~20 notable recent ML papers, esp. from ICLR 2021, with summaries, visualizations and my comments! The development in each field is summarized, and the future trends are speculated. arankomatsuzaki.wordpress.co…

Some Notable Recent ML Papers and Future Trends

I have aggregated some of the notable papers released recently, esp. ICLR 2021 submissions, with concise summaries, visualizations and my comments. The development in each field is summarized, and …

arankomatsuzaki.wordpress.com

161

832

Aran Komatsuzaki · Dec 23, 2024 · 5:02 PM UTC

Damn, the AI is hitting a wall again.

817

39,679

Aran Komatsuzaki · May 31, 2025 · 2:39 AM UTC

Anyone interested in working on an open-source project for AlphaEvolve with us?

205

822

181,336

Aran Komatsuzaki · Feb 12, 2025 · 3:52 AM UTC

o3 achieved 99.8th percentile on Codeforces

773

454,474

Aran Komatsuzaki · Apr 24, 2023 · 12:43 AM UTC

Scaling Transformer to 1M tokens and beyond with RMT
By leveraging the Recurrent Memory Transformer architecture, they have successfully increased the model’s effective context length to an unprecedented two million tokens.
arxiv.org/abs/2304.11062

162

780

209,642

Aran Komatsuzaki · Apr 23, 2024 · 2:17 AM UTC

Microsoft just released Phi-3
- phi-3-mini: 3.8B model trained on 3.3T tokens rivals Mixtral 8x7B and GPT-3.5
- phi-3-medium: 14B model trained on 4.8T tokens w/ 78% on MMLU and 8.9 on MT-bench
arxiv.org/abs/2404.14219

135

761

339,321

Aran Komatsuzaki · Sep 4, 2023 · 12:57 AM UTC

RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Shows that RLAIF can produce comparable improvements to RLHF without depending on human annotators
arxiv.org/abs/2309.00267

164

761

235,314

Aran Komatsuzaki · Jan 10, 2024 · 3:06 AM UTC

ByteDance presents MagicVideo-V2
Outperforms SotA video models such as Pika 1.0, SVD-XT according to human evaluation
abs: arxiv.org/abs/2401.04468
proj: magicvideov2.github.io/

155

746

228,118

Aran Komatsuzaki · May 30, 2023 · 1:42 AM UTC

Fine-Tuning Language Models with Just Forward Passes
Proposes a memory-efficient zeroth-order optimizer, MeZO, adapting the classical ZO-SGD to operate in place, thereby fine-tuning LMs with the same memory footprint as inference.
- With a single A100 80GB GPU, MeZO can train a 30-billion parameter model
- MeZO significantly outperforms in-context learning and linear probing
- MeZO achieves comparable performance to fine-tuning with backprop across multiple tasks, with up to 12x memory reduction
- MeZO can effectively optimize non-differentiable objectives (e.g., maximizing accuracy or F1)
repo: github.com/princeton-nlp/MeZ…
abs: arxiv.org/abs/2305.17333

163

735

139,339
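MeZO saves memory by never storing the random direction z. It keeps only a seed and regenerates z whenever it is needed. The parameters are perturbed in place to evaluate L(θ+εz) and L(θ−εz). Then comes the ZO-SGD update θ ← θ − η·g·z, with g = (L(θ+εz) − L(θ−εz)) / (2ε). The minimal pure-Python sketch below runs this on a toy quadratic. The loss, hyperparameters and helper names are assumptions, and a real implementation would operate on model tensors.

```python
import random


def perturb_(theta, scale, seed):
    """In-place theta += scale * z, where z ~ N(0, I) is regenerated from `seed`."""
    rng = random.Random(seed)
    for i in range(len(theta)):
        theta[i] += scale * rng.gauss(0.0, 1.0)


def mezo_step(theta, loss_fn, lr, eps, seed):
    """One ZO-SGD step that never stores z as a separate vector."""
    perturb_(theta, +eps, seed)               # theta + eps z
    loss_plus = loss_fn(theta)
    perturb_(theta, -2 * eps, seed)           # theta - eps z
    loss_minus = loss_fn(theta)
    perturb_(theta, +eps, seed)               # back to theta (up to rounding)
    projected_grad = (loss_plus - loss_minus) / (2 * eps)
    perturb_(theta, -lr * projected_grad, seed)  # theta -= lr * g * z
    return loss_plus, loss_minus


def quadratic(theta):
    return sum(t * t for t in theta)


if __name__ == "__main__":
    theta = [1.0, -2.0, 0.5, 3.0]
    print("initial loss", quadratic(theta))
    for step in range(500):
        mezo_step(theta, quadratic, lr=0.01, eps=1e-3, seed=step)
    print("final loss", quadratic(theta))
```

Every step costs two forward evaluations and a few passes that regenerate the noise. Peak memory stays at one copy of the parameters, which is what lets MeZO match inference memory.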

Aran Komatsuzaki · Oct 26, 2023 · 1:12 AM UTC

Detecting Pretraining Data from Large Language Models
We propose Min-K% Prob, a simple and effective method that can detect whether an LLM was pretrained on the provided text without knowing the pretraining data.
proj: swj0419.github.io/detect-pre…
abs: arxiv.org/abs/2310.16789

156

733

77,982
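Min-K% Prob scores a text by averaging the log-probabilities of its k% least likely tokens. Unseen text tends to contain a few very surprising tokens, while text seen in pretraining tends not to. The minimal sketch below assumes the per-token log-probabilities have already been obtained from the model. The default k and the example probabilities are illustrative, and the paper thresholds the score to make a membership decision.

```python
import math


def min_k_percent_prob(token_logprobs, k=0.2):
    """Mean log-prob of the k fraction of least likely tokens (higher = more likely seen)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n


if __name__ == "__main__":
    # Hypothetical per-token probabilities under the target LLM.
    seen = [math.log(p) for p in [0.9, 0.8, 0.7, 0.95, 0.6]]
    unseen = [math.log(p) for p in [0.9, 0.05, 0.7, 0.02, 0.6]]
    print(min_k_percent_prob(seen, k=0.4), min_k_percent_prob(unseen, k=0.4))
```

Averaging only the tail, rather than the whole sequence, keeps a handful of easy tokens from masking the outliers that signal unseen text.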

Aran Komatsuzaki · Jan 3, 2024 · 2:11 AM UTC

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
With only four lines of code modification, the proposed method can effortlessly extend existing LLMs’ context window without any fine-tuning.
arxiv.org/abs/2401.01325

141

737

78,990
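Self-Extend's code change amounts to remapping relative positions at inference time. Within a neighbor window of w tokens, attention uses the ordinary relative distance. Beyond it, positions are floor-divided by a group size G and shifted so that grouped distances continue where the window ends. That keeps every distance within a range the model saw in pretraining. The sketch below is that mapping as we understand it from the paper. The function name and the window and group values are hypothetical, and a real implementation applies the mapping inside RoPE attention rather than to standalone integers.

```python
def self_extend_rel_pos(q_pos: int, k_pos: int, window: int, group: int) -> int:
    """Relative position used for a (query, key) pair under Self-Extend (sketch).

    Assumes causal attention, i.e. k_pos <= q_pos.
    """
    dist = q_pos - k_pos
    if dist < window:
        return dist  # neighbor attention: exact relative position
    # Grouped attention: coarse positions, shifted to continue after the window.
    return (q_pos // group - k_pos // group) + (window - window // group)


if __name__ == "__main__":
    window, group = 4, 2
    print([self_extend_rel_pos(12, k, window, group) for k in range(13)])
```

With w=4 and G=2, a query at position 12 sees relative positions of at most 8 instead of 12. The mapping stays monotone and continuous at the window boundary.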

Aran Komatsuzaki · May 25, 2022 · 2:19 AM UTC

Here's the leaderboard of prompts to add to GPT-3. Can you guys come up with anything better?

Aran Komatsuzaki (@arankomatsuzaki) · 25 May 2022

122

733

Aran Komatsuzaki · May 25, 2023 · 1:13 AM UTC

Gorilla: Large Language Model Connected with Massive APIs
Releases Gorilla, a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls.
proj: gorilla.cs.berkeley.edu/
abs: arxiv.org/abs/2305.15334

179

714

255,655

Aran Komatsuzaki · Jan 30, 2023 · 11:07 PM UTC

I spend roughly 30 minutes every weekday skimming arXiv papers and tweeting them as bookmarks for myself. Now I've got 30k brilliant minds following me. I may not be the most popular ML account, but this seems like a great return on investment.

716

140,728

Aran Komatsuzaki · Mar 28, 2023 · 1:11 AM UTC

ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks
ChatGPT outperforms crowd-workers for several annotation tasks, including relevance, stance, topics, and frames detection, w/ 20x less cost.
arxiv.org/abs/2303.15056

141

708

254,055

Aran Komatsuzaki · Dec 25, 2024 · 4:04 PM UTC

Deepseek-V3-Base was just open-sourced!
- 685B MoE w/ 256 experts, topk=8, with sigmoid routing
- Outperforms Sonnet 3.5 on Aider benchmark
huggingface.co/deepseek-ai/D…

123

698

94,036
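Sigmoid routing with top-k selection works in three steps. Each expert's affinity is a sigmoid of its token-expert score. The k highest affinities are selected. Only those selected values are renormalized to sum to one, giving the gating weights. The toy sketch below uses 8 experts and k=2 instead of 256 and 8, with placeholder scores. It omits the load-balancing bias term used in DeepSeek-V3.

```python
import math
import random


def sigmoid_topk_gates(scores, k):
    """Return {expert_index: gate_weight} for one token (sketch)."""
    affinities = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    top = sorted(range(len(scores)), key=lambda i: affinities[i], reverse=True)[:k]
    total = sum(affinities[i] for i in top)
    return {i: affinities[i] / total for i in top}


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [rng.gauss(0.0, 1.0) for _ in range(8)]  # hypothetical token-expert scores
    gates = sigmoid_topk_gates(scores, k=2)
    print(gates, sum(gates.values()))
```

Softmax gating couples every expert's score through one normalization. Here each affinity is computed independently, and normalization happens only over the k chosen experts.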

Aran Komatsuzaki · May 1, 2024 · 2:32 AM UTC

KAN: Kolmogorov–Arnold Networks
Proposes an alternative to MLP that outperforms in terms of accuracy and interpretability
arxiv.org/abs/2404.19756

134

686

99,050

Aran Komatsuzaki · Apr 1, 2024 · 1:48 AM UTC

The Jamba paper was just dropped arxiv.org/abs/2403.19887

126

691

108,375

Aran Komatsuzaki · Nov 18, 2022 · 1:34 AM UTC

InstructPix2Pix: Learning to Follow Image Editing Instructions
Proposes a method for editing images from human instructions, in the forward pass w/o requiring per-example fine-tuning or inversion, in a matter of seconds.
timothybrooks.com/instruct-p…
arxiv.org/abs/2211.09800

124

673

Aran Komatsuzaki · Dec 25, 2020 · 2:17 AM UTC

Solving Mixed Integer Programs Using Neural Networks
The first learning-based method to substantially outperform SCIP (a mixed integer program solver) on various large-scale real-world application datasets.
arxiv.org/pdf/2012.13349

144

658

Aran Komatsuzaki · Apr 12, 2024 · 2:21 AM UTC

Google presents Best Practices and Lessons Learned on Synthetic Data for Language Models
Provides an overview of synthetic data research, discussing its applications, challenges, and future directions
arxiv.org/abs/2404.07503

132

683

158,129

Aran Komatsuzaki · Jul 1, 2024 · 1:53 AM UTC

Scaling Synthetic Data Creation with 1,000,000,000 Personas
- Presents a collection of 1B diverse personas automatically curated from web data
- Massive gains on MATH: 49.6 -> 64.9
repo: github.com/tencent-ailab/per…
abs: arxiv.org/abs/2406.20094

112

674

142,578

Aran Komatsuzaki · Jun 1, 2021 · 6:46 PM UTC

StyleGAN + CLIP "Satoshi Nakamoto"

669

Aran Komatsuzaki · Mar 23, 2023 · 12:45 AM UTC

Sparks of Artificial General Intelligence: Early experiments with GPT-4
Reports on their investigation of an early version of GPT-4, when it was still in active development by OpenAI.
arxiv.org/abs/2303.12712

114

662

302,290

Aran Komatsuzaki · Jul 7, 2023 · 12:51 AM UTC

Lost in the Middle: How Language Models Use Long Contexts
Finds that performance of LMs is often highest when relevant info occurs at the beginning or end of the input context, and significantly degrades otherwise
arxiv.org/abs/2307.03172

123

669

146,422

Aran Komatsuzaki · Jan 25, 2024 · 1:40 AM UTC

MambaByte: Token-free Selective State Space Model
Outperforms SotA subword Transformers while being tokenizer agnostic and achieving fast inference thanks to linear inference cost
arxiv.org/abs/2401.13660

120

636

141,396

Aran Komatsuzaki · Mar 20, 2023 · 12:32 AM UTC

CoLT5: Faster Long-Range Transformers with Conditional Computation
Achieves:
- stronger performance than LongT5 with much faster training and inference
- SOTA on the SCROLLS benchmark
- strong gains up to 64k input length
arxiv.org/abs/2303.09752

108

636

210,199

Aran Komatsuzaki · Jun 5, 2024 · 2:17 AM UTC

Google presents To Believe or Not to Believe Your LLM arxiv.org/abs/2406.02543

129

644

76,011

Aran Komatsuzaki · Jan 31, 2023 · 2:04 AM UTC

Extracting Training Data from Diffusion Models
Extracts over a thousand training examples from SotA models (e.g. Stable Diffusion), ranging from photographs of individual people to trademarked company logos.
arxiv.org/abs/2301.13188

127

631

162,719

Aran Komatsuzaki · Feb 11, 2025 · 4:17 AM UTC

Google presents: Matryoshka Quantization
Presents a novel multi-scale quantization technique that allows training and maintaining just one model, which can then be served at different precision levels

107

637

54,877

Aran Komatsuzaki · Jan 9, 2025 · 3:11 AM UTC

SynthLabs + Stanford presents: Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Proposes Meta-CoT, which extends CoT by explicitly modeling the underlying reasoning required to arrive at a particular CoT

134

641

82,207

Aran Komatsuzaki · May 29, 2023 · 12:57 AM UTC

Large Language Models as Tool Makers
Attempts to remove the dependency on external tools by proposing a closed-loop framework, where LLMs create their own reusable tools for problem-solving.
arxiv.org/abs/2305.17126

117

624

99,547

Aran Komatsuzaki · Sep 5, 2025 · 4:36 AM UTC

RL’s Razor: On-policy RL forgets less than SFT.
Even at matched accuracy, RL shows less catastrophic forgetting
Key factor: RL’s on-policy updates bias toward KL-minimal solutions
Theory + LLM & toy experiments confirm RL stays closer to base model

628

111,919

Aran Komatsuzaki · May 9, 2024 · 5:08 AM UTC

Microsoft presents You Only Cache Once: Decoder-Decoder Architectures for Language Models
Substantially reduces GPU memory demands, yet retains global attention capability
repo: github.com/microsoft/unilm/t…
abs: arxiv.org/abs/2405.05254

131

622

68,487

Aran Komatsuzaki · Aug 1, 2023 · 1:30 AM UTC

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
ToolLLaMA exhibits comparable performance to ChatGPT
repo: github.com/OpenBMB/ToolBench
abs: arxiv.org/abs/2307.16789

149

607

88,949

Aran Komatsuzaki · Sep 17, 2025 · 3:55 AM UTC

Big day for AI agents! Tongyi Lab (@Ali_TongyiLab) just dropped half a dozen new papers, most focused on Deep Research agents. I’ll walk you through the highlights in this thread. (1/N)

613

68,626

Aran Komatsuzaki · May 10, 2024 · 9:12 AM UTC

Google presents Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Highlights the risk in introducing new factual knowledge through fine-tuning, which leads to hallucinations
arxiv.org/abs/2405.05904

118

610

92,781

Aran Komatsuzaki · Feb 6, 2025 · 4:12 AM UTC

LIMO: Less is More for Reasoning
Achieves 57.1% on AIME and 94.8% on MATH w/ only 817 training samples, i.e., only 1% of the training data required by previous approaches

601

118,718

Aran Komatsuzaki · Oct 30, 2023 · 2:13 AM UTC

FP8-LM: Training FP8 Large Language Models
Trains GPT-175B with H100s 64% faster than BF16 without any performance degradation
repo: github.com/Azure/MS-AMP
abs: arxiv.org/abs/2310.18313

129

591

93,363

Aran Komatsuzaki · Jun 16, 2023 · 2:14 AM UTC

Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models
Presents a comprehensive dataset of 4,550 questions and solutions from all MIT EECS courses required for obtaining a degree
arxiv.org/abs/2306.08997

113

564

1,872,891

Aran Komatsuzaki · Jul 31, 2023 · 12:44 AM UTC

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Overviews techniques to understand, improve, and complement RLHF in practice
arxiv.org/abs/2307.15217

140

591

80,198

Aran Komatsuzaki · May 18, 2021 · 4:01 PM UTC

More GPUs are All I Need

593

Aran Komatsuzaki · Apr 4, 2024 · 4:34 AM UTC

Google presents Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Same performance w/ a fraction of the FLOPs per forward pass
arxiv.org/abs/2404.02258

587

216,609
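Mixture-of-Depths routes tokens block by block. A router scores every token, and only the top-k tokens (a fixed capacity) pass through the block's computation. The rest skip the block on the residual path, which is where the FLOP savings come from. The selected tokens' block output is scaled by their router weight, which puts the router on the gradient path. In the toy sketch below, a "block" is any function of a token's scalar hidden value. The scores, capacity and block are hypothetical.

```python
def mod_block(hidden, router_scores, capacity, block_fn):
    """Apply block_fn only to the `capacity` highest-scoring tokens (sketch).

    hidden: per-token values (scalars here for simplicity).
    Unselected tokens pass through unchanged via the residual connection.
    """
    chosen = sorted(range(len(hidden)), key=lambda i: router_scores[i], reverse=True)[:capacity]
    out = list(hidden)
    for i in chosen:
        # residual + router-weighted block output
        out[i] = hidden[i] + router_scores[i] * block_fn(hidden[i])
    return out, sorted(chosen)


if __name__ == "__main__":
    hidden = [1.0, 2.0, 3.0, 4.0]
    scores = [0.1, 0.9, 0.3, 0.7]  # hypothetical router outputs
    out, chosen = mod_block(hidden, scores, capacity=2, block_fn=lambda h: 10 * h)
    print(chosen, out)  # tokens 1 and 3 are processed; 0 and 2 skip the block
```

Because the capacity is fixed ahead of time, the compute graph has a static shape even though different tokens take different paths.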

Aran Komatsuzaki · Nov 30, 2022 · 4:37 AM UTC

RGB no more: Minimally-decoded JPEG Vision Transformers
Achieves up to 39.2% faster training and 17.9% faster inference with no accuracy loss compared to the RGB counterpart.
arxiv.org/abs/2211.16421

587

Aran Komatsuzaki · May 12, 2021 · 12:48 AM UTC

Diffusion Models Beat GANs on Image Synthesis
Achieves 3.85 FID on ImageNet 512×512 and matches BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution.
arxiv.org/abs/2105.05233

106

581

Aran Komatsuzaki · Aug 8, 2023 · 1:12 AM UTC

AgentBench: Evaluating LLMs as Agents
Presents a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess the reasoning and decision-making abilities of LLMs as agents in a multi-turn open-ended generation setting.
repo: github.com/THUDM/AgentBench
abs: arxiv.org/abs/2308.03688

138

591

115,018

Aran Komatsuzaki · Sep 19, 2025 · 3:00 AM UTC

Apple presents AToken: A unified visual tokenizer
• First tokenizer unifying images, videos & 3D
• Shared 4D latent space (preserves both reconstruction & semantics)
• Strong across gen & understanding tasks (ImageNet 82.2%, MSRVTT 32.6%, 3D acc 90.9%)

589

105,284

Aran Komatsuzaki · Jan 3, 2023 · 1:38 AM UTC

Muse: Text-To-Image Generation via Masked Generative Transformers
Presents Muse, a text-to-image Transformer model that achieves SotA image generation perf while being far more efficient than diffusion or AR models.
proj: muse-model.github.io/
abs: arxiv.org/abs/2301.00704

130

567

117,214

Aran Komatsuzaki · May 25, 2022 · 1:50 AM UTC

“Let’s think step by step” is all you need

568