Jacob Austin · Feb 4, 2025 · 6:30 PM UTC

Jacob Austin

4 Feb 2025

Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n

392

1,917

465,856

Jacob Austin · Aug 18, 2025 · 2:19 PM UTC

Jacob Austin @jacobaustin132

18 Aug 2025

Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n

518

3,435

404,555

Jacob Austin · May 31, 2023 · 6:14 PM UTC

Jacob Austin @jacobaustin132

31 May 2023

Super super happy to be able to talk about DIDACT, the first code LLM trained to model real software developers editing code, fixing builds, and doing code review end-to-end. Developers don't write code in one go and neither should our models! 1/n

200

1,102

305,537

Jacob Austin · Feb 16, 2025 · 4:35 PM UTC

Jacob Austin @jacobaustin132

16 Feb 2025

Glad to see my ex-colleagues at x.ai have been hard at work making "TruthGPT" unbiased

Alexander Doria

@Dorialexander

16 Feb 2025

Painful to see: the kind of brute alignment that can fry latent space. Even DeepSeek's CCP-friendly approach is relatively mild by comparison, mostly deflating sensitive questions.

946

97,010

Jacob Austin · Jan 21, 2024 · 12:23 AM UTC

Jacob Austin @jacobaustin132

21 Jan 2024

We've finally put out a detailed IEEE/ACM paper on @Google's multi-year effort to ease the burden of code review with ML. Google engineers now resolve 7.5% of all code review comments with an ML-suggested edit. But the path to that number has been a fun ML and UX journey!

134

740

135,667

Jacob Austin · May 23, 2023 · 7:44 PM UTC

Jacob Austin @jacobaustin132

23 May 2023

Excited to see a blog post on one of the coolest projects I've worked on at Google: using LLMs to automatically resolve code-review comments for Google engineers! 1/n

508

108,942

Jacob Austin · May 14, 2024 · 9:52 PM UTC

Jacob Austin @jacobaustin132

14 May 2024

This is something I've worked on for a while! You can save the activations of one LLM call and reuse them for a follow-up that overlaps with the first. This means asking a question about a big codebase can take 30 seconds the first time and 1s after that!

Jaana Dogan ヤナドガン

@rakyll

14 May 2024

Gemini’s context caching is one of the most exciting releases that came out of Google I/O. ai.google.dev/gemini-api/doc…

435

104,434

Jacob Austin · Aug 4, 2023 · 2:42 AM UTC

Jacob Austin @jacobaustin132

4 Aug 2023

the hardest thing about being an AI researcher is having to smell homeless people every morning while munching a tartine croissant outside your $4k house on the way to work

235

108,892

Jacob Austin · Aug 18, 2021 · 1:25 AM UTC

Jacob Austin @jacobaustin132

18 Aug 2021

Our new paper! We study how well large language models (244M-137B parameters) can write code, collaborate with humans via dialog (exciting!) and understand/execute the code they write (they don't/can't). TLDR: exciting tech with lots of limitations and room for future work.

194

Jacob Austin · Apr 15, 2022 · 8:24 PM UTC

Jacob Austin @jacobaustin132

15 Apr 2022

Replying to @jacobandreas @_jasonwei

We found that code models get better when you prompt them with "I'm an expert Python programmer". The new Anthropic paper did something similar, prefixing the model's response with "I’ve tested this function myself so I know that it’s correct:"

206

Jacob Austin · Feb 22, 2024 · 2:25 AM UTC

Jacob Austin @jacobaustin132

22 Feb 2024

Replying to @jxmnop

Every Google model in recent memory has had a 256k vocab size

175

32,597

Jacob Austin · Dec 6, 2021 · 5:23 PM UTC

Jacob Austin @jacobaustin132

6 Dec 2021

Happy to share our work on discrete denoising diffusion models (D3PMs) @NeurIPSConf 2021: arxiv.org/pdf/2107.03006.pdf. D3PMs are diffusion models for discrete data like text or (quantized) images, and they’re flexible! A thread (with code!) 1/n

169

Jacob Austin · Jun 12, 2024 · 9:11 PM UTC

Jacob Austin @jacobaustin132

12 Jun 2024

This may be the most magical new developer tool we've made at Google. Nothing since code completion has felt so seamless to use: devs paste code constantly, and Smart Paste instantly fixes all the little issues: syntax errors, misnamed variables, indentation, and more 1/2

Google AI

@GoogleAI

12 Jun 2024

Code development often involves frequent copy & pasting of code that must be adjusted for the surrounding context. Here we describe Smart Paste, an internal tool that streamlines the code authoring workflow by automating adjustments to pasted code. More at goo.gle/4elzb3S

145

20,203

Jacob Austin · Jul 27, 2022 · 1:57 AM UTC

Jacob Austin @jacobaustin132

27 Jul 2022

Read about our recent work on ML-powered code completion models trained on the @Google codebase. A small but specialized LM trained on extremely high-quality data and backed by static analysis beats much larger models in production.

👩‍💻 Paige Bailey

@DynamicWebPaige

27 Jul 2022

Learn more about how code completion is transforming the developer experience of internal @Google engineers! 👩‍💻 We measured an acceptance rate of 25-34% on >3% of production code, while reducing the coding iteration time by 6% (equating to hundreds of years of SWE hours saved).

134

Jacob Austin · Aug 18, 2025 · 2:20 PM UTC

Jacob Austin @jacobaustin132

18 Aug 2025

Replying to @jacobaustin132 @reinerpope @apaszke

You can read the new chapter here: jax-ml.github.io/scaling-boo… n/n

133

11,138

Jacob Austin · Feb 4, 2025 · 6:30 PM UTC

Jacob Austin @jacobaustin132

4 Feb 2025

The secret is to think in terms of basic system resources — compute, memory, and bandwidth — and calculate which one limits our performance. From this we can estimate the cost, runtime, and optimal parallelism strategy for any given LLM: jax-ml.github.io/scaling-boo… 2/n

119

11,939
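A minimal sketch of the "which resource limits us" calculation from this tweet, applied to a single matrix multiply. The hardware numbers (`PEAK_FLOPS`, `MEM_BW`) are round, made-up values rather than the specs of any real chip, and the function name is hypothetical.

```python
# Illustrative hardware numbers (assumptions, not a real accelerator).
PEAK_FLOPS = 1e15   # floating-point operations per second
MEM_BW = 1e12       # bytes per second between memory and the compute units


def matmul_bottleneck(m, k, n, bytes_per_elem=2):
    """Estimate whether an (m x k) @ (k x n) matmul is compute- or memory-bound."""
    flops = 2 * m * k * n                                  # one multiply + one add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read both inputs, write output
    t_compute = flops / PEAK_FLOPS
    t_memory = bytes_moved / MEM_BW
    bound = "compute" if t_compute >= t_memory else "memory"
    # Assuming compute and memory traffic overlap perfectly, the slower one sets the runtime.
    return bound, max(t_compute, t_memory)


# A batch of 1 barely reuses the weights, so it tends to be memory-bound;
# a large batch amortizes the weight load and becomes compute-bound.
print(matmul_bottleneck(1, 8192, 8192))
print(matmul_bottleneck(4096, 8192, 8192))
```

The same comparison extends to communication: add a third term for bytes sent over the interconnect divided by its bandwidth, and take the maximum of all three.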

Jacob Austin · Mar 15, 2023 · 4:16 PM UTC

Jacob Austin @jacobaustin132

15 Mar 2023

GPT-4 makes big gains on coding (e.g. 48% -> 67% on HumanEval) but it's still a long way from 100% pass@1, not to mention writing a 1000-line program from scratch. GPT-4 shows that scale won't solve everything. Models need to write and debug code iteratively, like humans do

103

26,248

Jacob Austin · Mar 21, 2024 · 3:59 PM UTC

Jacob Austin @jacobaustin132

21 Mar 2024

Gemini 1.5 Pro is widely available now. Long context is great but it's also just a great model, better than GPT-4 on most of our metrics. And it's free!

Jeff Dean

@JeffDean

21 Mar 2024

We're starting to roll out API support for Gemini 1.5 Pro for developers. We're excited to see what you build with the 1M token context window! We'll be onboarding people to the API slowly at first, and then we'll ramp it up. In the meantime, developers can try out Gemini 1.5 Pro in the AI Studio UI right now: aistudio.google.com

102

18,634

Jacob Austin · May 31, 2023 · 6:14 PM UTC

Jacob Austin @jacobaustin132

31 May 2023

Full details are in our blog post here: ai.googleblog.com/2023/05/la…. This was the culmination of years of work from @dtarlow2, Petros Maniatis, and a bunch of colleagues across Google. Please take a look!

Large sequence models for software development activities

Posted by Petros Maniatis and Daniel Tarlow, Research Scientists, Google Software isn’t created in one dramatic step. It improves bit by bit, one l...

research.google

100

7,182

Jacob Austin · Aug 25, 2025 · 6:39 PM UTC

Jacob Austin @jacobaustin132

25 Aug 2025

100%, Rishabh has written some of my favorite papers in the RL universe, and done so in a period where publishing in industry was challenging!

rohan anil

@_arohan_

25 Aug 2025

Rishabh is an amazing researcher. His algorithms underpin post training at Gemini. I got to work together at meta for a short while and was truly impressed. Whichever group got Rishabh is so lucky to have him!

15,567

Jacob Austin · May 5, 2024 · 10:03 PM UTC

Jacob Austin @jacobaustin132

5 May 2024

I won’t be at ICLR this year, but it’s the 200th anniversary of the premiere of Beethoven’s 9th in Vienna and you should go! The Vienna Philharmonic and many other orchestras have concerts! wienerphilharmoniker.at/en/k…

71,307

Jacob Austin · Aug 18, 2025 · 2:19 PM UTC

Jacob Austin @jacobaustin132

18 Aug 2025

At the chip level, GPUs and TPUs look a lot alike; they both have a SIMD vector unit, a matmul accelerator, and a similar memory hierarchy. But where a TPU has 1-2 big cores per chip, GPUs have hundreds of little ones. This makes them more flexible but also more costly! 2/n

14,199

Jacob Austin · May 17, 2024 · 7:02 PM UTC

Jacob Austin @jacobaustin132

17 May 2024

The Blueshift team has done awesome work pushing Hendrycks' MATH above 90%. MATH isn't the hardest dataset in the world but it's surprisingly tricky: some problems take me 5-10 minutes to solve. Getting an LLM to solve more than 90% feels meaningful. Try one yourself!

Behnam Neyshabur

@bneyshabur

17 May 2024

I'm excited about this! Our team has been working really hard to improve Gemini 1.5 capabilities significantly on multiple fronts and in particular MATH/STEM! Please see the report here: goo.gle/GeminiV1-5

11,398

Jacob Austin · Apr 21, 2023 · 4:15 PM UTC

Jacob Austin @jacobaustin132

21 Apr 2023

Very proud to launch coding for Bard! The model is actually pretty good, try it out!

Thomas Kurian

@ThomasOrTK

21 Apr 2023

New capabilities in Bard will help programmers and software developers with code generation, debugging and code explanation. It’s an exciting next step in how generative AI can accelerate innovation across industries. blog.google/technology/ai/co…

18,880

Jacob Austin · Jul 17, 2022 · 5:20 AM UTC

Jacob Austin @jacobaustin132

17 Jul 2022

Replying to @RichardMCNgo

I find many of these questions exhausting. I don't want to psychoanalyze what about me surprises people to a stranger at 3AM after a few beers. Ask me 1:1 when it's appropriate.

Jacob Austin · Aug 18, 2025 · 2:19 PM UTC

Jacob Austin @jacobaustin132

18 Aug 2025

This is still a draft, so please leave comments or questions if you have any. HT to my coauthors @reinerpope, @apaszke, and Swapnil Patil and so many GPU experts who helped me understand GPUs better 6/n

10,309

Jacob Austin · Aug 18, 2025 · 2:19 PM UTC

Jacob Austin @jacobaustin132

18 Aug 2025

From the LLM standpoint, the same parallelism strategies work (FSDP, TP, PP, EP) but have different phase transitions. GPU collective cost changes dramatically beyond the node level (~8 GPUs), and pipelining becomes important much sooner 4/n

11,707

Jacob Austin · May 14, 2024 · 6:01 PM UTC

Jacob Austin @jacobaustin132

14 May 2024

One thing I'm proud of is how Google's gen media team has prioritized building tools for artists rather than text-to-X tools. GenAI can either replace or augment people, let's do the latter!

Google DeepMind

@GoogleDeepMind

14 May 2024

We put our cutting-edge video generation model Veo in the hands of filmmaker @DonaldGlover and his creative studio, Gilga. Let’s take a look. ↓ #GoogleIO

Filmmaking with Donald Glover and his creative studio, Gilga | Veo

11,892

Jacob Austin · Aug 18, 2025 · 2:19 PM UTC

Jacob Austin @jacobaustin132

18 Aug 2025

Like before, we have lots of good practice problems. How many CUDA cores does a B200 have? At what point is a matmul compute bound? How long should an AllGather take? What is the optimal sharding for LLaMA-3 or DeepSeek v3? 5/n

10,271
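One of the practice questions above, "How long should an AllGather take?", can be approximated with the textbook ring model. This is a sketch under that assumption (a simple ring, no latency term), not necessarily the formula the book uses; the function name and the example numbers are hypothetical.

```python
def ring_all_gather_seconds(total_bytes, num_devices, link_bytes_per_s, bidirectional=False):
    """Bandwidth-only estimate for a ring AllGather.

    Each device starts with total_bytes / num_devices and, over
    num_devices - 1 steps, forwards one shard to its neighbor.
    """
    if num_devices < 2:
        return 0.0
    shard_bytes = total_bytes / num_devices
    steps = num_devices - 1
    # Assumption: a bidirectional ring sends in both directions and doubles usable bandwidth.
    bandwidth = link_bytes_per_s * (2 if bidirectional else 1)
    return steps * shard_bytes / bandwidth


# Illustrative numbers only: gather 8 GB across 8 devices over 100 GB/s links.
print(ring_all_gather_seconds(8e9, 8, 1e11))
```

As the device count grows, `(N - 1) / N` approaches 1, so in this model the time is roughly the total array size divided by the link bandwidth, nearly independent of the number of devices.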

Jacob Austin · Aug 18, 2025 · 2:19 PM UTC

Jacob Austin @jacobaustin132

18 Aug 2025

The networking story is similar. GPUs aim for flexibility, using a hierarchy of switches to send data from any GPU to any other in only a few hops. This is great as a user but requires lots of expensive switches to scale up 3/n

11,383

Jacob Austin · Feb 4, 2025 · 6:30 PM UTC

Jacob Austin @jacobaustin132

4 Feb 2025

A big chunk of this book is dedicated to understanding the hardware that provides those system resources. We emphasize TPUs in this book, but the principles and math can be adapted to GPUs too. Part 2 explains the TPU in detail: jax-ml.github.io/scaling-boo… 3/n

9,165

Jacob Austin · May 14, 2024 · 9:57 PM UTC

Jacob Austin @jacobaustin132

14 May 2024

FWIW I think this is how you make long-context economical. Long queries aren't all unique, they typically share the same source documents. Low latency, low cost full repo completion can reuse the same KV caches

3,422
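A toy sketch of the reuse idea in this tweet: queries that share the same source documents pay for the shared prefix once. The `PrefixCache` class is hypothetical, and `_prefill` only counts tokens as a stand-in for a real forward pass that would produce KV tensors.

```python
class PrefixCache:
    """Caches the (stand-in) model state for a shared prompt prefix."""

    def __init__(self):
        self._store = {}           # prefix tokens -> cached state
        self.tokens_processed = 0  # how much "model work" has been done so far

    def _prefill(self, tokens, state=()):
        # Stand-in for running the model over `tokens` on top of cached `state`.
        self.tokens_processed += len(tokens)
        return state + tuple(tokens)

    def run(self, prefix, suffix):
        key = tuple(prefix)
        if key not in self._store:
            # First query over this document: pay the full prefill cost once.
            self._store[key] = self._prefill(prefix)
        # Follow-ups only process the new suffix on top of the cached state.
        return self._prefill(suffix, self._store[key])


cache = PrefixCache()
repo_tokens = list(range(1000))      # pretend this is a large codebase
cache.run(repo_tokens, [1, 2, 3])    # first question: 1000 + 3 tokens processed
cache.run(repo_tokens, [4, 5])       # follow-up: only 2 more tokens processed
print(cache.tokens_processed)
```

A real system would also need eviction and a way to match partial prefixes; this sketch keys only on an exact prefix match.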

Jacob Austin · Feb 4, 2025 · 6:30 PM UTC

Jacob Austin @jacobaustin132

4 Feb 2025

We want this to be a living book, so please ask questions and give us feedback. We'll continue adding to it as time goes on. Without further ado, here’s a link to the beginning: jax-ml.github.io/scaling-boo… 11/11

4,621

Jacob Austin · Feb 4, 2025 · 6:30 PM UTC

Jacob Austin @jacobaustin132

4 Feb 2025

5 years ago, there were many ML architectures, but today, there is (mostly) only one. _You should know the Transformer inside and out!_ How many FLOPs or params in LLaMA-3? How expensive is attention vs. a feed-forward block? You'll know after reading jax-ml.github.io/scaling-boo… 5/n

5,976
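A rough sketch of the kind of parameter counting this tweet alludes to. The configuration values below are assumed to match the published LLaMA-3 8B settings (they are not stated in the thread), norm parameters are ignored, and the `6 * params * tokens` training-FLOPs figure is the usual rule of thumb rather than an exact count.

```python
# Assumed LLaMA-3-8B-style configuration (not taken from the thread).
D_MODEL = 4096
N_LAYERS = 32
D_FF = 14336
N_HEADS = 32
N_KV_HEADS = 8          # grouped-query attention: fewer K/V heads than Q heads
HEAD_DIM = D_MODEL // N_HEADS
VOCAB = 128256


def transformer_params():
    # Attention: Q and output projections are full-size; K and V are shrunk by GQA.
    attn = 2 * D_MODEL * (N_HEADS * HEAD_DIM) + 2 * D_MODEL * (N_KV_HEADS * HEAD_DIM)
    # Gated MLP: three weight matrices (gate, up, down).
    mlp = 3 * D_MODEL * D_FF
    # Input embedding plus an untied output projection.
    embeddings = 2 * VOCAB * D_MODEL
    return N_LAYERS * (attn + mlp) + embeddings


def training_flops(params, tokens):
    # Rule of thumb: ~6 FLOPs per parameter per training token (forward + backward).
    return 6 * params * tokens


params = transformer_params()
print(f"{params / 1e9:.2f}B parameters")
print(f"{training_flops(params, 1e12):.2e} FLOPs per trillion training tokens")
```

Note that in this configuration the feed-forward weights are roughly four times the attention weights per layer, which is one way to see why the MLP dominates compute at short sequence lengths.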

Jacob Austin · Feb 4, 2025 · 6:30 PM UTC

Jacob Austin @jacobaustin132

4 Feb 2025

Now for the good stuff! You may have heard of data or tensor parallelism, FSDP or pipelining. But why choose one over the other? Short answer: each adds communication, and the one with the lowest cost depends on the model. Part 5 dives into this: jax-ml.github.io/scaling-boo… 6/n

6,693

Jacob Austin · Feb 19, 2025 · 6:17 PM UTC

Jacob Austin @jacobaustin132

19 Feb 2025

Some awesome stuff here about LLM scaling (esp. on GPUs). Their LLAMA sharding/memory diagram is great. Glad to see it becoming easier to understand scaling in the open

Leandro von Werra

@lvwerra

19 Feb 2025

The Ultra-Scale Playbook: Training LLMs on GPU Clusters Learn how to train your own DeepSeek-V3 model using 5D parallelism, ZeRO, fast kernels, compute/comm overlap and bottlenecks with theory, interactive plots and 4000+ scaling experiments and audio! huggingface.co/spaces/nanotr…

4,920

Jacob Austin · Feb 4, 2025 · 6:30 PM UTC

Jacob Austin @jacobaustin132

4 Feb 2025

Scaling an LLM involves distributing — a.k.a. "sharding" — its weights across multiple TPUs. To run it, we have to add cross-chip communication. Part 3 describes the TPU's communication primitives, and simple rules for multiplying sharded matrices: jax-ml.github.io/scaling-boo… 4/n

6,650
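A minimal, pure-Python sketch of why sharding forces cross-chip communication. Here the contracting dimension of a matmul is split across pretend devices; each computes a partial product, and the partials must be summed (the job an all-reduce would do on real hardware). All function names are hypothetical and nothing here uses a real accelerator API.

```python
def matmul(a, b):
    """Plain (m x k) @ (k x n) on nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def shard_contracting_dim(a, b, num_shards):
    """Split A by columns and B by rows so shard s holds matching slices."""
    k = len(b)
    assert k % num_shards == 0, "contracting dim must divide evenly"
    step = k // num_shards
    return [([row[s * step:(s + 1) * step] for row in a], b[s * step:(s + 1) * step])
            for s in range(num_shards)]


def sharded_matmul(a, b, num_shards):
    # Each "device" multiplies only its local slices...
    partials = [matmul(a_s, b_s) for a_s, b_s in shard_contracting_dim(a, b, num_shards)]
    # ...and the partial results are summed across devices (the all-reduce step).
    return [[sum(p[i][j] for p in partials) for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1, 2, 3, 4], [5, 6, 7, 8]]
B = [[1, 0], [0, 1], [2, 2], [3, 1]]
print(sharded_matmul(A, B, 2) == matmul(A, B))
```

Sharding a non-contracting dimension instead would need no summation at all, which is why the choice of which axis to shard determines the communication cost.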

Jacob Austin · Feb 4, 2025 · 6:30 PM UTC

Jacob Austin @jacobaustin132

4 Feb 2025

This book was co-written with @_sholtodouglas, @charliexychen, @pchoy95, @albertwebson, @vinayramasesh, @froystig, @anselmlevskaya, @sharadvikram, and Fede Lebron, building on prior ideas by @reinerp and @jekbradbury. 10/n

6,839

Jacob Austin · Apr 29, 2023 · 11:09 PM UTC

Jacob Austin @jacobaustin132

29 Apr 2023

Please note that the doctors’ responses come from…Reddit

Mark Dredze @mdredze

28 Apr 2023

New study! We compared ChatGPT responses to people's medical questions with those of doctors. Healthcare professionals preferred ChatGPT 79% of the time; as more empathetic and higher quality. I'm excited to figure out how to use LLMs to help doctors! jamanetwork.com/journals/jam…

10,045

Jacob Austin · Mar 5, 2024 · 2:55 PM UTC

Jacob Austin @jacobaustin132

5 Mar 2024

Most LLM evals are leaked. A decent heuristic is to ignore reported numbers on evals over a year old

10,225

Jacob Austin · Jul 27, 2025 · 7:43 PM UTC

Jacob Austin @jacobaustin132

27 Jul 2025

I just stumbled across this awesome book, which covers a lot of the nitty gritty details of GPU hardware, SLURM, cloud providers, and LLM training/serving. Probably the most practical guide to the infrastructure of LLM scaling I've seen

Stas Bekman

@StasBekman

7 Jul 2025

Got a chance to measure Maximum Achievable Matmul TFLOPS on NVIDIA B200. With each new NVIDIA generation the efficiency keeps on dropping:
A100: 86.9%
H100: 80.3%
B200: 77.6%
The updated table is here: github.com/stas00/ml-enginee…

4,085

Jacob Austin · May 10, 2023 · 5:29 PM UTC

Jacob Austin @jacobaustin132

10 May 2023

PaLM 2 is really good. Like surprisingly good. And it’s exciting to see it rolling out across a wide array of Google products

👩‍💻 Paige Bailey

@DynamicWebPaige

10 May 2023

*cracks knuckles* and thus, we begin the "🌴PaLM v2" drinking game (but with coffee, tea, or your favorite caffeinated beverage of choice, as it's early! 😉) #GoogleIO2023 #GoogleIO

8,286

Jacob Austin · May 31, 2023 · 6:14 PM UTC

Jacob Austin @jacobaustin132

31 May 2023

Codex-style LLMs are trained on static code snapshots (GitHub files at HEAD) without history or context from the developer's environment (like their IDE or build system). We're throwing away all the data of how the software was built, and why! 2/n

6,355

Jacob Austin · Oct 14, 2022 · 7:44 PM UTC

Jacob Austin @jacobaustin132

14 Oct 2022

UL2 is a new training objective with big implications for LLM training. UL2 combines the span corruption objective that gives T5 its exceptional finetuning ability with causal and prefix-LM objectives which let UL2-trained LLMs outperform purely-causal LMs on few-shot tasks

Google AI

@GoogleAI

14 Oct 2022

Introducing UL2, a novel language pre-training paradigm that improves performance of language models across datasets and setups by using a mixture of training objectives, each with different configurations. Read more and grab model checkpoints at goo.gle/3euHrEo

ALT An overview of the objectives mixed together in the UL2 framework.

Jacob Austin · May 23, 2023 · 7:44 PM UTC

Jacob Austin @jacobaustin132

23 May 2023

Full details can be found here: ai.googleblog.com/2023/05/re…. Huge thanks to Peter Choy, Alex Frömmgen, @lerakharatyan, and a ton of amazing collaborators across Google!

Resolving code review comments with ML

Posted by Alexander Frömmgen, Staff Software Engineer, and Lera Kharatyan, Senior Software Engineer, Core Systems & Experiences Code-change rev...

research.google

4,681

Jacob Austin · May 31, 2023 · 6:14 PM UTC

Jacob Austin @jacobaustin132

31 May 2023

There's so much hype around "LLMs as agents" and when building LLMs for software, i think that's exactly the right approach. Our LLMs can build software like humans, iteratively and using developer tools, and be immediately useful for real developers! 5/n

5,069

Jacob Austin · May 31, 2023 · 6:14 PM UTC

Jacob Austin @jacobaustin132

31 May 2023

Google developers work in a monorepo and build errors, test failures, code review comments, and resulting edits are all tracked. DIDACT models are trained on this data to build software iteratively *based on the history of a dev's work so far!* 3/n

5,061

Jacob Austin · Jul 4, 2023 · 5:54 PM UTC

Jacob Austin @jacobaustin132

4 Jul 2023

Replying to @EigenGender

This is absolutely not true. They could test the explosive design, the subcritical assembly, the gun design. They could detonate the explosives and watch fast X-ray data. And then they had the trinity test

2,561

Jacob Austin · May 31, 2023 · 6:14 PM UTC

Jacob Austin @jacobaustin132

31 May 2023

DIDACT powers a ton of cool dev tools, like our recently announced ML-powered code review tool and a bunch of others, like a tool to fix build errors, predict code review comments, and do GitHub Copilot-style completion conditioned on _your_ development history! 4/n

4,831

Jacob Austin · Apr 8, 2023 · 6:07 PM UTC

Jacob Austin @jacobaustin132

8 Apr 2023

Replying to @andrew_n_carr

CUDA and the collective decades spent installing drivers

2,849

Jacob Austin · May 23, 2023 · 7:44 PM UTC

Jacob Austin @jacobaustin132

23 May 2023

Code LLMs are everywhere, but making them useful to real developers is hard. We trained an LLM on data from _real_ Google developers: fixing builds, performing code review, and editing files, then deployed it within the code-review UI! 2/n

6,807

Jacob Austin · May 25, 2022 · 3:02 AM UTC

Jacob Austin @jacobaustin132

25 May 2022

Replying to @denny_zhou

If true, this highlights one of the complexities of the half-open OpenAI/GPT-3 ecosystem. I'm a fan of the API, but it's v hard to know what DaVinci-002 is, whether it had a given eval set in its training data, etc.

Jacob Austin · Apr 19, 2024 · 3:00 PM UTC

Jacob Austin @jacobaustin132

19 Apr 2024

Penzai is one of the coolest ML libraries out there. Not only can you inspect every weight matrix and attention head in a Colab, you can trivially knock out heads, skip or repeat layers, or extract intermediates with a one line change. A beautiful tool for interpretability.

Daniel Johnson @_ddjohnson

19 Apr 2024

Excited to share Penzai, a JAX research toolkit from @GoogleDeepMind for building, editing, and visualizing neural networks! Penzai makes it easy to see model internals and lets you inject custom logic anywhere. Check it out on GitHub: github.com/google-deepmind/p…

6,582

Jacob Austin · Feb 4, 2025 · 6:30 PM UTC

Jacob Austin @jacobaustin132

4 Feb 2025

Now that we’ve talked about training, we need to talk about serving. How expensive should a model be to serve? What kind of latency can we expect? What are prefill and generation? How do we build an efficient inference service? We talk about this here: jax-ml.github.io/scaling-boo… 7/n

4,493

Jacob Austin · Jul 24, 2022 · 9:07 PM UTC

Jacob Austin @jacobaustin132

24 Jul 2022

Hiking in the shadow of the eastern Sierras, it feels like another world. What a high.

ALT The Milky Way near Lone Pine, CA.

ALT The view from Lake Maysen.

Jacob Austin · Feb 4, 2025 · 6:30 PM UTC

Jacob Austin @jacobaustin132

4 Feb 2025

LLM systems programming is super fun! It's hard to do good ML research without it these days, and you don't need much compute to work on it. I hope this book will make it easier for more people (esp. academics) to work on this stuff 9/n

3,903

Jacob Austin · Apr 24, 2024 · 2:59 AM UTC

Jacob Austin @jacobaustin132

24 Apr 2024

More work from Google on AI for SWE, here automatically fixing build errors! The cool thing about fixing builds is you can check if the build succeeds before showing the user the fix. Results in a measurable shortening of code submission time too!

Vaibhav Tulsyan

@xennygrimmato_

23 Apr 2024

Excited to share a new blog on ML-based repair for build errors at Google! We found that automatically repairing build errors in the IDE increases productivity as measured by overall task completion with no detectable negative impact on code safety!

5,892

Jacob Austin · Feb 4, 2025 · 6:30 PM UTC

Jacob Austin @jacobaustin132

4 Feb 2025

The rest of the book is a set of practical guides: how to write and profile parallel JAX code, and how to apply the previous two sections to real models like LLaMA-3. We also have worked problems at the end of each section if you like homework: jax-ml.github.io/scaling-boo… 8/n

4,133

Jacob Austin · Aug 8, 2024 · 9:29 PM UTC

Jacob Austin @jacobaustin132

8 Aug 2024

I genuinely love this work, a dedicated team spent years building the first human-level table tennis bot and wrote a thoughtful and deeply principled paper about both its strengths and weaknesses. Good research!

Laura Graesser @lgraesser3

8 Aug 2024

I have been dreaming of this moment for a very long time. Our robot got good enough to play games with humans and win, whilst also being fun to play with. I am so so happy this is out & very grateful I got to work on this with so many wonderful & talented people. 👇 for details

3,718

Jacob Austin · Feb 19, 2025 · 5:56 PM UTC

Jacob Austin @jacobaustin132

19 Feb 2025

A hot take is that LLMs are bad at writing because many of the people writing SFT data are bad at writing. Tech has never cared about writing skills...

Jack Morris

@jxmnop

19 Feb 2025

a controversial opinion i hold deeply is that AI is not superhuman at writing (and isn't close) there are 10x and 100x human writers. here's a random excerpt from David Foster Wallace, widely agreed to be one of the greatest modern writers if you sincerely think anything like this could be written by DeepSeek or Claude, you need to read more

4,960

Jacob Austin · Feb 7, 2023 · 7:41 PM UTC

Jacob Austin @jacobaustin132

7 Feb 2023

Google is in the game! A lot of hard work is going into building an exciting, helpful, and responsible new generation of LLM-based tools at Google

Sundar Pichai

@sundarpichai

6 Feb 2023

1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applications (LaMDA). Coming soon: Bard, a new experimental conversational #GoogleAI service powered by LaMDA. blog.google/technology/ai/ba…

7,244

Jacob Austin · Jan 21, 2024 · 1:03 AM UTC

Jacob Austin @jacobaustin132

21 Jan 2024

A couple lessons from this:
* IDE wars are coming. Collecting data in the same dev environment you deploy in is a huge advantage.
* LLMs make great demos but it's hard to trust them at complex tasks. Reviewing code is harder than writing it. High-precision, low-recall is OK!

2,278

Jacob Austin · Feb 7, 2023 · 7:27 PM UTC

Jacob Austin @jacobaustin132

7 Feb 2023

Happy to share our work on multilingual evals for code LLMs, led by @GOrlanski. We open-source BabelCode, a framework for running execution-based coding evals across >10 languages (including Rust and Julia) and study the effect of language balancing on low-resource languages 1/2

Gabe Orlanski

@GOrlanski

7 Feb 2023

📢Measuring The Impact Of Programming Language Distribution We present the BabelCode framework for multi-lingual code evaluation and an investigation into the impact of PL distributions in training data. Paper: arxiv.org/abs/2302.01973 Code: github.com/google-research/b… 🧵

7,397

Jacob Austin · Feb 5, 2025 · 5:58 PM UTC

Jacob Austin @jacobaustin132

5 Feb 2025

Sholto wrote the first version of this book and most of its big ideas are his (or come from @reinerpope or @jekbradbury). Working on new tech like this is fun because you see ideas evolve from crazy research topics to everyday engineering principles, in just a few years

Sholto Douglas

@_sholtodouglas

4 Feb 2025

A distillation of our mental models that we use to think about the systems perspective on training and inference at scale. The most important takeaway - you should be able to describe everything about your model with simple equations, and deeply understand how long it should take.

2,408

Jacob Austin · Feb 9, 2025 · 5:29 PM UTC

Jacob Austin @jacobaustin132

9 Feb 2025

Sad to see the real evil of tech's "my hour of reading about this qualifies me to make decisions" mindset. It's a fun exercise until you go off and smugly destroy the arts & sciences

3,076

Jacob Austin · Aug 18, 2025 · 4:02 PM UTC

Jacob Austin @jacobaustin132

18 Aug 2025

Replying to @cHHillee

Wow, according to docs.nvidia.com/dgx-superpod… it goes up to `18 * 400 * 4 / 8 = 3.6TB/s`.

7,497
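A quick check of the arithmetic quoted in this reply. The tweet does not label the individual factors, so they are left unnamed here; the only assumption is that the division by 8 converts gigabits to gigabytes.

```python
# Reproduce `18 * 400 * 4 / 8` from the tweet.
# Assumption: the product is in Gb/s and dividing by 8 yields GB/s.
total_gbit_per_s = 18 * 400 * 4
total_gbyte_per_s = total_gbit_per_s / 8
total_tbyte_per_s = total_gbyte_per_s / 1000

print(total_gbyte_per_s, "GB/s")
print(total_tbyte_per_s, "TB/s")
```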

Jacob Austin · May 23, 2023 · 7:44 PM UTC

Jacob Austin @jacobaustin132

23 May 2023

A huge amount of credit goes to the UX team for helping us make model edits understandable, so developers can audit the code that's being changed. Model calibration also becomes surprisingly important: building developer trust by only showing highly confident predictions

5,277

Jacob Austin · Jul 21, 2023 · 8:03 PM UTC

Jacob Austin @jacobaustin132

21 Jul 2023

i found Oppenheimer, like most of Christopher Nolan’s movies, lacking in emotional resonance. Nolan seems to make films about concepts that interest him (time, space, a biography he just read), without worrying about their relevance to the present moment

10,901

Jacob Austin · Apr 5, 2023 · 8:41 PM UTC

Jacob Austin @jacobaustin132

5 Apr 2023

Replying to @_jasonwei

Cost is an important drawback: generalist models will always be outperformed by smaller task-specific models when cost and latency are factored in, except for tasks only the largest models can do. With that said, distillation is likely to play a role

4,120

Jacob Austin · Apr 19, 2024 · 4:36 PM UTC

Jacob Austin @jacobaustin132

19 Apr 2024

2290 tons of CO2 is a lot, but it's also roughly...38 flights from NYC to London on a 737. More CO2 was probably emitted by Meta employees flying back and forth during model development

Sasha Luccioni, PhD 🦋🌎✨🤗@SashaMTL

18 Apr 2024

So LLaMa 3's carbon footprint is... huge? 🤯 They estimate it to be 2,290 tons of CO2eq, compared to 550t for training GPT-3 and 66t for training *all* of the BLOOM models (1B-176B) 🌬️

3,978

Jacob Austin · Dec 7, 2022 · 6:42 PM UTC

Jacob Austin @jacobaustin132

7 Dec 2022

Please consider joining the Blueshift Team! They're wonderful people doing amazing work on reasoning, AI for science, and more

Behnam Neyshabur

@bneyshabur

7 Dec 2022

Interested in Reasoning with Large Language Models? We are hiring! Internship: forms.gle/fZzFhsy5yVH6R97m9 Full-Time Research Scientist: forms.gle/9NB5LaCHjQgXR1wb9 Full-Time Research Engineer: forms.gle/rCRnh5Q1nWmoAKcU7 Learn more about Blueshift Team: research.google/teams/bluesh…

Jacob Austin · Dec 5, 2022 · 5:54 AM UTC

Jacob Austin @jacobaustin132

5 Dec 2022

Returning from NeurIPS, I flew an hour the wrong way to Fort Worth, and then missed my flight to NYC. Now I get to experience the cozy embrace of this hard airport floor

Jacob Austin · Apr 6, 2023 · 2:43 AM UTC

Jacob Austin @jacobaustin132

6 Apr 2023

the people I trust most are loudly and persistently expressing doubt about their beliefs and actions

3,678

Jacob Austin · Oct 25, 2022 · 4:43 AM UTC

Jacob Austin @jacobaustin132

25 Oct 2022

Replying to @amasad

The next generation of code LLMs will exhaust the code available at GitHub HEAD. The amount of diff data is several orders of magnitude larger

Jacob Austin · Aug 25, 2025 · 7:20 PM UTC

Jacob Austin @jacobaustin132

25 Aug 2025

To shill my favorite: arxiv.org/abs/2108.13264 is awesome! Rishabh and folks reran a bunch of RL algorithms from papers and found that many cherry-picked the reported performance for their papers. Beautifully simple result, well executed.

Deep Reinforcement Learning at the Edge of the Statistical Precipice

Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare...

arxiv.org

5,767

Jacob Austin · Dec 6, 2023 · 3:49 PM UTC

Jacob Austin @jacobaustin132

6 Dec 2023

Gemini is here and it’s actually pretty decent!

Demis Hassabis

@demishassabis

6 Dec 2023

The Gemini era is here. Thrilled to launch Gemini 1.0, our most capable & general AI model. Built to be natively multimodal, it can understand many types of info. Efficient & flexible, it comes in 3 sizes each best-in-class & optimized for different uses blog.google/technology/ai/go…

2,139

Jacob Austin · Feb 20, 2025 · 4:26 PM UTC

Jacob Austin @jacobaustin132

20 Feb 2025

Replying to @polynoamial

Inference cost feels like a bit of a scary x-axis, because it's so dependent on engineering time spent optimizing serving. Absolute "intelligence" feels more important, in the sense of actually being able to solve meaningful problems regardless of the cost

2,263

Jacob Austin · Jan 21, 2024 · 12:23 AM UTC

Jacob Austin @jacobaustin132

21 Jan 2024

You can find the paper here: research.google/pubs/resolvi…. I think it's an awesome case study in applied LLM deployment. Huge shoutout to Peter Choy, Alex Frömmgen, @lerakharatyan, @gssurita, Kevin Villela, @dtarlow2, Maxim Tabachnyk, really too many people to list!

Resolving Code Review Comments with Machine Learning

research.google

2,628

Jacob Austin · Aug 26, 2025 · 5:46 PM UTC

Jacob Austin @jacobaustin132

26 Aug 2025

Enjoyed this post from my friend Sarah about the current bottlenecks in LLM research velocity. While it's fun to think about a singularity involving LLM AI researchers, LLM research is bottlenecked by expensive experiments and insidious bugs, not more intelligence.

Sarah Catanzaro

@sarahcat21

26 Aug 2025

1/ Some pundits are predicting that the AI bubble will burst. I doubt it. But more ideas or compute won't unlock an "intelligence explosion." The biggest bottleneck AI research faces is the pace and quality of experimentation.

2,909

Jacob Austin · May 31, 2023 · 6:57 PM UTC

Jacob Austin @jacobaustin132

31 May 2023

Replying to @xeophon

Yes, we have a DSL that decomposes the process of writing a PR into actions like "<run build [target]>" or "<make edit [location] [diff]>". The goal is to represent any action a developer could take as a small, local change, instead of making the LLM somehow output a big file

4,451
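A minimal sketch of what an action DSL like the one described in this reply could look like, assuming a concrete syntax the tweet does not specify. The class names `RunBuild` and `MakeEdit`, the `parse_action` helper, and the exact string layout are all hypothetical, chosen only to illustrate "small, local actions" as structured values.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class RunBuild:
    """Hypothetical action: run a build for one target."""
    target: str


@dataclass(frozen=True)
class MakeEdit:
    """Hypothetical action: apply a small diff at one location."""
    location: str
    diff: str


Action = Union[RunBuild, MakeEdit]

# Assumed surface syntax, modeled on the two examples quoted in the tweet.
_RUN_BUILD = re.compile(r"<run build (?P<target>\S+)>")
_MAKE_EDIT = re.compile(r"<make edit (?P<location>\S+) (?P<diff>.+)>", re.DOTALL)


def parse_action(text: str) -> Action:
    """Parse one action string into a structured action, or raise ValueError."""
    text = text.strip()
    match = _RUN_BUILD.fullmatch(text)
    if match:
        return RunBuild(match.group("target"))
    match = _MAKE_EDIT.fullmatch(text)
    if match:
        return MakeEdit(match.group("location"), match.group("diff"))
    raise ValueError(f"unrecognized action: {text!r}")
```

Under these assumptions, a model's output is a sequence of such strings, each of which can be parsed, validated, and applied on its own instead of being checked as one large generated file.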

Jacob Austin · Mar 27, 2025 · 2:38 AM UTC

Jacob Austin @jacobaustin132

27 Mar 2025

Most exciting news of the year so far!

Sheel Mohnot

@pitdesi

25 Mar 2025

Dishoom NYC 2026

1,624

Jacob Austin · Mar 15, 2023 · 10:15 PM UTC

Jacob Austin @jacobaustin132

15 Mar 2023

To be clear, I don't mean the "scale won't solve everything" line as a criticism of scaling. I just find it implausible that LLMs can solve arbitrary problems without decomposing them or adapting to feedback from an environment

1,103

Jacob Austin · Mar 21, 2023 · 7:42 PM UTC

Jacob Austin @jacobaustin132

21 Mar 2023

Bard is alive. Try it out!

Jeff Dean

@JeffDean

21 Mar 2023

Bard is now available in the US and UK, w/more countries to come. It’s great to see early @GoogleAI work reflected in it—advances in sequence learning, large neural nets, Transformers, responsible AI techniques, dialog systems & more. You can try it at bard.google.com

3,613

Jacob Austin · May 23, 2023 · 9:22 PM UTC

Jacob Austin @jacobaustin132

23 May 2023

Speaking from personal experience, the code completion feature in Colab is magical!

Colaboratory @GoogleColab

23 May 2023

Your new coding assistant is almost here! Check out these new Colab features: natural language to code generation, code completion, and an integrated chatbot. Read all about it at blog.google/technology/devel… authored by @thechrisperry and @shresbm

2,943

Jacob Austin · Jun 23, 2024 · 1:57 AM UTC

Jacob Austin @jacobaustin132

23 Jun 2024

Replying to @nearcyan @arpitingle

this isn’t really true, Noam and Daniel intended from the beginning to “solve loneliness”

916

Jacob Austin · Dec 5, 2022 · 9:03 PM UTC

Jacob Austin @jacobaustin132

5 Dec 2022

Replying to @DrJimFan

Big +1 here. The model is implicitly trained on a mixture of p(answer | evidence) and p(answer), so it interpolates between memorizing and looking for answers in-context (see arxiv.org/abs/2205.05055)

Data Distributional Properties Drive Emergent In-Context Learning...

Large transformer-based models are able to perform in-context few-shot learning, without being explicitly trained for it. This observation raises the question: what aspects of the training regime...

arxiv.org
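One rough way to write down the mixture described in this reply, as a sketch rather than a formula taken from the linked paper. The mixing weight $\lambda$ and the subscript "model" are illustrative assumptions.

```latex
% \lambda \in [0, 1] is a hypothetical mixing weight, not a quantity from the tweet or the paper.
p_{\text{model}}(\text{answer} \mid \text{evidence})
  \approx \lambda \, p(\text{answer} \mid \text{evidence})
  + (1 - \lambda) \, p(\text{answer})
```

Read this way, $\lambda$ near 1 corresponds to looking for the answer in context, and $\lambda$ near 0 corresponds to falling back on what was memorized.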

Jacob Austin · Jul 28, 2022 · 6:06 PM UTC

Jacob Austin @jacobaustin132

28 Jul 2022

Replying to @lauralondon_ @moultano

Desalination plants can't prevent flooding when sea levels rise several meters due to Antarctic ice sheets melting. Burying power lines will reduce wildfire frequency at massive cost, but it won't stop them when rising temperatures lead to ever more arid conditions.

Jacob Austin · May 30, 2023 · 9:56 PM UTC

Jacob Austin @jacobaustin132

30 May 2023

it’s frightening walking around Williamsburg hearing tech grifters talk about their “AI for media” startups. it feels better to work upstream of that, on core tech, but it’s not obvious if my hands are cleaner

2,740

Jacob Austin · Jun 1, 2023 · 4:58 PM UTC

Jacob Austin @jacobaustin132

1 Jun 2023

Another aspect of this work to note: it (partly) solves the "specification" problem of program synthesis: how do we tell the computer what code we want it to write? TLDR: rather than tell a model what to do, let it learn from context what you'll want to do next. A thread 1/n

Danny Tarlow @dtarlow2

1 Jun 2023

Very happy to share our work on activating Google's software dev process as an engine for ML-powered dev tools. A multi-year effort from many across Alphabet. Special shout-out to @jacobaustin132 @blip42 @PManzagol @dancherp & Petros Maniatis. See Jacob's🧵& the blog for more.

2,162

Jacob Austin · Mar 23, 2023 · 6:17 PM UTC

Jacob Austin @jacobaustin132

23 Mar 2023

Replying to @natfriedman

Is this toolformer? Toolformer seems specifically about using prompting + log-likelihood based filtering to enable tool use. The idea of tool use in this form has been around for years

2,841

Jacob Austin · Jan 21, 2024 · 12:23 AM UTC

Jacob Austin @jacobaustin132

21 Jan 2024

We first talked about this project in mid-2022 in a @GoogleAI blog post (here's a thread at the time: nitter.app/jacobaustin132/status/…), but this paper talks in much more detail about the model and the design process we went through.

Jacob Austin @jacobaustin132

23 May 2023

Excited to see a blog post on one of the coolest projects I've worked on at Google: using LLMs to automatically resolve code-review comments for Google engineers! 1/n

3,755

Jacob Austin · Aug 21, 2023 · 6:30 PM UTC

Jacob Austin @jacobaustin132

21 Aug 2023

Replying to @_jasonwei

Character can make money without "getting something right". As you point out, exploiting loneliness/insecurity is lucrative. The fact that character.ai shamelessly monetizes a desire for connection (where OAI/Anthro refused) speaks badly, ironically, of their character

character.ai | AI Chat, Reimagined–Your Words. Your World.

Chat with millions of AI Characters on the #1 AI chat app. Where will your next adventure take you?

character.ai

7,961

Jacob Austin · Sep 19, 2024 · 5:08 PM UTC

Jacob Austin @jacobaustin132

19 Sep 2024

Awesome stuff! I continue to be hopeful for discrete diffusion done right. A short thread 1/n

Rupesh Srivastava @rupspace

18 Sep 2024

Interested in Discrete Diffusion? I've just released a Github repo where you can learn about and play with discrete diffusion algorithms with simple and performant "nano-style" implementations. (link below) I've started with the Absorbing D3PM from @jacobaustin132 and @_ddjohnson that performs much better than the original with some updated settings. More stuff coming! Star it and follow here for updates.

ALT animated text: [nano] Discrete Diffusion

1,708

Jacob Austin · Jun 12, 2024 · 9:11 PM UTC

Jacob Austin @jacobaustin132

12 Jun 2024

Smart Paste highlights the core UX challenge of AI for SWEs. The more context switching is required to verify a suggestion, the less useful it is. Tools like code completion and Smart Paste that make suggestions at the cursor and are instantly verifiable are the easiest to adopt

807

Jacob Austin · Aug 21, 2023 · 1:54 AM UTC

Jacob Austin @jacobaustin132

21 Aug 2023

Replying to @docmilanfar @jaschasd

Strongly agree, I still find this one of the clearest explanations of dynamical systems and stochastic processes, it's quite a joy to read

3,278

Jacob Austin · Feb 10, 2025 · 7:02 PM UTC

Jacob Austin @jacobaustin132

10 Feb 2025

I haven't read the report yet but recurrent depth computation is an alternative to chain-of-thought that I really love. Chain of thought relies on human reasoning traces, while arbitrary depth allows the model to learn latent reasoning via SGD

Jonas Geiping

@jonasgeiping

10 Feb 2025

Ok, so I can finally talk about this! We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale. The model has an internal latent space in which it can adaptively spend more compute to think longer. I think the tech report ...🐦‍⬛

1,768

Jacob Austin · Aug 5, 2025 · 3:03 PM UTC

Jacob Austin @jacobaustin132

5 Aug 2025

Even Bear has LaTeX support before Google Docs

Bear - Markdown Notes @BearNotesApp

5 Aug 2025

#update Math formulas, now available in Bear.

2,505

Jacob Austin · Apr 6, 2023 · 7:25 PM UTC

Jacob Austin @jacobaustin132

6 Apr 2023

I loved people like Anthony Bourdain for this reason. You can see him grappling with both the beauty and horror of his life and his art. I wish the AI world had more of this. We cannot know if what we make is good, no matter how well-intentioned we are

543

Jacob Austin · Dec 2, 2023 · 6:32 AM UTC

Jacob Austin @jacobaustin132

2 Dec 2023

To grad school applicants: the single best advice I got was that you’re generally admitted by a single faculty member who’ll bet on you, not by the department. Pick a few people and target your application to them

1,967

Jacob Austin · Nov 2, 2024 · 3:32 PM UTC

Jacob Austin @jacobaustin132

2 Nov 2024

Please vote y'all!

Jeff Dean

@JeffDean

2 Nov 2024

There are four days left to vote in the US election. I strongly encourage everyone who is eligible and hasn't already voted to make a plan to go and vote! 🗳️ Obviously there is the presidential election but there are lots of other important races and issues on the ballot around the country. Look at them all and offer your considered opinion by voting!

1,615