David Dohan · Jul 22, 2022 · 11:35 PM UTC

David Dohan

Pinned Tweet

David Dohan

@dmdohan

22 Jul 2022

Happy to release our work on Language Model Cascades. Read on to learn how we can unify existing methods for interacting models (scratchpad/chain of thought, verifiers, tool-use, …) in the language of probabilistic programming. paper: arxiv.org/abs/2207.10342

673

David Dohan · Nov 6, 2022 · 8:05 PM UTC

David Dohan

@dmdohan

6 Nov 2022

“99% of Americans don’t talk about AI at parties. You can too if you try!”

ALT A poster that says “Not AI room” with a drawing of a crossed out robot, and suggestions for alternate topics.

186

2,303

David Dohan · Mar 1, 2023 · 2:13 AM UTC

David Dohan

@dmdohan

1 Mar 2023

New chapter: Happy to share that I recently joined @OpenAI! Thankful for many collaborators, friends, and mentors who made my 6 years of research @Google Brain special🧠 Excited to collaborate toward reliable reasoning & alignment in AI systems and products like #ChatGPT

1,034

184,901

David Dohan · Dec 20, 2024 · 6:16 PM UTC

David Dohan

@dmdohan

20 Dec 2024

o3 @ 87.5% on ARC-AGI. It was 16 hours at an increase rate of 3.5% an hour to "solved".

David Dohan

@dmdohan

20 Dec 2024

At this rate, how long til ARC-AGI is “solved”? For context:
- gpt-4o @ 5%
- Sonnet3.5 @ 14%
- o1-preview @ 18%
- o1 @ 32%
- best scaffolded solution @ 54%

905

1,116,614
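The "3.5% an hour" figure can be reproduced with simple arithmetic. The sketch below assumes the baseline is the 32% o1 score listed in the quoted tweet and takes the 16-hour gap as stated; the variable names are illustrative only.

```python
# Assumption: the baseline is o1's 32% from the quoted tweet.
# The 16-hour gap and the 87.5% score are taken from the tweet as stated.
baseline_pct = 32.0
o3_pct = 87.5
hours_elapsed = 16

points_gained = o3_pct - baseline_pct          # percentage points gained
rate_per_hour = points_gained / hours_elapsed  # percentage points per hour

print(f"{points_gained} points over {hours_elapsed} h = {rate_per_hour:.2f} points/hour")
```

Rounded to one decimal place, this matches the rate quoted in the tweet.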

David Dohan · Dec 20, 2024 · 6:36 PM UTC

David Dohan

@dmdohan

20 Dec 2024

imo the improvements on FrontierMath are even more impressive than ARG-AGI: a jump from 2% to 25%. Terence Tao said the dataset should "resist AIs for several years at least" and "These are extremely challenging. I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages…"

Nat McAleese

@__nmca__

20 Dec 2024

Replying to @__nmca__

Well, on FrontierMath 2024-11-26 o3 improves the state of the art from 2% to 25% accuracy. These are absurdly hard strongly held out math questions. And on ARC, the semi-private test set and public validation set scores are 87.5% (private) and 91.5% (public). (7/n)

877

153,425

David Dohan · Nov 20, 2023 · 1:25 PM UTC

David Dohan

@dmdohan

20 Nov 2023

🩶🫶 Ilya and Sam’s yin/yang was a major reason I joined OpenAI. It is still possible to repair what was shattered.

Ilya Sutskever

@ilyasut

20 Nov 2023

I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.

734

168,432

David Dohan · Sep 12, 2024 · 5:33 PM UTC

David Dohan

@dmdohan

12 Sep 2024

It's important to emphasize that this is a huge leap /and/ we're still at the start. Give o1-preview a try, we think you'll like it. And in a month, give o1 a try and see all the ways it has improved in such a short time. And expect that to keep happening.

629

233,413

David Dohan · Nov 20, 2023 · 10:29 AM UTC

David Dohan

@dmdohan

20 Nov 2023

OpenAI is nothing without its people

520

56,625

David Dohan · Dec 20, 2024 · 2:23 AM UTC

David Dohan

@dmdohan

20 Dec 2024

At this rate, how long til ARC-AGI is “solved”? For context:
- gpt-4o @ 5%
- Sonnet3.5 @ 14%
- o1-preview @ 18%
- o1 @ 32%
- best scaffolded solution @ 54%

ARC Prize

@arcprize

19 Dec 2024

Verified o1 performance on ARC-AGI's Semi-Private Eval (100 tasks) o1, Low: 25% ($1.5/task) o1, Medium: 31% ($2.5/task) o1, High: 32% ($3.8/task)

403

235,497

David Dohan · Sep 12, 2024 · 5:10 PM UTC

David Dohan

@dmdohan

12 Sep 2024

🍓is ripe and is ready to think, fast and slow: check out OpenAI o1, trained to reason before answering. I joined OpenAI to push boundaries of science & reasoning with AI. Happy to share that this result of the team's amazing collaboration does just that. Try it on your hardest problems.

OpenAI

@OpenAI

12 Sep 2024

We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introducing…

359

36,380

David Dohan · Nov 19, 2023 · 5:59 AM UTC

David Dohan

@dmdohan

19 Nov 2023

🩶🫶

Sam Altman

@sama

19 Nov 2023

i love the openai team so much

313

57,982

David Dohan · Nov 24, 2023 · 9:51 PM UTC

David Dohan

@dmdohan

24 Nov 2023

language models are superhuman at predicting the next word try this yourself to see how hard it is rr-lm-game.herokuapp.com/

Jason Wei

@_jasonwei

24 Nov 2023

Like the International Math Olympiad or Spelling Bee, there should be a “language modeling competition” where humans compete to predict the next word in a sequence. The best humans would probably still lose to GPT-2, and we’d have more empathy for how hard it is to be an LLM :)

274

180,601

David Dohan · Feb 17, 2023 · 8:08 PM UTC

David Dohan

@dmdohan

17 Feb 2023

LM performance typically gets worse given irrelevant info. Simple prompting improves it: "Feel free to ignore irrelevant information given in the questions." Work led by @fredahshi, with @xinyun_chen_, @kanishkamisra, @nkscales_google, @edchi, Nathanael Schärli, & @denny_zhou

Aran Komatsuzaki

@arankomatsuzaki

2 Feb 2023

Large Language Models Can Be Easily Distracted by Irrelevant Context arxiv.org/abs/2302.00093

194

36,873
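A minimal sketch of the prompting trick described above. The instruction string is quoted from the tweet; the helper name `build_prompt`, the prompt layout, the placement of the instruction before the question, and the example question are all assumptions made for illustration, not the paper's exact setup.

```python
# The instruction text is quoted from the tweet.
# The helper name, prompt layout, and example question are hypothetical.
IGNORE_INSTRUCTION = "Feel free to ignore irrelevant information given in the questions."


def build_prompt(question: str, instruction: str = IGNORE_INSTRUCTION) -> str:
    """Prepend the instruction to a question before sending it to a language model."""
    return f"{instruction}\n\nQ: {question}\nA:"


# A made-up question containing one irrelevant sentence.
question = (
    "Ana has 3 apples and buys 2 more. "
    "Her neighbor's dog is 7 years old. "
    "How many apples does Ana have?"
)
prompt = build_prompt(question)
print(prompt)
```

The resulting string can be passed to whichever model API is in use; no specific client library is assumed here.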

David Dohan · Dec 20, 2024 · 7:06 PM UTC

David Dohan

@dmdohan

20 Dec 2024

We are used to the cadence of big model releases: GPT2->3->4 took two years each time. We’re in a different world now: o1 was announced months ago, and we are already on the next generation. Expect faster improvement going forward: o1 is like gpt2 if we could jump to gpt4 ~immediately.

188

48,468

David Dohan · Sep 12, 2024 · 5:37 PM UTC

David Dohan

@dmdohan

12 Sep 2024

Welp looks like the parrot is better at math than I am

188

11,670

David Dohan · Nov 22, 2023 · 6:05 AM UTC

David Dohan

@dmdohan

22 Nov 2023

We’re so back (to work)

OpenAI

@OpenAI

22 Nov 2023

We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo. We are collaborating to figure out the details. Thank you so much for your patience through this.

181

31,870

David Dohan · Feb 26, 2023 · 12:51 AM UTC

David Dohan

@dmdohan

26 Feb 2023

183

29,950

David Dohan · Jul 25, 2023 · 12:17 AM UTC

David Dohan

@dmdohan

25 Jul 2023

At ICML & excited to talk with old and new friends. Message me to chat. A few possible topics:
- Model chains, agents, programs
- Probabilistic programming
- Simulation-based/likelihood-free inference
- AI for science and reasoning
- AI-first Human-Computer interfaces

173

29,652

David Dohan · Mar 14, 2023 · 5:26 PM UTC

David Dohan

@dmdohan

14 Mar 2023

155

11,809

David Dohan · Apr 9, 2023 · 6:30 PM UTC

David Dohan

@dmdohan

9 Apr 2023

The C Elegans of GPT

Andrej Karpathy

@karpathy

9 Apr 2023

This is a baby GPT with two tokens 0/1 and context length of 3, viewing it as a finite state markov chain. It was trained on the sequence "111101111011110" for 50 iterations. The parameters and the architecture of the Transformer modify the probabilities on the arrows. E.g. we can see that:
- state 101 deterministically transitions to 011 in the training data, so the probability of that transition becomes higher (79%). Not near 100% because we only did 50 steps of optimization.
- state 111 goes to 111 and 110 with 50% probability each, which the model almost learns (45%, 55%).
- states like 000 are never encountered during training, but have relatively sharp transition probabilities, e.g. 73% of going to 001. This is a consequence of inductive biases in the Transformer. One might imagine wanting this to be 50%, except in a real deployment almost every input sequence is unique, not present in the training data verbatim.
Not really sure where I was going with this :D, I think it's interesting to train/study tiny GPTs because it becomes tractable to visualize and get an intuitive sense of the entire dynamical system. Play with here: colab.research.google.com/dr…

146

47,334
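The empirical transition probabilities mentioned in the quoted tweet (what the training data itself says, before any model is trained) can be recovered by sliding a window of length 3 over the training sequence. This is only a counting sketch, not the linked notebook's code; the function name `empirical_transitions` is made up for illustration.

```python
from collections import Counter, defaultdict


def empirical_transitions(sequence: str, context_length: int = 3):
    """Estimate state -> next-state probabilities from a sliding window over `sequence`."""
    counts = defaultdict(Counter)
    for i in range(len(sequence) - context_length):
        state = sequence[i : i + context_length]
        # The next state drops the oldest token and appends the following one.
        next_state = sequence[i + 1 : i + context_length + 1]
        counts[state][next_state] += 1
    # Normalize the counts into probabilities for each source state.
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


probs = empirical_transitions("111101111011110")
for state, transitions in sorted(probs.items()):
    print(state, transitions)
```

States that never occur in the sequence, such as 000, are simply absent from the result, which is why any probabilities a trained model assigns to them come from its inductive biases rather than from the data.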

David Dohan · Sep 12, 2024 · 5:47 PM UTC

David Dohan

@dmdohan

12 Sep 2024

Also want to point out o1-mini, which is incredible at coding tasks while being /fast/. It and o1 are the first generation of a new type of model.

Liam Fedus

@LiamFedus

12 Sep 2024

As part of today, we’re also releasing o1-mini. This is an incredibly smart, small model that can also reason before its answer. o1-mini allows us at @OpenAI to make high-intelligence widely accessible. openai.com/index/openai-o1-m… On the AIME benchmark, o1-mini re-defines the intelligence + cost frontier (see if you can spot the old GPT-4o model in the bottom 🙂). Massive congrats to the team and especially @ren_hongyu and @shengjia_zhao for leading this!

144

28,676

David Dohan · Dec 20, 2024 · 6:16 PM UTC

David Dohan

@dmdohan

20 Dec 2024

Example of what ARC-AGI problems look like

123

21,370

David Dohan · Jul 19, 2025 · 8:16 AM UTC

David Dohan

@dmdohan

19 Jul 2025

OpenAI achieved a gold medal on the 2025 International Math Olympiad (solving 5 of 6 problems)! Thinks for hours and writes proofs in natural language. We've come a long way from LLMs solving 50% of the MATH dataset in 2022. Congrats @alexwei_ on spearheading a major milestone!

Alexander Wei

@alexwei_

19 Jul 2025

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

125

7,490

David Dohan · Jul 16, 2018 · 12:56 AM UTC

David Dohan

@dmdohan

16 Jul 2018

Excited to present our work on evolving architectures for translation and image generation in a modular language this afternoon at #GECCO2018! Joint work with David So and Quoc Le.

117

David Dohan · Oct 13, 2022 · 8:26 PM UTC

David Dohan

@dmdohan

13 Oct 2022

ProtNLM (Protein Natural Language Model) annotates previously "uncharacterised proteins" in @uniprot in English. Instead of a restricted tag set, it predicts function as language: [amino acids] -> "CRISPR-associated endonuclease Cas9". Collaboration between @GoogleAI and @emblebi

EMBL-EBI @emblebi

13 Oct 2022

Ever got a result back saying uncharacterised protein? 😩 @uniprot and @GoogleAI have teamed up to create a natural language processing model that has generated over 40 million protein annotations to address this challenge. ebi.ac.uk/about/news/technol…

ALT Protein structure

114

David Dohan · Nov 20, 2023 · 8:04 PM UTC

David Dohan

@dmdohan

20 Nov 2023

Found the OpenAI tenders

112

10,892

David Dohan · Apr 16, 2023 · 7:06 PM UTC

David Dohan

@dmdohan

16 Apr 2023

Copilot turning me from code monkey into tab monkey

104

14,630

David Dohan · Apr 6, 2024 · 10:53 PM UTC

David Dohan

@dmdohan

6 Apr 2024

Happy 1000 days til AGI to those who celebrate

103

13,179

David Dohan · Mar 14, 2023 · 5:15 PM UTC

David Dohan

@dmdohan

14 Mar 2023

GPT4 feels qualitatively different than models I've used before: like working with a creative partner with vast knowledge. The results on standardized tests will make the rate of progress tangible for many people outside AI

OpenAI

@OpenAI

14 Mar 2023

Announcing GPT-4, a large multimodal model, with our best-ever results on capabilities and alignment: openai.com/product/gpt-4

17,179

David Dohan · Mar 1, 2023 · 5:58 AM UTC

David Dohan

@dmdohan

1 Mar 2023

Just ask for smaller, better models! Paper led by @_angie_chen, w/ @david_r_so & me: LMs discover architectures *by directly writing Python Jax code* instead of searching a restricted DSL. With EvoPrompting, we use LMs within an evolutionary algorithm to crossover parent prompts

Angelica Chen @_angie_chen

1 Mar 2023

New paper w/ @dmdohan and @david_r_so! Can LMs be used to design novel model architectures? We propose EvoPrompting, which evolves few-shot prompts to enable a code-pretrained LM to generate novel state-of-the-art architectures. arxiv.org/abs/2302.14838 (1/4)

29,014

David Dohan · Nov 21, 2023 · 6:49 PM UTC

David Dohan

@dmdohan

21 Nov 2023

Replying to @yacineMTB

advice I give for short notice interview prep:
- get a copy of "elements of programming interviews in python"
- read through each chapter & for each problem:
  a. spend a few minutes thinking of ways you might solve it.
  b. Imagine approaches: visualize solution/gestalt, how you would structure the code, how it would behave.
  c. read the solution in back of book
should fully solve a few on computer to practice end-to-end
there's a limited set of leetcode style questions that are reasonable for an interview, and this covers most of them. also a pretty good review of algorithms in general.

7,263

David Dohan · Mar 14, 2023 · 5:18 PM UTC

David Dohan

@dmdohan

14 Mar 2023

GPT-4 is in the top 20% of test takers in many of these standardized tests openai.com/research/gpt-4

92,132

David Dohan · Dec 20, 2024 · 6:38 PM UTC

David Dohan

@dmdohan

20 Dec 2024

will exclusively refer to the dataset as "ARGH-AGI" from now on

7,674

David Dohan · Apr 7, 2023 · 6:22 PM UTC

David Dohan

@dmdohan

7 Apr 2023

Declarative langs like SQL let us declare a goal (query), and the system plans how to satisfy constraints. LMQL does this for LMs: can get better results for sampling & tool use in fewer tokens bc it optimizes the decoding. Try it out in the playground: lmql.ai/playground/#cot

LMQL (Language Model Query Language)@lmqllang

5 Apr 2023

🚀 Excited to announce the first release of lmql.ai, a novel open source programming language and platform for language model interaction! Combining prompts, constraints & scripting, LMQL elevates the capabilities of large language models. 🧵1/6 A quick tour.

31,114

David Dohan · Dec 12, 2023 · 1:17 AM UTC

David Dohan

@dmdohan

12 Dec 2023

Presenting two posters at #NeurIPS2023, come by! 10:45am-12:45pm for both
- #527 Tuesday @ poster session 1: "Training Chain-of-Thought via Latent-Variable Inference"
- #332 Thursday @ poster session 5: "EvoPrompting: Language Models for Code-Level Neural Architecture Search"

8,027

David Dohan · Nov 19, 2023 · 6:37 PM UTC

David Dohan

@dmdohan

19 Nov 2023

Maybe an ask-to-answer to @adamdangelo on Quora can clear things up?

14,750

David Dohan · Dec 20, 2024 · 7:00 PM UTC

David Dohan

@dmdohan

20 Dec 2024

Replying to @DavidSHolz

Think most benchmarks are necessary, not sufficient. Beating ARC-AGI doesn't mean you actually have AGI, but it would be surprising to have an AGI that couldn't do it. Though FrontierMath is hard enough that we could have an AGI that doesn't solve it, not sure here.

5,196

David Dohan · Apr 1, 2023 · 4:00 PM UTC

David Dohan

@dmdohan

1 Apr 2023

Did you know? Reading a paper signed by the author doubles your learning rate! Today we are launching papers.deals to share our beloved arXiv of autographed machine learning papers with the world. All proceeds from these historic artifacts go to charity 💖

19,602

David Dohan · Apr 11, 2023 · 5:03 AM UTC

David Dohan

@dmdohan

11 Apr 2023

Neat prompt trick for Chat: "express same in Prolog". Simple way to get LMs to translate back-and-forth between informal language and formal representations like Prolog/Idris/MiniKanren/... Next up: use the formal language to check its work

mwgkgk @mwgkgk

10 Apr 2023

I'm not kidding, it's really good. Remember how the 420 latest GPT news reddit post raved about compression? Well "Prolog" as an idea of how to present information is a one word miracle

15,552

David Dohan · Feb 18, 2023 · 11:52 PM UTC

David Dohan

@dmdohan

18 Feb 2023

LMs are pretrained to predict the next token. This description is helpful to build intuition, but it’s no longer quite accurate for RL fine tuned models.

Kamal Ndousse

@kandouss

18 Feb 2023

I think "predicting the word that comes next" is a good description of what pretrained LMs (base models) do. But the description is much less apt after base models are fine-tuned with reinforcement learning.

21,009

David Dohan · Jun 1, 2025 · 3:32 AM UTC

David Dohan

@dmdohan

1 Jun 2025

How to code a side project in 2025:
1. May 31 - Write project spec
2. Procrastinate 6 months
3. Dec 31 - ask favorite AI to implement it

4,426

David Dohan · Jun 29, 2023 · 8:58 PM UTC

David Dohan

@dmdohan

29 Jun 2023

Replying to @alitaylor

Paraxanthine! 80% of caffeine metabolizes to it, rest to theobromine/theophylline. All 4 are xanthines which block adenosine. The "4 hour half life" of caffeine doesn't include processing the 3 stimulants it turns into. Rarebird has px coffee & there are preworkouts/energy drinks

7,041

David Dohan · Dec 12, 2024 · 11:10 PM UTC

David Dohan

@dmdohan

12 Dec 2024

At NeurIPS this week Come talk about test time compute, compound systems, reasoning, ai for science, alignment, the impending singularity, and just how weird this is all getting. 2022 was a simpler time. Now we get to live in interesting times.

Shane Gu

@shaneguML

16 Nov 2024

NeurIPS 2022. Here is a photo of @dmdohan @_sholtodouglas @jimmybajimmyba 1 day after the release of ChatGPT, before post-ChatGPT capitalism tore them apart to OpenAI, DeepMind, and xAI. Post-AGI is when they can play Jenga again😔. I'll attend NeurIPS 2024 and am excited to meet with both old and new people. Please find me if you are interested in Gemini, post-training, AGI timeline and Silicon Valley drama, a based RL person's perspective on scaling laws, symbolic vs physical intelligence, multilinguality team (has headcounts), angel investing strategies for Silicon Valley and Tokyo, how I ended up demoing GPT-4 to Japanese Prime Minister at OpenAI, opportunities of working from Tokyo, etc.

6,181

David Dohan · Sep 12, 2024 · 5:20 PM UTC

David Dohan

@dmdohan

12 Sep 2024

o1 ranks in the top 500 students for AIME -> would qualify for the USA Math Olympiad. Coding @ the IOI: a variant scores at median among contestants, and an oracle among 10,000 samples per problem would receive a gold medal. On GPQA it achieves 78%, compared to 70% for PhDs

15,605

David Dohan · Sep 12, 2024 · 5:25 PM UTC

David Dohan

@dmdohan

12 Sep 2024

We've entered a new paradigm which allows scaling test-time compute alongside train-time compute, so the model can spend more time and achieve better results. Check out the research blog with details: openai.com/index/learning-to…

6,143

David Dohan · Nov 6, 2022 · 8:08 PM UTC

David Dohan

@dmdohan

6 Nov 2022

Replying to @dmdohan @summeryue0 @todor_m_markov

Gotta bring a “No AI room” poster to @NeurIPSConf to create an oasis at events

David Dohan · Dec 20, 2024 · 9:34 PM UTC

David Dohan

@dmdohan

20 Dec 2024

Caveat on the Tao quote: that refers to the hardest "research" split of the dataset, while the 25% is across the entire dataset.

Jaime Sevilla

@Jsevillamol

20 Dec 2024

Replying to @GarrisonLovely

To clear a possible misunderstanding: the quotes refer to questions in the highest tier of difficulty of FrontierMath. Not every question in the benchmark is as difficult as the ones Tao and Gowers reviewed.

7,259

David Dohan · May 19, 2024 · 11:23 PM UTC

David Dohan

@dmdohan

19 May 2024

Love this! There’s so much unexplored space for LM UX

Max

@kinespheric_

19 May 2024

ChatGPT, but with rabbit holes

7,739

David Dohan · Feb 27, 2023 · 2:25 AM UTC

David Dohan

@dmdohan

27 Feb 2023

Who's going to tell Marvin Minsky we put Perceptrons into the Society of Mind

11,198

David Dohan · Nov 22, 2023 · 6:15 AM UTC

David Dohan

@dmdohan

22 Nov 2023

Party in Principle

2,933

David Dohan · May 4, 2023 · 6:44 AM UTC

David Dohan

@dmdohan

4 May 2023

New favorite prompt: "Write like Wittgenstein" The general pattern is: "Be concise. Write like X."

taylor

@tayroga

4 May 2023

Model too verbose? "Write like Wittgenstein"

3,436

David Dohan · Apr 1, 2023 · 9:37 PM UTC

David Dohan

@dmdohan

1 Apr 2023

Authorship ordering is a challenging problem. In "Academic Author How Names Order To", the authors propose several groundbreaking solutions for this well studied yet thus far intractable task

18,335

David Dohan · Sep 12, 2024 · 5:32 PM UTC

David Dohan

@dmdohan

12 Sep 2024

Practically, thinking in plain language opens up a ton of possibilities. On safety & alignment, the model is more reliable because it can reason about policies and available choices before responding, and we are able to inspect its thinking and look for why something happened

8,850

David Dohan · Dec 14, 2024 · 11:04 PM UTC

David Dohan

@dmdohan

14 Dec 2024

“Calling it attention is anthropomorphizing. If they called it ‘Kernel Smoothing is All You Need’ that would not have grabbed the imagination nearly as much” -Ted Chiang @ creativity and AI NeurIPS workshop

4,560

David Dohan · Dec 20, 2024 · 7:10 PM UTC

David Dohan

@dmdohan

20 Dec 2024

An encouraging aspect of the o3 series is that the model can explicitly think about safety and what's OK, leading to more robustness all around

Eric Wallace

@Eric_Wallace_

20 Dec 2024

Chain-of-thought reasoning provides a natural avenue for improving model safety. Today we are publishing a paper on how we train the "o" series of models to think carefully through unsafe prompts: openai.com/index/deliberativ…

21,908

David Dohan · Dec 20, 2024 · 6:30 PM UTC

David Dohan

@dmdohan

20 Dec 2024

Details on the score arcprize.org/blog/oai-o3-pub…

OpenAI o3 Breakthrough High Score on ARC-AGI-Pub | ARC Prize

OpenAI o3 scores 75.7% on ARC-AGI public leaderboard.

arcprize.org

16,668

David Dohan · Mar 23, 2023 · 5:18 PM UTC

David Dohan

@dmdohan

23 Mar 2023

ChatGPT can now use tools through AI Plugins: openai.com/blog/chatgpt-plug…
1. Browsing: Search web to answer questions (WebGPT)
2. Code Interpreter: Write/execute/debug (sandboxed) Python to test/analyze/...
3. Interface with services like Kayak/WolframAlpha/Zapier, or ones you create!

OpenAI

@OpenAI

23 Mar 2023

We are adding support for plugins to ChatGPT — extensions which integrate it with third-party services or allow it to access up-to-date information. We’re starting small to study real-world use, impact, and safety and alignment challenges: openai.com/blog/chatgpt-plug…

13,660

David Dohan · Oct 4, 2023 · 1:21 AM UTC

David Dohan

@dmdohan

4 Oct 2023

Time for Good Old Fashioned AI to make a comeback? I enjoyed "Cognitive Architectures for Language Agents" from @tedsumers and @ShunyuYao12 Discussion tomorrow with @hwchase17 and @charles_irl on the evolving world of scaffolds/abstractions around LLMs! arxiv.org/abs/2309.02427

Cognitive Architectures for Language Agents

Recent efforts have augmented large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or...

arxiv.org

Harrison Chase

@hwchase17

3 Oct 2023

Our webinar tomorrow might be my favorite one yet. An absolute MUST JOIN for anyone building chains/agents Guests: @dmdohan - Model Cascades paper author @ShunyuYao12 - ReAct paper author @tedsumers - COALA paper author @charles_irl - top tier educator crowdcast.io/c/v7i2ysxqkbd2

8,114

David Dohan · Nov 13, 2023 · 12:52 AM UTC

David Dohan

@dmdohan

13 Nov 2023

Googled phone # to cancel Citi credit card. Grabbed from generated info box. Called it. Weirdly got different security questions than I had noted but made it through. Request to cancel the card and they don't see it. Realize Google's search LLM gave me Chase's phone number🤦‍♂️

6,647

David Dohan · Dec 20, 2024 · 2:49 AM UTC

David Dohan

@dmdohan

20 Dec 2024

Replying to @nickcammarata

By scaffolding I mean some process beyond just sampling from the model. e.g. the top ones sample tons of programs and filter for ones that solve the examples, or "test-time finetune" on examples for each problem individually arcprize.org/blog/openai-o1-…

OpenAI o1 Results on ARC-AGI-Pub | ARC Prize

How far are the o1 preview and mini models from AGI?

arcprize.org

9,273
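The "sample tons of programs and filter for ones that solve the examples" idea from the reply above can be sketched as follows. This is a toy illustration only: the candidate pool is a hand-written list of hypothetical grid transforms rather than programs sampled from a model, and none of the names come from an actual ARC-AGI solution.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Program = Callable[[Grid], Grid]


# Hypothetical candidate pool; a real system would sample these from a model.
def identity(g: Grid) -> Grid:
    return [row[:] for row in g]


def flip_horizontal(g: Grid) -> Grid:
    return [row[::-1] for row in g]


def flip_vertical(g: Grid) -> Grid:
    return [row[:] for row in g[::-1]]


def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]


CANDIDATES: List[Program] = [identity, flip_horizontal, flip_vertical, transpose]


def filter_programs(
    candidates: List[Program], examples: List[Tuple[Grid, Grid]]
) -> List[Program]:
    """Keep only the candidates that reproduce every demonstration pair."""
    return [p for p in candidates if all(p(x) == y for x, y in examples)]


# Made-up demonstration pairs (input grid, expected output grid).
examples = [
    ([[1, 0], [0, 0]], [[0, 1], [0, 0]]),
    ([[1, 2, 3]], [[3, 2, 1]]),
]
survivors = filter_programs(CANDIDATES, examples)

# Apply the surviving programs to a held-out test input.
test_input = [[5, 0, 0], [0, 6, 0]]
predictions = [p(test_input) for p in survivors]
print([p.__name__ for p in survivors], predictions)
```

The filter is the whole scaffold here: the demonstration pairs act as a verifier, so only candidates consistent with every example contribute a prediction.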

David Dohan · May 17, 2024 · 4:58 PM UTC

David Dohan

@dmdohan

17 May 2024

💜

Jan Leike

@janleike

17 May 2024

Replying to @janleike

To all OpenAI employees, I want to say: Learn to feel the AGI. Act with the gravitas appropriate for what you're building. I believe you can "ship" the cultural change that's needed. I am counting on you. The world is counting on you. :openai-heart:

6,795

David Dohan · Jul 21, 2022 · 8:35 PM UTC

David Dohan

@dmdohan

21 Jul 2022

@ ICML workshops til Sunday! Come by beyond-bayes.github.io workshop Friday @ 9:40am for our talk, with posters @ 5pm. You'll learn how probabilistic programming lets us formalize models talking to models ("model cascades"), unifying many approaches to prompting and inference.

Beyond Bayes: Paths Towards Universal Reasoning Systems

ICML Workshop, July 22, 2022, Baltimore Convention Center, Ballroom 2 (Level 400)

beyond-bayes.github.io

David Dohan · Nov 4, 2022 · 7:21 PM UTC

David Dohan

@dmdohan

4 Nov 2022

Prompt engineering was fun while it lasted

Keiran Paster

@keirp1

4 Nov 2022

Can large language models write prompts…for themselves? Yes, at a human-level (!) if they are given the ability to experiment and see what works. arxiv.org/abs/2211.01910 with @Yongchao_Zhou_, @_AndreiMuresanu, @ziwen_h, @silviupitis, @SirrahChan, and @jimmybajimmyba (1/7)

David Dohan · Sep 12, 2024 · 5:53 PM UTC

David Dohan

@dmdohan

12 Sep 2024

Above all, I can't wait to see what you all do with o1 On many tasks it will feel similar to GPT4. But on hard problems where it really shines, it's like nothing else For inspiration, here are a few videos showing experts using it for their use cases

swyx @aiDotEngineer WF Day 1 @swyx

12 Sep 2024

🎉Congrats to @OpenAI for releasing o1:
- Economics: @tylercowen asked o1 basically to write a college essay
- Genetics: @catbrownstein asked o1 to help her reason through "n of 1" cases: medical cases that nobody has ever seen
- Physics: @mariokrenn6240 used o1 to draft and reason through complex quantum physics equations
- Code: @ren_hongyu prompted a full snake game and it was generated zero shot, working perfectly, and obeyed instructions to add obstacles

3,101

David Dohan · Dec 20, 2024 · 6:48 PM UTC

David Dohan

@dmdohan

20 Dec 2024

FrontierMath details: arxiv.org/html/2411.04872v1

8,205

David Dohan · Mar 19, 2023 · 8:22 PM UTC

David Dohan

@dmdohan

19 Mar 2023

The HF0 crew have made what I can best describe as a tech monastery in the heart of San Francisco. Hard to imagine a more focused environment. Apply if you want 3 incredibly focused months to build on your projects!

Dave Font

@davefontenot

15 Mar 2023

GPT4 launched yesterday. Today, HF0 launches: HF0.com (1/n)

9,387

David Dohan · Jul 2, 2022 · 6:59 PM UTC

David Dohan

@dmdohan

2 Jul 2022

Teaching Minerva🦉 math & science has been a ton of fun. What else were we supposed to do after realizing all the LaTeX on arXiv is available? Check out the sample explorer: minerva-demo.github.io paper: arxiv.org/abs/2206.14858

alewkowycz @alewkowycz

30 Jun 2022

Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. goo.gle/3yGpTN7

David Dohan · Aug 18, 2021 · 1:55 AM UTC

David Dohan

@dmdohan

18 Aug 2021

New paper on program synthesis with large language models (244M-137B). We investigate: (1) how scaling improves performance on Python and math tasks (2) whether the models can predict output of executing code (3) human-computer collaboration to write programs via conversation

David Dohan · Feb 15, 2023 · 1:36 AM UTC

David Dohan

@dmdohan

15 Feb 2023

Replying to @tszzl @izzyz

😉

2,580

David Dohan · Mar 21, 2024 · 5:28 PM UTC

David Dohan

@dmdohan

21 Mar 2024

Replying to @moebio

Cool! Is each point an embedding of the sentence up to that word? If you haven't already, worth having a look at Loom which is an interface for branching interactions with LMs. Has some related interfaces for visualizing paths. generative.ink/meta/block-mu… github.com/socketteer/loom

3,302

David Dohan · Apr 6, 2024 · 10:53 PM UTC

David Dohan

@dmdohan

6 Apr 2024

Everyone seems to have their own bar for what will qualify as AGI. Best I can tell it usually cashes out to "I'll know it when I see it". Fan of @RichardMCNgo's "time-AGI" frame for placing abilities on a spectrum rather than making binary distinctions. We're in the fog now & reasonable people will debate. Of course progress across domains is not uniform, but on average I'd say we're comfortably past second-AGI, and hovering around minute to hour for many areas. I think of this as a rising waterline, with most areas rising at the same rate but some vastly superhuman spikes, usually around processing huge amounts of information. Can further consider ability @ given price point: there are plenty of things models can already do today that are simply not economical. nitter.app/RichardMCNgo/sta…

Richard Ngo

@RichardMCNgo

4 Apr 2023

Instead of treating AGI as a binary threshold, I prefer to treat it as a continuous spectrum defined by comparison to time-limited humans. I call a system a t-AGI if, on most cognitive tasks, it beats most human experts who are given time t to perform the task. More details:

4,546

David Dohan · Oct 31, 2022 · 9:01 PM UTC

David Dohan

@dmdohan

31 Oct 2022

WebGPT by prompting only. Waiting for an API that lets us do prompt tuning/soft prompting (gradient based continuous z tuning) to make this even easier

Dust

@DustHQ

31 Oct 2022

WebGPT reproduced from advanced prompting only. Dust-based web-search assistant demo answers questions by searching the web, summarizing content and compiling a final answer with references: dust.tt/spolu/a/41770fd3d9

David Dohan · Apr 2, 2024 · 3:47 AM UTC

David Dohan

@dmdohan

2 Apr 2024

1 token is all you need

4,605

David Dohan · Feb 27, 2023 · 1:09 AM UTC

David Dohan

@dmdohan

27 Feb 2023

Replying to @typedfemale

You’re in luck! @shakir_za, @MihaelaCRosca, @mfigurnov, and @AndriyMnih wrote just that about 3 families of approximations (“the pathwise, score function, and measure-valued gradient estimators”) arxiv.org/abs/1906.10652

Monte Carlo Gradient Estimation in Machine Learning

This paper is a broad and accessible survey of the methods we have at our disposal for Monte Carlo gradient estimation in machine learning and across the statistical sciences: the problem of...

arxiv.org

1,440

David Dohan · Sep 6, 2023 · 5:33 PM UTC

David Dohan

@dmdohan

6 Sep 2023

Come see what's brewing @OpenAI

OpenAI

@OpenAI

6 Sep 2023

We’ll be hosting our first developer conference, OpenAI DevDay, on November 6. Registration to attend in person in San Francisco will open in a few weeks. We’ll also livestream the keynote. openai.com/blog/announcing-o…

3,189

David Dohan · Apr 11, 2023 · 5:05 AM UTC

David Dohan

@dmdohan

11 Apr 2023

By letting an LM parse natural -> formal language, we get the best of both worlds: the formal system checks consistency of the natural language reasoning. LM = fast system 1, Prolog etc = slow system 2. @Maxwell_Nye has neat work exploring the combo: arxiv.org/abs/2107.02794

1,618

David Dohan · Oct 17, 2024 · 4:08 AM UTC

David Dohan

@dmdohan

17 Oct 2024

look at your data

jason

@jxnlco

16 Oct 2024

anyone: every experienced ai engineer:

3,196

David Dohan · Dec 13, 2024 · 7:34 AM UTC

David Dohan

@dmdohan

13 Dec 2024

2,648

David Dohan · Oct 4, 2022 · 6:02 AM UTC

David Dohan

@dmdohan

4 Oct 2022

Replying to @jekbradbury @ylecun

Also check out primer.ought.org - does an excellent job of demonstrating factored cognition (~latent variable models) with LLMs. It does not have explicit probabilistic inference yet.

Factored Cognition Primer | Primer

How to write compositional language model programs

primer.ought.org

David Dohan · Mar 14, 2023 · 5:21 PM UTC

David Dohan

@dmdohan

14 Mar 2023

The rate of progress is astounding. Where do we land after 2 more comparable leaps?
June 11, 2020: GPT-3
March 14, 2023: GPT-4
Jan 1, 2026: ???
Jan 1, 2029: !?!?!?!?

2,317

David Dohan · Sep 17, 2020 · 2:44 AM UTC

David Dohan

@dmdohan

17 Sep 2020

Want Bespoke, but for everything (especially neural network structures) github.com/awwbees/BespokeSy…

GitHub - awwbees/BespokeSynth: Software modular synth

Software modular synth. Contribute to awwbees/BespokeSynth development by creating an account on GitHub.

github.com

Ryan Challinor @awwbees@post.lurk.org @awwbees

25 May 2020

more playing around with livecoding python in bespoke. I added a nice "note stream" module for visualization, which is very useful for understanding what you're doing in live generative composition.

David Dohan · May 12, 2020 · 4:54 AM UTC

David Dohan

@dmdohan

12 May 2020

Built a few graph viz tools on top of the @rem_note API in @observablehq. observablehq.com/@dmrd/remno… Read-only view for now. Next up: extend to whole knowledge bases & allow directly manipulating content inside the graph! What else would you like to see?

David Dohan · Aug 31, 2022 · 11:56 PM UTC

David Dohan

@dmdohan

31 Aug 2022

Replying to @ericjang11 @OfirPress

Can fine-tune copies of a base model on different data and average their weights, or use the resulting models as a mixture of experts.

Margaret Li @margs_li

8 Aug 2022

Train an LM made of independent expert LMs (no syncs! no shared params!) ➡️ ➕ new or ➖ existing experts. At. Any. Time. ➡️ Ensemble OR parameter average(!!) to outperform dense & sparse LMs & ensemble baselines with less compute, a fraction of the simultaneous GPU usage. 🌳/n
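A minimal sketch of the weight-averaging option mentioned in the reply above, assuming each fine-tuned model is just a dict mapping parameter names to equally shaped lists of floats. The name `average_weights` and the two toy "experts" are hypothetical; real checkpoints would hold framework tensors, and this is not the method from the quoted work.

```python
from typing import Dict, List

Params = Dict[str, List[float]]

def average_weights(models: List[Params]) -> Params:
    """Uniformly average parameters across models that share one architecture."""
    if not models:
        raise ValueError("need at least one model")
    averaged: Params = {}
    for name in models[0]:
        # zip(*...) pairs up the i-th entry of this parameter across all models.
        columns = zip(*(m[name] for m in models))
        averaged[name] = [sum(col) / len(models) for col in columns]
    return averaged

# Two toy "experts" fine-tuned from the same base (values are made up).
expert_a = {"w": [1.0, 2.0], "b": [0.0]}
expert_b = {"w": [3.0, 4.0], "b": [1.0]}

merged = average_weights([expert_a, expert_b])
print(merged)
```

The mixture-of-experts alternative would instead keep every model intact and combine their output distributions at inference time, at the cost of running all of them.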

David Dohan · Jun 30, 2022 · 2:46 AM UTC

David Dohan

@dmdohan

30 Jun 2022

Has science gone too far?

BigScience Large Model Training @BigScienceLLM

29 Jun 2022

▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 101%

David Dohan · Feb 19, 2023 · 9:57 PM UTC

David Dohan

@dmdohan

19 Feb 2023

Replying to @tszzl

Alignment is the ultimate capability

2,676

David Dohan · Jul 2, 2022 · 6:35 PM UTC

David Dohan

@dmdohan

2 Jul 2022

Manifold Markets had <45% likelihood of the MATH dataset hitting 1/2 correct before 2025. Our work on 🦉Minerva resolved it to success 3 years early: manifold.markets/MatthewBarn…

Will a machine learning model score above 50.0% on the MATH dataset before 2025?

Resolved YES. From Hendrycks et al (https://arxiv.org/abs/2103.03874), > Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers....

manifold.markets

Vedant Misra

@vedantmisra

1 Jul 2022

📈

David Dohan · Sep 12, 2024 · 5:58 PM UTC

David Dohan

@dmdohan

12 Sep 2024

Not there yet, but someday we'll have models that can think for as long as needed (and interact with the world) to solve the hard problems that really matter to society. What will you have it think about?

Noam Brown

@polynoamial

12 Sep 2024

Replying to @polynoamial @rao2z

@OpenAI's o1 thinks for seconds, but we aim for future versions to think for hours, days, even weeks. Inference costs will be higher, but what cost would you pay for a new cancer drug? For breakthrough batteries? For a proof of the Riemann Hypothesis? AI can be more than chatbots

1,724

David Dohan · Nov 11, 2022 · 4:10 AM UTC

David Dohan

@dmdohan

11 Nov 2022

Congratulations to the Metaphor team for launching! It's a different way of building a search engine. You "search by prompting" - instead of asking a question, phrase it so the natural completion would give the answer like: "My favorite personal webpages on the internet are"...

Exa

@ExaAILabs

10 Nov 2022

metaphor.systems is now publicly available! Metaphor is a search engine based on generative AI, the same sorts of techniques behind DALL-E 2 and GPT-3 1/

David Dohan · Mar 23, 2024 · 12:08 AM UTC

David Dohan

@dmdohan

23 Mar 2024

It's ok AI, I also can't sightread sexquinquagintillion

2,681

David Dohan · Apr 5, 2019 · 2:37 AM UTC

David Dohan

@dmdohan

5 Apr 2019

Look forward to interfaces that let designers work with generative models like this neat SVG generator.

rapha

@rapha_gl

5 Apr 2019

My 1st @GoogleAI Residency paper is finally on arxiv! We train a powerful generative model of fonts as SVG instead of pixels. This highly structured format enables manipulation of font styles and style transfer between characters at arbitrary scales! 👉🏽 arxiv.org/abs/1904.02632

David Dohan · Dec 28, 2024 · 1:21 AM UTC

David Dohan

@dmdohan

28 Dec 2024

Replying to @rayefull

David Dohan

@dmdohan

20 Aug 2021

Replying to @mollyfmielke

There's evidence for it: "In all cases, with exception of S9, they report having owned 1-of-3 toys widely sold by Fisher-Price between 1972 and 1989". Anecdotally, a friend traces some number colors to the license plate on the family car. neurocritic.blogspot.com/201… study: ncbi.nlm.nih.gov/pmc/article…

202

David Dohan · Nov 7, 2022 · 2:11 AM UTC

David Dohan

@dmdohan

7 Nov 2022

The "No AGI zone" shirt looks more useful by the day.

David Dohan

@dmdohan

31 Oct 2022

Replying to @avitaloliver @savvyRL @FelixHill84

Would you like any “No AGI zone” t-shirts? Even better if it’s reversible with “Let’s talk about AI”

David Dohan · Jul 25, 2023 · 1:14 AM UTC

David Dohan

@dmdohan

25 Jul 2023

Come by the 11am posters on Wednesday to learn how irrelevant context affects LLMs:

David Dohan

@dmdohan

17 Feb 2023

6,071

David Dohan · Dec 12, 2024 · 10:58 PM UTC

David Dohan

@dmdohan

12 Dec 2024

How many minutes of thinking is acceptable to cure cancer?

Nat McAleese

@__nmca__

12 Dec 2024

the test time compute era is weird — before too long someone will be saying “sure it’s AGI, but it had to think for twenty minutes before curing cancer so…”

2,866

David Dohan · Mar 1, 2023 · 2:28 AM UTC

David Dohan

@dmdohan

1 Mar 2023

Replying to @dmdohan @OpenAI @Google

Amused that the media beat me to the announcement

5,241

David Dohan · Apr 18, 2023 · 5:31 AM UTC

David Dohan

@dmdohan

18 Apr 2023

Replying to @andrewwhite01

The "I Can't Believe It's Not Better" workshop @ NeurIPS does this! So many beautiful ideas with the tiny problem that they don't actually work (yet?) icbinb.cc @ICBINBWorkshop

893

David Dohan · Mar 19, 2023 · 7:02 AM UTC

David Dohan

@dmdohan

19 Mar 2023

Replying to @typedfemale

Favorite Twitter bio

1,016

David Dohan · Jul 26, 2021 · 3:10 AM UTC

David Dohan

@dmdohan

26 Jul 2021

Had a chance to discuss the state of natural language processing & potential applications toward an "IDE for thought" with @AthensResearch last month. @PsionicaOrg demoed Dual, which provides a natural language interface over a knowledge base. Recording: piped.video/watch?v=Oxbv9Enh…

Athens Community Call 6/27/2021: AI, NLP, and text mining workflows in Athens

Featuring: David Dohan, Research Engineer at Google Brain; Paul Bri...

youtube.com

Athens 🏛@AthensResearch

27 Jun 2021

For today's community call in 40 minutes, @dmdohan (Google Brain) will be chatting about how we might apply AI/NLP/GPT-3 to Athens. Paul Bricman is joining to talk about his project: psionica.org/ A preview of the call is here: github.com/athensresearch/at… Don't miss this!!

David Dohan · Dec 20, 2024 · 7:06 PM UTC

David Dohan

@dmdohan

20 Dec 2024

Which is why robust safety testing is all the more important. Sign up to help redteam o3(-mini)! openai.com/index/early-acces…

Early access for safety testing

We're offering safety and security researchers early access to our next frontier models.

openai.com

3,016

David Dohan · Nov 24, 2023 · 10:01 PM UTC

David Dohan

@dmdohan

24 Nov 2023

i am not a very good language model =\ the site is also subtly broken in a few ways (not all words are allowed tokens, some correct guesses marked as wrong, ...) still good way to build intuition! anyone know who actually made this?

4,658