Andrew White 🐦‍⬛ · Feb 21, 2026 · 1:57 AM UTC

Pinned Tweet

Andrew White 🐦‍⬛

@andrewwhite01

Feb 21

After a few years of procrastination, I've updated my textbook. Changes: 1. TensorFlow -> PyTorch 2. Dark mode 3. Added scaffold split section 4. Fixed many typos

665

31,139

Andrew White 🐦‍⬛ · Feb 10, 2022 · 4:53 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

10 Feb 2022

I've been writing research articles for over 10 years now and one of the hardest parts is writing consistently and efficiently without procrastinating. I'm going to share some of my tips here 🧵 1/10

1,348

11,408

Andrew White 🐦‍⬛ · Feb 10, 2022 · 4:53 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

10 Feb 2022

[placeholder for when I've finished]

5,239

Andrew White 🐦‍⬛ · Oct 3, 2021 · 3:17 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

3 Oct 2021

In academia, we don't say "I love you." We say "minor revisions." And I think that's really beautiful.

689

4,944

Andrew White 🐦‍⬛ · Nov 5, 2025 · 3:35 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

5 Nov 2025

After two years of work, we’ve made an AI Scientist that runs for days and makes genuine discoveries. Working with external collaborators, we report seven externally validated discoveries across multiple fields. It is available right now for anyone to use. 1/5

195

541

3,704

922,278

Andrew White 🐦‍⬛ · Feb 25, 2023 · 5:04 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

25 Feb 2023

I packed up a full-text paper scraper, vector database, and LLM into a CLI to answer questions from only highly-cited peer-reviewed papers. Feels unreal to be able to instantly get answers by an LLM "reading" dozens of papers. 1/2

310

2,321

525,432
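The post above names three parts: a scraper, a vector database, and an LLM. As a rough illustration of the middle step only, here is a minimal sketch of retrieving the most relevant passages for a question and packing them into a prompt. Everything in it is an assumption made for illustration: the word-count "embedding", the helper names (`embed`, `cosine`, `top_k`, `build_prompt`), and the sample passages are hypothetical and are not taken from the actual tool, which would use a learned embedding model and a real vector store.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Assemble the text that would be sent to an LLM (the LLM call is omitted)."""
    context = "\n".join(f"- {p}" for p in top_k(question, passages, k))
    return f"Answer using only these excerpts:\n{context}\n\nQuestion: {question}"


# Hypothetical passages standing in for text scraped from papers.
PASSAGES = [
    "peptide binding affinity was measured by isothermal titration calorimetry",
    "the solvent model affects molecular dynamics simulation accuracy",
    "graph neural networks predict solubility from molecular structure",
]

if __name__ == "__main__":
    print(build_prompt("How do neural networks predict solubility?", PASSAGES, k=1))
```

Restricting the prompt to retrieved excerpts is what lets the answer be traced back to specific papers. The ranking function here is deliberately simple so the example stays self-contained.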

Andrew White 🐦‍⬛ · Aug 25, 2022 · 1:33 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

25 Aug 2022

Sometimes people ask me for advice about academia. I have no idea what I'm doing 80% of the time. All my PI proposals in the last 2 years have been rejected. In my entire academic career, I've never been able to get a paper reviewed at JACS, Nat Chem, PNAS, etc. 1/2

103

1,867

Andrew White 🐦‍⬛ · Dec 12, 2023 · 10:52 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

12 Dec 2023

About 75% of doing a PhD is getting your python environment set-up

100

1,949

221,026

Andrew White 🐦‍⬛ · Jun 8, 2025 · 2:57 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

8 Jun 2025

Apple's AI researchers have embraced a kind of anti-LLM cynic ethos, publishing multiple papers trying to argue that reasoning LLMs are somehow limited/cannot generalize. Apple also has the worst AI products (Siri, Apple Intelligence). No idea what their "strategy" is here

244

142

1,904

271,514

Andrew White 🐦‍⬛ · Sep 3, 2022 · 5:48 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

3 Sep 2022

Packaged our algorithm to design binding peptides into a simple colab. Enter a protein sequence, indicate where you want the peptide to bind, and in a few minutes you'll have a peptide predicted to tightly bind via AlphaFold + Bayesian Optimization. 1/2 colab.research.google.com/gi…

276

1,294

Andrew White 🐦‍⬛ · Feb 2, 2023 · 3:34 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

2 Feb 2023

I just paid $60 to embed the text of the entire Lord of the Rings trilogy so I could have GPT answer a question I've wondered my whole evening: Do the people of Middle Earth poop? 1/6

1,294

466,205

Andrew White 🐦‍⬛ · Jun 19, 2023 · 2:01 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

19 Jun 2023

We report a model that can go from natural language instructions, to robot actions, to synthesized molecule with an LLM. We synthesized catalysts, a novel dye, and insect repellent from 1-2 sentence instructions. This has been a seemingly unreachable goal for years! 1/3

237

1,142

508,201

Andrew White 🐦‍⬛ · Jun 6, 2025 · 2:08 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

6 Jun 2025

This is the worst day ever to launch our reasoning model. We're just gonna announce it fresh next week. Everyone please act surprised.

1,019

84,869

Andrew White 🐦‍⬛ · Mar 20, 2023 · 9:12 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

20 Mar 2023

The average PhD student spends 8 months of their academic career learning about CUDA drivers, conda, pip, setuptools, and brew. Feels solved now with GPT-4

988

195,079

Andrew White 🐦‍⬛ · Aug 25, 2022 · 1:33 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

25 Aug 2022

I try to prevent the rejections from interfering with my enthusiasm and curiosity for science. But holy shit, it's hard. And we all keep these failures private, which fucks with your perspective on academia and your brain chemistry. Anyway, happy Thursday #AcademicTwitter 2/2

875

Andrew White 🐦‍⬛ · Nov 4, 2022 · 6:16 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

4 Nov 2022

Just got the official response from unicode about the protein emoji. It was REJECTED 🚫 This is obviously very difficult news for me and the protein emoji team. We're evaluating all our options right now.

867

Andrew White 🐦‍⬛ · Jul 13, 2022 · 3:10 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

13 Jul 2022

We've completed our first draft of a protein emoji that we'll be submitting at end of month. Please let me know if you have feedback and/or show your support for a protein emoji! Art by Michael Osadciw

880

Andrew White 🐦‍⬛ · Sep 8, 2022 · 4:50 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

8 Sep 2022

There are many skills you have to develop as a PI that are not taught in graduate school. The skill that I use most often as a PI is downloading a PDF, inserting my signature, and uploading it. Literally hours of my life are uploading and downloading PDFs.

813

Andrew White 🐦‍⬛ · Jan 20, 2025 · 5:56 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

20 Jan 2025

I believe this rumor, since we've published two papers showing agents doing tasks better than PhD students in literature research, writing Wikipedia articles, and manipulating DNA. Wish we had better PR though😅

The Spectator Index

@spectatorindex

19 Jan 2025

JUST IN: Axios reports that OpenAI is set to announce artificial intelligence super-agents capable of performing tasks at a PhD level

847

126,108

Andrew White 🐦‍⬛ · Mar 16, 2023 · 3:59 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

16 Mar 2023

Can GPT-4 do drug discovery? No, but it can help. Let's walk through GPT-4 proposing new drugs. This is called knowledge-based screening. We're trying to fill a list of plausible compounds that could lead to new drugs based on research papers. 1/n

153

803

209,508

Andrew White 🐦‍⬛ · Dec 31, 2024 · 7:45 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

31 Dec 2024

Finishing 2024 with one more research result! We’ve trained small language agents to do hard sci tasks: engineering proteins, manipulating DNA, and working with sci literature in a new library called Aviary. We beat humans and frontier LLMs on these tasks!

114

786

86,376

Andrew White 🐦‍⬛ · Oct 23, 2024 · 4:48 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

23 Oct 2024

We’ve just finished writing the missing 15,616 Wikipedia articles to get complete coverage of all 19,255 human genes. We used PaperQA2, which has higher accuracy than existing human-written Wikipedia articles, as judged by blinded biology PhD students and postdocs. 1/5

134

713

167,948

Andrew White 🐦‍⬛ · Sep 26, 2025 · 2:56 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

26 Sep 2025

I finished my estimate on required compute to make an atomic-resolution virtual cell: 10^38 FLOPs to simulate a human cell for 1 day. We should be able to do this simulation in 2074 using 200 TW of power. 1/3

696

129,195
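The two figures in the post above (10^38 FLOPs and 200 TW) can be turned into a required throughput and energy efficiency with simple arithmetic. The sketch below does that under one loud assumption: the post does not say how long the run would take in wall-clock time, so `wall_clock_days` is a hypothetical parameter, and the function names are illustrative only.

```python
TOTAL_FLOP = 1e38          # from the post: work to simulate one human cell for 1 day
POWER_W = 200e12           # from the post: 200 TW of power
SECONDS_PER_DAY = 86_400


def required_throughput(wall_clock_days: float) -> float:
    """FLOP/s needed to finish within the assumed wall-clock time."""
    return TOTAL_FLOP / (wall_clock_days * SECONDS_PER_DAY)


def required_efficiency(wall_clock_days: float) -> float:
    """FLOP per joule needed when drawing POWER_W for the whole run."""
    return required_throughput(wall_clock_days) / POWER_W


def energy_joules(wall_clock_days: float) -> float:
    """Total energy consumed over the assumed wall-clock time."""
    return POWER_W * wall_clock_days * SECONDS_PER_DAY


if __name__ == "__main__":
    # Wall-clock durations are assumptions; the post only gives FLOPs and power.
    for days in (1, 30, 365):
        print(
            f"{days:>3} days: {required_throughput(days):.2e} FLOP/s, "
            f"{required_efficiency(days):.2e} FLOP/J, "
            f"{energy_joules(days):.2e} J"
        )
```

If the run were assumed to finish in one wall-clock day, this works out to roughly 1.2e33 FLOP/s and roughly 5.8e18 FLOP per joule. Longer assumed runs lower both requirements proportionally.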

Andrew White 🐦‍⬛ · Dec 17, 2022 · 5:56 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

17 Dec 2022

My mom texted to tell me about ChatGPT. Huge milestone in AI hype cycle

614

83,493

Andrew White 🐦‍⬛ · Mar 14, 2023 · 9:12 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

14 Mar 2023

How did I get GPT-4 to design new drugs? I gave GPT-4 access to small programs and it proposed a new molecule that could be used for leukemia. I did this with @langchain's agent lib. See chain of thought at bottom of GPT-4 paper cdn.openai.com/papers/gpt-4.…

590

113,156

Andrew White 🐦‍⬛ · Jul 27, 2022 · 2:20 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

27 Jul 2022

Working hard on the most important project of my life

ALT cover page of proposal for emoji protein

543

Andrew White 🐦‍⬛ · Jan 3, 2023 · 8:19 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

3 Jan 2023

Here's an embedding of all drugs that entered clinical trials over the last 60 years. You can see that molecules have been getting bigger and becoming more similar.

106

546

98,352

Andrew White 🐦‍⬛ · Jul 23, 2025 · 4:22 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

23 Jul 2025

HLE has recently become the benchmark to beat for frontier agents. We @FutureHouseSF took a closer look at the chem and bio questions and found about 30% of them are likely invalid based on our analysis and third-party PhD evaluations. 1/7

596

128,459

Andrew White 🐦‍⬛ · May 19, 2022 · 7:03 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

19 May 2022

Forget DALL-E, PaLM, M1 PyTorch. This is the biggest advancement in computer science this year: LaTeX math in GitHub github.blog/2022-05-19-math-…

Math support in Markdown

We are pleased to announce that math expressions can now be rendered natively in Markdown on GitHub

github.blog

553

Andrew White 🐦‍⬛ · Feb 14, 2025 · 11:00 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

14 Feb 2025

Molecular dynamics requires a lot of expert knowledge to set-up and analyze simulations. We set out to automate it with LLM agents: MDCrow!

572

43,130

Andrew White 🐦‍⬛ · Apr 26, 2024 · 2:15 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

26 Apr 2024

Here's an open source 1b (1 byte) language model that gets 25% on MMLU. Very powerful relative to model size. I'm raising $2B on a $20B valuation to scale up to 500b model size.

551

53,223
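One way to read the arithmetic behind the post above: MMLU questions are four-option multiple choice, so a "model" that always emits the same single-byte answer scores about 25% when the answer key is roughly balanced. The sketch below illustrates that reading only. The balanced answer key is an assumption, the questions are synthetic placeholders rather than MMLU items, and `one_byte_model` is a hypothetical stand-in, not the model from the post.

```python
def one_byte_model(question: str) -> str:
    """Ignore the question and always answer with the same single byte."""
    return "A"


def accuracy(model, items: list[tuple[str, str]]) -> float:
    """Fraction of (question, correct_option) pairs the model answers correctly."""
    correct = sum(model(question) == answer for question, answer in items)
    return correct / len(items)


# Synthetic placeholder benchmark with a perfectly balanced key (assumption).
ITEMS = [(f"question {i}", "ABCD"[i % 4]) for i in range(1000)]

if __name__ == "__main__":
    size_in_bytes = len(one_byte_model("anything").encode("utf-8"))
    print(f"model output size: {size_in_bytes} byte")
    print(f"accuracy: {accuracy(one_byte_model, ITEMS):.0%}")
```

The point of the sketch is that 25% is the chance baseline for four options, so the score says nothing about capability.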

Andrew White 🐦‍⬛ · Feb 10, 2022 · 6:40 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

10 Feb 2022

Remember, don't let anyone find out I write proposals by playing video games all night with you and then panic at 1am followed by enormous amounts of coffee

486

Andrew White 🐦‍⬛ · Apr 12, 2023 · 12:23 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

12 Apr 2023

Our preprint on using GPT-4 as an agent with tools for chemistry is out! We call it ChemCrow. Working with @SamCox822, @drecmb @pschwllr, we developed a set of tools for synthesis/cond, safety, commercial availability, patents, paper-qa arxiv.org/abs/2304.05376 1/5

122

532

171,978

Andrew White 🐦‍⬛ · Jan 11, 2025 · 3:31 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

11 Jan 2025

Normal Friday night @FutureHouseSF. It wants to go in the lab so bad to get to work.

529

44,833

Andrew White 🐦‍⬛ · Apr 3, 2023 · 2:16 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

3 Apr 2023

I've been exploring if GPT-4 and other models (please give me a key @AnthropicAI!!) can do "algebra" of molecules. Let's see a few examples 1/4 demo: whitead.github.io/svelte-che…

515

151,976

Andrew White 🐦‍⬛ · Nov 1, 2023 · 5:57 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

1 Nov 2023

The last few months, I’ve been helping develop the mission for @FutureHouseSF - a non-profit AI Biology institute. Building on our work with autonomous agent driven chemistry and robotics, we’re building an entire org around the mission to build autonomous scientific agents 1/5

502

122,939

Andrew White 🐦‍⬛ · Oct 28, 2021 · 3:56 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

28 Oct 2021

I'm excited to announce alphafold-inverse🎉! This AI model takes in the output protein structure from alphafold and gives the input amino acid sequence.

481

Andrew White 🐦‍⬛ · Sep 26, 2025 · 2:37 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

26 Sep 2025

This paper is amazing. Their architecture comparison with AlphaFold2 really tells the whole story. Just flow matching with a transformer...

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

24 Sep 2025

SimpleFold: Folding Proteins is Simpler than You Think "we introduce SimpleFold, the first flow-matching based protein folding model that solely uses general purpose transformer blocks. Protein folding models typically employ computationally expensive modules involving triangular updates, explicit pair representations or multiple training objectives curated for this specific domain. Instead, SimpleFold employs standard transformer blocks with adaptive layers and is trained via a generative flow-matching objective with an additional structural term."

479

79,850

Andrew White 🐦‍⬛ · Jul 18, 2021 · 3:30 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

18 Jul 2021

Here's an old render I did in Blender. Just showing relative sizes of some interesting biomacromolecules. Never enough free time now to do stuff like this. Miss it.

436

Andrew White 🐦‍⬛ · Feb 6, 2023 · 4:25 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

6 Feb 2023

R01s are 13 pages of science. If you're a competent faculty member you should type at 100 words per minute - so you should be authoring an R01 every 7 minutes. Add in 10-15% margin for meeting/admin and you should be submitting about 50 R01's everyday.

Sasha Gusev @SashaGusevPosts

5 Feb 2023

Replying to @SashaGusevPosts

How long does it take to write a grant? R01s are 13 pages of science: 3-4 are bespoke (say, 6-8 hours); the rest are aims that can be repurposed from content intended for a manuscript (another 6-8 hours). So ~16 hours dedicated time per grant.

413

145,770

Andrew White 🐦‍⬛ · May 8, 2024 · 12:23 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

8 May 2024

ChemCrow is out today in @NatMachIntell! ChemCrow is an agent that uses chem tools and a cloud-based robotic lab for open-ended chem tasks. It’s been a journey to get to publication and I’d like to share some history about it. It started back in 2022. 1/8

453

82,652

Andrew White 🐦‍⬛ · Dec 14, 2023 · 7:22 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

14 Dec 2023

The tweet: 🚨New AI model predicts if you have HIV infection from video of you breathing with 99% accuracy🚨 The model:

432

61,914

Andrew White 🐦‍⬛ · Jul 2, 2022 · 1:11 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

2 Jul 2022

Finished with molcloud - now you can generate your own cloud of molecules 1. pip install molcloud 2. molcloud [smiles_file] github.com/whitead/molcloud

424

Andrew White 🐦‍⬛ · Sep 11, 2024 · 6:28 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

11 Sep 2024

We used PaperQA2 to extract claims from papers and then see if they're contradicted anywhere in literature. This task is time consuming for humans, but we were able to use this for hundreds of papers to look for trends in disagreement in fields, decades, and journals.

443

47,133

Andrew White 🐦‍⬛ · Feb 25, 2023 · 6:06 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

25 Feb 2023

Replying to @andrewwhite01 @langchain @OpenAI @SemanticScholar

Most of the code (except the interface) is already open source - github.com/whitead/paper-qa

434

22,532

Andrew White 🐦‍⬛ · May 2, 2022 · 2:51 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

2 May 2022

Some more fun with machine generated molecules. Here is a molecule with perfectly satisfied valencies, but I don't think will be synthesized.

411

Andrew White 🐦‍⬛ · Jun 22, 2025 · 3:20 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

22 Jun 2025

I have written up a 3.5k word/10 figure essay on how to write a reward function while avoiding reward hacking for chemistry. It covers all the ridiculous ways we had to avoid reward hacking for training ether0, our scientific reasoning model.

446

56,752

Andrew White 🐦‍⬛ · Feb 9, 2023 · 7:37 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

9 Feb 2023

Read a paper that casually mentioned they trained on 330,000 synthesized molecules at a pharma company. They won't share this data, so it will be replicated over and over. We could reduce animal, economic, and environmental burden so much by sharing data from completed projects.

389

69,108

Andrew White 🐦‍⬛ · Apr 13, 2022 · 6:17 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

13 Apr 2022

I've put together a few of my favorite discussions on the details of doing molecular dynamics. I'll add more as they come. Hopefully they're useful to you! 🧵1/n

403

Andrew White 🐦‍⬛ · May 1, 2025 · 3:29 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

1 May 2025

The plan at FutureHouse has been to build scientific agents and use them to make novel discoveries. We’ve spent the last year researching the best way to make agents. We’ve made a ton of progress and now we’ve engineered them to be used at scale, by anyone. Today, we’re launching the FutureHouse Platform: an API and website to use our AI agents for scientific discovery.

It’s been a bit of a journey!
June 2024: we released a benchmark of what we believe is required of scientific agents to make an impact in biology, Lab-Bench.
September 2024: we built one agent, PaperQA2, that could beat biology experts on literature research tasks by a few points.
October 2024: we proved-out scaling by writing 17,000 missing Wikipedia articles for coding genes in humans.
December 2024: we released a framework and training method to train agents across multiple tasks - beating biology experts in molecular cloning and literature research by >20 points of accuracy.
May 2025: we’re releasing the FutureHouse Platform for anyone to deploy, visualize, and call on multiple agents.

I’m so excited for this, because it’s the moment that we can see agents impacting people broadly. I’m so impressed with the team at FutureHouse for executing our plan in less than 1 year. From benchmark to wide deployment of agents that can exceed human performance on those benchmarks!

So what exactly is the FutureHouse Platform? We’re starting with four agents: precedent search in literature (Owl), literature review (Falcon), chemical design (Phoenix), and concise literature search (Crow). The ethos of FutureHouse is to create tools for experts. Each agent’s individual actions, observations, and reasoning are displayed on the platform. Each scientific source is considered from retraction status, citation count, record of publisher, and citation graph. A complete description of the tools and how the LLM sees them is visible. I think you’ll find it very refreshing to have complete visibility into what the agents are doing.

We’re scientific developers at heart at FutureHouse, so we built this platform API-first. For example, you can call Owl to determine if a hypothesis is novel. So - if you’re thinking about an agent that proposes new ideas, use our API to check them for novelty. Or check out Z. Wei’s Fleming paper that uses Crow to check ADMET properties against literature by breaking a molecule into functional groups. We’ve open sourced almost everything already - including agents, the framework, the evals, and more. We have more benchmarking and head-to-head comparisons available in our blog post. See the complete run-down there on everything.

You will notice our agents are slow! They do dozens of LLM queries, consider 100s of research papers (agents ONLY consider full-text papers), make calls to Open Targets, Clinical Trials APIs, and ponder citations. Please do not expect this to be like other LLMs/agents you’ve tried: the tradeoff in speed is made up for in accuracy, thoroughness and completeness. I hope, with patience, you find the output as exciting as we do!

This truly represents a culmination of a ton of effort. Here are some things that kept me up at night: we wrote special tools for querying clinical trials. We found how to source open access papers and preprints at a scale to get to over 100 PDFs per question. We tested dozens of LLMs and permutations of them. We trained our own agents with Llama 3.1. We wrote a theoretical grounding on what an agent even is! We had to find a way to host ~50 tools, including many that require GPUs (not including the LLMs).

Obviously this was a huge team effort: @m_skarlinski is the captain of the platform and has taught me and everyone at FutureHouse how to be part of a serious technology org. @SGRodriques is the indefatigable leader of FutureHouse and keeps us focused on the goal. Our entire front-end team is just half of @tylernadolsk's time. And big thanks to James Braza for leading the fight against CI failures and teaching me so much about Python. @SidN137 and @Ryan__Rhys, for helping us define what an agent actually is. And @maykc for responding to my deranged Slack DMs for more tools at all times. Everyone at FutureHouse contributed to this in some way, so thanks to them all!

This is not the end, but it feels like the conclusion of the first chapter of FutureHouse’s mission to automate scientific discovery. DM me anything cool you find!

434

38,652

Andrew White 🐦‍⬛ · Jul 3, 2024 · 2:17 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

3 Jul 2024

Having trouble keeping up with language models in science? We just preprinted a 497-reference review introducing and reviewing language models and agents. We narrowed our focus to chemistry/biochem on language models (80!), but cover 24 language model agents across science. 1/2

396

69,239

Andrew White 🐦‍⬛ · Dec 27, 2022 · 8:23 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

27 Dec 2022

I'm going to industry on Jan 1. Will be on academic leave from @UofR and stay affiliated. Excited to try something new and will have more details to share soon!

388

81,149

Andrew White 🐦‍⬛ · Oct 9, 2024 · 5:47 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

9 Oct 2024

Is it time for the protein emoji to return? Can we overcome the past unicode rejection now with the tailwinds of the Nobel Prize? Can proteins be recognized as important as 🫘 or 🙎or 🥀?

392

34,397

Andrew White 🐦‍⬛ · May 6, 2025 · 3:16 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

6 May 2025

We're launching an agent that can do bioinformatics analysis, including repeating analysis from research papers. It is multimodal and results in a complete jupyter notebook (python or R) that ends in a concrete conclusion. Starting with closed-beta now

Sam Rodriques

@SGRodriques

6 May 2025

Introducing Finch, a new agent that fully automates data-driven discovery in biology. We are launching a closed beta for it today (sign up below). This is still early, but impressive, maybe similar to a good 1st yr grad student. In the video, see how it independently reproduces key findings from the Golub Lab's 2020 MetMap paper, including the fact that ADAM28 deletions are associated with breast cancer metastases to brain (fig 4b of the original paper). It also identifies several novel findings not already in the paper, like associations with EFNA5 and PTCH1 amplifications. Importantly, the prompt here is fully open-ended! We just ask the agent to explore the data. Similar to a first year grad student, it makes a bunch of silly mistakes, but also actually ends up finding some really cool stuff. And it works really fast by comparison... 1/3

416

32,439

Andrew White 🐦‍⬛ · Jun 5, 2025 · 3:46 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

5 Jun 2025

At FutureHouse, we’ve noticed scientific agents are good at applying average intelligence across tasks. They always seem to make the obvious choices, which is good, but discovery sometimes requires more intuition and insight than average. We’ve made the first step today towards superhuman insight by training a reasoning model for a specific domain of science: designing drug-like molecules.

We’re releasing a 24B open-weights reasoning model called 𝚎𝚝𝚑𝚎𝚛𝟶. 𝚎𝚝𝚑𝚎𝚛𝟶 has been trained with reinforcement learning to exceed frontier models and human experts across a range of molecular design tasks. 𝚎𝚝𝚑𝚎𝚛𝟶 takes in natural language, reasons in English, and outputs a new molecule. 𝚎𝚝𝚑𝚎𝚛𝟶 is now a tool for our chemistry design agent, Phoenix, which can call upon it to design molecules.

Training a reasoning model for a scientific domain like chemistry, rather than math or programming, required a number of small technical advances. For example, we developed an iterative method of split specialist models and aggregation of reasoning traces. Another example is we used LLMs to rewrite questions that were partially solved.

A major finding from this work is that we can train with >10x efficiency per experimental measurement when using a reasoning model, rather than fine-tuning. We also found that reasoning models can learn new tasks, developed specifically for this paper and not in pretraining corpora. We even saw a task have 0% performance until 100 steps into RL, at which point it was randomly solved once. This, along with our change in modality from natural language to molecules, bodes well for applying reasoning models far from natural language.

Reasoning models in science are the future. Scientific tasks are naturally verifiable rewards: the physical world is the ultimate arbiter of accuracy, rather than human contractors. The data efficiency gain and ability to exceed frontier models with relatively few parameters/compute mean that we should expect more scientific reasoning models soon.

Congrats to team @SidN137, James, @Ryan__Rhys, Albert, @GWellawatte, @maykcaldas, @ludomitch, and @SGRodriques. Thanks to @VoltagePark @nvidia and @huggingface for supporting us, and huge thanks to @ericschmidt for funding @FutureHouseSF.

The model weights, reward model, and new benchmark are open source. You can also read more about scientific reasoning models in our exclusive with Nature.

409

80,739

Andrew White 🐦‍⬛ · Sep 20, 2025 · 7:41 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

20 Sep 2025

Working on a new blog post to estimate when we can simulate cells atom-by-atom. I analyzed 500 papers published in the last 35 years to look at the duration of all-atom simulations and how many atoms can be modeled.

388

43,620

Andrew White 🐦‍⬛ · Aug 7, 2022 · 7:17 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

7 Aug 2022

New preprint on pre-trained models for Bayesian optimization (BO) of sequences! We show LLMs trained on protein seqs can replace Gaussian processes in BO. Examples: BO of peptide inhibitors with AlphaFold and iterative design of proteins. 1/6 biorxiv.org/content/10.1101/…

348

Andrew White 🐦‍⬛ · Mar 3, 2023 · 8:27 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

3 Mar 2023

ChatGPT API for molecular properties: "Why is this molecule soluble?" Trying out our XAI code with new ChatGPT API.

The presence of a carboxylic acid and a hetero N nonbasic/heteroaromatic/aromatic group in this molecule are both positively correlated with its aqueous solubility. The carboxylic acid group is especially important, as it is highly polar and can form hydrogen bonds with water molecules, allowing the molecule to dissolve in water. The hetero N nonbasic/heteroaromatic/aromatic group also contributes to the molecule’s solubility, as it can form hydrogen bonds with water molecules and increase the molecule’s polarity. If either of these groups were absent, the molecule’s aqueous solubility would be significantly reduced.

348

60,157

Andrew White 🐦‍⬛ · Jan 25, 2022 · 8:35 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

25 Jan 2022

Found this ASCII molecule viewer. One of the most creative ideas I've seen and it's sitting at 5 stars on GitHub. Mesmerizing stuff github.com/dewberryants/asci…

344

Andrew White 🐦‍⬛ · Aug 3, 2020 · 1:22 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

3 Aug 2020

Here's a picture showing which amino acids interact based on crystallography data. Blue = strong interaction. This is just a heuristic, but it's a nice reference for thinking beyond "hydrophobic doesn't like hydrophilic". Description: crowsandcats.blogspot.com/20…

316

Andrew White 🐦‍⬛ · Dec 7, 2021 · 9:55 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

7 Dec 2021

Imagine if instead of donating $500M to Harvard or $400M to Stanford, the most well-funded PIs and institutions, donors funded 1,000 high risk $500,000 grants. Or launched 50 institutes at state universities. Harvard could withdraw 1% of their endowment to make-up the diff.

310

Andrew White 🐦‍⬛ · Mar 20, 2023 · 12:11 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

20 Mar 2023

I believe we are underestimating the pace of AI over the next 12 months. The compound effect of people+LLMs will be huge - products will come faster, science will go faster, literature will be mined better. AI hype maybe too high, but we're missing hype on smart people + AI 1/3

321

125,274

Andrew White 🐦‍⬛ · Jun 7, 2023 · 5:18 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

7 Jun 2023

Nature has a policy that all code must be released from papers - but apparently DeepMind has released the "pseudocode" for the algorithm instead of reproducible code. Interesting how AI companies can balance competitive advantage while still getting high impact pubs 1/2

Google DeepMind

@GoogleDeepMind

7 Jun 2023

As society becomes more digital, it’s critical to improve the code powering the world's computing. Today in @Nature, we present AlphaDev, an AI system using reinforcement learning to discover enhanced computer science algorithms. How does it work? 🧵 dpmd.ai/alphadev-tw

305

101,850

Andrew White 🐦‍⬛ · Aug 25, 2022 · 3:29 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

25 Aug 2022

Want to add to this that I'm grateful to @ChemicalScience, @MLSTjournal, and @digital_rsc for giving us a great space to publish some of our most radical papers.

300

Andrew White 🐦‍⬛ · Dec 1, 2022 · 8:51 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

1 Dec 2022

9:00am: @generate_biomed announces protein diffusion model making seemingly incredible folds; people note that it's unknown if these structures can experimentally form. 3:45pm: @_JosephWatson announces Baker Lab experimentally validated protein diffusion models. Gotta keep up!

318

Andrew White 🐦‍⬛ · Mar 31, 2023 · 6:53 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

31 Mar 2023

People in tech are freaking out about LLMs, but not in science yet. GPT-4 can just go from jira ticket to completed code in ~20-40% of the time. It doesn't matter if you ask follow-up questions - you need to completely rethink tech. 1/2

313

160,367

Andrew White 🐦‍⬛ · May 26, 2021 · 10:41 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

26 May 2021

I got tenure!! 🎉Thank you students, mentors, and the wonderful computational chemistry community! It's been a long journey and I'm thrilled to have made it.

293

Andrew White 🐦‍⬛ · Dec 7, 2021 · 7:33 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

7 Dec 2021

According to my machine learning model, these 9 molecules are optimal. Please synthesize and test them. 😂

302

Andrew White 🐦‍⬛ · Apr 19, 2022 · 10:37 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

19 Apr 2022

New publishing concept: every 2-3 years you release a batch of ~10 papers around a theme. Call it an album.

301

Andrew White 🐦‍⬛ · Jun 13, 2023 · 5:59 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

13 Jun 2023

My wife said my whiteboard has reached psychopath-level vibes and it's time to erase.

303

28,788

Andrew White 🐦‍⬛ · Jun 3, 2023 · 1:11 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

3 Jun 2023

Starting next week in my lab, we're going to build an agent that can set-up and execute molecular dynamics simulations. Go from intent to system set-up. Follow along here: github.com/ur-whitelab/md-ag… Example: "Set-up a solvated simulation of GWGYKRR and KRAS"

GitHub - ur-whitelab/MDCrow: Molecular dynamics simulations with an LLM agent

github.com

317

56,089

Andrew White 🐦‍⬛ · May 19, 2023 · 1:11 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

19 May 2023

I wrote a short opinion piece about the impact of large language models in chemistry that came out today in @NatRevChem! I cover agents, hallucinations, XAI, and what I think is next - would love to hear what others think! nature.com/articles/s41570-0…

305

49,860

Andrew White 🐦‍⬛ · Jul 6, 2022 · 5:18 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

6 Jul 2022

I received tenure in July 2021. I then resolved to only work on projects that are truly exciting for me and my group. We went from 2-3 papers per year to 16 in the last 12 months. And they are exciting papers that were a joy to work on. 1/3

291

Andrew White 🐦‍⬛ · Mar 2, 2023 · 3:06 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

2 Mar 2023

I really love equivariant neural networks. There is no field like it - such a blend of graphs, math, quantum mechanics, chemistry, and deep learning. This (very good paper) is citing math papers over 100 years old and deep learning papers from 2022. arxiv.org/abs/2302.03655

299

30,224

Andrew White 🐦‍⬛ · Dec 6, 2023 · 3:00 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

6 Dec 2023

Why make one mistake, when you can automate your workflow to make 10,000 mistakes at once

275

27,725

Andrew White 🐦‍⬛ · Dec 15, 2022 · 7:29 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

15 Dec 2022

What are the hardest problems right now in drug discovery? Has deep learning had any effect?

279

125,113

Andrew White 🐦‍⬛ · Mar 14, 2023 · 5:37 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

14 Mar 2023

I was lucky enough to be a red teamer for GPT-4 and had a great time exploring it for chemistry. It's been tough to keep quiet! Want to thank the excellent team (especially @_lamaahmad!) for taking my feedback seriously about risks and mitigations for LLMs in chemistry. 1/2

283

59,247

Andrew White 🐦‍⬛ · Dec 30, 2022 · 8:20 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

30 Dec 2022

2022 was the best year of my career: 18 papers, each of which I'm proud of, and we started more interesting research. I began learning 3 languages and potty-trained my son (very hard). Many fun side-projects too, like protein emoji and molcloud. The biggest change though 1/3

276

66,712

Andrew White 🐦‍⬛ · Sep 9, 2024 · 2:37 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

9 Sep 2024

we made this: hasanyone.com/

Edison Platform

AI Agents for Scientific Discovery

platform.edisonscientific.com

275

59,265

Andrew White 🐦‍⬛ · Dec 8, 2022 · 2:20 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

8 Dec 2022

Why do molecules smell? We create explanations for why molecules have one of 112 scents. The explanations are distilled into natural language. We're excited about data -> model -> explanation of mols. Work by Aditi (looking for PhD program!), @gandhi_heta, @GWellawatte 1/3

275

Andrew White 🐦‍⬛ · Feb 18, 2022 · 4:55 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

18 Feb 2022

I have a new deaf student in my lab and she’s interested to find specific signs for deep learning and computational chemistry words. Anyone know of a community or resource for this?

244

Andrew White 🐦‍⬛ · Jan 10, 2022 · 5:01 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

10 Jan 2022

I have a postdoc opening on deep learning and simulation of protein structure. If you have expertise in simulation only, I will help you develop your skills in deep learning. Will involve work on AlphaFold. Deadline 2/15/2022. Please RT🙏 #postdocjobs thewhitelab.org/#postdoc

134

261

Andrew White 🐦‍⬛ · Feb 2, 2023 · 3:34 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

2 Feb 2023

However, they are not seen relieving themselves, further suggesting that the people of Middle Earth do not poop. END AI Answer. That's good enough for me, now I can finally get some sleep. 6/6

266

42,950

Andrew White 🐦‍⬛ · Dec 9, 2022 · 7:04 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

9 Dec 2022

New paper giving our perspective on how to explain deep learning models in chemistry. We hope the simplicity and utility are clear, and that more people will start providing explanations for predictions! By @GWellawatte, @gandhi_heta, Aditi chemrxiv.org/engage/chemrxiv…

255

Andrew White 🐦‍⬛ · Apr 12, 2023 · 12:55 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

12 Apr 2023

In-context learning (ICL) with LLMs can do regression w/uncertainty! We show Bayesian opt of molecules and catalyst/rxn conditions purely from text. ICL reqs no training, could do in a browser! Work by @marcporosoff @Kyam888 @MichtavyShane 1/6 arxiv.org/abs/2304.05341

261

87,707

Andrew White 🐦‍⬛ · Apr 4, 2023 · 8:28 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

4 Apr 2023

Here are my unscientific power rankings for LLMs working with SMILES: 1. GPT-4 2. text-davinci-003 3. Claude-v1.2 4. gpt-3.5-turbo 5. claude-instant 6. text-curie-001 7. cohere-command-xlarge You can compare them here: whitead.github.io/svelte-che…

265

44,217

Andrew White 🐦‍⬛ · Sep 19, 2025 · 3:40 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

19 Sep 2025

Our ether0 paper was accepted at NeurIPS 2025! Very proud of the @FutureHouseSF team!

278

20,478

Andrew White 🐦‍⬛ · Sep 7, 2021 · 9:48 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

7 Sep 2021

I teach machine learning for chemistry & materials. I think this is the future of chem, chem eng, mat sci. 5 students enrolled. Am I crazy, at the wrong school, have a bad reputation? I feel like Prof. Rentz from The Chair. Guess I gotta change course name to sex, drugs, & ML

249

Andrew White 🐦‍⬛ · May 20, 2025 · 3:22 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

20 May 2025

FutureHouse's goal has been to automate scientific discovery. Now we used our agents to make a genuine discovery – a new treatment for one kind of blindness (dAMD). We had multiple cycles of hypotheses, experiments, and data analysis – including identifying the mechanism of action.

262

29,051

Andrew White 🐦‍⬛ · Mar 15, 2022 · 4:55 AM UTC

Andrew White 🐦‍⬛

@andrewwhite01

15 Mar 2022

Here's why I'm staying in academia and not going to industry 🧵 I applied to industry at many stages in my academic career and I got rejected literally every time 😂 So congratulations to those of you that are getting industry jobs 🥳

230

Andrew White 🐦‍⬛ · Sep 2, 2021 · 5:44 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

2 Sep 2021

My group is pleased to announce a new project: Emoji Chemistry😀⚗️🎉. We're working on a new periodic table of elements where tired letters 😴 are replaced with ✨emojis✨Here's an early screenshot of the future of chemistry 😎

241

Andrew White 🐦‍⬛ · Oct 30, 2024 · 10:05 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

30 Oct 2024

Every AI chemistry talk ever: "We need more data!" never does anyone say "I'm so grateful for the data we have" or "here are 10 more data points." Just complaint after complaint from these chemists

242

26,653

Andrew White 🐦‍⬛ · Sep 20, 2021 · 4:18 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

20 Sep 2021

My book on deep learning for molecules and materials has a new domain: dmol.pub shortest url I could get

241

Andrew White 🐦‍⬛ · Sep 4, 2021 · 5:58 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

4 Sep 2021

I just solved one of the most difficult open problems in science: how to insert SVGs into @matplotlib figures. Now you can have SVG molecules from @RDKit_org, arbitrarily nested figures, recursive figures, etc. All in beautiful SVG format. I'm retiring now.

ALT Wow! nested figures in SVG

233

Andrew White 🐦‍⬛ · Nov 5, 2025 · 3:35 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

5 Nov 2025

Read the tech report: arxiv.org/abs/2511.02824 Use it: platform.edisonscientific.co… See our new website with much more info: edisonscientific.com/

264

36,081

Andrew White 🐦‍⬛ · Aug 31, 2021 · 3:04 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

31 Aug 2021

1/6 For the last few months @glenhocky and I have been asking what large language models (LLM) can do for chemistry. In our new preprint, we show LLMs know a bit of chemistry and can do a lot: like compute the dissociation curve of H2. arxiv.org/abs/2108.13360

232

Andrew White 🐦‍⬛ · May 1, 2022 · 10:46 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

1 May 2022

Learning to write in LaTeX can be a difficult step in academia. I've put together a guide on how to go from novice to expert. It seems hard, but if you follow these steps you'll immediately earn the respect of your friends and colleagues.

236

Andrew White 🐦‍⬛ · Feb 27, 2023 · 1:18 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

27 Feb 2023

Here's a demo of scraping a GitHub repo and then asking complex questions about the code. Advantages are it's specific to the repo, the LLM can convert between language & code, and it cites specific line numbers. My first @streamlit app :)

242

31,865

Andrew White 🐦‍⬛ · Apr 26, 2022 · 12:10 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

26 Apr 2022

Congrats to @bradyajohnston for really solving the problem of going from molecular dynamics trajectory to Blender geometry. It's going to enable some cool new visualizations/art in computational chemistry. @bradyajohnston's code: github.com/BradyAJohnston/Mo…

231

Andrew White 🐦‍⬛ · Nov 17, 2021 · 1:21 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

17 Nov 2021

Happy International Students' Day 🎉They overcome many extra challenges to become scholars and scientists, and their success at facing them does not appear on CVs. I appreciate their contributions to science and hope we can do more to support them!

216

Andrew White 🐦‍⬛ · Dec 8, 2023 · 6:41 PM UTC

Andrew White 🐦‍⬛

@andrewwhite01

8 Dec 2023

Our first result at @FutureHouseSF! So much to talk about: 1. We quantified hallucinations in major LLMs 2. Benchmarked RAGs on biology questions 3. Beat all models, tied meatbags 4. Wrote a complete human genome Wikipedia. Will tweet more details later... futurehouse.org/wikicrow

PaperQA2: Superhuman scientific literature search | FutureHouse

PaperQA2 is the first AI agent to achieve superhuman performance on a variety of different scientific literature search tasks.

futurehouse.org

238

34,377