Eric Nguyen · Jun 15, 2026 · 12:21 PM UTC

Eric Nguyen

@exnx

Jun 15

Together with my co-founders Michael @MichaelPoli6, Stefano @Massastrello and Armin @athmsx, I am excited to announce @RadicalNumerics is emerging from stealth with a $50M seed round to build general biological intelligence. We’re also sharing an early preview of our new model Omnii, the most powerful genome language model to date.

Omnii preview link: radicalnumerics.ai/blog/radi…

At Radical Numerics, our mission is to master the code of life, and to drive the frontier of biological AI for both design and defense. This is our dual mandate, which comes from something our own team helped make possible.

Our founding team trained Evo and Evo 2, the largest biological AI models (40B params) trained on DNA sequences. Trillions of tokens across all of life, from microbes to mammals. It’s fully open source, and created the field now known as generative genomics.

Last year, scientists used Evo to generate the world’s first complete genome from scratch using AI. Turns out it was a bacteriophage, a type of virus. It functioned in the real world, and in this case it was harmless. But for us, it was a clear turning point. It showed that AI is no longer just analyzing biology. It is on the cusp of generating functional lifeforms.

Eventually, AI will have the power to design and control life itself. That should make all of us incredibly excited, and incredibly uneasy. (Anyone can design DNA with a new function, and have it synthesized and delivered, like something from Amazon Prime.) The same technology that will help us cure cancer is the very technology that might create the next global pandemic, or worse, allow the creation of bioweapons that can wipe out populations.

We believe these forces are inseparable. If you work on the frontier of biology, you have to build technology to safeguard it from its misuse. Existing biosecurity tools are sorely losing the arms race, relying on outdated “have I seen this exact thing before?” style algorithms.

We founded Radical Numerics to turn the tide. And we can’t do that by training on textbooks and natural language. We must understand the language of biology from the raw physical data itself, to reason across every molecule and modality, from DNA to proteins. The next frontier for AI goes far beyond chatbots or video generators to models that can understand and engineer life.

Today, we’re previewing Omnii, which is already far surpassing Evo 2, and will continue improving as we scale and add new modalities (training now).

1. For human health, Omnii can read and write whole genomes (more on writing later). It’s state of the art (SOTA) on detecting causal variants for disease, and can rank Alzheimer's mutations zero-shot. We’re partnering with a diagnostics company to use Omnii for early cancer detection (pancreatic and multi-cancer).

2. For defense, Omnii is SOTA at detecting AI-generated pathogens. We benchmarked existing detection tools, and they simply can’t detect the AI-generated ones (“deepfake viruses”). We’re partnering with a US national lab to pilot Omnii for detecting the next pandemic, both natural and AI-generated.

We have a data center full of Blackwells in construction now to build the most powerful biological AI models ever.

This mission takes a new kind of AI lab that can actually scale on physical, biological data: new alignment research (mid/post training), scaling long context, building out mech interp teams to dissect what these models learn, new architectures and systems designs, all from the ground up.

Our team is made up of AI researchers and scientists from top labs and institutions (e.g. Stanford, MIT, Google DeepMind), but more importantly, we all share the belief that this is the most important challenge of our lifetime. If you feel similarly, we are hiring. We aim to bring the brightest minds in AI and science together to save lives.

Thanks to our partners on this journey, led by Emergence Capital @emergencecap, with Obvious Ventures @obviousvc, Triatomic @TriatomicCap, and Patrick Collison @patrickc. Our advisors include Eric Horvitz @erichorvitz, CSO of Microsoft, Chris Re @HazyResearch of Stanford, George Church @geochurch of Harvard, and Andrew Weber @AndyWeberNCB, former Assistant Secretary of Defense for Nuclear, Chemical and Biological Defense Programs.

Fortune article: fortune.com/2026/06/15/exclu…
Jobs: radicalnumerics.ai/join-us

154

310

1,395

2,549,006

Eric Nguyen · Feb 27, 2024 · 4:25 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

Is DNA all you need? Introducing Evo, a long context 7B foundation model for biology. Evo has SOTA *zero-shot* prediction across DNA, RNA, and protein modalities. Evo can generate DNA, RNA+proteins & make CRISPR-Cas systems for first time. Blog: arc-website-git-foundation-m…

147

675

106,316

Eric Nguyen · Mar 13, 2024 · 6:28 AM UTC

Eric Nguyen

@exnx

13 Mar 2024

Arc and Evo team is hiring ML engineers and software folks. If you're interested in cutting edge bio + ML and making impact, I highly recommend reaching out to Brian and Patrick. We want to go big. @pdhsu @BrianHie

Patrick Hsu

@pdhsu

27 Feb 2024

Replying to @pdhsu

By the way, if you're a software or ML engineer interested in creating life in a computer, we're hiring. Reach out to me and @BrianHie: patrick@arcinstitute.org, brianhie@stanford.edu

335

461,686

Eric Nguyen · Nov 14, 2024 · 8:50 PM UTC

Eric Nguyen

@exnx

14 Nov 2024

Evo has been published in @Science! A true privilege to work with such an amazing team! So many exciting new experimental results in this emerging field of Generative Genomics, including AI generated and *validated* CRISPR-Cas systems and transposons

Science Magazine

@ScienceMagazine

14 Nov 2024

A new Science study presents “Evo”—a machine learning model capable of decoding and designing DNA, RNA, and protein sequences, from molecular to genome scale, with unparalleled accuracy. Evo’s ability to predict, generate, and engineer entire genomic sequences could change the way synthetic biology is done. Learn more in this week's issue: bit.ly/3OsmUPr

259

37,401

Eric Nguyen · Mar 14, 2024 · 4:40 PM UTC

Eric Nguyen

@exnx

14 Mar 2024

🧬🧬🧬 Is learning from DNA a new grand challenge in biology? Check out the new blog on Evo from the ML team “director’s cut”: hazyresearch.stanford.edu/bl… And we always love demos, generate DNA w/ Evo and predict protein structure in 1 click (by Nitro Bio) evo.nitro.bio/ @MichaelPoli6

Learning from DNA: a grand challenge in biology

Is DNA all you need?

hazyresearch.stanford.edu

166

30,119

Eric Nguyen · Sep 14, 2023 · 7:44 PM UTC

Eric Nguyen

@exnx

14 Sep 2023

We gave a talk on HyenaDNA, a genomic foundation model, at OpenBioML ! YouTube link: piped.video/haSkAC1fPX0?si=CmhQ… We walk through how the Hyena operator works, and the intuition behind the implicit long convolutions. Excited to hear people’s feedback :)

104

16,247

Eric Nguyen · Apr 17, 2024 · 10:54 PM UTC

Eric Nguyen

@exnx

17 Apr 2024

Some guy gave a talk on Evo, maybe worth checking out piped.video/watch?v=rKy6O9iQ…

105

8,841

Eric Nguyen · Dec 10, 2024 · 11:49 PM UTC

Eric Nguyen

@exnx

10 Dec 2024

I'm at #NeurIPS2024 in Vancouver all week, excited to chat everything AI for bio + sciences!

100

9,638

Eric Nguyen · Mar 7, 2024 · 5:56 AM UTC

Eric Nguyen

@exnx

7 Mar 2024

I have to say, well done on the wrapper around Evo playground, it's beautiful. I can see more rapid iteration for sequence design happening this way, generation to protein folding in 1 click evo.nitro.bio/

9,241

Eric Nguyen · Mar 20, 2024 · 5:34 PM UTC

Eric Nguyen

@exnx

20 Mar 2024

That was fast, I think they’re the first external team to finetune Evo. Cool application on AAVs

Kenny Workman

@kenbwork

20 Mar 2024

ML models like Evo + AlphaFold are emerging to engineer molecules. It is sometimes unclear how to use these tools to build drugs. We trace a realistic design, build and test loop of virus capsids on LatchBio.

11,562

Eric Nguyen · Sep 22, 2023 · 2:54 AM UTC

Eric Nguyen

@exnx

22 Sep 2023

🎉🧬🎉🧬🎉 #NeurIPS2023 @MichaelPoli6 @marjanfaizi

4,137

Eric Nguyen · Dec 13, 2023 · 6:35 AM UTC

Eric Nguyen

@exnx

13 Dec 2023

HyenaDNA poster went awesome! Honestly had sooooo much fun sharing the work with folks interested and hanging out with the coolest, brightest teammates I could ask for. So proud and happy to work with them, and definitely a highlight of my PhD.

6,910

Eric Nguyen · Feb 27, 2024 · 5:53 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

I 💛 working with Brian. Since when do you have a PI coding harder than you sometimes? 🚀🧬⚡️

Brian Hie @BrianHie

27 Feb 2024

In some new work (the first from the new lab!), we lay out a vision for a biological foundation model that unites DNA, RNA, and protein modalities and operates at molecular, systems, and genome levels of scale. Blog: arcinstitute.org/news/blog/e… Preprint: arcinstitute.org/manuscripts…

13,448

Eric Nguyen · Nov 14, 2023 · 7:05 PM UTC

Eric Nguyen

@exnx

14 Nov 2023

HyenaDNA is now integrated with the HuggingFace Transformer library! 🧬😍🧬😍🧬😍 You can load up a pretrained HyenaDNA model in a few lines of code 🚀 Protip: the HF HyenaDNA model class is the same as regular Hyena, so you can train it on natural language too 😉

Matthew Carrigan @carrigmat

14 Nov 2023

Big genomics news today at @huggingface: We're delighted to welcome HyenaDNA to the Hub! Models: huggingface.co/collections/L… Paper: arxiv.org/abs/2306.15794 Thanks to @HazyResearch @exnx @MichaelPoli6 @marjanfaizi for the model, and for your work on the port! More info in 🧵

12,449

Eric Nguyen · Jan 8, 2024 · 7:54 PM UTC

Eric Nguyen

@exnx

8 Jan 2024

🧬 🚀🧬🚀 I'm incredibly excited to co-organize the 1st MLGenX Workshop!!! (Machine Learning for Genomics Exploration) It will be at ICLR 2024 in Vienna, Austria! Call for papers here: mlgenx.github.io/index.html#…… Awesome speaker line up! Can't wait 🧬😀🧬 @EhsanHRA

9,107

Eric Nguyen · Mar 28, 2024 · 3:34 PM UTC

Eric Nguyen

@exnx

28 Mar 2024

What do mechanistic interpretability & scaling laws have in common? We introduce a method called MAD: Mechanistic Architecture Design. It uses synthetics to hybridize models & do scaling laws on emerging archs, e.g. transf++, Hyena, Mamba (500+ models, up to 7B). arxiv.org/abs/2403.17844

Mechanistic Design and Scaling of Hybrid Architectures

The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training...

arxiv.org

Michael Poli

@MichaelPoli6

28 Mar 2024

📢New research on mechanistic architecture design and scaling laws.
- We perform the largest scaling laws analysis (500+ models, up to 7B) of beyond Transformer architectures to date
- For the first time, we show that architecture performance on a set of isolated token manipulation tasks is correlated with metrics of interest at scale, such as compute-optimal loss. Say hello to fast architecture improvement!
- Striped architectures consistently outperform homogeneous architectures, as they benefit from specialization of each layer type to particular subtasks
An avalanche of other findings in the paper:
📝Paper: arxiv.org/abs/2403.17844
🖥️Repo: github.com/athms/mad-lab

15,119

Eric Nguyen · Jul 26, 2024 · 11:51 PM UTC

Eric Nguyen

@exnx

26 Jul 2024

When I told someone what ICML is like, they said, “so it’s a science fair for adults?” 😂 I had a blast working with these guys and sharing the work in Vienna

2,721

Eric Nguyen · Dec 18, 2024 · 6:06 PM UTC

Eric Nguyen

@exnx

18 Dec 2024

Prompt engineered Evo! It's wild to see Evo used to mine *real* biological machinery beyond what's found in nature, and *validated* in the wet lab. Incredible work led by @aditimerch, @samuelhking & @BrianHie. I think they're on to something

Brian Hie @BrianHie

18 Dec 2024

In new work led by @aditimerch with @samuelhking, we prompt engineer Evo to perform function-guided protein design with high experimental success rates, including designs that go beyond natural sequences. We also release SynGenome, the first AI-generated genomics database. 🧵 1/N

[Animation: Semantic mining with Evo. Credit: Chiara Ricci-Tam, Arc Institute.]

5,000

Eric Nguyen · Apr 22, 2024 · 4:49 AM UTC

Eric Nguyen

@exnx

22 Apr 2024

This Evo talk had better audio quality and some good discussion. Brian also makes a guest appearance :) piped.video/watch?v=VhRBYcCy…

Sequence modeling and design from molecular to genome scale with Evo

youtube.com

5,932

Eric Nguyen · Dec 12, 2023 · 4:44 PM UTC

Eric Nguyen

@exnx

12 Dec 2023

HyenaDNA poster is up! Come chat with us!

5,732

Eric Nguyen · Jul 21, 2024 · 9:02 PM UTC

Eric Nguyen

@exnx

21 Jul 2024

I'm heading to Vienna for ICML to share our work on hybrid architecture design & scaling, which helped us figure out how to train Evo on DNA: MAD: arxiv.org/abs/2403.17844 @MichaelPoli6 @ai_with_brains Excited to talk all things bio LMs too, feel free to reach out

6,075

Eric Nguyen · Nov 13, 2023 · 7:15 PM UTC

Eric Nguyen

@exnx

13 Nov 2023

Can we make convolutional LLMs even faster + longer? Introducing FlashFFTConv! Bringing the innovation from FlashAttention to convolutional LLMs! (Boosts HyenaDNA to 4M context, covering *longest human gene* at single nucleotide res!) Thx for collab @realDanFu @KumbongHermann!

Dan Fu

@realDanFu

13 Nov 2023

Announcing FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores! We speed up exact FFT convolutions by up to 7.93x over PyTorch, reduce memory footprint, and get 4.4x speedup end-to-end. Read on for more details:

Thanks @arankomatsuzaki and @_akhaliq for tweeting it out yesterday :)

The key idea: map the FFT onto tensor cores using a Monarch decomposition -- which allows kernel fusion for long sequences, and uses fast matmul units to compute the FFT (pic 2).

The FFT convolution allows us to compute the convolution in O(N log N) time, instead of O(N^2) from computing it directly (as in PyTorch nn.Conv1d). With advances in gated convolutions & gated SSMs, this means that we have a sub-quadratic alternative to attention that scales well!

But the FFT convolution has a critical flaw for ML - it has low hardware utilization on GPUs. We're talking several times slower than FlashAttention/FlashAttention-v2 until you get to very long sequences.

There are two critical bottlenecks: the FFT convolution incurs a lot of expensive I/O to store intermediate outputs, and it doesn't use tensor cores (16x faster than general arithmetic on A100/H100)! Kernel fusion can partially address the I/O problems (32K & shorter), but you run into SRAM limits at long sequences.

FlashFFTConv addresses these drawbacks, and achieves speedup over FlashAttention-v2 at sequences as short as 2K. The core idea is that a Monarch decomposition allows us to break up the FFT into smaller parts that can be computed using matrix-matrix multiply (pic 3).

This decomposition brings another benefit: Since the FFT is being split into smaller parts, you can avoid SRAM limitations even for long sequences -- we support sequences up to length 4 million.

Now, there's a good old-fashioned systems tradeoff with this Monarch decomposition. You can recurse on the decomposition to break the FFT down into more parts -- compute the FFT in 2 parts, vs 3 parts or 4 parts. Higher-order decompositions reduce your FLOPs, but require more I/O between intermediate steps.

This introduces a natural tradeoff, which we model using a cost model that takes both compute time and I/O time into account (pic 4). (There's another tradeoff for higher-order -- if your sequence is too short, you actually get to matmuls that are too small to fill up your tensor cores).

Our cost model gives us a natural guide -- we can use an order-2 decomposition for sequences up to 4K, then order-3 for sequences up to 32K, and then order-4 for sequences up to 4M. Even longer sequences may require even higher-order breakdowns!

The upshot of all of this is end-to-end speedup and higher utilization. We see up to 4.4x speedup for long sequence models (where a bunch of the gain also comes from memory reduction).

With these gains, long convolutional models are now competitive with FlashAttention-v2 at sequences as short as 2K -- with more speedup for longer sequences.

FlashFFTConv is already being used to support several internal research projects, and we'll be releasing new long-sequence models trained with FlashFFTConv this week -- stay tuned for more!

With great collaborators @KumbongHermann, @exnx, and @HazyResearch. Hermann is applying to PhD programs this year -- he's a beast, you should hire him!

Shoutouts to @tri_dao, @BeidiChen for discussions on early versions of this work and always pushing the boundaries of ML & systems :)

Supported by resources from @StanfordAILab, @StanfordCRFM, @StanfordHAI. Developed in collaboration with the great folks at @togethercompute, and soon available for select models in their new inference API.

Check out our paper, blog, and code for more details:
Paper: arxiv.org/abs/2311.05908
Blog: hazyresearch.stanford.edu/bl…
Code: github.com/HazyResearch/flas…
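The two ideas in this thread, computing a long convolution through the FFT and splitting one FFT into smaller pieces that are plain matrix multiplies, can be sketched in NumPy. This is an illustration only, not FlashFFTConv's kernel code: the function names are hypothetical, the sub-DFTs use dense matrices for clarity, nothing is fused or run on tensor cores, and it assumes the order-2 split has the same structure as the classic Cooley-Tukey factorization N = N1 * N2.

```python
import numpy as np


def direct_causal_conv(u, k):
    """O(N^2) reference: linear convolution truncated to len(u) outputs."""
    return np.convolve(u, k)[: len(u)]


def fft_causal_conv(u, k):
    """O(N log N) convolution: multiply the spectra, then invert."""
    n = len(u)
    fft_size = 2 * n  # zero-pad so circular wrap-around cannot leak in
    spectrum = np.fft.rfft(u, fft_size) * np.fft.rfft(k, fft_size)
    return np.fft.irfft(spectrum, fft_size)[:n]


def dft_matrix(n):
    """Dense n x n DFT matrix with entries exp(-2*pi*i*j*k/n)."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)


def two_step_dft(x, n1, n2):
    """Length n1*n2 DFT built from two smaller DFTs expressed as matmuls."""
    n = n1 * n2
    a = np.asarray(x, dtype=complex).reshape(n1, n2)  # a[i1, i2] = x[n2*i1 + i2]
    b = dft_matrix(n1) @ a                            # length-n1 DFT down each column
    twiddle = np.exp(-2j * np.pi * np.outer(np.arange(n1), np.arange(n2)) / n)
    c = b * twiddle                                   # elementwise phase correction
    d = c @ dft_matrix(n2).T                          # length-n2 DFT along each row
    return d.T.reshape(n)                             # output index k = k1 + n1*k2


rng = np.random.default_rng(0)
signal = rng.standard_normal(64)
kernel = rng.standard_normal(64)
print(np.allclose(fft_causal_conv(signal, kernel), direct_causal_conv(signal, kernel)))
print(np.allclose(two_step_dft(signal, 8, 8), np.fft.fft(signal)))
```

Applying the same split again to one of the smaller DFTs gives the higher-order variants the thread mentions: smaller matrices per step, but more intermediate results to move between steps.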

14,405

Eric Nguyen · Jun 27, 2024 · 2:44 AM UTC

Eric Nguyen

@exnx

27 Jun 2024

Incredibly exciting work from @mgdurrant and @pdhsu ! They're on 🔥🔥🔥

Patrick Hsu

@pdhsu

26 Jun 2024

What if we could universally recombine, insert, delete, or invert any two pieces of DNA? In back-to-back @Nature papers, we report the discovery of bridge RNAs and 3 atomic structures of the first natural RNA-guided recombinase - a new mechanism for programmable genome design

2,638

Eric Nguyen · Nov 7, 2023 · 7:28 PM UTC

Eric Nguyen

@exnx

7 Nov 2023

maybe I am a little too excited for Neurips this year… #hyenadna

6,054

Eric Nguyen · Jul 29, 2024 · 8:16 AM UTC

Eric Nguyen

@exnx

29 Jul 2024

Post icml detour. Go team USA !!! 🇺🇸

2,331

Eric Nguyen · Feb 27, 2024 · 5:52 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

I 💛 working with Patrick

Patrick Hsu

@pdhsu

27 Feb 2024

Is DNA all you need? In new work, we report Evo, a genomic foundation model that learns across the fundamental languages of biology: DNA, RNA, and proteins. Evo is capable of both prediction tasks and generative design, from molecular to whole genome scale.

8,173

Eric Nguyen · Aug 23, 2023 · 3:53 PM UTC

Eric Nguyen

@exnx

23 Aug 2023

I'm excited to be giving a talk on our recent work on HyenaDNA with OpenBioML! It's a public virtual talk, so anyone is welcome. Sept 7, 10am PST: meet.google.com/ftf-vios-rjm. More info on their Discord. Blog: hazyresearch.stanford.edu/bl… Paper: arxiv.org/abs/2306.15794

OpenBioML @openbioml

23 Aug 2023

We are happy to announce the next edition of our journal club, hosted on Thursday September 7th at 7 pm CEST and featuring @exnx for a talk on his recent "HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution" (nitter.app/exnx/status/1674441537…)

11,547

Eric Nguyen · Jul 22, 2023 · 6:59 PM UTC

Eric Nguyen

@exnx

22 Jul 2023

Heading to Honolulu for ICML now! Come talk to us about Hyena (or HyenaDNA) at our poster session :) Poster: Wed 26th at 2pm HST. Oral talk: Thurs 3:48pm HST. I’m here all week, feel free to reach out. Looking forward to all the great research chatter! @MichaelPoli6 @Massastrello

15,962

Eric Nguyen · May 2, 2024 · 4:12 PM UTC

Eric Nguyen

@exnx

2 May 2024

I'll be in Vienna for ICLR '24 starting Sunday - open to chat everything Evo and bio foundation models! Reach out if you're working on cool applications of bio FMs, the crazier the better. I'll also be at the MLGenX workshop: mlgenx.github.io/index.html

2,434

Eric Nguyen · Dec 10, 2023 · 5:49 PM UTC

Eric Nguyen

@exnx

10 Dec 2023

At Neurips all week, SO excited to talk everything long context + bio. Say 👋 at:
1. HyenaDNA poster, Tue 10:45am, #2015
2. GenBio workshop, I’ll be moderating! So many cool papers & topics. The field is 🔥⚡️🔥
3. @michaelpoli + Hyena team'll be here, jazzed abt StripedHyena-7B :)

6,731

Eric Nguyen · Jun 10, 2024 · 5:50 PM UTC

Eric Nguyen

@exnx

10 Jun 2024

I'm giving a talk on Evo at Gordian Bio's happy hour this Wed evening, come say hello :) link: lu.ma/h2106rjx?tk=mTrGTC @GordianBio @MartinBJensen

4,734

Eric Nguyen · Jan 18, 2024 · 3:42 AM UTC

Eric Nguyen

@exnx

18 Jan 2024

🧬🚀🧬If you work at the intersection of ML & genomics, we're looking for reviewers at the 1st MLGenX workshop, ICLR 2024. Great excuse to go to Austria 😀 & get involved in the community + see latest research. Papers due Feb 4, reviews happen Feb 5-19. mlgenx.github.io/index.html

MLGenX Workshop @ ICLR 2026 @MLGenX

16 Jan 2024

Are you a researcher in ML for functional genomics and biological discovery? We're looking for reviewers to help us between 5 - 19 Feb. If you're interested, please fill out this form: forms.gle/tZoUCQJXYHVjSe6Z8 #MLGenX #ICLR2024

2,193

Eric Nguyen · Mar 20, 2024 · 2:41 AM UTC

Eric Nguyen

@exnx

20 Mar 2024

Fascinating discussion about DNA foundation models and Evo on the podcast. I love hearing public discourse on these topics and the bigger picture, well said @velvetatom :)

Tess van Stekelenburg

@velvetatom

20 Mar 2024

Can we use AI to run millions of years of biological evolution with the click of a button? While this future is still several iterations away - had a lot of fun on Securities discussing:
- NVIDIA recent entry in bio
- rise of DNA foundation models
- AI as biological 'search'

10,278

Eric Nguyen · Jul 27, 2023 · 12:32 AM UTC

Eric Nguyen

@exnx

27 Jul 2023

Lots of buzz around Hyena!

6,283

Eric Nguyen · Oct 10, 2023 · 4:40 PM UTC

Eric Nguyen

@exnx

10 Oct 2023

🧬🧬🧬 cool to see folks build on Hyena(DNA) in bio. arxiv.org/abs/2310.02713

5,313

Eric Nguyen · Mar 9, 2024 · 9:47 PM UTC

Eric Nguyen

@exnx

9 Mar 2024

Yunha and Tatta Bio are hiring :) highly recommend taking a look digitalharbor.notion.site/Jo…

Job Board | Notion

Overview

digitalharbor.notion.site

3,010

Eric Nguyen · Jul 9, 2024 · 4:15 PM UTC

Eric Nguyen

@exnx

9 Jul 2024

Cool work on making Hyena next token prediction equivariant

Artem Moskalev @artemmoskalev

9 Jul 2024

🏄‍♂️Long-convolutional models go equivariant! Check our new work on SE(3)-Hyena for scalable equivariant learning. Can equivariantly process up to 3.5M tokens with global (aka all-to-all) context on a single A10 GPU. To appear at @GRaM_workshop. Paper: arxiv.org/abs/2407.01049 1/5

2,196

Eric Nguyen · Feb 27, 2024 · 4:25 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

We’ve made it super easy to try Evo with a web UI hosted on Together AI, with chat-like interface for generating so you can tune generation. huggingface.co/togethercompu…

togethercomputer/evo-1-131k-base · Hugging Face

huggingface.co

2,112

Eric Nguyen · Mar 14, 2024 · 4:52 PM UTC

Eric Nguyen

@exnx

14 Mar 2024

w/ demo video of Evo

12,576

Eric Nguyen · Apr 11, 2024 · 11:26 PM UTC

Eric Nguyen

@exnx

11 Apr 2024

This'll be fun 🧬

Machine Learning for Protein Engineering Seminar @ml4proteins

11 Apr 2024

Next Tuesday, April 16th @ 4 pm ET we're very excited to have @BrianHie @exnx present Evo: sequence modeling and design from molecular to genome scale Read the preprint here: biorxiv.org/content/10.1101/… Sign up on our website to be added to the mailing list to receive Zoom links!

2,989

Eric Nguyen · May 14, 2024 · 7:54 PM UTC

Eric Nguyen

@exnx

14 May 2024

looking forward to the in-person meetup and talking bio FMs and Evo! 🧬 ⚡️

Valence Labs

@valence_ai

14 May 2024

May 16th is the new date for our TechBio meetup @Stanford with @stanfordbiotech. @bertonearnshaw will be there to give an overview of Valence Labs. We’ll also hear from @exnx on Evo, a long-context biological foundation model. RSVP here: portal.valencelabs.com/event…

2,642

Eric Nguyen · Feb 27, 2024 · 4:25 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

Prev bio LMs could only model / generate 1 modality at a time & use short context. DNA is too long & noisy, unclear if a DNA model could do well at tasks for all 3 modalities. Evo is 1st model to be competitive at DNA/RNA/protein tasks against modality-specific models.

2,353

Eric Nguyen · Nov 14, 2024 · 8:59 PM UTC

Eric Nguyen

@exnx

14 Nov 2024

learning from Michael has been a privilege!

Michael Poli

@MichaelPoli6

14 Nov 2024

An absolute privilege to see our work on Evo🧬 highlighted on the cover of the latest issue of Science. Thank you to all the friends and collaborators at Stanford (@StanfordAILab) and the Arc Institute (@arcinstitute) @exnx @BrianHie @pdhsu @HazyResearch @StefanoErmon and more.

A lot has happened since the first release of Evo. We have made public the original pretraining dataset (OpenGenome), links below, and will soon release the entire pretraining infrastructure and model code. We continue to push the scale of what's possible with "beyond Transformer" models applied to biology, in what could be among the most computationally intensive fully open (weights, data, pretraining infrastructure) sets of pretrained models, not only in biology, not only with new architectures, but across AI as a whole.

While it's super exciting to consider the potential impact of Evo on biology, I also think it's valuable to spend a few words on the implications that Evo and related projects have for AI. There has been a flurry of work over the last couple of years (from the great people at @HazyResearch and a few other places) on developing bespoke model designs as "proofs of existence" to challenge the Transformer orthodoxy, at a time when model design was considered "partially solved."

We have seen time and again that various classes of computational units outperform others in different modalities, on different tasks, in different regimes. We've seen this in scaling laws, on synthetics, on inference. It's now undeniable that, with a little bit of creativity, improving scaling is not only approachable but also particularly rewarding, which is exciting at a time when pure train-time scaling appears to be under attack by some (oh no, we are hitting a wall! Rant for another day).

We have the tools to develop and adapt foundation models to ensure they fit the requirements of different modalities. We have the means to remove questionable tokenization schemes and make sure AI remains true to its promise of learning "end-to-end."

And while I'm obviously excited by convolution-attention hyena hybrids due to their balance of efficiency and quality across domains, there's a lot more to do, particularly when considering multiple angles during model design: deployment platforms, inference requirements, target benchmarks, effects of architecture design on sampling, post-training methods... Flexibility is what we need, and I'm looking forward to sharing more soon about how we're thinking about this problem.

1,108

Eric Nguyen · Feb 27, 2024 · 4:25 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

Evo can understand bio function at whole genome level. For in silico gene essentiality test, Evo can predict which genes are essential to organism’s survival based on small DNA mutations, zero-shot + no supervision. Normally done in wet lab in 6-12 mos, but now can be done in secs.

1,792

Eric Nguyen · Feb 27, 2024 · 4:25 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

Evo is trained on 2.7M whole prokaryotic genomes & plasmids
- 300B tokens from raw, unannotated genomic seqs
- using next nucleotide/token prediction
- to accelerate science, we curated & open source OpenGenome, the largest DNA pretraining dataset (in coming days)
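Next-nucleotide prediction treats each base as one token, so a raw sequence yields training pairs simply by shifting it one position. A minimal sketch, assuming a four-letter vocabulary plus an unknown symbol `N`; the names `VOCAB`, `encode`, and `next_token_pairs` are hypothetical and are not taken from Evo's codebase or its actual tokenizer.

```python
# Hypothetical single-nucleotide vocabulary (an assumption for illustration).
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def encode(seq):
    """Map each base to an integer id; unrecognized symbols become N."""
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]


def next_token_pairs(seq):
    """Return (inputs, targets) where targets[i] is the base after inputs[i]."""
    ids = encode(seq)
    return ids[:-1], ids[1:]


inputs, targets = next_token_pairs("ACGTTG")
print(inputs)   # [0, 1, 2, 3, 3]
print(targets)  # [1, 2, 3, 3, 2]
```

A model trained this way needs no annotations: the label for every position is just the next base in the genome.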

2,420

Eric Nguyen · Dec 16, 2023 · 4:57 PM UTC

Eric Nguyen

@exnx

16 Dec 2023

Awesome work led by @realDanFu, it was great to help out on the paper!

Dan Fu

@realDanFu

16 Dec 2023

Today I'm talking about FlashFFTConv at the ENLSP workshop (Efficient Natural Language and Speech Processing)! The talk is at 9:48 AM, and the poster session is from 1:00 to 2:00!

1,470

Eric Nguyen · Aug 1, 2024 · 5:18 PM UTC

Eric Nguyen

@exnx

1 Aug 2024

makes sense to just keep sampling!

Jordan Juravsky

@jordanjuravsky

1 Aug 2024

Do you like LLMs? Do you also like for loops? Then you’ll love our new paper! We scale inference compute through repeated sampling: we let models make hundreds or thousands of attempts when solving a problem, rather than just one. By simply sampling more, we can boost LLM performance across a range of math and coding tasks, allowing weaker models (ex. Llama-3-8B) to outperform single attempts from more capable models (ex. GPT-4o). Notably, with DeepSeek-Coder-V2-Instruct and 250 attempts, we solve 56% of issues from SWE-bench Lite, outperforming the single-attempt SOTA of 43%. Joint work with @brad19brown, @ryansehrlich, @ronnieclark__, @quocleix, @HazyResearch, and @Azaliamirh! More details below:
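The intuition behind repeated sampling can be shown with a little arithmetic. This sketch rests on a simplifying assumption that is not from the paper: every attempt succeeds independently with the same probability p, so the chance that at least one of k attempts succeeds is 1 - (1 - p)^k. The function name `coverage` and the example probability are illustrative.

```python
def coverage(p, k):
    """Probability that at least one of k independent attempts succeeds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


# Even a 1% per-attempt success rate adds up over many samples.
for attempts in (1, 10, 100, 250):
    print(attempts, round(coverage(0.01, attempts), 3))
```

This only measures whether a correct answer exists somewhere among the samples; selecting it still needs a checker such as unit tests.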

1,325

Eric Nguyen · Mar 1, 2024 · 5:36 PM UTC

Eric Nguyen

@exnx

1 Mar 2024

Here’s to open and accessible science 🧪 🧬 What’s cool is you can prompt with text to control DNA design. We hope this will accelerate impactful research by others in the scientific community 🧫 🧪 🧬 @ai_with_brains @MichaelPoli6

Armin W. Thomas

@athmsx

1 Mar 2024

🧬 + 🤖 Evo, our recently released 7B long-context genomic foundation model, can be run in a simple notebook! 👉 Check out this Google Colab showing how to use Evo to go all the way from DNA generation to protein folding: colab.research.google.com/gi…

2,527

Eric Nguyen · Nov 14, 2024 · 9:02 PM UTC

Eric Nguyen

@exnx

14 Nov 2024

Gotta love the Hazy vibes :)

Hazy Research: Strip Mall AI Research Club

@HazyResearch

14 Nov 2024

Thank you to @arcinstitute and the great @BrianHie for teaching us so much and taking a chance on a group that insists on naming its models after zoo animals (hyenaDNA, (hungry or not!) hippos, mamba) ... and its odd choices in AI architectures too. This project wouldn't have happened unless the effort was led by brilliant students @MichaelPoli6 and @exnx, who took a risk on training a DNA model with an advisor who finds the human body to be disgusting (beings of pure energy would be way more fun!). They wisely found better collaborators, built something amazing, and let me peek in... The whole lab is incredibly proud of them. These new state space models are fun, and so many in our lab have contributed to this line of work. We're so close to the beginning of the AIs we can build. Excited for what's next!

844

Eric Nguyen · Dec 8, 2023 · 9:29 PM UTC

Eric Nguyen

@exnx

8 Dec 2023

😍🚀 So excited to announce StripedHyena! Lots of fun releases coming up.

Michael Poli

@MichaelPoli6

8 Dec 2023

We've been hard at work pushing the frontiers of efficient architecture design and optimization. StripedHyena-7B is the result: the first alternative architecture truly competitive with the best Transformers of its size or larger. And it's very fast.

871

Eric Nguyen · Oct 27, 2024 · 3:12 AM UTC

Eric Nguyen

@exnx

27 Oct 2024

Matt is brilliant to work with! I think he’d be great faculty

Matt Durrant @mgdurrant

26 Oct 2024

Hey folks, I know most deadlines are fast approaching, but I am applying for assistant professor positions this cycle, and I would appreciate recommendations for open positions! My research plan is focused on MGEs, genome engineering, and applying AI to biological discovery.

2,221

Eric Nguyen · May 6, 2024 · 4:32 PM UTC

Eric Nguyen

@exnx

6 May 2024

a nice hail thunderstorm welcome in Vienna

1,649

Eric Nguyen · Jun 26, 2024 · 12:08 AM UTC

Eric Nguyen

@exnx

26 Jun 2024

A podcast a year in the making! ... Sam reached out after HyenaDNA last year but I wanted to wait until we had "more". Got to talk about all the research from @HazyResearch that went into long sequence models and its application in Evo w/ @arcinstitute and the whole team.

The TWIML AI Podcast

@twimlai

25 Jun 2024

Today, we're joined by Eric Nguyen (@exnx), PhD student at @stanford University, to discuss his research on long context foundation models in biology, particularly:
🧬 Hyena
🧬 Hyena DNA
🧬 Evo
We dig into several topics across his research including convolutional models vs transformer models in long sequences, their model training and architectures, FFT optimizations, generating & designing DNA, hallucinations in DNA models, evaluation benchmarks, zero-shot vs. few-shot performance, potential in creating novel gene editing tools like CRISPR-Cas, and more!
🎧 / 🎥 Listen or watch the full episode at: twimlai.com/go/690.
Note: Please note that this episode has audio and video sync issues. We apologize for any inconvenience this may cause and appreciate your understanding. Thank you for watching!
📖 CHAPTERS
00:00 - Introduction
1:14 - Motivation for Hyena architecture
2:39 - Limitations of transformer architectures with longer sequences
5:06 - Role of Fast Fourier Transform (FFT) in Hyena
7:54 - Explainability in long-sequence convolutions
9:07 - Hyena model
14:45 - Hyena DNA
19:10 - Hyena DNA model training
21:11 - Evo
24:32 - Designing DNA with language models
25:52 - Transformer-based approaches to DNA
28:21 - Hallucination in DNA models
33:41 - Evo gene editing tools
35:30 - Evo evaluation benchmarks
38:21 - Evo vs state-of-the-art models
40:38 - Zero-shot vs a few-shot performance
42:06 - Future directions

1,522

Eric Nguyen · May 13, 2024 · 3:50 PM UTC

Eric Nguyen

@exnx

13 May 2024

Incredible work by @bfspector making flash attention even faster. @HazyResearch ftw!

Benjamin F Spector

@bfspector

12 May 2024

(1/7) Happy mother’s day! We think what the mothers of America really want is a Flash Attention implementation that’s just 100 lines of code and 30% faster, and we’re happy to provide. We're excited to introduce ThunderKittens (TK), a simple DSL embedded within CUDA that makes it easy to express key technical ideas for building AI kernels. TK lets us write clean, easy-to-understand code that maximizes GPU utilization -- on all kinds of kernels! Code: github.com/HazyResearch/Thun… Writeups: (short) hazyresearch.stanford.edu/bl…, (long) hazyresearch.stanford.edu/bl…. Joint with @AaryanSinghal4, @simran_s_arora, @HazyResearch and team!

1,389

Eric Nguyen · Nov 15, 2024 · 3:08 AM UTC

Eric Nguyen

@exnx

15 Nov 2024

Learned so much about CRISPR systems from Brian!

Brian Kang @realbriankang

15 Nov 2024

Evo is now out in Science with experimental validation of model-generated Cas9 and transposon systems!

1,324

Eric Nguyen · Mar 1, 2024 · 5:12 PM UTC

Eric Nguyen

@exnx

1 Mar 2024

The first of (hopefully) many more tools and tutorials for the comp bio community on how to use Evo 🧬 What’s also incredible: you can prompt with text to control DNA design. We aim for impact and to accelerate science where we can 🧪 🧬 ⚡️ @MichaelPoli6 @ai_with_brains

Michael Poli

@MichaelPoli6

1 Mar 2024

In case you missed it: a new 7B StripedHyena model is out (the longest context one, yet), Evo-1 7B 🧬. And it now runs in a single notebook (powered by @togethercompute), from DNA generation to protein fold.

1,087

Eric Nguyen · Nov 29, 2023 · 7:45 AM UTC

Eric Nguyen

@exnx

29 Nov 2023

so exciting to see the early stages of using deep learning to design DNA sequences

Laura Gunsalus @lauragunsalus

28 Nov 2023

We’re excited to share Polygraph, a Python framework for evaluating native and synthetic DNA regulatory elements! @lal_avantika @gokcen biorxiv.org/content/10.1101/…

595

Eric Nguyen · Dec 16, 2023 · 12:05 AM UTC

Eric Nguyen

@exnx

16 Dec 2023

SO excited about moderating this year's (and first) GenBio conference at Neurips tomorrow! The organizers gathered an awesome line up of researchers and leaders in a space (generative bio) that is just accelerating 🚀🧬🚀 🧬

GenBio Workshop @ ICML26 @genbio_workshop

15 Dec 2023

Hope to see you all tomorrow! #NeurIPS2023

962

Eric Nguyen · Nov 14, 2024 · 9:21 PM UTC

Eric Nguyen

@exnx

14 Nov 2024

Evo not possible without Matt's incredible work and genome "whispering" skills

Matt Durrant @mgdurrant

14 Nov 2024

Thrilled to see this published! Lots of wonderful progress, including experimental validation of Evo-generated Cas9 and IS200/IS605 transposases. I am definitely a true believer when it comes to DNA language models, they have quickly become central to my discovery efforts.

1,078

Eric Nguyen · Feb 27, 2024 · 6:26 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

I 💛 Together's open source commitment 🙏

Together AI

@togethercompute

27 Feb 2024

Introducing Evo: a long-context biological model based on StripedHyena that generalizes across DNA, RNA, and proteins. It is capable of prediction tasks and generative design, from molecular to whole genome scale (over 650k tokens in length). together.ai/blog/evo

1,512

Eric Nguyen · Feb 27, 2024 · 4:25 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

This work was made possible with the entire Evo team, the most incredible researchers across @HazyResearch @arcinstitute @StanfordAILab @StanfordHAI @MichaelPoli6 @mgdurrant @ai_with_brains @realbriankang @StefanoErmon @pdhsu @BrianHie

1,324

Eric Nguyen · Mar 7, 2024 · 12:12 AM UTC

Eric Nguyen

@exnx

7 Mar 2024

Fun summary :) Big fan of the videos, @JuliaBauman2

Julia Bauman

@JuliaBauman2

6 Mar 2024

Evo: a foundation model for genomics! Using a well suited architecture, Evo learns from billions of bp of genomic sequence and performs well on several zero-shot prediction tasks on RNA, DNA and protein. Beautiful paper from @BrianHie & @pdhsu at @arcinstitute

1,135

Eric Nguyen · Oct 23, 2023 · 9:24 PM UTC

Eric Nguyen

@exnx

23 Oct 2023

super exciting work from our Hazy Research lab mate, @realDanFu and many others! Sub-quadratic scaling in sequence length and model width, for BERT! + Oral at Neurips this year!

Dan Fu

@realDanFu

23 Oct 2023

Excited about models that are sub-quadratic in sequence length and model dimension? Our Monarch Mixer paper is now on arXiv -- and super excited to present it as an oral at #NeurIPS2023! Let's dive in to what's new with the paper and the new goodies from this release: Monarch matrices are an expressive and hardware-efficient set of matrices that generalize the FFT -- and can be used to represent all sorts of fun linear transforms, from Hadamard transforms to Toeplitz matrices and more. Monarch mixer (M2) uses Monarch matrices to mix information both along the sequence (replacing attention) and along the model dimension. M2 replaces attention in Transformers with gated convolutions, and replaces the linear layers in MLPs with sparse block-diagonal matrices. The result is architectures that scale sub-quadratically in both sequence length and model dimension! Back in July, we released a short blog post (hazyresearch.stanford.edu/bl…) with @togethercompute about using Monarch matrices to train some more efficient BERT models -- matching BERT-base in quality with 27% fewer parameters, and with long-context inference throughput. With this release, we're excited to announce two new M2-BERT-large models -- the 260M version matches BERT-large in downstream GLUE score with 24% fewer parameters (and also has much faster long-context throughput). Our paper also has a whole set of theoretical goodness that we didn't get to in our blog post. For causal language modeling -- e.g. GPT-style or decoder-only language modeling -- we need to parameterize the Monarch matrices to make sure that the sequence mixing is causal. This ensures that you can train with next token prediction, GPT-style. We use a mix of polynomial theory to interpret Monarch matrices as bivariate polynomial evaluation, and then causality is just a matter of keeping the degrees in check. 
(If you're familiar with the FFT convolution theorem, this is equivalent to the padding trick to turn the circular convolution into a causal convolution). Using this theory, we can train M2-GPT models -- fully sub-quadratic in the sequence length. In a weird twist, we found that we can get rid of the MLP layers entirely, and still match GPT performance... wild! Check out our paper, code, and blog post for more details: Paper: arxiv.org/abs/2310.12109 Code: github.com/HazyResearch/m2 Blog: hazyresearch.stanford.edu/bl… With @simran_s_arora, @Jessica_Grogan_, Isys Johnson, @EyubogluSabri, @ai_with_brains, @bfspector, @MichaelPoli6, Atri Rudra, and @HazyResearch Building on a lot of great work from great folks, including @tri_dao @_albertgu @davidwromero @srush_nlp @BeidiChen @exnx @BlinkDL_AI @MaxMa1987 @ramin_m_h and many many more! And of course, couldn't have done this work without support from @StanfordHAI @StanfordAILab @StanfordCRFM. In collaboration with @togethercompute. Check out our paper for more, and please reach out if you have ideas about usage or questions! arxiv.org/abs/2310.12109 And look forward to more soon ;)
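For readers unfamiliar with the padding trick mentioned in the thread above, here is a minimal NumPy sketch. It is not code from the Monarch Mixer release: the function names `causal_conv_fft` and `causal_conv_direct` are hypothetical, and the sketch only illustrates the general idea that zero-padding both inputs before the FFT keeps the circular wrap-around out of the first `n` outputs, so the result is a causal convolution.

```python
import numpy as np

def causal_conv_fft(u, k):
    """Causal convolution y[t] = sum_{s<=t} k[s] * u[t-s], computed with FFTs.

    Hypothetical helper for illustration only.
    """
    n = len(u)
    fft_len = 2 * n  # any length >= 2n - 1 avoids wrap-around in the first n outputs
    u_f = np.fft.rfft(u, n=fft_len)  # rfft zero-pads the input up to fft_len
    k_f = np.fft.rfft(k, n=fft_len)
    y = np.fft.irfft(u_f * k_f, n=fft_len)
    return y[:n]  # keep only the causal part

def causal_conv_direct(u, k):
    """O(n^2) reference implementation of the same causal convolution."""
    n = len(u)
    return np.array([sum(k[s] * u[t - s] for s in range(t + 1)) for t in range(n)])

rng = np.random.default_rng(0)
u = rng.standard_normal(16)
k = rng.standard_normal(16)
print(np.allclose(causal_conv_fft(u, k), causal_conv_direct(u, k)))
```

Without the padding (an FFT of length `n`), late inputs would wrap around and leak into early outputs, which would break next-token training.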

2,525

Eric Nguyen · Feb 27, 2024 · 4:25 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

Evo uses StripedHyena, a recent deep signal processing model with hyena + attn layers
- 131k context, single nucleotide, byte level tokens
- generates DNA @ 650k tokens long, 2500x longer than before
paper arcinstitute.org/manuscripts…
code github.com/evo-design/evo
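To illustrate what "single nucleotide, byte level tokens" can mean in practice, here is a small sketch. It is an assumption about the general technique, not Evo's actual tokenizer: the helper names `encode_bytes` and `decode_bytes` are hypothetical, and each character of the sequence is simply mapped to its ASCII byte value, so one nucleotide becomes one token.

```python
def encode_bytes(seq: str) -> list[int]:
    """Map each character (one nucleotide) to its ASCII byte value."""
    return list(seq.encode("ascii"))

def decode_bytes(ids: list[int]) -> str:
    """Invert encode_bytes: turn byte values back into a sequence string."""
    return bytes(ids).decode("ascii")

ids = encode_bytes("ACGT")
print(ids)                # [65, 67, 71, 84]
print(decode_bytes(ids))  # ACGT
```

With this scheme the token count equals the sequence length in base pairs, which is why long context matters so much for genome-scale inputs.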

2,990

Eric Nguyen · Dec 14, 2023 · 5:29 AM UTC

Eric Nguyen

@exnx

14 Dec 2023

Super cool neurips paper on using knowledge graphs on tabular data, especially in low data regimes + high dimensionality. I also love that @_camiloruiz is both a fellow bioengineer and deep learning researcher, we’re a rare breed :)

Camilo Ruiz @_camiloruiz

13 Dec 2023

Can deep learning work on small data with far more features than samples? We present PLATO: a method that achieves the state-of-the-art on such datasets by using prior domain information! neurips.cc/virtual/2022/post… 🧵 Published in #NeurIPS2023 with @ren_hongyu @kexinhuang5 @jure

1,175

Eric Nguyen · Feb 27, 2024 · 4:25 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

Evo can design multimodal: large protein + ncRNA complexes to generate CRISPR systems, out of reach from all current bio gen models Evo enables a new approach to generating biological diversity - sampling seqs directly from a gen model = new forms of genome editing tools

1,654

Eric Nguyen · Jan 10, 2025 · 1:44 AM UTC

Eric Nguyen

@exnx

10 Jan 2025

Meta chain of thought 😍 Awesome work by @ZiyuX and team!

Rafael Rafailov @ NeurIPS

@rm_rafailov

9 Jan 2025

We have a new position paper on "inference time compute" and what we have been working on in the last few months! We present some theory on why it is necessary, how does it work, why we need it and what does it mean for "super" intelligence.

1,384

Eric Nguyen · Feb 27, 2024 · 5:57 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

I 💛 working with Armin

Armin W. Thomas

@athmsx

27 Feb 2024

🧬 + 🤖 Say hello to "Evo": our long-context biological foundation model that generalizes across the fundamental languages of biology: DNA, RNA, and proteins: arcinstitute.org/news/blog/e…

1,514

Eric Nguyen · Feb 7, 2024 · 2:25 AM UTC

Eric Nguyen

@exnx

7 Feb 2024

The deadline for MLGenX workshop at ICLR in Vienna, Austria is this Thursday! (Got a few days extension) Good luck!

MLGenX Workshop @ ICLR 2026 @MLGenX

6 Feb 2024

📢Reminder that the MLGenX submission deadline is this Thursday, 8 Feb (AOE). Might be a good idea to start writing if you haven't yet! 📜📜📜 mlgenx.github.io #ICLR2024

870

Eric Nguyen · Oct 29, 2024 · 11:42 PM UTC

Eric Nguyen

@exnx

29 Oct 2024

awesome framework for combining multiple LMs!

Jon Saad-Falcon

@JonSaadFalcon

29 Oct 2024

Interested in building O1-style LM systems that beat individual LMs? Check out our latest tutorial on Archon, a modular framework for optimizing the combinations of multiple LMs and inference-time techniques! With Archon, we can beat LM systems that use individual state-of-the-art LMs, like GPT-4o and Claude-3.5-Sonnet, by 14.1% across instruction-following, reasoning, and coding tasks! Check out the tutorial to learn more! Tutorial: colab.research.google.com/dr… Github: github.com/ScalingIntelligen… Paper: arxiv.org/abs/2409.15254

738

Eric Nguyen · Apr 2, 2024 · 5:13 PM UTC

Eric Nguyen

@exnx

2 Apr 2024

Impressive !

Jacob Schreiber @jmschreiber91

2 Apr 2024

Sequence-based ML methods (Enformer, ChromBPNet...) are invaluable in genomics but the ecosystem for their *use* after training is less developed. Introducing, `tangermeme`: a PyTorch library for genomics discovery for everything-other-than-the-model. github.com/jmschrei/tangerme… 1.

1,589

Eric Nguyen · Feb 27, 2024 · 4:25 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

We have lots of project resources for folks to play with, from the web UI to actually generate DNA, code, weights, colab blog arc-website-git-foundation-m… paper arcinstitute.org/manuscripts… code github.com/evo-design/evo

1,203

Eric Nguyen · Jan 26, 2024 · 1:42 AM UTC

Eric Nguyen

@exnx

26 Jan 2024

Patrick is doing some exciting stuff :)

Patrick Hsu

@pdhsu

25 Jan 2024

Just shared at @KeystoneSymp a new @ArcInstitute discovery of the bridge RNA recombinase mechanism: a new class of natural RNA-guided systems that retains the key property of programmability from RNAi and CRISPR while enabling large-scale genome design beyond RNA and DNA cuts

835

Eric Nguyen · Feb 27, 2024 · 4:25 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

Evo marks a turning point in how we model biology. We believe this tech has the potential to accelerate discovery & understanding in the sciences + be applied to real-world problems, eg drug discovery, agriculture & sustainability. We’re just getting started.

1,205

Eric Nguyen · Nov 14, 2024 · 11:00 PM UTC

Eric Nguyen

@exnx

14 Nov 2024

Armin made Evo reach extra long context!

Armin W. Thomas

@athmsx

14 Nov 2024

Our work on Evo is now out in @ScienceMagazine and made it onto the cover! Grateful to have worked with an amazing team led by @exnx @MichaelPoli6 @BrianHie @pdhsu

1,073

Eric Nguyen · Nov 19, 2024 · 4:44 PM UTC

Eric Nguyen

@exnx

19 Nov 2024

Avanika is amazing!!! She's co-founding a company for enterprise AI agents!

Avanika Narayan

@Avanika15

19 Nov 2024

So excited to be launching @rox__ai with the incredible @IshanMkh, Diogo Ribeiro and @shriram_s. The Rox Agent Swarm is the first enterprise ready fleet of AI agents, designed to supercharge the world’s best sellers. Get your Agent Swarm now —> rox.com Rox is built for the best, with the best, and by the best (s/o @derhacobian, @amolsingh, Damon Lin)!!!

1,088

Eric Nguyen · Nov 14, 2024 · 8:50 PM UTC

Eric Nguyen

@exnx

14 Nov 2024

The team’s been hard at work training AI to learn from DNA, the fundamental language of life. We show what DNA language models might be capable of... Moving beyond just a computational model - to something that can make *real-world* bio molecular machines

1,118

Eric Nguyen · Jul 27, 2023 · 5:49 PM UTC

Eric Nguyen

@exnx

27 Jul 2023

Team Hyena ftw! We had tons of fun at the poster session! Check out our oral talk today at 3:48pm HST in room 313 (It’s the supervised learning track, not sure why they put us there lol)

854

Eric Nguyen · Jun 14, 2024 · 8:15 PM UTC

Eric Nguyen

@exnx

14 Jun 2024

Replying to @GordianBio @DeepOriginBio

Super fun sharing the work from the entire Evo team!

172

Eric Nguyen · Jun 29, 2023 · 3:35 PM UTC

Eric Nguyen

@exnx

29 Jun 2023

Thank you to the amazing HyenaDNA team! @MichaelPoli6 Marjan Faizi @ai_with_brains Callum Birch-Sykes @MichaelWornow @Massastrello Aman Patel @ClaytonRab Yoshua Bengio @StefanoErmon Stephen Baccus Chris Re And @HazyResearch @stanfordHAI @stanfordAILab Together, SynTensor! 10/

885

Eric Nguyen · Feb 20, 2024 · 6:18 PM UTC

Eric Nguyen

@exnx

20 Feb 2024

Awesome work from our Hazy labmate @gmachiraju, with cool new methods in explainability and attribution

Gautam "Machi" Machiraju @gmachiraju

20 Feb 2024

Given up on feature attribution? 📣 Thrilled to share *prospector heads* (aka “prospectors'') ⛏️ — a simple attribution method built for foundation models (FMs) & high-dimensional data. Prospectors are modality-generalizable, time-efficient, & excel in few-shot settings ✨ 🧵👇

755

Eric Nguyen · Oct 18, 2022 · 3:19 PM UTC

Eric Nguyen

@exnx

18 Oct 2022

We’re excited to see what other capabilities continuous-signal models can unlock in vision models, such as long-range temporal dependencies and robustness in videos! Our paper (accepted at NeurIPS ‘22!): arxiv.org/abs/2210.06583 Code: coming soon! 9/

Eric Nguyen · Feb 27, 2024 · 4:25 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

Evo has the potential to generate seqs at the scale of whole genomes. Evo can generate seqs of 650k+ tokens on 1 GPU & find genomes that contain thousands of potential protein-coding seqs. This is w/o any alignment, which is often needed even in natural language; further improvement expected

1,421

Eric Nguyen · Sep 22, 2023 · 3:04 PM UTC

Eric Nguyen

@exnx

22 Sep 2023

Excited to be collaborating with Brian!

Brian Hie @BrianHie

20 Sep 2023

Delighted to share that I will be starting a research laboratory as a professor at @StanfordEng ChemE and @StanfordData, in collaboration with @arcinstitute. The lab will work on aligning biological AI with human good. evodesign.org/

1,045

Eric Nguyen · Feb 27, 2024 · 4:25 PM UTC

Eric Nguyen

@exnx

27 Feb 2024

web UI api.together.xyz/playground/… HF huggingface.co/togethercompu… Pip install (pypi.org/project/evo-model/0…

1,095

Eric Nguyen · Nov 14, 2024 · 8:50 PM UTC

Eric Nguyen

@exnx

14 Nov 2024

Thanks to @arcinstitute, @StanfordAILab & @StanfordHAI for providing support & resources to help this project and so many preceding projects that helped make this possible

460

Eric Nguyen · Nov 18, 2024 · 5:00 PM UTC

Eric Nguyen

@exnx

18 Nov 2024

Chris always has an interesting perspective on new research directions, very thoughtful piece looking back on Evo and looking forward in other domains like physics

Hazy Research: Strip Mall AI Research Club

@HazyResearch

18 Nov 2024

An Unserious Person’s Take on Axiomatic Knowledge in the Era of Foundation Models. hazyresearch.stanford.edu/bl… This post explains why we started the work that led to Evo (HyenaDNA), recently on the cover of Science–thanks to a host of wonderful collaborators at @arcinstitute . It has some odd musing about foundation models and science... and what I got and continue to get wrong about foundation models, @StanfordCRFM. H/t @mzhangio for the memes. If you suffer to the end, this not very funny joke will make more sense... maybe. It's long.

822

Eric Nguyen · Nov 15, 2024 · 2:09 AM UTC

Eric Nguyen

@exnx

15 Nov 2024

David is simply brilliant! Learned so much from him

David Li @_David_Li

15 Nov 2024

Had lots of fun sprinting on Evo as my first big project at Stanford, with an awesome team of @exnx, @MichaelPoli6, @mgdurrant, @realbriankang, @dhruvakatrekar, @pdhsu, and @BrianHie! I generated and experimentally tested new ssDNA transposons using Evo (14/48 worked :O)

1,459

Eric Nguyen · Mar 7, 2023 · 7:21 PM UTC

Eric Nguyen

@exnx

7 Mar 2023

Excited to share our work on Hyena, an alternative to attn that can learn on sequences *10x longer*, up to *100x faster* than optimized attn, by using implicit long convolutions & gating Paper: arxiv.org/abs/2302.10866 Blog: hazyresearch.stanford.edu/bl… Code: github.com/HazyResearch/safa…

Michael Poli

@MichaelPoli6

7 Mar 2023

Attention is great. Are there other operators that scale? Excited to share our work on Hyena, an alternative to attn that can learn on sequences *10x longer*, up to *100x faster* than optimized attn, by using implicit long convolutions & gating 📜arxiv.org/abs/2302.10866 1/

563

Eric Nguyen · Dec 18, 2024 · 6:14 PM UTC

Eric Nguyen

@exnx

18 Dec 2024

Way to go @aditimerch ! What a first phd project!

Aditi Merchant @aditimerch

18 Dec 2024

Excited to have the first project of my PhD out!!! By leveraging Evo's ability to learn relationships across genes (i.e., "know a gene by the company it keeps"), we show that we can engineer highly divergent proteins with retained functionality. 🧵1/3

789

Eric Nguyen · Mar 5, 2024 · 2:24 AM UTC

Eric Nguyen

@exnx

5 Mar 2024

Based! from @HazyResearch :)

Simran Arora

@simran_s_arora

4 Mar 2024

Excited to release Based, an architecture that combines two✌️ simple, familiar, attention-like primitives – short (size-64) sliding window attention and softmax-approximating linear attention – to enable high quality and efficient inference! 💨 🚀 joint w/ @EyubogluSabri, @mzhangio, @HazyResearch and team!

1,212

Eric Nguyen · Jul 6, 2023 · 8:48 PM UTC

Eric Nguyen

@exnx

6 Jul 2023

Replying to @tri_dao @Stanford @Princeton @PrincetonCS

Princeton is damn lucky!

2,612

Eric Nguyen · Nov 14, 2024 · 8:50 PM UTC

Eric Nguyen

@exnx

14 Nov 2024

Brian Kang @realbriankang & Dhruva Katrekar @dhruvakatrekar led the wet lab validations of Evo Cas-9, a new AI-designed gene editing tool, that does not exist in nature, including a guide RNA that makes Cas9 cleave DNA even better. Just mind blowing to me!

439