For my latest attempt at introducing proteins to students, I made a Google Colab Notebook that predicts protein structure from a single sequence. I asked the students to tweak the sequence to get a helix or two helices or... (1/5)
colab.research.google.com/gi…
RoseTTAFold updated to be All-Atom... biological assemblies containing proteins, nucleic acids, small molecules, metals, and covalent modifications ... and diffusion 🤯 biorxiv.org/content/10.1101/…
Excited to see colabfold published!
nature.com/articles/s41592-0…
Special thanks to @thesteinegger and @milot_mirdita (for MMseqs2) without whom I would have never considered to preprint let alone attempt to publish our notebook! (1/3)
AlphaFold inverted to hallucinate denovo proteins of up to 600 amino acids in length🤯
(animation below shows the designed protein docked into CryoEM density)
Exciting work with:
@chrisfrank662, @AKhoshouei, Yosta de Stigter, Dominik Schiewitz, @ShihaoFeng18, @hendrik_dietz
Ever wondered how many amino acids you can mutate to alanine and AlphaFold2 still predicts same structure? 🤔For denovo design Top7 (1QYS), single-sequence mode, it's 60%. (1/2)
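A minimal sketch of how such an alanine-substitution experiment could be set up. The function name, the random choice of positions, and the toy sequence are assumptions for illustration; the tweet does not say how positions were picked, and the structure prediction step itself is not shown.

```python
import random

def mutate_to_alanine(seq, fraction, seed=0):
    """Return a copy of seq with a random subset of non-alanine positions set to 'A'.

    The number of mutated positions is round(fraction * len(seq)),
    capped at the number of positions that are not already alanine.
    """
    rng = random.Random(seed)
    candidates = [i for i, aa in enumerate(seq) if aa != "A"]
    n_mutate = min(round(fraction * len(seq)), len(candidates))
    chosen = set(rng.sample(candidates, n_mutate))
    return "".join("A" if i in chosen else aa for i, aa in enumerate(seq))

# Hypothetical toy sequence (not Top7): build a series of increasingly mutated variants.
toy = "MKVLDEGRWTPSNQ"
variants = {f: mutate_to_alanine(toy, f, seed=1) for f in (0.0, 0.2, 0.4, 0.6)}
```

Each variant would then be passed to the predictor in single-sequence mode and compared to the original structure.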
Technical Report of HelixFold3 for Biomolecular Structure Prediction
The PaddleHelix research team at Baidu have released their AlphaFold3 replication under an open-source noncommercial license. Performance approaches that of AlphaFold3.
abs: arxiv.org/abs/2408.16975
website: paddlehelix.baidu.com/
github: github.com/PaddlePaddle/Padd…
Successfully predicted one of the @foldit denovo designs using #alphafold in google-colab😎 (1 model, no template, single sequence input, and no amber refine, ~2 mins). Notebook, if anybody wants to try it with their favorite sequence:
colab.research.google.com/dr…
We've been working on adding AlphaFold v2.3.1 support to ColabFold. 😎 Here is the notebook for those interested in testing: colab.research.google.com/gi… (1/5)
Homooligomeric prediction in #alphafold works a little too well. So far it has worked on nearly every case we (me & @minkbaek) tried. Going beyond dimers! Seems @deepmind accidentally "solved" the homooligomeric prediction problem (w/ MSA input) 😂 Give it a try: colab.research.google.com/gi…
7/ In particular, we showcase gLM2's ability to directly learn coevolutionary signal in protein-protein interfaces with no supervision! The learned contact maps can be extracted using @ZhidianZ et al's categorical Jacobian method.
Use your PhD thesis title as the prompt 🤓
Is it time to restart this trend? But now with #DALLE3?
(Here is mine: "Protein structure determination using evolutionary information")
Impatient? Wanna see models as the notebook is running? 😎 See ColabFold AlphaFold2_advanced. Also, you can now use '/' to specify chain breaks. For example: A/C will be modeled as A and C (could be used to trim disordered regions, or specify 2+ proteins) github.com/sokrypton/ColabFo…
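To illustrate the '/' convention, here is a small sketch of how an input string could be split into chains. This is not ColabFold's actual parser; `split_chains` and `chain_offsets` are hypothetical helper names.

```python
def split_chains(sequence):
    """Split a '/'-separated input string into a list of chain sequences.

    Whitespace is ignored and empty segments (e.g. from 'A//C') are dropped.
    """
    cleaned = "".join(sequence.split()).upper()
    chains = [c for c in cleaned.split("/") if c]
    if not chains:
        raise ValueError("no residues found in input")
    return chains

def chain_offsets(chains):
    """Return the 0-based start index of each chain in the concatenated sequence."""
    offsets, start = [], 0
    for chain in chains:
        offsets.append(start)
        start += len(chain)
    return offsets

chains = split_chains("MKTAYIAK / GSHMLE")
starts = chain_offsets(chains)
```

The offsets mark where a predictor would need to insert a chain break between the concatenated segments.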
inspired by some of our work on AF2Rank (showing AF can be used to denoise template inputs) and RFdiffusion, I tried hacking AlphaFold to be a diffusion model for generating backbones. 😎
Weekend project! 🤓 So now that OpenFold weights are available. I was curious how different they are from AlphaFold weights and if they can be used for AfDesign evaluation. More specifically, if you design a protein with AlphaFold, can OpenFold predict it (and vice-versa)? (1/5)
Exciting new work from Qian Cong's group on predicting human protein interactome. Leveraging new eukaryotic genomes, new RoseTTAFold2 trained on +/- pairs of PPI and large distilled dataset of domain-domain interactions! 🤩
biorxiv.org/content/10.1101/…
AF3 server is LIVE! Just tried predicting complex with almost 5K amino acids.
TIP: you need to click "continue with google" to access the server (otherwise the "server" is grayed out). alphafoldserver.com/
"The AlphaFold parameters are made available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license" 🙂
github.com/deepmind/alphafol…
(thanks to @BrianWeitzner for alerting me)
BREAKING NEWS
The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John M. Jumper “for protein structure prediction.”
CASP is getting cut by NIH... 😢
(Anyone with extra funds wanna help support perhaps the most important competition of the century?)
science.org/content/article/…
We've (including @thesteinegger @milot_mirdita ) updated ColabFold to use the latest optimized AlphaFold implementation that reduces compile time from ~4.5mins to ~30 seconds! We've confirmed the results are identical for both monomer and multimer predictions. Try it out! (1/2)
And this is why code should always be published with papers... This group made an attempt to reproduce AlphaFold3 and found a number of potential issues in the published pseudo-code. See exciting 🧵 👇
🚀Excited to announce: Open-source AlphaFold3 implementation! 🚀
I am thrilled to announce one of the models we have been building for the last 8-weeks at Ligo - an open-source implementation of DeepMind’s frontier model, AlphaFold3! Here’s what we have learned, a thread (1/11):
We’re renewing our collaboration with @GoogleDeepMind!
We'll keep developing the AlphaFold Database to support protein science worldwide 🎉
To mark the moment we’ve synchronised the database with UniProtKB release 2025_03
ebi.ac.uk/about/news/technol… #AlphaFold
Weekend project: Comparing ESM3 from @EvoscaleAI to ESM2 and inv_cov. The ultimate test of a protein language model is how well the pairwise dependencies it learns correlate with structure. (1/8)
Interesting excuse... 🤔 hopefully this is not the start of closed source software in the field...
"As part of our commitment to releasing our research breakthroughs safely and responsibly, we will not be sharing model weights, to prevent use in potentially unsafe applications."
AlphaMissense, a tool by DeepMind, can help researchers learn more about the effects that missense mutations have on disease, and could help identify previously unknown disease-causing genes, according to a new Science study.
Learn more: scim.ag/49l
One request (for a tutorial next week), was to "hallucinate" a binder w/ AF for a given template structure. I tried w/ the spike protein. Surprisingly, at the get-go, even with a random starting sequence, AF already places the extra sequence near the ace2 binding site. (1/2)
Started teaching again! This time decided to try using #claude (@AnthropicAI) and @codesandbox for hosting to implement an interactive GREMLIN (Potts) model (w = coevolution, b = conservation) to show students how you go from MSA to contacts!
9kssnq.csb.app/
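A rough sketch of the last step, going from fitted pairwise couplings w to contact scores. It assumes the common recipe of taking the Frobenius norm of each coupling block and then applying average-product correction (APC); fitting w and b from the MSA is not shown, and the function names are illustrative rather than taken from the linked app.

```python
import math

def coupling_norms(w):
    """w[i][j] is an A x A block of couplings between positions i and j.

    Returns an L x L matrix of Frobenius norms with a zero diagonal.
    """
    L = len(w)
    s = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            if i != j:
                s[i][j] = math.sqrt(sum(x * x for row in w[i][j] for x in row))
    return s

def apc(s):
    """Average-product correction: s_ij - mean_i * mean_j / mean_all (diagonal left at zero)."""
    L = len(s)
    row_mean = [sum(r) / L for r in s]
    total_mean = sum(row_mean) / L
    if total_mean == 0:
        return [row[:] for row in s]
    out = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            if i != j:
                out[i][j] = s[i][j] - row_mean[i] * row_mean[j] / total_mean
    return out
```

Pairs with the highest corrected scores are the predicted contacts; the APC step removes the background signal that each position shares with all others.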
If we truly want to be "end-to-end" it seems... what we should be doing is using a 64-letter-code (4x4x4) not 20-letter-code, when training any model on protein sequences. 😅
nature.com/articles/s41467-0…
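As a sketch of what a 64-letter (4x4x4) input alphabet could look like, the snippet below tokenizes a coding DNA sequence codon by codon instead of translating it to 20 amino acids. The index ordering and function name are arbitrary choices for illustration.

```python
from itertools import product

BASES = "ACGT"
# All 4 x 4 x 4 = 64 codons, in lexicographic order over BASES.
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_TO_INDEX = {codon: i for i, codon in enumerate(CODONS)}

def encode_codons(cds):
    """Map an in-frame coding sequence (DNA or RNA) to a list of codon indices in 0..63."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [CODON_TO_INDEX[cds[i:i + 3]] for i in range(0, len(cds), 3)]

tokens = encode_codons("ATGGCCTAA")
```

Unlike an amino-acid encoding, this keeps synonymous codons distinct, so a model can see codon-usage information that translation discards.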
RoseTTAFold about to become 21X faster?
"NVIDIA just released an open-source optimized implementation that uses 9x less memory and is up to 21x faster than the baseline official implementation."
developer.nvidia.com/blog/ac…
📈 One standard graph in all bio deep learning papers should be: max similarity to anything in training set vs performance. (Reviewers shouldn't have to guess if there might be overfitting issues).
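A minimal sketch of how the data for such a graph could be assembled. It uses `difflib.SequenceMatcher` from the Python standard library as a crude stand-in for sequence similarity; a real analysis would more likely use alignment-based identity from a dedicated search tool, and the function names here are illustrative.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based sequence identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def max_train_similarity(query, train_seqs):
    """Highest similarity between a test sequence and anything in the training set."""
    return max((similarity(query, t) for t in train_seqs), default=0.0)

def similarity_vs_performance(test_items, train_seqs):
    """test_items is a list of (sequence, score) pairs.

    Returns (max_similarity, score) points sorted by similarity, ready to plot.
    """
    return sorted((max_train_similarity(seq, train_seqs), score)
                  for seq, score in test_items)
```

If performance drops sharply as the x-axis value falls, the model is likely leaning on memorized near-neighbors rather than generalizing.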
I think this is the most interesting/innovative part of BoltzGen. Diffusing to AF2-style encoding to co-generate both backbone and sidechain identities! 🤯
With this we train a model with the standard AF3 / Boltz-2 scalable architecture that has proven state-of-the-art for folding. Injecting conditioning inputs allows us to control the designed binder in various ways
We’ve been making phylogenetic trees differentiable :)
Check out our work at #ICML2023 workshops - Sampling and Optimization in Discrete Space (SODS) ፨ and Differentiable Almost Everything (DiffAE) 〆
Looking forward to discussing and learning more! 🙌
❤️ work with Avi & @sokrypton
Curious how this AlphaFold hallucination GIF was made?
See the AlphaFold hallucination "hack" implemented in google Colab here: 🙃
colab.research.google.com/gi…
AlphaFold/RoseTTAFold users: I was asked to give a talk summarizing how the community has been using AF/RF for "structural bioinformatics". I'll focus on deep structural homology/similarity insights that would otherwise not be possible with just sequences. Please share! (1/2)
Adding support for binder hallucination if anyone wants to try! (Code is very experimental, not intended for practical use... only use for art/science) 😀
colab.research.google.com/gi…
When there are few to no sequences in the MSA, we find sometimes changing the random_seed, or running for more recycles allows #alphafold to predict the correct structure. Notebook if anyone wants to try on their difficult cases: colab.research.google.com/gi…
Towards the end of the presentation I went down a bit of a rabbit hole trying to demonstrate that AF3 may still be learning to invert the covariance matrix, which is needed to extract the coevolution signal from the input multiple sequence alignment (MSA) (1/9).
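For readers unfamiliar with why inversion matters, the standard mean-field (Gaussian) approximation to a Potts model makes the link explicit. Here $f_i(a)$ is the frequency of amino acid $a$ in MSA column $i$ and $f_{ij}(a,b)$ the joint frequency in columns $i$ and $j$. This is textbook background, not a claim about what AF3 computes internally.

```latex
% Covariance between column i (amino acid a) and column j (amino acid b)
C_{ij}(a,b) = f_{ij}(a,b) - f_i(a)\, f_j(b)

% Mean-field estimate of the direct couplings: the negative inverse covariance
J_{ij}(a,b) \approx -\left(C^{-1}\right)_{ij}(a,b)
```

The raw covariance mixes direct and indirect (chained) correlations; inverting it separates the direct couplings, which are the ones that track contacts. In practice the matrix is regularized before inversion because it is usually singular.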
Nearly all existing protein-based therapeutics are created from a fraction of possible protein concepts. This is about to change. We are excited to share a publication in @Nature describing Chroma, an AI model that can program novel proteins. generatebiomedicines.com/new…
A recent preprint from @Lauren_L_Porter shows that it's sometimes possible to sample the alternative conformation of metamorphic proteins by removing the MSA. Though I think this is a very interesting observation, I disagree that coevolution is not used when it is provided. (1/9)
https://www.biorxiv.org/content/10.1101/2023.11.21.567977v2
Pretty cool! Though... not sure if it's fair to call it a "single-sequence" method. Since large Language Models are essentially storing/retrieving MSA info (conservation/coevolution) from a single-sequence input. 🤔 (1/2)
Protein structure can be predicted from a single sequence alone with high accuracy.
@HelixonBio team have developed OmegaFold, achieving performance similar to RF and AF2's MSA versions. Only a single sequence is given as input. 1/5
I tried running our categorical Jacobian method (for extracting coevolution signal from language models) on Evo from @BrianHie @pdhsu on the 16S rRNA. It appears to pick up on local hairpins 🤓 (1/3).
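A toy sketch of the idea behind a categorical Jacobian, under stated assumptions: substitute every token at every position, record how the model's output logits change at all other positions, and reduce each position pair to a single score. `toy_logits` is a hypothetical stand-in for a real language model, and a real pipeline would likely add further corrections (for example APC) that are omitted here.

```python
import math

ALPHABET = "ACDE"

def toy_logits(seq):
    """Hypothetical model: positions 0 and 3 are coupled, each favoring its partner's token."""
    partner = {0: 3, 3: 0}
    logits = [[0.0] * len(ALPHABET) for _ in seq]
    for j in range(len(seq)):
        if j in partner:
            logits[j][ALPHABET.index(seq[partner[j]])] = 1.0
    return logits

def categorical_jacobian(seq, model, alphabet=ALPHABET):
    """jac[i][a][j][b]: change in logit (j, b) when position i is set to token a."""
    base = model(seq)
    jac = []
    for i in range(len(seq)):
        per_token = []
        for token in alphabet:
            out = model(seq[:i] + token + seq[i + 1:])
            per_token.append([[out[j][b] - base[j][b] for b in range(len(alphabet))]
                              for j in range(len(seq))])
        jac.append(per_token)
    return jac

def contact_scores(jac):
    """Frobenius norm over the (a, b) block for each (i, j), then symmetrize."""
    L = len(jac)
    raw = [[math.sqrt(sum(v * v for per_a in jac[i] for v in per_a[j]))
            for j in range(L)] for i in range(L)]
    return [[0.0 if i == j else 0.5 * (raw[i][j] + raw[j][i]) for j in range(L)]
            for i in range(L)]

scores = contact_scores(categorical_jacobian("ACDE", toy_logits))
```

In this toy case only the coupled pair (0, 3) gets a nonzero score, which is the behavior one hopes to see for true contacts or base pairs in a real model.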
Interested in protein evolution, protein folding, and ML? Looking for a short 1-2 year postdoc in Boston? I got some extra funding, message me!
Official ad: jhdsf.fas.harvard.edu/files/…
Often the authors want to release code, and it's an uphill battle to convince the company, lawyers, investors. The second the journal indicated they would accept w/o code, the authors lost all leverage.
The journal ( @nature ) has failed the community, not the authors.
Ran Colab/AlphaFold and then later realized you wanted to amber-relax the structure? 😬 Instead of rerunning from scratch, you can now run just the relax step, given a saved PDB file. (I separated the relax step into its own notebook). colab.research.google.com/gi…
#joe90 (on OpenBioML) noticed OmegaFold often fails on CASP domains, I was trying to understand why. Turns out OmegaFold does not like partial sequences! If you input just a single domain (or trim the N-term disordered tail), it fails. But it works if you use the full seq! (1/4)
Interesting... I suspect, by reducing the # of seqs in the MSA, this reduced the strength/certainty of restraints derived from coevolution, allowing AlphaFold to generate multiple hypotheses (conformations) with random seed, model param, or MSA perturbations biorxiv.org/content/10.1101/…
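A minimal sketch of MSA subsampling as described here, assuming the query is the first row and should always be kept. The function name and defaults are illustrative and not taken from the preprint or from ColabFold.

```python
import random

def subsample_msa(msa, n_keep, seed=0):
    """Keep the query (first row) plus a random subset of the other sequences.

    n_keep is the total number of rows to return, including the query.
    """
    if not msa:
        raise ValueError("MSA is empty")
    rng = random.Random(seed)
    query, rest = msa[0], msa[1:]
    k = max(0, min(n_keep - 1, len(rest)))
    return [query] + rng.sample(rest, k)
```

Running the predictor on several shallow subsamples drawn with different seeds is one way to probe whether weaker coevolution restraints let alternative conformations appear.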
Our take on how to use alphafold to design peptide binders. In contrast to other work we do not require that the structure of the peptide or the target is known, all you need is a list of binding residues at the target. biorxiv.org/content/10.1101/…
In AF_advanced colabfold we suggest enabling dropout (is_training=True), and iterating through seeds to sample from the uncertainty. This should theoretically return multiple conformations if there is any ambiguity in coevolution, w/o the need to resample/subsample the MSA (1/3):
Today's hands-on session with @sokrypton and @arneelof on AF2 in ColabFold was eye-opening! 🚀 Attendees got a sneak peek of Colab[Design]Fold's upcoming version, where you can edit MSA, templates, and view cool post-analysis outputs. Stay tuned! ✨
#EMBOIntegModelling23 @EMBO
🤞ColabFold v1.5.2🤞
This update attempts to fix a system memory leak when running predictions on large proteins/complexes.
Thx to liyv (from discord) for helping debug!
(1/3)
Today, we're announcing the latest deep integration between Google Cloud and Alphabet’s AI research organizations.
You can now run AlphaFold —the groundbreaking protein structure prediction system—in #VertexAI ↓
cloud.google.com/blog/produc…
Recently gave a talk on a couple of experiments I did with #AlphaFold. Forgot to include an image of the actual structural output. Here it is. (One of the experiments was to take the MSA and reverse it. Turns out you get a reversed structure 😅 Should that have worked? )
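The MSA-reversal experiment is easy to reproduce in outline. The sketch below reverses every aligned row, which mirrors the column order while keeping the columns themselves intact; `reverse_msa` is an illustrative name and the prediction step is not shown.

```python
def reverse_msa(msa):
    """Reverse every aligned sequence so that column i becomes column L-1-i."""
    return [seq[::-1] for seq in msa]

def column(msa, i):
    """Return column i of the alignment as a string."""
    return "".join(seq[i] for seq in msa)

msa = ["MK-LV", "MR-LI", "MKALV"]
rev = reverse_msa(msa)
```

Because each column survives unchanged, every pair of columns keeps the same covariation, just at mirrored indices, which may be part of why a coevolution-driven predictor still produces a (mirrored-order) fold.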
“xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein”
A borderline-SOTA antibody structure-prediction method is tucked away in the results section
biorxiv.org/content/10.1101/…
Minor update for AfDesign: Backprop through AF to generate a new sequence that is predicted to fold into a specific structure has proven tricky. Unlike TrRosetta, the AF landscape is rugged and often you'd get stuck in a local minimum (rmsd > 1) during gradient descent. (1/4)
I’m excited to be the Chief AI Officer of @Meta, working alongside @natfriedman, and thrilled to be accompanied by an incredible group of people joining on the same day.
Towards superintelligence 🚀
Chai-1 has always been available for commercial use via our server. Today, we're also making Chai-1(r) code and weights available under an Apache 2.0 license, which permits broad commercial use.
github.com/chaidiscovery/cha…