Scientist, Assistant Professor @MITBiology, #FirstGen, ProteinBERTologist, 🇺🇦 No Human is illegal. Moving to: bsky.app/profile/sokrypton.o…

Cambridge, MA
I'm excited to share that I'll be joining @MITBiology as an Asst Prof. in Jan 2024! Come join us! 🤓🧪🖥️🧬
169
148
2,003
218,517
For my latest attempt at introducing proteins to students, I made a Google Colab Notebook that predicts proteins from a single sequence. I asked the students to tweak the sequence to get a helix or two helices or... (1/5) colab.research.google.com/gi…
19
281
1,493
Really cool sculpture of Top7 by @mtyka . Celebrating 10 years of IPD @UWproteindesign (check out the reflection in the glass) 🤯
18
138
871
RoseTTAFold updated to be All-Atom... biological assemblies containing proteins, nucleic acids, small molecules, metals, and covalent modifications ... and diffusion🤯biorxiv.org/content/10.1101/…
7
229
818
142,501
Excited to see colabfold published! nature.com/articles/s41592-0… Special thanks to @thesteinegger and @milot_mirdita (for MMseqs2) without whom I would have never considered to preprint let alone attempt to publish our notebook! (1/3)
20
201
749
End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman biorxiv.org/content/10.1101/… A fun collaboration with Samantha Petti, Nicholas Bhattacharya, @proteinrosh, @JustasDauparas, @countablyfinite, @keitokiddo, @srush_nlp & @pkoo562 (1/8)
8
185
689
AlphaFold inverted to hallucinate denovo proteins of up to 600 amino acids in length🤯 (animation below shows the designed protein docked into CryoEM density) Exciting work with: @chrisfrank662, @AKhoshouei, Yosta de Stigter, Dominik Schiewitz, @ShihaoFeng18, @hendrik_dietz
10
105
565
77,813
Ever wondered how many amino acids you can mutate to alanine and AlphaFold2 still predicts same structure? 🤔For denovo design Top7 (1QYS), single-sequence mode, it's 60%. (1/2)
17
97
559
124,298
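A minimal sketch of how one might set up the alanine experiment described above. The tweet does not say how positions were chosen, so random selection is an assumption here, and `mutate_to_alanine` and the toy sequence are hypothetical (this is not the Top7 sequence). The structure prediction step itself is not shown.

```python
import random

def mutate_to_alanine(seq, fraction, seed=0):
    """Return a copy of seq with about `fraction` of its residues replaced by 'A'.

    Assumption: positions are picked uniformly at random among residues
    that are not already alanine.
    """
    rng = random.Random(seed)
    candidates = [i for i, aa in enumerate(seq) if aa != "A"]
    n_mut = min(round(fraction * len(seq)), len(candidates))
    out = list(seq)
    for i in rng.sample(candidates, n_mut):
        out[i] = "A"
    return "".join(out)

# Hypothetical toy sequence, used only to show the call.
toy = "MKTLEELLKKHGVDPEKVRELLG"
mutant = mutate_to_alanine(toy, 0.6)
print(toy)
print(mutant)
```

Each mutant would then be passed to the predictor in single-sequence mode and compared against the original structure.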
AlphaFold3 reproduced and params/code released. 🤩
Technical Report of HelixFold3 for Biomolecular Structure Prediction The PaddleHelix research team at Baidu have released their AlphaFold3 replication under an open-source noncommercial license. Performance approaches that of AlphaFold3. abs: arxiv.org/abs/2408.16975 website: paddlehelix.baidu.com/ github: github.com/PaddlePaddle/Padd…
4
136
551
99,988
Successfully predicted one of the @foldit denovo designs using #alphafold in google-colab😎 (1 model, no template, single sequence input, and no amber refine, ~2 mins). Notebook if anybody wanna try input your favorite sequence: colab.research.google.com/dr…
14
129
528
Ok. I'm sold. 🤓 Prompt "implement dynamic programming in NumPy to align two sequences"
10
83
517
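For reference, a minimal sketch of what that prompt asks for: global alignment (Needleman-Wunsch) with a NumPy score matrix. The scoring values (match=1, mismatch=-1, gap=-1) and the function name are assumptions for illustration, not the output the tweet refers to.

```python
import numpy as np

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = np.zeros((n + 1, m + 1), dtype=int)
    # First row/column: aligning a prefix against nothing costs one gap each.
    score[:, 0] = gap * np.arange(n + 1)
    score[0, :] = gap * np.arange(m + 1)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill: each cell is the best of diagonal (pair), up (gap in b), left (gap in a).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i, j] = max(score[i - 1, j - 1] + sub(i, j),
                              score[i - 1, j] + gap,
                              score[i, j - 1] + gap)

    # Traceback from the bottom-right corner.
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i, j] == score[i - 1, j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i, j] == score[i - 1, j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return int(score[n, m]), "".join(reversed(top)), "".join(reversed(bottom))

result = needleman_wunsch("ACGT", "AGT")
print(result)
```

Swapping the `max` for a smooth maximum (logsumexp) is the usual route to a differentiable version, though that variant is not shown here.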
We've been working on adding AlphaFold v2.3.1 support to ColabFold. 😎 Here is the notebook for those interested in testing: colab.research.google.com/gi… (1/5)
5
109
493
66,606
DALL·E 3 prompt "protein structure, surrounded by small molecules resembling amino acid sidechains"🧑‍🎨🖼️ (1/10)
5
58
475
82,552
Homooligomeric prediction in #alphafold works a little too well. So far it has worked on nearly every case we (me & @minkbaek) tried. Going beyond dimers! Seems @deepmind accidentally "solved" the homooligomeric prediction problem (w/ MSA input) 😂 Give it a try: colab.research.google.com/gi…
14
111
468
I tried... 😅
Replying to @sokrypton
Make an unfolding video and reverse it. Like this:
19
65
474
90,230
OMG 😱 finally a protein language model that captures coevolution at protein-protein interface(s)!
Replying to @Micro_Yunha
7/ In particular, we showcase gLM2's ability to directly learn coevolutionary signal in protein-protein interfaces with no supervision! The learned contact maps can be extracted using @ZhidianZ et al.'s categorical Jacobian method.
3
96
450
45,141
Use your PhD thesis title as the prompt 🤓 Is it time to restart this trend? But now with #DALLE3? (Here is mine: "Protein structure determination using evolutionary information")
Protein structure determination using evolutionary information 🧐
35
41
415
264,972
Looks like someone already implemented ESMfold API plugin in Pymol! 😎 github.com/JinyuanSun/PymolF…
4
74
409
Impatient? Wanna see models as the notebook is running? 😎 See ColabFold AlphaFold2_advanced. Also, you can now use '/' to specify chain breaks. For example: A/C will be modeled as A and C (could be used to trim disordered regions, or specify 2+ proteins) github.com/sokrypton/ColabFo…
12
85
388
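A small sketch of the '/' chain-break convention described above. `split_chains` and `chain_boundaries` are hypothetical helpers written for illustration; they are not taken from the ColabFold code, and how the notebook actually handles the breaks internally is not shown in the tweet.

```python
def split_chains(sequence):
    """Split an input like 'ACDE/FGH' into separate chains.

    Whitespace is removed and empty segments (e.g. from a trailing '/')
    are dropped.
    """
    cleaned = "".join(sequence.split()).upper()
    return [chain for chain in cleaned.split("/") if chain]

def chain_boundaries(chains):
    """Return 0-based, half-open (start, end) offsets of each chain
    within the concatenated sequence."""
    bounds, start = [], 0
    for chain in chains:
        bounds.append((start, start + len(chain)))
        start += len(chain)
    return bounds

chains = split_chains("ACDE/FGH")
print(chains, chain_boundaries(chains))
```

The same input format covers both uses mentioned in the tweet: cutting out a disordered stretch of one protein, or listing two or more proteins.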
Exciting to see this finally out. We started this project ~7 years ago 🫣 @g_pavlopoulos @BSRC_Fleming @kyrpides @jgi @SiruiLiu_ @UWproteindesign (and twitterless Fotis!) nature.com/articles/s41586-0…
9
71
404
46,229
Protenix - Yet another reproduction of AF3 github.com/bytedance/Proteni…
Protenix: Protein + X | ByteDance - A trainable PyTorch reproduction of AlphaFold 3 GitHub: github.com/bytedance/Proteni…
7
93
406
60,994
inspired by some of our work on AF2Rank (showing AF can be used to denoise template inputs) and RFdiffusion, I tried hacking AlphaFold to be a diffusion model for generating backbones. 😎
7
80
386
Come see the scientists that should've shared the Nobel Prize for "Computational Protein Design", speak: ipd.uw.edu/protein-design-th…
2
89
395
28,938
Now everyone can be a protein designer! 😂
Replying to @_JosephWatson
The code is available both to download from GitHub, and also, thanks to the wonderful @sokrypton, as a Colab Notebook. github.com/RosettaCommons/RF… colab.research.google.com/gi…
7
70
390
51,193
Weekend project: Use GPT4 to generate a GUI to control conditional fold generation of RFdiffusion 😎 colab.research.google.com/gi…
8
59
373
47,013
Weekend project! 🤓 So now that OpenFold weights are available. I was curious how different they are from AlphaFold weights and if they can be used for AfDesign evaluation. More specifically, if you design a protein with AlphaFold, can OpenFold predict it (and vice-versa)? (1/5)
3
61
364
A few more letters... and we can finally write amino acid sequence with protein structure encoded by amino acid sequence? 🧬🤔😀 @chrisfrank662 @DominikSchiwie0 Lara Fuss @hendrik_dietz biorxiv.org/content/10.1101/…
8
62
369
22,041
Exciting new work from Qian Cong's group on predicting human protein interactome. Leveraging new eukaryotic genomes, new RoseTTAFold2 trained on +/- pairs of PPI and large distilled dataset of domain-domain interactions! 🤩 biorxiv.org/content/10.1101/…
8
76
361
27,096
AF3 server is LIVE! Just tried predicting complex with almost 5K amino acids. TIP: you need to click "continue with google" to access the server (otherwise the "server" is grayed out). alphafoldserver.com/
7
78
360
32,794
"The AlphaFold parameters are made available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license" 🙂 github.com/deepmind/alphafol… (thanks to @BrianWeitzner for alerting me)
3
94
345
🤯 Congratulations David Baker @UWproteindesign , @demishassabis and John Jumper!!
BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John M. Jumper “for protein structure prediction.”
5
35
344
10,454
We've (including @thesteinegger @milot_mirdita ) updated ColabFold to use the latest optimized AlphaFold implementation that reduces compile time from ~4.5mins to ~30 seconds! We've confirmed the results are identical for both monomer and multimer predictions. Try it out! (1/2)
2
61
318
And this is why code should always be published with papers... This group made an attempt to reproduce AlphaFold3 and found a number of potential issues in the published pseudo-code. See exciting 🧵 👇
🚀Excited to announce: Open-source AlphaFold3 implementation! 🚀 I am thrilled to announce one of the models we have been building for the last 8-weeks at Ligo - an open-source implementation of DeepMind’s frontier model, AlphaFold3! Here’s what we have learned, a thread (1/11):
73
316
30,048
AlphaFoldDB updated, now includes the MSAs 🥳
We’re renewing our collaboration with @GoogleDeepMind! We'll keep developing the AlphaFold Database to support protein science worldwide 🎉 To mark the moment we’ve synchronised the database with UniProtKB release 2025_03 ebi.ac.uk/about/news/technol… #AlphaFold
4
47
339
27,116
Now everyone can customize/share protein language models for their custom task/dataset via @GoogleColab 🤓 Paper: biorxiv.org/content/10.1101/… Colab: colab.research.google.com/dr… Credit: @LTEnjoy, Zhikai Li, @ChenchenHa42849, @BonnieSwt, Junjie Shan, @XibinBayesZhou, Dacheng Ma, @duguyuan
2
106
321
50,334
Oh! This is gonna be useful!
1
48
307
Nice! AlphaFold3 server added interactive pAE and chain division! Including option to download MSA. (Just noticed this when doing a tutorial today!)
10
44
324
18,627
Weekend project: Comparing ESM3 from @EvoscaleAI to ESM2 and inv_cov. The ultimate test of a protein language model is how well the pairwise dependencies it learns correlate with structure. (1/8)
6
64
312
54,830
Whhhaaa.... Alpha-RNAfold? science.sciencemag.org/conte… @raphaeljlt Where's the colab notebook for this?? 😜
63
301
ESMFold in ColabFold! 😎 colab.research.google.com/gi… Special thanks to @proteinrosh @milot_mirdita @thesteinegger @alexrives
4
78
281
Interesting excuse... 🤔 hopefully this is not the start of closed source software in the field... "As part of our commitment to releasing our research breakthroughs safely and responsibly, we will not be sharing model weights, to prevent use in potentially unsafe applications."
AlphaMissense, a tool by DeepMind, can help researchers learn more about the effects that missense mutations have on disease, and could help identify previously unknown disease-causing genes, according to a new Science study. Learn more: scim.ag/49l
9
57
274
67,569
Is 3D dragging you down? Wish you could instead use the 2D ColabFold representation for all your work? 🤓 Introducing: py2Dmol 🧬
6
42
297
19,744
One request (for a tutorial next week), was to "hallucinate" a binder w/ AF for a given template structure. I tried w/ the spike protein. Surprisingly, at the get-go, even with a random starting sequence, AF already places the extra sequence near the ace2 binding site. (1/2)
7
36
252
Started teaching again! This time decided to try use #claude (@AnthropicAI) and @codesandbox for hosting to implement an interactive GREMLIN (Potts) model (w = coevolution, b = conservation) to show students how you go from MSA to contacts! 9kssnq.csb.app/
9
36
265
15,540
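A rough sketch of the MSA-to-contacts idea, using a regularized inverse covariance as a cheap stand-in for the pairwise term w. This is not GREMLIN's pseudolikelihood fit and not the code behind the linked demo; the shrinkage value, the function names, and the toy MSA are all assumptions for illustration.

```python
import numpy as np

ALPHABET = "ARNDCQEGHILKMFPSTWYV-"  # 20 amino acids plus gap

def one_hot_msa(msa):
    """Encode an MSA (list of equal-length strings) as an (N, L, A) array."""
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    gap = len(ALPHABET) - 1
    x = np.zeros((len(msa), len(msa[0]), len(ALPHABET)))
    for n, seq in enumerate(msa):
        for l, aa in enumerate(seq):
            x[n, l, index.get(aa, gap)] = 1.0  # unknown letters count as gap
    return x

def inv_cov_contacts(msa, shrink=4.5):
    """Score residue pairs by the strength of their inverse-covariance block."""
    x = one_hot_msa(msa)
    N, L, A = x.shape
    flat = x.reshape(N, L * A)
    centered = flat - flat.mean(0)
    cov = centered.T @ centered / N
    # Shrinkage keeps the matrix invertible (value is an assumption).
    cov += shrink / np.sqrt(N) * np.eye(L * A)
    w = -np.linalg.inv(cov).reshape(L, A, L, A)
    # Collapse each (A x A) coupling block to one number, ignoring gaps.
    raw = np.sqrt((w[:, :-1, :, :-1] ** 2).sum((1, 3)))
    np.fill_diagonal(raw, 0.0)
    # Average product correction removes per-position background signal.
    apc = raw.sum(1, keepdims=True) * raw.sum(0, keepdims=True) / raw.sum()
    contacts = raw - apc
    np.fill_diagonal(contacts, 0.0)
    return contacts

# Toy MSA: positions 0 and 3 always change together, positions 1 and 2 vary freely.
toy_msa = [p[0] + x + y + p[1]
           for p in ("AL", "DK") for x in "GS" for y in "EQ"]
contacts = inv_cov_contacts(toy_msa)
top_pair = np.unravel_index(np.argmax(contacts), contacts.shape)
print(sorted(int(i) for i in top_pair))
```

On the toy alignment the highest-scoring pair is the one built to covary, which is the intuition the demo is meant to give students.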
AF3 hack: When in doubt (or your ligand is not available), use ALL ligands and see what sticks. (green prediction, white @nickpolizzi_ 's design) 🤓
7
38
260
35,909
I spent way too much time on this single figure. 😀 sciencedirect.com/science/ar…
5
27
241
📈 One standard graph in all bio deep learning papers should be: max similarity to anything in training set vs performance. (Reviewers shouldn't have to guess if there might be overfitting issues).
12
26
236
24,069
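A minimal sketch of the data behind such a graph: for each test item, the maximum similarity to anything in the training set, paired with its performance. `difflib.SequenceMatcher` is used only as a crude, dependency-free stand-in for a proper alignment-based identity; the helper names, sequences, and scores are hypothetical toy values.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude string similarity in [0, 1]; a real analysis would use
    alignment-based sequence identity or structural similarity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def max_train_similarity(query, train_set):
    """Highest similarity between one test item and any training item."""
    return max(similarity(query, t) for t in train_set)

def similarity_vs_performance(test_items, train_set):
    """test_items: list of (sequence, score). Returns (max_sim, score)
    pairs sorted by max_sim, ready to plot as x vs y."""
    return sorted((max_train_similarity(seq, train_set), score)
                  for seq, score in test_items)

# Hypothetical toy data, only to show the shape of the output.
train = ["ACDEFGHIK"]
test_items = [("ACDEFGHIK", 0.9), ("WWWWWWWWW", 0.3)]
points = similarity_vs_performance(test_items, train)
print(points)
```

If performance drops sharply toward the low-similarity end, that is the overfitting signal a reviewer would otherwise have to guess at.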
If you missed it, the talks and slides are now ONLINE!! ebi.ac.uk/training/events/sc…
We are starting #AlphaFold #webinar series tomorrow with the 1st session about scope and vision of AlphaFold. Registration free but essential - ebi.ac.uk/training/events/sc… @BioExcelCoE, @DeepMind, @ELIXIREurope, #3DBioinfo
2
51
230
I think this is the most interesting/innovative part of BoltzGen. Diffusing to AF2-style encoding to co-generate both backbone and sidechains identities! 🤯
Replying to @HannesStaerk
With this we train a model with the standard AF3 / Boltz-2 scalable architecture that has proven state-of-the-art for folding. Injecting conditioning inputs allows us to control the designed binder in various ways
1
33
235
23,868
Check out some of our recent work on making phylogenetics differentiable ("learning" both ancestral sequences and tree topologies). 😎🧬🌳
We’ve been making phylogenetic trees differentiable :) Check out our work at #ICML2023 workshops - Sampling and Optimization in Discrete Space (SODS) ፨ and Differentiable Almost Everything (DiffAE) 〆 Looking forward to discuss and learn more! 🙌 ❤️ work with Avi & @sokrypton
5
43
218
38,100
And yes, tweets are officially citable and legit scientific contributions 😅 @Ag_smith @minkbaek (3/3)
2
81
221
Curious how this AlphaFold hallucination GIF was made? See the AlphaFold hallucination "hack" implemented in google Colab here: 🙃 colab.research.google.com/gi…
3
34
214
Alright, last one, now need to get back to real work... 😸
4
25
227
30,890
"scientists solve the protein folding problem" 😀
8
23
207
AlphaFold/RoseTTAFold users: I was asked to give a talk summarizing how the community has been using AF/RF for "structural bioinformatics". I'll focus on deep structural homology/similarity insights that would otherwise not be possible with just sequences. Please share! (1/2)
13
39
209
Adding support for binder hallucination if anyone wants to try! (Code is very experimental, not intended for practical use... only use for art/science) 😀 colab.research.google.com/gi…
9
29
216
An exciting day for structural biology! AlphaFold and RoseTTAFold released. nature.com/articles/s41586-0… science.sciencemag.org/conte…
3
53
196
Here is an example that took #alphafold ~12 recycles to fold! (denovo designed protein, single sequence input). Colored by predicted LDDT.
When there are few to no sequences in the MSA, we find sometimes changing the random_seed, or running for more recycles allows #alphafold to predict the correct structure. Notebook if anyone wants to try on their difficult cases: colab.research.google.com/gi…
5
41
194
Towards the end of the presentation I went down a bit of a rabbit hole trying to demonstrate that AF3 may still be learning to invert the covariance matrix, which is needed to extract the coevolution signal from the input multiple sequence alignment (MSA) (1/9).


for the folks who missed @sokrypton's excellent presentation and discussion of #AlphaFold3 last week, you can now check out the recording 👇 piped.video/qjFgthkKxcA
2
36
201
44,718
Nearly all existing protein-based therapeutics are created from a fraction of possible protein concepts. This is about to change. We are excited to share a publication in @Nature describing Chroma, an AI model that can program novel proteins. generatebiomedicines.com/new…
2
39
195
45,427
A recent preprint from @Lauren_L_Porter shows that it's sometimes possible to sample the alternative conformation of metamorphic proteins by removing the MSA. Though I think this is a very interesting observation, I disagree that coevolution is not used when it is provided. (1/9)
1
30
194
38,924
Pretty cool! Though... not sure if it's fair to call it a "single-sequence" method. Since large Language Models are essentially storing/retrieving MSA info (conservation/coevolution) from a single-sequence input. 🤔 (1/2)
Protein structure can be predicted from a single sequence alone with high accuracy. @HelixonBio team have developed OmegaFold, achieving performance similar to RF and AF2's MSA versions. Only a single sequence is given as input. 1/5
3
15
184
I tried running our categorical Jacobian method (for extracting coevolution signal from language models) on Evo from @BrianHie @pdhsu on the 16S rRNA. It appears to pick up on local hairpins 🤓(1/3).
5
27
192
26,021
When there are few to no sequences in the MSA, we find sometimes changing the random_seed, or running for more recycles allows #alphafold to predict the correct structure. Notebook if anyone wants to try on their difficult cases: colab.research.google.com/gi…
Additionally, @sokrypton found that the structure of these MP designs could be predicted from a single sequence by modifying the number of recycles!
3
41
182
Interested in protein evolution, protein folding, and ML? Looking for a short 1-2 year postdoc in Boston? I got some extra funding, message me! Official ad: jhdsf.fas.harvard.edu/files/…
7
75
180
Often the authors want to release code, and it's an uphill battle to convince the company, lawyers, investors. The second the journal indicated they would accept w/o code, the authors lost all leverage. The journal ( @nature ) has failed the community, not the authors.
3
24
178
13,723
Job-application-procrastination-project: ProteinMPNN in jax! 😅 GPU=A6000, length=2382, seqs=32 pytorch=3m22s jax=17.9s jax=4.46s (vmap) length=100, seqs=5000 jax=2s Special thanks to Shihao Feng, @JustasDauparas and @sim0nsays colab.research.google.com/gi… (1/3)
5
31
184
I think I just diffused into life the flying spaghetti 🍝monster... 😲
11
8
172
Ran Colab/AlphaFold and then later realized you wanted to amber-relax the structure? 😬 Instead of rerunning from scratch, you can now run just the relax step, given a saved PDB file. (I separated the relax step into its own notebook). colab.research.google.com/gi…
7
19
172
Some exciting results from CASP15! (1/6)
AlphaFold/ColabFold no longer dominating at this year’s CASP15 😮
2
35
165
Bindcraft is killing it! 🤩
Replying to @MartinPacesa
A total of 7 de novo binders in the competition, 6 made with BindCraft!
15
170
9,863
#joe90 (on OpenBioML) noticed OmegaFold often fails on CASP domains, I was trying to understand why. Turns out OmegaFold does not like partial sequences! If you input just a single domain (or trim the N-term disordered tail), it fails. But it works if you use the full seq! (1/4)
8
22
168
Interesting... I suspect, by reducing the # of seqs in the MSA, this reduced the strength/certainty of restraints derived from coevolution, allowing AlphaFold to generate multiple hypotheses (conformations) with randomseed, model param, or msa perturbations biorxiv.org/content/10.1101/…
4
37
157
EvoBind: in silico directed evolution of peptide binders with AlphaFold Check out colab notebook from @Patrick18287926 colab.research.google.com/gi…
Our take on how to use alphafold to design peptide binders. In contrast to other work we do not require that the structure of the peptide or the target is known, all you need is a list of binding residues at the target. biorxiv.org/content/10.1101/…
1
29
152
Put together a gif showing how NNs have taken over CASP 😀

ALT https://docs.google.com/presentation/d/1xX4RJIbqICBoyYz9oOp2fwkwMOUyzOAbzFgO4rGt5BM/

5
29
150
Replying to @jankosinski
I think you may have already seen most of these, but you are always welcome to use any of my slides: 😅 docs.google.com/presentation…
2
19
147
Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning distributionalgraphormer.git… Interesting work from: @MSFTResearch
38
155
15,923
In AF_advanced colabfold we suggest enabling dropout (is_training=True), and iterating through seeds to sample from the uncertainty. This should theoretically return multiple conformations if there is any ambiguity in coevolution, w/o the need to resample/subsample the MSA (1/3):
Interesting... I suspect, by reducing the # of seqs in the MSA, this reduced the strength/certainty of restraints derived from coevolution, allowing AlphaFold to generate multiple hypotheses (conformations) with randomseed, model param, or msa perturbations biorxiv.org/content/10.1101/…
2
32
144
Looks like you can run the entire AlphaFold model inside PyTorch using jax2torch😅 github.com/lucidrains/jax2to… (thx @KimiHerath for example!)
3
27
150
21,799
For those that are curious, the experimental notebook demo-ed at the workshop can be found here: 🤓 colab.research.google.com/gi…
Today's hands-on session with @sokrypton and @arneelof on AF2 in ColabFold was eye-opening! 🚀 Attendees got a sneak peek of Colab[Design]Fold's upcoming version, where you can edit MSA, templates, and view cool post-analysis outputs. Stay tuned! ✨ #EMBOIntegModelling23 @EMBO
3
26
152
22,204
🤞ColabFold v1.5.2🤞 This update attempts to fix a system memory leak when running predictions on large proteins/complexes. Thx to liyv (from discord) for helping debug! (1/3)
2
26
143
19,216
Now it makes sense why @deepmind made AlphaFold available for commercial use 😀 They can make 💵 from cloud access.
Today, we're announcing the latest deep integration between Google Cloud and Alphabet’s AI research organizations. You can now run AlphaFold, the groundbreaking protein structure prediction system, in #VertexAI cloud.google.com/blog/produc…
5
13
145
Recently gave a talk on a couple of experiments I did with #AlphaFold. Forgot to include an image of the actual structural output. Here it is. (One of the experiments was to take the MSA and reverse it. Turns out you get a reversed structure 😅 Should that have worked? )
3
19
143
100-Billion param protein language model trained on ColabFoldDB 🤯 Doing better than E15B (ESM2 15 billion) on structure prediction.
“xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein” A borderline-SOTA antibody structure-prediction method is tucked away in the results section biorxiv.org/content/10.1101/…
1
24
143
31,053
Minor update for AfDesign: Backprop through AF to generate a new sequence that is predicted to fold into a specific structure has proven tricky. Unlike TrRosetta, the AF landscape is rugged and often you'd get stuck in a local minimum (rmsd > 1) during gradient descent. (1/4)
3
29
139
Can we take a moment to acknowledge that everyone(?) on @Meta's superintelligence dream team (listed below) is an immigrant? 🌎🌍🌏
I’m excited to be the Chief AI Officer of @Meta, working alongside @natfriedman, and thrilled to be accompanied by an incredible group of people joining on the same day. Towards superintelligence 🚀
3
12
148
17,650
Chai-1r now available for both commercial and non-commercial use! 🤩
Chai-1 has always been available for commercial use via our server. Today, we're also making Chai-1(r) code and weights available under an Apache 2.0 license, which permits broad commercial use. github.com/chaidiscovery/cha…
1
22
149
12,259