Research scientist @googledeepmind. I like Transformers and graphs. I also like chess and a few other things.

London
I have the great fortune to be demonstrating a graduate course on Geometric Deep Learning at Oxford. I have decided to start a self-contained YouTube series on some of the topics covered for students and non-students! Check out the first episode on GCNs! piped.video/watch?v=CwHNUX2G…
13
75
485
58,158
Today I finished my 8 month internship at @GoogleDeepMind. A big thanks to @PetarV_93 and the long list of brilliant people I’ve met for the wonderful time. I leave GDM with many new ideas and a lot of excitement. Final stretch of the PhD ahead now! 🫡
19
12
555
37,492
The Geometric Deep Learning series is back with a bang! Join Michael Spockstein and me as we go over the deep connection between Transformers and Graph Attention Networks! piped.video/watch?v=qAF3ZHmk…
1
31
228
25,352
Heading tomorrow to Vancouver for NeurIPS! Please do reach out if you want to chat about reasoning in Transformers / LLMs :) I'll be presenting our work "Transformers need glasses! 👓" on Thursday at 4:30pm at East Exhibit Hall A-C #1806.
3
25
169
21,392
Today I’m moving to London for 6 months to start an internship with @PetarV_93 at @GoogleDeepMind! Happy to meet up with London-based people — just dm me :-)
7
3
142
30,176
* Locality-Aware Graph-Rewiring in GNNs * with amazing collaborators @ameya_pa Amin Saberi @mmbronstein, and rewiring CEO @Francesco_dgv arxiv.org/abs/2310.01668 We argue that rewiring should (i) reduce over-squashing, (ii) be local, and (iii) remain sparse. A (short) thread👽!
6
22
133
15,526
Excited to be joining @MSFTResearch this summer in Amsterdam as a research intern on the AI4Science team supervised by @vgsatorras!
1
111
14,321
Farewell @MSFTResearch AI4Science. I had a beautiful summer in Amsterdam surrounded by truly amazing researchers. Back to Oxford now! (with a pit stop in Italy first 😉)
3
3
106
16,512
Happy that the NeurIPS deadline is now behind us. Closed Overleaf, slept and touched some grass :)
2
104
13,731
Had great fun presenting our work at the ICML @TAGinDS workshop yesterday in Baltimore! Thanks to amazing co-authors @crisbodnar Haitz, @mmbronstein @PetarV_93 @pl219_Cambridge. Check out our work on arXiv: arxiv.org/abs/2206.08702 :-)
1
10
89
Due to popular demand, the Erlangen Space Program continues! This time we have landed on an alien planet infested with Red Michael Rangers. What a great opportunity to stop and talk about Graph Attention Networks :-) piped.video/watch?v=iAEDA8aD…
1
9
92
30,317
Have you ever thought about incorporating attention into sheaf neural networks? Well, we have! Our work w/ @crisbodnar, Haitz Sáez, and @pl219_Cambridge will appear at the NeurReps workshop at #Neurips2022 as an oral! See you in New Orleans :-)
1
21
75
Accidentally stumbling upon signed boards by @MagnusCarlsen @GMJuditPolgar @VBkramnik @Kasparov63 at the office was really cool
4
5
64
26,355
I have matriculated at the University of Oxford!
5
1
69
In the brand new episode, we go over the important relationship between GNNs and the Weisfeiler-Lehman test 🤠 Join Neostein and Morpheusico in this matrix-themed adventure! piped.video/AJG1K0dbpes
7
74
8,570
Happy to say that I'm starting my PhD at @CompSciOxford with @mmbronstein! A warm thanks to the wonderful people I've had the pleasure to meet at Cambridge, especially to @pl219_Cambridge and @crisbodnar for having been such amazing mentors. Looking forward to it very much :-)
4
1
64
🧪 RoPE is used by a number of SOTA LLMs (Llama 3, Gemma 2, …), yet our understanding of *why* it helps remains quite limited. 🧪 In this work, we try to fix that! With the amazing Alex, @cperivol_, Razvan, and @PetarV_93!😎🔄
Round and Round we Go! 🔄 Rotary Positional Encodings (RoPE) are a common staple of frontier LLMs. _Why_ do they work so well, and _how_ do LLMs take advantage of them? The results might surprise you, as they challenge commonly-held wisdom! Read on ↩️ Work led by @fedzbar!
5
64
8,245
Had great fun working on this project with an amazing team! 😎 I think this work can be seen as an attempt to connect what we know about information propagation in GNNs to LLMs. 🕸️ 🤖 With interesting practical implications! Excited to see where this direction can take us!🚀
Transformers need glasses! 👓 Read on to see how we expose fundamental weaknesses of decoder-only Transformers on important tasks (e.g. copying & counting) + simple ways to make things a bit easier on the Transformer :) Work led by @fedzbar for his @GoogleDeepMind placement!
10
63
8,086
Happy to have been awarded the Early Career grant from G-Research! Part of the grant will go towards upgrading my YouTube equipment in order to fulfill my childhood dream of becoming a machine learning influencer.
Congratulations to our March PhD grant winners! 🎉 Read about how the grant, worth up to £2,000, will support their quantitative research, and find out if you're eligible to apply. gresearch.co.uk/blog/article…
3
3
50
16,123
People queueing to hop on the Erlangen rocket ship! Honored to be the designated photographer. Neural Sheaf Diffusion being presented by @crisbodnar @Francesco_dgv @b_p_chamberlain (and with @pl219_Cambridge @mmbronstein).
7
41
Excited to announce that our paper "Transcending TRANSCEND: Revisiting Malware Classification in the Presence of Concept Drift" (with @0xtrustypatches, @fbpierazzi, @lcavallaro) will appear at @IEEESSP 2022! Paper: arxiv.org/abs/2010.03856 Project page: s2lab.cs.ucl.ac.uk/projects/…
1
19
37
After a 4 hour long game I beat Torino’s chess champion in the final round winning the tournament with 5 wins out of 6 games. He insulted me, barely shook my hand and stormed off with a crowd watching. Chess turns grown men into children and I love it.
2
1
40
9,401
We will be presenting this work tomorrow at the @tf2m_workshop in Vienna. Please feel free to come check it out or to DM/email me if interested. Looking forward to it! Tagging co-authors that are on this app :) @AndreaBanino @_joaogui1 @PetarV_93 👓 👓 👓
Transformers need glasses! 👓 Read on to see how we expose fundamental weaknesses of decoder-only Transformers on important tasks (e.g. copying & counting) + simple ways to make things a bit easier on the Transformer :) Work led by @fedzbar for his @GoogleDeepMind placement!
6
34
6,287
Amazing to see people playing chess in the streets of Bhaktapur, Nepal.
28
3,182
Heading to New Orleans tomorrow for #NeurIPS2022, which will be my first conference as a PhD student! I’ll be presenting our paper “Sheaf Attention Networks” at the NeurReps workshop as an Oral. Feel free to dm me if you’d like to meet!
1
29
What better way is there to start 2024 than with a great week-long chess tournament result 😎 @MSFTResearch I’ll keep wearing your merch if you sponsor me (phd stipends are low so don’t worry too much about it)
28
4,378
Was great fun presenting our work yesterday (@0xtrustypatches @fbpierazzi @lcavallaro) at #SP22 in San Francisco! Thanks to @IEEESSP and @Kings_College for making the trip possible! Of course check out the project s2lab.cs.ucl.ac.uk/projects/… ;)
1
4
27
Our paper “Sheaf Attention Networks” was featured by @GRESEARCHjobs as one of their favourite papers! Joint work with amazing collaborators @crisbodnar @ocariz_h @pl219_Cambridge
Our Quant Researchers and ML Engineers reviewed their favourite papers from @NeurIPSConf 2022.   Click here to read their write-ups of the research they found most insightful. 📄gresearch.co.uk/neurips-2022…
1
4
27
7,635
Check out our new ICLR paper on latent graph inference :)
Our paper 'Latent Graph Inference using Product Manifolds' with Anees Kazi, @fedzbar and Prof Pietro Lio was just accepted to ICLR 2023 😄 In this work we use cartesian products of Riemannian manifolds to generate 'rich' embedding spaces to infer graphs. arxiv.org/pdf/2211.16199.pdf
26
4,069
We have AGI guys!!!!1!1!!
2
2
25
2,670
Excited to be heading to Vienna tomorrow for ICLR! I will be presenting our paper “Locality-Aware Graph Rewiring in GNNs” on Thursday at Halle B Poster #236. Happy of course to chat - dms are open :-)
1
22
4,102
The Erlangen program continues at full speed, now with the support of @EmmanuelMacron who showed up to our jazz session last night!
1
3
16
Such a bizarre hill to die on as well. I really don’t get people that completely dismiss entire research areas in general. If there are many people studying something there is likely some non-trivial value that may not be obvious to an outsider. It’s just rude and arrogant.
2
20
1,731
After a painful loss yesterday playing for the Oxford Uni team the bittersweet news is that I crossed 2300 in rapid on chess.com and I'm now #21 in Italy (#1562 globally). I also crossed 2400 in blitz placing me #64 in Italy (#5471 glob). A lot more work to do!
18
This result being so obvious and so overlooked is exactly why I like it so much :-)
There is so much low hanging fruit in thinking carefully about what’s happening in a transformer / LLM. Did it really take two years for someone to point out that RoPE will (obviously?) not decay activation with relative distance for random key value pairs?
3
21
4,287
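The RoPE observation above can be checked numerically. What follows is only a minimal sketch, assuming that "random key value pairs" means independently sampled Gaussian query and key vectors, and using the common RoPE frequency schedule with base 10000. The helper names `rope_score` and `mean_abs_score` are made up for illustration and are not taken from the paper.

```python
import math
import random


def rope_score(q, k, distance, base=10000.0):
    """Dot product of q with k after rotating each 2-D pair of k
    by the RoPE angle for the given relative distance."""
    dim = len(q)
    score = 0.0
    for i in range(0, dim, 2):
        theta = base ** (-i / dim)  # frequency of the pair starting at index i
        angle = distance * theta
        c, s = math.cos(angle), math.sin(angle)
        k0 = c * k[i] - s * k[i + 1]
        k1 = s * k[i] + c * k[i + 1]
        score += q[i] * k0 + q[i + 1] * k1
    return score


def mean_abs_score(distance, dim=64, samples=2000, seed=0):
    """Average |score| over random Gaussian (q, k) pairs (assumed setup)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        q = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        k = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += abs(rope_score(q, k, distance))
    return total / samples


if __name__ == "__main__":
    for d in (0, 1, 16, 256):
        print(d, round(mean_abs_score(d), 3))
```

Under these assumptions the average score magnitude stays roughly flat as the relative distance grows, since rotating a standard Gaussian vector leaves its distribution unchanged.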
@jacobbamberger presenting bundle neural networks at the GRaM workshop in Vienna! Check out our paper linked below! arxiv.org/abs/2405.15540
Jacob making sheaves great again
5
22
2,410
Cannot recommend the MSR AI4Science team in Amsterdam enough!
📢New internship opening at AI4Science in Amsterdam or Berlin!📢 For our interdisciplinary team working on electronic structure and deep learning, we are looking for someone to join us and work on synthetic data generation and curation. Please apply here! jobs.careers.microsoft.com/g…
19
5,290
This is one of the most spectacular quiet moves played in a championship game. The idea of d5 to set up Qc7 Kh7 Ng6 Rg8 Qf7 Qxg8+ Ra8+ Kf7 Rf8# truly shows how remarkable a player Ding is. What an amazing championship.
1
16
1,508
Happy that the review period is over. Disclaimer: I don’t condone reviewee-reviewer violence. Courtesy of @lorgiusti
1
17
4,254
Happy to be in Zagreb today for the Croatian Machine Learning Workshop (CMLW 2024)! workshops.eeml.eu A picture of Razvan presenting our recent work on RoPE :-) In case you haven’t had the chance to read it yet arxiv.org/abs/2410.06205
1
1
17
1,132
Such sad news. Ross was a very kind hearted man. He was an important mentor to me during my master’s and contributed deeply to decisions in my life. May he rest in peace.
@rossjanderson Professor Ross Anderson, FRS, FREng Dear friend and treasured long term campaigner for privacy and security, Professor of Security Engineering at Cambridge University and Edinburgh University, Lovelace Medal winner, has died suddenly at home in Cambridge.
15
2,806
Was great fun working on this project with amazing collaborators. See you in Hawaii!
Happy to share our new theoretical work on over-squashing arxiv.org/abs/2302.02941 A quasi-Italian job with @lorgiusti @fedzbar @GiuliaLuise1 @pl219_Cambridge @mmbronstein. Looking forward to presenting this at #ICML2023. More details below 🦧
15
3,258
🃏 🃏
In 2012, I did Physics practicals at the Cavendish... got scolded for clumsily dropping a pendulum. In 2024, I gave an invited talk there! Guess DL is Physics now? 🧑‍🔬 I presented five recent papers: Glasses👓 TransNAR🔢 softmax is not enough🌬️ RoPE analysis🔄 Positional Attn💠
1
16
8,820
Replying to @chessontwiter
I want Firouzja to play in the candidates, but this resignation is very suspicious. Of course the position seems unpleasant for black but it’s not to the point of resignation. It seems like the GMs do not want to interfere with Firouzja’s qualification and give up very easily.
1
12
1,894
Will be in Palo Alto/Stanford/SF 🏄‍♂️ from Wednesday 19th to Friday 21st and then in Honolulu 🏝️ for ICML. Feel free to DM if you’d like to chat/meet :-)
1
12
2,873
Replying to @chesscom
What a championship! So many emotions. Congrats to Ding but seeing Nepo like that was heartbreaking. I hope that he can recover from this.
1
2
12
8,087
Check out our fresh-out-of-the-oven 👨‍🍳 paper which will appear at the ICML 22 TAG in ML workshop :-) w/ amazing collaborators @crisbodnar Haitz Sáez @mmbronstein @PetarV_93 @pl219_Cambridge!
📢Using sheaf neural nets just got a lot easier. We propose pre-processed connection Laplacians as a geometric "diffusion" operator for graphs. This work by @fedzbar w/ Haitz Sáez, @mmbronstein, @PetarV_93 and @pl219_Cambridge will appear at the TAG in ML Workshop at #ICML2022.
1
3
10
Very excited that Francesco will be giving a talk tomorrow at our Learning on Graphs and Geometry seminar series. Feel free to join us at the mathematics department in Oxford or online on Zoom at 2pm British Summer Time :)
I'll be giving a talk tomorrow (2pm BST) at the Math Institute in Oxford on over-squashing and graph-rewiring frameworks (for more details, see the thread below) This is part of the new LOG**2 seminar -- here you can find info and a zoom link log-2.github.io :)
11
1,602
Magnus is completing side quests at this point
10
1,582
When you finally find reviewer #2
Trashing a competing method in a talk and then realizing the author is in the audience.
1
9
1,845
Replying to @ZakJost
I didn't think that starting a PhD meant that I'd be stitching together @mmbronstein and Neil Armstrong at 2am but here we are
9
552
Looking forward to Chaitanya’s talk on Thursday! Feel free to join in on zoom or at the maths institute if you are in Oxford :)
Excited to be giving a talk at Oxford @UniofOxford this Thursday, May 11 (and via Zoom; link below), and looking forward to meeting all the excellent GDL-ers there! Thanks @epomqo for hosting! log-2.github.io/
9
1,709
Awesome talk going on by @battistabiggio at ICML! Congrats on the test of time award :-)
1
9
Here are some pictures of @Kings_College, which I took on a foggy evening.
1
6
Made it into the Italian top 50 in the rapid section of chess.com 😎 and top 3000 globally! I am definitely not procrastinating my thesis :-) #chesspunks Next up - 2400 blitz!
1
1
8
Perhaps something that might be more convincing experimentally and seems to break quite quickly is the task of predicting the penultimate token. This is on ChatGPT. This also follows from our analysis.
2
1
8
2,990
Highly recommended read!
My PhD Thesis is finally online. I hope it will provide some new perspectives for anyone interested in geometry, topology, deep learning and graph neural networks. repository.cam.ac.uk/items/0…
6
1,268
Congratulations Petar! The course was incredibly interesting, challenging and rewarding and has had a profound impact on my research interests. I’m glad that it will keep running :-) It was my favourite course in my 4 years of university.
1
7
That’s just wrong. He doesn’t have lots of inaccuracies and it’s quite easy to judge the quality of a game by numbers on a computer. If two 2800 GMs miss a move then it’s safe to say that no human would have found it. Firouzja is a brilliant young chess player.
6
Replying to @y0b1byte
I started my DPhil last week! Thanks a lot for doing this :-)
1
6
I bought 700g of pickled gherkins today! Disclaimer: I’m not sponsored by Sainsbury’s
1
4
Congratulations Cris! You are a massive inspiration.
1
5
1,328
Time to pack our bags and go home
5
466
Authors when the reviewer increases the score
From the 1987 National Aerobic Championship, here is the performance of The San Francisco Bay Club.
5
1,376
So jealous :)
Life achievement unlocked: coauthoring paper with the founder of @DeepMind @demishassabis and world chess champion @32gcfhkmm and many other amazing folks at @DeepMind. We used high level human chess concepts to look deeper into what self-taught super-human chess player.
5
Replying to @PetarV_93 @chaitjo
Didn't know that by training they meant neural networks!
5
563
Spectral rewirings satisfy (i) and (iii), but tend to break locality. On the other hand, spatial rewirings satisfy (i) and (ii), usually at the cost of density. Our LASER framework sits in the middle of the two, satisfying (i), (ii), and (iii).
1
4
828
Replying to @chessmensch
I see Qe4+ Qg6 Qxc2+! Qxc2 Be4+! Qxe4 c8=Q which is all forced, if black continues Qxd4 you have a draw with Qf5+. If not Qxd4 (eg. Qf3+) it’s also likely a draw? Any loss of tempo for black (eg. picking up d4) results in either a perpetual or consolidation and white holds.
3
714
Replying to @doomslide
I don’t think this is right for Pre-LN as the normalisations are in the wrong place. Also layer norm does not really act like that on z. In Llama3 norms get larger as layers increase, but are then pinned down by the final layer norm.
2
4
1,435
LASER works by considering a *sequence* of (gradually less local) snapshots and rewires through a “global” connectivity measure — controlling sparsity. We also provide a connection between our framework and Temporal GNNs. We show that LASER works well on a variety of tasks!
4
673
Really hoping that the 2024 candidates cycle will be the one for Hikaru. What an amazing player. Would love to see him play for the championship title against Ding.
Hikaru Nakamura is in the #FIDECandidates! Congratulations! 👏👏 📷 Anna Shtourman
3
2,430
Exactly. Research-via-disagreement is very valuable and interesting. I’m always happy when someone disagrees with me, but the disagreement has to be productive and respectful. Attacking/dismissing etc isn’t either of those. Dismissing the entire field of Bayesian ML even more so
1
4
120
Replying to @rusant10
Thank you! Will definitely create a playlist once number of videos > 1.
1
4
238
Regardless, our work does not predict that things will break at a specific size, just that at some point (with a specific architecture) they will break. In particular, the speed at which things break depends on the relative difference of the activations. This can be quite slow!
1
3
608
Replying to @mmbronstein
Happy to have been part of this important moment in history
3
1,022
Replying to @crisbodnar
I did spend more time on the thumbnail than on the video. I’ll just call that marketing :-)
3
403
The paper is really interesting. Excited to hear more about it at the meet up! Congrats
3
Same, this is why I developed my own hologram
3
32
Replying to @PetarV_93
Interesting! Maybe I missed it, but where can I find evidence for "hidden states in LLMs follow normal distributions" @simon_jegou This is something I've been on the look-out for :-)
1
3
222
Replying to @crisbodnar
Congrats Cris! Godspeed
1
3
300
There are nice intuitions here based on the spectral theory of lazy random walks. I also think these kinds of normalizations are exactly why the directions of vectors are so important in LLMs: they are invariant to these kinds of normalizations.
1
3
220
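The point about directions being invariant to normalization can be illustrated with a small sketch. It assumes an RMS-style normalization that rescales a vector by a positive scalar, with no learned gain and no mean-centering. That is an assumption, because the thread does not say which normalization is meant, and a LayerNorm that subtracts the mean can change direction. The helper names `rms_norm` and `cosine` are illustrative only.

```python
import math


def rms_norm(x):
    """Rescale x to unit root-mean-square (assumed: no learned gain, no centering)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v / rms for v in x]


def cosine(a, b):
    """Cosine similarity, which depends only on the directions of a and b."""
    dot = sum(u * v for u, v in zip(a, b))
    norm_a = math.sqrt(sum(u * u for u in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0, 0.5]
    y = [0.3, 4.0, -1.0, 2.0]
    print(cosine(x, y), cosine(rms_norm(x), rms_norm(y)))
```

Because the normalization only divides by a positive number, the angle between any two vectors is the same before and after it, and only the magnitudes change.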
Replying to @NorwayChess
This Fabi pic is giving MMA fight vibes
1
2
1,219
If we keep this up we will start commissioning GDL memes
3
338
Last week I had the pleasure of giving my 2nd seminar ever at the Cambridge Security Seminar Series, where I discussed our new conformal evaluator techniques for malware classification (appearing at S&P 22). Recording: cl.cam.ac.uk/research/securi… Thanks to @rossjanderson for hosting!
3
Replying to @kfountou
Really interesting work! Congrats
2
95
Replying to @vdutor
Thanks for the awesome event! It was great fun :-)
2
Replying to @IterIntellectus
So you can't distinguish the strength of two football players once they are better than you?
2
125
Hey Chaitanya! Just to give my thoughts on ChatGPT -- essentially I agree with Petar, we focused on Gemini-type models as we have a much better idea of how they work. It's much harder to exactly say what is happening without looking at the representations/architecture.
1
2
401
Everybody gangsta until the gradients become sentient
1
2
185
Replying to @charlieharris01
This is great - very enjoyable read. Thanks for putting it together :)
1
2
381
Amazing talk - highly recommend!
Recording of my talk on physically-motivated GNNs @Cambridge_CL Thanks again @PetarV_93 and @pl219_Cambridge for the invitation! cl.cam.ac.uk/seminars/wednes…
2
It was an amazing first lecture!
2