AI for mathematics and theoretical physics Tomorrow's problems on yesterday's machines Axiom - École nationale des ponts et chaussées

France
My paper Linear Algebra with Transformers was published in Transactions of Machine Learning Research (TMLR). This new version includes many new results and experiments. openreview.net/pdf?id=Hp4g7F… The source code should be available in a few days.
5
131
781
Transformers can be trained to solve a 132-year-old open problem: discovering global Lyapunov functions. New paper on Arxiv (accepted in NeurIPS 2024), with @albe_alfa and @Amaury_Hayat arxiv.org/abs/2410.08304 1/8
15
123
671
73,347
Transformers for discrete optimisation problems 1- Train a model on candidate solutions 2- Use the model to generate more candidates 3- Improve the solutions with local search 4- Use the best candidates to fine tune the model 5- Iterate
New preprint up! "PatternBoost: Constructions in Mathematics with a Little Help from AI," with F. Charton, A.Z. Wagner, and G. Williamson: arxiv.org/abs/2411.00566
10
55
418
142,324
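The five-step loop in the post above can be sketched on a toy problem. This is only an illustration under loud assumptions: the task (large subsets of {0, ..., 19} with no 3-term arithmetic progression), the function names, and all parameters are hypothetical, and the "model" is a simple table of inclusion frequencies standing in for the transformer used in PatternBoost.

```python
import random

N = 20  # toy ground set {0, ..., N-1} (assumption, not from the paper)

def has_3ap(s):
    """True if the set s contains a 3-term arithmetic progression a < b < c."""
    return any((2 * b - a) in s for a in s for b in s if a < b)

def local_search(s, rng):
    """Step 3: repair a candidate, then greedily extend it."""
    s = set(s)
    while has_3ap(s):                      # drop random elements until valid
        s.remove(rng.choice(sorted(s)))
    for x in rng.sample(range(N), N):      # try to add elements in random order
        if not has_3ap(s | {x}):
            s.add(x)
    return frozenset(s)

def fit_model(candidates):
    """Steps 1 and 4: stand-in 'model' (NOT a transformer): inclusion frequencies."""
    return [sum(x in c for c in candidates) / len(candidates) for x in range(N)]

def sample(model, rng):
    """Step 2: generate a new candidate; probabilities are smoothed to keep exploring."""
    return {x for x in range(N) if rng.random() < 0.9 * model[x] + 0.05}

def pattern_boost(rounds=5, pool=200, keep=50, seed=0):
    rng = random.Random(seed)
    # Initial candidates: local search from empty starts
    population = [local_search(set(), rng) for _ in range(pool)]
    for _ in range(rounds):                                        # step 5: iterate
        best = sorted(population, key=len, reverse=True)[:keep]    # keep the best
        model = fit_model(best)                                    # (re)train on them
        generated = [sample(model, rng) for _ in range(pool)]      # generate more
        improved = [local_search(c, rng) for c in generated]       # improve locally
        population = best + improved
    return max(population, key=len)

if __name__ == "__main__":
    result = pattern_boost()
    print(sorted(result), len(result))
```

The design point the sketch tries to capture is the division of labour: the generative model proposes candidates that resemble past good ones, and the local search guarantees that every candidate kept is valid.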
My talk at Physics ∩ ML last week piped.video/watch?v=81o-Uiop… Recent results on transformers learning mathematical properties (instead of just memorizing and interpolating) at 40:00. And a first attempt at particle physics, and gluons, at 57:00
3
47
343
The source code for my two papers "Linear Algebra with Transformers" (TMLR) arxiv.org/abs/2112.01898 and "What is my math transformer doing?" (NeurIPS 2022 Math-AI Workshop) arxiv.org/abs/2211.00170 is now available at github.com/facebookresearch/… (with trained models and test sets)
2
57
314
Transformers can be trained to compute the roots of polynomials f-charton.github.io/polynomi… It is often said that language models "cannot compute"; evidence to the contrary is accumulating.
9
36
302
I am joining Axiom Math, a seed-stage startup on AI for maths. I will lead discovery: AI for advancing math research. 6 years after Deep Learning for Symbolic Maths, our first paper with @GuillaumeLample, I am proud of the field's progress, and excited about what comes next.
14
16
289
33,597
Transformers can be trained to solve problems of linear algebra (matrix transposition, addition, multiplication, inversion and eigenvalues) to very high accuracy. 1/4 Our new paper is on Arxiv: arxiv.org/abs/2112.01898
9
42
257
Transformers solve an open problem in symbolic mathematics: discovering Lyapunov functions, joint work with Alberto Alfarano and @Amaury_Hayat. My talk in IAIFI today (starts at 5:00) piped.video/watch?v=yCzV97QN…
Our first IAIFI Colloquium of the semester is starting now with @f_charton! "Transformers meet Lyapunov: Solving a long-standing open problem in mathematics." Watch live on YouTube: piped.video/live/yCzV97QNG8w…
4
35
245
47,255
My talk in Harvard yesterday. Transformers for symbolic regression (11:30), theoretical physics (26:00), and results on explainability in linear algebra (39:00) and arithmetic (50:00) piped.video/Sc6k06wVX3s?si=H_vf…
45
219
29,623
Looking for a postdoctoral researcher to work with me on applying transformers to open problems in mathematics and theoretical physics. This is an 18-month position, based in Paris. DM me if interested. metacareers.com/jobs/7714042…
8
36
173
53,066
One epoch is not all you need! Our paper, Emergent properties with repeated examples, with @KempeLab, won the NeurIPS24 Debunking Challenge, organized by the Science for Deep Learning workshop, @scifordl arxiv.org/abs/2410.07041
3
17
122
9,776
The source code for my paper: Learning the greatest common divisor: explaining transformer predictions (arxiv.org/abs/2308.15594, ICLR 2024 spotlight) is now available on github.com/facebookresearch/….
3
22
117
42,776
Math transformers learn better when trained from repeated examples. New paper with @KempeLab arxiv.org/html/2410.07041v1 On 3 problems, modular multiplication, GCD and eigenvalues, for the same training budget, models trained from smaller datasets achieve better performances. 1/5
4
22
120
28,219
What is my math transformer doing? Three results on interpretability and generalization. When trained to solve numerical problems from examples, transformers learn some of the underlying maths, and can generalize way out of distribution. New preprint: arxiv.org/abs/2211.00170
32
116
My presentation last Friday in the Collège de France (in French). Transformers learning maths, and recent results on explainability, and why transformers sometimes cheat. piped.video/RtB8kVCxJdw via @YouTube
4
22
110
31,655
Our new paper on Symbolic Regression with @pa_kamienny @stephanedascoli @GuillaumeLample is now on Arxiv! We achieve performance comparable to SOTA genetic algorithms on SRBench with Transformers, whose inference time is orders of magnitude lower! arxiv.org/abs/2204.10532 1/4
3
26
113
How do transformers learn arithmetic tasks, such as GCD and modular sums and products? My talk in Collège de France on November 4th (in French, but the English subtitles are quite good). Thank you @wtgowers for inviting me to your seminar! piped.video/watch?v=e0jUi8W4…
6
17
107
36,480
Transformers can learn to compute the greatest common divisor of two positive integers. They make deterministic predictions that can be fully explained. Training from a log-uniform distribution of operands achieves best results. My new paper is on arXiv: arxiv.org/abs/2308.15594
7
15
87
15,045
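To make "a log-uniform distribution of operands" concrete, here is a minimal sketch of how such training pairs could be generated. The function names, the maximum operand, and the output format are assumptions for illustration; the paper's actual data pipeline and number encoding are not reproduced here.

```python
import math
import random

def log_uniform_int(max_value, rng):
    """Sample an integer in [1, max_value] with P(k) roughly proportional to 1/k.

    exp of a uniform draw on [0, log(max_value + 1)] lies in [1, max_value + 1],
    so small operands are drawn far more often than under a uniform law.
    """
    u = rng.uniform(0.0, math.log(max_value + 1))
    return min(max_value, int(math.exp(u)))

def gcd_examples(n, max_value=10**6, seed=0):
    """Return n triples (a, b, gcd(a, b)) with log-uniform operands (hypothetical format)."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        a = log_uniform_int(max_value, rng)
        b = log_uniform_int(max_value, rng)
        examples.append((a, b, math.gcd(a, b)))
    return examples

if __name__ == "__main__":
    for a, b, g in gcd_examples(5):
        print(a, b, g)
```

With this sampler, each order of magnitude of operand size is about equally likely, so the training set contains many more small operands than a uniform draw over the same range would.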
Transformers can discover recurrence relations from sequences (aka IQ tests). New paper on symbolic regression, with @stephanedascoli @pa_kamienny and @GuillaumeLample
Deep Symbolic Regression for Recurrent Sequences -- arxiv.org/abs/2201.04600 We show that transformers are great at predicting symbolic functions from values, and can predict the recurrence relation of sequences better than Mathematica. You can try it here: bit.ly/3niE5FS
1
12
77
Leveraging maths to understand transformers. Transformers learning maths, or sometimes just pretending. A presentation at the NeurIPS 2022 MATH-AI workshop. neurips.cc/virtual/2022/work…
2
13
58
7,711
A pure physics paper based on intuitions from AI experiments, expect more of these!
I’m pretty excited about our new paper, which is a follow up to our last paper using AI to help solve a problem in theoretical particle physics. (With Lance, @f_charton, Matthias, Tianji, and @merz_garrett)
2
4
42
3,058
Our Lyapunov paper in the New Scientist. Thanks @stokel !
An AI system has helped tackle a longstanding tough mathematical problem involving tools called Lyapunov functions. My latest for @newscientist newscientist.com/article/245…
4
34
5,485
Great video about our work on symbolic regression! Thanks @ykilcher and @stephanedascoli
📜Paper Video Time!📜Today I'm talking to Stéphane d'Ascoli (@stephanedascoli) about Deep Symbolic Regression for Recurrent Sequences. This model is given a sequence of numbers, like 1, 2, 3, 5, 8 and it figures out the *rule behind* the sequence. Insane🤯 piped.video/1HEdXwEYrGM
5
37
I am looking for a research engineer (code and experiments), to work on scientific reasoning, with Julia Kempe, Yann Ollivier, and me metacareers.com/jobs/3577161…
6
35
7,375
Presenting current work and recent results at Harvard on the 27th
4
31
6,092
Human mathematicians (masters students) achieve less than 10% accuracy on this task (vs more than 80% for our model). 8/8
1
30
2,549
SALSA PICANTE: a machine learning attack on LWE with binary secrets. Transformers can be trained to recover secrets from public-key cryptosystems. New preprint arxiv.org/abs/2303.04178, with @CathyYLi, @JSotakova, @em_wenger, Mohamed Mahlou, Evrard Garcelon, and @KristinLauter.
8
32
4,946
"A very perverse translation task," my work on transformers and linear algebra (arxiv.org/abs/2112.01898), discussed by mathematician Geordie Williamson (43:20 onwards) piped.video/trEY6c7eogQ
5
31
3,933
Transformers for amplitudes, a first step towards using symbolic language models in theoretical physics - with @KyleCranmer @merz_garrett Tianji Cai, Matthias Wilhelm and Lance Dixon
2
3
26
2,387
Transformers can learn string rewrites, if trained on a large and diverse set of rewrite rules. The number of rules matters, not the number of examples per rule. String rewrites form the basis of Markov algorithms. If transformers can learn them, they can learn any calculation.
I have been curious about the driving factors of generalization to unseen instructions. - so we therefore attempted to model this phenomenon with a symbolic task. - string rewrites. arxiv.org/pdf/2402.10891.pdf Happy to work with @f_charton and my reliable collaborator Justin on this study.
4
24
4,254
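For readers unfamiliar with Markov algorithms, the following is a small interpreter sketch showing why string rewrites are a general model of computation: a program is just an ordered list of rewrite rules. The rule format `(pattern, replacement, is_terminal)` and the function name are choices made for this illustration, not taken from the paper.

```python
def run_markov(rules, text, max_steps=10_000):
    """Run a Markov algorithm given as an ordered list of rules.

    Each rule is (pattern, replacement, is_terminal). At every step, the first
    rule whose pattern occurs in the text is applied to its leftmost occurrence.
    Execution stops when no rule applies or after a terminal rule fires.
    """
    for _ in range(max_steps):
        for pattern, replacement, terminal in rules:
            if pattern in text:
                text = text.replace(pattern, replacement, 1)  # leftmost occurrence only
                break
        else:
            return text          # no rule applies: halt
        if terminal:
            return text          # the rule that just fired was terminal
    raise RuntimeError("step limit reached")

if __name__ == "__main__":
    # Unary addition: deleting '+' concatenates the two tallies.
    print(run_markov([("+", "", False)], "|||+||"))      # prints |||||
    # Sorting a string over {a, b}: swap every out-of-order pair.
    print(run_markov([("ba", "ab", False)], "babab"))    # prints aabbb
```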
Today is my first year in Facebook, and my first year working as a researcher. Thanks to all who made this possible, I had a blast!
1
1
24
Today was my last day as a visiting entrepreneur in Facebook AI. I am amazed at how much I have learnt during those 18 months. Thanks everyone, I had a blast!
1
1
24
Open sourcing Int2Int, a Python code base for AI for maths, with a special focus on arithmetic and number theory github.com/f-charton/Int2Int A user manual, and instructions on how to extend it, can be found here arxiv.org/abs/2502.17513
2
21
1,290
The repository includes source code for data generation, model training, and evaluation of trained models. Since data generation is VERY compute-intensive, we have built 7 datasets, from 20 to 100M examples. We also include 7 pre-trained models.
3
21
Transformers for cryptanalysis Our new paper, SALSA: Attacking Lattice Cryptography with Transformers, with @em_wenger, Mingjie Chen and @KristinLauter, is on ArXiv arxiv.org/abs/2207.04785
7
24
For modular multiplication, models trained on 100 million different examples or more do not learn the task. Models trained on 25 or 50 million examples can achieve 100% accuracy. 2/5
2
4
20
19,904
Thank you, #NeurIPS2021 for the outstanding reviewer award. This was my first time reviewing research papers, so it means a lot to me.
2
2
20
Global Lyapunov functions control the stability of dynamical systems: whether a system starting close to an equilibrium always stays close to the equilibrium (or diverges away). A famous case is the three-body problem: the stability of three celestial bodies under gravitation 2/8
1
20
2,322
A very good introduction to our paper, which now looks embarrassingly simple... towardsdatascience.com/deep-…
2
6
16
Performance is greatly improved by adding a tiny number (0.03%) of easy and solvable "forward" examples (systems for which we have solutions) to the backward training set. Such "primed models" outperform state of the art methods by a large margin. 6/8
1
19
1,308
Thank you for having me @KyleCranmer and @gary_shiu Research featured in the talk: discovering Lyapunov functions (9:55), PatternBoost: generative models in combinatorics (31:20), Scattering amplitudes (44:10), Arithmetic, repetition, and a few unpublished results (46:40)
We had a great turnout for our inaugural AI for Science seminar with François Charton last week. If you missed it, check out the recording: mediaspace.wisc.edu/media/Xu… @KyleCranmer
4
19
2,734
We tested our models on sets of random dynamical systems, the stability of which is unknown, and could find new Lyapunov functions in 10 to 13% of the cases. (7/8)
1
18
2,082
In 1892, Lyapunov showed that global stability was guaranteed if a function V could be found, with a strict minimum at the equilibrium, infinite at infinity, and a gradient always pointing away from the system gradient. Unfortunately, he provided no method for finding V. 3/8
1
17
1,619
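In symbols, one common way to write the conditions described in the post above, for a system $\dot{x} = f(x)$ with an equilibrium at the origin (the notation here is an assumption, not quoted from the thread):

```latex
\begin{aligned}
  & V(x) > V(0) \quad \text{for all } x \neq 0
      && \text{(strict minimum at the equilibrium)} \\
  & \lim_{\|x\| \to \infty} V(x) = +\infty
      && \text{(infinite at infinity)} \\
  & \nabla V(x) \cdot f(x) \le 0 \quad \text{for all } x
      && \text{(} V \text{ never increases along trajectories)}
\end{aligned}
```

The last line is the "gradient pointing away" condition: since $\frac{d}{dt} V(x(t)) = \nabla V(x) \cdot f(x)$, it says that $V$ cannot grow along any solution of the system.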
No general method exists for finding a Lyapunov function. To train our models, we introduce a backward generation technique that creates dynamical systems from their Lyapunov functions. These systems have a different distribution from the problems we actually want to solve. 4/8
1
17
1,417
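The idea of backward generation can be illustrated with a toy construction: pick a Lyapunov function first, then build a system for which it works by design. The sketch below is only one simple way to do this in two dimensions and is not the paper's procedure; the choice of V, the parameter ranges, and all names are assumptions. It uses f = (S - P) ∇V with P diagonal and positive and S skew-symmetric, so that ∇V · f = -∇Vᵀ P ∇V ≤ 0.

```python
import random

def grad_V(x, y):
    """Gradient of the chosen Lyapunov function V(x, y) = x**2 + y**4 (an arbitrary pick)."""
    return 2.0 * x, 4.0 * y ** 3

def make_system(rng):
    """Build f = (S - P) grad V, so that V is a Lyapunov function of x' = f(x) by design."""
    p1, p2 = rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0)  # P = diag(p1, p2), positive
    s = rng.uniform(-2.0, 2.0)                             # S = [[0, s], [-s, 0]], skew

    def f(x, y):
        gx, gy = grad_V(x, y)
        return s * gy - p1 * gx, -s * gx - p2 * gy

    return f

def lie_derivative(f, x, y):
    """grad V . f at (x, y): the rate of change of V along the flow."""
    gx, gy = grad_V(x, y)
    fx, fy = f(x, y)
    return gx * fx + gy * fy

if __name__ == "__main__":
    rng = random.Random(0)
    f = make_system(rng)
    points = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(5)]
    for x, y in points:
        print(round(lie_derivative(f, x, y), 4))  # never positive, by construction
```

A training pair would then be (system f, solution V). As the post notes, systems built this way follow a different distribution from the problems one actually wants to solve.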
The source code for our ICML 2022 paper Deep Symbolic Regression for Recurrent Sequences (arxiv.org/abs/2201.04600) is now available on github.com/facebookresearch/…. Spotlight: Wednesday 20, 16:50 ET Poster session: Wednesday 20, 18:30 ET @stephanedascoli @pa_kamienny @GuillaumeLample
1
2
17
Our paper about learning properties of differential systems with transformers is on Arxiv. Even on very advanced math computations, NLP models work surprisingly well. arxiv.org/abs/2006.06462
Could neural networks find alternatives to classical theories? We show that they can predict abstract mathematical properties of systems involving advanced notions like Fourier transforms, Jacobians, integration. 1/4 arxiv.org/abs/2006.06462 with @Amaury_Hayat and @f_charton
4
14
Congratulations Jeremy! And long live AI4Maths !
Replying to @CarnegieMellon
“The institute will focus on the mathematical components of these tasks and use the technologies to support mathematical reasoning and computation in all its applications,” said Jeremy Avigad, director of ICARM.
1
16
3,738
Transformers for amplitude bootstrap, a hard problem in theoretical physics. A fun collaboration with SLAC, UWisconsin-Madison and Niels Bohr Institute #ai4science #ai4maths
New paper out! We are using transformers to make progress in a cutting-edge problem in theoretical / mathematical physics. @GarrettMerz @datascience_uw, Lance Dixon & Tianji Cai @SLAClab, @f_charton & Niklas Nolte @AIatMeta, Matthias Wilhelm @UCPH_Research arxiv.org/abs/2405.06107
1
1
16
3,284
Transformers work wonders on natural language. Given enough examples, they can translate without a dictionary. Why not consider mathematics as a language and problem solving as translation tasks? with @GuillaumeLample, arxiv.org/abs/1912.01412
5
16
Yet, models trained on backward-generated data achieve good performance on test sets of polynomial systems that can be solved with numerical tools, despite having to generalize out-of-distribution. 5/8
1
15
1,341
On my way to ICLR, want to talk about AI for maths and physics? Ping me!
15
2,389
A very clear and insightful account of our paper: Deep Differential System Stability
This LANGUAGE MODEL determines stability properties of differential systems, a task that usually requires multiple steps of high-level math and at least three grad students! 😮 watch the video here piped.video/l12GXD0t_RE @f_charton @Amaury_Hayat @GuillaumeLample @facebookai
3
12
Human feedback, or an external verifier, applied to generated training data, can prevent model collapse.
How to leverage AI-synthesized data without catastrophic degradation? Rank-and-prune feedback, from humans or even weaker models, provably restores and even surpasses original performance! See arxiv.org/abs/2406.07515 @AIatMeta @feeelix_feng @dohmatobelvis @f_charton @yangpuPKU
3
14
2,112
Two lessons I gave in March, for the Journées de Calcul Formel (Francophone Computer Algebra Days), in Luminy. Lesson one on AI and mathematical discovery (integration, differential system stability, combinatorics) piped.video/watch?v=ZTmltujo…
1
3
17
2,433
Replying to @francoisfleuret
1- train loss: must drop (or you have a bug), fast (or lr too small), stable (or lr too large) 2- speed/mem/gpu usage: dataloaders fast (num_workers), all cores busy (batch size), no back and forth between cpu and gpu memory (bug) 3- can you use fp16, bf16, or lower precision?
2
13
925
Replying to @ilyasut
Maths are the free lunch, deep learning, just a recipe
10
All hail the European labor laws!
1
9
1,549
Can transformers learn Planar N=4 Supersymmetric Yang-Mills?
Just finished an intense week @SLAClab with our small collaboration focusing on using AI to aid in state of the art theoretical physics calculations. 4 days, no talks, only blackboard, code, and results. @datascience_uw @AIatMeta
13
2,274
Most of the time, we use clarity as a proxy for truth, because we believe that we only express clearly what we understand well. Unfortunately, the self-supervised techniques used to train language models seem to do a much better job making them clear, than making them true.
1
3
12
1,346
We are not there yet... Note the interesting failure pattern: the answer is wrong (should be 23), but the divisors of both operands, provided as justification, are correct, and a correct but irrelevant comment (23 is prime) is added for good measure.
4
12
1,932
In NeurIPS next week, excited to chat about possible collaborations and new opportunities in AI for maths, physics and reasoning. DM me if you would like to meet.
12
1,257
Replying to @ameliovr @ylecun
Usually, because intermediate calculations become too complex: too long/deep formulas, too many branches. For integration, the Risch algorithm should handle 100% of cases, but it is very difficult to implement fully.
1
11
Replying to @francoisfleuret
The book is very good, but very Chinese; a westernized adaptation cannot work
2
10
1,953
Replying to @francoisfleuret
Scaling up old ideas, with 10x the compute and a fancy acronym
1
1
11
1,138
An interview about our work with @GuillaumeLample , published a while ago on the news feed of the American Mathematical Society (thank you @writesRCrowell ) ams.org/news?news_id=6207
1
2
7
Can you know if a metabolic network has an equilibrium, and which one? Transformers can! We predict graph equilibria and their associated flows with very high precision. 1/4 New paper on Arxiv arxiv.org/abs/2112.03588 with @Amaury_Hayat @RutgersCCIB @Rutgers_Camden
2
4
9
This is somewhat related to arxiv.org/abs/2406.07515 Adding some “truth signal” (local search, external verification) to generated data allows one to feed it back into the model, without triggering model collapse
2
9
890
Experiments leading to intuitions. The results of the experiments with transformers showed us where to look.
6
647
Replying to @TaliaRinger
And, sometimes, you don't post the draft on Arxiv, because you think it is not ready, but you share your ideas with reputable researchers, just to find, later on, the same idea, with the exact same name, in a preprint from the same reputable researchers (and no acknowledgment).
9
2,133
Our project on using transformers to understand the scattering amplitudes of gluons was awarded a grant!
Woot! Lance Dixon (@SLAClab) & I (@UWMadPhysics @datascience_uw) have been awarded a grant from @doescience to use AI to take on a challenging problem in theoretical particle physics. We will team up with @f_charton (@MetaAI) & Matthias Wilhelm (Niels Bohr) energy.gov/science/articles/…
9
My talk in Amplitudes 2024, at @the_IAS, recent work on transformers for theoretical physics begins at 15:00. Thank you Nima Arkani-Hamed, Jacob Bourjaily, Hofie Hannesdottir and Sebastian Mizera for inviting me. piped.video/watch?v=kbkm61hW…
1
8
1,547
We are still a niche, but a larger one.
8
305
This new version includes baselines and experiments with out-of-distribution generalization, showing that models trained on systems of 2 to 5 equations can predict the properties of larger systems (6 equations, or longer expressions) with high accuracy.
8
Same here, pdf was US letter, but the previewer I used to cut supplementary material converted it to A4 (European defaults). So, the margins are correct, the text is formatted as required, the only thing wrong is the amount of white space around it... seriously @neuripsconf?
2
6
Lesson 2 on AI for arithmetic, and maths for interpretability piped.video/watch?v=4PuJitS_…
1
10
1,101
Replying to @aaron_defazio
In AI? AI for Science (maths, theoretical physics), this is the next frontier
1
8
630
Diversity helps generalization. New paper with @dylan_works_ and Justin Wang
"Only-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization" arxiv.org/pdf/2410.04717 We isolated 'instruction-following' ability (apart from complex reasoning like math) and designed various controlled experiments to show that -
1
9
1,491
Code and datasets for our paper on Symbolic Mathematics are now available
The code for our @iclr_conf paper, Deep Learning for Symbolic Mathematics, is now available in @PyTorch! We also provide our datasets and pretrained models Code: github.com/facebookresearch/… Paper: arxiv.org/abs/1912.01412
1
2
7
Leveraging multivariate observations to discover causal/covariant features in a very noisy environment. My first research paper.
Back-to-back regression: Disentangling the influence of correlated factors from multivariate observations. Our latest paper with @f_charton, David Lopez Paz & Maxime Oquab at @facebookai is now freely available at Neuroimage: sciencedirect.com/science/ar… Here's the summary thread ⤵️
6
Our model is trained on a vast dataset of synthetic examples, and scales to input dimensions up to ten. After several million examples, the attention maps start to reveal intricate mathematical analysis: in the example f(x)=sin(x)/x below, we see Fourier-like patterns. 4/4
6
We selected functions from our test set that we could solve but Maple, Matlab and Mathematica could not. From this set, we chose the most "photogenic".
2
6
Table 4 on page 9 of our paper has a few. Table 7, page 11, has more interesting cases. There, the model was trained exclusively on functions SymPy can integrate. Yet it could solve problems that SymPy could not. So much for claims that we are overfitting.
6
Replying to @NeelNanda5
Not quite mech interp, but would love to meet (in NeurIPS from Thursday to Sunday)
6
961
Given enough examples, models trained on random matrices with independent and identically distributed (iid) coefficients (Wigner matrices) can predict with high precision. 2/4
1
5
On par with a thermos bottle, according to other experts...
1
5
French cuisine with ChatGPT, a three-course réveillon for Christmas... (Experiment at your own risk) To start: the Brie en Croute à la Matelote
2
4
1,852
Replying to @francoisfleuret
This is why you use Adam (or others): the average of successive directions is a better strategy than the local direction, which is influenced by local bumpiness, and warm-up: initial bounces can be misleading, let's make them shorter
2
8
497
Actually this is how mathematics is done. A theorem usually begins as a wild guess with no guarantee of correctness, that one then tries to prove formally. And even when a counterexample is found, the typical result is to change the wild guess a little, instead of rejecting it.
1
5
Replying to @Panda31808732
Given the survey period, the "2nd dose > 6 months" and "3rd dose" groups are essentially elderly or at-risk people, aren't they? (6 months before December 5 means a second dose before June 5, and a first dose before April 25.)
5
There are two parts in problem solving: finding a candidate and proving it correct. Our paper addresses the first part. In a real-world application, verification would need to be implemented, but we believe this is a much simpler task.
1
5