Building the foundations of Optimization and AI. Lived in 🇸🇰🇺🇸🇧🇪🇬🇧🇸🇦

Saudi Arabia
Machine learning papers trying hard to make it look like it all works well.
I am an AC for ICLR 2026. One of the papers in my batch was just withdrawn. The authors wrote a brief response, explaining why the reviewers failed at their job. I agree with most of their comments. The authors gave up. They are fed up. Just like many of us. I understand. We pretend the emperor has clothes, but he is naked. Here is the final part of their withdrawal notice. I took the liberty to make it public, to highlight that what we are doing with AI conference reviews these last few years is, basically, madness.
---
Comment: We thank the reviewers for their time. However, upon reading the reviews for our paper, it became immediately apparent that the four "reject" ratings are not based on good-faith academic disagreement, but on a critical failure to read the submitted paper. The reviews are rife with demonstrably false claims that are directly contradicted by the text. The core justifications for rejection rely on asserting that key components are "missing" when they are explicitly detailed in the manuscript. Some specific examples are (and many are even fake claims).
Claim: Harder tasks like GSM8K are missing. Fact: GSM8K results are in many tables, like Table 2 (Section 4.2) and Appendix G.
Claim: The method does not use per-layer ranks. Fact: This is the entire point of our method. The reviewer clearly mistook our method for the baselines. (Section 2, Table 1).
Claim: The GP kernel is not specified. Fact: It is specified in Appendix E (Table 6).
Claim: There is no ablation of the method's three stages. Fact: Section 4.4 ("Ablation Study") and Appendix J are dedicated to this.
Reviewers have a fundamental responsibility to read and evaluate the work they are assigned. The nature of these errors is so fundamental, so systemic in overlooking explicit content, that it goes far beyond what "limited time" or "oversight" can explain. This work has gone through several rounds of revision over the last year. In earlier submissions, the paper usually received borderline or weak-accept scores. Numerous signs strongly suggest that some reviewers are relying entirely on AI tools to automatically generate peer reviews, rather than fulfilling their fundamental responsibility of personally reading and evaluating manuscripts. We strongly protest this. This is a gross disrespect to the authors. It is a flagrant desecration of the reviewer's sacred duty. It fundamentally undermines the integrity of the entire peer-review process. Given that the reviews are not based on the actual content of our paper, we have decided to withdraw the submission. We leave this comment so that future readers of the OpenReview page are aware that the items described as "missing" are already present in the submitted manuscript. These negative reviews for this submission are factually unsound and do not reflect the content of the paper. We cannot and will not accept an assessment that is not based on the work we actually submitted.
What??? (IMO 2022, Chinese team)
Proud of my PhD student!!!
KAUST PhD student Kaja Gruntkowska has been awarded a @Google PhD Fellowship, becoming the first-ever recipient from the GCC countries. Recognized for her work in Algorithms and Optimization, her research advances both the theory and practice of optimization for machine learning, making AI training faster, more cost-effective, and more resource-efficient.
Boris Polyak (1935) passed away today. An immensely gentle person the way I knew him, and a giant of science in general and optimization in particular. His name, results and legacy will live forever.
1. What would you do if an #iclr2023 reviewer ignored the science of your "Markov Chain Monte Carlo" paper, but would give you the lowest score (1) solely because "chains" may refer to slavery, and because linking slavery to the fine people of Monte Carlo is offensive to them?
One of my rejected ICML 2025 (@icmlconf) papers. Can anyone spot any criticism in the metareview? What a joke 🙃
Re #ICML2021 reviews. The system is broken. We are at the stage where mostly non-experts evaluate the work of experts. It is now a very rare occurrence to receive an educated and scholarly review. Can a football player evaluate the technique and performance of a cyclist? No.
This detail from the CV of an applicant for an internship position to my Optimization and Machine Learning Lab is outright scary.
A good theory paper does not need experiments. A good empirical paper does not need theory. Why do most #NeurIPS2023 reviewers not understand this?
The feeling when you read a @NeurIPSConf review of your paper and find it as well-informed as the opinion of a random person you meet in a shopping mall...
#NeurIPS2023 reviewing if science were sport: any athlete can evaluate any other athlete, irrespective of their specialization, experience, or level. Result? An amateur 100m dash guy criticizes a high jumper for lack of speed during her world-record-breaking jump. Reject.
I can now offer remote internships in my Optimization & Machine Learning Lab richtarik.org at KAUST. These are paid internships, up to 6 months in duration. Areas: Optimization for ML, federated learning, theory of ML. To apply, send me an email with CV & transcript.
I am very very proud of my team at KAUST who have done a wonderful job by authoring several super exciting papers that were just accepted to the #NeurIPS2022 conference. Check them out! Also, I am hiring interns, students, postdocs & research scientists! richtarik.org/i_join.html
All 6 PhD students I have graduated since 2019 (1 from @EdinburghUni, 4 from @KAUST_News and 1 from @mipt_eng) now have 1,000+ citations! I feel incredibly privileged to have had the chance to work with such fantastically talented people!
Just learned that together with Albert S. Berahas, Majid Jahani and Martin Takáč we've won the Charles Broyden Prize for the paper "Quasi-Newton methods for deep learning: forget the past, just sample". Super unexpected and nice!!!
Replying to @ilyasut
The fact that wheels are much better than other means of terrestrial transport suggests that human legs might be using wheels too.
As an ICML 2024 Area Chair, I've handled 19 papers. Recommended 7 to be accepted, 10 to be rejected, and 2 withdrew. The avg scores of the accepted papers = 4.25-6.33. The avg scores of the rejected papers = 2.60-6.00. I did not merely threshold the scores.
Message to all #NeurIPS2021 Area Chairs: Please request reviewers to engage with the authors. Simply reading the rebuttals and updating the original review with a CMT-era one-liner of the type “I’ve read the rebuttal and am keeping my score” should not be allowed anymore.
BurTorch: Beating PyTorch (by a mile) on small and also not so small compute graphs! arxiv.org/abs/2503.13795 Work led by my PhD student & ML Engineering magician Konstantin Burlachenko @burlachekok
Highly recommended if you are interested in the convergence of SGD for smooth nonconvex functions. Prior literature uses various assumptions to carry out the analysis. We show how they are related, and propose a new assumption that is weaker than them all.
Better Theory for SGD in the Nonconvex World Ahmed Khaled, Peter Richtárik. Action editor: Raman Arora. openreview.net/forum?id=AU4q… #optimal #convexity #sgd
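For context, if I read that paper correctly, the new condition is the "expected smoothness" (often called ABC) inequality on an unbiased stochastic gradient g(x) of f. The statement below is my restatement for illustration, so please check the paper for the exact form and constants:

$$
\mathbb{E}\big[\|g(x)\|^2\big] \le 2A\big(f(x) - f^{\inf}\big) + B\,\|\nabla f(x)\|^2 + C \qquad \text{for all } x,
$$

with constants A, B, C ≥ 0 and f^inf the infimum of f. For example, the familiar bounded-variance assumption is the special case A = 0, B = 1, C = σ², because for unbiased g we have E‖g(x)‖² = ‖∇f(x)‖² + E‖g(x) − ∇f(x)‖² ≤ ‖∇f(x)‖² + σ².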
Random photo of KAUST.
Replying to @gabrielpeyre
A similar method, but one that provably works: proceedings.mlr.press/v119/m… Beautiful work by Yura Malitsky and Konstantin Mishchenko (@konstmish).
As in the past several years, I've accepted an invite to serve as an Area Chair for @icmlconf (ICML 2023). However, this time we are asked to write a short 1-2 sentence statement explaining why peer review helps to advance science. This is what I wrote.
On February 12, one student wrote to me: “To be honest, your course was perfect. The best course I have ever learnt.” He referred to my CS 331 course “Stochastic Gradient Descent Methods”. This made my day! This is why I do teaching.
My former student Dmitry Kovalev is on the job market. Hire him - he is a genius. I believe he has the same talent as Nesterov. Wrote 30+ highly technical papers in his PhD. Solved several problems no one could. Pic from his thesis defense. @dakovalev1
I have MS & PhD positions open in my Optimization & Machine Learning Lab at KAUST. We do fundamental work underpinning stochastic optimization, distributed and federated learning and more. Starting dates: Spring 2021 and Fall 2021. admissions.kaust.edu.sa
Replying to @lpachter
I knew from the day I first heard about him, years ago, that he was not a prof at MIT, but that he had some sort of (possibly weak) affiliation. He never claimed that he was a prof AFAIK. His podcasting is thoughtful, entertaining and successful. Not sure what all this criticism is about.
The idea of "discouraging resubmissions without substantial changes" is bad, as it relies on the wrong assumption that previously rejected papers necessarily have something wrong with them. Many good, excellent and even breakthrough papers get routinely rejected...
Check out the updates to the reviewing process for NeurIPS 2021! link.medium.com/TJ1bsb8Tjfb
Thank you #ICLR2024 for desk-rejecting a paper (with positive scores only) because of the existence of a follow-up paper that builds upon the results of the submitted paper. Great reason. The AC could have reached out to us for clarifications and this could have been avoided.
Boris Polyak (Борис Поляк) is celebrating his 86th birthday today. A pioneer of so much in optimization (first order methods, momentum, control, and much much more), with breathtaking achievements over 60 years of active professional life (he is still working!). Happy birthday!!!
My stellar PhD student Lukang Sun (lukangsun.github.io) defended his PhD thesis "Stein Variational Gradient Descent and Consensus-Based Optimization: Towards a Convergence Analysis and Generalization" (hdl.handle.net/10754/698695)!
A random photo of KAUST.
We are indeed hiring in AI at KAUST (research and teaching faculty), and I am hiring at all levels to my own *Optimization and Machine Learning* lab as well: interns, MS/PhD students, PhD students, postdocs and research scientists.
KAUST (17 full papers at #NeurIPS2021) and its environment are now offering huge resources to advance both fundamental and applied AI research. We are hiring outstanding professors, postdocs, and PhD students: cemse.kaust.edu.sa/ai/hiring…
Kaja's Google PhD Fellowship featured around KAUST...
A 10 million RMB prize for Nemirovski and Nesterov - congrats to these giants of the field! Thanks for their beautiful works over the years, and for inspiring so many, including me.
Announcing the creation of *KAUST Center of Excellence in Generative AI*; official launch on July 1, 2024. Joint with @BernardSGhanem @SchmidhuberAI @bremen79 @TheSandyCoder and a few more KAUST colleagues. KAUST funding ($11m over 5 years) + industrial funding. We are looking for further industry partners! @KaustResearch @cemseKAUST @KAUST_News @AI_KAUST @kaustMLhub
Opinion about @NeurIPSConf 2020 papers: In the same way a great applied paper can achieve high scores and be immune to “there is no theory” kind of criticism, a great theoretical paper should be able to achieve high scores without facing “not useful in practice” criticism.
Some reviewers just won't participate in @NeurIPSConf post author feedback discussion, despite me as the AC prompting them to do so multiple times. This is of course unacceptable and utterly unprofessional. What do you suggest should be done in such cases? Be creative.
Yet another reviewer saying: reject, because you study the convex setting and deep learning models are nonconvex. In what world are boxing pros allowed to judge chess? What a joke.
I am looking for a postdoc to join my *Optimization and Machine Learning* Lab at KAUST. We have a new funding mechanism: apply for the KAUST Global Fellowship here kgfp.kaust.edu.sa
People are giving up on AI conferences due to nonsensical / unprofessional reviews. It's time to stop this madness.
Just gave an invited talk on *Optimal Parallel SGD* at Apple, held in Cupertino (workshop on Privacy Preserving Machine Learning). Here are my 16 slides. 1/16
My brilliant PhD student Samuel Horváth (samuelhorvath.github.io) just defended his PhD thesis entitled *Better Methods and Theory for Federated Learning: Compression, Client Selection, and Heterogeneity*!
NeurIPS 2024: Accepted papers coming from my Optimization & Machine Learning Lab at @KAUST_News @AI_KAUST @cemseKAUST @KaustResearch
1. [Oral] PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression. openreview.net/forum?id=YvA8… Vladimir Malinovskii · Denis Mazur · Ivan Ilin · Denis Kuznedelev · Konstantin Burlachenko · Kai Yi · Dan Alistarh · Peter Richtarik
2. [Spotlight] Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity. openreview.net/forum?id=gkJ5… Kaja Gruntkowska · Alexander Tyurin · Peter Richtarik
3. [Poster] On the Optimal Time Complexities in Decentralized Stochastic Asynchronous Optimization. openreview.net/forum?id=IXRa… Alexander Tyurin · Peter Richtarik
4. [Poster] Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity. openreview.net/forum?id=O8yH… Alexander Tyurin · Marta Pozzi · Ivan Ilin · Peter Richtarik
5. [Poster] MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence. openreview.net/forum?id=Tck4… Ionut-Vlad Modoranu · Mher Safaryan · Grigory Malinovsky · Eldar Kurtić · Thomas Robert · Peter Richtarik · Dan Alistarh
6. [Poster] Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations. openreview.net/forum?id=AUeT… Alexander Tyurin · Kaja Gruntkowska · Peter Richtarik
7. [Poster] Don't Compress Gradients in Random Reshuffling: Compress Gradient Differences. openreview.net/forum?id=CzPt… Abdurakhmon Sadiev · Grigory Malinovsky · Eduard Gorbunov · Igor Sokolov · Ahmed Khaled · Konstantin Burlachenko · Peter Richtarik
8. [Poster] The Power of Extrapolation in Federated Learning. openreview.net/forum?id=FuTf… Hanmin Li · Kirill Acharya · Peter Richtarik
9. [Poster] Byzantine Robustness and Partial Participation Can Be Achieved at Once: Just Clip Gradient Differences. openreview.net/forum?id=G8aS… Grigory Malinovsky · Peter Richtarik · Samuel Horváth · Eduard Gorbunov
Proud of all of these papers. I'll tweet about them in more detail in due time. Marta Pozzi and Kirill Acharya were VSRP interns at KAUST. Alexander Tyurin (postdoc) got 4 papers accepted - same as in 2023. Kaja Gruntkowska got 2 papers accepted in her 1st semester as a PhD student.
I have never rejected an empirical paper on the basis of it containing no theory. I judge such papers on their own merits. Is it too much to ask reviewers to judge a theory paper based on the theory?
A key recipe in the empirical success of federated learning methods is performing multiple local gradient-type steps before aggregation. However, no methods of this type have better communication complexity than gradient descent (in the heterogeneous data regime). Not anymore!
How to run SGD in a distributed setting with n parallel workers who have different computing speeds and can communicate (via a server) with different communication speeds? For the first time, there is a provably optimal method: Shadowheart SGD: arxiv.org/abs/2402.04785
Replying to @gabrielpeyre
The Malitsky-Mishchenko stepsize is BB done right. arxiv.org/abs/1910.09529
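For readers who have not seen it: my reading of the linked paper is that the Malitsky-Mishchenko rule sets λ_k = min{√(1 + θ_{k−1}) λ_{k−1}, ‖x_k − x_{k−1}‖ / (2‖∇f(x_k) − ∇f(x_{k−1})‖)} with θ_k = λ_k / λ_{k−1} (θ_0 = +∞), and then steps x_{k+1} = x_k − λ_k ∇f(x_k). The second term uses the same difference quotient as the Barzilai-Borwein (BB) stepsize, while the first term limits how fast the stepsize may grow. The Python sketch below is a minimal, dependency-free illustration under those assumptions; the function name `adgd`, the initial stepsize and the stopping tolerance are my own choices, not taken from the paper.

```python
import math


def adgd(grad, x0, lam0=1e-6, iters=1000, tol=1e-10):
    """Sketch: gradient descent with the Malitsky-Mishchenko adaptive stepsize.

    grad: callable returning the gradient (list of floats) at a point.
    lam0: small initial stepsize (assumed default, not from the paper).
    tol: stop once the gradient norm falls below this (assumed safeguard).
    """

    def norm(v):
        return math.sqrt(sum(vi * vi for vi in v))

    def diff(a, b):
        return [ai - bi for ai, bi in zip(a, b)]

    x_prev, g_prev = list(x0), grad(x0)
    x = [xi - lam0 * gi for xi, gi in zip(x_prev, g_prev)]  # first plain GD step
    lam_prev, theta_prev = lam0, math.inf
    for _ in range(iters):
        g = grad(x)
        if norm(g) <= tol:
            break
        dg = norm(diff(g, g_prev))
        # Local inverse-smoothness estimate ||dx|| / (2 ||dg||): a BB-like quotient.
        local = norm(diff(x, x_prev)) / (2 * dg) if dg > 0 else math.inf
        # Growth cap sqrt(1 + theta) * lam_prev keeps the stepsize from jumping.
        lam = min(math.sqrt(1 + theta_prev) * lam_prev, local)
        if math.isinf(lam):  # no curvature information yet: reuse previous step
            lam = lam_prev
        x_prev, g_prev = x, g
        x = [xi - lam * gi for xi, gi in zip(x, g)]
        theta_prev, lam_prev = lam / lam_prev, lam
    return x


# Usage on f(x) = 0.5 * (x1^2 + 10 * x2^2), whose gradient is (x1, 10 * x2).
x_star = adgd(lambda v: [v[0], 10.0 * v[1]], [1.0, 1.0])
```

On this toy quadratic the local estimate always lies between 1/20 and 1/2 (one over twice the largest and smallest curvature), so no stepsize tuning is needed; the iterates are not monotone in f, which is the "without descent" part.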
I am attending the "2022 Workshop on Federated Learning and Analytics" organized by Google. This is an invite-only event run virtually via Google Meet with about 60 participants. I just gave a talk. 1/n
AI conference reviewing: Sometimes, a single bad (in terms of review quality) but vocal reviewer can sink a paper. This happened to me repeatedly, and at #ICML2024 as well. The AC sank the paper based on a clearly false statement made by this reviewer. That should not happen.
I've just received an email from the authors of a NeurIPS 2023 spotlight paper pointing me to their paper. They cite one of my prior works on the same topic. However, their key novelty claim is already addressed in that paper, and they do not mention it anywhere. What to do?
I have openings for students who wish to work on foundational (mathematical and algorithmic) questions related to machine learning and optimization.
Applications are open to @KAUST_News MS/PhD fellowship. Deadline: December 16 Apply NOW and share the news admissions.kaust.edu.sa/admi…
Snorkelling in the Red Sea with Michael I Jordan. Mike is visiting KAUST this week. cemse.kaust.edu.sa/events/ev… cemse.kaust.edu.sa/events/ev…
A theory paper rejected from @icmlconf because of "concerns with experiments". The experiments are fine (51 plots; experiments designed to support the theory). The theory extends & improves upon previous theoretical SOTA in an important ML subfield. arxiv.org/abs/2110.03294
This is how you do it.
My team and I are presenting 10 (conference + workshop) papers at #ICML2023. I am looking for interns, MS/PhDs, PhDs, and postdocs at @AI_KAUST to work with me on theoretical & applied optimization for machine learning & federated learning. Apply here: apply.interfolio.com/105097
Yurii Nesterov is about to give a talk with an intriguing title at NOPTA 2024.
In the search for Mountain Gorillas in Rwanda with @kchonyc and @mireillechayer
To all journal & conference reviewers:
Depressingly, this quote from 1984 is even more true today than it was then.
Hi from Hong Kong! With my childhood hero, Bruce Lee!
We are hiring Assistant Professors in Computer Science at KAUST (@KAUST_News). Apply here: apply.interfolio.com/133001
When you get what you didn't know you needed 😂 (courtesy of The Erwin Schroedinger International Institute For Mathematics and Physics, Vienna)
Congrats to my 3 PhD students, Abdurakhmon Sadiev (computer science), Kaja Gruntkowska (statistics) and Grigory Malinovsky (applied math) for making the Dean's List for their exceptional academic achievements and dedication to their research work. This comes with a $2,500 cash prize for each. Awesome to be working with such bright and dedicated individuals!
A random photo of KAUST.
More than 21,000 @NeurIPSConf submissions...
Replying to @sama
Claims made without evidence can be dismissed without evidence. And this was some claim. That is all that happened.
I have MS & PhD positions open in my Optimization & Machine Learning Lab at KAUST. We do fundamental work underpinning stochastic optimization, distributed and federated learning and more. admissions.kaust.edu.sa/ Starting dates: Fall 2020, Spring 2021, Fall 2021.
Federated learning -- a subfield of machine learning we started with my former student Jakub Konecny and a Google team led by H. Brendan McMahan -- has become a key part of the US NATIONAL ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT STRATEGIC PLAN. Very pleased and honored!
Huge congrats to @dakovalev1 who defended his PhD thesis "Optimal Algorithms for Affinely Constrained, Distributed, Decentralized, Minimax, and High-Order Optimization Problems" on Sept 14! Committee: A. Nemirovski, Yu. Nesterov, D. Keyes, M. Parsani, D. Wang and myself. 1/n
Hello from Louvain-la-Neuve, Belgium -- the place where I did my postdoc, working with Yurii Nesterov. I was invited to a workshop (sites.uclouvain.be/algopt202…) celebrating Nesterov's 50 years of career in optimization (counting from the time when he started reading papers in the area). Outstanding speakers from all over the world. Reconnecting with old friends & colleagues. Gave my talk on the first day of the event already.
Our "Byzantine robustness" paper got accepted to ICLR 2023 after all (there was a heated debate about our "unethical" use of the word "Byzantine" in the reviews). The poster and the paper: openreview.net/forum?id=pfuq… @ed_gorbunov @sam_hrvth @gauthier_gidel
Random photo of KAUST
Looking at my DBLP profile, I realized I've published 25 papers in ICML (@icmlconf) and 25 papers in NeurIPS (@NeurIPSConf) over the years. dblp.org/pid/62/8001.html My "AI academic age" computed this way is therefore 50. What's yours?
Congrats to my former PhD student Eduard Gorbunov @ed_gorbunov (graduated in Dec 2021 & co-supervised with A. Gasnikov) for reaching 1,000 citations! All his works are deeply theoretical, so this is extremely impressive to me. Eduard is also an amazingly modest & nice person.
3. I am not really looking for advice; just venting a bit. We are doing our best to handle this. However, I believe this behavior is highly problematic, and certainly at odds with what a reviewer should really be doing. The reviewer wants us to change established terminology...
I am organizing "Workshop on Distributed Training in the Era of Large Models" at KAUST during November 24-26, 2025. If you have done some cool work in the area, you might want to attend. The talks are invite-only. I'll soon start sending invites. More info later!
Celebrating 1 year as a Full Professor. Where is the surprise party and where are the gifts? 😅😂🥳
We have developed the first minimax optimal SGD method (we call it "Rennala SGD") for optimization with parallel heterogeneous (each worker takes a different amount of time to compute a stochastic gradient) workers. Surprise: Known asynchronous methods are suboptimal! 1/n
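The post does not spell out the mechanism. As I understand the accompanying paper, the core of Rennala SGD is that the server waits for a fixed number S of stochastic gradients, all computed at the current iterate, before taking a step; fast workers then contribute several gradients per round and very slow ones may contribute none. The Python toy simulation below illustrates only that collection rule, under a simplifying assumption of my own (every worker restarts at each new iterate, instead of finishing and discarding a stale computation). The function `rennala_like_sgd`, its parameters and the 1-D quadratic test problem are hypothetical, not the authors' code, and nothing here reproduces their optimality result.

```python
import heapq
import random


def rennala_like_sgd(stoch_grad, x0, worker_times, batch_size, lr, rounds, seed=0):
    """Toy event simulation of the batch-collection idea (hypothetical helper).

    worker_times[i]: fixed time worker i needs per stochastic gradient.
    Simplifying assumption: at the start of each round every worker restarts
    at the current point x, so no gradient from an older point is ever used.
    """
    rng = random.Random(seed)
    x, clock = x0, 0.0
    for _ in range(rounds):
        # Min-heap of (finish time, worker id) for gradients being computed at x.
        events = [(clock + t, i) for i, t in enumerate(worker_times)]
        heapq.heapify(events)
        total = 0.0
        for _ in range(batch_size):
            finish, i = heapq.heappop(events)
            total += stoch_grad(x, rng)
            # The same worker immediately starts another gradient at x.
            heapq.heappush(events, (finish + worker_times[i], i))
        clock = finish  # the round ends when the S-th gradient arrives
        x -= lr * total / batch_size
    return x, clock


# Usage: f(x) = x^2 / 2 with Gaussian gradient noise, three workers, S = 4.
x_final, elapsed = rennala_like_sgd(
    lambda x, rng: x + rng.gauss(0.0, 0.1),
    x0=5.0, worker_times=[1.0, 2.0, 10.0], batch_size=4, lr=0.5, rounds=50,
)
```

With per-gradient times of 1, 2 and 10 and S = 4, each round ends after 3 time units: the fastest worker supplies three gradients, the second supplies one, and the 10-unit worker never finishes in time, which is why waiting for every worker would waste time.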
Replying to @_justinconrad
This exact person is super interested in my area, too. 🤣
Super happy about this surprise prize; and huge congratulations to my outstanding student and collaborator Samuel Horváth. The paper was recently accepted to #ICLR2021, check it out! openreview.net/forum?id=vYVI…
Michael I Jordan giving his Al Kindi lecture at KAUST.
Hi from #AISTATS2024, held in Valencia, Spain. Love the size of the conference: all people in a single (large but not humongous) room. Much better than events with tens of parallel sessions.
I will be visiting China between Nov 15 and 29. I'll give
- five 90 min talks @ BIMSA, the Beijing Institute of Mathematical Sciences and Applications (Nov 18, 20, 21, 27 and 28), and
- two 60 min academic talks (Nov 22 @ Tsinghua Uni and Nov 25 in Shanghai).
Looking forward to meeting new people and reconnecting with colleagues and friends.
Interesting optimization department...
Want to visit KAUST (@KAUST_News : King Abdullah University of Science and Technology, Saudi Arabia) for a week and learn some cool applied mathematics? Here is your chance! cemse.kaust.edu.sa/amfs
I am giving a virtual talk at @Apple in about an hour. I'll talk about a recent breakthrough in the understanding of the local training "trick" in the area of federated learning, and the research that followed.
About 350 students (from various places, but mainly from Tsinghua University and BIMSA) are signed up for my 5 x 90 min minicourse on Optimization and Machine Learning at BIMSA, Beijing.
Belated celebration: I've supervised 10 (amazing!!!) PhD students! And sometimes they supervised me ;-) genealogy.math.ndsu.nodak.ed…
Saudi Arabia -> AISTATS @ Spain -> Belgium -> ICLR @ Austria...
One of the ways "conspiracy theories" get started...
Limiting a rebuttal to 1 page (example: @aistats_conf this year) is like being charged with a crime by multiple prosecutors, then asked to defend in a court, but only allowed a 1 minute speech to defend yourself.