"If there is not folly in the world, then the world itself is folly. You must understand that mistakes are not always regrets." - Paul Tobin, Bandette🤠

England, United Kingdom
This semester I'll teach an undergraduate "intro to RL" course at the UofA. For the first lecture, I collected some exciting, recent, impactful applications of RL. Link to the relevant slides: tinyurl.com/3zw9453p I thought this may be worthwhile to share.
First position paper I ever wrote. "Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence" arxiv.org/abs/2506.23908 Background: I'd like LLMs to help me do math, but statistical learning seems inadequate to make this happen. What do you all think?
Yours truly and his coauthor Tor Lattimore happily present the near-final draft of their upcoming bandit book at banditalgs.com/ The pdf will stay free. In this phase we welcome reader comments. The book will be printed by #CambrideUniversityPress. Please share:)
Replying to @karpathy
@karpathy I think it would be good to distinguish RL as a problem from the algorithms that people use to address RL problems. This would allow us to discuss if the problem is with the algorithms, or if the problem is with posing a problem as an RL problem. 1/x
Interested in hearing about the theoretical foundations of RL from a multidisciplinary perspective (CS, control, stats, OR)? If so, join us at the (all virtual) RL Theory Bootcamp at the Simons Institute next week. Lectures in the morning and the afternoon ==>
After a 2-year break, I'll be teaching a grad course in the fall. Go Bandits! banditalgs.com
Is RL used in real applications? If so, how and where? And if not, why not and how can this be fixed? Join our excellent panelists and speakers at the half-day RL2 workshop organized at @icmlconf or submit a paper to present your views. sites.google.com/view/RL4Rea…
amathr.org/prizes/aiprize/ The Association for Mathematical Research announces "Prize in the Mathematics of Artificial Intelligence". I'm on the selection committee. The goal is to inspire young people to work on the intersection of AI and maths. Nominations to aiprize@amathr.org
I feel very much honoured to be selected for this role. To make the best of this job, hive mind of ML people on twitter, if you have any ideas about how to improve ICML, drop me a message (or just respond to this tweet).
Some decisions for ICML from the board: ICML General Chairs: 2022: Kamalika Chaudhuri @kamalikac 2023: Andreas Krause @arkrause ICML 2022 Program Chairs: Csaba Szepesvari @CsabaSzepesvari, Le Song @dasongle, and Stefanie Jegelka (maybe @StefanieJegelka )
Friends: I am looking for theory-oriented postdocs in RL (with past theory experience). I would appreciate it if you spread the word.
Just for counterbalancing, hats off to those reviewers who are still doing a great job! I know that you are out there and while your numbers could be diminishing, we need you to keep doing what you do (post inspired by reading actual good reviews doing my editorial job).
Nothing inspires more than the humility of someone with great accomplishments. I hope that generations of researchers will pay attention to the wise words of Rich! (Coming back to X just to post this.)
“There are no authorities in science,” says Turing Award winner @RichardSSutton, Amii Fellow & Canada @CIFAR_News AI Chair. Sit down with Rich and @camlinke as they discuss the journey to this moment. Watch now: hubs.la/Q039xBP-0 #TuringAward #AI #ReinforcementLearning
Advice for future reviews: An important question to ask when figuring out whether to recommend accept or reject is "How difficult is it to fix the issues I found?" If very difficult, the paper can't be saved. If not too difficult, there is no reason to reject the paper.
Broader impact predictions back in the day.
Heinrich Hertz after proving the existence of radio waves stated that "it's of no use whatsoever" and regarding the applications of the discovery: "Nothing, I guess"
Our department is hiring theoreticians working on ML! If you are on the job market for faculty positions and have a strong track record in theory, this may be your dream job! careers.ualberta.ca/Competit… Why apply? Read on.. 1/x
This sounded like a crazy idea two weeks ago, but here we go! @RLtheory is the account to follow! Thanks to the speakers who already accepted our invitations! I hope the community will like this series!
excited to announce a new series of virtual seminars on ~~~REINFORCEMENT LEARNING THEORY~~~ we've set this up with @CiaraPikeBurke and @CsabaSzepesvari to keep track of all the advances of this fast-paced field. hope others will also find it useful! sites.google.com/view/rltheo…
I have a duty to spread the truth: "Don't worry about the overall importance of the problem; work on it if it looks interesting. I think there's a sufficient correlation between interest and importance. — David Blackwell" And remember: en.wikipedia.org/wiki/David_…
For whatever it's worth, I am offering a mentoring session at #AISTATS on Wednesday, April 14, 2021 18:30 MDT. All are welcome!
Please share: The newly created "Foundations team" of @DeepMindAI have openings for research scientists with strong theoretical background, and an unstoppable interest in pushing the boundaries of AI and machine learning. PM me if you are interested. #ICML2018
Just in case the travel restrictions would last until July, preorder our book now on Amazon: amazon.ca/Bandit-Algorithms-…
It seems to me that not only you, but too many people talk about RL as if these two things were the same, which prevents a more nuanced discussion. 2/2
After creating a new homepage, I discovered, I used to have a blog. Since I already had it, why not add a new post? Here we go: readingsml.blogspot.com/2020…
RL Theory Seminars are back! First talk: Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality by Ying Jin! sites.google.com/view/rltheo…
Tomorrow we will have Martha White! She will talk about "Policy Gradient Methods as Approximate Policy Iteration: Advantages and Open Questions". Talks open to anyone! Join here: amiithinks.github.io/tea-tim…
The @rlai_lab Tea Time Talks return! Hosted by Amii’s Chief Scientific Advisor Dr. Richard S. Sutton, the 20-minute talks are delivered by students, faculty and guests, and range from ideas starting to take root to finished projects. hubs.ly/H0rjZ3X0 #AI #ML #RL
Replying to @roydanroy
Of course, can't compete with Dan, but I am also still looking for postdocs -- right down in Edmonton, driving distance to the Rockies. Awesome hikes, climbs, kayaking, .. + I can promise interesting RL theory problems and a fast-paced environment:)
simons.berkeley.edu/workshop… The third and final workshop in the RL theory program starts tomorrow. The topic is batch RL (sorry @jacobmbuckman) and simulation-based optimization. All are welcome! The workshop will stream on Youtube. To join on zoom, you need to register.
Venting. Reviewer: The paper is bad because of X, Y and Z. Rebuttal: You are wrong on X, Y and Z + detailed explanation. Reviewer: I maintain my score. The paper is bad (no explanation given). How is this ever an acceptable behavior? Why does a reviewer think this is fine?
@peter_richtarik's recent post gave me this idea: As next year yours truly will be partially responsible for reviewing quality at ICML, and you just got your first round of reviews back from named conference, vent for me. I promise to listen.
This is a mini water treatment plant that will be used to optimize the water treatment process using reinforcement learning. It's really awesome to see this happening in Alberta!
We are excited to advance the science of water treatment and AI with our partners @rlai_lab @UAlberta @AmiiThinks @DraytonValley and @ISLengineering! 💧💻 Many thanks to our supporters @ABInnovates @NSERC_CRSNG for this #aiforgood opportunity!
Offline RL is cool, but will it ever work? Next Tuesday, Yunzong Xu (MIT) will put the nail into the coffin of offline RL by showing us the proof of the correctness of a 2019 conjecture by Chen and Jiang that predicted bad bad news for offline RL. tinyurl.com/5n9aedv5
Replying to @jasondeanlee
He skipped this. Vitanyi & Li's book, or article below gives you the answer. In one formulation, see attached pic, one has that maximum likelihood for a large large class of distributions over one-way infinite sequences is implemented by Kolm-compression link.springer.com/chapter/10…
While some moments are pretty bleak (CMT mishaps), it warms my heart to see how many people care about @icmlconf. Thank you reviewers and other program committee members and I am looking forward to working with you in the coming year.
#NevernendingReviewingSeason What makes a review good? (1) Objective; (2) helps the decision maker; (3) helps the authors; (4) polite. Constructive criticism is the expression. Constructive, not destructive.
Happy to report that it seems chances are really high that we'll record and will post the lectures online. I'll test the tech on Friday to see whether it is able to track me as I zip from board to board.
To the attention of friends of #ReinforcementLearning: After all those years, finally, our home, @rlai_lab from @UofAResearch is live on twitter.
Hello World! This account will share the latest news and updates about what the Reinforcement Learning and Artificial Intelligence (RLAI) Lab at the University of Alberta is up to. Let’s figure out intelligence!
With some glitches, but we are done with the first of the series. Never knew so many people care about RL theory, yay! Great talk Chi Jin! Awesome audience! Next one can only be smoother:) Sign up here if you have not signed up yet: sites.google.com/corp/view/r…
Replying to @thegautamkamath
I grind for my students. And for the love of science and knowledge:) It's not rational, but I can't help it. I am not sure whether this sounds honest, but I really never cared about anything but my students and the joy I get from learning new things and connecting to others
Tired of staring at the pages of the free pdf at banditalgs.com? Want to smell it, flip the pages? Visit the @CambridgeUP booth at #NeurIPS2020 or just head directly to bit.ly/2VPswrk for an incredible 30% discount! #BanditBook
Unsolicited student email: "This is my second reminder. I believe your research team is one of the best positions for me to continue my studies, I would be thankful if you could respond to my initial email." (The student never carefully checked my homepage.) Go figure!
.. and we will finish every day with a bonus talk which brings in the perspective of some particular application. For registration (no fees, just to receive the zoom link) and further details, visit the bootcamp website. simons.berkeley.edu/workshop…
We often hear about the theory-practice gap. At this workshop we will take a thorough look at this. Is there a gap? What is the nature of the gap? Who made it? Is it good to have the gap? If not, how to close it? I think this is super important for the healthiness of the field!
🧵 Thrilled to announce the #ICML RL workshop 'Aligning RL Experimentalists and Theorists'! We will have several talks and a panel delivered by a super lineup of speakers: @white_martha, @ShamKakade6, @yayitsamyzhang, Dylan Foster, Niao He, @svlevine, and @MengdiWang10. 1/3
More awesome RL content; Reinforcement Learning, Bit by Bit by Xiuyuan (Lucy) Lu (DeepMind) Date / Time: Lecture 1: 9:30 AM - 10:30 AM (PT), April 20th (Tuesday) Lecture 2: 10:30 AM - 11:30 AM (PT), April 23rd (Friday) rlforum.sites.stanford.edu/t… (Stanford RL forum!)
It's here! This weekend, a fully online, pre-ICML, soothing "RL for real life" 2x3 hours virtual conference! Fantastic invited speakers & panel, moderators. Prepare and submit your questions in advance!!! All credit should go to my incredible coorganizers.
Welcome to RL for Real Life Virtual Conference, June 27-28. sites.google.com/view/RL4Rea…, co-organized with @gabepsilon, Alborz Geramifard, Omer Gottesman, @LihongLi20, Anusha Nagabandi, Zhiwei (Tony) Qin, @CsabaSzepesvari With two panels on general RL and RL+healthcare topics.
Bandits going strong at UofA! 32 seats in the classroom all taken on the day when they became available.
Now that the #COLT2024 decisions are out, I'd like to announce a workshop that we are organizing, which will happen just before COLT. The workshop theme is RL Theory. All are welcome! Details here: rltheory-workshop.github.io Please spread the word!
Illustration, slightly edited to protect anonymity: "paper feels incremental ..putting together well-known ideas in a straightforward manner." What can I say? Previous work missed even these. And straightforward once done. Reviewer also admitted not reading the proof. Great job?!
ICML review rant: The ML community is screwed if we keep insisting that scientific inquiry about known algorithms isn't "novel" (even if it leads to major new capabilities / SoTA), but that engineering yet another new, incremental algorithm that we know nothing about is great.
Any tips on what to write as a broader impact statement for theory papers to be sent to NeuroIPS? #powerofmath #poweroftheory
1/x Our department has 2 Assistant Professor positions in AI/ML and one in Theoretical Computing Science. Here are the job ads. Our department is a super fun, collegial place. Ads: careers.ualberta.ca/Competit… careers.ualberta.ca/Competit…
The moment when the hope that review quality can be improved appears to be fading into the void.. But: #NeverGiveUp #ICML2022
New post on the inescapable appeal of Bayesian methods in the context of adversarial bandits. Or how Bayesian methods can help the agnostic. Hint: Minimax theorems open wormhole between distant corners of the universe. banditalgs.com/2019/03/17/ba…
"What information to seek, how to seek that information, and what information to retain?" What else is there to know? A principled approach to this problem will be presented tomorrow by DeepMind's Xiuyuan Lu. Last RL Theory Seminar before the summer break! tinyurl.com/2e2yu873
One day before reviews are due for Phase 1 at #ICML2022, 50% of the reviewers have submitted zero reviews. The review load for this phase is <=2 papers and there were 19 days for writing these <=2 reviews. What percentage of reviewers will submit all of their reviews in time?
18% 50-69
27% 70-89
12% 90-100
43% just relax Csaba
941 votes • Final results
Asking for a friend: A student wants to pick up intuition about Bregman divergences and their use in convex optimization/online learning. There are lots of excellent texts out there, but is there one that is strong on providing intuition? 1/x
New favourite quote:)
'Just because you've implemented something doesn't mean you understand it.' -- Brian Cantwell Smith
Exactly what the program committee needs to know! Thanks Mike! :-D
Super proud of Tor and Andras! It's a delight to have them in the team! The paper can be accessed from here: proceedings.mlr.press/v134/l…
Huge congratulations to Tor and Andras! Their paper “Improved Regret for Zeroth-Order Stochastic Convex Bandits” was recently recognised for a best paper runner-up award by the flagship learning theory conference, COLT: dpmd.ai/colt21 1/
I am delighted to invite everyone tomorrow for the first RL Theory Seminar talk of 2021 by Andrea Zanette. Andrea will explain to us why and how batch reinforcement learning can be much harder than online RL. For details check out sites.google.com/view/rltheo…
I got many good comments, suggestions and I have significantly expanded the list. I am quite pleased with the result, RL seems to be doing quite well. Very nice applications and more in the works! Thanks everyone!
This semester I'll teach an undergraduate "intro to RL" course at the UofA. For the first lecture, I collected some exciting, recent, impactful applications of RL. Link to the relevant slides: tinyurl.com/3zw9453p I thought this may be worthwhile to share.
Wow, I just discovered this treat: mlstory.org/index.html Moritz Hardt and Ben Recht: "Patterns, predictions, and actions". I will surely recommend this for my students or whoever starts with this subject! Very cool. Thank you @beenwrekt !
NeurIPS experience: Does anyone enjoy moving around a silly avatar with the speed of a snail in oversized rooms to get to specific posters?
My typical day..
On the first page of my (1993) PhD Thesis. Still true.
Signal boosting; please repost! We need more nominations! There are so many deserving people, **please be generous and send a nomination**! It should not take much time (a short nomination is preferred to none). We are hoping the prize will motivate more to take the math+AI path!
amathr.org/prizes/aiprize/ The Association for Mathematical Research announces "Prize in the Mathematics of Artificial Intelligence". I'm on the selection committee. The goal is to inspire young people to work on the intersection of AI and maths. Nominations to aiprize@amathr.org
Improper learning? Who would do that? Is not that bad by definition? Not even proper? Come to our seminar to find out what Max Simchowitz thinks about improper learning for non-stochastic control!
Our next talk: 06/30: Max Simchowitz (UC Berkeley) "Improper Learning for Non-Stochastic Control" For details, please see the website: sites.google.com/view/rltheo…
Replying to @thegautamkamath
When I was a PhD student, a few times I was quite discouraged by some reviews. SIAM J. Opt told me in 2000 that exploration in finite MDPs is old-fashioned:) Soon enough though, I learned not to pay attention to failures or rejections and focused on positives. ==>
Cool universality argument for SGD with FF neuralnets: Take any learning alg A for learning Boolean functions without noise from a sample of size n. Then there is a NN architecture G(A,n) such that SGD+G(A,n)+Any reasonable loss with sequential processing "implements" A.
A tour de force by Abbe & Sandon, arxiv.org/pdf/2001.02992.pdf "Any function distribution that can be learned from samples in poly-time can also be learned by a poly-size neural net trained with SGD on a poly-time initialization with poly-steps" + "[this] does not hold for GD"
I am very excited to announce that I am joining DeepMind, taking a two-year leave. I will miss people in Edmonton, but you should visit!
@neu_rips being featured in @marcgbellemare's talk (awesome talk Marc, by the way!! congrats again for all those involved!!). But Twitter does work, eh?
Replying to @beenwrekt
You mean no progress? Nah.. Btw, I like the style of some of these old papers that describe some unbaked idea for what they are, not trying to oversell them, making them look bigger than what they are (eg a heuristic is a heuristic..). Papers of this type won't make it today.
No measure theory required and martingales mentioned 76 times. It must be about discrete stuff. But no, it is not at all. So how does this work? LOL (This is from the Meyn and Tweedie book about Markov chains, which I love regardless. It seems Sean is not on twitter anymore!?!)
Replying to @yisongyue
Research is done in many small steps. You may think something goes unnoticed, but it may have influenced someone, who gets a new idea, writes another small thing. This leads to the next thing. Wait 20 years, the many little things add up and a much cleaner, deeper ==>
You must see this, new webpage! sites.ualberta.ca/~szepesva/ ..after the service I have previously used to compile my publications-page stopped working (dire times..), put together in a day with the help of bibbase.org and jemdoc.jaboc.net
..and next week we take a break to let the "Deep RL meets theory" workshop take the stage! Check out the program at: simons.berkeley.edu/workshop… Do not forget to put all these events in your calendar! The most convenient way to do this is to go here: simons.berkeley.edu/workshop…
We are glad to announce that we are now officially part of the "Theory of RL" program at the Simons Institute! See our updated schedule that now includes two new speakers and the RL theory workshops at @SimonsInstitute.
A frequent issue in batch RL is that evaluation methods are biased and the size of the bias is unknown. Come and join us tomorrow to learn from Yi Su about how to build optimizers that do almost as well as if the bias was known! For details: tinyurl.com/v5s68k5c
I guess I'll be out from here; you know where to find me. I'll probably check back from time to time for the odd messages, but this will wind down and stop eventually. There are rules about how many social media accounts one should keep alive. Thx!
Aaditya Ramdas (not on twitter; good for him) is coediting a special issue for MLJ on "Conformal Prediction and Distribution-Free Uncertainty Quantification". Deadline Nov 30. Consider submitting if you have something! I will be looking forward to seeing what comes out of this!
For those who like books, I also love the Anthony-Bartlett book stat.berkeley.edu/~bartlett/… While it is quite short, it explains soo much about how SLT has evolved over the years!
Proud of my colleagues, winning an IJCAI distinguished paper award! Go @GoogleDeepMind @UAlbertaCS @AmiiThinks !
What do you get when you cross modern Machine Learning with good old-fashioned Search? An IJCAI distinguished paper award 🙂 for Levin Tree Search with Context Models: aihub.org/2023/08/23/congrat…
Representation learning and exploration in RL together? Aditya Modi got you covered! Details? Well, you should come to the next talk! For details visit: tinyurl.com/1gl2z6cc
Advice for people thinking of registering an email address at CMT or other similar reviewing systems: Register an email that is NOT associated with your school/workplace. School and workplace change. Then you will end up with multiple identities, which is not what you want:)
Very happy for this! What a spectacular future for @UAlberta / @UAlbertaCS and @AmiiThinks !
A packed house to hear @BFlanaganUofA from the @UAlberta and @AmiiThinks announce that 20 new faculty will be hired in AI across campus in the next 3 years, with 5 of these positions in CS.
I hope everyone enjoyed ICLR. As promised, RL Theory seminars are back and we are super lucky to have Kwang-Sung Jun fixing our bad ideas about how to use Boltzmann exploration via the help of the mysterious "Maillard sampling" idea. Intrigued? Check out tinyurl.com/4wzdxb2m
Why do we use softmax to represent policies? Could we use some other "transfer" function? Which one? Pros/cons? Come to see our posters to hear about the gravitational pull of softmax and how physicists are always right! I can't guarantee to be up at the time of the oral though:)
Come hear Jincheng Mei, Chenjun Xiao, @daibond_alpha, @LihongLi20, @CsabaSzepesvari, Dale Schuurmans talk about "Escaping the Gravitational Pull of Softmax" on Tuesday. Oral: 0715–0730 MST Poster: 10–12pm MST Link: nips.cc/virtual/2020/protect… #NeurIPS2020
Ladies and gentlemen! We are delighted to give you OPPO, optimistic policy optimization (very much related to the previous talk by the way!) to achieve efficient and effective exploration with linear function approximation in finite horizon MDPs as presented by Zhuoran Yang!
Our next talk: 09/22: Zhuoran Yang (Princeton) "Provably Efficient Exploration in Policy Optimization" For details, please see the website: sites.google.com/view/rltheo…
Replying to @pcastr
SOMs are an awesome example of what curiosity-driven research looks like. Neither neuroscience, nor solving any real problem. Yet, one can still write books about SOMs, think about them in various ways, etc. Something to remember when judging relevance while reviewing!
Our chance to stay positive during these dire times is to attend Simon's seminar tomorrow where I hope we learn that despite all other signs RL is not much harder than bandits. Long live RL, long live bandits!
Our next talk: 11/24: Simon S. Du (University of Washington) "Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon" For details, please see the website: sites.google.com/view/rltheo…
Please join us and Matthieu to hear about breaking news about how averaging and regularization work together to make your RL algorithms go faster!
Reminder: this talk is coming up tomorrow! ***Note that the talk starts at 4PM UTC, one hour earlier than our regular time slot*** Public YouTube link: piped.video/watch?v=DfJHL7Ij… Sign up for the talk on Google Meet: forms.gle/zXy2dpapg2PzHjvb9
Huge congratulations to my colleagues at @DeepMind! This is a really awesome achievement!
In a major scientific breakthrough, the latest version of #AlphaFold has been recognised as a solution to one of biology's grand challenges - the “protein folding problem”. It was validated today at #CASP14, the biennial Critical Assessment of protein Structure Prediction (1/3)
Huge improvements for the sample complexity of RL for representation learning in low-rank (linear) MDPs! How? Why? Really? Come check out the seminar of Masatoshi Uehara tomorrow! For details follow this link: tinyurl.com/5n9aedv5
It is a great pleasure to have Fei Feng from UCLA speaking at our next seminar. Join us to learn about how to combine RL and unsupervised learning and keep everything provably efficient!
Our next talk: 07/07: Fei Feng (UCLA) "Provably Efficient Exploration for RL with Unsupervised Learning" For details, please see the website: sites.google.com/view/rltheo…
Join us on Tuesday to hear from Mengdi about the latest and greatest lower and upper bounds in off-policy evaluation with linear function approximation!
Our next talk: 08/04: Mengdi Wang (Princeton / DeepMind) "Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation" For details, please see the website: sites.google.com/view/rltheo…
We are delighted to have Shie give the next RL Theory Virtual Seminar. I hope to see many of you online at the seminar.
Our next talk: 06/09: Shie Mannor (Technion) "Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs" For details, please see the website: sites.google.com/view/rltheo…
Gentle reminder, this talk is happening tomorrow! I hope to see many of you there:)
Our next talk: 06/16: Niao He (UIUC) "A Unified Switching System Perspective and O.D.E. Analysis of Q-Learning Algorithms" For details, please see the website: sites.google.com/view/rltheo…
Perhaps better to focus on what needs to be done than on who is doing it or whether we call it RL or anything else. But I am glad you recognize that some sort of planning with models (or not?) will be needed! We are on the same page with this one. And Merry Christmas!! 2/2
Yours truly talks RL.. Thanks @TalkRLPodcast /Robin for having me!!
Episode 10 @CsabaSzepesvari of DeepMind shares his views on Bandits, Adversaries, PUCT in AlphaGo / AlphaZero / MuZero, AGI and RL, what is timeless, and more! talkrl.com/episodes/csaba-sz…