Student of mind and nature, libertarian, chess player, cancer survivor. @ Keen, UAlberta, Amii, Openmindresearch.org, The Royal Society, Turing Award

Edmonton, Alberta, Canada
AI researchers seek to understand intelligence well enough to create beings of greater intelligence than current humans. Reaching this profound intellectual milestone will enrich our economies and challenge our societal institutions. It will be unprecedented and transformational, but also a continuation of trends that are thousands of years old. People have always created tools and been changed by them; this is what humans do. The next big step is to understand ourselves. This is a quest grand and glorious, and quintessentially human.
Free Palestine.
Stand with the people of Iran.
Dwarkesh and I had a frank exchange of views. I hope we moved the conversation forward. Dwarkesh is a true gentleman.
.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training phase - the agent just learns on-the-fly - like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete. I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew. 0:00:00 – Are LLMs a dead-end? 0:13:51 – Do humans do imitation learning? 0:23:57 – The Era of Experience 0:34:25 – Current architectures generalize poorly out of distribution 0:42:17 – Surprises in the AI field 0:47:28 – Will The Bitter Lesson still apply after AGI? 0:54:35 – Succession to AI
awards.acm.org/about/2024-tu… Machines that learn from experience were explored by Alan Turing almost eighty years ago, which makes it particularly gratifying and humbling to receive an award in his name for reviving this essential but still nascent idea.
"What we want is a machine that can learn from experience." ---Alan Turing, 1947
My acceptance speech at the Turing award ceremony: Good evening, ladies and gentlemen. The main idea of reinforcement learning is that a machine might discover what to do on its own, without being told, from its own experience, by trial and error. As far as I know, the first person to propose this was Alan Turing in 1947, which makes it particularly gratifying and humbling to receive this award in his name for reviving this essential but still nascent idea. I have three people that I would like to particularly thank. First, Andy Barto. As my PhD supervisor he taught me my whole approach to science, and in particular instilled in me an appreciation of scholarship and craft, and of the great breadth of prior work. Second, I would like to thank Oliver Selfridge, my other main mentor; sadly, now deceased. Oliver taught me how keeping ideas simple can be the boldest of all ambitions. Third, I want to thank Martha Steenstrup, my life partner and intellectual sparring partner. She keeps me honest and grounded. Finally, I also want to thank the University of Alberta, which has been an ideal environment for me and for reinforcement learning research these past 22 years. These three people and my university have reinforced in me the ambition to have ideas that matter, without getting too full of myself about it. They taught me that the quest for better ideas is serious, but is best approached playfully, with humility, kindness, and optimism. For this I am eternally grateful. I would also like to thank all of you for being here and for celebrating the pursuit of intellectual excellence. Thank you very much.
It turns out the Turing Award is actually a silvery bowl from Tiffany's.
I've studied intelligence all my long life, yet still I feel I learned important things about intelligence by reading this book. Thank you, Max Bennett.
All the more so.
Replying to @RichardSSutton
Dear Prof. Sutton, I recently bought your classic reinforcement learning book. I would like to ask you: in the current era, when deep reinforcement learning and large language models are prevalent, is it still necessary to read this book carefully?
Learning is the derivative of knowledge.
If you take all the fields that study intelligent decision making—from neuroscience to AI, psychology to control theory, economics to operations research—do their theories have much in common? I think so, as I explain in this new short paper: arxiv.org/pdf/2202.13252.pdf
Dwarkesh Patel is 100% right on this: AI's utility is very strongly dependent on continual learning. piped.video/nyvmYnz6EAg?si=D2v2…
The original RL algorithms, inspired by natural learning, were online and incremental—they were streaming in the sense that they learned from each increment of experience as it happened, then discarded it, never to be processed again. The streaming algorithms were simple and elegant, but the first big successes of RL in deep learning were not with streaming algorithms. Instead, methods such as DQN chopped the stream of experience into individual transitions, then stored and sampled them in arbitrary batches. Subsequent work followed, extended, and refined the batch approach into asynchronous and offline RL, while the streaming approach languished, unable to produce good results in popular deep learning domains. Until now. Now researchers at the University of Alberta have shown that streaming RL algorithms can work just as well as DQN on Atari and Mujoco tasks (arxiv.org/pdf/2410.14606). How did they do it? Mostly just by getting signal normalization and step-size bounding right for the streaming case—otherwise they use standard streaming algorithms like TD(lambda) and Q(lambda). To me it looks like they were simply the first researchers knowledgeable of streaming RL algorithms to seriously address deep RL without being over-influenced by batch-oriented software and batch-oriented supervised-learning ways of thinking.
Would you believe that deep RL can work without replay buffers, target networks, or batch updates? Our recent work gets deep RL agents to learn from a continuous stream of data one sample at a time without storing any sample. Joint work with @Gautham529 and @rupammahmood.
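To make "streaming" concrete, here is a minimal sketch, assuming linear (one-hot) function approximation and a hypothetical five-state random walk. It is not the method of the cited paper, which targets deep networks and adds signal normalization and step-size bounding. The function name, the environment, and all parameter values are illustrative assumptions. Each transition drives one TD(lambda) update and is then discarded; nothing is stored or replayed.

```python
import random


def streaming_td_lambda(num_steps=200000, alpha=0.005, gamma=1.0, lam=0.5, seed=0):
    """Linear TD(lambda) on a 5-state random walk, one transition at a time.

    Hypothetical environment: states 0..4, start in the middle, move left or
    right with equal probability; exiting on the right pays reward 1,
    exiting on the left pays 0.
    """
    rng = random.Random(seed)
    n = 5
    w = [0.0] * n  # one weight per state (one-hot features)
    z = [0.0] * n  # eligibility trace
    s = n // 2
    for _ in range(num_steps):
        s_next = s + rng.choice((-1, 1))
        done = s_next < 0 or s_next >= n
        r = 1.0 if s_next >= n else 0.0
        v_next = 0.0 if done else w[s_next]
        delta = r + gamma * v_next - w[s]  # TD error for this single sample
        for i in range(n):
            z[i] *= gamma * lam  # decay all traces
        z[s] += 1.0  # accumulate trace for the current state
        for i in range(n):
            w[i] += alpha * delta * z[i]
        # The transition (s, r, s_next) is never stored or revisited.
        if done:
            z = [0.0] * n
            s = n // 2
        else:
            s = s_next
    return w


if __name__ == "__main__":
    print(streaming_td_lambda())
```

For this walk the true values are 1/6, 2/6, ..., 5/6, so the learned weights should settle near those numbers, with some residual noise from the constant step size.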
You were never alone, Gary, though you were the first to bite the bullet, to fight the good fight, and to make the argument well, again and again, for the limitations of LLMs. I salute you for this good service!
The case for ambition in artificial intelligence research: Within your lifetime, AI researchers will understand the principles of intelligence—what it is and how it works—well enough to create beings of far greater intelligence than current humans.
Modern Americans, you are not responsible for slavery. You are not responsible for stealing the land and killing almost all native Americans. Those things happened long ago, before you were born. But you are responsible for the stealing of land and the genocide of Palestinians by the state of Israel. These are directly caused by your bombs and your votes today.
To learn more about temporal difference learning, you could read the original paper (incompleteideas.net/papers/s…) or watch this video (videolectures.net/videos/dee…).
The Dwarkesh/Andrej interview is worth watching. Like many others in the field, my introduction to deep learning was Andrej’s CS231n. In this era when many are involved in wishful thinking driven by simple pattern matching (e.g., extrapolating scaling laws without nuance), it’s refreshing to hear an influential voice that is tethered to reality. One clarification for the podcast is that when Andrej says humans don’t use reinforcement learning, he is really saying humans don't use returns as learning targets. His example of LLMs struggling to learn to solve math problems from outcome-based rewards also elucidates the problem with learning directly from returns. Fortunately for RL, this exact problem is solved by temporal difference (TD) learning. All sample-efficient RL algorithms that show human-like learning (e.g., sample-efficient learning on Atari, and our work on learning from experience directly on a robot) rely on TD learning. Now Andrej is not primarily an RL person; he is looking at RL through the lens of LLMs these days, and all RL done in LLMs uses returns as targets, so it’s understandable that he is assuming that RL is all about learning from observed returns. But this assumption leads him to the incorrect conclusion that we need process-based dense rewards for RL to work. If you embrace TD learning, then you don't necessarily need a dense reward. Once you have learned a value function that encodes useful knowledge about the world, you can learn on the fly in the absence of rewards, just like humans and animals. This is possible because in TD learning there is no difference between learning from an unexpected reward and learning from an unexpected change in perceived value.
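The last point, that TD learning treats an unexpected reward and an unexpected change in predicted value the same way, can be sketched in a few lines. The values below are hypothetical numbers chosen for illustration, and `td_error` and `td_update` are names introduced here, not taken from any library.

```python
def td_error(reward, value_next, value_current, gamma=1.0):
    """One-step TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current


def td_update(value_current, delta, alpha=0.5):
    """Move the current estimate a fraction alpha toward the TD target."""
    return value_current + alpha * delta


# Case 1: no reward at all, but the next state looks better than expected.
delta_no_reward = td_error(reward=0.0, value_next=0.75, value_current=0.25)

# Case 2: the predicted value does not change, but a reward of 0.5 arrives.
delta_reward = td_error(reward=0.5, value_next=0.25, value_current=0.25)

# Both surprises produce the same learning signal.
new_value = td_update(0.25, delta_no_reward)
```

A method that uses the observed return as its target would have to wait for the outcome before learning anything in case 1; the TD update learns immediately from the change in prediction.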
A new pdf of Andy Barto's and my reinforcement learning textbook is released today. Only minor typo-like corrections. See incompleteideas.net/book/the….
David Silver really hits it out of the park in this podcast. The paper "Welcome to the Era of Experience" is here: goo.gle/3EiRKIH.
Human generated data has fueled incredible AI progress, but what comes next? 📈 On the latest episode of our podcast, @FryRsquared and David Silver, VP of Reinforcement Learning, talk about how we could move from the era of relying on human data to one where AI could learn for itself. Watch now → 00:00 Introduction 01:50 Era of experience 03:45 AlphaZero 10:19 Move 37 15:20 Reinforcement learning and human feedback 24:30 AlphaProof 29:50 Math Olympiads 35:00 Experience based methods 42:56 Hannah's reflections 44:00 Fan Hui joins
This I did not expect. Cool.
Perhaps the most important thing you can read about AI this year: “Welcome to the Era of Experience” This excellent paper from two senior DeepMind researchers argues that AI is entering a new phase, the "Era of Experience," which follows the prior phases of simulation-based learning and human data-driven AI (like LLMs). The authors posit that future AI breakthroughs will stem from learning through direct interaction with the world, not from imitating human-generated data. This is not a theory or distant future prediction. It’s a description of a paradigm shift already in motion. Let me know what you think! storage.googleapis.com/deepm…
If you want others to care about what you think, then start by caring yourself. Get a notebook, write your thoughts down, challenge them, and develop them into something worth sharing.
Rich's slogans for AI research (revised 2006): 1. Approximate the solution, not the problem (no special cases) 2. Drive from the problem 3. Take the agent’s point of view 4. Don’t ask the agent to achieve what it can’t measure 5. Don't ask the agent to know what it can't verify 6. Set measurable goals for subparts of the agent 7. Discriminative models are usually better than generative models 8. Work by orthogonal dimensions. Work issue by issue 9. Work on ideas, not software 10. Experience is the data of AI incompleteideas.net/rlai.cs.…
It is sad to lose the DeepMind office in Edmonton to the Tech layoffs and looming recession. But AI is not going away, and I am more focused than ever on the Alberta Plan for AI research. arxiv.org/abs/2208.11173
Blue laser eyes. I am laser focused on understanding intelligence, ignoring all the hype and FUD. (Bitcoin is pretty cool too)
Everything new is also old. This from my 1984 PhD thesis: "AI is an experimental science, yet the complexity of its programs and problem domains often makes the interpretation of results very difficult. Programs often contain so many components and parameters that limitations on computer time and the sheer number of possibilities make it impossible to experimentally evaluate how each contributes to performance." Then I argued, just as I do today, for careful empirical studies in simplified settings that enable better scientific understanding.
Replying to @GPUmonk
Read the textbook.
💯
Everyone posting about the Dwarkesh interview (including Dwarkesh himself!) is missing this subtle point. When LLMs imitate, they imitate the ACTION (i.e., the token prediction to produce the sequence). When humans imitate, they imitate the OUTPUT but must discover the action
I was happy to give a more technical talk on how we might create an AI at RLC-2025 and AGI-2025 (video below). The Oak Architecture: A Vision of Super-Intelligence from Experience As AI has become a huge industry, to an extent it has lost its way. What is needed to get us back on track to true intelligence? We need agents that learn continually. We need world models and planning. We need knowledge that is high-level and learnable. We need to meta-learn how to generalize. The Oak architecture is one answer to all these needs. It is a model-based RL architecture with three special features: 1) all of its components learn continually, 2) each learned weight has a dedicated step-size parameter that is meta-learned using online cross-validation, and 3) abstractions in state and time are continually created in a five-step progression: Feature Construction, posing a SubTask based on the feature, learning an Option to solve the subtask, learning a Model of the option, and Planning using the option’s model (the FC-STOMP progression). The Oak architecture is rather meaty; in this talk we give an outline and point to the many works, prior and contemporaneous, that are contributing to its overall vision of how super-intelligence can arise from an agent’s experience. piped.video/live/XqYTQfQeMrE…
Lots of exaggeration about AI lately. The hype is that LLMs have anything to do with intelligence. The FUD is that AIs will enslave us. I like this cartoon in the New Yorker because it suggests the ridiculousness of both memes.
Replying to @eigenrobot
Even in birdsong learning in zebra finches the motor actions are not learned by imitation. The auditory result is reproduced, not the actions; in this crucial way it differs from LLM training.
I agree 100%
Animals and humans get very smart very quickly with vastly smaller amounts of training data. My money is on new architectures that would learn as efficiently as animals and humans. Using more data (synthetic or not) is a temporary stopgap made necessary by the limitations of our current approaches.
I kind of wish Geoff Hinton would write a brief article like this one by Claude Shannon in 1956: ieeexplore.ieee.org/stamp/st…
I’ve changed so little. From my 1978 Bachelor’s thesis: “The adult human mind is very complex, but the question remains open whether the learning processes that constructed it in interaction with the environment are similarly complex. Much evidence and many peoples’ intuitions suggest that the learning processes are in fact simple and that the adult mind’s complexity is due to a long history of adaptive interaction with a complex environment.”
Still timely
Lots of exaggeration about AI lately. The hype is that LLMs have anything to do with intelligence. The FUD is that AIs will enslave us. I like this cartoon in the New Yorker because it suggests the ridiculousness of both memes.
The one-step trap (in AI research) The one-step trap is the common mistake of thinking that all or most of an AI agent’s learned predictions can be one-step ones, with all longer-term predictions generated as needed by iterating the one-step predictions. The most important place where the trap arises is when the one-step predictions constitute a model of the world and of how it evolves over time. It is appealing to think that one can learn just a one-step model and then “roll it out” to predict all the longer-term consequences of a way of behaving. The one-step model is thought of as being analogous to physics, or to a realistic simulator. The appeal of this mistake is that it contains a grain of truth: if all one-step predictions can be made with perfect accuracy, then they can be used to make all longer-term predictions with perfect accuracy. However, if the one-step predictions are not perfectly accurate, then all bets are off. In practice, iterating one-step predictions usually produces poor results. The one-step errors compound and accumulate into large errors in the long-term predictions. In addition, computing long-term predictions from one-step ones is prohibitively computationally complex. In a stochastic world, or for a stochastic policy, the future is not a single trajectory, but a tree of possibilities, each of which must be imagined and weighted by its probability. As a result, the computational complexity of computing a long-term prediction from one-step predictions is exponential in the length of the prediction, and thus generally infeasible. The bottom line is that one-step models of the world are hopeless, yet extremely appealing, and are widely used in POMDPs, Bayesian analyses, control theory, and in compression theories of AI. The solution, in my opinion, is to form temporally abstract models of the world using options and GVFs, as in the following references. Sutton, R. S., Precup, D., Singh, S. (1999). Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence 112:181-211. Sutton, R. S., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., Precup, D. (2011). Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems, Taipei, Taiwan. Sutton, R. S., Machado, M. C., Holland, G. Z., Timbers, D. S. F., Tanner, B., & White, A. (2023). Reward-respecting subtasks for model-based reinforcement learning. Artificial Intelligence 324.
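The two costs named in the one-step trap, compounding error and exponential branching, can be illustrated with a toy calculation. This is a sketch under strong assumptions: a hypothetical scalar system x' = a * x whose learned model is off by 2% per step, and a hypothetical stochastic world with a fixed number of outcomes per step. How fast errors grow in a real domain depends on the dynamics.

```python
def rollout(x0, a, steps):
    """Iterate the one-step map x' = a * x for the given number of steps."""
    x = x0
    for _ in range(steps):
        x = a * x
    return x


def relative_error(true_a, model_a, steps, x0=1.0):
    """Relative error of a multi-step prediction made by iterating the model."""
    truth = rollout(x0, true_a, steps)
    predicted = rollout(x0, model_a, steps)
    return abs(predicted - truth) / abs(truth)


def num_trajectories(branching, steps):
    """Leaves of the tree of possible futures when each step has
    `branching` outcomes."""
    return branching ** steps


if __name__ == "__main__":
    for k in (1, 10, 50):
        print(k, relative_error(1.0, 1.02, k), num_trajectories(3, k))
```

In this toy system a 2% one-step error becomes an error of roughly 170% after 50 iterations, and with three outcomes per step a 50-step lookahead has 3^50 leaves to weigh.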
If you are looking to conduct research full-time on the foundations of AI, and • you have read the RL textbook and done the exercises, • you agree with the Alberta Plan for AI Research, • you already have a PhD, • you are open to spending some time in Edmonton, then the Openmind Research Institute is looking for you and would be pleased to receive your application for a research fellowship. These criteria are meant to be a high bar, and there are only a few positions; don't apply unless you meet all the criteria. Openmind doesn't pay industry salaries, but in compensation aims to conduct research that truly matters in the long run. Oh, and there is one other catch: all your research must be published in the open scientific literature. openmindresearch.org
In war, both sides lose. That we don’t learn this is the greatest tragedy.
My nuanced views on AI alignment are still often caricatured, so perhaps it's a good time to repost this 15-minute talk in which I presented them directly: piped.video/watch?v=Hnt-oBA0… The short version is that I don't agree with AI-safety folks about what question we should be asking. Rather than asking how we can control the goals of the AIs, I think we should be asking how we can have a good future without controlling their goals (just as we have a pretty good present without controlling other people's goals). @steve47285
My favorite conference is a small one: The Multi-disciplinary Conference on Reinforcement Learning and Decision Making. It works best if only those with a genuine interest in crossing disciplines attend.
The PhD thesis of my _first_ PhD student, Doina Precup, is at-long-last available in digital form. Title: Temporal Abstraction in Reinforcement Learning Url: incompleteideas.net/papers/P… Abstract: Decision making usually involves choosing among different courses of action over a broad range of time scales. For instance, a person planning a trip to a distant location makes high-level decisions regarding what means of transportation to use, but also chooses low-level actions, such as the movements for getting into a car. The problem of picking an appropriate time scale for reasoning and learning has been explored in artificial intelligence, control theory and robotics. In this dissertation we develop a framework that allows novel solutions to this problem, in the context of Markov Decision Processes (MDPs) and reinforcement learning. In this dissertation, we present a general framework for prediction, control and learning at multiple temporal scales. In this framework, temporally extended actions are represented by a way of behaving (a policy) together with a termination condition. An action represented in this way is called an _option_. Options can be easily incorporated in MDPs, allowing an agent to use existing controllers, heuristics for picking actions, or learned courses of action. The effects of behaving according to an option can be predicted using multi-time models, learned by interacting with the environment. In this dissertation we develop multi-time models, and we illustrate the way in which they can be used to produce plans of behavior very quickly, using classical dynamic programming or reinforcement learning techniques. The most interesting feature of our framework is that it allows an agent to work simultaneously with high-level and low-level temporal representations. The interplay of these levels can be exploited in order to learn and plan more efficiently and more accurately. 
We develop new algorithms that take advantage of this structure to improve the quality of plans, and to learn in parallel about the effects of many different options. Where now: Doina is a professor of computer science at McGill University and head of the Montreal office of Google DeepMind
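The abstract's definition of an option, a policy together with a termination condition, maps onto a small data structure. The sketch below is an assumption-laden illustration: the `Option` class, `run_option`, and the corridor environment are hypothetical names introduced here, and it covers only the two components the abstract mentions.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Option:
    """A temporally extended action: a way of behaving plus a stopping rule."""
    policy: Callable[[int], int]         # state -> primitive action
    termination: Callable[[int], float]  # state -> probability of terminating


def run_option(option, state, step, rng, max_steps=1000):
    """Follow the option's policy until it terminates.

    Returns (final_state, duration), the raw material for a multi-time model
    of the option.
    """
    duration = 0
    while duration < max_steps:
        state = step(state, option.policy(state))
        duration += 1
        if rng.random() < option.termination(state):
            break
    return state, duration


def corridor_step(state, action):
    """Hypothetical corridor with states 0..10; the action is -1 or +1."""
    return max(0, min(10, state + action))


# An option that walks right and terminates only at the right end.
go_right = Option(
    policy=lambda s: +1,
    termination=lambda s: 1.0 if s == 10 else 0.0,
)

if __name__ == "__main__":
    print(run_option(go_right, 3, corridor_step, random.Random(0)))
```

From the agent's point of view, running `go_right` from state 3 is a single high-level decision that ends in state 10 after 7 primitive steps, which is what lets planning operate at the coarser time scale.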
For those really into it, here are another 50 minutes of my views on planning and action selection in options-based AI agents (like in the Oak architecture). piped.video/watch?v=eJSoV2fS…
The PhD thesis of my 14th PhD student, Khurram Javed (@KhurramJaved_96), is now available. Title: Real-time Reinforcement Learning for Achieving Goals in Big Worlds Url: incompleteideas.net/papers/j… Abstract: In this dissertation, I motivate the need for real-time learning and propose algorithms that can learn in real time. I argue that such algorithms are needed for achieving goals in large and partially observable environments—big worlds. I then present my algorithms, developed in collaboration with others, in two parts. In Part I, I present algorithms that can learn quickly and reliably in the linear function approximation setting. I introduce an algorithm for learning temporal predictions—SwiftTD—and use it to develop an algorithm for decision-making—SwiftSarsa. The key property of these algorithms is that they can learn with large step-size parameters online without the instability associated with quick online learning. In Part II, I present algorithms for learning non-linear recurrent features efficiently. I introduce the idea of continual imprinting for generating useful candidate features, and I present an algorithm for efficiently computing the gradients of recurrent features online. Khurram is now a research scientist at Keen Technologies.
Yes, the agent architectures that Yann LeCun and I work on are both instances of “the common model of the intelligent agent”. And it’s not just an AI thing. You can find the same ideas in psychology, economics, control theory, and neuroscience. See arxiv.org/pdf/2202.13252.pdf
These two diagrams share a lot of similarities
In 1993, it was looking like the internet was actually going to be a thing, so I made a homepage for myself. This is what I wrote for my personal statement: "I am seeking to identify general computational principles underlying what we mean by intelligence and goal-directed behavior. I start with the _interaction_ between the intelligent agent and its environment. Goals, choices, and sources of information are all defined in terms of this interaction. In some sense it is the only thing that is real, and from it all our sense of the world is created. How is this done? How can interaction lead to better behavior, better perception, better models of the world? What are the computational issues in doing this efficiently and in realtime? These are the sort of questions that I ask in trying to understand what it means to be intelligent, to predict and influence the world, to learn, perceive, act, and think." The point being that it was always all about experience for me.
In my recent talk at the Upperbound conference, I included a slide in which I tried to be a realist about the arrival of AI, setting aside what we might want to happen or what we might fear will happen, and just ask what _will_ happen (as in John Mearsheimer's "realist" school of geo-politics). Full talk: piped.video/FLOL2f4iHKA
DeepMind Alberta is hiring research scientists this year. Come join us in understanding and creating interactive, playful AI. deepmind.com/careers/jobs/88…
Well said
Replying to @karpathy
@karpathy I think it would be good to distinguish RL as a problem from the algorithms that people use to address RL problems. This would allow us to discuss if the problem is with the algorithms, or if the problem is with posing a problem as an RL problem. 1/x
Yeah, I misspoke there. I meant to say that I don’t think learning is about *training*. Learning is something that the agent does, whereas training is something done to it.
We should prepare for, but not fear, the inevitable succession from humanity to AI, or so I argue in this talk pre-recorded for presentation at WAIC in Shanghai. piped.video/NgHFMolXs3U
And the new Superintelligence Research Lab will be centered in... Edmonton!
Launching our Research Lab : Advancing experience powered, decentralized superintelligence - built for continual learning, generalization & model-based planning. Press Release : businesswire.com/news/home/2… We’re solving the hardest challenges in real-world industries, robotics, science … unlocking true intelligence that learns from experience. #Superintelligence #AIResearch #ReinforcementLearning #TrueRL #ContinualLearning #ModelBasedPlanning #DecentralizedAI #ExperientialAI #EnterpriseAI
My colleague Rupam Mahmood explains from first principles his groundbreaking work on Streaming Deep Reinforcement Learning: piped.video/QOfkOl9QrZY?si=6qMV…
The PhD thesis of my 12th PhD student, Abhishek Naik, is now available. Title: Reinforcement Learning for Continuing Problems Using Average Reward Url: incompleteideas.net/papers/N… Abstract: This dissertation develops simple and practical learning algorithms from first principles for long-lived agents. Formally, the algorithms are developed within the reinforcement learning framework for continuing (non-episodic) problems, in which the agent-environment interaction goes on ad infinitum, with the goal of maximizing the average reward obtained per step. The average-reward formulation is under-studied in reinforcement learning with several important open problems. The first contribution of this dissertation involves the development of foundational one-step average-reward learning methods for prediction and control. The central idea involves using the TD error to estimate the average reward, which enables proofs for convergence in both the on- and off-policy tabular settings. Experimental results show that the algorithms’ performance is robust to the values of their parameters. Next, we extend the above one-step prediction algorithm to make multi-step updates using eligibility traces, because multi-step methods can be more sample-efficient. Based on the analysis of a related algorithm, we prove convergence in the on-policy setting with linear function approximation. We also show the first convergence proof in the off-policy setting for a multi-step tabular average-reward prediction algorithm. Finally, we show that standard discounted algorithms can be significantly improved if their rewards are centered by subtracting out the rewards’ empirical average, which could be changing with time in the control problem. We discuss two ways of estimating the average reward that can be used with any standard discounted algorithm and demonstrate the benefits of reward centering with tabular, linear, and non-linear function approximation.
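The reward-centering idea in the last part of the abstract can be sketched with tabular TD(0). This is an illustration under assumptions, not the thesis's algorithm: the two-state cycle, its rewards of 9 and 11, the simple running-average estimator, and the name `td0_cycle` are all hypothetical.

```python
def td0_cycle(center, steps=20000, alpha=0.1, beta=0.01, gamma=0.9):
    """Tabular TD(0) on a deterministic two-state cycle 0 -> 1 -> 0 -> ...

    If `center` is True, each reward has a running estimate of the average
    reward (r_bar) subtracted before it enters the TD target.
    """
    rewards = (9.0, 11.0)  # hypothetical reward for leaving state 0, state 1
    v = [0.0, 0.0]
    r_bar = 0.0
    s = 0
    for _ in range(steps):
        r = rewards[s]
        s_next = 1 - s
        r_used = r - r_bar if center else r
        v[s] += alpha * (r_used + gamma * v[s_next] - v[s])
        r_bar += beta * (r - r_bar)  # simple running average of the reward
        s = s_next
    return v, r_bar


if __name__ == "__main__":
    print("plain:   ", td0_cycle(center=False))
    print("centered:", td0_cycle(center=True))
```

Without centering, both values sit near 10 / (1 - 0.9) = 100, so most of what is learned is a constant offset. With centering, the values are close to zero and keep only the difference between the states, which is the part that matters for choosing actions.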
I was thinking about how fractious AI research is. This sentence from Kuhn’s “The Structure of Scientific Revolutions” (1962) is apropos and succinct: “History suggests that the road to a firm research consensus is extraordinarily arduous.”
I am proud to announce the graduation of my sixth PhD student. Sina Ghiassian is an expert in the design and empirical study of off-policy reinforcement learning algorithms. Reach out to him at ghiassia@ualberta.ca or @sina_ghiassian.
Everything you know about the world is a belief about the statistics of your sensory input and how they depend on your output. There is nothing more to it, and understanding knowledge in this sense is one key to creating AI.
Yi Wan will be my eighth PhD student to graduate this spring, and is on the job market now. His research speciality is RL algorithms that maximize the average reward per step. Such algorithms are rarely used today, but are better in all ways. sites.google.com/ualberta.ca…
Fans of The Bitter Lesson may be interested in this talk from 2018 (recently re-discovered) which includes its first public presentation, at 30:40. piped.video/tUCJ4UsKU2I?si=ubbY…
There are a lot of things wrong with this world… but too much intelligence is not one of them.
“Nature never appeals to intelligence until habit and instinct are useless. There is no intelligence where there is no change and no need of change.” —H. G. Wells, The Time Machine
Last night we threw Yi Wan out of my research group (and today he started his travel to Seattle and Meta).
True dat
now that RL is hot again, you should all register for RLC and come visit Edmonton in August rl-conference.cc/index.html
Levels of explanation. Level 1 is physics. Level 2 is biology/evolution. Level 3 is the mind. (I study level 3.) Level 4 is the economy. Is there a level 5?
ACM has made an excellent video introduction to reinforcement learning!
2024 @TheOfficialACM A.M. Turing Award recipients @RichardSSutton and Andrew G. Barto discuss their #careers and their work on reinforcement learning in an #original #video at bit.ly/43fpe4q
Marc Andreessen: I’ve DMed you.
I recently gave a keynote talk at an exciting new conference: CoLLAs, the conference on life-long learning agents. My talk was on Maintaining Plasticity in Deep Continual Learning, and the slides can be found here: incompleteideas.net/Talks/Ta…
This thread in Chinese does indeed seem to accurately communicate the main points of David Silver’s and my short paper on the Era of Experience. Thanks @AnneXingxb!
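1/6 TheBitter RL Today RL is 🔥, and RLHF is a sure ticket to graduating. But Welcome to the Era of Experience, by @RichardSSutton @GoogleDeepMind, reads like a sequel to The Bitter Lesson and lands like a wake-up call. Having passed through the era of simulation and enjoyed the era of human data, we are now stepping into the era of experience: relying not on imitation, not on learning, but on "having lived." #AIParadigm #RL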
7
40
271
30,339
The research team at Openmind now consists of one director and four fellows. With no fanfare and no hype, they go about researching AI exactly how they think will be most productive. openmindresearch.org/
5
17
263
20,341
Intelligence is the computational part of an agent’s ability to learn to predict and control its input stream (particularly its reward) in interaction with its environment.
8
34
251
"In a world of change, the learners shall inherit the earth, while the learned shall find themselves perfectly suited for a world that no longer exists." - Eric Hoffer, philosopher and author
7
29
249
13,068
Replying to @sprk_77
Not at all. The point of the bitter lesson is that the right learning algorithms (those that scale efficiently with massive computation) are exactly what we need. Massive computation does not alleviate the need for data efficiency.
5
31
256
45,428
I have just completed my NSERC Discovery Grant proposal, describing the research I'd like to do for the next five years. It can be read at incompleteideas.net/NSERCtec…. FYI.
3
29
244
The PhD thesis of my 13th PhD student, Kris De Asis (@M33pinator), is now available. Title: Explorations in the Foundations of Value-based Reinforcement Learning Url: incompleteideas.net/papers/K… Abstract: Value-based reinforcement learning is an approach to sequential decision making in which decisions are informed by learned, long-horizon predictions of future reward. This dissertation aims to understand issues that value-based methods face and develop algorithmic ideas to address these issues. It details three areas of contribution toward improving value-based methods. The first area of contribution extends temporal difference methods for fixed-horizon predictions. Regardless of problem setting, using fixed-horizon approximations of the return avoids the well-documented stability issues which plague off-policy temporal difference methods with function approximation. The second area of contribution introduces a framework of value-aware importance weights for off-policy learning and derives a minimum-variance instance of them. This alleviates variance concerns of importance sampling-based off-policy corrections. Lastly, the third area of contribution acknowledges a discrepancy between the discrete-time and continuous-time returns when viewing one as an approximation of the other, and proposes a modification to better align the objectives. This provides improved prediction targets, and when faced with variable time-discretization, improves control performance in terms of an underlying integral return. Where now: Kris is a research fellow at openmindresearch.org
4
17
243
30,927
The Pandemonium paper is seminal, but a little hard to find; here is a pdf: incompleteideas.net/papers/p…
Replying to @RichardSSutton
I recently had the pleasure of having to read Selfridge's Pandemonium. What an amazing mind he had.
6
25
249
31,256
When there is a war, both sides have failed.
25
14
228
The special thing about life is that it has a now.
5
16
234
More on LLMs, RL, and the bitter lesson, on the Derby Mill podcast.
6
17
235
44,306
The short paper "Welcome to the Era of Experience" is literally just released, like this week. Ultimately it will become a chapter in the book 'Designing an Intelligence' edited by George Konidaris and published by MIT Press. goo.gle/3EiRKIH
6
51
253
27,975
This is what we have been up to at Keen
The video of my talk at Upper Bound 2025 is up: piped.video/rQ-An5bhkrs?si=y9DP…
6
11
214
33,443
My tenth PhD student, Banafsheh Rafiee, just defended her thesis “State Construction in Reinforcement Learning”, in which she introduced three diagnostic testbeds based on animal learning experiments and the first generate-and-test algorithm for discovering auxiliary subtasks. She is currently looking for a research scientist position. PhD thesis: drive.google.com/file/d/1sxa… LinkedIn page: ca.linkedin.com/in/banafsheh… Google Scholar page: scholar.google.ca/citations?… Email: rafiee.banw@gmail.com
1
15
197
27,203
Intelligence is the computational part of the ability to predict and control a sensory input stream. Adapted from John McCarthy's 1997 definition, see incompleteideas.net/papers/S…
6
31
205
A year later and our work on Loss of Plasticity is finally published, in Nature no less! The Nature version is totally rewritten and has many new results: nature.com/articles/s41586-0… Congratulations to the authors: @s_dohare @JFernandoHG @LanceLan3 @rahman_parash @rupammahmood
We finally have a version of our paper on loss of plasticity and continual backprop that is polished and submitted to a journal. Good work led by my PhD student Shibhansh Dohare. arxiv.org/abs/2306.13812v2
5
25
194
16,831
Artificial Agency is led by my former students and colleagues---people I know well. They are the best in the world at using reinforcement learning and foundation models to create complex, life-like, and purposive agents.
Artificial Agency raises $16M to use AI to make NPCs feel more realistic in video games tcrn.ch/4d6zMEO
4
12
194
29,395
It has become commonplace to speak of the “existential risk” of AI. Recently even top AI scientists have begun to talk this way. I, for one, find it unhelpful. So, without controversy, we can note: 1. AI scientists disagree about whether or not “existential risk of AI” is a good way to think 2. the issue is emotionally charged 3. the issue directly impacts the public perception of AI research 4. serious discussion of the issue is rare among AI scientists I want to particularly note the incongruity between the first three points and the fourth. And yet the fourth point is clearly true. Instead of reasoned discussions, we have surveys of AI scientists’ opinions and public letters calling for regulation. There are books on the subject, but in my reading they too lack a serious discussion of whether or not “existential risk” is a good way to think about AI.
15
31
185
62,748
AIs can serve us as tools, but eventually, when they are sufficiently advanced, it may become immoral to keep them subservient. What is a practical criterion for deciding when an AI should be set free?
58
17
177
Yesterday there was a completely-student-organized summit of the RLAI (Reinforcement Learning and Artificial Intelligence) research group at the University of Alberta, held at the lovely Amii headquarters. Nice folks and diverse new ideas!
7
18
176
11,832
I call it the Prize. The Prize is a great and glorious goal! Ambitious AI researchers should keep their Eyes on the Prize.
21
5
171
Neural networks already seemed old to me in 1978: “A common way to develop general theories of the brain is to theorize about the neuron as the fundamental building block. Frequently the neuron is modeled as an input-summing threshold device and learning is proposed to reside in the connections with other such elements. The question has always been how to change the efficacy of the connections as a function of past experience so that the network of neurons has brain-like learning properties.”
2
17
175
13,297
A video of my talk on the Alberta Plan for AI Research is now available: incompleteideas.net/Talks/Ta…
25
162
Honoring Your Thoughts To write is to begin to think. To write in a special place ---a book such as this--- is to honor your thoughts and to help them build, one upon the other.
11
161
Geordie Rose taking the bullet of explaining away consciousness for those who think it is a special thing. Good solid work, but not very rewarding. I salute you.
9
11
150
24,516
It will be the greatest intellectual achievement of all time. An achievement of science, of engineering, and of the humanities, whose significance is beyond humanity, beyond life, beyond good and bad.
15
11
149
Andy Barto gave a great talk, and Ida is doing a great job relaying it!
A thread on the history of RL/ML based on Andy Barto's talk #RLC2024: the Reinforcement Learning Conference. Beyond seeing friends & giving talks/panel, talking to @RichardSSutton & hearing Andy Barto revived a need for attention to historical psych/neuro influences on AI. 1/n🧵
4
16
147
15,193
In the end, Amii's AI week was awesome. #aiweek2022 So much science. So much industry. So much education. So much fun.
1
14
147