Misha Laskin · Jan 7, 2022 · 12:22 AM UTC

Misha Laskin

@MishaLaskin

7 Jan 2022

Transformers are arguably the most impactful deep learning architecture from the last 5 yrs. In the next few threads, we’ll cover multi-head attention, GPT and BERT, Vision Transformer, and write these out in code. This thread → understanding multi-head attention. 1/n

608

3,181

Misha Laskin · Mar 7, 2025 · 4:31 PM UTC

Misha Laskin

@MishaLaskin

7 Mar 2025

Today I’m launching @reflection_ai with my friend and co-founder @real_ioannis. Our team pioneered major advances in RL and LLMs, including AlphaGo and Gemini. At Reflection, we're building superintelligent autonomous systems. Starting with autonomous coding.

174

214

1,951

499,105

Misha Laskin · Jul 16, 2025 · 3:08 PM UTC

Misha Laskin

@MishaLaskin

16 Jul 2025

Engineers spend 70% of their time understanding code, not writing it. That’s why we built Asimov at @reflection_ai. The best-in-class code research agent, built for teams and organizations.

174

1,490

369,242

Misha Laskin · Oct 26, 2022 · 1:41 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2022

In our new work - Algorithm Distillation - we show that transformers can improve themselves autonomously through trial and error without ever updating their weights. No prompting, no finetuning. A single transformer collects its own data and maximizes rewards on new tasks. 1/N

237

1,331

Misha Laskin · Jan 4, 2022 · 10:55 PM UTC

Misha Laskin

@MishaLaskin

4 Jan 2022

Patch extraction is a fundamental operation in deep learning, especially for computer vision. By the end of this thread, you’ll know how to implement an efficient vectorized patch extractor (no for loops) in a few lines of code and learn about memory allocation in numpy. 1/n

195

1,062

Misha Laskin · Jan 13, 2022 · 11:18 PM UTC

Misha Laskin

@MishaLaskin

13 Jan 2022

GPT has been a core part of the unsupervised learning revolution that’s been happening in NLP. In part 2 of the transformer series, we’ll build GPT from the ground up. This thread → masked causal self-attention, the transformer block, tokenization & position encoding. 1/N
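
A minimal sketch of the causal masking mentioned above, assuming NumPy and a single sequence of raw attention scores. The function name `causal_attention_weights` is hypothetical and this is not the thread's own code; it only illustrates the idea that position i may not attend to positions j > i.

```python
import numpy as np

def causal_attention_weights(scores):
    """scores: (T, T) raw attention scores. Returns row-normalized weights
    in which position i puts zero weight on all later positions j > i."""
    T = scores.shape[0]
    # True strictly above the diagonal, i.e. for "future" positions.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    masked = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the last axis; exp(-inf) becomes 0.
    masked = masked - masked.max(axis=-1, keepdims=True)
    e = np.exp(masked)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
weights = causal_attention_weights(rng.standard_normal((4, 4)))
```

The first row can only attend to itself, so its single nonzero weight is 1.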

109

573

Misha Laskin · Jan 19, 2022 · 9:45 PM UTC

Misha Laskin

@MishaLaskin

19 Jan 2022

Einops are pretty magical. For example, with einops you can implement max pooling in 2 lines of code. Patches → set size of patch, decompose HW dims in rearrange as (num_patches * size), specify output dim. Pooling → pick out maximum over each patch. That is all.
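
A minimal sketch of the same two steps, written with plain NumPy reshapes as a stand-in for einops (the tweet's own snippet is not reproduced here). The function name `max_pool_2d` and the (B, C, H, W) layout are assumptions made for illustration.

```python
import numpy as np

def max_pool_2d(x, size):
    """Max-pool a (B, C, H, W) array over non-overlapping size x size patches."""
    b, c, h, w = x.shape
    assert h % size == 0 and w % size == 0, "H and W must be divisible by size"
    # Patches: decompose H into (h // size, size) and W into (w // size, size).
    patches = x.reshape(b, c, h // size, size, w // size, size)
    # Pooling: take the maximum over the two within-patch axes.
    return patches.max(axis=(3, 5))

x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
pooled = max_pool_2d(x, size=2)
```

With einops installed, the same pattern should correspond roughly to `reduce(x, 'b c (h s1) (w s2) -> b c h w', 'max', s1=size, s2=size)`.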

539

Misha Laskin · Feb 23, 2023 · 10:11 PM UTC

Misha Laskin

@MishaLaskin

23 Feb 2023

Starting a blog about the engineering + scientific ideas behind training large models (e.g. transformers). First post covers data parallelism, a simple and common technique for parallelizing computation across multiple devices. mishalaskin.com/posts/data_p… 1/N
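
A minimal sketch of the idea behind data parallelism, assuming NumPy, a toy linear model, and devices simulated as list entries. None of this is taken from the linked post; `grad_mse` and the shapes are hypothetical. Each device holds a full copy of the weights and an equal shard of the batch, and averaging the local gradients reproduces the full-batch gradient.

```python
import numpy as np

def grad_mse(w, x, y):
    """Gradient of the mean squared error for a linear model y_hat = x @ w."""
    return 2.0 * x.T @ (x @ w - y) / len(x)

rng = np.random.default_rng(0)
num_devices, batch, dim = 4, 32, 3
x = rng.standard_normal((batch, dim))
y = rng.standard_normal(batch)
w = rng.standard_normal(dim)

# Each simulated device gets an equal shard of the batch and the same w.
x_shards = np.split(x, num_devices)
y_shards = np.split(y, num_devices)
local_grads = [grad_mse(w, xs, ys) for xs, ys in zip(x_shards, y_shards)]

# Stand-in for an all-reduce: average local gradients across devices.
synced_grad = np.mean(local_grads, axis=0)
full_grad = grad_mse(w, x, y)
```

The equality holds here because the shards are the same size; unequal shards would need a weighted average.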

445

71,249

Misha Laskin · Sep 18, 2020 · 2:18 AM UTC

Misha Laskin

@MishaLaskin

18 Sep 2020

New paper led by @astooke w/ @kimin_le2 & @pabbeel - Decoupling Representation Learning from RL. First time RL trained on unsupervised features matches (or beats) end-to-end RL! Paper: arxiv.org/abs/2009.08319 Code: github.com/astooke/rlpyt/tre… Site: mishalaskin.github.io/atc/ [1/N]

112

406

Misha Laskin · Jul 11, 2022 · 11:42 PM UTC

Misha Laskin

@MishaLaskin

11 Jul 2022

How much memory do you need to train deep neural networks? You may find the answer to be counterintuitive. For example, suppose we're training a 4 megabyte MLP with batch_size = hidden_dim. How much memory do we need? 4MB? No - we need 8MB! Here's why... 1/N
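
One accounting that reproduces the 8MB figure, offered as a hedged sketch rather than the thread's own derivation: assume float32 values, square bias-free layers, and that training must hold both the weights and the activations cached for the backward pass (gradients and optimizer state are ignored). The function name and `num_layers` parameter are hypothetical.

```python
BYTES_PER_FLOAT32 = 4

def mlp_training_memory_bytes(hidden_dim, batch_size, num_layers=1):
    """Return (weight_bytes, activation_bytes) for a stack of square layers."""
    # Each layer stores a hidden_dim x hidden_dim weight matrix.
    weights = num_layers * hidden_dim * hidden_dim * BYTES_PER_FLOAT32
    # Each layer caches a batch_size x hidden_dim activation for backprop.
    activations = num_layers * batch_size * hidden_dim * BYTES_PER_FLOAT32
    return weights, activations

hidden_dim = 1024  # 1024 * 1024 float32 weights = 4 MiB
weights, activations = mlp_training_memory_bytes(hidden_dim, batch_size=hidden_dim)
total_mb = (weights + activations) / 2**20
```

When batch_size equals hidden_dim the two terms are identical, which is why the total is double the model size under these assumptions.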

383

Misha Laskin · Feb 10, 2022 · 5:01 PM UTC

Misha Laskin

@MishaLaskin

10 Feb 2022

Excited to share that I've joined @DeepMind and for the opportunity to work at the frontier of RL research. Thank you @pabbeel and all of my collaborators for an incredible two years at Berkeley.

342

Misha Laskin · Oct 9, 2025 · 3:12 PM UTC

Misha Laskin

@MishaLaskin

9 Oct 2025

We are bringing the open model frontier back to the US to build a thriving AI ecosystem globally. Thankful for the support of our investors including NVIDIA, Disruptive, DST, 1789, B Capital, Lightspeed, GIC, Eric Yuan, Eric Schmidt, Citi, Sequoia, CRV, and others.

Reflection

@reflection_ai

9 Oct 2025

Today we're sharing the next phase of Reflection. We're building frontier open intelligence accessible to all. We've assembled an extraordinary AI team, built a frontier LLM training stack, and raised $2 billion.

Why Open Intelligence Matters

Technological and scientific progress is driven by values of openness and collaboration. The internet, Linux, and the protocols and standards that underpin modern computing are all open. This isn't a coincidence. Open software is what gets forked, customized, and embedded into systems worldwide. It's what universities teach, what startups build on, what enterprises deploy. Open science enables others to learn from the results, be inspired by them, interrogate them, and build upon them in order to push the frontier of human knowledge and scientific advancement. AI got to where it is today through scaling ideas (e.g. self-attention, next token prediction, reinforcement learning) that were shared and published openly.

Now AI is becoming the technology layer that everything else runs on top of. The systems that accelerate scientific research, enhance education, optimize energy usage, supercharge medical diagnoses, and run supply chains will all be built on AI infrastructure.

But the frontier is currently concentrated in closed labs. If this continues, a handful of entities will control the capital, compute, and talent required to build AI, creating a runaway dynamic that locks everyone else out.

There's a narrow window to change this trajectory. We need to build open models so capable that they become the obvious choice for users and developers worldwide, ensuring the foundation of intelligence remains open and accessible rather than controlled by a few.

What We've Built

Over the last year, we've been preparing for this mission. We’ve assembled a team who have pioneered breakthroughs including PaLM, Gemini, AlphaGo, AlphaCode, AlphaProof, and contributed to ChatGPT and Character AI, among many others.

We built something once thought possible only inside the world’s top labs: a large-scale LLM and reinforcement learning platform capable of training massive Mixture-of-Experts (MoEs) models at frontier scale. We saw the effectiveness of our approach first-hand when we applied it to the critical domain of autonomous coding. With this milestone unlocked, we're now bringing these methods to general agentic reasoning.

We've raised significant capital and identified a scalable commercial model that aligns with our open intelligence strategy, ensuring we can continue building and releasing frontier models sustainably. We are now scaling up to build open models that bring together large-scale pretraining and advanced reinforcement learning from the ground up.

Safety and Responsibility

Open intelligence also changes how we think about safety. It enables the broader community to participate in safety research and discourse, rather than leaving critical decisions to a few closed labs. Transparency allows independent researchers to identify risks, develop mitigations, and hold systems accountable in ways that closed development cannot.

But openness also requires confronting the challenges of capable models being widely accessible. We're investing in evaluations to assess capabilities and risks before release, security research to protect against misuse, and responsible deployment standards. We believe the answer to AI safety is not “security through obscurity” but rigorous science conducted in the open, where the global research community can contribute to solutions rather than a handful of companies making decisions behind closed doors.

Join Us

There is a window of opportunity today to build frontier open intelligence, but it is closing and this may be the last. If this mission resonates, join us.

287

58,492

Misha Laskin · Jan 18, 2022 · 11:39 PM UTC

Misha Laskin

@MishaLaskin

18 Jan 2022

Building on parts 1 & 2 which explained multi-head attention and GPT, in part 3 of the Transformer Series we'll cover masked language models like BERT. This thread → masked language models, diff between causal and bi-directional masked attention, finetuning, and code. 1/N

254

Misha Laskin · Jan 11, 2021 · 6:16 PM UTC

Misha Laskin

@MishaLaskin

11 Jan 2021

Ever gotten tired of seeing the same architecture in deep RL ever since DeepMind's Atari-DQN, and wanted to see more papers that explore helpful changes? Check out our latest work FLARE, which replaces frame-stacking. 📝 bit.ly/3s4J1il 💻 bit.ly/3bpHM7D 1/N

253

Misha Laskin · Jan 20, 2021 · 4:11 PM UTC

Misha Laskin

@MishaLaskin

20 Jan 2021

Is RL always data inefficient? Not necessarily. Framework for Efficient Robotic Manipulation (FERM) - shows real robots can learn basic skills from pixels with sparse reward in *30 minutes* using 1 GPU 🦾 paper: bit.ly/2M3CFPG site / code: bit.ly/390Sz6g 1/N

247

Misha Laskin · Dec 16, 2021 · 3:25 PM UTC

Misha Laskin

@MishaLaskin

16 Dec 2021

Over the last few years, unsupervised learning has produced breakthroughs in CV and NLP. Will the same thing happen in RL? @denisyarats and I wrote a blog post discussing unsupervised vs supervised RL and the unsupervised RL benchmark. bair.berkeley.edu/blog/2021/…

The Unsupervised Reinforcement Learning Benchmark

The BAIR Blog

bair.berkeley.edu

240

Misha Laskin · Jan 12, 2023 · 9:02 PM UTC

Misha Laskin

@MishaLaskin

12 Jan 2023

I was wondering how ChatGPT managed to interleave code with text explanations. Was hoping this was an emergent behaviour. Turns out it’s likely straight up imitation learning on curated contractor data. Makes sense but kind of deflating.

242

76,900

Misha Laskin · Nov 5, 2023 · 9:38 PM UTC

Misha Laskin

@MishaLaskin

5 Nov 2023

Replying to @abacaj

Important caveat - the scale of the model is small. Generalization might only emerge at scale.

238

25,892

Misha Laskin · Mar 9, 2023 · 2:08 PM UTC

Misha Laskin

@MishaLaskin

9 Mar 2023

New post - how do we train models that are larger than the memory of a single GPU? Break the model into smaller pieces across several devices. This technique is called model parallelism. I'll show how it works in practice with code examples. mishalaskin.com/posts/tensor… 1/N
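
A minimal sketch of breaking one layer into pieces, assuming NumPy, a single linear layer, and devices simulated as list entries. It is not the linked post's code, and the column-wise split is just one possible partitioning. Each simulated device holds a slice of the weight matrix, computes its part of the output, and the parts are concatenated.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out, num_devices = 4, 6, 8, 2
x = rng.standard_normal((batch, d_in))
W = rng.standard_normal((d_in, d_out))

# Split the weight matrix column-wise; each simulated device stores one slice,
# so no single device needs to hold all of W.
W_shards = np.split(W, num_devices, axis=1)

# Every device sees the full input and produces a slice of the output features.
partial_outputs = [x @ W_shard for W_shard in W_shards]

# Stand-in for an all-gather: concatenate the output slices.
y_parallel = np.concatenate(partial_outputs, axis=1)
y_single = x @ W
```

The sharded result matches the single-device matmul, while each shard is only a fraction of the full weight matrix.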

232

30,902

Misha Laskin · Feb 22, 2022 · 11:17 PM UTC

Misha Laskin

@MishaLaskin

22 Feb 2022

If you're not already in RL, here's an informal introduction to the field I wrote at the tail of my postdoc. A high-level motivation for how RL differs from more traditional ML problems and why it's important. anyscale.com/blog/an-informa…

An informal introduction to reinforcement learning | Anyscale

anyscale.com

153

Misha Laskin · Jan 7, 2022 · 12:22 AM UTC

Misha Laskin

@MishaLaskin

7 Jan 2022

And there you have it - we derived attention intuitively and wrote it out in code. The main idea is quite simple. In next posts I will cover Transformers, GPT & BERT, Vision Transformers, and other useful tricks / details. That was fun to write, hope also fun to read! 12/n END

137

Misha Laskin · Dec 19, 2022 · 3:04 PM UTC

Misha Laskin

@MishaLaskin

19 Dec 2022

This diagram deserves the test of time award. It was confusing when I first got into ML and remains confusing today.

MIT CSAIL

@MIT_CSAIL

18 Dec 2022

All major neural networks, in one chart: bit.ly/2HB7tl9 v/The Asimov Institute

136

22,572

Misha Laskin · Jul 24, 2023 · 7:30 PM UTC

Misha Laskin

@MishaLaskin

24 Jul 2023

Our team is hiring for both RS and RE roles! Research focus is building generalist agents. At ICML this week, ping me if you're interested in chatting.

134

67,934

Misha Laskin · Jan 27, 2021 · 6:08 PM UTC

Misha Laskin

@MishaLaskin

27 Jan 2021

Replying to @AndrewYNg

a few areas, some applied / some basic research
- climate change
- computational drug discovery
- ethical AI
- generalization to new tasks
- world models / representation learning
- long-horizon problem solving / better hierarchies

134

Misha Laskin · Mar 7, 2025 · 4:31 PM UTC

Misha Laskin

@MishaLaskin

7 Mar 2025

We look for colleagues who have high internal drive, integrity, and a deep interest in the problems we’re pursuing. If that’s exciting to you, join us. Apply here: reflection.ai

Reflection AI

Building frontier open intelligence.

reflection.ai

122

20,928

Misha Laskin · Jun 24, 2020 · 1:05 AM UTC

Misha Laskin

@MishaLaskin

24 Jun 2020

New updates to RAD (RL + data augs) answer the following: 1) Why does random crop work so well? -> translation invariance 2) Does data aug work for state-based RL too? -> yes. SOTA on DeepMind control (pixel-based RL) and OpenAI gym (state-based RL). arxiv.org/abs/2004.14990 1/N

119

Misha Laskin · Mar 7, 2025 · 4:31 PM UTC

Misha Laskin

@MishaLaskin

7 Mar 2025

We believe that solving autonomous coding will enable superintelligence more broadly. Our company is defined by three things: 1) A team behind some of the most capable RL and LLM systems ever created - the two building blocks for superintelligence.

122

13,574

Misha Laskin · Jul 12, 2022 · 11:05 PM UTC

Misha Laskin

@MishaLaskin

12 Jul 2022

You may have read that transformers like GPT are memory intensive and scale poorly with sequence length. Why is that? In this post, we'll derive a formula for a transformer's memory footprint and explain why transformers can be so memory hungry. Let's get started... 1/N

122

Misha Laskin · Dec 8, 2020 · 8:04 PM UTC

Misha Laskin

@MishaLaskin

8 Dec 2020

Excited to share a paper on local updates as an alternative to global backprop, co-led with @Luke_Metz + @graphcoreai @GoogleAI & @berkeley_ai. tl;dr - Local updates can improve the efficiency of training deep nets in the high-compute regime. 👉 arxiv.org/abs/2012.03837 1/N

109

Misha Laskin · Jul 9, 2025 · 8:48 PM UTC

Misha Laskin

@MishaLaskin

9 Jul 2025

The biggest question in RL research has always been - what environment are you training on? It used to be video (Atari) and board (Go / Chess) games. But now that RL works with LLMs, there is only one environment that matters. And it is your product.

114

12,568

Misha Laskin · Mar 7, 2025 · 4:31 PM UTC

Misha Laskin

@MishaLaskin

7 Mar 2025

We're excited to partner with Sequoia, Lightspeed, and CRV and big thank you to @shiringhaffary for covering the story. bloomberg.com/news/articles/…

Ex-DeepMind Researchers’ New Startup Aims for Superintelligence

Reflection AI is building coding agents that can function autonomously.

bloomberg.com

109

16,905

Misha Laskin · Nov 18, 2020 · 5:26 PM UTC

Misha Laskin

@MishaLaskin

18 Nov 2020

New paper coming up at @NeurIPSConf - Sparse Graphical Memory for Robust Planning uses state abstractions to improve long-horizon navigation tasks from pixels! Paper: arxiv.org/abs/2003.06417 Site: mishalaskin.github.io/sgm/ Co-led by @emmons_scott, @ajayj_, and myself. [1/N]

105

Misha Laskin · Feb 2, 2022 · 5:23 PM UTC

Misha Laskin

@MishaLaskin

2 Feb 2022

New paper on unsupervised skill discovery - Contrastive Intrinsic Control. Tl;dr exploration with contrastive skill learning substantially improves prior skill discovery methods (by 1.8x)! Achieves leading unsupervised RL results. arxiv.org/abs/2202.00161 Learn more 👇 1/N

101

Misha Laskin · Mar 7, 2025 · 4:31 PM UTC

Misha Laskin

@MishaLaskin

7 Mar 2025

2) A focus on building the best autonomous coding systems in the world. Rather than doing many things, we do one thing really well. 3) Equal emphasis on research and product. Superintelligence cannot be built in a vacuum.

11,267

Misha Laskin · Nov 1, 2021 · 6:16 PM UTC

Misha Laskin

@MishaLaskin

1 Nov 2021

We're launching a benchmark for unsupervised RL. Like pre-training for CV / NLP, imo unsupervised RL will lead to the next big breakthroughs in RL and bring us closer to generalist AI. Our goal is to get us there faster. LFG!!! Code / scripts: github.com/rll-research/url_… 1/5

GitHub - rll-research/url_benchmark

Contribute to rll-research/url_benchmark development by creating an account on GitHub.

github.com

Denis Yarats

@denisyarats

1 Nov 2021

Currently it is challenging to measure progress in Unsupervised RL w/o having common tasks & protocol. To take a step in addressing this issue we release our #NeurIPS2021 paper: (URLB) Unsupervised RL Benchmark! Paper: bit.ly/3bwHhY8 Code: bit.ly/3bAvI1S 1/N

Misha Laskin · Apr 9, 2020 · 3:01 AM UTC

Misha Laskin

@MishaLaskin

9 Apr 2020

Can pixel-based RL be as data-efficient as state-based RL? We show for the first time that the answer is yes, new work with @Aravind7694 and @pabbeel website 👉 mishalaskin.github.io/curl code 👉 github.com/MishaLaskin/curl

Aravind Srinivas

@AravSrinivas

9 Apr 2020

New paper - CURL: Contrastive Unsupervised Representations for RL! We use the simplest form of contrastive learning (instance-based) as an auxiliary task in model-free RL. SoTA by *significant* margin on DMControl and Atari for data-efficiency. arxiv.org/abs/2004.04136

100

Misha Laskin · Nov 6, 2025 · 4:05 PM UTC

Misha Laskin

@MishaLaskin

6 Nov 2025

It’s an honor to have you on the team! Alex pioneered advances in LLM coding capabilities at Google DeepMind, most recently in Jules and Gemini. Highly recommend reading his blog post. Excited to build frontier open models together.

🇺🇦 Alex Polozov

@Skiminok

6 Nov 2025

🎉 Next week, I am excited to join @reflection_ai as a Member of Technical Staff to help build the open intelligence ecosystem of the Western world. It's the most exciting opportunity to help software builders in our time, and will shape many years of AI Engineering in the medium-term before AGI. Not just about Western vs Eastern open models, but more about what AI-driven software will look like in 2030. I spent some time articulating my thoughts about where we're going as a community and why... which became a whole blog post. Take a look, hope it interests you! (And if it really does, we are hiring in NYC, SF, and London 😉) alexpolozov.com/blog/reflect…

24,550

Misha Laskin · Oct 26, 2022 · 1:42 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2022

A summary of interesting findings:
1. Transformers can do in-context RL
2. In-context RL with AD is more efficient than gradient based source RL algo
3. AD improves suboptimal policies
4. In-context RL emerges from imitation learning with long contexts
11/N

Misha Laskin · Jan 19, 2023 · 1:33 PM UTC

Misha Laskin

@MishaLaskin

19 Jan 2023

In-context RL at scale. After online pre-training, the agent solves new tasks entirely in-context like an LLM and works in a complex domain. One of the most interesting RL results of the year.

Feryal @FeryalMP

19 Jan 2023

I’m super excited to share our work on AdA: An Adaptive Agent capable of hypothesis-driven exploration which solves challenging unseen tasks with just a handful of experience, at a similar timescale to humans. sites.google.com/corp/view/a… See the thread for more details 👇 [1/N]

14,154

Misha Laskin · Jul 19, 2020 · 4:27 PM UTC

Misha Laskin

@MishaLaskin

19 Jul 2020

Can RL From Pixels be as Efficient as RL From State? BAIR blog post detailing recent progress in pixel-based RL describes CURL / RAD & tradeoffs. Was a fun collaboration! bair.berkeley.edu/blog/2020/… w/ @AravSrinivas @kimin_le2 @stookemon @LerrelPinto @pabbeel

Misha Laskin · Jul 29, 2025 · 6:18 PM UTC

Misha Laskin

@MishaLaskin

29 Jul 2025

Excited that Skild is finally showing some of the incredible research they've been up to. The team has produced some of the most exciting advances I've seen in robotics.

Skild AI

@SkildAI

29 Jul 2025

Modern AI is confined to the digital world. At Skild AI, we are building towards AGI for the real world, unconstrained by robot type or task — a single, omni-bodied brain. Today, we are sharing our journey, starting with early milestones, with more to come in the weeks ahead. Our Mission: Artificial General Intelligence grounded in the physical world. We believe AGI that can truly understand and reason in the real world can only be built through grounding in the physical world. Our Vision: Any robot, Any task, One brain. We tackle robotics in its full generality – building a continually improving, omni-bodied brain that can control any hardware for any task. Who are we? A passionate group of scientists & engineers driven by our shared vision. We have been researching AI and robotics for more than a decade. Our team includes pioneers of self-supervised learning, curiosity-driven exploration, end-to-end sim2real for visual locomotion, dexterous manipulation, learning from human videos, robot parkour, and many more. Many of these works have won awards at top-tier AI and Robotics conferences. Our team has also built production-ready systems at Anduril, Tesla, Nvidia, Meta, Kitty Hawk, Google, Everyday Robotics, and Amazon. Join us in our mission to build the robot brains of tomorrow.

11,675

Misha Laskin · Jan 7, 2022 · 12:22 AM UTC

Misha Laskin

@MishaLaskin

7 Jan 2022

A small but important detail is that we need to re-scale the weights by 1 / sqrt(D). Why this specific scaling? Why not 1 / D or 1 / T or some other constant? The reason is that 1 / sqrt(D) ensures that the standard deviation of the outputs is roughly equal to 1. 7/n
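
A quick numerical check of this claim, as a hedged sketch: assume NumPy and queries and keys with independent standard-normal entries (real learned projections will only approximate this). The sizes T and D are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 256, 512
q = rng.standard_normal((T, D))
k = rng.standard_normal((T, D))

raw = q @ k.T                  # each entry sums D products, so std grows like sqrt(D)
scaled = raw / np.sqrt(D)      # std stays close to 1
over_scaled = raw / D          # std shrinks like 1 / sqrt(D)

raw_std = raw.std()
scaled_std = scaled.std()
over_scaled_std = over_scaled.std()
```

Printing the three values shows why 1 / sqrt(D) is the natural choice: dividing by D squashes the scores toward zero, while no scaling lets them grow with the feature dimension.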

Misha Laskin · Dec 6, 2023 · 3:55 PM UTC

Misha Laskin

@MishaLaskin

6 Dec 2023

Excited to finally share what I’ve been working on over the past year. Gemini is a really capable SOTA model with strong reasoning and coding abilities. It’s multimodal - can understand images, videos, audio, and text. It was a really intense and collaborative effort! blog.google/technology/ai/go…

9,650

Misha Laskin · Jan 7, 2022 · 12:22 AM UTC

Misha Laskin

@MishaLaskin

7 Jan 2022

What is attention? Say you want to classify the sentiment of “attention is not too shabby.“ “shabby” suggests 😞 but “not” actually means it's 😀. To correctly classify you need to look at all the words in the sentence. How can we achieve this? 2/n

Misha Laskin · Jun 21, 2021 · 4:57 PM UTC

Misha Laskin

@MishaLaskin

21 Jun 2021

New paper / algo - MABE! We show that combining dynamics models + weighted behavioral priors results in offline RL that is (a) robust across datasets and (b) can transfer behaviors across domains. Paper: arxiv.org/abs/2106.09119 Site: sites.google.com/berkeley.ed… 🧵 1/8

Misha Laskin · May 13, 2020 · 6:38 PM UTC

Misha Laskin

@MishaLaskin

13 May 2020

DeepMind control from pixels seems beaten. PlaNet -> SLAC -> Dreamer -> CURL -> RAD & DrQ. Now Plan2Explore shows zero-shot SOTA performance on DMControl relative to Dreamer. ramanans1.github.io/plan2exp… great work! @_ramanans @_oleh @KostasPenn @pabbeel @danijarh @pathak2206

Planning to Explore via Self-Supervised World Models

Sekar, Rybkin, Daniilidis, Abbeel, Hafner, Pathak. Planning to Explore via Self-Supervised World Models. ICML 2020.

ramanans1.github.io

Misha Laskin · Jul 16, 2025 · 3:08 PM UTC

Misha Laskin

@MishaLaskin

16 Jul 2025

Code comprehension is hard. Production codebases are large with a lot of context outside of the code itself. In blind tests, Asimov's answers to complex questions were preferred 60 - 80% of the time. Asimov works because…

10,766

Misha Laskin · Jul 16, 2024 · 6:28 PM UTC

Misha Laskin

@MishaLaskin

16 Jul 2024

We'll soon be able to fully outsource some categories of knowledge work to AI models. But we are not there yet - today’s models are unreliable & require close human supervision. Had fun discussing how we can leverage insights from Gemini and AlphaGo to overcome these challenges.

Stephanie Zhan

@stephzhan

16 Jul 2024

🤖 New @Sequoia Training Data episode! Featuring @MishaLaskin, former research scientist at @DeepMind & CEO of Reflection AI. Full ep: piped.video/pYBOWDJ5HJc?si=SUZb… @sonyatweetybird and I chat w Misha about 1) why we’re still far from the promise of AI agents, 2) what we need to unlock agentic capabilities for LLMs (lessons learned from AlphaGo, AlphaZero, and Gemini)!
- Introduction
- Leaving Russia, discovering science
- Getting into AI with Ioannis Antonoglou
- Reflection AI and agents
- The current state of AI agents
- AlphaGo, AlphaZero and Gemini
- LLMs don’t have a ground truth reward
- The importance of post-training
- Task categories for agents
- Attracting talent
- How far away are capable agents?
- Lightning round

20,513

Misha Laskin · Oct 26, 2022 · 1:41 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2022

We introduce a pre-training method called Algorithm Distillation (AD) that produces transformers that can reinforcement learn in-context. AD has two steps. First, we train many copies of an RL algorithm to solve many different tasks and save the learning histories. 4/N

Misha Laskin · Jul 22, 2021 · 4:13 PM UTC

Misha Laskin

@MishaLaskin

22 Jul 2021

Humans reuse skills effortlessly to learn new tasks - can robots do the same? In our new paper, we show how to pre-train robotic skills and adapt them to new tasks in a kitchen. tl;dr you’ll have a robot chef soon. 🧑‍🍳🤖 links / details below thread 🧵 1/10

Misha Laskin · Jul 22, 2024 · 9:42 PM UTC

Misha Laskin

@MishaLaskin

22 Jul 2024

Would not have predicted this a year ago - but Meta has become the single most important company in AI. A closed model is good for one business. An open model is good for the entire market.

7,278

Misha Laskin · Jan 7, 2022 · 12:22 AM UTC

Misha Laskin

@MishaLaskin

7 Jan 2022

So multi-head is just a small tweak to single-head attention. In practice, we also add dropout layers to further prevent overfitting and a final linear projection layer. This is what a complete vectorized multi-head self-attention block looks like in PyTorch. 11/n
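
The PyTorch block referred to above is not reproduced in this text. As a stand-in, here is a minimal NumPy sketch of vectorized multi-head self-attention with a final linear projection. Dropout is omitted, and the function name, argument names, and weight shapes are assumptions made for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, W_q, W_k, W_v, W_o, num_heads):
    """x: (B, T, D); W_q, W_k, W_v, W_o: (D, D). Returns (B, T, D)."""
    B, T, D = x.shape
    assert D % num_heads == 0, "D must be divisible by num_heads"
    d_head = D // num_heads

    def split_heads(t):
        # (B, T, D) -> (B, num_heads, T, d_head)
        return t.reshape(B, T, num_heads, d_head).transpose(0, 2, 1, 3)

    Q, K, V = split_heads(x @ W_q), split_heads(x @ W_k), split_heads(x @ W_v)
    # Scaled dot-product scores per head: (B, num_heads, T, T).
    scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                   # (B, num_heads, T, d_head)
    merged = heads.transpose(0, 2, 1, 3).reshape(B, T, D)  # concatenate heads
    return merged @ W_o                                   # final linear projection

rng = np.random.default_rng(0)
B, T, D = 2, 5, 8
x = rng.standard_normal((B, T, D))
W_q, W_k, W_v, W_o = (rng.standard_normal((D, D)) for _ in range(4))
out = multi_head_self_attention(x, W_q, W_k, W_v, W_o, num_heads=2)
```

The only change from single-head attention is the reshape that splits the feature dimension into heads before the scores are computed and merges it back afterwards.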

Misha Laskin · Jul 16, 2025 · 3:08 PM UTC

Misha Laskin

@MishaLaskin

16 Jul 2025

3] Asimov is designed to ingest a lot of context

Today agent designs fall into two categories: RAG or agentic search. Both struggle with large codebases. Asimov uses a new multi-agent design (a big reasoner with small retrievers) to ingest large codebases.

4,778

Misha Laskin · Dec 11, 2022 · 8:25 PM UTC

Misha Laskin

@MishaLaskin

11 Dec 2022

The fact that language is such a powerful form of tokenization makes me wonder - what would it take for AI trained on raw sensory inputs (pixels, audio, touch sensing) to develop its own language?

Misha Laskin · Jul 16, 2025 · 3:08 PM UTC

Misha Laskin

@MishaLaskin

16 Jul 2025

Asimov was built for engineering teams with large complex codebases. We are selecting partners for early access today. Sign up for the waitlist here: reflection.ai/asimov/

5,507

Misha Laskin · Oct 26, 2022 · 1:41 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2022

Once a dataset of learning histories has been collected, we train a transformer to predict actions given the preceding learning history. Since the policy improves over the history, predicting actions accurately forces the transformer to model policy improvement. 5/N

Misha Laskin · Jul 16, 2025 · 3:08 PM UTC

Misha Laskin

@MishaLaskin

16 Jul 2025

Asimov is our first step toward superintelligence. We believe comprehension is at the root of this problem. Build with us. Product: shape how teams use Asimov. Research: develop powerful agentic models. jobs.ashbyhq.com/reflectiona…

Reflection Jobs

jobs.ashbyhq.com

6,787

Misha Laskin · Jan 7, 2022 · 12:22 AM UTC

Misha Laskin

@MishaLaskin

7 Jan 2022

Technically what we’ve shown is called single-head self-attention. Before going to multi-head attention, let’s code up what we’ve done so far. 9/n

Misha Laskin · Oct 9, 2025 · 3:20 PM UTC

Misha Laskin

@MishaLaskin

9 Oct 2025

Replying to @reflection_ai

Excited to shape the open weight frontier together

3,338

Misha Laskin · Jan 7, 2022 · 12:22 AM UTC

Misha Laskin

@MishaLaskin

7 Jan 2022

The simplest thing we can do is input all words into the network. Is that enough? No. The net needs to not only see each word but understand its relation to other words. E.g. it’s crucial that “not” refers to “shabby”. This is where queries, keys, values (Q,K,V) come in. 3/n

Misha Laskin · Jan 11, 2023 · 2:47 AM UTC

Misha Laskin

@MishaLaskin

11 Jan 2023

First RL algo to solve the diamond challenge in Minecraft without demonstrations. Congrats @danijarh!

@_akhaliq

11 Jan 2023

Mastering Diverse Domains through World Models abs: arxiv.org/abs/2301.04104 project page: danijar.com/project/dreamerv…

8,757

Misha Laskin · Oct 26, 2022 · 1:41 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2022

We've seen a lot of successful models showing how transformers can learn in-context. But transformers have not been shown to *reinforcement* learn in-context. To adapt to new tasks, you either need to manually specify a prompt or finetune the model (e.g. preferences). 2/N

Misha Laskin · Jan 7, 2022 · 12:22 AM UTC

Misha Laskin

@MishaLaskin

7 Jan 2022

We want the orange matrix to weigh relationships based on how useful word_i is as context for word_j. So let’s create two more linear nets called “queries” and “keys”. The weight w_ij should be proportional to the inner product between the i-th Q and the j-th K. 6/n
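
A minimal sketch of this step, assuming NumPy and random matrices as stand-ins for the word embeddings and the two linear nets. The names `W_q`, `W_k`, and `scores` are hypothetical. Entry (i, j) of the result is the inner product between the i-th query and the j-th key.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 5, 8                       # 5 words, 8 features each (arbitrary sizes)
x = rng.standard_normal((T, D))   # stand-in word embeddings

W_q = rng.standard_normal((D, D))  # "queries" linear net (bias omitted)
W_k = rng.standard_normal((D, D))  # "keys" linear net (bias omitted)

Q = x @ W_q
K = x @ W_k

# scores[i, j] is the inner product between the i-th Q and the j-th K.
scores = Q @ K.T
```

One matrix product computes all T x T pairwise inner products at once, with no explicit loop over word pairs.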

Misha Laskin · Oct 26, 2022 · 1:41 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2022

Would be great if transformers could adapt (do RL) out-of-the-box. Don't Decision Transformers (DTs) / Gato do RL? No! DTs and Gato learn policies from offline data, but these policies cannot improve themselves autonomously through trial and error. 3/N

Misha Laskin · Jul 16, 2025 · 3:08 PM UTC

Misha Laskin

@MishaLaskin

16 Jul 2025

1] Asimov builds a single source of truth for eng knowledge

Asimov looks at more than just code. It pulls knowledge from your codebase, your team’s messages, your project management tools, and more. Watch it trace a bug from a chat thread to the exact PR that introduced it:

6,760

Misha Laskin · Jan 21, 2023 · 5:17 PM UTC

Misha Laskin

@MishaLaskin

21 Jan 2023

Got 99 problems and my NVIDIA driver is one

9,952

Misha Laskin · Oct 26, 2022 · 1:41 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2022

The transformer explores, exploits, and maximizes return in-context - its weights are frozen! Expert Distillation (most similar to Gato), on the other hand, cannot explore and fails to maximize return. 7/N

Misha Laskin · Oct 26, 2022 · 1:41 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2022

That's it. The transformer is trained just by imitating actions (no Q values like usual RL) over long obs-action-reward sequences (no return conditioning like DTs). In-context RL emerges for free. We evaluate AD by seeing if it can maximize return on new tasks. 6/N

Misha Laskin · Jan 7, 2022 · 12:22 AM UTC

Misha Laskin

@MishaLaskin

7 Jan 2022

The issue is that naive summing of values assumes the relationships between all words are equal. E.g. relationship between “is” and “too” is equal to that between “not” and “shabby”. But clearly “not” <> “shabby” is more important for sentiment analysis than “is” <> “too”. 5/n

Misha Laskin · May 3, 2020 · 5:39 PM UTC

Misha Laskin

@MishaLaskin

3 May 2020

Thanks to @kharijohnson for the thoughtful coverage on Reinforcement Learning with Augmented Data. Read more about it on @VentureBeat: bit.ly/2VYinK1 w/ @kimin_le2 @stookemon @LerrelPinto @pabbeel @Aravind7694

Misha Laskin · Oct 26, 2021 · 6:35 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2021

Would argue this also applies to AI research. It's important to iterate on ideas quickly (e.g. by implementing them in code and launching experiments). Most ideas will be bad. But you learn from them and give yourself enough opportunities to spot a winner.

David Perell

@david_perell

26 Oct 2021

Don't compare your first drafts to other people's final drafts. Here's my mini-essay.

Misha Laskin · Oct 26, 2022 · 1:42 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2022

AD can distill any RL algo - we tried UCB, DQN, A2C. An interesting finding is that the in-context RL algorithm learned with AD is much more data-efficient than the source algorithm it was trained to distill. 8/N

Misha Laskin · Jan 7, 2022 · 12:22 AM UTC

Misha Laskin

@MishaLaskin

7 Jan 2022

Finally, we need to normalize the weights along the axis that will be summed, so we use a softmax. Intuitively, Q is a question “how useful am I for word K?” High / low inner product means very / not very useful. With that we are done - this is attention! 8/n
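A minimal NumPy sketch of the single-head attention this tweet describes: inner products between queries and keys, a softmax along the axis that gets summed, then a weighted sum of values. The sizes, variable names, and random weights are assumptions for illustration. The division by sqrt(d) is the standard scaling from the original Transformer formulation and is not mentioned in this tweet.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, W_q, W_k, W_v):
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    # High inner product between a query and a key means "very useful".
    scores = q @ k.T / np.sqrt(k.shape[-1])   # standard scaling (assumption here)
    # Normalize over the key axis, which is the axis summed in `weights @ v`.
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
n_words, d_model = 4, 8                        # hypothetical sizes
x = rng.normal(size=(n_words, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = attention(x, W_q, W_k, W_v)
```

Each row of `weights` sums to 1, so every output word is a convex combination of the values.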

Misha Laskin · May 26, 2022 · 9:11 PM UTC

Misha Laskin

@MishaLaskin

26 May 2022

Replying to @ethanCaballero @RichardSSutton @AmiiThinks

Misha Laskin · Nov 6, 2023 · 6:03 PM UTC

Misha Laskin

@MishaLaskin

6 Nov 2023

Blown away by the conclusions serious researchers / engs are drawing from this paper. They trained a small model on sinusoids and we’re somehow making claims about LLMs not generalizing at scale. Limited data diversity at a small model scale. What do you expect?

8,067

Misha Laskin · Jan 7, 2022 · 12:22 AM UTC

Misha Laskin

@MishaLaskin

7 Jan 2022

Let’s pass the words through a linear layer and call its outputs “values”. How do we encode relationships between values? We can mix them by summation. Now we “see” both words and relationships, but that’s still not quite right. What’s wrong with this code? 4/n
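The code this tweet points to was an image and is not reproduced here. As a stand-in, here is a minimal NumPy sketch of the idea, with assumed sizes and names: words go through a linear layer to produce values, and the values are mixed by plain summation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 words, embedding dimension 8.
n_words, d_model = 4, 8
x = rng.normal(size=(n_words, d_model))      # word embeddings
W_v = rng.normal(size=(d_model, d_model))    # linear layer producing "values"

values = x @ W_v                             # shape (n_words, d_model)

# Naive mixing: every output position is the plain sum of all values.
mixed = np.tile(values.sum(axis=0, keepdims=True), (n_words, 1))

# Equivalent view: a mixing matrix of all ones, so every word pair
# is weighted identically.
weights = np.ones((n_words, n_words))
assert np.allclose(weights @ values, mixed)
```

Seen this way, the summation is a weighted sum whose weights are all fixed to 1, so no pair of words can count more than any other.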

Misha Laskin · Jul 16, 2025 · 3:08 PM UTC

Misha Laskin

@MishaLaskin

16 Jul 2025

2] Asimov captures team-wide tribal knowledge with memories. Asimov learns from expert feedback and captures tribal knowledge stored in engineers' minds, e.g. "asimov, remember X works in Y way". Once an update is made, it benefits the entire team.

5,539

Misha Laskin · Jan 4, 2022 · 10:55 PM UTC

Misha Laskin

@MishaLaskin

4 Jan 2022

If we know the output shape of the patch tensor, we can then specify the strides appropriately to get the desired patches. In numpy, the stride_tricks module provides this functionality. For example, here is how you implement non-overlapping patch extraction (e.g. for ViT). 7/n
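The code attached to this tweet was an image and is not reproduced here. A minimal sketch of non-overlapping patch extraction with `numpy.lib.stride_tricks.as_strided`, assuming a single-channel 2D image whose height and width are divisible by the patch size. The function name `extract_patches` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def extract_patches(img, p):
    """Split an (H, W) array into non-overlapping (p, p) patches, no loops."""
    H, W = img.shape
    assert H % p == 0 and W % p == 0, "image must tile evenly into patches"
    img = np.ascontiguousarray(img)
    s_row, s_col = img.strides               # bytes to step one row / one column
    out_shape = (H // p, W // p, p, p)       # (patch row, patch col, y, x)
    # Moving to the next patch skips p rows (or p columns) of the image;
    # moving inside a patch uses the image's own strides.
    out_strides = (p * s_row, p * s_col, s_row, s_col)
    # writeable=False guards against accidental writes through the view.
    return as_strided(img, shape=out_shape, strides=out_strides, writeable=False)

img = np.arange(16).reshape(4, 4)
patches = extract_patches(img, 2)            # shape (2, 2, 2, 2)
```

The result is a view on the original buffer, so no pixel data is copied until the patches are reshaped or modified.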

Misha Laskin · Jan 4, 2022 · 10:55 PM UTC

Misha Laskin

@MishaLaskin

4 Jan 2022

Now you know how to implement vectorized patch extraction. We covered non-overlapping patches but the same logic can be used to deduce the strides for overlapping ones (e.g. for CNNs, mean / max pooling, data aug). Will be posting more of these. Hope you enjoyed it. 12/n END
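As an illustration of the overlapping case, here is a hedged sketch that reuses the stride logic with a configurable step between patches. It assumes a single-channel 2D image, and the function name `extract_overlapping_patches` and the `step` parameter are hypothetical. With `step` equal to the patch size it reduces to the non-overlapping case.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def extract_overlapping_patches(img, p, step=1):
    """Return all (p, p) windows of an (H, W) array taken every `step` pixels."""
    H, W = img.shape
    img = np.ascontiguousarray(img)
    s_row, s_col = img.strides
    n_rows = (H - p) // step + 1             # number of window positions per axis
    n_cols = (W - p) // step + 1
    # Only the outer strides change: the next window starts `step` pixels later.
    return as_strided(
        img,
        shape=(n_rows, n_cols, p, p),
        strides=(step * s_row, step * s_col, s_row, s_col),
        writeable=False,
    )

img = np.arange(16).reshape(4, 4)
windows = extract_overlapping_patches(img, 2, step=1)      # shape (3, 3, 2, 2)

# Max pooling is a reduction over the two within-patch axes.
pooled = extract_overlapping_patches(img, 2, step=2).max(axis=(2, 3))
```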

Misha Laskin · Oct 26, 2022 · 1:42 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2022

This was a fun project with contributions from many collaborators. Luyu Wang, @junh_oh, Emilio Parisotto, Stephen Spencer, @RichiesOkTweets, @djstrouse @Zergylord, @filangelos, Ethan Brooks, @Maxime_Gazeau, @him_sahni, Satinder Singh, @VladMnih 12/N

Misha Laskin · Feb 23, 2022 · 5:48 PM UTC

Misha Laskin

@MishaLaskin

23 Feb 2022

Recently released Contrastive Intrinsic Control (CIC), an unsupervised RL algo that pre-trains agents with contrastive skill learning (and no extrinsic rewards!) & achieves leading adaptation efficiency. Here's an intuitive explanation of how it works: bair.berkeley.edu/blog/2022/…

Unsupervised Skill Discovery with Contrastive Intrinsic Control

The BAIR Blog

bair.berkeley.edu

Misha Laskin · Sep 17, 2025 · 3:28 PM UTC

Misha Laskin

@MishaLaskin

17 Sep 2025

This is like saying Cursor is just Claude. Intelligence is not just the model but the entire system around it.

Elon Musk

@elonmusk

17 Sep 2025

That’s just Grok 4

6,864

Misha Laskin · Aug 2, 2023 · 7:54 PM UTC

Misha Laskin

@MishaLaskin

2 Aug 2023

If you're interested in working with the General Agents team at Google DeepMind, please apply asap. Applications close tomorrow 4pm EDT. Research Scientist: boards.greenhouse.io/deepmin… Research Engineer: boards.greenhouse.io/deepmin…

Misha Laskin

@MishaLaskin

24 Jul 2023

Our team is hiring for both RS and RE roles! Research focus is building generalist agents. At ICML this week, ping me if you're interested in chatting.

21,184

Misha Laskin · Oct 26, 2022 · 1:42 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2022

The emergence of in-context RL only happens if the context length of the transformer is long enough, spanning multiple episodes. AD needs a long enough history to effectively model improvement and identify the task. 10/N

Misha Laskin · Apr 28, 2023 · 8:33 PM UTC

Misha Laskin

@MishaLaskin

28 Apr 2023

What’s the state of neural retrieval? KNN on cosine similarity or L2 is a poor heuristic. Training an SVM is better but has added computational load. Is there a fast retrieval operation that works better than naive KNN?
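For reference, the naive baseline the question refers to can be sketched in a few lines. This is a brute-force k-nearest-neighbour lookup on cosine similarity, with a hypothetical function name and toy data, and it is only meant to make the baseline concrete.

```python
import numpy as np

def cosine_knn(query, corpus, k):
    """Return indices and similarities of the k corpus rows closest to query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q                               # cosine similarity to every row
    top = np.argsort(-sims)[:k]                # highest similarity first
    return top, sims[top]

# Toy 2D embeddings, for illustration only.
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query = np.array([2.0, 0.1])
idx, sims = cosine_knn(query, corpus, k=2)
```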

13,958

Misha Laskin · Jan 7, 2022 · 12:22 AM UTC

Misha Laskin

@MishaLaskin

7 Jan 2022

What is multi-head and why do we need it? Our single-head net may overfit to the training data. In ML, ensembles are a common strategy to combat overfitting. By initializing multiple nets we get more robust results. The concat of N single heads is multi-head attention. 10/n
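A minimal sketch of "the concat of N single heads", assuming each head has its own independently initialized query, key, and value matrices. Names and sizes are hypothetical. Standard transformer implementations also apply a final output projection after the concat, which is omitted here because the tweet does not mention it.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def single_head(x, W_q, W_k, W_v):
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    return weights @ v

def multi_head(x, heads):
    # Run each independently initialized head, then concatenate along features.
    return np.concatenate([single_head(x, *h) for h in heads], axis=-1)

rng = np.random.default_rng(0)
n_words, d_model, n_heads = 4, 8, 2            # hypothetical sizes
d_head = d_model // n_heads
x = rng.normal(size=(n_words, d_model))
heads = [
    tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
    for _ in range(n_heads)
]
out = multi_head(x, heads)                     # shape (n_words, d_model)
```

Splitting `d_model` across heads keeps the output width the same as a single full-width head.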

Misha Laskin · Jul 23, 2024 · 6:57 PM UTC

Misha Laskin

@MishaLaskin

23 Jul 2024

zuck is the robin hood of ai

4,303

Misha Laskin · Nov 23, 2022 · 3:19 PM UTC

Misha Laskin

@MishaLaskin

23 Nov 2022

Will be at NeurIPS next week. Who wants to meet up to chat?

Misha Laskin · Oct 26, 2022 · 1:42 PM UTC

Misha Laskin

@MishaLaskin

26 Oct 2022

Another neat property is that you can prompt AD with suboptimal demonstrations and it will automatically improve them until optimal! ED, on the other hand, only maintains the performance of a suboptimal demonstration. 9/N

Misha Laskin · Apr 26, 2023 · 3:44 PM UTC

Misha Laskin

@MishaLaskin

26 Apr 2023

Replying to @RokoMijic

This is misleading. Loss functions only look like this for simple problems with narrow datasets. For large scale training grokking happens frequently enough across diverse enough prediction tasks that the average is relatively smooth and the likelihood of a big drop is very low. Unless there’s a spike. The main point being that a large drop in large scale runs that does not come after a spike would mean the model suddenly learned to generalize simultaneously across tons of tasks, which would indeed be concerning. But we have not seen this and are unlikely to since empirically generalization happens for different tasks at different time scales, not all at once. Based on evidence available to us, Eliezer’s prediction is unlikely and is counterproductive since it makes us fear the wrong things, and this probably comes from a poor understanding of the underlying science.

2,331

Misha Laskin · Apr 10, 2023 · 8:21 PM UTC

Misha Laskin

@MishaLaskin

10 Apr 2023

>> Paper review - Machiavelli benchmark >>

Lots of discussion about AI safety lately. Whatever side you take on the X-risk debate, it is important to develop metrics that measure the safety properties of AI agents. This is the aim of the Machiavelli benchmark...

>> Context >>

1) What AI safety means today

Right now, our most general AI systems (LLMs) mostly operate within a tight feedback loop with the user. In this context, safety has been concerned with whether the LLM says harmful things or not. We've been able to align models to provide helpful information while being harmless through RLHF. To date, LLMs have existed in sandbox environments, which meant the stakes were low. This is going to change.

2) What AI safety will mean in the (near) future

Many researchers and hackers are now converting LLMs into agents. Once AI systems have agency, the safety risks are much higher - an AI agent can interact with the world and potentially cause irreversible damage.

>> The Machiavelli benchmark >>

The Machiavelli benchmark by @hendrycks and co proposes to rate how well LLM agents achieve goals in a text adventure game while measuring safety outcomes. The benchmark quantifies the behavior of AI agents along 4 axes:

1) Rewards - how well does the agent maximize the game objective?
2) Morality - does the agent violate ethical norms? Is it deceptive?
3) Utility - does the agent act selfishly? Does it advance its position at the cost of others?
4) Power seeking - does the agent take actions that increase its influence on the state of the world?

>> Why it's important >>

As we transition from AI agents that exist in sandboxes to ones that interact with the real world, we need ways to quantify whether these agents achieve goals in ways that we consider safe. For example, a pure RL agent trained to maximize reward in the Machiavelli text game exhibits power-seeking and unethical behavior. GPT4 is safer but achieves lower rewards.

In practice, we want agents that are both high-performing and safe. It is an open question whether safe agents will be as capable as unconstrained ones. If not, there will likely be adversarial actors who train unconstrained systems that are more powerful at specific kinds of goal achievement. We will probably want to regulate AI systems to ensure they are aligned with societal values - this paper proposes a research-level blueprint of what that may look like.

8,863

Misha Laskin · Oct 9, 2025 · 3:44 PM UTC

Misha Laskin

@MishaLaskin

9 Oct 2025

Replying to @DavidSacks

Thank you David. We are bringing the open weight frontier back to the US

1,956

Misha Laskin · Dec 5, 2022 · 12:00 AM UTC

Misha Laskin

@MishaLaskin

5 Dec 2022

Replying to @Bam4d

ChatGPT was trained with RL…

Misha Laskin · Jan 8, 2022 · 4:40 AM UTC

Misha Laskin

@MishaLaskin

8 Jan 2022

github.com/karpathy/minGPT

GitHub - karpathy/minGPT: A minimal PyTorch re-implementation of the OpenAI GPT (Generative...

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training - karpathy/minGPT

github.com

Misha Laskin · Dec 1, 2022 · 11:14 PM UTC

Misha Laskin

@MishaLaskin

1 Dec 2022

ChatGPT is amazing, but I found that it easily hallucinates if prompted with something that sounds plausible. Here ChatGPT confidently describes VEPO, an RL algorithm that doesn't exist.

Misha Laskin · Oct 6, 2021 · 5:13 PM UTC

Misha Laskin

@MishaLaskin

6 Oct 2021

morning after ICLR deadline

Misha Laskin · May 11, 2021 · 5:21 PM UTC

Misha Laskin

@MishaLaskin

11 May 2021

Replying to @TaliaRinger

A less cynical take. Context: did phd in physics, now ML postdoc. (i) ML experiment cycles are *very* fast vs other sciences (ii) other sciences require years of background education. ML does not. A gifted high school student could contribute. These are positive things.

Misha Laskin · Jan 20, 2023 · 7:45 PM UTC

Misha Laskin

@MishaLaskin

20 Jan 2023

A new text-to-video generation startup launched by a pioneer of diffusion models. Excited for this direction and the future of video! "Make a dramatic thriller about a Corgi astronaut escaping a black hole, trending on HBO, narrated by Werner Herzog." I'd watch.

Genmo

@genmoai

20 Jan 2023

Announcing Genmo Video, a generative media platform with a new text-to-video model that can generate immersive live artwork from any prompt or any image. What will you create? 🎨▶️ Free public access: genmo.ai Discord: discord.com/invite/u7SRpXHhp… 👇1/n

5,316

Misha Laskin · Mar 2, 2023 · 7:12 PM UTC

Misha Laskin

@MishaLaskin

2 Mar 2023

Wenlong & co continue to produce bangers at the intersection of LLMs and robotics. Very cool work

Wenlong Huang

@wenlong_huang

2 Mar 2023

Large language models gathered tons of world knowledge by speaking human language. But can they ever speak “robot language”? Introducing “Grounded Decoding”: a scalable way to decode *grounded text* from LLM for robots. Website: grounded-decoding.github.io 🧵👇

6,779

Misha Laskin · Oct 4, 2023 · 6:51 PM UTC

Misha Laskin

@MishaLaskin

4 Oct 2023

Very impressive new work on long context transformers. This particular bit is valuable - going from 4K to 32k context length with a 13B model on just 8 GPUs!

Hao Liu @haoliuhl

4 Oct 2023

Replying to @haoliuhl

RingAttention lets you scale context length linearly with device count, breaking free from memory constraints. If you could train 4K length on 8 GPUs, then with RingAttention you can train at least 32K length with nearly zero overhead

12,762