For the last few months I've been working on a from-scratch implementation of AlphaGo, a 2016 AI breakthrough that inspired me to get into deep learning. My casual understanding of AlphaGo was "search-augmented deep neural networks trained with self-play", but I wanted to go deeper and understand it by creating it. Frontier deep learning research has always been expensive, but any given capability gets cheaper very quickly. In 2026, you no longer need DeepMind's resources to train a strong Go AI - you can vibe code all of it yourself for just a few thousand dollars of rented compute. It was a huge honor to be invited to teach this with @dwarkesh_sp on @dwarkeshpodcast. I am an AlphaGo & Go apprentice, not a master, so all factual errors in the podcast are mine. Web version of tutorial: evjang.com/2026/04/28/autogo… Code: github.com/ericjang/autogo Play the Go bot here: autogo.evjang.com/
New blackboard lecture w @ericjang11. He walks through how to build AlphaGo from scratch, but with modern AI tools.
Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.
Once he explained how AlphaGo works, it gave us the context to have a discussion about how RL works in LLMs and how it could work better – naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer, while AlphaGo’s MCTS suggests a strictly better action every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.
Eric also kickstarted an Autoresearch loop on his project. And it was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). Informative to all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.
Timestamps:
0:00:00 – Basics of Go
0:08:06 – Monte Carlo Tree Search
0:31:53 – What the neural network does
1:00:22 – Self-play
1:25:27 – Alternative RL approaches
1:45:36 – Why doesn’t MCTS work for LLMs
2:00:58 – Off-policy training
2:11:51 – RL is even more information inefficient than you thought
2:22:05 – Automated AI researchers
50
184
2,450
543,717
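The contrast in the post above (one reward for a whole trajectory versus a fresh target at every move) can be made concrete with a small sketch. It assumes the AlphaGo Zero-style recipe in which the search's per-move visit counts are normalized into a policy target; the post itself does not spell this out, and the visit counts, prior, and function names below are hypothetical.

```python
import math

def mcts_policy_target(visit_counts, temperature=1.0):
    """Normalize per-move MCTS visit counts into a probability distribution."""
    scaled = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)
    return [s / total for s in scaled]

def cross_entropy(target, predicted):
    """Loss that pulls the network's move probabilities toward the search target."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

# Hypothetical numbers for a single board position with three legal moves.
visits = [10, 70, 20]        # how often search explored each move
prior = [0.5, 0.3, 0.2]      # what the raw network predicted before search
target = mcts_policy_target(visits)
loss = cross_entropy(target, prior)
```

Every position in a game gets its own `target`, so the network receives a dense supervised signal per move. A naive policy gradient would instead scale the log-probability of every action in the trajectory by the same final reward.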
Looking forward to getting vaccinated every 4 nanoseconds
Execute this code to debug COVID.
41
576
6,562
The opening sentence goes so hard. This paper was 10 years ahead of its time.
34
364
5,030
314,849
pain
59
140
4,503
I look forward to a day where an artificial neural network can look at this sign and tell me if I can park here
68
183
2,300
Why is the market selling off Nvidia / compute stocks? The correct reaction to R1 breakthrough should be to buy *even more* compute.
276
74
1,857
239,252
Progress on NEO’s AI has been really fast of late. Here are some early clips of a generalist model we’re developing at @1x_tech. The following clips are 100% autonomous, running on a single set of neural network weights. First, a quiet little robot that picks up leaves and puts them in a bag.
96
236
1,790
298,330
Every once in a while a paper comes out that makes you breathe a sigh of relief that you don't publish in that field... arxiv.org/pdf/2003.08505.pdf "Our results show that when hyperparameters are properly tuned via cross-validation, most methods perform similarly to one another"
31
415
1,610
We're inviting the first set of users to pre-order and experience NEO. This is a product that is early for its time. Some features are still in active development & polish. There will be mistakes. We will quickly learn from them, and use your early feedback to improve NEO for broad adoption in every home. Here's what we'll do:
1. Ship a safe, general-purpose home robot that fulfills simple chores around the home. In the Chores feature, NEO will be autonomous + call for human assistance when NEO can't do a task, like how Waymo is operated & supervised.
2. In Fully Autonomous mode, you have an embodied AI assistant that you can talk to and interact with. You can also ask it to attempt best effort autonomy where NEO chains simple autonomy primitives (walk -> pick -> place -> walk) and executes them with the onboard Redwood model.
3. The more users use NEO (with data sharing enabled), the more we spin the data flywheel. This not only improves capability, but also safety and EQ.
4. The more autonomous NEO gets, the more we can expand our coverage of the service. Once NEOs are quite capable at a broad set of tasks around many different homes, we think they will be useful in other domains: hospitality, logistics, helping around farms, picking up litter off the streets.
In the coming weeks we'll be sharing an AI + Autonomy update that represents a significant step towards this goal. Stay tuned!
NEO The Home Robot Order Today
178
99
1,659
374,558
ChatGPT is cool but I'm most impressed by how OpenAI is able to simultaneously serve a multi-billion parameter model to all the new users trying it out right now (including those coming from front page of HN). Kudos to whoever worked on the inference infra there
21
55
1,427
Instead of finding the perfect prompt for an LLM (let's think step by step), you can ask LLMs to critique their outputs and immediately fix their own mistakes. Here's a fun example:
59
204
1,441
1,068,599
This talk by @karpathy piped.video/IHH47nZ7FZU has convinced me that Tesla is several years ahead of most CV labs in regards to pushing the limits of DL. Commonplace questions like "how do you do early stopping for a multi-task model?" are non-trivial when at scale.
18
250
1,372
Replying to @will__ye
congrats! looking forward to seeing what you'll do next monday
1
1,281
148,149
We've been dogfooding NEO Gamma in 1X employee homes for weeks now, doing chores around the house. Under the suit, NEO Gamma has a lot of HW improvements that make it more reliable. The 1X AI team also pushed *hard* to get natural human-like walking, sitting, and bending down to pick things off the ground. My conviction that the humanoid form factor is the *only* viable shape for serving labor in a home has never been higher.
Introducing NEO Gamma. Another step closer to home.
151
117
1,259
175,364
i would like to know where masterclass finds the rizz to pull such renowned educators to teach their online courses
30
47
1,240
283,118
So excited to finally share NEO publicly. In hard tech, the simplest things (ultrapure water, ultraflat mirrors) are ultra hard. We’ve made an ultra quiet robot that is ultra safe around humans. We’ll be sharing more progress on the AI side of things very soon 😎
Introducing NEO Beta. Designed for humans. Built for the home.
75
93
1,246
234,520
Whoever runs recruiting at @AnthropicAI is playing absolutely amazing 4d chess lately. Bravo. No notes.
16
25
1,217
148,361
My book, "AI is Good for You", is finally out! I've been working on this (slowly) for the last 3 years. It covers the last decade of progress in AI and 6 ingredients I think are important to build towards increasingly general AI systems. evjang.com/book/
76
130
1,006
218,159
Give me six months to work on a deep learning research project and I will spend the first four augmenting the data.
19
71
918
Bitch ass Optimus prime
49
15
922
237,635
I deeply regret my participation in the speculation of Q*
28
18
791
314,657
animals are beautiful machines
Fascinating footage of a kestrel hovering with its head held perfectly still as it hunts for prey.
14
65
796
I worry that investment into LLM/generative AI companies is way too exuberant right now, precisely for this reason. I estimate there are <200 people on the planet right now who know how to productively train 100B+ parameter models with startup resources (I am not one of them).
building a company on large ML models is humorously closer to biotech than traditional software
- niche, shallow talent market
- long iteration times
- material capital requirements
- likely need an incumbent partner for distribution
44
60
774
413,161
It's out! Supervised learning, empirically speaking, seems to be the best "data sponge" for acquiring generalization. What if we make generalization the first-class citizen in algorithmic design, and tailor everything else in service of it? evjang.com/2021/10/23/genera…
Been working on a blog post for the last couple weeks outlining a principle for building general AI systems with deep learning. I will probably bet on it for the rest of my career. Excited to share it early next week! 🧑‍🔬
19
136
790
Deep RL in a nutshell. I said what I said.
the dumbest way to solve a maze? simulate a gas of thousands of particles diffusing from the start point, until one particle reaches the exit. trace back the winning particle
11
49
703
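A minimal sketch of the particle-diffusion maze solver described in the quoted post, assuming a small hand-written grid maze and independent random walkers that each remember their own path. The maze layout, particle count, and function names are made up for illustration.

```python
import random

# Hypothetical maze: S = start, E = exit, # = wall, . = open cell.
MAZE = [
    "S.#..",
    ".##.#",
    "....#",
    ".##..",
    "...#E",
]

def find(ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(MAZE):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not found in maze")

def open_neighbors(cell):
    """List the non-wall cells directly adjacent to cell."""
    r, c = cell
    out = []
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0]) and MAZE[nr][nc] != "#":
            out.append((nr, nc))
    return out

def diffuse(num_particles=200, max_steps=10_000, seed=0):
    """Random-walk every particle until one reaches the exit; return its path."""
    rng = random.Random(seed)
    start, goal = find("S"), find("E")
    paths = [[start] for _ in range(num_particles)]
    for _ in range(max_steps):
        for path in paths:
            path.append(rng.choice(open_neighbors(path[-1])))
            if path[-1] == goal:
                return path  # "trace back the winning particle"
    return None  # no particle arrived within the step budget

winning_path = diffuse()
```

The returned path is valid but usually far from shortest, since the winning particle wanders and revisits cells along the way.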
robotics ML practitioner tip: when adding an extra sensor input to your model (e.g. tactile, more history, past image frames), train two baselines along with it (A) random noise (B) zeros instead of your new sensor's values but with the same architecture. If the random/zero baseline makes your model worse, or slows down convergence, it suggests that your sensor fusion architecture / init is suboptimal and may cancel out the benefit or outweigh the effect of the new sensor's information content
13
45
721
61,011
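The two baselines in the tip above can be sketched as input variants fed to the same architecture. This is only an illustration: the feature values, the `tactile` sensor, and the function names are hypothetical, and the training loop itself is left out.

```python
import random

def sensor_variant(new_sensor, mode, rng):
    """Return the new sensor's slot filled with real values, zeros, or noise."""
    if mode == "real":
        return list(new_sensor)
    if mode == "zeros":
        return [0.0] * len(new_sensor)
    if mode == "noise":
        return [rng.gauss(0.0, 1.0) for _ in new_sensor]
    raise ValueError(f"unknown mode: {mode}")

def build_input(base_features, new_sensor, mode, rng):
    """Concatenate existing features with the chosen variant of the new sensor."""
    return list(base_features) + sensor_variant(new_sensor, mode, rng)

rng = random.Random(0)
base = [0.2, -1.3, 0.7]   # hypothetical existing observation features
tactile = [0.9, 0.1]      # hypothetical reading from the newly added sensor

# One input per experiment arm; the model architecture is identical across arms.
inputs = {mode: build_input(base, tactile, mode, rng)
          for mode in ("real", "zeros", "noise")}
```

Training the same model on each arm and comparing against the original no-sensor model separates two effects: whether the sensor carries useful information ("real" vs. "zeros"/"noise") and whether the wider input path alone hurts training ("zeros"/"noise" vs. no sensor).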
gotta hand it to them... two consecutive catches is an excellent proof of dynamic teleop quality. Not just impressive from mech + tendon miniaturization but also achieving low latency teleop + camera calibration + retargeting
Got a new hand for Black Friday
19
35
682
70,812
I loved the part where Optimus (teleoperated) clenches its hands in effort as it struggles to come up with words. Speaking synced with gestures evokes a sense of life. I don't share the gripe that critics have about teleoperated demos. Proving that you *can* teleoperate co-speech gesture generation shows you the upper bound on what is possible, and this is a task that neural nets can probably do. Also, I have a lot of respect that TSLA is pivoting to a robotics company (taxis, humanoids, etc). Lots of people laugh at the move right now, but it takes guts to do this. What other car company *actually* risks their bottom line to be more than a car company?
A conversation between Tesla Optimus bot and a human is the best thing you’ll see on the internet today.
24
39
678
133,952
Here’s our latest software update on 1X AI. Every behavior you see in this video is controlled from pixels to actions with a single neural net architecture. No teleop, no scripted replay or task specific code, no CGI, all in one continuous video shot
All Neural Networks. All Autonomous. All 1X speed
38
92
673
120,377
reading in between the lines, is Q* the fabled breakthrough in AlphaStar-style search + LLM that so many big labs are trying to get working? Many research projects in GPT-4 self-verification + search have not yielded really strong performance improvements, so I'd be quite surprised if it worked reuters.com/technology/sam-a…
30
42
651
370,688
Portable, Self-Contained Neuroprosthetic Hand with Deep Learning-Based Finger Control arxiv.org/pdf/2103.13452.pdf sites.google.com/view/jules-…
9
101
608
So which based engineer is gonna `git clone llama`, spend 2-3M USD, and open-source the weights under Apache 2.0 or MIT License? I will send you a box of Norwegian chocolates
30
24
600
178,179
I finally learned what a determinant was and wrote a blog post on it. Check out this 2-part tutorial on Normalizing Flows! blog.evjang.com/2018/01/nf1.… blog.evjang.com/2018/01/nf2.…
5
166
588
"There are exactly four normed division algebras: the real numbers (R), complex numbers (C), quaternions (H), and octonions (O). " I really love when @QuantaMagazine goes in depth on more obscure and under-appreciated fields. Real investigative journalism! quantamagazine.org/the-octon…
6
166
544
I find it extremely funny that continue.dev is also backed by ... @ycombinator
19
10
542
61,434
everybody talks big game about democratizing AI, but today I'm super grateful that @GoogleColab gives free TPU + GPU instances. Makes it possible for a lot of students to learn about ML without spending a few thousand dollars building a deep learning PC
14
42
544
77,136
average robotics company:
1. complain about ROS
2. rewrite ROS with a more performant in-house communication middleware
3. dies
coincidence??
It's a rite of passage
43
22
518
184,569
A potential disruptor to OpenAI / Anthropic / large-scale foundation model startups would be if ppl figured out a way to incrementally fork Stable Diffusion and train the models a bit further on single-GPU desktop machines, and then "merge" the distilled compute from other forks.
22
40
519
If you've ever wondered what "bits-per-pixel" actually means, why logistic distributions are awesome, how to improve training stability of normalizing flows and more... Check out this blog post on training likelihood models blog.evjang.com/2019/07/like…
5
121
528
Here is the sequel to "Just ask for Generalization" - in this blog post I argue that Generalization *is* Language, and suggest how we might be able to re-use Language Models as "generalization modules" for non-NLP domains. Check it out! evjang.com/2021/12/17/lang-g…
9
88
527
Really enjoyed this @karpathy interview. I think people disagree with his takes because he's slightly ahead of the curve and some of the ideas are not obvious yet. The framing of "tesla has software problem, waymo has hardware problem" is neat. piped.video/hM_h0UA7upI?si=juQp…
9
32
499
50,516
Is anyone else frustrated by the deluge of RL/ML papers that obfuscate what the paper is actually doing with a bunch of technical jargon and mathematical preliminaries? So tired of reading papers like this 😭 I wish academic writing/publishing conventions could be dismantled.
22
43
497
Collaborative artwork from r/place - each user places 1 pixel at a time every few minutes.
8
269
473
Here’s our latest RL update: Natural Mogging (thread below!)
Redwood AI | Mobility Reinforcement Learning
26
36
501
96,867
World Models combine two exciting fields in AI: video generation and robotics. We think that world models may soon unlock general purpose eval for robots. In September we announced the 1X World Model Challenge to move this area of research forward. 🧵 on some new updates:
8
60
508
63,628
Thrilled to be sharing an important milestone towards bringing humanoid robots to the home: 1X AI has trained Redwood, a VLA capable of end-to-end mobile manipulation tasks like retrieving objects for users, opening doors, and navigating around the home. Here’s some of its capabilities and more info in a thread.
Redwood NEO’s AI
23
39
501
72,009
I think we are <12 months away from an AI model making novel math discoveries for simple unproven conjectures and <24 months away from "rudimentary" self-improvement of LLMs (perhaps saturating after 2-3 iterations of self-improvement). The future is thrilling
See mathematician Michel van Garrel talking about how our latest Gemini Deep Think model was able to prove a conjecture using a very different approach than he was considering.
19
43
490
57,378
Over the last few months at @1x_tech we’ve been working on a learned simulator for general purpose robotics. Here’s a thread of some of the cool learned dynamics along with failure modes 1/n
15
35
484
108,269
Inspired by the @Tesla_Optimus video released yesterday, we made some videos of what our robots at @1x__tech can do! Thread: 1/4 The behavior you see here is controlled end-to-end from pixels->actions through a single neural net, at 1X speed
19
65
450
139,137
Few understand how difficult building this kind of humanoid is. Nice work!
Introducing Torso, a bimanual android actuated with artificial muscles.
12
16
454
36,675
Honored to be among the top espionage targets for Robotics 🥰
26
8
461
40,109
mistral's brand is already becoming one of my favorites in the AI space: releases 87GB torrent containing 8x 7B MoE model via tweet, refuses to elaborate
magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=http%3A%2F%https://nitter.app/t.co/g0m9cEUz0T%3A80%2Fannounce RELEASE a6bbd9affe0c2725c1b7410d66833e24
5
29
452
64,144
IMHO, the Grokking paper is one of the most important breakthroughs of last year. It's the spiritual successor to "Understanding DL Requires Rethinking...". It has <5 citations on Google Scholar. Why aren't more people studying it? Are people not able to reproduce it or sth?
24
37
454
something cool coming very soon apple.com/mac/?focus=great-i…
26
26
459
88,376
"deep learning is hitting a wall 🤡" (would be curious to see the DALLE-2 output for the above quote)
7
28
438
New blog post! An introduction to "what is meta-learning" and a tutorial on implementing MAML in 50 lines of JAX. blog.evjang.com/2019/02/maml…
2
101
443
10M vested over 4 years: plausible for very strong hires If 10M annual TC, then I would believe this is a psyop by Google to bankrupt OpenAI by having their technical staff anchor to ludicrous comp
OpenAI offers to senior Google researchers could be worth $10 million. theinformation.com/articles/… By @jon_victor_
23
10
433
316,005
Robotic locomotion: challenges and opportunities
No way that cat will get through this!
5
56
429
A short thread about DeepMind's recent GATO paper. It trains a basic transformer on an impressive number of datasets
2
52
424
I was an Area Chair (AC) for NeurIPS 2022 and my batch had 10/12 papers voted as unanimous accepts by reviewers. I was asked by the SAC to recalibrate the reviews and reject a few more, as that was too many papers to be accepted.
18
38
425
Replying to @rabois
I think non-woke employees will also not appreciate having their jobs threatened for waiting in line for coffee.
10
3
341
It is a great honor to have 1X featured as the first episode of S3 season 2. Wouldn’t have been possible without @jasonjoyride + team and @radbackwards burning the midnight oil for the last month. Here is the story of the company - check it out!
Jason Carman
13
62
412
64,559
In 2023 I published a book on AI, robotics, and recipes for AGI. I did significant editing over the last year to improve the writing and accessibility and am pleased to announce a “v1.5” edition:
19
32
407
49,179
Many AI labs are training "automated scientists", i.e. agents that discover things about the world, propose and test hypotheses, read your tensorboard/wandb tea leaves, etc. But hindsight is 20/20 when training on the Internet - such agents can easily memorize all prior scientific discoveries (special relativity, etc) without learning the reasoning primitives that allow them to work that out on their own. It is kind of like a modern school kid who asks calculators and ChatGPT for all their answers and loses the ability to think for themselves. Some reasoning definitely emerges through trying to compress all this data losslessly. One could argue that gradient descent *is* repeated hypothesis testing. Sufficiently powerful association is indistinguishable from reasoning, so at some scale the difference between reasoning and memorization is just semantics. But even if you can “just ask” the model to think harder, there is a practical question of where you get enough data to train your model to have these emergent skills. To improve that self-reflection and introspection ability, I think the next frontier in automated reasoning and hypothesis testing requires a lot of data on *how* one updates prior beliefs, i.e. onpolicy updates. It’s not obvious to me how one scrapes a dataset of “belief updates” from the Internet. Collecting this starts to look a lot like deep RL. Reasoning properly about how one should update one’s beliefs requires some “continuity of the self”. That’s not to say that GPT7 needs to be self-aware, but it needs egocentric data from many many individual traces updating themselves. 
This requires either environments where one can throw compute at it to get onpolicy traces (certain mathematics problems, coding environments, zero-sum games like Go), or perhaps generating “fictitious individuals” from offline data, or something hardcore like evolving robots in the real world and using natural selection to weed out the brains that don’t reason well. I think scraping a chronological sequence of all scientific literature is an OK start, but it’s noisy because of selection bias with what is published (i.e. your agent would probably be very confident that any deep learning idea it implements will be state of the art).
20
40
402
56,969
New blog post on how Dijkstra's Algorithm (and shortest paths) shows up in unexpected places! blog.evjang.com/2018/08/dijk…
10
128
396
Oh, you're a "builder" / VC in the LLM / generative AI space? Name a good warmup schedule and optimizer for a 100B parameter transformer
25
7
401
128,319
*Slaps roof of DALL-E 2 This can fit so many NFT collections inside
7
17
384
Revoking visas of Chinese students studying in critical fields like AI and Robotics is incredibly short-sighted and harmful to America’s long term prosperity. We want the best from every country to work for team America
The U.S. will begin revoking visas of Chinese students, including those with connections to the Chinese Communist Party or studying in critical fields.
17
25
398
71,424
I wish @sequoia hadn't deleted web.archive.org/web/20221027… it was a good article that gave me insight into @SBF_FTX and Alameda's early days. More importantly, VCs should not be afraid to own their failures instead of sweeping them under the rug
18
37
373
caption this
57
26
373
kids, brush up on eigenvalues and spectral decomp. you can save the world with that stuff
6
45
372
Been working on a blog post for the last couple weeks outlining a principle for building general AI systems with deep learning. I will probably bet on it for the rest of my career. Excited to share it early next week! 🧑‍🔬
9
5
367
teleop is a central piece of infra for collecting high quality data. All robotics companies doing manipulation use it. If you're a hardcore game / vr / netcode engineer and would like to join the team that ships software enabling us to serve a large number of robots in diverse, unstructured environments, apply here! 1x.tech/open-positions/softw…
1x's NEO Humanoid has the LOWEST latency VR teleoperation I've ever seen! It matches nearly instantly!
13
20
365
55,496
The number of startups trying to sell data to general purpose + humanoid robot companies seems to exceed the number of actual robot startups. This is ... concerning. Do the hard thing. Hard things filter out competition. Optimize for the world you want to see, not your IRR
15
15
361
45,062
Every single demo in this video blows my mind 🤯 4 years ago I could have never imagined this level of capability. Congrats to the Gemini team for training such an amazing model! piped.video/UIZAiXYceBI?si=5rlK…
11
28
345
72,882
1/n GPT-3 is very expensive to train, costing an estimated $5M (even when you know exactly what to do).
17
76
338
Very impressed by PaLM's ability to explain jokes. I wonder if it could invert the function - i.e. given explanations of a dual-use word, generate a punny joke
13
41
330
Enabled my FSD12.3.3 trial. It is likely the most impressive scale of end-to-end visuomotor policies deployed in the real world. Night driving works great. I am impressed by the confidence they have to onboard so many new users so quickly. Congratulations to the @Tesla_AI team!
4
19
344
36,248
In many areas of computer science (cryptography, NP complexity), verifying a solution is much easier than generating one. This blog post finds that LLMs (mostly GPT-4) may be capable of self-verifying their solutions. evjang.com/2023/03/26/self-r…
12
52
341
71,732
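The verify-versus-generate gap mentioned above can be illustrated with subset sum, a standard NP-complete problem (chosen here as an example; the post does not name a specific problem). Checking a proposed answer takes one pass over it, while this naive generator may have to try every subset.

```python
from itertools import combinations

def verify(numbers, indices, target):
    """Check a proposed solution: distinct, in-range indices summing to target."""
    return (
        len(set(indices)) == len(indices)
        and all(0 <= i < len(numbers) for i in indices)
        and sum(numbers[i] for i in indices) == target
    )

def generate(numbers, target):
    """Brute-force search over all subsets (exponential in len(numbers))."""
    for size in range(len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in indices) == target:
                return indices
    return None

numbers = [3, 34, 4, 12, 5, 2]   # illustrative instance
target = 9
solution = generate(numbers, target)
```

`verify` runs in time linear in the size of the proposed answer, whereas `generate` can examine up to 2^n subsets, which is the asymmetry that makes checking a candidate answer attractive.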
The real power move is to work in robotics, where nothing is reproducible, even by the authors 🤷‍♂️🤖
10
16
326
Incredible style pose transfer + generative models paper applied to dance videos. This is so cool, my fingers are trembling as I tweet this. piped.video/PCBTZh41Ris?t=2m13s Paper: arxiv.org/pdf/1808.07371.pdf
5
121
330
An engineer is a special kind of scientist that researches why their code doesn’t work
8
19
326
As always, @karpathy has incredible choice of words and a good take on what otherwise felt like a frustrating podcast to listen to: - Animal brains are more akin to "maturation" than "tabula rasa learning" - LLMs as "summoning ghosts"
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea is sufficiently "bitter lesson pilled" (meaning arranged so that it benefits from added computation for free) as a proxy for whether it's going to work or worth even pursuing. The underlying assumption being that LLMs are of course highly "bitter lesson pilled" indeed, just look at LLM scaling laws where if you put compute on the x-axis, number go up and to the right. So it's amusing to see that Sutton, the author of the post, is not so sure that LLMs are "bitter lesson pilled" at all. They are trained on giant datasets of fundamentally human data, which is both 1) human generated and 2) finite. What do you do when you run out? How do you prevent a human bias? So there you have it, bitter lesson pilled LLM researchers taken down by the author of the bitter lesson - rough! In some sense, Dwarkesh (who represents the LLM researchers viewpoint in the pod) and Sutton are slightly speaking past each other because Sutton has a very different architecture in mind and LLMs break a lot of its principles. He calls himself a "classicist" and evokes the original concept of Alan Turing of building a "child machine" - a system capable of learning through experience by dynamically interacting with the world. There's no giant pretraining stage of imitating internet webpages. There's also no supervised finetuning, which he points out is absent in the animal kingdom (it's a subtle point but Sutton is right in the strong sense: animals may of course observe demonstrations, but their actions are not directly forced/"teleoperated" by other animals). 
Another important note he makes is that even if you just treat pretraining as an initialization of a prior before you finetune with reinforcement learning, Sutton sees the approach as tainted with human bias and fundamentally off course, a bit like when AlphaZero (which has never seen human games of Go) beats AlphaGo (which initializes from them). In Sutton's world view, all there is is an interaction with a world via reinforcement learning, where the reward functions are partially environment specific, but also intrinsically motivated, e.g. "fun", "curiosity", and related to the quality of the prediction in your world model. And the agent is always learning at test time by default, it's not trained once and then deployed thereafter. Overall, Sutton is a lot more interested in what we have in common with the animal kingdom instead of what differentiates us. "If we understood a squirrel, we'd be almost done". As for my take... First, I should say that I think Sutton was a great guest for the pod and I like that the AI field maintains entropy of thought and that not everyone is exploiting the next local iteration of LLMs. AI has gone through too many discrete transitions of the dominant approach to lose that. And I also think that his criticism of LLMs as not bitter lesson pilled is not inadequate. Frontier LLMs are now highly complex artifacts with a lot of humanness involved at all the stages - the foundation (the pretraining data) is all human text, the finetuning data is human and curated, the reinforcement learning environment mixture is tuned by human engineers. We do not in fact have an actual, single, clean, actually bitter lesson pilled, "turn the crank" algorithm that you could unleash upon the world and see it learn automatically from experience alone. Does such an algorithm even exist? Finding it would of course be a huge AI breakthrough. Two "example proofs" are commonly offered to argue that such a thing is possible. 
The first example is the success of AlphaZero learning to play Go completely from scratch with no human supervision whatsoever. But the game of Go is clearly such a simple, closed, environment that it's difficult to see the analogous formulation in the messiness of reality. I love Go, but algorithmically and categorically, it is essentially a harder version of tic tac toe. The second example is that of animals, like squirrels. And here, personally, I am also quite hesitant whether it's appropriate because animals arise by a very different computational process and via different constraints than what we have practically available to us in the industry. Animal brains are nowhere near the blank slate they appear to be at birth. First, a lot of what is commonly attributed to "learning" is imo a lot more "maturation". And second, even that which clearly is "learning" and not maturation is a lot more "finetuning" on top of something clearly powerful and preexisting. Example. A baby zebra is born and within a few dozen minutes it can run around the savannah and follow its mother. This is a highly complex sensory-motor task and there is no way in my mind that this is achieved from scratch, tabula rasa. The brains of animals and the billions of parameters within have a powerful initialization encoded in the ATCGs of their DNA, trained via the "outer loop" optimization in the course of evolution. If the baby zebra spasmed its muscles around at random as a reinforcement learning policy would have you do at initialization, it wouldn't get very far at all. Similarly, our AIs now also have neural networks with billions of parameters. These parameters need their own rich, high information density supervision signal. We are not going to re-run evolution. But we do have mountains of internet documents. Yes it is basically supervised learning that is ~absent in the animal kingdom. 
But it is a way to practically gather enough soft constraints over billions of parameters, to try to get to a point where you're not starting from scratch. TLDR: Pretraining is our crappy evolution. It is one candidate solution to the cold start problem, to be followed later by finetuning on tasks that look more correct, e.g. within the reinforcement learning framework, as state of the art frontier LLM labs now do pervasively. I still think it is worth being inspired by animals. I think there are multiple powerful ideas that LLM agents are algorithmically missing that can still be adapted from animal intelligence. And I still think the bitter lesson is correct, but I see it more as something platonic to pursue, not necessarily to reach, in our real world and practically speaking. And I say both of these with double digit percent uncertainty and cheer the work of those who disagree, especially those a lot more ambitious bitter lesson wise. So that brings us to where we are. Stated plainly, today's frontier LLM research is not about building animals. It is about summoning ghosts. You can think of ghosts as a fundamentally different kind of point in the space of possible intelligences. They are muddled by humanity. Thoroughly engineered by it. They are these imperfect replicas, a kind of statistical distillation of humanity's documents with some sprinkle on top. They are not platonically bitter lesson pilled, but they are perhaps "practically" bitter lesson pilled, at least compared to a lot of what came before. It seems possible to me that over time, we can further finetune our ghosts more and more in the direction of animals; that it's not so much a fundamental incompatibility but a matter of initialization in the intelligence space. But it's also quite possible that they diverge even further and end up permanently different, un-animal-like, but still incredibly helpful and properly world-altering. It's possible that ghosts:animals :: planes:birds. 
Anyway, in summary, overall and actionably, I think this pod is solid "real talk" from Sutton to the frontier LLM researchers, who might be gear shifted a little too much in the exploit mode. Probably we are still not sufficiently bitter lesson pilled and there is a very good chance of more powerful ideas and paradigms, other than exhaustive benchbuilding and benchmaxxing. And animals might be a good source of inspiration. Intrinsic motivation, fun, curiosity, empowerment, multi-agent self-play, culture. Use your imagination.
11
10
347
50,928
Browser tabs of an AI researcher
10
15
326
For students who wish they could attend NeurIPS but can't afford the travel cost ($ or time), here's a couple life hacks to get the same value for free:
2
46
328
How it started / How it's going 1xtech.medium.com/1x-raises-…
20
8
325
78,058
TensorFlow trick: if you want a custom gradient f'(x) + g'(x) for a layer f(x), just do f(x) + g(x) - tf.stop_gradient(g(x)) and let autodiff do the rest.
6
80
324
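A small sketch of why the trick above works. To keep it self-contained, a toy forward-mode autodiff (dual numbers) stands in for TensorFlow, and the local `stop_gradient` mimics what `tf.stop_gradient` does: pass the value through while zeroing the derivative. The choices `f(x) = x^2` and `g(x) = x^3` are arbitrary examples.

```python
class Dual:
    """A value paired with its derivative with respect to the input."""

    def __init__(self, value, grad=0.0):
        self.value = value
        self.grad = grad

    def __add__(self, other):
        return Dual(self.value + other.value, self.grad + other.grad)

    def __sub__(self, other):
        return Dual(self.value - other.value, self.grad - other.grad)

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.grad * other.value + self.value * other.grad)

def stop_gradient(d):
    """Keep the forward value, drop the derivative (toy stand-in for tf.stop_gradient)."""
    return Dual(d.value, 0.0)

def f(x):
    return x * x        # f'(x) = 2x

def g(x):
    return x * x * x    # g'(x) = 3x^2

def layer(x):
    # Forward pass: g(x) cancels, leaving f(x).
    # Backward pass: only the un-stopped g contributes, giving f'(x) + g'(x).
    return f(x) + g(x) - stop_gradient(g(x))

x = Dual(3.0, 1.0)      # seed derivative dx/dx = 1
y = layer(x)
```

At `x = 3` the output value is `f(3) = 9`, while the derivative is `f'(3) + g'(3) = 6 + 27 = 33`, so the forward pass is unchanged and only the gradient picks up the extra term.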
Simply augmenting the data often yields bigger perf gains than tweaking the model. We formalize "meta-augmentation" and show that you can apply it to pretty much any meta-learning problem and any meta-learner. arxiv.org/abs/2007.05549 with Janarthanan Rajendran, @AlexIrpan
5
70
321
Two robot arms move at the same speed, driven by different actuators with the same mass. The first arm collides with a table with a gentle tap. The second arm collides with the table, destroying both arm and table. Read this blog post to see why! 🦾💥 evjang.com/2024/08/31/motors…
9
50
325
74,771
Now you ask GPT-4 if it met the assignment, at which point it apologizes and generates a valid non-rhyming poem! full marks
19
10
319
59,254
IIRC, this project began as mostly one researcher’s (Alex) interest at Google Brain. Nearly a decade later, it is a full fledged company that has made technology to transport scents digitally. A role model of patient research and dogged pursuit of a mission!
Well, we actually did it. We digitized scent. A fresh summer plum was the first fruit and scent to be fully digitized and reprinted with no human intervention. It smells great. Holy moly, I’m still processing the magnitude of what we’ve done. And yet, it feels like as we cross this finish line we are instantly at a new starting line. I’ll have more to share about what’s in store that we’re building on top of this. A huge HUGE congrats to the entire team across scientific, engineering, operational, and creative disciplines. It takes a village named Osmo to do this. I don’t know if this is embarrassing, but I carry the plum scent with me a lot of places and smell it constantly. It makes me smile. I’m curious, if y’all want to smell it? If we made a limited release fragrance of the first teleported scent and dedicated the proceeds to science, would you want it?
7
13
313
45,111
New blog post on Expressivity, Trainability, Generalization problems in Machine Learning, and why RL is still very unsolved! blog.evjang.com/2017/11/exp-…
4
97
315
Probably nothing
A different kind of unboxing is about to happen. Tomorrow.
17
10
307
31,588
Anything that can be implemented in JAX *will* be implemented in JAX. Here's a differentiable path tracer (and a tutorial!) Blog Post: blog.evjang.com/2019/11/jaxp… Code: github.com/ericjang/pt-jax
6
68
314