bottomless pit supervisor the only way out is up

Grandpa, what did you do with the 100 hours you had access to artificial superintelligence in the second week of June, 2026?
135
107
5,797
524,165
For the love of .. if I hear another one of these guys talking about booking travel
Aravind Srinivas explains why Perplexity is building a browser: it's the only way to create AI agents with enough control over multiple apps, especially on iOS. Their goal? Agents that can book travel, buy things, and act as a personalized assistant for those who can't afford a human EA. He adds that this is the long-term vision. "Anyone who's saying agents will work in 2025 should be skeptical."
130
58
2,860
396,295
Apparently the Navier Stokes proof from Deepmind is this week World will never be the same
101
94
2,322
474,146
The most amazing thing about Ray Kurzweil is how completely and sincerely unsurprised about all of this he is He's not like "to tell you the truth I didn't really believe this would happen either, wow" Instead he pulls up the same graph he's been using for 30 years Legend
54
94
2,225
235,211
Google is going to win at both AI and quantum. How does it have a 27 PE ratio? First to $10T market cap for sure
New breakthrough quantum algorithm published in @Nature today: Our Willow chip has achieved the first-ever verifiable quantum advantage. Willow ran the algorithm - which we’ve named Quantum Echoes - 13,000x faster than the best classical algorithm on one of the world's fastest supercomputers. This new algorithm can explain interactions between atoms in a molecule using nuclear magnetic resonance, paving a path towards potential future uses in drug discovery and materials science. And the result is verifiable, meaning its outcome can be repeated by other quantum computers or confirmed by experiments. This breakthrough is a significant step toward the first real-world application of quantum computing, and we're excited to see where it leads.
103
64
1,511
154,934
Also most likely he got wind of a release tomorrow. Betting on Claude 4
OPENAI ROADMAP UPDATE FOR GPT-4.5 and GPT-5: We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings. We want AI to “just work” for you; we realize how complicated our model and product offerings have gotten. We hate the model picker as much as you do and want to return to magic unified intelligence. We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model. After that, a top goal for us is to unify o-series models and GPT-series models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks. In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3. We will no longer ship o3 as a standalone model. The free tier of ChatGPT will get unlimited chat access to GPT-5 at the standard intelligence setting (!!), subject to abuse thresholds. Plus subscribers will be able to run GPT-5 at a higher level of intelligence, and Pro subscribers will be able to run GPT-5 at an even higher level of intelligence. These models will incorporate voice, canvas, search, deep research, and more.
28
17
1,293
144,351
For Christ's sake
103
10
1,093
102,965
Replying to @mayankja1n
5
5
1,058
47,578
I just can't get over how small this couch is
45
20
1,057
79,212
You guys are all talking about an AI bubble Researchers in the labs are talking about endgames
75
54
906
79,909
Buying GOOG is an arbitrage opportunity
Replying to @GoogleDeepMind
Self-improvement 🔄 SIMA 2 can teach itself new skills, learning through trial-and-error, based on feedback from Gemini. Getting better the more it plays – without additional human input.
7
20
948
133,210
Gemini 3 will be a ChatGPT moment
78
19
818
87,063
It's over
21
31
771
40,193
Reminder Anthropic's line isn't straight by accident
AI Progress since GPT-3.5 OpenAI seems to be slowing down with GPT-5 Anthropic incredibly steady progress Google had its breakthrough with Gemini 2.5 Pro
20
14
651
117,212
The obvious endgame (eg in next 2-3 years) is that Microsoft acquires OpenAI, Google acquires Anthropic, and Tesla acquires xAI. Only the large caps survive
Zuck's message to Meta staff post-earnings -- Even if we're wrong about our AI investments, we're not going to go bankrupt. (via @alexeheath) sources.news/p/chatgpts-quie…
57
41
614
79,628
Anthropic? The future Deepmind subsidiary?
GOOGLE AND ANTHROPIC ARE REPORTEDLY IN TALKS ON CLOUD DEAL WORTH TENS OF BILLIONS Anthropic is in discussions with Google $GOOGL about a deal that would provide the company with additional computing power valued in the high tens of billions of dollars The plan, which has not been finalized, involves Google providing cloud computing services to Anthropic - Bloomberg Google stock is up 3% in after hours on the news 🟢🟢🟢
4
4
581
83,078
if it isn't painfully obvious, Google is going to win the ASI race
68
24
560
70,617
2 of these people work at OpenAI now Probably nothing
Replying to @sama
“ our research team did something unexpected and quite amazing and we think it will be very very worth the wait, but needs a bit longer. “ Care to elaborate ?
14
30
573
63,765
In what kind of stupid world does Nat Friedman report to Alex Wang One is a national treasure. The other is Alex Wang
META attempted to buy Ilya Sutskever's Safe Superintelligence, and also attempted to hire him, according to reporting tonight by CNBC.
10
8
576
77,512
Replying to @RichardMCNgo
Really beautiful
4
547
4,010
Big Anthropic week fellas, buckle up
49
7
541
212,965
Sorry, if you don't have at least two monitors you aren't doing serious work I will die on this hill
working nights and weekends
86
7
518
52,482
In all seriousness, the reason this hits so hard is because it squares on the optimism at the core of Anthropic, and to be honest all the major labs. Apologies to Eliezer and co - your pessimism no longer sells. Let's fucking go
Keep thinking.
11
15
483
18,940
Replying to @RichardMCNgo
What if she's just beautiful and none of the other stuff
2
464
14,757
lol. Guy who couldn’t get AI to make PowerPoint slides to lead superintelligence team
Microsoft $MSFT said it is pursuing a more powerful form of AI called “superintelligence” Mustafa Suleyman, chief of the Microsoft AI group, will lead what the company is calling the MAI Superintelligence Team that will target hypothetical milestones that are even more ambitious than artificial general intelligence “If AGI is often seen as the point at which an AI can match human performance at all tasks, then superintelligence is when it can go far beyond that performance,” (Source Bloomberg)
8
15
460
47,088
Just remember, LLMs are a dead end. ARC-AGI 6 will prove it
I'm back at the top of ARC-AGI with my new program. I use @grok 4 and multi-agent collaboration with evolutionary test-time compute
31
12
441
49,190
“I was a mathematics major”
Replying to @boazbaraktcs
I agree, although I believe most matrix multiplies for AI are done in two dimensions. Anyway, I stand by my statement. Matmuls are an effective piece of mathematical machinery, but the mechanics of calculating them are headache-inducing. Indeed, it's exactly their computationally cumbersome nature that's propelling the data center boom. That point seems beyond argument.
6
6
420
47,485
1000/1000 people would have said this was AGI 10 years ago 2.5 is a new thing
Gemini 2.5 pro is the first model to ever get this image correct
9
11
407
18,443
Lfg boys
Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.
13
4
402
38,730
Here for the Noam joins xAI arc
Replying to @amir
Noam is right
6
5
383
87,518
Am I the only one who finds this "personal superintelligence" thing very inauthentic? To me it seems like a thinly veiled and weak attempt to somehow shoehorn superintelligence into Meta's business model so Mark can claim all these investments are good for his business and not just his personal plaything to feel like he can be part of all of this. Superintelligence is needed for science and engineering and maybe for business strategy and operations - Jane and John Doe don't need superintelligent companions or even assistants (trustworthiness, authenticity, competence etc are much more important for that) This above all is why most top researchers see this mission as uncompelling and trite
62
8
361
52,527
Morning everyone It's Anthropic's turn Going to be a fun week
16
4
349
40,330
This must win an award for a tweet not aging well
So, all the models underperform humans on the new International Mathematical Olympiad questions, and Grok-4 is especially bad on it, even with best-of-n selection? Unbelievable!
15
8
342
14,515
FYI, this is what yall should be excited about
14
1
337
37,879
Dario-was-wrong-about-90%-of-coding-being-done-by-AI-boys in shambles
right now is the time where the takeoff looks the most rapid to insiders (we don’t program anymore we just yell at codex agents) but may look slow to everyone else as the general chatbot medium saturates
14
6
322
28,343
Fyi, if there is one thing that those Anthropic papers make clear, it's that we are going to end up back at RNNs
20
5
323
37,759
Am I doing this right
15
16
304
33,123
Word is Anthropic team at off-site this week trying to regroup and get back on feet after inference stumbles and launch delays. Dedicated to sticking the landing. Down but not out. Still bullish. Lfg
21
2
291
27,863
There is no trend that bothers me more in modern RL than thinking we need an RL environment for every damn task. This is obviously the wrong approach. We need agentic learners
28
9
288
25,525
$12B for a LORA function aye
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker
14
8
281
33,450
Still confused
31
3
273
361,260
Anyone know who the guy on the right is
36
1
270
35,196
September 10, 2025 Is the day Anthropic takes the AI lead for good
33
5
268
42,326
Again, this will not give enterprises "repeatable answers". Prompts are never token-level identical, which is the only case for which this fix applies. What enterprises need is "semantic" determinism - when essentially same question is asked, get same response.
🧩 Thinking Machines’s mission is to make AI models predictable instead of inconsistent. For example, ask ChatGPT the same question a few times over, and you’re likely to get a wide range of answers. This has largely been accepted in the AI community as a fact — today’s AI models are considered to be non-deterministic systems— but Thinking Machines Lab sees this as a solvable problem. Thinking Machines published a solid blog yesterday and says, that AI randomness mostly comes from how GPU kernels get stitched and scheduled during inference, and that tighter control here can deliver deterministic inference. That would give enterprises repeatable answers and make reinforcement learning cleaner by cutting label noise from slightly different outputs. A GPU kernel is a tiny program that runs the math on the graphics processor, and when thousands run in parallel, small things like the order of adds, thread timing, atomic updates, or library choices can nudge numbers in different directions. Floating point math is not perfectly associative, so changing the reduction order or mixing precisions tweaks the last few bits, which then cascades through layers into a different token choice. The proposed fix is an orchestration layer that locks down kernels, seeds, algorithms, and execution graphs, pins math library versions, and enforces a repeatable schedule across runs and machines. That kind of guardrail usually trades a bit of throughput for stability, but it pays off when scientists or auditors need the exact same answer on rerun. The lab also points to using reinforcement learning to tailor models for businesses, which gets easier when the reward signal comes from consistent model outputs instead of a fuzzy mix. They plan frequent research posts and code releases under a series called Connectionism, and a first product aimed at researchers and startups is said to be coming soon. 
Given a $12B valuation, the big question is whether they can prove determinism at scale without hurting speed and cost. The view here is that kernel-level determinism is a practical, engineering-first path, and if they show identical outputs across nodes and GPUs with minimal slowdown, that would count as a real win for production LLMs. In July, Murati said Thinking Machines Lab will launch its first product within months for researchers and startups. What it is, and whether it boosts reproducibility, is unknown.
20
10
276
39,441
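The quoted post above attributes run-to-run differences to floating-point addition not being associative, so that a different reduction order changes the last bits of a sum. A minimal sketch of that effect in plain Python follows. It runs on the CPU and involves no GPU kernels; the helper name `sum_in_order` is made up for this illustration and is not taken from the Thinking Machines post.

```python
def sum_in_order(values):
    """Add floats strictly left to right, mimicking one fixed reduction order."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, two groupings: the results differ in the last bit.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)

# A more extreme case: the small term is absorbed when it is added
# to the large term first, and survives when the large terms cancel first.
absorbed = sum_in_order([1e16, 1.0, -1e16])
preserved = sum_in_order([1e16, -1e16, 1.0])
print(absorbed, preserved)
```

If a parallel reduction picks its grouping based on thread timing or batch shape, the same inputs can land on either side of such a difference, which is the mechanism the post describes.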
Might be time for OpenAI to dust off their charter
🚨 Gemini 3.0 Pro - ONE SHOTTED I asked it for windows web os as everyone asked me for it and the result is mind blowing , it even has python in terminal and we can play games and run code in it Google really cooked here , source code and prompt in comment box
10
7
279
48,654
The goal is not to RL the best coder The goal is not to RL the best chemist The goal is not to RL the best writer ... ... The goal is to RL the best student
22
9
265
36,590
Long time since I've been so excited about a paper @willccbb
🧠 How can we equip LLMs with memory that allows them to continually learn new things? In our new paper with @AIatMeta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge. While full finetuning and LoRA see drastic drops in held-out task performance (📉-89% FT, -71% LoRA on fact learning tasks), memory layers learn the same amount with far less forgetting (-11%). 🧵:
8
5
263
40,530
I love karpathy as much as the next guy but it occurs to me he's not sufficiently bitter lessoned The argument you need to make for AGI taking another 10 years _HAS_ to be compute constraints. Anything else fails the bitter lesson test
My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good. I re-watched the pod just now too. First of all, yes I know, and I'm sorry that I speak so fast :). It's to my detriment because sometimes my speaking thread out-executes my thinking thread, so I think I botched a few explanations due to that, and sometimes I was also nervous that I'm going too much on a tangent or too deep into something relatively spurious. Anyway, a few notes/pointers: AGI timelines. My comments on AGI timelines looks to be the most trending part of the early response. This is the "decade of agents" is a reference to this earlier tweet nitter.app/karpathy/status/188254… Basically my AI timelines are about 5-10X pessimistic w.r.t. what you'll find in your neighborhood SF AI house party or on your twitter timeline, but still quite optimistic w.r.t. a rising tide of AI deniers and skeptics. The apparent conflict is not: imo we simultaneously 1) saw a huge amount of progress in recent years with LLMs while 2) there is still a lot of work remaining (grunt work, integration work, sensors and actuators to the physical world, societal work, safety and security work (jailbreaks, poisoning, etc.)) and also research to get done before we have an entity that you'd prefer to hire over a person for an arbitrary job in the world. I think that overall, 10 years should otherwise be a very bullish timeline for AGI, it's only in contrast to present hype that it doesn't feel that way. Animals vs Ghosts. My earlier writeup on Sutton's podcast nitter.app/karpathy/status/197343… . I am suspicious that there is a single simple algorithm you can let loose on the world and it learns everything from scratch. If someone builds such a thing, I will be wrong and it will be the most incredible breakthrough in AI. 
In my mind, animals are not an example of this at all - they are prepackaged with a ton of intelligence by evolution and the learning they do is quite minimal overall (example: Zebra at birth). Putting our engineering hats on, we're not going to redo evolution. But with LLMs we have stumbled by an alternative approach to "prepackage" a ton of intelligence in a neural network - not by evolution, but by predicting the next token over the internet. This approach leads to a different kind of entity in the intelligence space. Distinct from animals, more like ghosts or spirits. But we can (and should) make them more animal like over time and in some ways that's what a lot of frontier work is about. On RL. I've critiqued RL a few times already, e.g. nitter.app/karpathy/status/194443… . First, you're "sucking supervision through a straw", so I think the signal/flop is very bad. RL is also very noisy because a completion might have lots of errors that might get encourages (if you happen to stumble to the right answer), and conversely brilliant insight tokens that might get discouraged (if you happen to screw up later). Process supervision and LLM judges have issues too. I think we'll see alternative learning paradigms. I am long "agentic interaction" but short "reinforcement learning" nitter.app/karpathy/status/196080…. I've seen a number of papers pop up recently that are imo barking up the right tree along the lines of what I called "system prompt learning" nitter.app/karpathy/status/192136… , but I think there is also a gap between ideas on arxiv and actual, at scale implementation at an LLM frontier lab that works in a general way. I am overall quite optimistic that we'll see good progress on this dimension of remaining work quite soon, and e.g. I'd even say ChatGPT memory and so on are primordial deployed examples of new learning paradigms. Cognitive core. 
My earlier post on "cognitive core": nitter.app/karpathy/status/193862… , the idea of stripping down LLMs, of making it harder for them to memorize, or actively stripping away their memory, to make them better at generalization. Otherwise they lean too hard on what they've memorized. Humans can't memorize so easily, which now looks more like a feature than a bug by contrast. Maybe the inability to memorize is a kind of regularization. Also my post from a while back on how the trend in model size is "backwards" and why "the models have to first get larger before they can get smaller" nitter.app/karpathy/status/181403… Time travel to Yann LeCun 1989. This is the post that I did a very hasty/bad job of describing on the pod: nitter.app/karpathy/status/150339… . Basically - how much could you improve Yann LeCun's results with the knowledge of 33 years of algorithmic progress? How constrained were the results by each of algorithms, data, and compute? Case study there of. nanochat. My end-to-end implementation of the ChatGPT training/inference pipeline (the bare essentials) nitter.app/karpathy/status/197775… On LLM agents. My critique of the industry is more in overshooting the tooling w.r.t. present capability. I live in what I view as an intermediate world where I want to collaborate with LLMs and where our pros/cons are matched up. The industry lives in a future where fully autonomous entities collaborate in parallel to write all the code and humans are useless. For example, I don't want an Agent that goes off for 20 minutes and comes back with 1,000 lines of code. I certainly don't feel ready to supervise a team of 10 of them. I'd like to go in chunks that I can keep in my head, where an LLM explains the code that it is writing. I'd like it to prove to me that what it did is correct, I want it to pull the API docs and show me that it used things correctly. I want it to make fewer assumptions and ask/collaborate with me when not sure about something. 
I want to learn along the way and become better as a programmer, not just get served mountains of code that I'm told works. I just think the tools should be more realistic w.r.t. their capability and how they fit into the industry today, and I fear that if this isn't done well we might end up with mountains of slop accumulating across software, and an increase in vulnerabilities, security breaches and etc. nitter.app/karpathy/status/191558… Job automation. How the radiologists are doing great nitter.app/karpathy/status/197122… and what jobs are more susceptible to automation and why. Physics. Children should learn physics in early education not because they go on to do physics, but because it is the subject that best boots up a brain. Physicists are the intellectual embryonic stem cell nitter.app/karpathy/status/192969… I have a longer post that has been half-written in my drafts for ~year, which I hope to finish soon. Thanks again Dwarkesh for having me over!
59
6
259
76,451
SELLSELLSELL
10
2
242
15,750
Gosh this Zuckerberg Dwarkesh podcast is pure cope my ears are hurting
9
2
244
23,756
Now that that is behind us Happy Anthropic Week
11
5
233
16,319
Anthropic will revive it. Anthropic isn't wasting precious GPUs on stupid dog trick video gimmicks. Instead they are scaling.
GPT-5 singlehandedly destroyed the fast takeoff narrative and killed the stock market
25
1
225
22,880
FYI, if it's not obvious, the models behind Claude Code keep getting better - it's not the same Opus it was a month ago. And Claude Code is a general agent framework. You can use it to write a book or an opera or a screenplay.
Btw, since people don’t seem to know this, you can literally spawn subagents in Claude Code just by asking.
18
6
224
36,058
Replying to @rapha_gl
Pains me to say it but pretty bearish tbh Nat and Daniel aren't leaving to go report to Alex Wang if Ilya had something good on his hands 32b basically acquihire money for brand name people and good optics
5
1
212
14,524
In particular, Claude will not have "reasoning" vs "non-reasoning" versions. There's just one model, has reasoning, multimodal, canvas. Same thing Sam is promising for GPT5. Hence the want to get ahead of the news. May the best AI win.
9
213
7,953
Replying to @powerbottomdad1
OpenAI is worth $500B on $-10B net income so what
5
198
12,202
Dedication to the bit is unwavering "How does Meta's latest model come in 32nd on lmsys?" This is how. Nobody passionate about these models is working at Meta. How could you when this is your company's AI frontman?
Yann LeCun: I'm not interested in LLMs anymore - they're the past. The future is in four more interesting areas: machines that understand the physical world, persistent memory, reasoning, and planning.
11
6
200
15,597
Sonnet 4.5 with Claude 2.0 is AGI people Update whatever graphs you need to
13
3
194
12,088
Replying to @lauriewired
Now you have two problems
4
2
175
20,835
Replying to @ChaseBrowe32432
Kurzweil predicted Turing test in 2029 It was 5 years early
6
1
189
15,723
Maybe the reason Mistral needs to distill Deepseek is because they aren't allowed to air condition their datacenters
8
1
183
7,340
The falling piano was my idea hope you like it
Keep thinking.
7
165
8,090
Seriously though, does reducing model scheming by 97% through training not cause you to update @ESYudkowsky ? Or do you not find this "surprising"?
We've made progress on the AI safety problem of detecting and reducing "scheming": - Created evaluation environments to detect scheming - Observed current models scheming in controlled settings - Found deliberative alignment (openai.com/index/deliberativ…) decreases scheming rates These are some of the most exciting long-term AI safety results to date, and there's still a lot of work left to do. Looking forward to seeing further work done in this space. Research done in collaboration with @apolloaievals: openai.com/index/detecting-a…
30
2
164
67,302
I have it from very good sources that GPT5s internal codename is "Marcus"
15
2
163
9,442
Y'all aren't ready for the Claude 4 discussion
15
1
161
9,378
Zuck repeatedly hitting refresh on his email waiting for @polynoamial to reply
Mark Zuckerberg has successfully lured away three OpenAI researchers. Lucas Beyer, Alexander Kolesnikov and Xiaohua Zhai have joined META's superintelligence lab.
2
5
156
16,356
FYI the signal here is that a minor versioning update will be SoTA. Tells you how far ahead they are internally
anthropic.claude-3-7-sonnet-20250219-v1:0 Claude 3.7 Sonnet is Anthropic's most intelligent model to date and the first Claude model to offer extended thinking - the ability to solve complex problems with careful, step-by-step reasoning. Anthropic is the first AI lab to introduce a single model where users can balance speed and quality by choosing between standard thinking for near-instant responses or extended thinking or advanced reasoning. Claude 3.7 Sonnet is state-of-the-art for coding, and delivers advancements in computer use, agentic capabilities, complex reasoning, and content generation. With frontier performance and more control over speed, Claude 3.7 Sonnet is the ideal choice for powering AI agents, especially customer-facing agents, and complex AI workflows. Supported use cases: RAG or search & retrieval over vast amounts of knowledge, product recommendations, forecasting, targeted marketing, code generation, quality control, parse text from images, agentic computer use, content generation Model attributes: Reasoning, Text generation, Code generation, Rich text formatting, Agentic computer use
9
154
9,525
Guys he's sick and tired
Newsletter: I am sick and tired of everybody pretending that generative AI is the next big thing. The media is complicit in accepting fantastical nonsense - both in the numbers put out by OpenAI and the silly jobs created by Anthropic - it has to stop. wheresyoured.at/reality-chec…
17
5
151
16,762
When she finds out you're working at @PrimeIntellect
5
3
151
7,991
How many citations does Alex Wang have? I'll wait
I was laid off by Meta today. As a Research Scientist, my work was just cited by the legendary @johnschulman2 and Nicholas Carlini yesterday. I’m actively looking for new opportunities — please reach out if you have any openings!
9
2
142
25,928
Gary Marcus and his neurosymbolic essay having a bad morning
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
3
135
5,954
What's confusing is that somehow Satya is tolerating this sort of execution from his team Like does he enjoy Copilot being a punchline Suleyman hiring remains #1 headscratcher
Utterly confusing, funny and sad
5
136
6,865
All due respect my man, no one truly understands these models. So saying that you do, and much more that you refuse to engage in any meaningful empirical science with them, signals you aren't a serious commentator and shouldn't be seen as one.
Replying to @ChombaBupe
I don't need to sign up to ChatGPT to understand its limitations & strengths, I have read dozens of research papers by now on the limitations of decoder-only transformer models that power OpenAI models. And I am confident these models aren't intelligent.
10
131
16,166
What’s the over under for how long until this little Thinky Machines adventure becomes part of Anthropic
9
131
12,266
Replying to @martin_casado
Math Olympiad problems are hard fyi 99.9% of people would get a 0.0
10
131
10,579
$1B seems really low?
*APPLE FINALIZING PACT THAT WOULD PAY GOOGLE ROUGHLY $1B A YEAR *GOOGLE GEMINI AI MODEL TO HELP RUN SIRI FEATURES DUE IN 2026 *APPLE TO USE 1.2 TRILLION PARAMETER GOOGLE MODEL TO POWER SIRI
12
1
130
15,969
Weekend Twitter Summary - o3 is literally AGI manna from heaven - o3 has yet to create code that compiles - Gemini though! - Anthropic who?
7
2
125
4,393
Search / synthesis boys in shambles
8
124
6,967
Feed right now
8
4
129
5,917
I just want to note that Chomba has only ever tried the free version of ChatGPT (4o mini) but extrapolates that to representing not just the state of the art, but what is ever possible
Human engineers for example can predictably design software, planes, cars, computers etc an artist can predictably produce art & a cook predictably, cook. Generative models on the other hand? Good luck getting them to do what you asked for.
11
6
124
21,497
The real reason they are bringing these back is continuous learning You can directly store off the encodings as memories Think RAG but instead of embeddings as index, it's encodings as content
Introducing T5Gemma: the next generation of encoder-decoder/T5 models! 🔧Decoder models adapted to be encoder-decoder 🔥32 models with different combinations 🤗Available in Hugging Face and Kaggle developers.googleblog.com/en…
7
7
126
9,732
Get a good night's rest big day tomorrow See you on the flip side @apples_jimmy


14
2
126
9,100
While of academic interest, reminder in practice that for LLMs even this sort of nondeterministic behavior doesn't matter. If the prompt changes by as much as a space or a capital letter all bets are still off. This is the sort of nondeterminism that is still an issue.
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to prompt engineering. Here we share what we are working on and connect with the research community frequently and openly. The name Connectionism is a throwback to an earlier era of AI; it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains. thinkingmachines.ai/blog/def…
11
3
125
43,405
The most important thing to understand for continuous learning is that memory is a tool Agents must be RLed to learn how to use their own memories
12
3
123
9,053
If you can't find a way to get 100:1 ROI on Claude 4 for $200/month you're seriously ngmi
15
7
124
12,694
Never ask A woman her age A man his salary Gary Marcus to define what he means by "GPT-5 level"
9
3
119
7,321
You won't find a bigger AI optimist than Dario. All you idiots calling him a doomer is an intentional misunderstanding of his actions. As he's said many times, he's obsessed with AI safety because it's the only thing between us and utopia
6
4
117
4,805
Wonder what this is trying to get ahead of?
We’re releasing GPT-5-Codex — a version of GPT-5 further optimized for agentic coding in Codex. Available in the Codex CLI, IDE Extension, web, mobile, and for code reviews in Github. openai.com/index/introducing…
9
119
8,432
Replying to @ns123abc
Begins with Mustafa and ends with Suleyman
6
2
113
7,416
You only need to believe 3 things to believe in LLMs 1) predicting next token best requires world model 2) neural net can contain representation of world model 3) gradient descent can find this solution
it is 2025 and there are people who believe that accurately predicting the next token does not require understanding the underlying reality that created that token btw
6
6
113
6,943
Replying to @scaling01
Hint: Anthropic line being dead straight is no coincidence
4
108
8,972
My feed be alternating o1 beaten by Stanford researchers RLing GPT2 with HW answer key and VoodooFX card Grok3 expected to be worse than random number generator
4
2
105
3,718
It's on boys
Replying to @ainergiz
Next version insanely better is the plan
5
107
15,421
Replying to @dwarkesh_sp
April fools was the other day brother
2
109
5,214
Morning anon. You have 10 days to escape the permanent underclass
Anthropic just sent the next model, codenamed Neptune V6, to red teamers and launched a 10-day challenge with extra bonuses for confirmed universal jailbreaks
4
1
105
8,542
Happy Claude 4 day! Post your favorite Claude 4 Ws in the comments below. I'll print them out and hang them in the employee bathroom stalls (I have a private one (Claudine and Demis can use it though))
10
5
102
12,532
Corporate greed strikes again
BREAKING 🚨: Eggs Egg Prices have now collapsed 86% since the start of March 🥚🐔📉
7
1
104
9,936
Orthogonality thesis boys in shambles
Sonnet 4.5 is out! It’s the most aligned frontier model yet; a lot of progress relative to Sonnet 4 and Opus 4.1!
9
101
16,303