Excels at reasoning & tool use 🪄 Tensor-enjoyer 🧪 @METR_Evals. My COI policy is available under “Disclosures” at contextwindows.substack.com/…

Oakland, CA
Running list of conjectures about neural networks 📜:
6
13
172
41,554
If you’re bored with today’s neural net architectures, then my advice to you is to start training models to use an external memory, now that RL is finally working.
12
77
739
60,397
The normalization scheme that DeepMind researchers came up with for their "linear recurrent unit" (LRU) is a nice example of how it is possible to predictably engineer circuits in artificial neural networks, when you know what you're doing. A thread:
6
68
646
151,038
Just read their paper. Looks like they re-invented an existing method known as context distillation (or merely re-branded it for their startup). No mention of prior work, sadly. Links to papers in thread.
Announcing Bread Technologies. We’re building machines that learn like humans. We raised a $5 million seed round led by Menlo Ventures and have been building in stealth for 10 months. Today, we rise 🍞
21
17
499
90,987
We're working on it!
guys pleeease I need to see Sonnet 4.5 on this
12
5
468
69,438
Before you say “this isn’t surprising”… Yes, it is. We got people to preregister their expectations, and even folks who are extremely in-the-know about AI coding abilities still failed to predict this result. Your *vibes* are not reliable indicators of productivity effects.
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
23
31
424
24,493
METR: “What is my purpose?” ALL: “You put new AI models on The Graph.” METR: “Oh my god...”
guys pleeease I need to see Sonnet 4.5 on this
8
15
560
54,839
YES! If you initialize a LoRA layer based on the SVD of the original weight matrix (with its top singular values & vectors), you get significantly better fine-tuning results. This is a straight-up free lunch, as far as I can tell.
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models Significantly improved finetuned perf by simply changing the initialization of LoRA's AB matrix from Gaussian/zero to principal components of W repo: github.com/GraphPKU/PiSSA abs: arxiv.org/abs/2404.02948
6
33
319
32,329
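The initialization described in the PiSSA post above can be sketched as follows. This is a minimal illustration assuming NumPy and a toy random weight matrix; the function name `pissa_init` and the rank are hypothetical and not taken from the linked repo.

```python
import numpy as np

def pissa_init(W, r):
    """Split W into a rank-r adapter A @ B built from its top singular
    values/vectors, plus a residual that holds everything else."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_S = np.sqrt(S[:r])
    A = U[:, :r] * sqrt_S          # (d_out, r), trainable
    B = sqrt_S[:, None] * Vt[:r]   # (r, d_in), trainable
    W_res = W - A @ B              # kept frozen during fine-tuning
    return A, B, W_res

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 6))        # toy stand-in for a pretrained weight
A, B, W_res = pissa_init(W, r=2)

# At initialization the adapted layer reproduces the original weights.
reconstructed = W_res + A @ B
```

Unlike the usual Gaussian/zero LoRA initialization, the adapter here starts out carrying the principal components of `W`, while the sum `W_res + A @ B` still equals `W` before any training step.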
What excites me most about the rising tide of RNNs/SSMs is that it could let the fields of machine learning and computational neuroscience use the same modeling tools.
7
32
280
59,338
Note: sparse coding is an *established* method for disentangling representations. Anthropic did not invent it, nor did they claim to. If their new results seem surprising, now's a great time to revisit the older literature (Olshausen, Kanerva, etc.).
5
22
222
35,323
Wow! Papers from two different teams—one from academia and one from Google DeepMind—with the same finding: linear recurrence + local (sliding window) attention is your best bet if you want an efficient alternative to global attention.
Simple linear attention language models balance the recall-throughput tradeoff Recent work has shown that attention-based language models excel at recall, the ability to ground generations in tokens previously seen in context. However, the efficiency of attention-based models is
5
22
211
34,098
Stability changed the name of these models to "Stable Beluga 1/2" and quietly removed the sentence of the blog post that mentioned they used two unnamed LLMs to generate their dataset. (This likely means they used OpenAI models, in clear violation of ToS) web.archive.org/web/20230721…
14
36
202
88,669
We put the model in a test and then we steer the model away from thinking “I am in a test” and then we steer the model away from introspecting “I am being steered away from thinking «I am in a test»”
Problem: AIs can detect when they are being tested and fake good behavior. Can we suppress the “I’m being tested” concept & make them act normally? Yes! In a new paper, we show that subtracting this concept vector can elicit real-world behavior even when normal prompting fails.
4
8
206
32,161
A short thread of news 🧵 (1/3) I’ve joined the policy team at METR! Rapid AI changes will require measuring & addressing potential new threats far more quickly.
29
6
207
10,862
Funnily enough, this Anthropic co-founder gave a talk that Sonnet 4.5 can't engage with. Mentions of bioweapons trigger its safety filters.
Technological Optimism and Appropriate Fear - an essay where I grapple with how I feel about the continued steady march towards powerful AI systems. The world will bend around AI akin to how a black hole pulls and bends everything around itself.
10
10
201
18,930
If AI progress totally stalled today, most white-collar job tasks wouldn’t be automated within the next 5 years
"Even if AI progress totally stalls, it's sufficiently easy to collect data on all these different white collar job tasks that we should expect to see them automated within the next 5 years."
13
9
187
19,275
Prediction for 2024/2025: OpenAI showcases an AI assistant that controls a virtual desktop or browser to do a bunch of routine white-collar job tasks with minimal human correction. Public freakout in response to this is significantly more intense than it was for Sora or GPT-4.
13
12
170
13,208
Recently, I've seen lots of buzz about "entropy-based sampling" for LLMs, aka the "Shrek sampler". It's time to put your mana where your mouth is. I've tried to make the resolution criteria relatively objective, and won't bet on the market myself. Link in thread below.
9
10
162
18,186
> Transformers significantly outperform neural sequence models with recurrent or convolutional representations on ICLL tasks […] we provide evidence that their ability to do so relies on specialized “n-gram heads” (higher-order variants of previously-described “induction heads”)
4
23
160
29,112
Wait, so then it's no mystery why OpenAI's new base models are good at chess: they explicitly crafted the pretraining dataset to cover that! I presume whatever extra tuning they did to chat models wasn't focused on chess, so some of that was forgotten. @GrantSlatton @davidad
11
9
163
480,824
(This is me. I do this too!)
SILENCE frontier lab gpu poors are talking
6
3
156
6,360
Folks ask “Will the scaling laws keep holding, or will they bend?” This is a false dichotomy. If a scaling law keeps holding, it will bend. Chinchilla & other loss scaling trends are power laws *plus a constant offset* from an unknown (nonzero) minimum achievable task error.
6
6
154
49,308
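A small numerical sketch of the point above: if loss follows a power law plus a constant offset, the curve necessarily flattens on a log-log plot even though the law itself keeps holding. The constants `E`, `A`, and `ALPHA` are made up for illustration and are not fitted values from Chinchilla or any other paper.

```python
import math

# Hypothetical constants: irreducible loss, scale factor, exponent.
E, A, ALPHA = 1.7, 10.0, 0.3

def loss(c):
    """Power law in compute c plus a constant offset E."""
    return E + A * c ** (-ALPHA)

def loglog_slope(c, eps=1e-3):
    """Numerical d(log loss) / d(log c) around c."""
    lo, hi = c * (1 - eps), c * (1 + eps)
    return (math.log(loss(hi)) - math.log(loss(lo))) / (math.log(hi) - math.log(lo))

# Local slope at increasing compute budgets.
slopes = [loglog_slope(10.0 ** k) for k in (0, 4, 8, 12)]
```

The slope stays negative but shrinks toward zero as compute grows, which is the "bend": the reducible term decays while the offset `E` does not.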
Neural networks are associative memory machines par excellence. If you want to wire them by hand or to interpret them, this is important to know. (Diagram is mine, but the content is classic connectionist stuff, and probably goes back to at least the 1940s w/ McCulloch & Pitts)
5
12
140
16,494
A week ago, these were a few easy arguments for why the pace of AI progress is about to increase: “RL compute is just now scaling to match pre-training” and “AI is starting to make SWE/R&D go faster”. Grok 4 and the RCT from METR have made these arguments seem a little weaker now.
Grok 4 being trained on as much RL compute as pretraining compute is big if true. This seemed pretty inevitable but surprised to see it happen by mid-2025.
6
6
140
15,955
“Pre-training is still scaling! Pre-training is still scaling!” I continue to insist as I slowly reallocate my entire training cluster to RL
2
6
144
14,781
Researchers at FAIR were way ahead of their time working on this back in 2019! Excited to hear from more folks who are exploring cool new directions out of Meta
As part of our recent work on memory layer architectures, I wrote up some of my thoughts on the continual learning problem broadly: Blog post: jessylin.com/2025/10/20/cont… Some of the exposition goes beyond mem layers, so I thought it'd be useful to highlight separately:
1
9
138
19,788
“Orthogonalization” aka “that trick that jailbreaks Llama3 weights”. It’s actually a pretty neat training-free method to ablate a feature, lots of potential uses if it works well.
4
9
137
15,456
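One way to read the "orthogonalization" trick mentioned above is as projecting a feature direction out of every weight matrix that writes into the residual stream. The sketch below assumes NumPy and random toy data; the direction `r` is a stand-in for a feature direction found by some other method, and `orthogonalize` is a hypothetical helper name.

```python
import numpy as np

def orthogonalize(W, direction):
    """Remove the component along `direction` from everything W can output.

    W has shape (d_stream, d_in) and writes into the residual stream.
    """
    r = direction / np.linalg.norm(direction)
    return W - np.outer(r, r) @ W

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))   # toy matrix writing into a 16-dim stream
r = rng.normal(size=16)        # stand-in for a feature direction
W_abl = orthogonalize(W, r)

x = rng.normal(size=8)
leak = r @ (W_abl @ x)         # component of the output along r
```

No gradient steps are involved: the edit is a single projection applied to the weights, so the ablated model can no longer write along `r` through this matrix for any input.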
OK but the fact you can do RL on base model chains-of-thought—and it just works™️—is wild.
7
6
139
10,067
By default, we’ll see open-weight models catch up to this capability level within the next ~12 months. And then what?
OpenAI, Anthropic, and DeepMind all now say (in varying words) that absent mitigations, their models will be useful (i.e. there is "uplift") for malicious actors who want to make biological weapons, and are implementing precautions based on this concern.
11
7
140
17,744
I’m old enough to remember when some thought “scaling” meant “training bigger models”. That the future was quadrillion-parameter GPTs trained on Common Crawl. AFAICT few still hold that. Later it was retconned to just mean “doing whatever keeps improving performance”.
news: OpenAI's upcoming Orion model shows how GPT improvements are slowing down It's prompting OpenAI to bake in reasoning and other tweaks after the initial model training phase.
8
9
132
19,003
The Transformer's quadratic complexity won't kill it. What might is that, for long contexts, the KV cache ends up being huge, *even bigger than the weights*. Crossover point is when L×2×D×N = L×12×(D^2). Compute is cheap, but memory bandwidth is expensive. latent.space/p/transformers-…
Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther.
7
9
134
39,871
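The crossover formula in the KV-cache post above (L×2×D×N = L×12×D²) simplifies to N = 6×D. A quick arithmetic check, assuming standard multi-head attention (no grouped-query sharing), the same numeric precision for cache and weights, a single sequence, and ignoring embeddings; the layer count and width are illustrative values only.

```python
def kv_cache_elements(n_layers, d_model, n_tokens):
    """Keys and values: 2 vectors of size d_model per layer per token."""
    return n_layers * 2 * d_model * n_tokens

def weight_elements(n_layers, d_model):
    """Approximate block weights: 4*D^2 for attention + 8*D^2 for the MLP."""
    return n_layers * 12 * d_model ** 2

def crossover_tokens(d_model):
    """Context length at which the KV cache matches the weights in size."""
    return 6 * d_model

L, D = 40, 5120                  # illustrative shapes, not a specific model
N = crossover_tokens(D)          # 30720 tokens

# Ratio of cache size to weight size at a 128k-token context.
ratio_128k = kv_cache_elements(L, D, 131072) / weight_elements(L, D)
```

The layer count cancels out, so under these assumptions the crossover depends only on model width: beyond roughly 6×D tokens, each decoding step reads more cache than weights.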
Subtle point: there’s a huge difference between typical tasks from your job that take you 1 hour of work, and tasks that a brand new hire could do in their first hour on the job. Most “short” tasks you’ve done probably weren’t standalone: they depended on tons of prior context.
4
13
134
7,086
FYI: these policies would prohibit Meta from releasing Llama3 weights (specifically the 400B model).
Here are 5 policy recommendations for the upcoming AI Safety Summit in Seoul, from me and my colleagues at ICFG. In Bletchley, world leaders discussed major risks of frontier AI development. In Seoul, they should agree on concrete next steps to address them.
10
12
127
25,229
Some developers say AI is now a massive productivity booster. Are they right? @METR_Evals is running another study to measure this. HMU if you want to participate
i'm probably ten times more productive with help from AI now
17
14
135
21,449
Epoch AI posts, for dummies
Many AI leaders claim AI's value mainly will come from accelerating R&D—"geniuses in datacenters." This view has key flaws: R&D contributes less to economic growth & is harder to automate than believed. Most of AI's value will instead come from broad deployment in the economy.
1
9
133
7,252
Why are we instructing our LLMs in 50-line megaprompts? Weren’t structured control flow, subroutines, namespaces etc. invented like a half century ago?
9
6
126
29,682
Thinking in latent space? Oh I’ll show you how thinking in latent space feels alright
4
4
123
6,447
“The bitter lesson is we just needed to rebrand reward functions as verifiers” - Rich Sutton, probably
2
7
121
9,562
This looks legit. Attention heads tend to use the beginning of sequence for "null attention", so maintaining those tokens at the start of the KV cache allows for better sliding-window generation of long text. Can also be combined with long context tricks. arxiv.org/abs/2309.17453
4
11
121
21,336
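The cache policy described above (keep the first few tokens, slide a window over the rest) can be sketched in plain Python. The sizes `n_sink=4` and `window=8` are illustrative, and integers stand in for the per-token key/value tensors a real cache would hold.

```python
def evict(cache, n_sink=4, window=8):
    """Keep the first n_sink entries plus the most recent `window` entries."""
    if len(cache) <= n_sink + window:
        return cache
    return cache[:n_sink] + cache[-window:]

cache = []
for pos in range(20):
    cache.append(pos)      # stand-in for the (key, value) pair at position pos
    cache = evict(cache)

# cache is now [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

The cache size stays bounded at `n_sink + window` regardless of how long generation runs, while the initial tokens that heads use for "null attention" are never dropped.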
Contrary to claims SB 1047 would only impact AI megacorps, “covered models” include any non-derivative model that is as generally capable as circa-2024 frontier models. Algorithmic progress means in a matter of years, smaller players and even hobbyists *will* fall into its scope.
I support SB 1047: the regulation asks billion-$ tech companies to take reasonable precautions when training models with the greatest capability for misuse, poses few to no costs on other developers, and supports academic & open-source research through compute funding.
8
21
111
35,900
“Good Guys with AI will defend us against Bad Guys with AI.” OK but *who specifically* is gonna develop and deploy those defenses? The police? The military? AI companies? NGOs? You and me?
11
7
123
10,041
Much of the backlash to SB 1047 is best seen as an expression of negative partisanship against the AI Safety movement. For those folks, the key point is not “This bill has XYZ specific problems”, but rather “This whole campaign must be stopped, or else the Doomers win”
5
7
111
8,781
Replying to @Miles_Brundage
Hard to fault them when they can’t verify what the actual thing is
1
103
12,498
In Mamba, the selection mechanism has a knob to modulate the flow of time, via Δt. If an input sets Δt → 0, time is effectively frozen, so the state value is momentarily prevented from changing, which acts to "hold" or "latch onto" a memory. And Δt → ∞ fast-forwards to reset!
4
7
108
12,099
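The two limits described in the Mamba post above can be checked with a scalar state-space step. This is a sketch under simplifying assumptions: one state dimension, a zero-order-hold discretization, and fixed values `a = -1`, `b = 1`, whereas the actual model uses learned, input-dependent parameters.

```python
import math

def ssm_step(h, x, dt, a=-1.0, b=1.0):
    """One discretized step of dh/dt = a*h + b*x with step size dt."""
    a_bar = math.exp(dt * a)          # how much of the old state survives
    b_bar = (a_bar - 1.0) / a * b     # how much of the input is written
    return a_bar * h + b_bar * x

# dt -> 0: time is frozen, the state ignores the input and is held.
held = ssm_step(h=5.0, x=100.0, dt=1e-6)

# dt -> large: the old state decays away and is replaced by the input.
reset = ssm_step(h=5.0, x=100.0, dt=50.0)
```

With these constants the update reduces to a convex mix `exp(-dt) * h + (1 - exp(-dt)) * x`, so `dt` acts like a gate between "latch" and "overwrite".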
Researchers keep writing these papers with headline claims that “Transformers are X” or “Attention is Y”, with tiny disclaimers inside that they’re *really* just talking about linear attention, not the kind of attention that Transformers actually use.
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Presents Mamba-2, which outperforms Mamba and Transformer++ in both perplexity and wall-clock time arxiv.org/abs/2405.21060
7
5
104
20,972
Replying to @rom1504
Nobody asked the content authors. Many of them are objecting now, yet nothing is done. I think by default we should take an opt-in approach, where the author must choose to make their data broadly available as part of a corpus. Re: your question -> no, I don't mean that
3
4
88
2,203
Steering vectors are so strange to me. Like… there are many possible interventions! Why does just adding a vector everywhere work? And like… there are many possible ways of trying to make a vector elicit a behavior! Why does just diffing activations from contrast pairs work?
Steering vectors are fascinating, but they are such an inexact tool it seems epistemologically irresponsible to draw very strong conclusions about what's happening inside an AI model from experiments with them alone.
18
3
107
11,106
From my perspective, "Is it really *reasoning*?" and "Does it really have a *world model*?" and "Is that really *generalization*?" are fundamentally kind of confused. These ten-dollar words are ways of expressing normative judgments that a computation is useful-for-some-purposes.
.@TrentonBricken explains how we know LLMs are actually generalizing - aka they're not just stochastic parrots:
- Training models on code makes them better at reasoning in language.
- Models fine tuned on math problems become better at entity detection.
- We can just straightforwardly read the world-models developed by smaller NNs which are easier to interpret (Othello).
Transfer learning shows models are developing a deeper understanding of their data. Full episode out Thursday.
15
8
99
14,554
FYI: I now think SB 1047 is not a bad bill. It definitely isn’t my favorite approach, but given a stark choice between it and a random draw from the set of alternative AI regulatory proposals, I’d be picking it more often than not.
5
4
104
10,574
If you use a custom 20B token synthetic training dataset and don't release it for public scrutiny, I will just assume you trained your model on the test data, or on stuff derived from the test data.
How far does one billion parameters take you? As it turns out, pretty far!!! Today we're releasing phi-1.5, a 1.3B parameter LLM exhibiting emergent behaviors surprisingly close to much larger LLMs. For warm-up, see an example completion w. comparison to Falcon 7B & Llama2-7B
2
5
92
14,719
We now have an interactive version of the time horizons graph (and the raw data) up on the METR website!
Replying to @METR_Evals
You can now find most of our measurements at the top of the blog post below in an interactive chart. We plan to keep this view up-to-date, periodically adding to it whenever we have new time-horizon measurements to share. metr.org/blog/2025-03-19-mea…
1
8
97
10,038
Wild seeing the race to cobble together AI systems that make decisions:
- autonomously
- with brittle methods
- for reasons nobody understands
- daisy-chained across the Internet
- without any vigilance controls
- affecting people with no notice or consent
4
14
88
6,494
ArXiv is already a junkyard of preprints peddling promises of infinite memory—if only we would tweak the Transformer just a tad. Whenever you see a new one, the question to ask is always “Why this one?” This may be the one, but what makes this time different?
Google presents Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention 1B model that was fine-tuned on up to 5K sequence length passkey instances solves the 1M length problem arxiv.org/abs/2404.07143
10
3
95
12,108
If this is true, it seems kinda bad for activation interpretability? Like, interpreting activations seems like a much harder problem if the latents at each layer contain ~all the input-space structure (even structure that the model doesn’t use!)
LLMs are injective and invertible. In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space. (1/6)
11
3
94
9,857
Replying to @teortaxesTex
I love neuroscience and I appreciate bio-inspiration as an aesthetic but stuff like this is typically a giant grift.
1
87
3,131
Lock in your predictions: in 24 hours, will you look back on this post as substantially true or just self-promotion hype?
we just had an AI breakthrough in our lab robotics is about to have its ChatGPT moment and that moment is happening tomorrow
29
1
87
18,649
🚨 SB 1047 was just amended 🚨
- “Covered model” now means a model whose training is >10^26 FLOP and costing >$100M estimated worth of compute (inflation-adjusted)
- “Derivative model” now excludes models fine-tuned for >25% of the original training compute
(continued below ⤵️)
7
3
83
25,034
Exciting to see @WhiteHouse talk about [unobjectionable high-level goal] in the AI Action Plan, which really underscores why [my preferred policy idea] is so important!
Today the @WhiteHouse released America’s AI Action Plan to win the global race. We need to OUT-INNOVATE our competitors, BUILD AI & energy infrastructure, & EXPORT American AI around the world. Visit AI.gov
2
7
86
5,005
Replying to @deanwball
TFW you asked the genie for evidence-based AI policy, but forgot to also wish for AI policymakers who can tell good evidence from bad. Rookie mistake!
2
6
80
3,118
Feels notable that Anthropic, OpenAI, and Google were all able to quickly figure out massive Transformer context windows without anybody revealing their methods. And the open community is hot on their heels. All that secrecy wasn't worth much, apparently.
5
11
83
16,860
Replying to @teortaxesTex
You’re reading too much into this
2
83
2,098
If we somehow time-traveled a copy of GPT-4o back to 2004 and let a focus group of NeurIPS (then NIPS) attendees interact with it for 2 hours, what percent would endorse calling it “AGI” afterward? (Pretend it won’t give responses that would require knowledge of the then-future.)
<25%: 14%
25-50%: 16%
50-75%: 24%
>75%: 47%
1,700 votes • Final results
24
9
84
15,899
Replying to @rom1504
No. I would say we ML researchers should hold ourselves to a high standard of conduct, such that when people tell us they don't want us training on the content they authored, we respect their wishes.
5
5
70
3,081
How does Stability get to call StableVicuna "open source" when the model is derived from the not-open-source Vicuna, and is a not-open-source LLaMA tuned with ToS-encumbered data from the not-open-source GPT-3/ChatGPT?
12
5
81
20,130
Contrast pairs are overpowered. Once you have them, you can use them to generate control vectors, and to initialize classifiers, and to do RL/DPO, and probably more
Replying to @AnthropicAI
To make the probes, we track how the model’s internal state changes between “Yes” vs “No” answers to questions like "Are you doing something dangerous?" We use this info to detect when a sleeper agent is about to misbehave (e.g. insert a code vulnerability). It works quite well:
2
5
81
11,940
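A toy sketch of why contrast pairs go so far: one diff-of-means vector can serve both as a control (steering) vector and as the weight vector of a linear probe. Everything here is synthetic and assumes NumPy; the "activations" are random vectors with a planted concept direction, not outputs of a real model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_pairs = 32, 64

# Hypothetical ground-truth concept direction, used only to build toy data.
concept = rng.normal(size=d)
concept /= np.linalg.norm(concept)

# Each pair shares a prompt-specific component and differs along the concept.
shared = rng.normal(size=(n_pairs, d))
pos = shared + 2.0 * concept + 0.5 * rng.normal(size=(n_pairs, d))
neg = shared - 2.0 * concept + 0.5 * rng.normal(size=(n_pairs, d))

# Diff of means: the shared component cancels, the concept remains.
v = pos.mean(axis=0) - neg.mean(axis=0)
cosine = v @ concept / np.linalg.norm(v)

# Use 1: control vector, added to an activation to push it toward "pos".
def steer(h, alpha=1.0):
    return h + alpha * v

# Use 2: linear probe initialized from the same vector, thresholded midway.
threshold = 0.5 * (pos.mean(axis=0) + neg.mean(axis=0)) @ v

def probe(h):
    return h @ v - threshold
```

Pairing matters in this toy setup because the prompt-specific variation is shared within each pair and drops out of the difference, leaving mostly the planted direction.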
Transformer is seemingly now the all-around heavyweight champion. Doesn't matter whether autoregressive or diffusion, text or image or video or robotics/multimodal, unsupervised or supervised or RL ...
Stability AI announces Stable Diffusion 3 most capable text-to-image model, utilizing a diffusion transformer architecture for greatly improved performance in multi-subject prompts, image quality, and spelling abilities. Prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy
4
12
80
17,357
This syncretism of rhetoric from the AI Safety movement and China-hawks unsettles me. It feels like a kind of unholy alliance in the making …
How a US/China superintelligence arms race will play out: “The CCP is going to have an all-out effort to infiltrate American AI labs. Thousands of people, the full force of the Ministry of State Security. There's an enormous incentive for a first strike.” @leopoldasch
6
1
75
10,849
It’s like LoRA and control vectors had a baby!
ReFT: Representation Finetuning for Language Models 10x-50x more parameter-efficient than prior state-of-the-art parameter-efficient fine-tuning methods repo: github.com/stanfordnlp/pyref… abs: arxiv.org/abs/2404.03592
1
9
77
9,081
This was funny when the hacked accounts were just random individuals, but OpenAI’s new official newsroom account getting taken over by crypto-spammers is just a real bad look.
4
2
81
4,151
linear probing right now
GDM Mech Interp Update: We study if SAEs help probes generalise OOD (they don't 😢). Based on this + parallel negative results on real-world tasks, we're de-prioritising SAE work. Our guess is that SAEs aren't useless, but also aren't a game-changer More + new research in 🧵
4
81
2,751
Llama4 appears to be here.
5
6
81
10,565
Excited to try this out! (Though I'm kinda doubtful it'll be better than Hedgehog) It's basically just linear attention on top of queries & keys that have been passed through a LayerNorm -> elementwise squaring.
Linear Transformers with Learnable Kernel Functions are Better In-Context Models Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Current innovations, including State Space Models, were initially celebrated for surpassing Transformer performance on language modeling tasks. However, these models have revealed deficiencies in essential In-Context Learning capabilities - a domain where the Transformer traditionally shines. The Based model emerged as a hybrid solution, blending a Linear Transformer with a kernel inspired by the Taylor expansion of exponential functions, augmented by convolutional networks. Mirroring the Transformer's in-context adeptness, it became a strong contender in the field. In our work, we present a singular, elegant alteration to the Based kernel that amplifies its In-Context Learning abilities evaluated with the Multi-Query Associative Recall task and overall language modeling process, as demonstrated on the Pile dataset.
3
9
71
21,457
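The recipe summarized above (linear attention over queries and keys passed through LayerNorm, then elementwise squaring) can be sketched as a recurrent update. This assumes NumPy, a single head, and a LayerNorm without learnable scale or shift; it illustrates the general technique and is not the paper's implementation.

```python
import numpy as np

def phi(x, eps=1e-5):
    """Feature map: LayerNorm over the last axis, then elementwise square."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return ((x - mu) / np.sqrt(var + eps)) ** 2

def causal_linear_attention(Q, K, V):
    """Causal linear attention with a constant-size running state."""
    q, k = phi(Q), phi(K)
    S = np.zeros((k.shape[-1], V.shape[-1]))   # running sum of k_t v_t^T
    z = np.zeros(k.shape[-1])                  # running sum of k_t
    out = np.empty_like(V)
    for t in range(len(Q)):
        S += np.outer(k[t], V[t])
        z += k[t]
        out[t] = (q[t] @ S) / (q[t] @ z)       # normalized readout
    return out

rng = np.random.default_rng(0)
T, d_k, d_v = 10, 8, 4
Q, K, V = (rng.normal(size=(T, d_k)), rng.normal(size=(T, d_k)),
           rng.normal(size=(T, d_v)))
out = causal_linear_attention(Q, K, V)
```

Because the squared features are non-negative, the attention weights are non-negative too, and the state `(S, z)` has a fixed size no matter how long the sequence gets.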
Can somebody with a cybersecurity background weigh in on how big of a deal this is? Just finished the report, but I didn’t feel like I learned much from it.
Replying to @AnthropicAI
We believe this is the first documented case of a large-scale AI cyberattack executed without substantial human intervention. It has significant implications for cybersecurity in the age of AI agents. Read more: anthropic.com/news/disruptin…
24
4
146
37,783
We used to have vectorized LISP running on massively-parallel hardware that looked like this REMEMBER WHAT THEY TOOK FROM YOU
symbolic AI is going to make large hoards of compute obsolete
3
9
73
10,538
I used to *love* sneering at @GaryMarcus and his takes on AI progress. Something shifted when I started building products w/ LLMs in my day job. I started seeing more vividly why reliability matters, and how the current zeitgeist is hurting itself making promises we can't keep
4
4
65
10,067
This is basically DPO without preference labels! Simply assume the supervised responses to prompts are better than the model's responses to those same prompts. Similar to the trick Intel used for Neural Chat, where they assumed GPT-4 responses > Llama2 responses.
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models Significantly improves the LLM’s performance across a variety of benchmarks and even outperforms models trained through DPO with extra GPT-4 preference data arxiv.org/abs/2401.01335
5
8
70
10,945
Highly recommend this 3-hour video. Makes me feel jealous of the researchers who get to explore model internals!
Replying to @NeelNanda5
We discuss their papers showing that model diffing is unexpectedly easy when fine-tuning in a narrow domain, and on finding and fixing flaws with crosscoders, a sparse autoencoder based approach Video: piped.video/VQ_7zLXHf3s
5
73
15,884
This is good news for future open-weight model releases, I think. It implies that even as developers cross their bio-risk capability thresholds, there is a way they can keep releasing fine-tunable model weights that don’t rely on refusals.
Thought real machine unlearning was impossible? We show that distilling a conventionally “unlearned” model creates a model resistant to relearning attacks. 𝐃𝐢𝐬𝐭𝐢𝐥𝐥𝐚𝐭𝐢𝐨𝐧 𝐦𝐚𝐤𝐞𝐬 𝐮𝐧𝐥𝐞𝐚𝐫𝐧𝐢𝐧𝐠 𝐫𝐞𝐚𝐥.
8
4
70
5,535
OPINION: we should probably move away from training AI systems on datasets like LAION-400M/5B and Books3, fair use aside. (I say this as someone who knows the folks that collected those datasets & who thinks they deserve credit for doing uncelebrated but very impactful work.)
6
3
55
11,573
> see new Transformer contender > query is a learned, fixed vector > no other RNN baselines > no language modeling experiments
Attention as an RNN abs: arxiv.org/abs/2405.13956 "attention can be viewed as an RNN with the special ability to compute its many-to-one RNN output efficiently" Proposes Aaren, a new module that can be trained in parallel (like Transformers) but also be efficiently updated at inference time, thereby requiring only constant memory (like RNNs).
3
67
5,645
Worried about the future of openness in AI? Here is a way to help: We're putting together a public list of all the good work that's been enabled by open-weight foundation models, to show why transparency & public scrutiny is worth protecting. ⬇️ Links below ⬇️
3
12
63
10,836
If we can detect an LLM is copying from a span of context (à la induction heads), couldn't we then grab the rest of the span and run it through the model in parallel (à la speculative sampling)? Could be an easy win for tasks that call for in-context retrieval...
4
5
64
14,137
As evidence of this, the California state legislature is considering another AI bill, AB 3211. That bill would have far worse impacts on tech companies and open-source, as reported by observers like @deanwball, @TheZvi, & @binarybits. Yet it’s produced almost no real opposition.
5
5
67
5,716
ICYMI: this interviewee confirms speculations that OpenAI’s Fine-tuning API uses LoRA under the hood. Around the 43.5 minute mark.
🆕 @latentspacepod: Is finetuning GPT4o worth it? w/ @AlistairPullen of @cosine_sh Betteridge's law says no: with 59 different flavors of RAG, and >2million token context + prompt caching, it's reasonable to believe that "in context learning is all you need". But Genie is the first to make a huge bet finetuning @OpenAI GPT4o for code at the largest scale it has ever been used externally; resulting in what is now the #1 coding agent in the world according to SWE-Bench Full (30%), Lite (50%), and Verified (40%), by a country mile. Most finetuning is in the <100m token range. It's no surprise that the results aren't that gamechanging. We delve into the process of wandering the idea maze with YC, working with @john__allard and co, and creating billions of tokens of synthetic code data from real user logs and purposefully sabotaging ASTs to create reasoning traces that exhibit: - Perfect info lineage - Incremental knowledge discovery - Step by step decision making Enjoy! Full pod link below.
5
5
59
17,959
Replying to @jxmnop
but jack
1
60
1,522
This is earth-shattering news. The "hard problem" of mechanistic interpretability has been solved. The formal/cautious/technical language of most ppl commenting on this obscures the gravity of it. What this means -> not just AGI, but *safe* *superintelligence* is 100% coming🧵
2
3
57
4,640
IDK who needs to hear this but the "70k unused embeddings for multimodal extensions" line item is pure filler. If they weren't used during training, they just contain random noise. You could've added those extra rows to the embedding matrix yourself, for the same effect.
4
4
60
15,592
Evaluation is hard! This goes for AI just as with us. In games like chess and Go, evaluation is easy, which allows for tight feedback loops and rapid self-improvement. But in rich domains, the bottleneck IS evaluation (doing experiments, peer review, &c.) anthropic.com/index/evaluati…
4
2
61
10,985
“intelligence fizzle”: when AI is used for AI R&D but this produces insufficient returns for an intelligence explosion from fixed inputs see also: subcritical intelligence reaction
2
2
62
4,529
Replying to @jxmnop
This chart is computed based on a specific distribution of SWE/MLE-oriented tasks that METR developed, namely SWAA + RE-Bench + HCAST. Appendix D of the HCAST paper has summaries of the tasks. Here are the ones in the ~4 hour range. metr.org/hcast.pdf
4
1
63
14,477
Replying to @jeremyphoward
This account has made similar wild claims & promises before. I don't put much weight in stuff they post anymore
3
59
8,941
What I mean is "can perform complex reasoning" wait nvm I meant "can win at strategic games" wait nvm I meant "can understand human language" wait nvm I meant "can automate economically-valued office tasks" wait nvm I meant "can assist in scientific discovery" wait nvm I meant
3
4
57
3,560
Are AI systems best described as tools, as an alien species, or as our mind-children? I think this is something of a litmus test for broader views.
26
3
56
16,145
Read this post. It describes—in better words than I've ever found—a shift in paradigm within ML in recent years, towards an "industrial" one based on predictable input-output relations. Lots of great lines, some of which I'll quote below (h/t @gleech) nostalgebraist.tumblr.com/po…
3
6
60
7,538
And *then* he said...
Before we scramble to deeply integrate LLMs everywhere in the economy, can we pause and think whether it is wise to do so? This is quite immature technology and we don't understand how it works. If we're not careful we're setting ourselves up for a lot of correlated failures.
5
60
7,642
If you think of model internals as a kind of “biology”, then you can think of steering vectors as early and extremely basic “pharmaceuticals”. Within this metaphor, it’s no surprise that they often produce unintended side effects!
A number of people have asked me why we titled our recent paper "On the Biology of a Large Language Model". Why call it "biology"?
2
2
59
4,196
Rather than trying to "solve" superposition & to always explain/predict/control neural network computations using the same units of analysis, consider a more "Hopfieldian" lens, where representational spaces rule (via dynamics at multiple valid scales) piped.video/cl_Wa7CGm7A?si=8bWl…
2
11
58
6,535
If o3 really is just a GPT trained using RL to do long-form thinking… how will you adjust, as an AI researcher? How will you avoid ending up like one of those soldiers who thought they were still fighting WWII for years after peace had been declared?
7
56
4,691