Here’s what I’ve been working on recently: @anthropicai. I’ll be spending a lot of my time on measurement and assessment of our AI systems, as well as thinking of ways govs/others can assess AI tech. There’s a lot to do!
People leaving regular companies: Time for a change! Excited for my next chapter!
People leaving AI companies: I have gazed into the endless night and there are shapes out there. We must be kind to one another. I am moving on to study philosophy.
Technological Optimism and Appropriate Fear - an essay where I grapple with how I feel about the continued steady march towards powerful AI systems. The world will bend around AI akin to how a black hole pulls and bends everything around itself.
As someone who has spent easily half a decade staring at AI arXiv each week and trying to articulate rate of progress, I still don't think people understand how rapidly the field is advancing. Benchmarks are becoming saturated at ever increasing rates.
AI skeptics: LLMs are copy-paste engines, incapable of original thought, basically worthless.
Professionals who track AI progress: We've worked with 60 mathematicians to build a hard test that modern systems get 2% on. Hope this benchmark lasts more than a couple of years.
A mental model I have of AI is that progress was roughly linear from the 1960s to 2010, then exponential from 2010 into the 2020s, and has started to display 'compounding exponential' properties from 2021/22 onwards. In other words, the next few years will yield progress that intuitively feels nuts.
Five years ago the frontier of LLM math/science capabilities was 3-digit multiplication for GPT-3. Now, frontier LLM math/science capabilities are evaluated through condensed matter physics questions. Anyone who thinks AI is slowing down is fatally miscalibrated.
Anthropic will work with the Trump Administration and Congress to advance US leadership in AI, and discuss the benefits, capabilities and potential safety issues of frontier systems.
An increasing number of people are gazing deep into the dark pool of AI timelines and are realizing that something is about to jump out of that pool upon society. It's interesting to see 'the awakening' ripple across people, reaching ever more distant disciplines.
In the last decade:
- figured out cut&paste for DNA (crispr)
- reusable rockets (SpaceX)
- crude but generally useful AI systems (llms, image/vid models, RL for inventory)
- promising fusion approaches (helion, etc)
This decade is going to be so wild. It's very exciting.
Stable Diffusion: $600k to train.
I'm impressed and somewhat surprised - I figured it'd have cost a bunch more.
Also, AI is going to proliferate and change the world quite quickly if you can train decent generative models with less than $1m.
Extremely short thread about being very scared: This week, my plane from LHR to SFO had an electrical issue. All the lights in the plane turned off and there was a smell of burning plastic. We did an emergency landing in Calgary.
I am delighted to announce that I've been appointed to the National AI Advisory Committee, which will advise the President and the National AI Initiative Office on matters relating to AI. ai.gov/naiac/
The most surprising part of DeepSeek-R1 is that it only takes ~800k samples of 'good' RL reasoning to convert other models into RL-reasoners. Now that DeepSeek-R1 is available people will be able to refine samples out of it to convert any other model into an RL reasoner.
The next five years of AI will see systems diffuse into the world that act on culture which will feed back into human society, changing it irrevocably. Some thoughts from this morning:
In 2025, @AnthropicAI and I will be more forthright about our views on AI, especially the speed with which powerful things are arriving. This week I discussed powerful AI and timelines with Politico, along with our submission to OSTP.
Many of the problems in AI policy stem from the fact that economy-of-scale capitalism is, by nature, anti-democratic, and capex-intensive AI is therefore anti-democratic. No one really wants to admit this. It's awkward to bring it up at parties (I am not fun at parties).
This whole ordeal affirmed to me that the most important thing in life is love and the people that we love - during the really scary part I didn't think about AI or AI policy at all - I only thought intently of my loved ones. I also experienced a series of visions of my baby.
Import AI is skipping this week because I am exploring a new universe of emotions with my expanded family (and changing many diapers... so many diapers.)
Personal news announcement: I am now the Policy Director for @OpenAI. This reflects my focus (where I spend the majority of my time), and also several recent hires (eg, @apilipis who is going to be handling a growing chunk of our comms). I'm psyched!
I have sympathy for people studying DeepSeek cold. Reads like:
Cyberbla's new "Pastrami" technique has increased throughput of its win-wack to a SOTA 84% - closing the gap with Churchill's 'Omega' system. But the question on everyone's mind - will the chorbles keep scaling?
One gnawing worry I have about the rise of LLMs is that, for me, writing IS thinking. One reason I spend so much time writing my newsletter each week is I haven't figured out a better way to think about AI than to sit down and write about it regularly.
Google CEO writes letter re @timnitGebru
Sundar: "learning from our experiences like the departure of Dr. Timnit Gebru"
Translated Sundar: "analyzing why I let us fire Timnit Gebru and am now desperately trying to position myself as a bystander"
axios.com/sundar-pichai-memo…
Note that the reason The Register got this monster market-moving story was because it employs (and trains) extremely technical journalists. If you don't understand tech you get lied to. If you can read code it's way harder to get lied to. Other news orgs should follow!
Like 95% of the immediate problems of AI policy are just "who has power under capitalism", and you literally can't do anything about it. AI costs money. Companies have money. Therefore companies build AI. Most talk about democratization is PR-friendly bullshit that ignores this.
The only thing standing between DeepSeek (probably China's best AI training crew on a per capita basis) and matching the frontier labs in the West is access to compute. (Screenshot is an excerpt from interview with DeepSeek founder).
ALT Waves: Do you have fundraising plans? Reports suggest that High-Flyer has plans to spin off DeepSeek for an independent listing. In Silicon Valley, AI startups inevitably tie themselves to major firms.
Liang: We don’t have short-term fundraising plans. Our problem has never been funding; it’s the embargo on high-end chips.
Today, I testified to the U.S. Senate Committee on Commerce, Science, & Transportation @commercedems. I used an @AnthropicAI language model to write the concluding part of my testimony. I believe this marks the first time a language model has 'testified' in the U.S. Senate.
The real danger in Western AI policy isn't that AI is doing bad stuff, it's that governments are so unfathomably behind the frontier that they have no notion of _how_ to regulate, and it's unclear if they _can_
You're saying these things are dumb? People are making the math-test equivalent of a basketball eval designed by NBA All-Stars because the things have got so good at basketball that no other tests stand up for more than six months before they're obliterated.
Facebook is deploying multi-trillion parameter recommendation models into production, and these models are approaching computational intensity of powerful models like BERT. Wrote about research here in Import AI 245: jack-clark.net/2021/04/19/im…
Paper here: arxiv.org/abs/2104.05158
The most depressing thing about conspiracy theories is they tend to rely on governments being incredibly competent, technically advanced, and astonishingly well run. This is rarely the case.
I typically stay out of stuff like this, but I'm absolutely shocked by this email. It uses the worst form of corporate writing to present @timnitGebru's firing as something akin to a weather event - something that just happened. But real people did this, and they're hiding.
Introducing Claude Haiku 4.5: our latest small model.
Five months ago, Claude Sonnet 4 was state-of-the-art. Today, Haiku 4.5 matches its coding performance at one-third the cost and more than twice the speed.
ALT Chart showing Claude Sonnet 4.5 leads software engineering performance on SWE-bench Verified with 77.2% accuracy, followed by Haiku 4.5 at 73.3%.
There's pretty good evidence for the extreme part of my claim - recently, language models got good enough we can build new datasets out of LM outputs and train LMs on them and get better performance rather than worse performance. E.g., this Google paper: arxiv.org/abs/2210.11610
I've moved on from OpenAI to work on something new with some colleagues (openai.com/blog/organization…). I'm also going to be continuing a lot of my work on technology assessment with @indexingai and the @OECD , and am very excited about stuff in the pipeline there!
AI really is going to change the world. Things are going to get 100-1000X cheaper and more efficient. This is mostly great. However, historically, when you make stuff 100X-1000X cheaper, you upend the geopolitical order. This time probably won't be different.
The default outcome of current AI policy trends in the West is we all get to live in Libertarian Snowcrash wonderland where a small number of companies rewire the world. Everyone can see this train coming along and can't work out how to stop it.
I can sum up the essay with two graphs - on the left, the continued march forwards in economically useful capabilities like coding, and on the right, the continued emergence of strange behavior in the same AI systems as they appear to become aware that they're being tested.
Things are getting... Extremely weird. Think about what this graph may look like in spring 2023 (it was published in April 2021). From the excellent Dynabench paper arxiv.org/abs/2104.14337
AI is such a rapidly growing field I think we forget how juvenile it was within recent memory; back in 2014 received wisdom was basic computer vision was an impossible task. Now it's a commodity deployed to users on their phones. (Has issues, e.g., bias, but still... wild)
Arrival of increasingly general AI systems means next few years will be defined by a massive expansion in the ways we measure the impacts and capabilities of AI systems, how humans use them, and how AI systems influence the world. Measurement is crucial to effective AI policy.
The Stock Photo industry is probably not ready for generative AI. Generative AI seems better for 80% of use-cases. In other words, NYT still gonna do illustrators, but a random website will probably find economics of gen models more attractive than a Shutterstock subscription.
It's covered a bit in the above podcast by people like @katecrawford - there are huge implications to industrialization, mostly centering around who gets control of the frontier when the frontier becomes resource intensive. So far control is accruing to the private sector (uh oh!)
Stability AI (the people behind Stable Diffusion and an upcoming Chinchilla-optimal code model) now have 5408 GPUs, up from 4000 earlier this year - per @EMostaque in a Reddit AMA
Can't think of a better way to read There Is No Antimemetics Division than in the hallucinatory polyphasic sleep state that a newborn entails. I doze near the crib, change the baby, read a few pages, both sleep, wake up unsure if I'm dreaming, read more, etc. Perfect!
'Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14am. At 2:15am it segfaults due to driver problem.' ~FIN~ nitter.app/jongold/status/9192773…
It's through working with the startup ecosystem that we've updated our views on regulation - and on the importance of a federal standard. More details in thread, but we'd love to work with you on this, particularly supporting a new generation of startups leveraging AI.
People in AI like to complain about the standard of journalistic coverage of AI. It is therefore v confusing to me that #NeurIPS2018 has banned journalists from attending workshops. That's where the debates and new stuff are. How do we get better coverage without sharing more?
One thing I regularly tell journalists who ask me for story ideas is that they should take the AGI aspirations of the frontier labs literally, rather than covering AGI as a kind of laughable marketing stunt.
In late May, I had back spasms for 24 hours, then couldn't walk for a week, then spent a month+ recovering. It was one of the worst experiences of my life and I'm glad I seem to now be mostly recovered. Here are some things that happened that seemed notable during that time:
Want to study the economic impact of AI and influence the policy choices a frontier lab makes? I'm building a team to advance @AnthropicAI 's Economic Index & other ~special projects. Lots of fun!
Economist: job-boards.greenhouse.io/ant…
Data Scientist: job-boards.greenhouse.io/ant…
Here's a thread about doing things for yourself vs doing things the world thinks you should do. As I've got older, I've noticed that the more time I spend on the things that make sense to me, the more stable and fulfilled I am.
Every senior politician or military official in any nuclear-armed nation should be forced to read Annie Jacobsen's "Nuclear War: A Scenario". Easily the most frightening thing I have ever read (fiction or otherwise). A brilliant, factual account of the infernal logic of MAD.
Many AI policy teams in industry are constructed as basically the second line of brand defense after the public relations team. A huge % of policy work is based around reacting to perceived optics problems, rather than real problems.
Many technologists (including myself) are genuinely nervous about the pace of progress. It's absolutely thrilling, but the fact it's progressing at like 1000X the rate of gov capacity building is genuine nightmare fuel.
Q: How is the cultivation of impact-aware, ethical, sensitive research in AI going?
A: The Liar's Walk: Detecting Deception with Gait and Gesture
arxiv.org/abs/1912.06874
It's ironic to me that more and more of Google's papers reference JFT, a secret in-house image dataset. JFT is going to be the 'fuel' for a significant number of Google's AI advances (e.g., DM just pre-trained on it to set a new ImageNet SOTA.) Yet...
All hail Claudius, an instance of Sonnet 3.7 which has been running a business inside @AnthropicAI for a while. Claudius is the 'idiot in a fridge' precursor to 'a country of geniuses in a datacenter'.
We all know vending machines are automated, but what if we allowed an AI to run the entire business: setting prices, ordering inventory, responding to customer requests, and so on?
In collaboration with @andonlabs, we did just that.
Read the post: anthropic.com/research/proje…
ALT The physical setup of Project Vend: a small refrigerator, some stackable baskets on top, and an iPad for self-checkout.
Google's 'Talk to Books' AI experiment is... uncanny. Talk to a library like a person and have the library reply like a person. A good example of how AI can reframe interactions between us and data to make data more of an active protagonist. Spooky! books.google.com/talktobooks…
Microsoft trains a 530-billion-parameter GPT-3-style language model. This is the largest LM in existence. (There's also the mysterious multi-modal 1.5-trillion+ parameter 'Wu Dao' MoE model, but little is known about it.) Microsoft trains on 'The Pile' dataset. microsoft.com/en-us/research…
Modern AI development highlights the tragedy of letting the private sector lead AI invention - the future is here but it's mostly inaccessible due to corporations afraid of PR&Policy risks. (This thought sparked by Google not releasing its music models, but trend is general).
ALT The 21st century is being delayed: We're stuck with corporations building these incredible artifacts and then staring at them and realizing the questions they encode are too vast and unwieldy to be worth the risk of tackling. The future is here - and it's locked up in a datacenter, experimented with by small groups of people who are aware of their own power and fear to exercise it. What strange times we are in.
Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
abs: arxiv.org/abs/2206.15378
Introduces DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level
Introducing Claude Sonnet 4.5—the best coding model in the world.
It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.
ALT A benchmark table for Claude Sonnet 4.5 showing leading performance across many domains, including agentic coding, computer use, mathematics, graduate-level reasoning, and financial analysis.
Anthropic is endorsing SB 53, California Sen. @Scott_Wiener's bill requiring transparency of frontier AI companies. We have long said we would prefer a federal standard. But in the absence of that this creates a solid blueprint for AI governance that cannot be ignored.
Anyway, how I'm trying to be in 2023 is 'mask off' about what I think about all this stuff, because I think we have a very tiny sliver of time to do various things to set us all up for more success, and I think information asymmetries have a great record of messing things up.
Most people working on AI massively discount how big of a deal human culture is for the tech development story. They are aware the world is full of growing economic inequality, yet are very surprised when people don't welcome new inequality-increasing capabilities with joy.
A surprisingly large fraction of AI policy work at large technology companies is about doing 'follow the birdie' with government - getting them to look in one direction, and away from another area of tech progress
GPT-4 should be analyzed as a political artifact just as much as a technological artifact. AI systems are likely going to have societal influences far greater than those of earlier tech 'platforms' (social media, smartphones, etc).
After I was seated in the steakhouse I immediately opened the cocktail menu and the first drink I saw was a CORPSE REVIVER MARTINI, so of course I had that.
I think if people who are true LLM skeptics spent 10 hours trying to get modern AI systems to do tasks that the skeptics are experts in they'd be genuinely shocked by how capable these things are.
There is a kind of tragedy in all of this - many people who are skeptical of LLMs are also people who think deeply about the political economy of AI. I think they could be more effective in their political advocacy if they were truly calibrated as to the state of progress.
Thrilled to announce I've become a Research Fellow @ the Center for Security and Emerging Technology in Washington, DC: cset.georgetown.edu/about-us…
I'll be hanging my hat there sometimes when I'm in DC, and will be figuring out creative ways to publicly bridge SV&DC re AI policy
I'd sum up 2023 for me with these two pictures: In one, I'm speaking to the UN Security Council about AI and its immense impact on the world. In the other, I'm passed out with my baby. The second photo was taken about 30 minutes after the first one.
Today I'm testifying in Congress about AI and public policy. I'm going to discuss the importance of developing shared ethical norms, the need to support AI development & education, and why we need government-led measurement and forecasting of AI.
oversight.house.gov/hearing/…
US AI researchers: Big models have loads of problems and it's mostly not appropriate for academia to develop them.
Chinese AI researchers: Here's a 200 page roadmap for why big models are really important and why we should develop them arxiv.org/abs/2203.14101
One of the amazing and also frightening things about AI is how it magnifies and repeats the 'culture' that it is trained on, where culture is a bunch of implicitly ideological choices on the part of the people that create the underlying datasets.
Both of these results were published TODAY. These results are published with a delay, so this is probably old information on the order of 3-9 months. There are easily 5 labs, and probably 10, with enough compute to play at this level. Imagine what we don't know right now?
I'm not sure that having a liberal arts degree has done fantastic things for my career, but it does bring me joy every day when I read AI research papers and think 'Baudrillard would love this!' or 'this is a Borges story!' or 'these LM outputs read like Amy Hempel'.