Co-founder and Chief Compute Officer @AnthropicAI

SF
(1/4) Learning ML engineering is a long slog even for legendary hackers like @gdb. IMO, the two hardest parts of ML eng are: 1) Feedback loops are measured in minutes or days in ML (compared to seconds in normal eng) 2) Errors are often silent in ML
How I became a machine learning practitioner: blog.gregbrockman.com/how-i-… (Spoiler alert: you can too!)
Training/eval'ing GPT-3 involved a bunch of gnarly distributed system problems (which I love, but are an acquired taste tbh). The API hides those messy details so you can use normal python w/ a tight feedback loop. Gave me the same tingles as switching from TF to pytorch 😊
We're releasing an API for accessing new AI models developed by OpenAI. You can "program" the API in natural language with just a few examples of your task. See how companies are using the API today, or join our waitlist: beta.openai.com/
An illustration of @OpenAI Universe. Each dot is a task in task space. You can measure the power of an AI by the range of tasks it solves.
ML dev speed hack #0 - Overfit a single batch - Before doing anything else, verify that your model can memorize the labels for a single batch and quickly bring the loss to zero - This is fast to run, and if the model can't do this, then you know it is broken
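A minimal sketch of the single-batch check, assuming PyTorch. The tiny MLP, the random batch, the learning rate, and the step count are all made up for illustration; swap in the real model and one real batch.

```python
import torch
from torch import nn

torch.manual_seed(0)

# One small, fixed batch: 8 examples, 4 features, 3 classes (all invented).
inputs = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))

# Stand-in model and optimizer; these are assumptions, not a recommendation.
model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()


def overfit_single_batch(steps: int = 1000) -> float:
    """Train on the same batch over and over and return the last loss seen."""
    loss = torch.tensor(float("inf"))
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()
    return loss.item()


final_loss = overfit_single_batch()
print(f"final loss on the memorized batch: {final_loss:.4f}")
```

If the loss stalls well above zero here, the bug is probably in the model, the loss, or the label plumbing rather than in the data or the hyperparameters.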
My personal experience with GPT-3 is similar to Max's. The model's surprisingly capable, but still has many weaknesses (which we tried our best to point out in the GPT-3 paper). I expect the future to be shiny, but getting there will need a lot of work from the whole community.
New blog post up: so, you've probably seen all the tweets about GPT-3. GPT-3 is objectively a step forward in the field of AI text-generation, but the current hype on VC Twitter misrepresents the model's current capabilities. GPT-3 isn't magic. minimaxir.com/2020/07/gpt3-e…
ML dev speed hack #2 - Assert tensor shapes - Wrong shapes due to silent broadcasting or reduction are an extreme hot spot for silent errors; asserting on shapes (in torch or TF) makes them loud - If you're ever tempted to write shapes in a comment, make an assert instead
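A small sketch of how a shape assert turns a silent broadcast into a loud error, assuming PyTorch. The function name `mse_loss_checked` and the (batch, 1) convention are hypothetical, chosen only for this example.

```python
import torch


def mse_loss_checked(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean squared error that insists both tensors are (batch, 1)."""
    batch = pred.shape[0]
    assert pred.shape == (batch, 1), pred.shape
    assert target.shape == (batch, 1), target.shape
    return ((pred - target) ** 2).mean()


pred = torch.zeros(4, 1)  # model output, shape (4, 1)
target = torch.ones(4)    # labels, shape (4,): the trailing dim is missing

# Without an assert, subtraction broadcasts to (4, 4) and raises no error.
silent = pred - target
print(silent.shape)  # torch.Size([4, 4])

# With the assert, the same mistake fails immediately.
try:
    mse_loss_checked(pred, target)
except AssertionError as err:
    print("shape check failed:", err)

# After fixing the label shape, the loss is computed as intended.
print(mse_loss_checked(pred, target.unsqueeze(1)).item())  # 1.0
```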
Excited to get to work with AWS and Annapurna Labs on optimizing Trainium from silicon to software. Our team’s been having fun going deep into the Neuron stack to get as close as possible to 100% peak theoretical performance.
We're expanding our collaboration with AWS. This includes a new $4 billion investment from Amazon and establishes AWS as our primary cloud and training partner. anthropic.com/news/anthropic…
We've long had a culture of pair-programming at Anthropic, with one engineer as the Driver and one as the Navigator. It's been interesting to watch Claude rapidly becoming proficient in the Driver role. We're hiring for great Navigators :)
Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.
I love these new models. Excited to see how the world will put them to work.
Today, we're announcing Claude 3, our next generation of AI models. The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.
(2/4) Most ML people deal with silent errors and slow feedback loops via the "ratchet" approach: 1) Start with known working model 2) Record learning curves on small task (~1min to train) 3) Make a tiny code change 4) Inspect curves 5) Run full training after ~5 tiny changes
This is awesome! Language models do a form of data compression, so they can help people who have limited bandwidth from their bodies due to mobility issues.
Typing using only 4 keys is challenging! This is my first go at making a semantic keyboard, which works by guiding a language model to write a text for you. Using GPT-3:
Our work on the Adversarial Patch covered by @BBC. Glad to see mainstream media interested in ML security. Not sure what's going on with that photoshopped toast... bbc.com/news/technology-4255…
ML dev speed hack #1 - PyTorch over TF - Time to first step is faster b/c no static graph compilation - Easier to get loud errors via assertions within the code - Easier to drop into debugger and inspect tensors (TF2.0 may solve some of these problems but is still raw)
Now seems like a good time to mention that we’re always looking for ways to more efficiently turn raw compute into useful safety research. If you know of great software engineers who are interested in building big machines then have them message me at tom@anthropic.com
We’ve raised $580 million in a Series B. This will help us further develop our research to build usable, reliable AI systems. Find out more: anthropic.com/news/announcem…
Excited to share what I've been working on for the last few months! If you're interested in scaling laws and safety (or scaling laws *𝗳𝗼𝗿* safety) then check out our careers page: anthropic.com/#careers
Immensely proud of this work by the team at Anthropic. Incentives matter, and this sets up the incentive to solve safety problems so that we can scale further.
Today, we’re publishing our Responsible Scaling Policy (RSP) – a series of technical and organizational protocols to help us manage the risks of developing increasingly capable AI systems.
I like this model.
Introducing Claude 3.5 Sonnet—our most intelligent model yet. This is the first release in our 3.5 model family. Sonnet now outperforms competitor models on key evaluations, at twice the speed of Claude 3 Opus and one-fifth the cost. Try it for free: claude.ai
I encourage y’all to read (or at least skim) the paper. I’m really proud to have had a part in creating this work over the last 18 months and am glad to get to share it with you. Paper: arxiv.org/abs/2005.14165 Samples & Data: github.com/openai/gpt-3 (12/12)
Our new paper: "Is Generator Conditioning Causally Related to GAN Performance?" TLDR: "Almost certainly" arxiv.org/abs/1802.08768
Learning Day! Today I'll be learning about GPU kernel programming by going through the Numba tutorials and writing some CUDA kernels from scratch. 🍿🌽🌰 <= my kernels
We've recently rolled out Learning Day on OpenAI's policy team and it's wonderful. Today I'll be reading a book on tech transfer initiatives between West and USSR during the 20th century. Ask me about the Gorky autoplant for a good time, comrades!
Wanted to give credit to @colinraffel for this excellent summary thread for T5. I really appreciate having an overview before diving into the nitty gritty of a paper, and I used this as inspiration to do my own summary thread yesterday.
I now suspect that I have worked with several spies
2^42 = 4.398 trillion. Math checks out 👌✨
Extrapolating the spectacular performance of GPT3 into the future suggests that the answer to life, the universe and everything is just 4.398 trillion parameters.
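For anyone who wants to check the arithmetic themselves, a two-line sketch in plain Python:

```python
params = 2 ** 42
print(params)                           # 4398046511104
print(f"{params / 1e12:.3f} trillion")  # 4.398 trillion
```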
Everybody dance now! Great execution on this pose-transfer GAN project from Caroline Chan et al. at @berkeley_ai arxiv.org/pdf/1808.07371.pdf Their paper is quite simple and easy to follow. I'll mention three cool tricks that they used to get their results.
Progress on making the "inscrutable matrices" inside of Transformers more understandable! It seems like this technique is now "shovel ready" for engineers who want to work on scaling it up on frontier LLMs.
If you'd asked me a year ago, superposition would have been by far the reason I was most worried that mechanistic interpretability would hit a dead end. I'm now very optimistic. I'd go as far as saying it's now primarily an engineering problem -- hard, but less fundamental risk.
Really enjoyed this convo about our journey so far and where we're headed. Especially fun reminiscing about the early days and the origin stories for each of us.
Our co-founders discuss the past, present, and future of Anthropic. Timestamps: 00:00 Why work on AI? 02:08 Scaling breakthroughs 10:57 Sentiment shifting 18:30 The Responsible Scaling Policy 30:42 Founding story 39:08 Racing to the top 43:43 Looking to the future
ML dev speed hack #4 - Use ipdb.set_trace() - It's hard to make an ML job take less than 10 seconds to start, which is too slow to maintain flow - Using the ipdb workflow lets you zero in on a bug and play with tensors with a fast feedback loop
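One way this can look in practice, as a sketch: stop inside the already-running job at the moment something goes wrong, instead of restarting it. The helper name `check_loss` and the `debug` flag are hypothetical; `ipdb` is a third-party package and is only imported when the breakpoint actually fires.

```python
import math


def check_loss(step: int, loss: float, debug: bool = False) -> float:
    """Return the loss; if it is NaN or inf and debug is on, open ipdb."""
    if debug and not math.isfinite(loss):
        import ipdb  # third-party: pip install ipdb

        # Execution pauses here with `step` and `loss` in scope, so the
        # surrounding state can be inspected without a slow restart.
        ipdb.set_trace()
    return loss


# With debug off (the default) this is a plain pass-through.
for step, loss in enumerate([0.9, 0.5, 0.3]):
    check_loss(step, loss)
```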
ML dev speed hack #3 - Add ML test to CI - If there is more than one entrypoint or more than one person working on the codebase, then add a test that runs for N steps and then checks the loss - If you only have one person and one entrypoint, then an ML test in CI is probably overkill
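A rough sketch of such a test. The toy `train` function below is a stand-in written in plain Python so the example runs anywhere; in a real codebase the test would call the actual training entrypoint, and the step count and loss threshold are assumptions to tune per project.

```python
import random


def train(num_steps: int, lr: float = 0.1, seed: int = 0) -> list:
    """Fit y = 2x + 1 with a two-parameter linear model; return loss per step."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(32)]
    ys = [2.0 * x + 1.0 for x in xs]
    w, b = 0.0, 0.0
    losses = []
    for _ in range(num_steps):
        errs = [(w * x + b) - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errs) / len(errs))
        # Full-batch gradient descent on mean squared error.
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / len(errs)
        grad_b = 2.0 * sum(errs) / len(errs)
        w -= lr * grad_w
        b -= lr * grad_b
    return losses


def test_loss_decreases():
    """Run for N steps, then check that the loss ended up low."""
    losses = train(num_steps=200)
    assert losses[-1] < losses[0]
    assert losses[-1] < 0.01
```

A fixed seed keeps the test deterministic, so a failure points at a code change rather than at noise.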
ML dev speed hack #5 - Use nvvp to debug throughput - ML throughput (step time) is one place where we have the tools to make errors loud and feedback fast - You can use torch.cuda.nvtx.range_push to annotate the nvvp timeline to be more readable
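A sketch of the annotation pattern, assuming PyTorch. `torch.cuda.nvtx.range_push` and `range_pop` are the real calls; the `nvtx_range` wrapper, the toy model, and the range names are hypothetical, and the wrapper is a no-op on machines without CUDA.

```python
from contextlib import contextmanager

import torch


@contextmanager
def nvtx_range(name: str):
    """Mark a named region on the profiler timeline; no-op without CUDA."""
    use_nvtx = torch.cuda.is_available()
    if use_nvtx:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if use_nvtx:
            torch.cuda.nvtx.range_pop()


model = torch.nn.Linear(16, 4)
x = torch.randn(8, 16)

# Each block shows up as a labeled span when the job runs under the profiler.
with nvtx_range("forward"):
    out = model(x)
with nvtx_range("backward"):
    out.sum().backward()
```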
Replying to @jackclarkSF
I wonder what might be "practicing the scales" for ML research engineering? - write minimal versions of core papers (xformer, gan, vae) - profile them and make faster - write core ops (like xent-loss) in pytorch and make them numerically stable - write simple GPU ops
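As an illustration of the "numerically stable xent-loss" exercise, here is a sketch in plain Python rather than PyTorch (an assumption made so it runs with only the standard library). It uses the usual log-sum-exp trick: subtract the max logit before exponentiating.

```python
import math


def cross_entropy(logits, target):
    """Compute -log softmax(logits)[target] via a shifted log-sum-exp."""
    m = max(logits)
    # After the shift every exponent is <= 0, so exp() cannot overflow.
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# A naive version would call math.exp(1000.0) and raise OverflowError.
print(cross_entropy([1000.0, 0.0], 0))  # 0.0
```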
IMO, ML researchers outside of the defense community underestimate how hard adversarial examples will be to solve. Two reasons: 1) Research runs at the speed of the conference cycle. A broken defense gets accepted to NIPS, but the community doesn't know it's broken until ICML.
Adversarial examples for the human visual system: An image that when viewed for a fraction of a second looks like X, but on reflection you realize it's Y.
Adversarial examples that fool both human and computer vision arxiv.org/abs/1802.08195
(4/4) Within the ratchet approach, I want more tools and best practices for making feedback loops shorter and for making errors louder. Below is a short list of development speed hacks that I have found useful.
So many flips on my twitter feed 🙃 Proud to be involved in the safety collaboration between @OpenAI and @DeepMindAI blog.openai.com/deep-reinfor…
Listening to this singing neural net over the last few months has dramatically increased my appreciation for country music 🤠
Introducing Jukebox, a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles. We're releasing a tool for everyone to explore the generated samples, as well as the model and code: openai.com/blog/jukebox/
I was lucky enough to work with Colin as I was starting out in ML research. For anyone who's interested in doing a PhD @unccs, I highly recommend talking with him. Congrats, Prof Raffel!
Learning Day rules!! 📚🍎 People post what they're learning in Slack, and if there are others who are interested then we can learn together. We often learn faster together because each person knows a different piece of the full story.
Each Thursday at OpenAI is Learning Day: a day where employees have the option to self-study technical skills that will make them better at their job but which aren't being learned from daily work. Here's how it works: openai.com/blog/learning-day…
I know I’m like ten years late to the party, but guys, Anki is really good! Breaking down tough ideas into tiny questions means that I notice which parts I’m confused about.
Replying to @johnschulman2
Excited to get to work with you again, Joschu!
I'm still optimistic that there will be a solution to our contest. Three reasons: 1) Ensembles of humans make robust decisions on simple tasks 2) There are lots of things to try, and I've been consistently surprised by the speed of ML progress 3) This task is really really easy
This stuffed-giraffe-resistant hand from my roboticist friends at @openai brings joy to my heart. Problem: Deep learning has trouble generalizing outside the training set. Solution: Put WAY MORE STUFF in the training set.
Replying to @OpenAI
We’re all used to robots that fail when their environment changes unpredictably. Our robotic system is adaptable enough to handle unexpected situations not seen during training, such as being prodded by a stuffed giraffe:
Curious what other folks recommend for speeding up ML development feedback loops and for making errors louder.
Cool, thanks for sharing. Makes perfect sense that BPE would mess up these tasks. Can't believe we missed running these experiments for the paper!
Just released a little app. meetanotherday.com - A fake meeting scheduler for people with too many real meetings
New study on algorithmic efficiency trends by @Hernandez_Danny. My bet is that this trend will keep up for at least three more years on ImageNet. That means that in 2023 it will take 250x (!) less compute to train to AlexNet level than it took in 2012. (1/2)
Since 2012, the amount of compute for training to AlexNet-level performance on ImageNet has been decreasing exponentially — halving every 16 months, in total a 44x improvement. By contrast, Moore's Law would only have yielded an 11x cost improvement: openai.com/blog/ai-and-effic…
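A back-of-the-envelope sketch of the extrapolation, using only the numbers quoted above (halving every 16 months, 44x so far). There are two reasonable ways to extend the trend to 2023, and both are assumptions about how the bet was computed: start from the measured 44x and add three more years, or apply the 16-month halving across the whole 2012 to 2023 span.

```python
HALVING_MONTHS = 16  # from the quoted post


def efficiency_gain(months: float) -> float:
    """Compute reduction factor after `months` of halving every 16 months."""
    return 2 ** (months / HALVING_MONTHS)


# Option 1: measured 44x so far, then three more years of the same trend.
from_measured = 44 * efficiency_gain(3 * 12)

# Option 2: the pure trend line across all 11 years from 2012 to 2023.
from_trend = efficiency_gain((2023 - 2012) * 12)

print(round(from_measured), round(from_trend))  # 209 304
```

The two estimates bracket the roughly 250x figure in the bet.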
2) Some defenses do well against a *specific attack* (like small perturbations), but they don't generalize to other threat models
My favorite behavior from working on github.com/nottombrown/rl-te… - The noodle ballerina 🍝💃 Training care of noodle coach @raelifin
AI friends and AI skeptics of twitter - Will there be an AI project that uses 100x more compute than AlphaGo Zero within the next three years?
71% Yes (70% confidence)
13% No (70% confidence)
16% Maybe
289 votes • Final results
Great work from some of my colleagues at OpenAI! I’m glad that they’ve stayed true to the Charter and are being careful not to release dual-use technology without giving our institutions some time to react and adapt.
We've trained an unsupervised language model that can generate coherent paragraphs and perform rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training: blog.openai.com/better-langu…
Replying to @lukeprog
Have you seen this plotted as an "expected vaccine date" with error bars? If not then I could potentially scrape and re-plot it
And a special thanks to @diamant_ron for the time spent walking me through the system architecture, chip design, software stack, ISA, etc. You're unique in being not only an expert on chip design but also an ML practitioner who understands the option spaces that we're exploring. Really grateful to get to work with you!
No giraffes were in the training set for this hand. The model was just trained on TONS of random environments, so instead of memorizing solutions to specific envs, the easiest thing for the model to do was to figure out a general strategy (which included giraffe resistance!).
Replying to @OpenAI
We’re all used to robots that fail when their environment changes unpredictably. Our robotic system is adaptable enough to handle unexpected situations not seen during training, such as being prodded by a stuffed giraffe:
Replying to @stewfortier
I've also been surprised by this. My theory is that it's quite good at playing the straight man, so it works as a good comic foil to outlandish prompts.
Allstar class of AI fellows from @open_phil with a wide range of study (provable defenses to adversaries, safe exploration in RL, language understanding, strategies of conflict, and interpretability). Looking forward to seeing how each of these students pushes forward their field
Excited to announce our first class of AI Fellows, seven machine learning students to whom we’re collectively recommending $1.1 million in PhD fellowship support over the next five years: openphilanthropy.org/focus/g…
I love this post because it helps operationalize one of the big disagreements between AI optimists and pessimists: Will we continue to see bigger and bigger compute projects or will the AI hype bubble collapse? blog.openai.com/ai-and-compu…
OpenAI has a program that is offering free access to the API to academic researchers: forms.office.com/Pages/Respo…
Nerding out over this colorful network architecture diagram from @OpenAI Five. s3-us-west-2.amazonaws.com/o…
These comics by the talented @sh_reya are 👌✨ @waitbutwhy needs to watch the stick figure throne.
hello Twitter, I present a fun intro to AI safety! these comics took longer than I thought, so I'm posting half the series today & the second half on Monday. let me know what you think! or if you have any other ideas :-)
Replying to @Julian
"compute"
Awesome guide to InfoGAN by @avitaloliver - I especially like that it includes exercises for the non-lazy reader.
Replying to @DepthFirstLearn
Today we released our first guide, about InfoGAN: depthfirstlearning.com/2018/… Stay tuned, there's more to come!
We teach our computer children to draw. We teach them by making them fight each other.
Trick #3: They add a face-specific GAN to touch up the face after the main generation is finished. They include an ablation study and it looks like it helps substantially.
Really excited to work with you, Julian!
Navigation tips from @catherineols
Claude Code is very useful, but it can still get confused. A few quick tips from my experience coding with it at Anthropic 👉 1) Work from a clean commit so it's easy to reset all the changes. Often I want to back up and explain it from scratch a different way.
RL-Teacher talent-show runner-up: Noodle Dog Conductor
Replying to @gwern
You should add the meme.
Finally, we look at some ways where these models might go wrong. We look at the potential for misuse and study the biases of the model. My personal hope is that by studying these current weaknesses, we can develop solutions that will scale with more powerful systems. (9/12)
Awesome! Best portrait scan that I've seen so far!
Shout out to all y’all producing OSS, research papers, blog posts and other public goods! You don’t show up in GDP, but you show up in our hearts ❤️
Realization of the day: progress in open source software does not show up in GDP figures because it isn't bought or sold. It might indirectly increase GDP because of improved productivity of companies that consume it, or related donations / managed services.
Trick #1: They want their poses to be aligned, so they scale and translate the source pose to match the target. (Also note that they start with a Pix2PixHD setup, so in addition to the normal GAN loss, they have an autoencoder loss in VGG feature space)
Replying to @girishsastry
The 1.5x is a rule of thumb I took from @robinhanson. It might come from combining both: 1) All else equal, we should find ourselves near the middle of a trend (so it should go on for 2x) 2) We tend to only notice trends that have been going for a while (so adjust down to 1.5x)