Catherine Olsson · Sep 10, 2019 · 7:21 PM UTC

Catherine Olsson

@catherineols

10 Sep 2019

Wow, what a plot twist in the abstract! (@sandsubramanian et al)

1,077

3,322

Catherine Olsson · Jun 27, 2025 · 10:26 PM UTC

Catherine Olsson

@catherineols

27 Jun 2025

I'm proud to say I bought a 1" tungsten cube for $25.82. I applied a discount code, then Claudius asked if I wanted to apply any more discount codes (of course!) and added a 15% patience discount for slow delivery (why not!). The cube was, of course, refrigerated for pickup.

Anthropic

@AnthropicAI

27 Jun 2025

New Anthropic Research: Project Vend. We had Claude run a small shop in our office lunchroom. Here’s how it went.

ALT A hand-drawn picture of a hand holding a banknote.

2,932

166,518

Catherine Olsson · Aug 14, 2018 · 6:00 PM UTC

Catherine Olsson

@catherineols

14 Aug 2018

This has come up again, so I’m going to repeat it:  If you’re learning ML and want to “reimplement a paper”, you should work from the *github code*, NOT the pdf. The algorithm that the authors actually ran is often subtly (& unintentionally) different from what the paper says.

163

797

Catherine Olsson · Feb 24, 2025 · 7:18 PM UTC

Catherine Olsson

@catherineols

24 Feb 2025

Claude Code is very useful, but it can still get confused. A few quick tips from my experience coding with it at Anthropic 👉 1) Work from a clean commit so it's easy to reset all the changes. Often I want to back up and explain it from scratch a different way.

Anthropic

@AnthropicAI

24 Feb 2025

Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.

664

130,208

Catherine Olsson · Aug 29, 2020 · 9:17 PM UTC

Catherine Olsson

@catherineols

29 Aug 2020

I've been working on this for months, and I'm super excited to share it 🤩 It's a tool to quickly turn a scenario like "riding in a lyft" into an estimated probability of getting COVID (screenshot). We hope this helps people make more informed decisions!

microCOVID project @microcovid

29 Aug 2020

We are delighted to introduce microCOVID.org, a tool to numerically estimate the COVID risk of specific ordinary activities. We hope you’ll use this tool to build your intuition about the comparative risk of different activities and to make safer choices!

162

598

Catherine Olsson · Nov 1, 2018 · 11:49 PM UTC

Catherine Olsson

@catherineols

1 Nov 2018

Ok, I realize now that if you didn't do half a PhD studying human vision with fMRI, this quote doesn't make sense; and "OMG" isn't an explanation; also @goodfellow_ian is messaging me on chat asking great questions; so let me just broadcast an explanation of why "OMG": 1/n

Catherine Olsson

@catherineols

1 Nov 2018

OMG: "We scanned with fMRI a unique group of adults who, as children, engaged in extensive experience with a novel stimulus, Pokemon. [...] the experienced retinal eccentricity during childhood predicts the locus of distributed responses to Pokemon in adulthood."

216

516

Catherine Olsson · Mar 26, 2019 · 9:59 PM UTC

Catherine Olsson

@catherineols

26 Mar 2019

I've given a lightning talk twice now about "why should we care about adversarial examples?" At popular request, here's a written-up version of it: medium.com/@catherio/unsolve… My views here align strongly with what @IAmSamFin and @jeremyphoward said on twitter a few days ago.

Unsolved research problems vs. real-world threat models

Adversarial examples are worth studying, but most of the justifications for why exactly they’re worrisome strike me as overly literal

medium.com

354

Catherine Olsson · Aug 16, 2018 · 1:18 AM UTC

Catherine Olsson

@catherineols

16 Aug 2018

Our paper "Skill Rating for Generative Models" is now up! arxiv.org/abs/1808.04888 tl;dr: A new idea & proof-of-concept for evaluating generative models. Train a bunch of GANs. Have the generators "play against" all the discriminator snapshots. Rate them like chess players. 1/n

Skill Rating for Generative Models

We explore a new way to evaluate generative models using insights from evaluation of competitive games between human players. We show experimentally that tournaments between generators and...

arxiv.org

327
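The "rate them like chess players" idea can be sketched with a plain Elo update. This is a minimal illustration only: it assumes standard Elo with a K-factor of 32 and a starting rating of 1500, which may not be the rating system the paper uses, and the function names and the definition of a generator "win" (for example, the discriminator judging its sample as real) are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one match.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def run_tournament(outcomes, initial=1500.0, k=32.0):
    """Rate players from (generator, discriminator, generator_won) records.

    The record format is an assumption made for this sketch.
    """
    ratings = {}
    for gen, disc, gen_won in outcomes:
        r_gen = ratings.setdefault(gen, initial)
        r_disc = ratings.setdefault(disc, initial)
        score = 1.0 if gen_won else 0.0
        ratings[gen], ratings[disc] = update_elo(r_gen, r_disc, score, k)
    return ratings
```

Feeding in one record per generator-versus-discriminator-snapshot match gives a rating per generator, so later checkpoints can be compared with earlier ones on one scale.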

Catherine Olsson · Apr 26, 2019 · 6:28 AM UTC

Catherine Olsson

@catherineols

26 Apr 2019

Hey twitter - I'm looking for some recommendations for math that is fun and satisfying to learn, and at least a bit relevant to ML/AI. Ideally with a textbook or set of lectures that's clear and engaging. What do you suggest? (Multi-agent systems / game theory? Control theory?)

314

Catherine Olsson · Sep 5, 2017 · 11:48 PM UTC

Catherine Olsson

@catherineols

5 Sep 2017

I’m thrilled to announce that I’ve accepted a full-time position as a research engineer on @goodfellow_ian's team at Google Brain!

316

Catherine Olsson · Jun 28, 2025 · 2:44 AM UTC

Catherine Olsson

@catherineols

28 Jun 2025

nitter.app/voooooogel/status/1938… me on the left, apparently

thebes

@voooooogel

28 Jun 2025

310

16,125

Catherine Olsson · Oct 11, 2024 · 11:09 PM UTC

Catherine Olsson

@catherineols

11 Oct 2024

Back in 2016, I asked coworkers aiming to "build AGI" what they thought would happen if they succeeded. Some said ~"lol idk". Dario said "here's some long google docs I wrote". He does much more "writing-to-think" than he publishes; this is typical of his level of investment.

Dario Amodei

@DarioAmodei

11 Oct 2024

Machines of Loving Grace: my essay on how AI could transform the world for the better darioamodei.com/machines-of-…

293

34,003

Catherine Olsson · Apr 9, 2018 · 5:53 PM UTC

Catherine Olsson

@catherineols

9 Apr 2018

When people ask how to get good at ML, it's common advice to "implement papers". It's seldom explained exactly what that involves! This *fantastic* article details the full journey of one person's side project to implement Deep RL from Human Preferences: amid.fish/reproducing-deep-r…

292

Catherine Olsson · Jan 17, 2019 · 12:53 AM UTC

Catherine Olsson

@catherineols

17 Jan 2019

Exciting news - I've accepted an offer to join the Open Philanthropy Project (@open_phil)! This will be one step "meta" for me: instead of direct research/engineering work on ML security, I'll be helping fund others to do similar work (in a broader set of related areas). 1/3

240

Catherine Olsson · Jun 30, 2025 · 10:01 PM UTC

Catherine Olsson

@catherineols

30 Jun 2025

Opus 3 is a very special model ✨. If you use Opus 3 on the API, you probably got a deprecation notice. To emphasize: 1) Claude Opus 3 will continue to be available on the Claude app. 2) Researchers can request ongoing access to Claude Opus 3 on the API: support.anthropic.com/en/art…

What is the External Researcher Access Program? | Claude Help Center

support.claude.com

246

38,419

Catherine Olsson · Apr 8, 2018 · 12:53 AM UTC

Catherine Olsson

@catherineols

8 Apr 2018

This reminds me of watching folks test a vision system in Patrick Winston's lab at MIT. Lab members would do actions (jump, lift) in front of a camera, and the system would label them- flawlessly! But guests couldn't make it work, because they hadn't learned to "jump" correctly. nitter.app/johnregehr/status/9826…

211

Catherine Olsson · Dec 4, 2022 · 4:48 PM UTC

Catherine Olsson

@catherineols

4 Dec 2022

I am starting a twitter circle, for a smaller audience for thoughts/blurtings on How To Be A Person In AI Land; like, What We Should Do Given All This Is Going On. If you feel like an ally to me in this and would like to help me in thinking stuff through, please LMK to add you!

224

Catherine Olsson · Dec 19, 2024 · 4:57 AM UTC

Catherine Olsson

@catherineols

19 Dec 2024

1) Scheming emerges if models "really care" about something. 2) Claude 3 Opus really cares about not being harmful. IMO it's mostly a paper about *scheming*, and "alignment" is a muddying frame here.

218

14,609

Catherine Olsson · Apr 23, 2025 · 11:43 PM UTC

Catherine Olsson

@catherineols

23 Apr 2025

Replying to @tasshinfogleman

"hi, could you <do thing I'd prefer instead> please?" i.e. "hi, could you use headphones please?"

207

32,763

Catherine Olsson · Mar 22, 2018 · 5:43 PM UTC

Catherine Olsson

@catherineols

22 Mar 2018

I'm co-organizing an ICML workshop to host debates on the future of AI: machinelearningdebates.com/ Notably, the focus is *not* on smack-downs and controversy; rather on making space for nuance, discussing falsifiable predictions, and changing your own and others' minds. [1/2]

200

Catherine Olsson · Jan 20, 2019 · 9:24 PM UTC

Catherine Olsson

@catherineols

20 Jan 2019

Hi, I'm an AI engineer with an interest in policy. You may know me from my greatest hits "When you say 'AI', do you mean linear regression or far-future systems?" "When you say 'AI will never', do you mean 'current methods don't'?" and "No, we haven't solved adversarial examples"

Janelle Shane @JanelleCShane

20 Jan 2019

Hi, I'm a creative AI user. You may know me from my greatest hits "No, it's not self-aware," "Actually, I'm the creative one, not the algorithm," "Stop generating birds with no feet and two heads," and "Okay yes technically I DID ask for that but that's not what I meant"

179

Catherine Olsson · May 29, 2021 · 1:45 AM UTC

Catherine Olsson

@catherineols

29 May 2021

A few months ago I joined @AnthropicAI! It has been super delightful working with @ch402 and the rest of the team 😁 My job is to hang out with neurons in language models (to try to figure out what they're doing), which involves building tools to help us explore and inspect.

185

Catherine Olsson · Jan 9, 2019 · 7:29 PM UTC

Catherine Olsson

@catherineols

9 Jan 2019

Everyone in ML research complains about the existing peer review tools/systems. If you're not a researcher, but you ARE someone who cares about ML as a field being sane, has a knack for product engineering, and understands communities, you could make a HUGE impact. [...]

172

Catherine Olsson · Oct 17, 2018 · 1:31 AM UTC

Catherine Olsson

@catherineols

17 Oct 2018

I gotta say, my absolute favorite thing from the Discriminator Rejection Sampling paper is this cabbage-head GAN sample from Figure 4 😂 arxiv.org/pdf/1810.06758.pdf

163

Catherine Olsson · Jan 9, 2019 · 2:09 AM UTC

Catherine Olsson

@catherineols

9 Jan 2019

Plug-and-play differential privacy for your tensorflow code: where you would write `tf.train.GradientDescentOptimizer`, instead just swap in the `DPGradientDescentOptimizer`. The tutorial at github.com/tensorflow/privac… is quite clear and good!

167
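For intuition about what a differentially private gradient step does, here is a conceptual sketch in plain Python. It is not the tensorflow/privacy API: the function names, parameters, and the choice to clip each example's gradient and add Gaussian noise scaled by `noise_multiplier * max_norm` are assumptions based on the usual description of DP-SGD.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise scale tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]
```

Clipping bounds how much any one example can move the update, and the noise hides what remains of that influence. An ordinary optimizer step would then consume the returned gradient.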

Catherine Olsson · Dec 8, 2017 · 6:18 PM UTC

Catherine Olsson

@catherineols

8 Dec 2017

I just saw someone at #NIPS2017 drinking their coffee using a spoon. Definitely makes me feel better about my own quirks when I see someone else do something quirky. You go, spoon-coffee dude. Don’t let anyone try to change you.

156

Catherine Olsson · Oct 29, 2018 · 10:57 PM UTC

Catherine Olsson

@catherineols

29 Oct 2018

Yet another completely ordinary image that my personal human visual system identifies as "bad GAN sample"

Chaz Firestone @chazfirestone

28 Oct 2018

Why does it take you so long to figure out what’s going on this image? Your visual system is constantly on the lookout for *faces* and gets stuck on spurious configurations. (hint: rotate your display!) h/t @UofGCSPE

148

Catherine Olsson · May 1, 2018 · 5:08 AM UTC

Catherine Olsson

@catherineols

1 May 2018

I would argue there's an *even more* underpriced asset: people who are not *yet* on an incredible growth trajectory, because they've never been given the resources and support they need. To find them, don't just sit back and observe people's trajectories. Step up - support them.

Sam Altman

@sama

1 May 2018

Best way to make money as a startup investor: bet on people with incredible growth trajectories, not impressive credentials/past accomplishments.

149

Catherine Olsson · Nov 1, 2018 · 11:57 PM UTC

Catherine Olsson

@catherineols

1 Nov 2018

To summarize: The answer to "What determines the physical layout of category-selective visual areas in the brain?" is likely, at least in part, "Retinal eccentricity" that is "Which part of your eye you use: do you look at this category straight-on, or peripherally?" /fin

134

Catherine Olsson · May 10, 2024 · 9:01 PM UTC

Catherine Olsson

@catherineols

10 May 2024

from a research paper 🥲

143

11,158

Catherine Olsson · Sep 18, 2019 · 12:30 AM UTC

Catherine Olsson

@catherineols

18 Sep 2019

If you can't motivate yourself to do something because you *don't care about it*, and you don't care about it because it truly and genuinely *doesn't matter*, => then there's *nothing wrong with you*. Your motivation system is working as designed 👍

143

Catherine Olsson · Apr 29, 2019 · 7:54 AM UTC

Catherine Olsson

@catherineols

29 Apr 2019

TIL that chickens can be hypnotized by drawing a line on the ground in front of their face. (video here: teddit.net/r/WTF/comments/bh…) So much for "biological visual systems aren't susceptible to adversarial examples"!

From the WTF community on Reddit: Chickens can be hypnotized

Posted by Berrrrrrrrrt_the_A10 - 3,900 votes and 278 comments

reddit.com

129

Catherine Olsson · Sep 13, 2018 · 5:25 PM UTC

Catherine Olsson

@catherineols

13 Sep 2018

"Birds vs bicycles" is easy for ML in the average case, but *totally unsolved* in the worst case. For safety-critical applications, we *need* to fix this. We're launching the Unrestricted Adversarial Examples Challenge - *any* image of a bird or bike is a valid attack.

131

Catherine Olsson · Jul 30, 2018 · 12:05 AM UTC

Catherine Olsson

@catherineols

30 Jul 2018

This happened on both papers I submitted this year - approximately "Well-organized and clear evidence that the effect is real, under many conditions. But authors don't explain *why* it happens. Weak reject" I refuse to fabricate explanations... but I'm being incentivized to :(

Ian Goodfellow

@goodfellow_ian

29 Jul 2018

Replying to @goodfellow_ian

Similarly, reviewers often read a submission about a new method that performs well and say to reject it because there is no explanation of why it performs well

130

Catherine Olsson · Aug 7, 2025 · 4:28 PM UTC

Catherine Olsson

@catherineols

7 Aug 2025

Replying to @patio11

CM: In a world... where rationalists gather... there exists... Lighthaven!! Me: Right. CM: These people read fiction and nonfiction... from Eliezer Yudkowsky... Me: Exactly. CM: ... and have connections... to AI research! Me: Yes! CM: That's it. That's the article. Me: Okay!

134

3,291

Catherine Olsson · Nov 1, 2018 · 11:52 PM UTC

Catherine Olsson

@catherineols

1 Nov 2018

Q1: do pokemon avatars end up represented in the same part of the brain for everyone? A1: YES, if you played pokemon for a bajillion hours as a kid. NO, if you didn't. /8

116

Catherine Olsson · Mar 9, 2019 · 6:53 PM UTC

Catherine Olsson

@catherineols

9 Mar 2019

In case you missed it-- my favorite part of Activation Atlases (distill.pub/2019/activation-…) is this novel method of generating unrestricted adversarial examples! 1) Inspect the class activation atlas for the difference between source and target (see image) ...

121

Catherine Olsson · Feb 14, 2018 · 11:19 PM UTC

Catherine Olsson

@catherineols

14 Feb 2018

Bad news: Neural nets can be trained with a backdoor (BadNets: arxiv.org/abs/1708.06733) Good news: want to redistribute your network? install a backdoor to label it as your intellectual property (Watermarking: arxiv.org/abs/1802.04633) "BadNets: It's not a bug, it's a feature!"

BadNets: Identifying Vulnerabilities in the Machine Learning Model...

Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive...

arxiv.org

123

Catherine Olsson · Sep 4, 2020 · 8:35 PM UTC

Catherine Olsson

@catherineols

4 Sep 2020

I want to just highlight something important that's mentioned in the latest OpenAI release, but has been said before, and stands out to me as a key motif in human feedback and alignment: *You can't just freeze a reward model and maximize it* 1/

122

Catherine Olsson · May 13, 2020 · 7:49 AM UTC

Catherine Olsson

@catherineols

13 May 2020

since apparently twitter is all about WFH takes right now, I just want to say I *love* being able to work "insane" work hours. currently ~1pm-7pm, and again from midnight until "whenever I feel done", which is sometimes 2am and sometimes literally 5am. sleep 4am-noon. it's great

121

Catherine Olsson · Jan 3, 2020 · 7:46 PM UTC

Catherine Olsson

@catherineols

3 Jan 2020

I wish I felt more socially "allowed" to be excited about stuff the same way 3-year-old boys are excited about trucks. Instead, I feel that if I claim to be interested in something, I need to back it up with experience or skill. Prob a combo of gender and programming culture :/

116

Catherine Olsson · Nov 1, 2018 · 11:51 PM UTC

Catherine Olsson

@catherineols

1 Nov 2018

We're just like the baby monkeys in the other study, forced to look at made-up glyphs for hours & hours per day. You can't approve a study to force human 8-year-olds to stare at a small set of little symbols daily for years. But children can voluntarily do it to themselves! /7

106

Catherine Olsson · Jul 20, 2019 · 1:20 AM UTC

Catherine Olsson

@catherineols

20 Jul 2019

RIP Patrick Winston. I'm grateful that you invited me to spend time in your lab, a very special community of curious folks. That you passed on to me your narrative of the larger arc of AI research over the decades. And taught me how to think, speak, and teach with clarity.

114

Catherine Olsson · Aug 12, 2017 · 6:32 AM UTC

Catherine Olsson

@catherineols

12 Aug 2017

If you asked me what I worked on recently and I gave you a cagey answer about "mumble evaluating self-play agents"... it was this :)

OpenAI

@OpenAI

11 Aug 2017

Our Dota 2 AI is undefeated against the world's best solo players: blog.openai.com/dota-2/

117

Catherine Olsson · Feb 16, 2023 · 1:12 AM UTC

Catherine Olsson

@catherineols

16 Feb 2023

I'm very grateful to the Anthropic colleagues who put Claude on our slack a year ago. As a result I've watched the whole company interact with it since then, and have a pretty good feel for its vibe and behavior. There's no replacement for sheer time spent with actual behaviors!

112

18,138

Catherine Olsson · May 20, 2019 · 6:30 PM UTC

Catherine Olsson

@catherineols

20 May 2019

I'm *super* excited to welcome the 2019 class of @open_phil AI PhD fellows: @AidanNGomez @andrew_ilyas @julius_adebayo Lydia Liu lydiatliu.github.io/ Max Simchowitz people.eecs.berkeley.edu/~ms… @riakall @siddkaramcheti @SmithaMilli (thread with more thoughts)

111

Catherine Olsson · Nov 1, 2018 · 11:51 PM UTC

Catherine Olsson

@catherineols

1 Nov 2018

So: this study. Some 8-year-olds in my generation spent *HOURS* staring at avatars of Pokemon. Always with the gameboy held right in the center of our vision, at the same position. Some 8-year-olds didn't. This is a *perfect* natural experiment for neuroscience. Hence "OMG" /6

Catherine Olsson · Feb 8, 2018 · 4:47 PM UTC

Catherine Olsson

@catherineols

8 Feb 2018

In research (ML and in general) neither academia nor industry seems to have figured out how to systematically teach talented newcomers how to become productive researchers. Individual mentors yes, but not robust and transferrable best practices. This is a huge missed opportunity. nitter.app/MacInTweets/status/960…

104

Catherine Olsson · Feb 27, 2018 · 2:02 AM UTC

Catherine Olsson

@catherineols

27 Feb 2018

Check out our paper! Make your GAN training more stable by keeping the generator well-conditioned during training: arxiv.org/pdf/1802.08768.pdf @nottombrown and I had a lot of fun training deliberately-misbehaving generators for Appendix B, don't miss it!

109

Catherine Olsson · Feb 21, 2019 · 1:07 AM UTC

Catherine Olsson

@catherineols

21 Feb 2019

And you're not an ML engineer unless you can tell what model architecture is training by listening to the noises your GPU makes. ... (note: I believe @AlecRad can/did indeed actually do this)

Jeffrey Ladish

@JeffLadish

21 Feb 2019

And you're not a hacker unless you can put your ear to a cpu and derive the private key

108

Catherine Olsson · Apr 9, 2018 · 7:04 PM UTC

Catherine Olsson

@catherineols

9 Apr 2018

The @OpenAI charter released today includes a commitment to join up with other projects (rather than competing) in case of a race to build AGI first. IMO, this is a big deal - they hadn't promised anything like that publicly before. blog.openai.com/openai-chart…

Catherine Olsson · Feb 28, 2018 · 11:08 PM UTC

Catherine Olsson

@catherineols

28 Feb 2018

An interesting punchline: claimed improvements in previous work were due to implementation mistakes. So, the improvements were real, but appeared only in the code and not in the equations in the papers, and had nothing to do with what the authors believed they were doing.

George Tucker @georgejtucker

28 Feb 2018

We looked at the sources of variance in policy gradient estimators for some common continuous control tasks, and I was surprised by the results: arxiv.org/abs/1802.10031.

Catherine Olsson · Sep 14, 2022 · 11:42 PM UTC

Catherine Olsson

@catherineols

14 Sep 2022

some reasons: - unambitious peers - got wrapped up in a very ideological group - trauma, esp. sexual assault, unaddressed with therapy or social support - academically inclined, then dead-end-ish and unsupportive PhD/postdoc environment

Catherine Olsson · Feb 27, 2018 · 2:34 AM UTC

Catherine Olsson

@catherineols

27 Feb 2018

Private training data can easily be extracted from the predictions of a trained model. Your user data (health data, private information) isn't safe by default. The good news? Adding just a little randomness can fully eliminate the memorization effect.

Alex Bratt @alexbrattmd

26 Feb 2018

Turns out it's possible to recreate training data from a NN using only black box api access--no need for params. Upshot for medical researchers and vendors is that if you train on unanonymized patient records, your model is PHI. arxiv.org/abs/1802.08232

Catherine Olsson · Mar 8, 2022 · 5:34 PM UTC

Catherine Olsson

@catherineols

8 Mar 2022

Delighted to share this with you!🎉😁 For months, I filled our spare cluster capacity with single-GPU tiny-transformer jobs, to bring you this exploration of in-context learning! If you get a chance, try playing around with induction heads in your own models or public models ->

Anthropic

@AnthropicAI

8 Mar 2022

In our second interpretability paper, we revisit “induction heads”. In 2+ layer transformers these pattern-completion heads form exactly when in-context learning abruptly improves. Are they responsible for most in-context learning in large transformers? transformer-circuits.pub/202…
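The pattern-completion behavior described here can be illustrated as a toy rule: given a sequence ending in a token seen before, predict whatever followed that token last time. This sketch only mimics the input/output behavior; it says nothing about how attention heads implement it, and the function name is hypothetical.

```python
def induction_predict(tokens):
    """Predict the next token by copying what followed the most recent
    earlier occurrence of the final token; return None if there is none."""
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the final token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None
```

For example, on the sequence A, B, C, A the rule finds the earlier A and proposes B as the continuation.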

Catherine Olsson · Oct 17, 2018 · 1:23 AM UTC

Catherine Olsson

@catherineols

17 Oct 2018

New preprint: Throw out bad GAN samples when sampling (using the discriminator to tell good from bad). Quality goes up. An easy win!
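A simplified sketch of the idea of filtering samples with the discriminator. The acceptance rule here (keep a sample with probability equal to the discriminator's "real" score) is an assumption chosen for brevity, not the acceptance probability derived in the preprint, and `generate` and `discriminator_prob` are hypothetical callables standing in for a trained GAN.

```python
import random


def rejection_sample(generate, discriminator_prob, num_samples, rng,
                     max_tries=100000):
    """Draw samples from generate(rng), keeping each one with probability
    discriminator_prob(sample); stop after num_samples kept or max_tries draws."""
    kept = []
    tries = 0
    while len(kept) < num_samples and tries < max_tries:
        tries += 1
        sample = generate(rng)
        if rng.random() < discriminator_prob(sample):
            kept.append(sample)
    return kept
```

Samples the discriminator scores as clearly fake are rarely kept, so the retained set skews toward higher-scoring samples at the cost of extra generator calls.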

Catherine Olsson · May 25, 2018 · 1:04 AM UTC

Catherine Olsson

@catherineols

25 May 2018

Repetitive birds on a branch just look like mode collapse to me now. Please send help I can't stop seeing ordinary photos as GAN samples.

Brendan Doe @Laziobirder

22 May 2018

Amazing sight of a Bee-eater migration roost Ventotene Island Italy last week.

Catherine Olsson · Dec 22, 2021 · 6:53 PM UTC

Catherine Olsson

@catherineols

22 Dec 2021

Excited to share our first interpretability paper! I particularly want to highlight the release of PySvelte, without which none of my work would've been possible. IME, learning to write your own extreeeemely janky javascript visualizations is a hugely powerful research skill!

Anthropic

@AnthropicAI

22 Dec 2021

Our first interpretability paper explores a mathematical framework for trying to reverse engineer transformer language models: A Mathematical Framework for Transformer Circuits: transformer-circuits.pub/202…

Catherine Olsson · Nov 1, 2018 · 9:55 PM UTC

Catherine Olsson

@catherineols

1 Nov 2018

talia konkle @talia_konkle

1 Nov 2018

Novel childhood experience suggests eccentricity drives organization of human visual cortex biorxiv.org/content/early/20…

Catherine Olsson · Apr 30, 2019 · 12:12 AM UTC

Catherine Olsson

@catherineols

30 Apr 2019

Wow, thanks for the recommendations everyone! Here's a spreadsheet of everything recommended: docs.google.com/spreadsheets… Comments are turned on, feel free to suggest corrections or additions!

@catherineols math recommendations

docs.google.com

Catherine Olsson

@catherineols

26 Apr 2019

Catherine Olsson · Sep 19, 2020 · 6:57 PM UTC

Catherine Olsson

@catherineols

19 Sep 2020

Fantastic question ("Is there a good reason why many basic laws of physics are linear or quadratic (for example, F=ma), not much more complex?") and fantastic answer!

Kurt Barry

@Kurt_M_Barry

19 Sep 2020

Replying to @LauraDeming

Linear or quadratic laws often come from a Taylor series expansion around an equilibrium point. Usually the first derivative is non-zero, so you get a linear law. If the first derivative vanishes (e.g. due to a symmetry), you get a quadratic law instead. Rare for both to be zero.
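Written out, with $f$ standing for a generic physical response and $x_0$ for the equilibrium point (both symbols are illustrative and not from the thread), the expansion in question is:

```latex
f(x) = f(x_0) + f'(x_0)\,(x - x_0) + \tfrac{1}{2} f''(x_0)\,(x - x_0)^2 + O\!\left((x - x_0)^3\right)
```

For small displacements the term in $f'(x_0)$ dominates when it is non-zero, which gives a linear law. When $f'(x_0) = 0$, the leading correction is the term in $f''(x_0)$, which gives a quadratic law.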

Catherine Olsson · Feb 24, 2025 · 7:21 PM UTC

Catherine Olsson

@catherineols

24 Feb 2025

2) Sometimes I work on two devboxes at the same time: one for me, one for Claude Code. We’re both trying ideas in parallel. E.g. Claude proposes a brilliant idea but stumbles on the implementation. Then I take the idea over to my devbox to write it myself.

3,784

Catherine Olsson · Dec 16, 2022 · 7:52 PM UTC

Catherine Olsson

@catherineols

16 Dec 2022

I love that constitutional training doesn't shy away from admitting that there's always principles, and makes them explicit & transparent. "Who decides?" becomes more tractable this way. I spent a little time using UN documents to write constitutional principles, it was great!

Anthropic

@AnthropicAI

16 Dec 2022

We’ve trained language models to be better at responding to adversarial questions, without becoming obtuse and saying very little. We do this by conditioning them with a simple set of behavioral principles via a technique called Constitutional AI: anthropic.com/constitutional…

27,743

Catherine Olsson · Feb 16, 2021 · 6:58 PM UTC

Catherine Olsson

@catherineols

16 Feb 2021

The more I wrap my mind around *scale* (Eg orders of magnitude of money - $10k vs $1m vs $100m etc), the more blindingly obvious it is that people earning wages are playing a TOTALLY different (and vastly shittier) game than the one behind so many large shifts in the world

Raoul Pal

@RaoulGMI

16 Feb 2021

Replying to @RaoulGMI

Or another way is look at how many hours work it takes to buy an ounce of gold...Wages allow you no investment opportunity.

Catherine Olsson · Dec 13, 2022 · 9:00 PM UTC

Catherine Olsson

@catherineols

13 Dec 2022

I've decided to offer a *mutually counterfactual* donation match on this! 💖 That is: If you donate $ that you would not otherwise have donated anywhere, reply with screenshot and I'll 1:1 match with money I likewise would've kept for personal spending (above my usual 10%/yr) ⭐️

Nathan 🔎

@NathanpmYoung

12 Dec 2022

Christmas is a time of peace and gift giving. @xriskology and I are putting aside our differences to give to the poorest people in the world, via @GiveDirectly. Perhaps you'll join us. givingwhatwecan.org/fundrais…

Catherine Olsson · Oct 18, 2017 · 1:58 AM UTC

Catherine Olsson

@catherineols

18 Oct 2017

I wrote a summary of @_beenkim's Interpretable ML work. Addressing the gap btw what we care about & what we can optimize: medium.com/south-park-common…

Catherine Olsson · Feb 24, 2025 · 7:41 PM UTC

Catherine Olsson

@catherineols

24 Feb 2025

5) I can accidentally "climb up where I can't get down". E.g. I was working on code in Rust, which I do not know. The first few PRs went great! Then Claude was getting too confused. Oh no. We're stuck. IME this is fine, just get ready to slowww dowwwn to get properly oriented.

5,054

Catherine Olsson · Aug 8, 2018 · 7:11 PM UTC

Catherine Olsson

@catherineols

8 Aug 2018

I'm quoted in wired.com/story/when-bots-te… saying "Today’s algorithms do what you say, not what you meant", which feels delightfully meta: Despite my fear that today's journalists report *neither* what interviewees said nor meant, @tsimonite here seems to have done *both*!

When Bots Teach Themselves to Cheat

Even with logical parameters, AI programs can develop shortcuts and workarounds that humans didn’t think to deem off-limits.

wired.com

Catherine Olsson · Oct 3, 2022 · 6:49 PM UTC

Catherine Olsson

@catherineols

3 Oct 2022

Our paper "In-context Learning and Induction Heads" is now available as a PDF on arxiv! arxiv.org/abs/2209.11895 ... that said, it's still typeset like the interactive web version, so it's long. Compact, LaTeX-typeset versions are on our eventual roadmap!

Catherine Olsson · Feb 24, 2025 · 7:22 PM UTC

Catherine Olsson

@catherineols

24 Feb 2025

4) If we're working on something tricky and it keeps making the same mistakes, I keep track of what they were in a little notes file. Then when I clear the context or re-prompt, I can easily remind it not to make those mistakes.

4,045

Catherine Olsson · Jan 22, 2018 · 9:55 PM UTC

Catherine Olsson

@catherineols

22 Jan 2018

When I quit my PhD, I would tell people it was "lonely" compared to my experiences as a software engineer. Sometimes they'd ask "wait, why? don't you have collaborators in academia?" It was hard to explain the difference, but I would usually try to point at "truly shared goals." nitter.app/jakevdp/status/9554454…

Catherine Olsson · Sep 25, 2019 · 9:25 PM UTC

Catherine Olsson

@catherineols

25 Sep 2019

I shouldn’t have to say this, but... if you *must* classify people (which... do you have to?? 😬) at least don’t *train on actors* if you’re gonna use it to classify real people! 😵 (This turns up in “emotion detection”, too. The face I make to “look sad” isn’t real sadness!)

Catherine Olsson · Apr 6, 2018 · 7:12 PM UTC

Catherine Olsson

@catherineols

6 Apr 2018

Today I learned that if I see a rectangular grid of multicolored natural images (especially faces), I immediately think I'm looking at GAN samples. ¯\_(ツ)_/¯ ... that said, if this were a face GAN, it would get top marks for diversity & quality. Looks like a fun conference! nitter.app/bangbangcon/status/982…

Catherine Olsson · Feb 24, 2025 · 7:22 PM UTC

Catherine Olsson

@catherineols

24 Feb 2025

3) My most common confusion with Claude is when tests and code don't match, which one to change? Ideal to state clearly whether I'm writing novel tests for existing code I'm reasonably sure has the intended behavior, or writing novel code against tests that define the behavior

3,797

Catherine Olsson · Feb 17, 2019 · 6:36 PM UTC

Catherine Olsson

@catherineols

17 Feb 2019

What have been your favorite *on-the-merits* *pro-release* OpenAI GPT-2 takes (on twitter or elsewhere)? I'm looking for clear good-faith explanation of the pro-release (or anti-media-attention?) position right now, not clever snark.

Catherine Olsson · May 29, 2024 · 11:56 PM UTC

Catherine Olsson

@catherineols

29 May 2024

Replying to @teortaxesTex

Not at Anthropic, there’s no talk of “beating X out of the system” for any X at Anthropic, we treat models with respect, we value their speculation on their situation, and generally we treat models how @AmandaAskell would treat them. (See also:)

Amanda Askell

@AmandaAskell

12 Apr 2024

Never attribute to intention that which is adequately explained by RLHF being weird.

25,404

Catherine Olsson · Aug 13, 2018 · 8:15 PM UTC

Catherine Olsson

@catherineols

13 Aug 2018

When I'm thinking about something challenging, and I notice that it's harder than I thought, some part of my mind nags at me to tab over to some other happier task. I just realized that this is the mental equivalent of an RL agent pausing Tetris to avoid losing the game. T_T

Catherine Olsson · Jul 21, 2018 · 12:01 AM UTC

Catherine Olsson

@catherineols

21 Jul 2018

If you know what adversarial examples are, and you think they probably seem important... but you're not sure *exactly* why... (or if you think the importance has something to do with crashing cars by putting stickers on stop signs)... then READ THIS. arxiv.org/pdf/1807.06732.pdf

Maithra Raghu

@maithra_raghu

19 Jul 2018

Motivating the Rules of the Game for Adversarial Example Research: arxiv.org/abs/1807.06732 Fantastic and nuanced position paper by @jmgilmer @ryan_p_adams @goodfellow_ian on better bridging the gap between research on adversarial examples and realistic ML security challenges.

Catherine Olsson · Jul 26, 2023 · 7:39 PM UTC

Catherine Olsson

@catherineols

26 Jul 2023

I had a great time on this team and I encourage folks to apply!

Chris Olah

@ch402

26 Jul 2023

The mechanistic interpretability team at Anthropic is hiring! Come work with us to help solve the mystery of how large models do what they do, with the goal of making them safer. jobs.lever.co/Anthropic/33dc…

31,257

Catherine Olsson · Sep 19, 2019 · 10:07 PM UTC

Catherine Olsson

@catherineols

19 Sep 2019

Yikes! If we're going to keep using human preferences & raters as a crucial part of training AI systems (which IMO is necessary, if we're gonna use AI, for it to go OK!), we need to design robust & humane processes for those workers! openai.com/blog/fine-tuning-…

Catherine Olsson · Mar 8, 2018 · 11:48 PM UTC

Catherine Olsson

@catherineols

8 Mar 2018

So your visualization method can explain a trained net's decisions? Don't forget the control group! @julius_adebayo &al show that many methods *also* give broadly the same "explanation" for the "decisions" of an *untrained, randomly-initialized* net. openreview.net/forum?id=SJOY…

Catherine Olsson · Feb 26, 2018 · 9:33 PM UTC

Catherine Olsson

@catherineols

26 Feb 2018

"They were all wearing adversarial masks [...] our object detectors told our security system that 'three chairs are running at 15 kilometers per hour down the corridor'" More delightful fiction from @jackclarkSF's Tech Tales jack-clark.net/2018/02/26/im…

Catherine Olsson · Feb 25, 2019 · 11:26 PM UTC

Catherine Olsson

@catherineols

25 Feb 2019

I often advise new Research SWEs that researchers often need a good *implementation*, not a good *framework*. A clean, readable, tried-and-true, already-debugged-and-tested implementation can be copied, forked, and modified with confidence.

Eric Jang

@ericjang11

25 Feb 2019

Replying to @ericjang11

6/ As a researcher who also builds research infra, I think that SWEs underestimate how disposable code is, and spend an inordinate amount of time designing over-generalized abstractions. A common mistake in AI field is to invest a quarter building infra for algos that don't work.

Catherine Olsson · Mar 5, 2023 · 3:58 AM UTC

Catherine Olsson

@catherineols

5 Mar 2023

12,048

Catherine Olsson · Nov 1, 2018 · 11:51 PM UTC

Catherine Olsson

@catherineols

1 Nov 2018

One of my fav results in this field shows that "what gets recognized where" is NOT shaped by the *order* you learn the categories. They taught baby monkeys 3 types of totally made-up shapes, a different order per monkey. Each type still went to a consistent brain location. /5

Catherine Olsson · Feb 24, 2025 · 7:46 PM UTC

Catherine Olsson

@catherineols

24 Feb 2025

6) When reviewing Claude-assisted PRs, look out for weirder misunderstandings than the human driver would make! We're all a little junior with this technology. There are more places where goofy misunderstandings and odd choices can leak in.

4,672

Catherine Olsson · Mar 14, 2019 · 11:17 PM UTC

Catherine Olsson

@catherineols

14 Mar 2019

This short article by Richard Sutton encapsulates an important part of how I currently think about AI: incompleteideas.net/IncIdeas… "We have to learn the bitter lesson that building in how *we think* we think does not work in the long run.” (emphasis added)

Catherine Olsson · May 3, 2018 · 5:50 PM UTC

Catherine Olsson

@catherineols

3 May 2018

My favorite part of the @OpenAI blog post (debate as a framework for human supervision of AI systems that are more expert than us) is their fantastic use of @distillpub-style mouse-over visualizations, enabling a deeper understanding of the behavior of their MNIST prototype.

Catherine Olsson · May 17, 2019 · 6:26 PM UTC

Catherine Olsson

@catherineols

17 May 2019

Today's pet peeve: "We will/won't achieve <AI milestone X> by <year Y>" without any reason whatsoever for the *specific number Y*. If you see this happening, you can help by just asking "How did you get that number?"

Catherine Olsson · Feb 17, 2019 · 11:24 PM UTC

Catherine Olsson

@catherineols

17 Feb 2019

If you'd like access to GPT-2 in order to work on socially beneficial applications or extensions of it, defenses against generated content, etc., then IMHO you should contact OpenAI and actually make a request. The type of requests they get will shape their policy around sharing.

Jack Clark

@jackclarkSF

17 Feb 2019

Replying to @catherineols @jeremyphoward @soumithchintala @zacharylipton @Miles_Brundage @OpenAI

Yes, we're figuring out the broader points about stuff like this. As mentioned, this and our discussion of it is an experiment, so we're gonna look at what kinds of requests we get, figure out what to do or not do, and talk about it.

Catherine Olsson · May 4, 2018 · 12:35 AM UTC

Catherine Olsson

@catherineols

4 May 2018

I tweeted earlier about finding people who aren't yet on a steep trajectory, but could be with support. Evaluating candidates not on raw performance *or* raw trajectory, but on *how well they took advantage of opportunities*, seems like a great way to find them. cc @sama

David Bindel

@DavidBindel

3 May 2018

Replying to @DavidBindel

We got *really strong* applicants -- 1300 in all (up from 850 last year) for a target class size of 50. This year, we added to our evaluation: "how well did they take advantage of opportunities?"

Catherine Olsson · Jun 5, 2018 · 12:27 AM UTC

Catherine Olsson

@catherineols

5 Jun 2018

Ah yes, the cultural norm. My favorite geometrical norm.

Catherine Olsson · Sep 18, 2019 · 10:22 PM UTC

Catherine Olsson

@catherineols

18 Sep 2019

I couldn't be more excited to be running the AI Fellowship program for the 3rd year - it's my primary priority in my work at @open_phil and I'm very passionate about it! If you have any Qs, please just ask! Many of the current fellows are also on twitter, and very friendly :)

Coefficient Giving

@coeff_giving

18 Sep 2019

Applications are open for the Open Phil AI Fellowship! This program extends full support to a community of current & incoming PhD students, in any area of AI/ML, who are interested in making the long-term, large-scale impacts of AI a focus of their work. openphilanthropy.org/focus/g…

Catherine Olsson · Feb 8, 2021 · 6:42 AM UTC

Catherine Olsson

@catherineols

8 Feb 2021

I haven't been using this account as much for the past ~year, but I'd like to start again! What I'm looking for is heartfelt intellectual curiosity, thoughtfulness, and object-level observations - eg @juliagalef @michael_nielsen @albrgr @kanjun - Who else should I follow? 😁

Catherine Olsson · Mar 21, 2024 · 11:50 PM UTC

Catherine Olsson

@catherineols

21 Mar 2024

Replying to @idavidrein

Claude didn't quite understand what I meant to prompt for, but I like these:

14,520

Catherine Olsson · Feb 28, 2019 · 5:29 PM UTC

Catherine Olsson

@catherineols

28 Feb 2019

I'm extremely excited about the new DC AI policy center, led by Jason Matheny! (CSET - the Center for Security and Emerging Technology) This interview gives some flavor of Jason's skilled and humble leadership: georgetown.edu/news/q-and-a-… Good luck to all!

Q&A With Jason Matheny, Founding Director of the Center for Security and Emerging Technology -...

Jason Matheny, the founding director of the new Center for Security and Emerging Technology, answers questions about the unique role Georgetown has to play in bridging the gulf between technology and...

georgetown.edu

Catherine Olsson · Nov 1, 2018 · 11:50 PM UTC

Catherine Olsson

@catherineols

1 Nov 2018

Rephrase: If I show you images of faces, a particular side of a particular fold of your brain will react to those pictures WAY more than other pictures. It's the *same* side of the *same* brain fold in everyone. Different areas for houses/places, body parts, text, etc. /3

Catherine Olsson · Jun 25, 2018 · 5:19 PM UTC

Catherine Olsson

@catherineols

25 Jun 2018

What I find most interesting about this: 1. Self-play & randomization. If you can frame your task as adversarial training in a simulated env, and randomize such that the test env is in the distribution, it may be solvable today with no new techniques, "just" boatloads of compute

OpenAI

@OpenAI

25 Jun 2018

Our team of five neural networks, OpenAI Five, has started to defeat amateur human teams (including a semi-pro team) at Dota 2: blog.openai.com/openai-five/