John Schulman · Aug 6, 2024 · 12:00 AM UTC

John Schulman

@johnschulman2

6 Aug 2024

I shared the following note with my OpenAI colleagues today:

I've made the difficult decision to leave OpenAI. This choice stems from my desire to deepen my focus on AI alignment, and to start a new chapter of my career where I can return to hands-on technical work. I've decided to pursue this goal at Anthropic, where I believe I can gain new perspectives and do research alongside people deeply engaged with the topics I'm most interested in. To be clear, I'm not leaving due to lack of support for alignment research at OpenAI. On the contrary, company leaders have been very committed to investing in this area. My decision is a personal one, based on how I want to focus my efforts in the next phase of my career.

I joined OpenAI almost 9 years ago as part of the founding team after grad school. It's the first and only company where I've ever worked, other than an internship. It's also been quite a lot of fun. I'm grateful to Sam and Greg for recruiting me back at the beginning, and Mira and Bob for putting a lot of faith in me, bringing great opportunities and helping me successfully navigate various challenges. I'm proud of what we've all achieved together at OpenAI: building an unusual and unprecedented company with a public benefit mission.

I am confident that OpenAI and the teams I was part of will continue to thrive without me. Post-training is in good hands and has a deep bench of amazing talent. I get too much credit for ChatGPT -- Barret has done an incredible job building the team into the incredibly competent operation it is now, with Liam, Luke, and others. I've been heartened to see the alignment team coming together with some promising projects. With leadership from Mia, Boaz and others, I believe the team is in very capable hands.

I'm incredibly grateful for the opportunity to participate in such an important part of history and I'm proud of what we've achieved together. I'll still be rooting for you all, even while working elsewhere.

177

390

5,205

1,334,094

John Schulman · Feb 7, 2025 · 4:44 AM UTC

John Schulman

@johnschulman2

7 Feb 2025

Confirming that I left Anthropic last week. Leaving wasn't easy because I enjoyed the stimulating research environment and the kind and talented people I was working with, but I decided to go with another opportunity that I found extremely compelling. I'll share more details in the coming weeks. Thanks to Jared, Jan, Dario and others for the support during my time at Anthropic, and I wish them all the best.

2,824

445,825

John Schulman · Oct 29, 2022 · 6:12 PM UTC

John Schulman

@johnschulman2

29 Oct 2022

Certain software skills are exceptionally useful for machine learning. In a previous era, it was GPU programming. Now in the era of pretrained models, it's front-end development -- to quickly whip up a UI to collect a fine-tuning or eval dataset.

164

1,281

John Schulman · Oct 1, 2025 · 6:08 PM UTC

John Schulman

@johnschulman2

1 Oct 2025

Tinker provides an abstraction layer that is the right one for post-training R&D -- it's the infrastructure I've always wanted. I'm excited to see what people build with it. "Civilization advances by extending the number of important operations which we can perform without thinking of them" -Whitehead

Thinking Machines

@thinkymachines

1 Oct 2025

Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker

1,253

187,253

John Schulman · Feb 18, 2025 · 6:55 PM UTC

John Schulman

@johnschulman2

18 Feb 2025

Excited to build a new AI research lab with some of my favorite former colleagues and some great new ones. Looking forward to sharing more in the coming weeks.

Thinking Machines

@thinkymachines

18 Feb 2025

Today, we are excited to announce Thinking Machines Lab (thinkingmachines.ai/), an artificial intelligence research and product company. We are scientists, engineers, and builders behind some of the most widely used AI products and libraries, including ChatGPT, Character.ai, PyTorch, and Mistral. Our mission is to make artificial intelligence work for you by building a future where everyone has access to the knowledge and tools to make AI serve their unique needs. We are committed to open science through publications and code releases, while focusing on human-AI collaboration that serves diverse domains. Our approach embraces co-design of research and products to enable learning from real-world deployment and rapid iteration. This work requires three core foundations: state-of-the-art model intelligence, high-quality infrastructure, and advanced multimodal capabilities. We are committed to building models at the frontier of capabilities to deliver on this promise. If you’re interested in joining our team, consider applying here: 6wajk07p.paperform.co/

1,191

113,373

John Schulman · Dec 8, 2024 · 2:12 AM UTC

John Schulman

@johnschulman2

8 Dec 2024

Replying to @amasad @DavidSacks

Nope, we don't know how to train models to reason about controversial topics from first principles; we can only train them to reason on tasks like math calculations and puzzles where there's an objective ground truth answer. On general tasks, we only know how to train them to imitate humans or maximize human approval. Nowadays post-training / alignment boosts benchmark scores, e.g. see qwenlm.github.io/blog/qwen2.…

949

105,672

John Schulman · Oct 5, 2025 · 9:21 PM UTC

John Schulman

@johnschulman2

5 Oct 2025

Really happy to see people reproducing the result that LoRA rank=1 closely matches full fine-tuning on many RL fine-tuning problems. Here are a couple nice ones: nitter.app/ben_burtenshaw/status/…

Zichen Liu

@zzlccc

2 Oct 2025

much more convinced after getting my own results: LoRA with rank=1 learns (and generalizes) as well as full-tuning while saving 43% vRAM usage! allows me to RL bigger models with limited resources😆 script: github.com/sail-sg/oat/blob/…

942

126,893
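To make the "LoRA rank=1" claim in the posts above concrete, here is a minimal sketch of what a rank-1 adapter is. It is an illustration, not the code used in either post: the helper names (`lora_param_count`, `lora_forward`, `matvec`), the `scale` parameter, and the tiny dimensions are all assumptions chosen for readability, and the zero initialisation of `B` follows common LoRA practice.

```python
import random


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in a LoRA adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


def matvec(W, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute (W + scale * B @ A) @ x without materialising the full update."""
    base = matvec(W, x)      # frozen pretrained weights
    low = matvec(A, x)       # project down to `rank` dimensions
    delta = matvec(B, low)   # project back up to d_out dimensions
    return [b + scale * d for b, d in zip(base, delta)]


# Hypothetical toy sizes; real layers are thousands of units wide.
d_out, d_in, rank = 4, 3, 1
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero init: the adapter starts as a no-op

x = [1.0, 2.0, 3.0]
# With B = 0 the adapted layer reproduces the frozen layer exactly.
assert lora_forward(W, A, B, x) == matvec(W, x)

# Trainable-parameter comparison for a 4096 x 4096 layer.
print(lora_param_count(4096, 4096, 1), "adapter params vs", 4096 * 4096, "full params")
```

At rank 1 the update to each weight matrix is a single outer product, so the trainable state is a tiny fraction of the full matrix. That is one plausible source of the memory savings mentioned in the quoted post, though the exact figure will depend on the training setup.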

John Schulman · Jan 25, 2025 · 6:34 PM UTC

John Schulman

@johnschulman2

25 Jan 2025

There are some intriguing similarities between the r1 chains of thought and the o1-preview CoTs shared in papers and blog posts (eg openai.com/index/learning-to…). In particular, note the heavy use of the words "wait" and "alternatively" as transition words for error correction and double-checking.

721

158,453

John Schulman · May 23, 2025 · 5:01 PM UTC

John Schulman

@johnschulman2

23 May 2025

For people who don't like Claude's behavior here (and I think it's totally valid to disagree with it), I encourage you to describe your own recommended policy for what agentic models should do when users ask them to help commit heinous crimes. Your options are (1) actively try to prevent the act (like Claude did here), (2) just refuse to help (in which case the user might be able to jailbreak/manipulate the model to help using different queries), (3) always comply with the user's request. (2) and (3) are reasonable, but I bet your preferred approach will also have some undesirable edge cases -- you'll just have to bite a different bullet. Knee-jerk criticism incentivizes (1) less transparency -- companies don't perform or talk about evals that present the model with adversarially-designed situations (2) something like "Copenhagen Interpretation of Ethics", where you get blamed for edge-case model behaviors only if you observe or discuss them.

This tweet is unavailable

119

704

213,073

John Schulman · Dec 30, 2023 · 7:24 PM UTC

John Schulman

@johnschulman2

30 Dec 2023

A compelling intuition is that deep learning does approximate Solomonoff induction, finding a mixture of the programs that explain the data, weighted by complexity. Finding a more precise version of this claim that's actually true would help us understand why deep learning works so well. There are a couple recent papers studying how NNs solve algorithmic tasks, which seem like exciting progress in this direction.
- arxiv.org/abs/2309.02390 - develops a theory around when NN training learns a "memorizing" vs "generalizing" solution, which depends on each solution's "efficiency" -- how much param norm is needed to get correct & confident outputs. This theory predicts grokking phenomena.
- arxiv.org/abs/2310.16028 - transformers can't represent Turing machines, but they can represent a smaller class of computations, described by RASP programs. This paper finds that indeed, if data is generated by a RASP-L program, the transformer will learn exactly the right function.

659

230,749
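For reference, the "mixture of programs, weighted by complexity" in the post above can be written using the standard textbook definition of the Solomonoff prior. The notation ($U$, $\ell(p)$, $M$) is the usual one and is an addition here; it does not appear in the post itself.

```latex
% U is a universal monotone machine, p ranges over programs whose
% output begins with the string x, and \ell(p) is the length of p in bits.
M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)},
\qquad
M(x_{n+1} \mid x_{1:n}) = \frac{M(x_{1:n}\, x_{n+1})}{M(x_{1:n})}
```

Each program that reproduces the data contributes weight $2^{-\ell(p)}$, so shorter programs dominate the mixture. Prediction is just conditioning under $M$.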

John Schulman · Feb 17, 2025 · 5:27 PM UTC

John Schulman

@johnschulman2

17 Feb 2025

@barret_zoph and I recently gave a talk at Stanford on post-training and our experience working together on ChatGPT. Unfortunately the talk wasn't recorded, but here are the slides: docs.google.com/presentation…. (If you have a recording, please let me know!)

ChatGPT + Post-Training

ChatGPT and The Art of Post-Training Barret Zoph & John Schulman

docs.google.com

637

83,815

John Schulman · Oct 23, 2025 · 12:07 AM UTC

John Schulman

@johnschulman2

23 Oct 2025

We're happy to support the Human Centered LLMs course, on topics close to our hearts. We'd like to support more classes with free credits for students to use on assignments and projects. If you're an instructor interested in using Tinker in your course, please reach out to tinker@thinkingmachines.ai.

Diyi Yang

@Diyi_Yang

22 Oct 2025

Thanks @thinkymachines for supporting Tinker access for our CS329x students on Homework 2 😉

634

175,956

John Schulman · Oct 26, 2025 · 6:29 AM UTC

John Schulman

@johnschulman2

26 Oct 2025

Happy to share a new paper! Designing model behavior is hard -- desirable values often pull in opposite directions. Jifan's approach systematically generates scenarios where values conflict, helping us see where specs are missing coverage and how different models balance tradeoffs.

Jifan Zhang @jifan_zhang

24 Oct 2025

New research paper with Anthropic and Thinking Machines AI companies use model specifications to define desirable behaviors during training. Are model specs clearly expressing what we want models to do? And do different frontier models have different personalities? We generated thousands of scenarios to find out. 🧵

620

113,724

John Schulman · Feb 22, 2024 · 4:26 AM UTC

John Schulman

@johnschulman2

22 Feb 2024

Now that another LM product is getting flak, I can say this without sounding too self-serving: Alignment -- controlling a model's behavior and values -- is still a pretty young discipline. Annoying refusals or hyper-wokeness are usually bugs rather than features

518

126,785

John Schulman · Sep 26, 2025 · 5:38 PM UTC

John Schulman

@johnschulman2

26 Sep 2025

Big fan of Jeremy’s work on optimization—great to see his first Thinking Machines post!

Thinking Machines

@thinkymachines

26 Sep 2025

Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices. thinkingmachines.ai/blog/mod… We explore a fundamental understanding of the geometry of neural network optimization.

511

64,138

John Schulman · Aug 11, 2025 · 6:01 AM UTC

John Schulman

@johnschulman2

11 Aug 2025

I'm more annoyed at whoever named us homo sapiens sapiens

Andrew Carr 🤸

@andrew_n_carr

9 Aug 2025

Thinking vs think vs thinking-think

460

79,324

John Schulman · Oct 21, 2025 · 8:33 PM UTC

John Schulman

@johnschulman2

21 Oct 2025

Fine-tuning APIs are becoming more powerful and widespread, but they're harder to safeguard against misuse than fixed-weight sampling APIs. Excited to share a new paper: Detecting Adversarial Fine-tuning with Auditing Agents (arxiv.org/abs/2510.16255). Auditing agents search through training datasets and query the model being trained; using these tools they can detect various existing fine-tuning attacks, with a low false-positive rate. I advised this project through the MATS program. I've been impressed by the organization of the program and the caliber of people involved.

Detecting Adversarial Fine-tuning with Auditing Agents

Large Language Model (LLM) providers expose fine-tuning APIs that let end users fine-tune their frontier LLMs. Unfortunately, it has been shown that an adversary with fine-tuning access to an LLM...

arxiv.org

464

90,281

John Schulman · Oct 9, 2021 · 6:23 PM UTC

John Schulman

@johnschulman2

9 Oct 2021

reinforcement learning people: our recent paper might change your understanding of methods like PPO, as well as providing a practical method for effectively increasing batch size arxiv.org/abs/2110.00641

Batch size-invariance for policy optimization

We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated for by changes to other hyperparameters. Stochastic gradient descent is well-known to have this...

arxiv.org

405

John Schulman · Oct 6, 2025 · 4:08 PM UTC

John Schulman

@johnschulman2

6 Oct 2025

Great to see an open source backend in the works for the Tinker API. If Tinker is going to power open science and open software, it shouldn’t depend on a single proprietary implementation.

Philipp Moritz

@pcmoritz

6 Oct 2025

The Tinker API recently released by Thinking Machines will have a big impact on how people think about post-training and inference systems. To allow more people to experiment with Tinker like systems and run it on their own hardware, we started SkyRL tx 🧸, an open source project with the goal of implementing the Tinker API, see our blog post novasky-ai.notion.site/skyrl…. We welcome contributions, looking forward to working with the open source community 🚀

375

52,434

John Schulman · Apr 30, 2025 · 7:37 AM UTC

John Schulman

@johnschulman2

30 Apr 2025

Whether to collect preferences ("do you prefer response A or B?") from the same person who wrote the prompt, or a different person, is important and understudied. Highlighted this question in a recent talk docs.google.com/presentation…. Sycophancy probably results when you have the same person doing the prompting and labeling, especially when the user does both.

Andreas Kirsch 🇺🇦

@BlackHC

30 Apr 2025

This is serious, and we should make sure to prevent sycophantism as much as possible... Related: have we tried using other humans' feedback for RLHF instead of the original prompter's? This might somewhat help with debiasing 🤔

374

70,222

John Schulman · Feb 14, 2025 · 5:45 PM UTC

John Schulman

@johnschulman2

14 Feb 2025

I was happy to see the second version of the OpenAI Model Spec released last week. Sharing my notes:
- One notable change is that each section is labeled with an authority level, from "platform" (can't be overridden by the user or developer) to "guideline" (can be easily overridden). This seems like a nice conceptual simplification of the notion of "defaults" in the previous version, unifying the authority levels of the spec itself with the levels of different messages.
- A couple lines are refreshingly honest. The objective "Maintain OpenAI's license to operate by protecting it from legal and reputational harm", and "[why chains of thought hidden] ... as well as for competitive reasons." This is the kind of thing that'd usually get watered down by comms/legal/policy teams at a typical company.
- The spec starts to cover a couple topics that weren't present before, such as multimodality (eg using accents, avoiding premature warnings) and agents (with a discussion of what it means for an agent to overstep when pursuing user-defined goals).
- There's a new untrusted_text feature, which presumably means there'll be an API feature for quoted text, where it's delimited by special tokens rather than leaving the developer to handle quoting and the model to interpret the quoting. This is useful for protecting against prompt injection.
- In a couple places, a point from the previous spec is derived more from first principles in this one. The most controversial part of the previous spec was "don't try to change anyone's mind", wrt users having false beliefs like "the earth is flat". Now this is justified as a special case of "highlight possible misalignments", following from "assume the user's long-term goals include learning, self-improvement, and truth-seeking".
- This is a subtle and debatable point, but I like the emphasis on user freedom ("intellectual freedom" is used a few times), as opposed to more of a cost benefit analysis. It's like "maximize user freedom subject to constraints" as opposed to "do cost benefit analysis to enable beneficial use cases and prevent harmful ones". The latter would give the platform too much moral authority.
- Detail added in various places, e.g. more style guidelines about conversational behavior, more detail about privileged information in developer messages.
- Still no erotica

357

70,550

John Schulman · Oct 25, 2022 · 8:15 PM UTC

John Schulman

@johnschulman2

25 Oct 2022

Handy trick: if you say something dumb, follow with "that was just a temperature=1 sample, don't take it seriously"

334

John Schulman · Oct 30, 2025 · 7:54 PM UTC

John Schulman

@johnschulman2

30 Oct 2025

jack-o-lora

408

93,241

John Schulman · May 9, 2021 · 10:09 PM UTC

John Schulman

@johnschulman2

9 May 2021

The Chinese translation of Artificial Intelligence, 人工智能, has a curious visual resemblance to the letters AI

312

John Schulman · Dec 16, 2021 · 10:57 PM UTC

John Schulman

@johnschulman2

16 Dec 2021

Glad to finally release this, as it includes a bunch of directions I'm excited about:
- RL with a reward function defined by human judgements
- language models using tools (a web browser)
- making it easier for humans to rate the AI's output (AI cites its sources)

OpenAI

@OpenAI

16 Dec 2021

We trained a research version of GPT-3 that can search the web, synthesize information, and cite its sources to provide more accurate answers to questions. openai.com/blog/improving-fa…

295

John Schulman · Feb 22, 2024 · 5:18 PM UTC

John Schulman

@johnschulman2

22 Feb 2024

I'd like to see some research on where the political and moral ideologies of RLHF'd language models come from. Make some questionnaires that measure a model's ideology. Create a variety of models with few-shot prompting, SFT, and RL; look at the ideology at each stage and how it depends on the dataset composition. Also, it'd be interesting to look at data from crowdworkers and what values it reflects -- it might not reflect the workers' own values, but the values they expect the employer to have.

261

83,358

John Schulman · Apr 30, 2025 · 7:54 AM UTC

John Schulman

@johnschulman2

30 Apr 2025

A research project related to sycophancy: define explicit features like "does the response agree with the user" as in arxiv.org/abs/2310.13548, and then construct a preference function that subtracts out their effect, as in arxiv.org/abs/2404.04475. I.e., remove some bad causal contributors to the preferences

278

43,867
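The idea in the post above (model an explicit feature such as "does the response agree with the user", then subtract its effect from the preference) can be sketched as follows. This is a toy illustration under stated assumptions, not the method of either cited paper: the logistic form, the function name `preference`, and the coefficient values `w_quality` and `w_agree` are all hypothetical, and in practice the coefficients would be fitted to labelled comparisons.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def preference(quality_diff, agree_diff, w_quality, w_agree, debias=False):
    """P(response A is preferred over B) under a logistic model of the labels.

    quality_diff: quality score of A minus quality score of B.
    agree_diff:   1 if only A agrees with the user, -1 if only B does, else 0.
    With debias=True the agreement term is dropped, i.e. its estimated
    contribution is subtracted out of the preference.
    """
    logit = w_quality * quality_diff
    if not debias:
        logit += w_agree * agree_diff
    return sigmoid(logit)


# Hypothetical fitted coefficients: labellers reward agreement heavily.
w_quality, w_agree = 1.0, 2.0

# Two equally good responses; only A agrees with the user.
raw = preference(0.0, 1, w_quality, w_agree)
clean = preference(0.0, 1, w_quality, w_agree, debias=True)
print(f"raw preference for A: {raw:.3f}, with agreement removed: {clean:.3f}")
```

With the agreement term removed, two equally good responses come out tied, so a reward model trained on the adjusted preferences would no longer be paid for agreeing with the user.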

John Schulman · Mar 21, 2022 · 4:25 AM UTC

John Schulman

@johnschulman2

21 Mar 2022

A series of 4-month internships at companies and academic research groups could be a good replacement for an undergrad degree. Students would still go through coursework (perhaps online) but only as needed for job and interview prep.

Sam Altman

@sama

20 Mar 2022

Replying to @sama

The list could go on for a long time, but the point is: What a time to start an alternative to college! The world really needs it.

243

John Schulman · Sep 30, 2025 · 5:40 PM UTC

John Schulman

@johnschulman2

30 Sep 2025

Replying to @LiamFedus @ekindogus @periodiclabs

Congrats on the launch! Doing things in the physical world is underrated by AI people.

257

37,028

John Schulman · Oct 10, 2022 · 1:26 AM UTC

John Schulman

@johnschulman2

10 Oct 2022

This morning a couple local kids rang my doorbell and ran away. Glad kids are still playing outside and not spending all day on homework and roblox

224

John Schulman · Aug 29, 2025 · 3:33 AM UTC

John Schulman

@johnschulman2

29 Aug 2025

Replying to @sainingxie

Wow, you must've been the first or second person to take that interview. (Followed by MANY others.) Glad you kept a record of it!

223

12,878

John Schulman · Jan 7, 2024 · 9:49 PM UTC

John Schulman

@johnschulman2

7 Jan 2024

Coming soon to your favorite word processor Ctrl-alt-V: "paste and paraphrase" also, "paste and match writing style"

208

37,765

John Schulman · Apr 10, 2024 · 3:41 AM UTC

John Schulman

@johnschulman2

10 Apr 2024

Replying to @emollick

We'll post some release notes in a day or two. We were just a bit uncoordinated about getting everything ready at once, and we didn't want to further delay getting the new model out to developers.

152

33,007

John Schulman · Dec 31, 2021 · 7:11 PM UTC

John Schulman

@johnschulman2

31 Dec 2021

Strange that "how does the brain implement backprop?" doesn't get more attention in neurosci. (Some exceptions, e.g. Tim Lillicrap's work). I'm certain that the brain does it: (1) learning with gradients is much faster (2) backprop is the only efficient way to compute gradients

128

John Schulman · Nov 19, 2023 · 3:20 PM UTC

John Schulman

@johnschulman2

19 Nov 2023

Replying to @sama

❤️

124

20,156

John Schulman · Feb 14, 2022 · 2:34 AM UTC

John Schulman

@johnschulman2

14 Feb 2022

IMO, language model consciousness can be studied experimentally, and this would be a fruitful research direction. For example, study *conscious access*: which internal variables (activations, attentions, etc.) can a LM learn to report on in its text output?

122

John Schulman · May 20, 2024 · 2:50 AM UTC

John Schulman

@johnschulman2

20 May 2024

Replying to @DavidSKrueger

That's inconsistent with my recollection of Greg's views, and it doesn't sound like something Greg would say even if he did disagree with other people on the team

118

23,486

John Schulman · Jan 7, 2024 · 10:44 PM UTC

John Schulman

@johnschulman2

7 Jan 2024

"Trust region utilitarianism": there is a sensible utility function to maximize, but it's only valid locally around the current state of the world, where the intuitions that produced it are grounded. "Repugnant conclusion" is outside trust region -- not a problem

113

37,467

John Schulman · Feb 22, 2024 · 4:29 AM UTC

John Schulman

@johnschulman2

22 Feb 2024

That said, these public outcries are important for spurring us to solve these problems and develop better alignment tech

110

31,402

John Schulman · Dec 26, 2023 · 9:39 PM UTC

John Schulman

@johnschulman2

26 Dec 2023

I've been enjoying @RichardMCNgo's sci-fi writing at narrativeark dot xyz. It's a rare feat to combine these three properties: (1) about post-AGI worlds (2) plausible (3) actually fun to read.

107

30,743

John Schulman · Oct 18, 2023 · 6:00 AM UTC

John Schulman

@johnschulman2

18 Oct 2023

Stumbled upon this charming short story, "Someday", by Isaac Asimov: nyc3.digitaloceanspaces.com/…. Features a language model called Bard, which the boys fine-tune on some recent data discussing itself and other LMs...

27,284

John Schulman · Jun 27, 2021 · 1:17 AM UTC

John Schulman

@johnschulman2

27 Jun 2021

I wrote an article with some thoughts on AI alignment alignmentforum.org/posts/6cc…

Frequent arguments about alignment — AI Alignment Forum

Here, I’ll review some arguments that frequently come up in discussions about alignment research, involving one person skeptical of the endeavor (cal…

alignmentforum.org

John Schulman · Oct 19, 2022 · 6:15 AM UTC

John Schulman

@johnschulman2

19 Oct 2022

Got access to @Cruise driverless ride service today -- flawless pickup + 30 min drive + dropoff. A bit slow at intersections, but still very impressive!

John Schulman · Oct 28, 2025 · 2:40 PM UTC

John Schulman

@johnschulman2

28 Oct 2025

Replying to @johnschulman2 @giffmana

tinker-docs.thinkingmachines…

9,028

John Schulman · May 10, 2024 · 4:58 AM UTC

John Schulman

@johnschulman2

10 May 2024

Replying to @NickADobos

currently we don't show max_tokens to the model, but we plan to (as described in the model spec). we do think that laziness is partly caused by the model being afraid to run out of tokens, as it gets penalized for that during training

5,668

John Schulman · Feb 18, 2025 · 7:25 PM UTC

John Schulman

@johnschulman2

18 Feb 2025

Replying to @nearcyan @thinkymachines

15 character limit

4,374

John Schulman · Jan 29, 2025 · 11:10 PM UTC

John Schulman

@johnschulman2

29 Jan 2025

Replying to @natolambert

I think the term was coined (or popularized) by @bobmcgrewai in the early days of ChatGPT. The team doing ChatGPT fine-tuning was previously called the RL team for historical reasons, and Bob suggested renaming it to Post-Training to reduce confusion. Of course, it's a natural name, so it was probably used independently by others.

5,316

John Schulman · Oct 5, 2025 · 9:09 PM UTC

John Schulman

@johnschulman2

5 Oct 2025

Replying to @giffmana

We'll add back that section later -- probably within a couple weeks. We weren't finished writing it at launch time

11,233

John Schulman · Jan 14, 2024 · 4:19 AM UTC

John Schulman

@johnschulman2

14 Jan 2024

Replying to @ESYudkowsky @RokoMijic

Challenge accepted

6,551

John Schulman · Oct 5, 2025 · 9:21 PM UTC

John Schulman

@johnschulman2

5 Oct 2025

Even if I've tested a result extensively, it's hard to know how well it'll generalize to different experimental setups and software stacks

7,874

John Schulman · Jan 26, 2022 · 5:17 AM UTC

John Schulman

@johnschulman2

26 Jan 2022

Haven't seen any discussion about how CO2 levels inside masks are very high -- much worse than stuffy meeting rooms. 2000 ppm in cloth mask, paper, or valved KN95: aaqr.org/articles/aaqr-20-07…, 25000 ppm (!) in KN95 bmcinfectdis.biomedcentral.c… Am I missing something?

John Schulman · Jun 8, 2024 · 4:22 AM UTC

John Schulman

@johnschulman2

8 Jun 2024

Replying to @neilbband

Great paper! IMO incentivizing calibrated long-form outputs is one of the important open problems of the field. Decision-theoretic lens seems right, and the log-loss-on-related-qa-pair objective seems like a good approximation.

11,075

John Schulman · Feb 14, 2022 · 2:34 AM UTC

John Schulman

@johnschulman2

14 Feb 2022

Consciousness is probably a confused mixture of various concepts (from "what information is accessible to verbal system" to "who has moral patienthood") but it should be possible to pose some well-defined problems and chip away at some of the mysteriousness.

John Schulman · Oct 9, 2022 · 6:49 PM UTC

John Schulman

@johnschulman2

9 Oct 2022

An ML modeling problem that occurred to me while driving (maybe a good interview question): describe how to design a speech recognition system that preferentially decodes entities that are nearby (say, within 50 miles).

John Schulman · Dec 31, 2023 · 6:05 AM UTC

John Schulman

@johnschulman2

31 Dec 2023

Replying to @QuintinPope5

The reason SI is compelling is (1) a NN forward pass is basically a program (2) SI is the upper limit of what you can do by searching over programs, (3) no other inductive bias / prior in ML comes as close to describing NNs' ability to learn patterns/programs. Understanding NNs as GPs is useful, but AFAICT the existing theory doesn't tell you why NNs correspond to *good* kernels. I'd love to see a theory that shows that for deep NNs, the NNGP/NTK kernel corresponds to an interesting prior that gets better with depth. What would it mean to show that it's a good prior? In particular, you could imagine a result showing that for infinite-width transformers, the prior puts a certain non-negligible weight on all size-d RASP programs -- I'd consider that a convincing result for the SI analogy!

7,330

John Schulman · Dec 23, 2022 · 7:49 AM UTC

John Schulman

@johnschulman2

23 Dec 2022

Replying to @natolambert @TalkRLPodcast @OpenAI @huggingface

Hi Nathan, the slides don't have that much content, but here they are: drive.google.com/file/d/1hEa… I didn't talk much about tuning or PPO; more about general methodology and principles.

neurips-2022-deeprl-talk.pdf

drive.google.com

5,974

John Schulman · Feb 28, 2024 · 7:37 AM UTC

John Schulman

@johnschulman2

28 Feb 2024

Replying to @NoamShazeer

Great post. I do the same :) github.com/joschu/jax-exp/bl…

9,722

John Schulman · Feb 23, 2025 · 5:16 PM UTC

John Schulman

@johnschulman2

23 Feb 2025

Replying to @peterwildeford

Yup, there's quite a lot to figure out. I see model specs as mostly a kind of applied morality, like law but with very different details. Though it also opens up many new moral questions.

2,145

John Schulman · May 19, 2025 · 4:42 AM UTC

John Schulman

@johnschulman2

19 May 2025

Replying to @unixpickle

it represents the forward and backward pass

1,240

John Schulman · Feb 14, 2022 · 2:34 AM UTC

John Schulman

@johnschulman2

14 Feb 2022

More generally, look at all the phenomenology that neuroscientists have found (e.g. Gazzaniga's split brain experiments, lesion studies) and set up analogous experiments in language models.

John Schulman · Feb 12, 2024 · 5:13 PM UTC

John Schulman

@johnschulman2

12 Feb 2024

Replying to @ctjlewis

Feel free to dm me, can try to help

6,293

John Schulman · Jul 23, 2022 · 7:13 PM UTC

John Schulman

@johnschulman2

23 Jul 2022

Replying to @karpathy

The S4 paper does apples to apples comparisons on LM benchmarks, where they only change the attention layer arxiv.org/abs/2111.00396

Efficiently Modeling Long Sequences with Structured State Spaces

A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although...

arxiv.org

John Schulman · Feb 14, 2025 · 5:48 PM UTC

John Schulman

@johnschulman2

14 Feb 2025

Actually 2 days ago, not last week :)

16,930

John Schulman · Jun 20, 2024 · 5:11 PM UTC

John Schulman

@johnschulman2

20 Jun 2024

Replying to @hendrycks

I reread it recently and was struck by how well the overall taxonomy/framework you put together has aged.

3,048

John Schulman · Dec 16, 2021 · 10:57 PM UTC

John Schulman

@johnschulman2

16 Dec 2021

Getting this to work was a challenge, between human data collection and ml/infra challenges (PPO on 175b param language models). But the underlying method is simple conceptually: train browsing & answering end-to-end with behavior cloning + RL

John Schulman · May 23, 2025 · 5:37 PM UTC

John Schulman

@johnschulman2

23 May 2025

Replying to @archit_sharma97

Not necessarily -- success rate always goes up with the number of the attacker's queries. And if you consider a generalized form of jailbreaking, which consists of giving the model a deceptive justification, it seems nearly impossible to prevent a determined enough attacker.

4,390

John Schulman · Oct 22, 2025 · 3:33 AM UTC

John Schulman

@johnschulman2

22 Oct 2025

Lead author @egler92630

11,091

John Schulman · Jun 5, 2021 · 4:43 AM UTC

John Schulman

@johnschulman2

5 Jun 2021

something that the world needs: better tooling for monorepos that contain many python packages with internal (within-monorepo) dependencies. pip doesn't handle this; bazel is more powerful but is burdensome

John Schulman · Jul 17, 2022 · 3:53 PM UTC

John Schulman

@johnschulman2

17 Jul 2022

Replying to @RichardMCNgo

Agree with some of the other replies that I'd feel put on the spot with these qs. I like to ask "have you enjoyed any good {food, books, music} recently?" Specific but accessible to everyone and easy to answer.

John Schulman · May 30, 2021 · 11:41 PM UTC

John Schulman

@johnschulman2

30 May 2021

It seems like Waymo has really scaled up its SF presence recently -- I see their white Jaguars every 5 mins.

John Schulman · Dec 16, 2021 · 10:57 PM UTC

John Schulman

@johnschulman2

16 Dec 2021

This release is just the first step -- stay tuned for future versions with higher-quality output and broader capabilities. My team has some job openings, apply here: boards.greenhouse.io/openai/…

John Schulman · May 8, 2021 · 1:29 PM UTC

John Schulman

@johnschulman2

8 May 2021

Musical training should put more emphasis on teaching chord changes and how to play over them instead of the motor skill of memorizing classical pieces. These skills are usually only taught with jazz, but they're useful for all genres.

John Schulman · Feb 22, 2024 · 5:05 AM UTC

John Schulman

@johnschulman2

22 Feb 2024

Replying to @rm_rafailov

Over-optimization might've amplified some of the woke tendencies a bit. Also, it seems more like they chose an overly simplistic prompt, because they didn't have a good way to make the model follow a more nuanced policy

2,246

John Schulman · Feb 22, 2024 · 4:36 AM UTC

John Schulman

@johnschulman2

22 Feb 2024

Replying to @jconorgrogan

true, this is a poorly conceived prompt

2,412

John Schulman · May 3, 2024 · 5:27 AM UTC

John Schulman

@johnschulman2

3 May 2024

Replying to @percyliang @_aidan_clark_

from our recently released evals repo github.com/openai/simple-eva…

simple-evals/mmlu_eval.py at main · openai/simple-evals

Contribute to openai/simple-evals development by creating an account on GitHub.

github.com

1,918

John Schulman · May 8, 2024 · 7:30 PM UTC

John Schulman

@johnschulman2

8 May 2024

Replying to @sama @joannejang

+ Jason Wolfe (not sure if on twitter)

5,499

John Schulman · Feb 26, 2024 · 12:33 AM UTC

John Schulman

@johnschulman2

26 Feb 2024

Replying to @Liv_Boeree

That's a caricature of the situation 3 years ago when you had two main clusters of people thinking about AI's impacts: one concerned with social justice issues, and the other thinking about x-risk and long-term issues. Now that LMs are so practically and commercially useful, a much bigger and broader set of people is working on them, and most aren't affiliated with those two early communities.

2,856

John Schulman · Jan 31, 2025 · 5:51 AM UTC

John Schulman

@johnschulman2

31 Jan 2025

Replying to @DavidDuvenaud

Great article -- really resonates. I gave a talk about an idea for a (small) mitigation: require AIs to ask humans for permission, and make sure the humans understand what they're approving: piped.video/watch?v=1h47Ds6a…. Related to your suggestion "Regulatory frameworks..."

John Schulman - Keeping Humans in the Loop

John Schulman - "Keeping Humans in the Loop." This presentation wa...

youtube.com

1,500

John Schulman · May 31, 2021 · 4:20 AM UTC

John Schulman

@johnschulman2

31 May 2021

Replying to @jacobmbuckman

Props on the self-critical footnote. There's a lot of folk knowledge about which results are BS, and experienced researchers usually smell it quickly. It'd be nice if there were a venue, or an incentive, to publicize that knowledge

John Schulman · Jan 8, 2024 · 2:16 AM UTC

John Schulman

@johnschulman2

8 Jan 2024

Replying to @lacker

Yes, actually that's what I should've said. The true utility function could involve concepts that we have no hope at understanding right now

712

John Schulman · May 3, 2021 · 1:45 AM UTC

John Schulman

@johnschulman2

3 May 2021

testing 123

John Schulman · Dec 26, 2023 · 9:59 PM UTC

John Schulman

@johnschulman2

26 Dec 2023

Replying to @vincentweisser @RichardMCNgo

For sci-fi with interesting concepts, a few I've enjoyed are Permutation City, There Is No Antimemetics Division, Rainbows End

653

John Schulman · Jan 26, 2022 · 5:02 PM UTC

John Schulman

@johnschulman2

26 Jan 2022

Tidal volume (inhaled per breath) = .5L. Mask volume = 0.1L (guess). So you'd still get an effective concentration of 25000 PPM / (.5/.1) = 5000 PPM CO2
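
The estimate above can be reproduced with a short sketch. It treats the mask's volume as the only re-inhaled air and the rest of the breath as CO2-free, which are simplifying assumptions; the function and parameter names are made up for illustration, and the 0.1 L mask volume is the tweet's own guess.

```python
def effective_co2_ppm(mask_air_ppm: float, tidal_volume_l: float, mask_volume_l: float) -> float:
    """Dilute the CO2 held in the mask across one full breath.

    Simplification: the remainder of the breath is treated as containing
    no CO2, so only the mask's volume contributes.
    """
    if tidal_volume_l <= 0:
        raise ValueError("tidal_volume_l must be positive")
    # Fraction of each breath that comes from air already in the mask.
    rebreathed_fraction = mask_volume_l / tidal_volume_l
    return mask_air_ppm * rebreathed_fraction


# Figures from the tweet: 25000 PPM, 0.5 L tidal volume, 0.1 L mask volume.
estimate = effective_co2_ppm(mask_air_ppm=25_000, tidal_volume_l=0.5, mask_volume_l=0.1)
print(f"{estimate:.0f} PPM")  # prints "5000 PPM"
```

Multiplying by 0.1/0.5 is the same as dividing by (0.5/0.1), so this matches the 5000 PPM figure.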

John Schulman · Dec 24, 2021 · 6:00 AM UTC

John Schulman

@johnschulman2

24 Dec 2021

Replying to @ExiledInfoHaz @RokoMijic @OpenAI

- slippery slopes are real, but "AI shall not autonomously make any HTTP requests" was never going to be a hard-and-fast rule
- we're only making GET requests by following links; we're going to be much more careful with POST requests

John Schulman · Jan 19, 2025 · 5:45 AM UTC

John Schulman

@johnschulman2

19 Jan 2025

Replying to @EpochAIResearch

excellent explanations!

749

John Schulman · Jan 17, 2022 · 1:21 AM UTC

John Schulman

@johnschulman2

17 Jan 2022

Replying to @hardmaru

I have this ability but only for attractive women. I've generally had to hide it, because "I caught a glimpse of you at a party a couple years ago" or "I saw your profile on Hinge a few years back" might come off as creepy.

John Schulman · Oct 22, 2025 · 1:16 AM UTC

John Schulman

@johnschulman2

22 Oct 2025

Replying to @yaringal

I like that paper. The auditing agent isn't "pointwise", so it actually gets around this limitation. It can catch some cipher based attacks by inferring the cipher and querying the trained model with cipher-encoded text. (That said, it has imperfect accuracy, and we haven't tried very hard to attack it, so I wouldn't claim the problem is solved)

3,153

John Schulman · Jan 23, 2024 · 2:45 AM UTC

John Schulman

@johnschulman2

23 Jan 2024

Replying to @michael_nielsen

A relevant idea from Vitalik: that coordination can be good and bad, so as a mechanism designer, you want to control what sizes of groups are able to coordinate/collude vitalik.eth.limo/general/202…

Coordination, Good and Bad

vitalik.eth.limo

1,246

John Schulman · Jan 26, 2022 · 5:00 PM UTC

John Schulman

@johnschulman2

26 Jan 2022

A couple replies pointed out that the situation isn't as bad as these numbers suggest -- when you inhale, you mostly get fresh air, not the air lingering in the mask. Still the situation seems worrying for KN95.

John Schulman · Nov 4, 2025 · 5:35 AM UTC

John Schulman

@johnschulman2

4 Nov 2025

Replying to @jkcarlsmith

Great post, and glad that you're tackling the problem of how to design model specs!

1,298

John Schulman · Dec 30, 2023 · 8:53 PM UTC

John Schulman

@johnschulman2

30 Dec 2023

Replying to @andrewgwils

looks very relevant! will take a look

5,058

John Schulman · Dec 5, 2021 · 6:48 PM UTC

John Schulman

@johnschulman2

5 Dec 2021

Beautiful footage, heartwarming parental teamwork piped.video/watch?v=Y04R9ZrS…

John Schulman · Dec 31, 2023 · 4:33 PM UTC

John Schulman

@johnschulman2

31 Dec 2023

Replying to @norabelrose @QuintinPope5

Right, there would have to be a two-step approx.:
1. deep learning ≈ Bayesian inference over time-limited programs
2. Bayesian inference over time-lim programs ≈ SI

It's still useful to talk about SI because there's a theory showing it's ideal, vs less theory for speed prior

1,579

John Schulman · Jun 5, 2021 · 6:15 PM UTC

John Schulman

@johnschulman2

5 Jun 2021

This theory explains bell curve meme format slatestarcodex.com/2014/04/2…

John Schulman · Jan 12, 2024 · 6:10 AM UTC

John Schulman

@johnschulman2

12 Jan 2024

Replying to @johnschulman2 @shakoistsLog

You can of course define a probability distribution over formalizations of x, but often the final probability depends more on your distribution over formalizations than on your actual beliefs about the event in question

867

John Schulman · Oct 30, 2025 · 5:10 AM UTC

John Schulman

@johnschulman2

30 Oct 2025

Replying to @mahtabsoin @thinkymachines

yes

1,348

John Schulman · May 8, 2022 · 10:08 PM UTC

John Schulman

@johnschulman2

8 May 2022

Replying to @brianchristian

Apparently Tesla is using MuZero-inspired methods as part of the autonomous driving stack, e.g. see Ashok's presentation from this year's AI day elon-musk-interviews.com/202…

Tesla AI Day – The Presentation (I)

On 19th August 2021, Elon Musk and the Tesla AI team presented the technical progress in the field of artificial intelligence and answered questions from the audience. This is the English transcrip…

elonmuskinterviews.wordpress.com

John Schulman · Oct 22, 2025 · 1:24 AM UTC

John Schulman

@johnschulman2

22 Oct 2025

Replying to @johnschulman2 @yaringal

I don't think we tested your multiple choice datasets, so I'm not sure we'd catch this particular attack, which is very subtle.

2,196

John Schulman · Dec 28, 2023 · 3:25 AM UTC

John Schulman

@johnschulman2

28 Dec 2023

Replying to @danfaggella

pretty clarifying list ... I think good and likely outcomes involve a combination of these --
- governments and model-provider-oligopoly enforce regulations to restrict the agency of AGIs and speed of development (gatekeeper, 1984, enslaved God)
- after spectacular progress in science and philosophy, allowing us to understand consciousness, we're in a better position to understand what would be a worthy successor (Descendents)
- people who want to live in a more traditional way are empowered to do so through strong property rights. (reversion, libertarian utopia). Other people will just want comfort and happiness and should be provided for (egalitarian utopia)

1,346

John Schulman · Jan 12, 2024 · 6:04 AM UTC

John Schulman

@johnschulman2

12 Jan 2024

Replying to @shakoistsLog

On the other hand, often someone asks you for p(x), but x is an imprecise sentence that can be interpreted/formalized in multiple ways. See this a lot in discussions around AI, with timelines & p(doom).

1,362