Sheryl Hsu · Aug 11, 2025 · 6:00 PM UTC

Sheryl Hsu

@SherylHsu02

11 Aug 2025

1/n I’m thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world’s top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants! 👨‍💻👨‍💻

Sheryl Hsu · Jul 19, 2025 · 7:52 AM UTC

Watching the model solve these IMO problems and achieve gold-level performance was magical. A few thoughts 🧵

Alexander Wei

@alexwei_

19 Jul 2025

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

Sheryl Hsu · Jul 19, 2025 · 7:52 AM UTC

It’s crazy how we’ve gone from 12% on AIME (GPT-4o) → IMO gold in ~15 months. We have come very far very quickly. I wouldn’t be surprised if, by next year, models are deriving new theorems and contributing to original math research!

Sheryl Hsu · Jul 19, 2025 · 7:52 AM UTC

The model solves these problems without tools like Lean or coding; it just uses natural language, and it has only 4.5 hours. We see the model reason at a very high level: trying out different strategies, making observations from examples, and testing hypotheses.

Sheryl Hsu · Jul 19, 2025 · 7:52 AM UTC

It’s been a blast working with everyone @OpenAI, esp @alexwei_ @polynoamial, on this project. I joined 3 months ago and people are so smart and kind, although sometimes they threaten you with a sword: make the model produce correct solutions or suffer!

Sheryl Hsu · Aug 11, 2025 · 6:00 PM UTC

2/n We officially competed in the online AI track of the IOI, where we scored higher than all but 5 (of 330) human participants and placed first among AI participants. We had the same 5-hour time limit and 50-submission limit as human participants. Like the human contestants, our system competed *without* internet or RAG, with access only to a basic terminal tool.

Sheryl Hsu · Jul 19, 2025 · 7:52 AM UTC

I was particularly motivated to work on this project because this win came from general research advancements. Beyond just math, we will improve on other capabilities and make ChatGPT more useful over the coming months.

Sheryl Hsu · Aug 11, 2025 · 6:00 PM UTC

4/n This result demonstrates a huge improvement over @OpenAI’s attempt at IOI last year where we finished just shy of a bronze medal with a significantly more handcrafted test-time strategy. We’ve gone from 49th percentile to 98th percentile at the IOI in just one year!

Sheryl Hsu · Aug 11, 2025 · 6:00 PM UTC

6/n I’ve been lucky to work with many fantastic teammates here at @OpenAI, specifically with @alexwei_ @bminaiev @oleg_murk on preparing for the IOI and building on the long-term competitive programming work by @_lorenzkuhn @MostafaRohani @clavera_i @andresnds @ahelkky

Sheryl Hsu · Aug 11, 2025 · 6:00 PM UTC

3/n We competed with an ensemble of general-purpose reasoning models; we did not train any model specifically for the IOI. Our only scaffolding was in selecting which solutions to submit and connecting to the IOI API.

Sheryl Hsu · Aug 11, 2025 · 6:00 PM UTC

5/n It’s been really exciting to see the progress of our newest research methods at OpenAI, with our successes at the AtCoder World Finals, IMO, and IOI over the last couple of weeks. We’ve been working hard on building smarter, more capable models, and on getting them into our mainstream products.

Sheryl Hsu · Aug 11, 2025 · 6:00 PM UTC

I, along with some teammates, was able to travel to Bolivia to attend the IOI in person. It was wonderful to meet all the participants and coaches there, and we wanted to say congrats once again!!

Sheryl Hsu · Jul 19, 2025 · 8:14 AM UTC

Replying to @polynoamial

lucky to be one of the agents on multi-agent, it's a blast!!

Sheryl Hsu · Oct 31, 2024 · 4:30 AM UTC

Feeling spooked👻🎃? Get grounded...introducing "Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval." Meet LeReT (Learning to Retrieve by Trying), an RL-based framework that improves LLMs’ ability to use retrieval tools by up to 29%. sherylhsu.com/LeReT/

Sheryl Hsu · Aug 11, 2025 · 7:14 PM UTC

We officially entered the 2025 International Olympiad in Informatics (IOI) online competition track and adhered to the same restrictions as the human contestants, including submissions and time limits, but without direct supervision from the contest organizers.

Sheryl Hsu · May 16, 2025 · 12:36 PM UTC

Sun is rising and run is (still) erroring

Sheryl Hsu · Jun 22, 2025 · 2:33 AM UTC

went to the farm to learn to grow berries 🍓🍓

Sheryl Hsu · Apr 25, 2025 · 9:17 AM UTC

Presenting this @iclr_conf Saturday 3-5:30 Hall 2B/3 poster 540. Come say hi!!

Sheryl Hsu

@SherylHsu02

31 Oct 2024

Sheryl Hsu · Feb 27, 2025 · 5:36 PM UTC

Personalization is what makes apps like tiktok, instagram & x so addictive. What if LLMs could be customized to your preferences and needs?

Anikait Singh

@Anikait_Singh_

27 Feb 2025

Personalization in LLMs is crucial for meeting diverse user needs, yet collecting real-world preferences at scale remains a significant challenge. Introducing FSPO, a simple framework leveraging synthetic preference data to adapt new users with meta-learning for open-ended QA! 🧵

Sheryl Hsu · May 5, 2024 · 2:51 AM UTC

Spent the day at the Unleashed hackathon building Devin At Home with @JJHennessy73909 @SylvainY02 @techlordim. Allows Devin to operate on your local computer (with some protections). Thank you to @cognition_labs @philip_bogdanov

Sheryl Hsu · Oct 31, 2024 · 4:30 AM UTC

[3/5] How? LeReT samples a set of queries, computes a reward based on the retrieved documents, and fine-tunes the LLM using SFT + IPO. Moving beyond high-temperature sampling, LeReT uses DSPy to optimize few-shot prompts, resulting in more diverse and higher-reward samples.
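The preference-learning step in [3/5] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not LeReT's actual code: the recall-style reward, the `Sample` fields, and the "every higher-reward query beats every lower-reward query" pairing rule are all hypothetical, and log-probabilities are plain floats rather than model outputs. Only the loss follows the published IPO objective, which pulls the policy-vs-reference log-ratio margin between chosen and rejected samples toward 1/(2τ). The SFT stage is not shown.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Sample:
    """One sampled search query and its (hypothetical) training signals."""
    query: str
    reward: float       # assumption: e.g. recall of gold documents
    logp_policy: float  # log-prob of the query under the model being trained
    logp_ref: float     # log-prob of the query under the frozen reference model


def recall_reward(retrieved: list[str], gold: list[str]) -> float:
    """Hypothetical reward: fraction of gold documents the query retrieved."""
    if not gold:
        return 0.0
    return len(set(retrieved) & set(gold)) / len(set(gold))


def preference_pairs(samples: list[Sample]) -> list[tuple[Sample, Sample]]:
    """Pair each higher-reward query (chosen) with each lower-reward one (rejected)."""
    return [(a, b) for a, b in permutations(samples, 2) if a.reward > b.reward]


def ipo_loss(chosen: Sample, rejected: Sample, tau: float = 0.1) -> float:
    """IPO objective: squared distance of the log-ratio margin from 1/(2*tau)."""
    margin = (chosen.logp_policy - chosen.logp_ref) - (
        rejected.logp_policy - rejected.logp_ref
    )
    return (margin - 1.0 / (2.0 * tau)) ** 2


if __name__ == "__main__":
    gold = ["d1", "d4"]
    samples = [
        Sample("q1", recall_reward(["d1", "d4"], gold), -1.0, -3.0),
        Sample("q2", recall_reward(["d1", "d2"], gold), -2.0, -2.0),
        Sample("q3", recall_reward(["d5"], gold), -4.0, -1.0),
    ]
    pairs = preference_pairs(samples)
    mean_loss = sum(ipo_loss(c, r) for c, r in pairs) / len(pairs)
    print(len(pairs), round(mean_loss, 3))
```

Ties in reward produce no pair, so only queries with a strictly better retrieval outcome are treated as preferred; how LeReT actually forms pairs is not stated in the thread.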

Sheryl Hsu · Oct 31, 2024 · 4:30 AM UTC

[4/5] Results: LeReT improves absolute retrieval accuracy by 29% and downstream generation evaluations by 17%. Additionally, we find few-shot prompting produces better datasets than high-temperature sampling, and stronger generators benefit more from improved retrieval.

Sheryl Hsu · Sep 23, 2024 · 7:08 AM UTC

4o can create calendar events...actually so useful (and wasn't expecting this to work)

Sheryl Hsu · Oct 31, 2024 · 4:30 AM UTC

[5/5] LeReT treats retrieval as a black box, meaning the general algorithm is applicable to any tool and reward function. As such, LeReT can be extended to general agent systems or LLM pipelines. Had a blast working on this with @lateinteraction @chelseabfinn @archit_sharma97

Sheryl Hsu · Jul 4, 2024 · 12:27 PM UTC

Gave the first presentation (“What is in the Chrome Web Store?”) of my undergrad today @ASIACCS2024! Big thank you to @AuroreFass for all the mentorship and support.

Sheryl Hsu · Oct 31, 2024 · 4:30 AM UTC

[2/5] Why is this important? Like seeing a ghost 👻👻, LLMs often hallucinate (glue in pizza), and grounding LLM answers in retrieved facts improves factuality and transparency. Improving LLMs’ ability to retrieve correct information thus improves overall performance.

Sheryl Hsu · Jul 7, 2024 · 5:15 AM UTC

Just finished reading Life 3.0 on the plane back from Singapore; it’s shocking how much things have changed since it came out (2017). Very eye-opening in terms of how AI safety got its start, how people used to think of OpenAI, and the old AGI timeline estimates.

Sheryl Hsu · May 7, 2024 · 6:27 PM UTC

I turned 20 three days ago and now I’m listening to @Adele

Sheryl Hsu · Jun 25, 2024 · 7:27 PM UTC

Check out this article about our paper “What is in the Chrome Web Store”! Also on arxiv (arxiv.org/abs/2406.12710) and I’ll be at @ASIACCS2024 presenting this work next week.

What is in the Chrome Web Store? Investigating Security-Noteworthy...

This paper is the first attempt at providing a holistic view of the Chrome Web Store (CWS). We leverage historical data provided by ChromeStats to study global trends in the CWS and security...

arxiv.org

Forbes

@Forbes

24 Jun 2024

280 Million Google Chrome Users Installed Dangerous Extensions, Study Says trib.al/NJLg1FH

Sheryl Hsu · May 6, 2024 · 9:36 AM UTC

Spent at least 4 hours this week debating the meaning of life with tech ppl, never seem to talk about this with non tech ppl. Why do tech ppl think so much more about this?

Sheryl Hsu · Nov 8, 2024 · 6:54 AM UTC

Replying to @scale_AI

Biggest dataset here is 15k examples, doubtful improvement continues through the 100k-1M range?

Sheryl Hsu · Jul 19, 2024 · 12:18 AM UTC

Had a great time last night!

Mithril

@mithrilcompute

18 Jul 2024

Fantastic to see AI founders, researchers, and builders coming together last night. Thanks for hosting, @FactoryAI 🤝

Sheryl Hsu · Nov 11, 2023 · 11:05 AM UTC

2 is my favorite number because it's the only even prime

Sheryl Hsu · Apr 22, 2024 · 11:51 PM UTC

Replying to @du_maximilian @StanfordAILab

Congrats Max!!