ARC Prize · Mar 25, 2026 · 5:37 PM UTC

ARC Prize

Pinned Tweet

ARC Prize

@arcprize

Mar 25

Announcing ARC-AGI-3
The only unsaturated agentic intelligence benchmark in the world
Humans score 100%, AI <1%
This human-AI gap demonstrates we do not yet have AGI
Most benchmarks test what models already know, ARC-AGI-3 tests how they learn

249

576

4,327

744,724

ARC Prize · Jul 10, 2025 · 4:42 AM UTC

ARC Prize

@arcprize

10 Jul 2025

Grok 4 (Thinking) achieves new SOTA on ARC-AGI-2 with 15.9%
This nearly doubles the previous commercial SOTA and tops the current Kaggle competition SOTA

232

687

4,924

7,307,391

ARC Prize · Dec 20, 2024 · 6:07 PM UTC

ARC Prize

@arcprize

20 Dec 2024

New verified ARC-AGI-Pub SoTA! @OpenAI o3 has scored a breakthrough 75.7% on the ARC-AGI Semi-Private Evaluation. And a high-compute o3 configuration (not eligible for ARC-AGI-Pub) scored 87.5% on the Semi-Private Eval. 1/4

107

608

3,075

2,473,859

ARC Prize · Jun 11, 2025 · 4:26 PM UTC

ARC Prize

@arcprize

11 Jun 2025

After the o3 price reduction, we retested the o3-2025-04-16 model on ARC-AGI to determine whether its performance had changed. We compared the retest results with the original results and observed no difference in performance.

162

2,771

434,864

ARC Prize · Mar 24, 2025 · 8:29 PM UTC

ARC Prize

@arcprize

24 Mar 2025

Today we are announcing ARC-AGI-2, an unsaturated frontier AGI benchmark that challenges AI reasoning systems (same relative ease for humans).
Grand Prize: 85%, ~$0.42/task efficiency
Current Performance:
* Base LLMs: 0%
* Reasoning Systems: <4%

324

2,353

461,723

ARC Prize · Sep 16, 2025 · 5:07 PM UTC

ARC Prize

@arcprize

16 Sep 2025

New SOTA on ARC-AGI
- V1: 79.6%, $8.42/task
- V2: 29.4%, $30.40/task
Custom submissions by @jeremyberman and @_eric_pang_ are now the best known solutions to ARC-AGI
Both:
* Are open source
* Use Grok 4
* Implement program-synthesis outer loops with test-time adaptation

143

256

1,997

7,503,095

ARC Prize · Jul 18, 2025 · 5:26 PM UTC

ARC Prize

@arcprize

18 Jul 2025

Today, we're announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI
We’re releasing:
* 3 games (environments)
* $10K agent contest
* AI agents API
Starting scores - Frontier AI: 0%, Humans: 100%

166

229

2,011

520,962

ARC Prize · Aug 15, 2025 · 7:03 PM UTC

ARC Prize

@arcprize

15 Aug 2025

Analyzing the Hierarchical Reasoning Model by @makingAGI
We verified scores on hidden tasks, ran ablations, and found that performance comes from an unexpected source
ARC-AGI Semi Private Scores:
* ARC-AGI-1: 32%
* ARC-AGI-2: 2%
Our 4 findings:

162

1,349

272,761

ARC Prize · Oct 9, 2025 · 4:49 PM UTC

ARC Prize

@arcprize

9 Oct 2025

New ARC-AGI SOTA: GPT-5 Pro
- ARC-AGI-1: 70.2%, $4.78/task
- ARC-AGI-2: 18.3%, $7.41/task
@OpenAI’s GPT-5 Pro now holds the highest verified frontier LLM score on ARC-AGI’s Semi-Private benchmark

157

1,279

562,803

ARC Prize · Apr 16, 2025 · 6:01 PM UTC

ARC Prize

@arcprize

16 Apr 2025

Clarifying o3’s ARC-AGI Performance
OpenAI has confirmed:
* The released o3 is a different model from what we tested in December 2024
* All released o3 compute tiers are smaller than the version we tested
* The released o3 was not trained on ARC-AGI data, not even the train set
* The released o3 is tuned for chat/product use, which introduces both strengths and weaknesses on ARC-AGI
What ARC Prize will do:
* We will re-test the released o3 (all compute tiers) and publish updated results. Prior scores will be labeled “preview”
* We will test and release o4-mini results as soon as possible
* We will test o3-pro once available

1,224

225,641

ARC Prize · Jan 21, 2025 · 5:53 PM UTC

ARC Prize

@arcprize

21 Jan 2025

Verified DeepSeek performance on ARC-AGI's Public Eval (400 tasks) + Semi-Private (100 tasks)
DeepSeek V3:
* Semi-Private: 7.3% ($.002)
* Public Eval: 14% ($.002)
DeepSeek Reasoner:
* Semi-Private: 15.8% ($.06)
* Public Eval: 20.5% ($.05)
(Avg $ per task)

104

1,176

291,821

ARC Prize · Feb 14, 2025 · 6:15 PM UTC

ARC Prize

@arcprize

14 Feb 2025

Introducing SnakeBench, an experimental benchmark side quest
We made 50 LLMs battle each other in head-to-head snake 🐍
2.8K matches showed which models are the best at snake real-time strategy and spatial reasoning
Here’s the top match between o3-mini and DeepSeek-R1 🧵

145

1,003

176,917

ARC Prize · Oct 21, 2025 · 4:39 PM UTC

ARC Prize

@arcprize

21 Oct 2025

Grok-4 (Fast Reasoning) on ARC-AGI Semi Private Eval
- ARC-AGI-1: 48.5%, $0.03/task
- ARC-AGI-2: 5.3%, $0.06/task
@xai pushes the frontier of performance efficiency on ARC-AGI

958

1,613,482

ARC Prize · Mar 27, 2025 · 8:50 PM UTC

ARC Prize

@arcprize

27 Mar 2025

Gemini-2.5-Pro Experimental Preview Results
ARC-AGI-1
* Public Eval: 24.3%
* Semi Private: 12.5%
ARC-AGI-2
* Public Eval: .8%
* Semi Private: 1.3%
These results are on par with DeepSeek's R1

975

297,624

ARC Prize · Apr 22, 2025 · 7:11 PM UTC

ARC Prize

@arcprize

22 Apr 2025

o3 and o4-mini on ARC-AGI's Semi Private Evaluation
* o3-medium scores 53% on ARC-AGI-1
* o4-mini shows state-of-the-art efficiency
* ARC-AGI-2 remains virtually unsolved (<3%)
Through analysis we highlight differences from o3-preview and other model behavior

114

979

205,502

ARC Prize · Sep 13, 2024 · 9:18 PM UTC

ARC Prize

@arcprize

13 Sep 2024

We put OpenAI o1 to the test against ARC Prize. Results: both o1 models beat GPT-4o. And o1-preview is on par with Claude 3.5 Sonnet. Can chain-of-thought scale to AGI? What explains o1's modest scores on ARC-AGI? Our notes: arcprize.org/blog/openai-o1-…

145

830

403,443

ARC Prize · Jun 11, 2025 · 4:26 PM UTC

ARC Prize

@arcprize

11 Jun 2025

This indicates that there is no evidence of a distilled or altered model being served after the price change.

837

28,272

ARC Prize · Oct 16, 2025 · 5:16 PM UTC

ARC Prize

@arcprize

16 Oct 2025

Tiny Recursion Model (TRM) results on ARC-AGI
- ARC-AGI-1: 40%, $1.76/task
- ARC-AGI-2: 6.2%, $2.10/task
Thank you to @jm_alexia for contributing TRM to the community: well-written, open-source, and thorough research based on the HRM from @makingAGI

820

272,788

ARC Prize · Feb 3, 2025 · 4:48 PM UTC

ARC Prize

@arcprize

3 Feb 2025

o3-mini performance matches o1 on ARC-AGI-1 Semi-Private Test Set
Scores by reasoning effort:
> Low: 11% ($0.009/task)
> Med: 29% ($0.02/task)
> High: 35% ($0.04/task)

773

229,263

ARC Prize · Feb 27, 2025 · 8:16 PM UTC

ARC Prize

@arcprize

27 Feb 2025

GPT-4.5 Results on ARC-AGI Semi Private Set (100 hold out tasks):
* Score: 10.33%
* Average Cost per Task: $0.29

797

129,900

ARC Prize · Jun 10, 2025 · 8:28 PM UTC

ARC Prize

@arcprize

10 Jun 2025

o3 Pro on ARC-AGI Semi Private Eval Results
ARC-AGI-1:
* Low: 44%, $1.64/task
* Medium: 57%, $3.18/task
* High: 59%, $4.16/task
ARC-AGI-2:
* All reasoning efforts: <5%, $4-7/task
Takeaways:
* o3-pro in line with o3 performance
* o3's new price sets the ARC-AGI-1 Frontier

690

126,310

ARC Prize · Dec 19, 2024 · 1:12 AM UTC

ARC Prize

@arcprize

19 Dec 2024

Verified o1 performance on ARC-AGI's Semi-Private Eval (100 tasks)
o1, Low: 25% ($1.5/task)
o1, Medium: 31% ($2.5/task)
o1, High: 32% ($3.8/task)

667

323,466

ARC Prize · Feb 25, 2025 · 6:29 PM UTC

ARC Prize

@arcprize

25 Feb 2025

AGI is reached when the capability gap between humans and computers is zero
ARC Prize Foundation measures this to inspire progress
Today we preview the unbeaten ARC-AGI-2 + open public donations to fund ARC-AGI-3
TY Schmidt Sciences (@ericschmidt) for $50k to kick us off!

662

225,022

ARC Prize · May 28, 2025 · 8:01 PM UTC

ARC Prize

@arcprize

28 May 2025

Claude Opus 4 on ARC-AGI Semi Private Eval
Base
* ARC-AGI-1: 22.5%, $0.40/task
* ARC-AGI-2: 1.3%, $0.63/task
Thinking 16K
* ARC-AGI-1: 35.7%, $1.25/task
* ARC-AGI-2: 8.6%, $1.93/task
Opus 4 sets new SOTA (8.6%) on ARC-AGI-2

665

55,318

ARC Prize · Apr 8, 2025 · 5:26 PM UTC

ARC Prize

@arcprize

8 Apr 2025

Llama 4 Maverick and Scout on ARC-AGI's Semi Private Evaluation
Maverick:
* ARC-AGI-1: 4.38% ($0.0078/task)
* ARC-AGI-2: 0.00% ($0.0121/task)
Scout:
* ARC-AGI-1: 0.50% ($0.0041/task)
* ARC-AGI-2: 0.00% ($0.0062/task)

645

114,638

ARC Prize · Oct 23, 2024 · 3:16 PM UTC

ARC Prize

@arcprize

23 Oct 2024

New ARC-AGI high score! 53% (Prize goal: 85%) Congratulations, MindsAI!

567

58,033

ARC Prize · Dec 6, 2024 · 6:49 PM UTC

ARC Prize

@arcprize

6 Dec 2024

ARC Prize remains unbeaten. In 2024, SoTA moved from 33% to 55.5%. Announcing: ARC Prize 2024 Winners & Technical Report.

617

156,045

ARC Prize · Jul 10, 2025 · 4:42 AM UTC

ARC Prize

@arcprize

10 Jul 2025

Thank you to the @xai team for working with us to validate Grok 4's score and inviting us to watch the livestream

624

80,099

ARC Prize · Feb 25, 2025 · 5:58 PM UTC

ARC Prize

@arcprize

25 Feb 2025

Claude Sonnet 3.7 + Thinking 1/8/16K results
- Base: 13.6%, $.05/task
- Thinking 1K: 11.6%, $.07/task
- Thinking 8K: 21.1%, $.21/task
- Thinking 16K: 28.6%, $.33/task
Performance is on par with o3-mini for slightly increased cost per task

618

89,491

ARC Prize · Dec 20, 2024 · 6:07 PM UTC

ARC Prize

@arcprize

20 Dec 2024

This performance on ARC-AGI highlights a genuine breakthrough in novelty adaptation. This is not incremental progress. We're in new territory. Is it AGI? o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence. 2/4

587

65,792

ARC Prize · Jun 5, 2025 · 7:11 PM UTC

ARC Prize

@arcprize

5 Jun 2025

We tested every major AI reasoning system. There is no clear winner. Accuracy goes up as you stack modern CoT techniques, but efficiency goes way down. This gives rise to a Pareto frontier on accuracy vs. cost using ARC-AGI as a consistent measuring stick.

562

64,297
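A minimal sketch of how a Pareto frontier on accuracy vs. cost can be computed, for readers who want to see the idea concretely. This is an illustration only, not ARC Prize's own method: the `Result` type and `pareto_frontier` function are hypothetical names, and the sample figures are ARC-AGI-1 Semi Private scores and per-task costs quoted in other posts in this feed, several of which postdate this tweet.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    cost: float   # USD per task, as quoted in the feed
    score: float  # ARC-AGI-1 Semi Private score, in percent


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep only results that no cheaper-or-equal result matches or beats."""
    # Sort by ascending cost; at equal cost, put the higher score first.
    ordered = sorted(results, key=lambda r: (r.cost, -r.score))
    frontier: list[Result] = []
    best_score = float("-inf")
    for r in ordered:
        # A result joins the frontier only if it improves on every cheaper one.
        if r.score > best_score:
            frontier.append(r)
            best_score = r.score
    return frontier


# Illustrative sample drawn from figures quoted elsewhere in this feed.
RESULTS = [
    Result("GPT-5 Nano", 0.03, 16.5),
    Result("Grok-4 (Fast Reasoning)", 0.03, 48.5),
    Result("GPT-5 Mini", 0.12, 54.3),
    Result("Claude Sonnet 4 (Thinking 16K)", 0.36, 40.0),
    Result("GPT-5", 0.51, 65.7),
    Result("Claude Opus 4 (Thinking 16K)", 1.25, 35.7),
    Result("o3 Pro (High)", 4.16, 59.0),
    Result("GPT-5 Pro", 4.78, 70.2),
]

if __name__ == "__main__":
    for r in pareto_frontier(RESULTS):
        print(f"{r.name}: {r.score}% at ${r.cost}/task")
```

On this sample, four of the eight entries remain on the frontier; every other entry is matched or beaten by something that costs no more.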

ARC Prize · Aug 19, 2025 · 6:53 PM UTC

ARC Prize

@arcprize

19 Aug 2025

ARC-AGI-3 Preview - 30-Day Learnings
30 days ago we released a preview of our first Interactive Reasoning Benchmark
Our goal was to ship quick, learn from the community, and inform the next >100 games.
Here’s what we learned after 100s of agents and >3,900 game plays:

600

96,404

ARC Prize · May 27, 2025 · 4:55 PM UTC

ARC Prize

@arcprize

27 May 2025

Claude Sonnet 4 on ARC-AGI Semi Private Eval
Base
* ARC-AGI-1: 23%, $0.08/task
* ARC-AGI-2: 1.2%, $0.12/task
Thinking 16K
* ARC-AGI-1: 40%, $0.36/task
* ARC-AGI-2: 5.9%, $0.48/task
Sonnet 4 sets new SOTA (5.9%) on ARC-AGI-2

582

95,834

ARC Prize · Aug 21, 2025 · 6:31 PM UTC

ARC Prize

@arcprize

21 Aug 2025

ARC-AGI-3 Preview: +3 Games Released
We’ve opened 3 previously private holdout games from the Preview Agent Competition
Now 6 games are available to play online and via Agents API
Each game was selected to expand the novelty of ARC-AGI-3 public games
Can you beat them?

369

44,876

ARC Prize · Jul 10, 2025 · 4:42 AM UTC

ARC Prize

@arcprize

10 Jul 2025

On ARC-AGI-1, Grok 4 (Thinking) achieves 66.7%, in line with the Pareto frontier for AI reasoning systems we reported last month

580

64,636

ARC Prize · Apr 14, 2025 · 7:27 PM UTC

ARC Prize

@arcprize

14 Apr 2025

GPT-4.1 on ARC-AGI's Semi Private Evaluation
GPT-4.1:
* ARC-AGI-1: 5.5% ($0.039/task)
* ARC-AGI-2: 0.0% ($0.069/task)
GPT-4.1-Mini:
* ARC-AGI-1: 3.5% ($0.0078/task)
* ARC-AGI-2: 0.0% ($0.0139/task)
GPT-4.1-Nano:
* ARC-AGI-1: 0.0% ($0.0021/task)
* ARC-AGI-2: 0.0% ($0.0036/task)

576

53,273

ARC Prize · Jan 29, 2025 · 5:42 PM UTC

ARC Prize

@arcprize

29 Jan 2025

R1-Zero matches performance of R1 on ARC-AGI
We’ve verified that R1-Zero scored 14% on ARC-AGI-1 (vs 15% on R1)
@mikeknoop explains why R1-Zero is more important than R1, why scaling inference isn’t going away, and what happens when “inference becomes training”
1/4

548

138,833

ARC Prize · Nov 3, 2024 · 8:36 PM UTC

ARC Prize

@arcprize

3 Nov 2024

New ARC-AGI high score! 55.5% (Prize goal: 85%) Congratulations, MindsAI!

534

149,428

ARC Prize · Aug 7, 2025 · 5:29 PM UTC

ARC Prize

@arcprize

7 Aug 2025

GPT-5 on ARC-AGI Semi Private Eval
GPT-5
* ARC-AGI-1: 65.7%, $0.51/task
* ARC-AGI-2: 9.9%, $0.73/task
GPT-5 Mini
* ARC-AGI-1: 54.3%, $0.12/task
* ARC-AGI-2: 4.4%, $0.20/task
GPT-5 Nano
* ARC-AGI-1: 16.5%, $0.03/task
* ARC-AGI-2: 2.5%, $0.03/task

556

108,179

ARC Prize · Jul 25, 2025 · 7:07 PM UTC

ARC Prize

@arcprize

25 Jul 2025

New ARC Prize 2025 High Score: 19.0% by Giotto.ai (@podesta_aldo)

401

38,337

ARC Prize · Oct 29, 2024 · 3:29 PM UTC

ARC Prize

@arcprize

29 Oct 2024

New ARC-AGI high score! 54.5% (Prize goal: 85%) Congratulations, MindsAI!

511

46,789

ARC Prize · Jun 30, 2025 · 4:30 PM UTC

ARC Prize

@arcprize

30 Jun 2025

ARC-AGI-3 Developer Preview
* Hands-on first look at ARC-AGI-3 (live demos & API access)
* Fireside with @fchollet moderated by @dwarkesh_sp
7/17, San Francisco
Open to sponsors & researchers of @arcprize (very limited public slots available)

445

52,969

ARC Prize · Jun 20, 2025 · 7:12 PM UTC

ARC Prize

@arcprize

20 Jun 2025

Gemini 2.5 Pro (6/17) on ARC-AGI Semi Private Eval
ARC-AGI-1:
* Thinking 1K: 16%, $0.06/task
* Thinking 8K: 29%, $0.29/task
* Thinking 16K: 41%, $0.48/task
* Thinking 32K: 37%, $0.51/task
ARC-AGI-2:
* Thinking 32K: 4.9%, $0.75/task

528

52,609

ARC Prize · Mar 6, 2025 · 4:43 PM UTC

ARC Prize

@arcprize

6 Mar 2025

QwQ-32B on ARC-AGI
* Public Eval: 11.25%, $0.036 per task
* Semi Private: 7.5%, $0.039 per task

499

42,008

ARC Prize · Oct 23, 2024 · 11:06 PM UTC

ARC Prize

@arcprize

23 Oct 2024

Claude 3.5 Sonnet (new) scores pass@1 20.3% on 400 ARC-AGI public eval tasks. Original 3.5 Sonnet: 21%.

475

201,410

ARC Prize · Jul 21, 2025 · 6:26 PM UTC

ARC Prize

@arcprize

21 Jul 2025

Impressive work by @makingAGI and team
No pre-training or CoT with material performance on ARC-AGI
> With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples

Guan Wang

@makingAGI

21 Jul 2025

🚀Introducing Hierarchical Reasoning Model🧠🤖 Inspired by brain's hierarchical processing, HRM delivers unprecedented reasoning power on complex tasks like ARC-AGI and expert-level Sudoku using just 1k examples, no pretraining or CoT! Unlock next AI breakthrough with neuroscience. 🌟 📄Paper: arxiv.org/abs/2506.21734 💻Code: github.com/sapientinc/HRM

495

31,574

ARC Prize · Dec 21, 2024 · 12:23 AM UTC

ARC Prize

@arcprize

21 Dec 2024

Today, alongside our analysis of o3's ARC-AGI-Pub performance, we're also releasing data (results, attempts, and prompt) from our high-compute testing.
o3 was unable to solve ~9% of Public Eval tasks that are straightforward for humans. Curious to see why?
We invite the community to help assess the characteristics of both solved and unsolved tasks.
arcprize.org/blog/oai-o3-pub…

458

82,114

ARC Prize · May 20, 2025 · 4:45 PM UTC

ARC Prize

@arcprize

20 May 2025

ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems
Our paper introduces the leading benchmark for evaluating AI’s abstract reasoning capabilities
- Humans solve 100% of tasks
- Frontier AI scores <5%
@fchollet @mikeknoop @GregKamradt @bryanlanders Henry Pinkard

481

83,009

ARC Prize · Jul 23, 2025 · 12:27 PM UTC

ARC Prize

@arcprize

23 Jul 2025

We’re working to reproduce Qwen 3’s reported 41% on ARC-AGI-1. This score is not yet verified. Reminder, all scores on the ARC-AGI Leaderboard reflect our own verified testing on our semi-private holdout set.

470

30,414

ARC Prize · Sep 29, 2024 · 4:39 PM UTC

ARC Prize

@arcprize

29 Sep 2024

New ARC-AGI high score! 48% (Prize goal: 85%) Congratulations, MindsAI!

411

36,462

ARC Prize · Sep 30, 2025 · 5:40 PM UTC

ARC Prize

@arcprize

30 Sep 2025

Sonnet 4.5 on ARC-AGI-1 Semi Private Eval
Sonnet 4.5: 25%, $0.08/task
- Thinking 1K: 31.3%, $0.09/task
- Thinking 8K: 46.5%, $0.18/task
- Thinking 16K: 48.3%, $0.27/task
- Thinking 32K: 63.7%, $0.52/task

406

82,582

ARC Prize · May 27, 2025 · 5:01 PM UTC

ARC Prize

@arcprize

27 May 2025

On Claude Opus 4 results
We're currently unable to finish testing Claude Opus 4 on ARC-AGI due to consistent timeouts and rate limits
We're actively trying to get this unblocked by the @AnthropicAI team. If you can help get us in touch, please do
Once resolved, we'll complete and share the results

402

52,323

ARC Prize · Jun 2, 2025 · 7:20 PM UTC

ARC Prize

@arcprize

2 Jun 2025

DeepSeek R1 5/28 on ARC-AGI Semi Private Eval
* ARC-AGI-1: 21.2%, $0.046/task
* ARC-AGI-2: 1.1%, $0.052/task
DeepSeek R1 5/28 is on par with OpenAI's o4-mini (low) performance

381

28,412

ARC Prize · Feb 26, 2025 · 3:07 AM UTC

ARC Prize

@arcprize

26 Feb 2025

Wow! One of our donors has anonymously decided to materially increase their support to $1M!
This fully funds our 2025 goal in just 1 day
With this support, we’ll launch v2, build v3, and continue driving progress in measuring AGI

ARC Prize

@arcprize

26 Feb 2025

We're not done - @bryanhelmig just pledged $15K to ARC Prize

366

50,689

ARC Prize · Aug 12, 2025 · 3:31 PM UTC

ARC Prize

@arcprize

12 Aug 2025

"I've updated my AGI timeline." One year later, @dwarkesh_sp and @fchollet meet on camera again. Both of them have shifted their AGI timelines. They dive into AGI macroeconomics, the singularity, and ARC-AGI-3 preview.

"I've updated my AGI timeline" | Francois Chollet + Dwarkesh Patel

Interview filmed July 17, 2025 in San Francisco, CA
Learn more about ARC-AGI-3: https://arcprize.org/arc-agi/3/
Play the games: https://three.arcprize.org/
arcprize.org

374

47,606

ARC Prize · Jun 11, 2024 · 5:10 PM UTC

ARC Prize

@arcprize

11 Jun 2024

Announcing ARC Prize. A $1M+ competition to beat the ARC-AGI benchmark and open source the solution. Hosted by @mikeknoop & @fchollet. arcprize.org

108

362

114,911

ARC Prize · Dec 20, 2024 · 6:07 PM UTC

ARC Prize

@arcprize

20 Dec 2024

Previously shared, ARC-AGI-2 (same format - verified easy for humans, harder for AI) will launch alongside ARC Prize 2025. We're committed to running the Grand Prize competition until a high-efficiency, open-source solution scoring 85% on the latest ARC-AGI is created. 3/4

349

43,036

ARC Prize · Jul 7, 2025 · 3:40 PM UTC

ARC Prize

@arcprize

7 Jul 2025

New ARC Prize 2025 High Score: 15.4% by @MindsAI_Jack, @MohamedOsmanML, @tufalabs

304

29,713

ARC Prize · May 22, 2025 · 7:57 PM UTC

ARC Prize

@arcprize

22 May 2025

Gemini 2.5 Flash results on ARC-AGI
ARC-AGI-1 Semi Private:
* Gemini 2.5 Flash: 33%
* Thinking - 1K: 16%
* Thinking - 8K: 26%
* Thinking - 16K: 33%
* Thinking - 24K: 32%
ARC-AGI-2 Semi Private:
* All: <3%

362

29,052

ARC Prize · Sep 26, 2025 · 2:48 PM UTC

ARC Prize

@arcprize

26 Sep 2025

New ARC Prize 2025 High Score: 27.08% by Giotto.ai (@podesta_aldo)

349

34,013

ARC Prize · Oct 9, 2024 · 1:08 PM UTC

ARC Prize

@arcprize

9 Oct 2024

New ARC-AGI high score! 49% (Prize goal: 85%) Congratulations, MindsAI!

332

40,347

ARC Prize · Jul 21, 2025 · 2:51 PM UTC

ARC Prize

@arcprize

21 Jul 2025

New ARC Prize 2025 High Score: 17.6% by Giotto.ai (@podesta_aldo)

329

37,490

ARC Prize · Sep 3, 2025 · 7:20 PM UTC

ARC Prize

@arcprize

3 Sep 2025

New ARC Prize 2025 High Score: 24.58% by Giotto.ai (@podesta_aldo)

335

25,018

ARC Prize · Jul 30, 2025 · 3:24 PM UTC

ARC Prize

@arcprize

30 Jul 2025

New ARC Prize 2025 High Score: 21.6% by Giotto.ai (@podesta_aldo)

318

60,319

ARC Prize · Dec 20, 2024 · 6:07 PM UTC

ARC Prize

@arcprize

20 Dec 2024

Read our full o3 testing report and @fchollet's perspective on this exciting breakthrough, the future of the ARC-AGI benchmark, and the path to AGI. arcprize.org/blog/oai-o3-pub… 4/4

306

40,512

ARC Prize · Sep 26, 2024 · 3:58 PM UTC

ARC Prize

@arcprize

26 Sep 2024

New ARC-AGI high score! 47% (Prize goal: 85%) Congratulations, MindsAI!

281

31,587

ARC Prize · Mar 14, 2025 · 7:16 PM UTC

ARC Prize

@arcprize

14 Mar 2025

3/24/2025

300

75,213

ARC Prize · Jul 22, 2024 · 3:43 PM UTC

ARC Prize

@arcprize

22 Jul 2024

New ARC-AGI high score! 43% (Prize goal: 85%) Congratulations, MindsAI!

285

49,246

ARC Prize · Apr 10, 2025 · 6:11 PM UTC

ARC Prize

@arcprize

10 Apr 2025

New ARC Prize 2025 High Score: 10.1% by @guille_bar

302

32,405

ARC Prize · Aug 19, 2024 · 3:22 PM UTC

ARC Prize

@arcprize

19 Aug 2024

New ARC-AGI high score! 46% (Prize goal: 85%) Congratulations, MindsAI!

268

29,619

ARC Prize · Nov 21, 2024 · 4:44 PM UTC

ARC Prize

@arcprize

21 Nov 2024

On Dec. 6... We'll announce the winners of ARC Prize 2024, including top score & paper award progress prizes. And we'll publish a paper documenting state-of-the-art approaches to ARC-AGI. We're now reviewing paper submissions and verifying the leaderboard. Stay tuned...

254

28,688

ARC Prize · Jan 8, 2025 · 6:00 PM UTC

ARC Prize

@arcprize

8 Jan 2025

The Next Chapter: ARC Prize Foundation
Beyond the benchmark - the North Star for AGI
We're excited to announce important updates to our leadership, entity structure, and initiatives for 2025
1/5

252

38,882

ARC Prize · Mar 4, 2025 · 7:24 PM UTC

ARC Prize

@arcprize

4 Mar 2025

Novel test-time-training method to solve ARC-AGI without pretraining "CompressARC achieves 34.75% on the training set and 20% on the evaluation set"

Isaac Liao @LiaoIsaac91893

4 Mar 2025

Introducing *ARC‑AGI Without Pretraining* – ❌ No pretraining. ❌ No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. 🧵 1/4

251

33,907

ARC Prize · Oct 26, 2024 · 7:16 PM UTC

ARC Prize

@arcprize

26 Oct 2024

[Paper] One approach to solve ARC-AGI is to learn a domain-specific language from the training set and add to the DSL on-the-fly when faced with novel tasks. arxiv.org/abs/2410.06209

LeanAgent: Lifelong Learning for Formal Theorem Proving

Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches...

arxiv.org

249

23,754

ARC Prize · Sep 16, 2025 · 5:07 PM UTC

ARC Prize

@arcprize

16 Sep 2025

@jeremyberman's submission:
While his previous submission wrote Python programs, Jeremy altered his submission to write "natural language programs" in English
- ARC-AGI-1: 79.6%, $8.42/task
- ARC-AGI-2: 29.44%, $30.40/task
Blog post: jeremyberman.substack.com/p/…
Code: github.com/jerber/arc-lang-p…
Kaggle: kaggle.com/code/jerber/jerem…

243

43,675

ARC Prize · Nov 1, 2024 · 4:20 PM UTC

ARC Prize

@arcprize

1 Nov 2024

[Paper] Dreamcoder's inductive program synthesis has inspired many ARC-AGI approaches. By combining neural networks + symbolic abstractions, it can tackle tasks from programming to physics. arxiv.org/abs/2006.08381

DreamCoder: Growing generalizable, interpretable knowledge with...

Expert problem-solving is driven by powerful languages for thinking about problems and their solutions. Acquiring expertise means learning these languages -- systems of concepts, alongside the...

arxiv.org

245

17,287

ARC Prize · Jun 21, 2024 · 3:13 PM UTC

ARC Prize

@arcprize

21 Jun 2024

New ARC-AGI high score! 39% (Prize goal: 85%) Congratulations, MindsAI!

228

32,635

ARC Prize · Apr 7, 2025 · 9:24 PM UTC

ARC Prize

@arcprize

7 Apr 2025

ARC Prize 2025 Leaders
2 weeks in, 7 months to go
The Grand Prize is still unclaimed

238

17,895

ARC Prize · Jun 19, 2024 · 4:56 PM UTC

ARC Prize

@arcprize

19 Jun 2024

New ARC-AGI high score! 38% (Prize goal: 85%) Congratulations, MindsAI!

225

49,001

ARC Prize · Jul 10, 2025 · 4:42 AM UTC

ARC Prize

@arcprize

10 Jul 2025

Reported scores are from ARC-AGI-1 & 2 Semi Private Evaluation Set
Learn more about ARC Prize Foundation: arcprize.org/
ARC-AGI Reasoning Frontier Blog Post: arcprize.org/blog/which-ai-r…
View the leaderboard: arcprize.org/leaderboard
Reproduce the results: github.com/arcprize/arc-agi-…

227

45,300

ARC Prize · Aug 15, 2025 · 7:03 PM UTC

ARC Prize

@arcprize

15 Aug 2025

Finding #1: The "hierarchical" architecture had minimal performance impact when compared to a similarly sized transformer
A drop-in transformer comes within a few points without any hyperparameter optimization.
See our full post: arcprize.org/blog/hrm-analys…

227

54,494

ARC Prize · Jun 9, 2025 · 6:09 PM UTC

ARC Prize

@arcprize

9 Jun 2025

Interactive Reasoning Benchmarks are the next step in frontier evaluations
Hear @GregKamradt share why measuring human-like intelligence requires multi-turn environments
Including a sneak peek of ARC-AGI-3
Want to help us build interactive evaluations? We're hiring

Measuring AGI: Interactive Reasoning Benchmarks

215

27,409

ARC Prize · Jul 18, 2024 · 3:55 PM UTC

ARC Prize

@arcprize

18 Jul 2024

New ARC-AGI high score! 41% (Prize goal: 85%) Congratulations, MindsAI!

199

30,862

ARC Prize · May 14, 2025 · 5:37 PM UTC

ARC Prize

@arcprize

14 May 2025

New ARC Prize 2025 High Score: 15.3% by @MindsAI_Jack, @MohamedOsmanML, @tufalabs

210

9,870

ARC Prize · Aug 11, 2025 · 7:04 PM UTC

ARC Prize

@arcprize

11 Aug 2025

ARC-AGI-3 Preview Event Recap
@GregKamradt steps through our Interactive Reasoning Benchmark thesis
* Why static benchmarks fall short in measuring agentic capabilities
* The ARC Prize approach to creating interactive benchmarks

211

22,543

ARC Prize · Mar 24, 2025 · 8:29 PM UTC

ARC Prize

@arcprize

24 Mar 2025

Every ARC-AGI-2 task, however, is solved by at least two humans, quickly and easily. We know this because we tested 400 people live.

210

79,801

ARC Prize · Sep 17, 2024 · 5:59 PM UTC

ARC Prize

@arcprize

17 Sep 2024

ARC Prize is now 3 months old - we're announcing:
🏆 +$100K Grand Prize (now $600k)
📜 +$25K Paper Awards (now $75k)
And we're committing funds for a US university tour in October and the development of the next iteration of ARC-AGI.
arcprize.org/blog/3-month-up…

ARC Prize Survives 3 Months | ARC Prize

Bigger Prizes, Events, Improvements

arcprize.org

198

38,288

ARC Prize · Jul 24, 2025 · 6:40 PM UTC

ARC Prize

@arcprize

24 Jul 2025

Qwen3-235b-a22b Instruct-2507 ARC-AGI Semi Private Eval
* ARC-AGI-1: 11%, $0.003/task
* ARC-AGI-2: 1.3%, $0.004/task

213

173,826

ARC Prize · Jul 20, 2024 · 4:32 PM UTC

ARC Prize

@arcprize

20 Jul 2024

New ARC-AGI high score! 42% (Prize goal: 85%) Congratulations, MindsAI!

199

26,197

ARC Prize · Sep 12, 2024 · 9:50 PM UTC

ARC Prize

@arcprize

12 Sep 2024

One goal for ARC Prize was to provide a public measure of progress towards AGI. Here's what we see now when new models like o1 come out.

204

15,054

ARC Prize · Mar 24, 2025 · 8:29 PM UTC

ARC Prize

@arcprize

24 Mar 2025

Base LLMs (no reasoning) are currently scoring 0% on ARC-AGI-2. Specialized AI reasoning systems (like R1 and o3-mini) score <4%. Even AI systems with high adaptation like o1 pro and o3 low score single-digits (est.)

203

22,736

ARC Prize · Dec 20, 2024 · 5:57 PM UTC

ARC Prize

@arcprize

20 Dec 2024

Watch the finale of "12 Days of @OpenAI" livestream for a big announcement, starting in 3 minutes... openai.com/12-days/

196

14,309

ARC Prize · Mar 24, 2025 · 8:29 PM UTC

ARC Prize

@arcprize

24 Mar 2025

Our belief is that once we can no longer come up with quantifiable problems that are relatively easy for humans, yet hard for AI, we have reached AGI. ARC-AGI-2 proves that we do not have AGI. New ideas are still needed!

199

14,595

ARC Prize · Sep 9, 2025 · 5:21 PM UTC

ARC Prize

@arcprize

9 Sep 2025

ARC Prize Foundation @ MIT
We're hosting an evening with top researchers to explore measuring sample efficiency in humans and machines
Join us to hear from Francois Chollet along with a world-class panel: Josh Tenenbaum, Samuel Gershman, Laura Schulz, Jacob Andreas

110

25,796

ARC Prize · Feb 25, 2025 · 6:29 PM UTC

ARC Prize

@arcprize

25 Feb 2025

ARC-AGI-1 was designed to challenge deep learning
ARC-AGI-2 challenges reasoning systems – while still maintaining a 100% human solve rate
Early results show frontier AI systems scoring 10-20% on ARC-AGI-2 and we're launching it March 2025
This gap demonstrates that we have not yet achieved AI systems that reach human-level general intelligence
nitter.app/fchollet/status/187017…

[Image: Example unsolved ARC-AGI-2 task.]

François Chollet

@fchollet

20 Dec 2024

Replying to @fchollet

Does this mean the ARC-AGI benchmark has saturated? Yes -- the v1 version of the benchmark is starting to saturate. There were already signs of this in the Kaggle competition this year -- an ensemble of all submissions would score 81%. The competition next year will run on ARC-AGI-2, an updated version of the dataset that keeps the same format as v1, but features fewer tasks that can be easily brute-forced. Early indications are that ARC-AGI-v2 will represent a complete reset of the state-of-the-art, and it will remain extremely difficult for o3. Meanwhile, a smart human or a small panel of average humans would still be able to score >95%.

193

38,430

ARC Prize · Oct 28, 2024 · 7:56 PM UTC

ARC Prize

@arcprize

28 Oct 2024

Deep learning is not enough to beat ARC Prize. We need something more. @mikeknoop & @fchollet share a path to defeat ARC-AGI via Program Synthesis. arcprize.org/blog/beat-arc-a…

How to Beat ARC-AGI by Combining Deep Learning and Program Synthesis | ARC Prize

Deep learning is not enough to beat ARC Prize. We need something more. Knoop and Chollet lay out a path via Program Synthesis to beating the benchmark.

arcprize.org

178

28,066

ARC Prize · Sep 5, 2025 · 7:37 PM UTC

ARC Prize

@arcprize

5 Sep 2025

ARC Prize will present at @OpenAI DevDay this October
We'll be sharing ARC-AGI-3 progress, including first results on human performance and how interactive evaluations open a new axis for measuring intelligence

OpenAI

@OpenAI

26 Jun 2025

OpenAI DevDay
Oct 6, 2025 in San Francisco
Our biggest one yet:
- 1500+ developers
- Livestreamed opening keynote
- Hands-on building with our latest models & tools
- More stages & more demos
devday.openai.com

186

20,681

ARC Prize · Aug 21, 2025 · 8:40 PM UTC

ARC Prize

@arcprize

21 Aug 2025

NeurIPS 2025 - Google Code Golf Championship
Based on ARC-AGI, create the shortest program that transforms input -> output
$100,000 in prizes

Kaggle

@kaggle

8 Aug 2025

📣 Competition Launch Alert! NeurIPS 2025 hosted by @GoogleDeepMind
🎯 To create Python programs that solve abstract reasoning tasks from the ARC-AGI benchmark
💰 $100,000 Prize Pool
⏰ Entry Deadline: October 23, 2025
kaggle.com/competitions/goog…

194

20,348

ARC Prize · Mar 27, 2025 · 3:04 PM UTC

ARC Prize

@arcprize

27 Mar 2025

Are You Smarter Than A.I.?
An interactive article by @nytimes covers @arcprize and @fchollet
"Some experts predict that A.I. will surpass human intelligence within the next few years. Play this puzzle to see how far the machines have to go."

183

45,335

ARC Prize · Oct 16, 2025 · 7:45 PM UTC

ARC Prize

@arcprize

16 Oct 2025

Anthropic Haiku 4.5 on ARC-AGI-1 Semi Private Eval
Haiku 4.5: 14.33%, $0.03/task
- Thinking 1K: 16.8%, $0.03/task
- Thinking 8K: 25.5%, $0.07/task
- Thinking 16K: 37.3%, $0.10/task
- Thinking 32K: 47.6%, $0.26/task

189

29,152

ARC Prize · Jan 21, 2025 · 5:53 PM UTC

ARC Prize

@arcprize

21 Jan 2025

DeepSeek performance is on par with o1-preview, albeit slightly lower

180

16,833