A North Star for open AGI. Co-founders: @fchollet @mikeknoop. President: @gregkamradt. We're hiring mission-driven builders: arcprize.org/jobs

Earth
Pinned Tweet
Announcing ARC-AGI-3 The only unsaturated agentic intelligence benchmark in the world Humans score 100%, AI <1% This human-AI gap demonstrates we do not yet have AGI Most benchmarks test what models already know, ARC-AGI-3 tests how they learn
249
576
4,327
744,724
Grok 4 (Thinking) achieves new SOTA on ARC-AGI-2 with 15.9% This nearly doubles the previous commercial SOTA and tops the current Kaggle competition SOTA
232
687
4,924
7,307,391
New verified ARC-AGI-Pub SoTA! @OpenAI o3 has scored a breakthrough 75.7% on the ARC-AGI Semi-Private Evaluation. And a high-compute o3 configuration (not eligible for ARC-AGI-Pub) scored 87.5% on the Semi-Private Eval. 1/4
107
608
3,075
2,473,859
After the o3 price reduction, we retested the o3-2025-04-16 model on ARC-AGI to determine whether its performance had changed. We compared the retest results with the original results and observed no difference in performance.
44
162
2,771
434,864
Today we are announcing ARC-AGI-2, an unsaturated frontier AGI benchmark that challenges AI reasoning systems (same relative ease for humans). Grand Prize: 85%, ~$0.42/task efficiency Current Performance: * Base LLMs: 0% * Reasoning Systems: <4%
67
324
2,353
461,723
New SOTA on ARC-AGI - V1: 79.6%, $8.42/task - V2: 29.4%, $30.40/task Custom submissions by @jeremyberman and @_eric_pang_ are now the best known solutions to ARC-AGI Both: * Are open source * Use Grok 4 * Implement program-synthesis outer loops with test-time adaptation
143
256
1,997
7,503,095
Today, we're announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI We’re releasing: * 3 games (environments) * $10K agent contest * AI agents API Starting scores - Frontier AI: 0%, Humans: 100%
166
229
2,011
520,962
Analyzing the Hierarchical Reasoning Model by @makingAGI We verified scores on hidden tasks, ran ablations, and found that performance comes from an unexpected source ARC-AGI Semi Private Scores: * ARC-AGI-1: 32% * ARC-AGI-2: 2% Our 4 findings:
38
162
1,349
272,761
New ARC-AGI SOTA: GPT-5 Pro - ARC-AGI-1: 70.2%, $4.78/task - ARC-AGI-2: 18.3%, $7.41/task @OpenAI’s GPT-5 Pro now holds the highest verified frontier LLM score on ARC-AGI’s Semi-Private benchmark
47
157
1,279
562,803
Clarifying o3’s ARC-AGI Performance
OpenAI has confirmed:
* The released o3 is a different model from what we tested in December 2024
* All released o3 compute tiers are smaller than the version we tested
* The released o3 was not trained on ARC-AGI data, not even the train set
* The released o3 is tuned for chat/product use, which introduces both strengths and weaknesses on ARC-AGI
What ARC Prize will do:
* We will re-test the released o3 (all compute tiers) and publish updated results. Prior scores will be labeled “preview”
* We will test and release o4-mini results as soon as possible
* We will test o3-pro once available
33
78
1,224
225,641
Verified DeepSeek performance on ARC-AGI's Public Eval (400 tasks) + Semi-Private (100 tasks) DeepSeek V3: * Semi-Private: 7.3% ($.002) * Public Eval: 14% ($.002) DeepSeek Reasoner: * Semi-Private: 15.8% ($.06) * Public Eval: 20.5% ($.05) (Avg $ per task)
19
104
1,176
291,821
Introducing SnakeBench, an experimental benchmark side quest We made 50 LLMs battle each other in head-to-head snake 🐍 2.8K matches showed which models are the best at snake real-time strategy and spatial reasoning Here’s the top match between o3-mini and DeepSeek-R1 🧵
44
145
1,003
176,917
Grok-4 (Fast Reasoning) on ARC-AGI Semi Private Eval - ARC-AGI-1: 48.5%, $0.03/task - ARC-AGI-2: 5.3%, $0.06/task @xai pushes the frontier of performance efficiency on ARC-AGI
37
75
958
1,613,482
Gemini-2.5-Pro Experimental Preview Results ARC-AGI-1 * Public Eval: 24.3% * Semi Private: 12.5% ARC-AGI-2 * Public Eval: .8% * Semi Private: 1.3% These results are on par with Deepseek's R1
25
63
975
297,624
o3 and o4-mini on ARC-AGI's Semi Private Evaluation * o3-medium scores 53% on ARC-AGI-1 * o4-mini shows state-of-the-art efficiency * ARC-AGI-2 remains virtually unsolved (<3%) Through analysis we highlight differences from o3-preview and other model behavior
37
114
979
205,502
We put OpenAI o1 to the test against ARC Prize. Results: both o1 models beat GPT-4o. And o1-preview is on par with Claude 3.5 Sonnet. Can chain-of-thought scale to AGI? What explains o1's modest scores on ARC-AGI? Our notes: arcprize.org/blog/openai-o1-…
44
145
830
403,443
This indicates that there is no evidence of a distilled or altered model being served after the price change.
10
19
837
28,272
Tiny Recursion Model (TRM) results on ARC-AGI - ARC-AGI-1: 40%, $1.76/task - ARC-AGI-2: 6.2%, $2.10/task Thank you to @jm_alexia for contributing TRM, a well-written, open-source, and thorough piece of research, to the community. TRM is based on the HRM from @makingAGI
33
91
820
272,788
o3-mini performance matches o1 on ARC-AGI-1 Semi-Private Test Set Scores by reasoning effort: > Low: 11% ($0.009/task) > Med: 29% ($0.02/task) > High: 35% ($0.04/task)
24
88
773
229,263
GPT-4.5 Results on ARC-AGI Semi Private Set (100 hold out tasks): * Score: 10.33% * Average Cost per Task: $0.29
43
60
797
129,900
o3 Pro on ARC-AGI Semi Private Eval Results
ARC-AGI-1:
* Low: 44%, $1.64/task
* Medium: 57%, $3.18/task
* High: 59%, $4.16/task
ARC-AGI-2:
* All reasoning efforts: <5%, $4-7/task
Takeaways:
* o3-pro in line with o3 performance
* o3's new price sets the ARC-AGI-1 Frontier
23
72
690
126,310
Verified o1 performance on ARC-AGI's Semi-Private Eval (100 tasks) o1, Low: 25% ($1.5/task) o1, Medium: 31% ($2.5/task) o1, High: 32% ($3.8/task)
28
50
667
323,466
AGI is reached when the capability gap between humans and computers is zero ARC Prize Foundation measures this to inspire progress Today we preview the unbeaten ARC-AGI-2 + open public donations to fund ARC-AGI-3 TY Schmidt Sciences (@ericschmidt) for $50k to kick us off!
24
67
662
225,022
Claude Opus 4 on ARC-AGI Semi Private Eval Base * ARC-AGI-1: 22.5%, $0.40/task * ARC-AGI-2: 1.3%, $0.63/task Thinking 16K * ARC-AGI-1: 35.7%, $1.25/task * ARC-AGI-2: 8.6%, $1.93/task Opus 4 sets new SOTA (8.6%) on ARC-AGI-2
21
70
665
55,318
Llama 4 Maverick and Scout on ARC-AGI's Semi Private Evaluation Maverick: * ARC-AGI-1: 4.38% ($0.0078/task) * ARC-AGI-2: 0.00% ($0.0121/task) Scout: * ARC-AGI-1: 0.50% ($0.0041/task) * ARC-AGI-2: 0.00% ($0.0062/task)
44
43
645
114,638
New ARC-AGI high score! 53% (Prize goal: 85%) Congratulations, MindsAI!
39
40
567
58,033
ARC Prize remains unbeaten. In 2024, SoTA moved from 33% to 55.5%. Announcing: ARC Prize 2024 Winners & Technical Report.
20
91
617
156,045
Thank you to the @xai team for working with us to validate Grok 4's score and inviting us to watch the live stream
7
27
624
80,099
Claude Sonnet 3.7 + Thinking 1/8/16K results - Base: 13.6%, $.05/task - Thinking 1K: 11.6%, $.07/task - Thinking 8K: 21.1%, $.21/task - Thinking 16K: 28.6%, $.33/task Performance is on par with o3-mini for slightly increased cost per task
26
55
618
89,491
This performance on ARC-AGI highlights a genuine breakthrough in novelty adaptation. This is not incremental progress. We're in new territory. Is it AGI? o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence. 2/4
4
39
587
65,792
We tested every major AI reasoning system. There is no clear winner. Accuracy goes up as you stack modern CoT techniques, but efficiency goes way down. This gives rise to a Pareto frontier on accuracy vs. cost using ARC-AGI as a consistent measuring stick.
23
90
562
64,297
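The accuracy-versus-cost Pareto frontier mentioned in the post above can be illustrated with a small sketch. It is not ARC Prize's own analysis code. The data points are ARC-AGI-1 Semi-Private scores and per-task costs quoted in other posts in this feed (DeepSeek, o3-mini, GPT-4.5, o1); pooling them into one list, and the names `Result`, `RESULTS`, and `pareto_frontier`, are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    name: str
    cost: float      # average USD per task, as quoted in the posts
    accuracy: float  # percent of tasks solved


# Figures copied from posts in this feed; grouping them here is an assumption.
RESULTS = [
    Result("DeepSeek V3", 0.002, 7.3),
    Result("DeepSeek Reasoner", 0.06, 15.8),
    Result("o3-mini (low)", 0.009, 11.0),
    Result("o3-mini (med)", 0.02, 29.0),
    Result("o3-mini (high)", 0.04, 35.0),
    Result("GPT-4.5", 0.29, 10.33),
    Result("o1 (low)", 1.5, 25.0),
    Result("o1 (medium)", 2.5, 31.0),
    Result("o1 (high)", 3.8, 32.0),
]


def pareto_frontier(results: List[Result]) -> List[Result]:
    """Keep only results that no cheaper-or-equal result matches or beats on accuracy."""
    frontier: List[Result] = []
    best_accuracy = float("-inf")
    # Walk from cheapest to most expensive; on a cost tie, see the more accurate one first.
    for r in sorted(results, key=lambda r: (r.cost, -r.accuracy)):
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


if __name__ == "__main__":
    for r in pareto_frontier(RESULTS):
        print(f"{r.name}: {r.accuracy}% at ${r.cost}/task")
```

A system is on the frontier when nothing cheaper scores at least as well, so each frontier point represents the best accuracy available at or below its cost.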
ARC-AGI-3 Preview - 30-Day Learnings 30 days ago we released a preview of our first Interactive Reasoning Benchmark Our goal was to ship quick, learn from the community, and inform the next >100 games. Here’s what we learned after 100s of agents and >3,900 game plays:
10
44
600
96,404
Claude Sonnet 4 on ARC-AGI Semi Private Eval Base * ARC-AGI-1: 23%, $0.08/task * ARC-AGI-2: 1.2%, $0.12/task Thinking 16K * ARC-AGI-1: 40%, $0.36/task * ARC-AGI-2: 5.9%, $0.48/task Sonnet 4 sets new SOTA (5.9%) on ARC-AGI-2
39
71
582
95,834
ARC-AGI-3 Preview: +3 Games Released We’ve opened 3 previously private holdout games from the Preview Agent Competition Now 6 games are available to play online and via Agents API Each game was selected to expand the novelty of ARC-AGI-3 public games Can you beat them?
11
94
369
44,876
On ARC-AGI-1, Grok 4 (Thinking) achieves 66.7%, in line with the Pareto frontier for AI reasoning systems we reported last month
9
25
580
64,636
GPT-4.1 on ARC-AGI's Semi Private Evaluation GPT-4.1: * ARC-AGI-1: 5.5% ($0.039/tsk) * ARC-AGI-2: 0.0% ($0.069/tsk) GPT-4.1-Mini: * ARC-AGI-1: 3.5% ($0.0078/tsk) * ARC-AGI-2: 0.0% ($0.0139/tsk) GPT-4.1-Nano: * ARC-AGI-1: 0.0% ($0.0021/tsk) * ARC-AGI-2: 0.0% ($0.0036/tsk)
19
56
576
53,273
R1-Zero matches performance of R1 on ARC-AGI We’ve verified that R1-Zero scored 14% on ARC-AGI-1 (vs 15% on R1) @mikeknoop explains why R1-Zero is more important than R1, why scaling inference isn’t going away, and what happens when “inference becomes training” 1/4
10
63
548
138,833
New ARC-AGI high score! 55.5% (Prize goal: 85%) Congratulations, MindsAI!
27
35
534
149,428
GPT-5 on ARC-AGI Semi Private Eval GPT-5 * ARC-AGI-1: 65.7%, $0.51/task * ARC-AGI-2: 9.9%, $0.73/task GPT-5 Mini * ARC-AGI-1: 54.3%, $0.12/task * ARC-AGI-2: 4.4%, $0.20/task GPT-5 Nano * ARC-AGI-1: 16.5%, $0.03/task * ARC-AGI-2: 2.5%, $0.03/task
29
90
556
108,179
New ARC Prize 2025 High Score 19.0% by Giotto.ai (@podesta_aldo)
7
73
401
38,337
New ARC-AGI high score! 54.5% (Prize goal: 85%) Congratulations, MindsAI!
23
31
511
46,789
ARC-AGI-3 Developer Preview * Hands on first look at ARC-AGI-3 (live demos & API access) * Fireside with @fchollet moderated by @dwarkesh_sp 7/17, San Francisco Open to sponsors & researchers of @arcprize (very limited public slots available)
13
96
445
52,969
Gemini 2.5 Pro (6/17) on ARC-AGI Semi Private Eval ARC-AGI-1: * Thinking 1K: 16%, $0.06/task * Thinking 8K: 29%, $0.29/task * Thinking 16K: 41%, $0.48/task * Thinking 32K: 37%, $0.51/task ARC-AGI-2: * Thinking 32K: 4.9%, $0.75/task
14
51
528
52,609
QwQ-32B on ARC-AGI * Public Eval: 11.25%, $0.036 per task * Semi Private: 7.5%, $0.039 per task
9
39
499
42,008
Claude 3.5 Sonnet (new) scores pass@1 20.3% on 400 ARC-AGI public eval tasks. Original 3.5 Sonnet: 21%.
37
28
475
201,410
Impressive work by @makingAGI and team No pre-training or CoT with material performance on ARC-AGI > With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples
🚀Introducing Hierarchical Reasoning Model🧠🤖 Inspired by brain's hierarchical processing, HRM delivers unprecedented reasoning power on complex tasks like ARC-AGI and expert-level Sudoku using just 1k examples, no pretraining or CoT! Unlock next AI breakthrough with neuroscience. 🌟 📄Paper: arxiv.org/abs/2506.21734 💻Code: github.com/sapientinc/HRM
8
47
495
31,574
Today, alongside our analysis of o3's ARC-AGI-Pub performance, we're also releasing data (results, attempts, and prompt) from our high-compute testing. o3 was unable to solve ~9% of the Public Eval tasks, which are straightforward for humans. Curious to see why? We invite the community to help assess the characteristics of both solved and unsolved tasks. arcprize.org/blog/oai-o3-pub…
19
53
458
82,114
ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems Our paper introduces the leading benchmark for evaluating AI’s abstract reasoning capabilities - Humans solve 100% of tasks - Frontier AI scores <5% @fchollet @mikeknoop @GregKamradt @bryanlanders Henry Pinkard
11
87
481
83,009
We’re working to reproduce Qwen 3’s reported 41% on ARC-AGI-1. This score is not yet verified. Reminder, all scores on the ARC-AGI Leaderboard reflect our own verified testing on our semi-private holdout set.
12
19
470
30,414
New ARC-AGI high score! 48% (Prize goal: 85%) Congratulations, MindsAI!
25
30
411
36,462
Sonnet 4.5 on ARC-AGI-1 Semi Private Eval Sonnet 4.5: 25%, $0.08/task - Thinking 1K: 31.3%, $0.09/task - Thinking 8K: 46.5%, $0.18/task - Thinking 16K: 48.3%, $0.27/task - Thinking 32K: 63.7%, $0.52/task
6
42
406
82,582
On Claude Opus 4 results We're currently unable to finish testing Claude Opus 4 on ARC-AGI due to consistent timeouts and rate limits We're actively trying to get this unblocked by the @AnthropicAI team, if you can help get us in touch, please do Once resolved, we'll complete and share the results
Claude Sonnet 4 on ARC-AGI Semi Private Eval Base * ARC-AGI-1: 23%, $0.08/task * ARC-AGI-2: 1.2%, $0.12/task Thinking 16K * ARC-AGI-1: 40%, $0.36/task * ARC-AGI-2: 5.9%, $0.48/task Sonnet 4 sets new SOTA (5.9%) on ARC-AGI-2
18
18
402
52,323
DeepSeek R1 5/28 on ARC-AGI Semi Private Eval * ARC-AGI-1: 21.2%, $0.046/task * ARC-AGI-2: 1.1%, $0.052/task DeepSeek R1 5/28 is on par with OpenAI's o4-mini (low) performance
13
33
381
28,412
Wow! One of our donors has anonymously decided to materially increase their support to $1M! This fully funds our 2025 goal in just 1 day With this support, we’ll launch v2, build v3, and continue driving progress in measuring AGI
We're not done - @bryanhelmig just pledged $15K to ARC Prize
11
20
366
50,689
"I've updated my AGI timeline." One year later, @dwarkesh_sp and @fchollet meet on camera again. Both of them have shifted their AGI timelines. They dive into AGI macroeconomics, the singularity, and ARC-AGI-3 preview.
12
49
374
47,606
Announcing ARC Prize. A $1M+ competition to beat the ARC-AGI benchmark and open source the solution. Hosted by @mikeknoop & @fchollet. arcprize.org
23
108
362
114,911
As previously shared, ARC-AGI-2 (same format - verified easy for humans, harder for AI) will launch alongside ARC Prize 2025. We're committed to running the Grand Prize competition until a high-efficiency, open-source solution scoring 85% on the latest ARC-AGI is created. 3/4
2
16
349
43,036
New ARC Prize 2025 High Score 15.4% by @MindsAI_Jack, @MohamedOsmanML, @tufalabs
10
39
304
29,713
Gemini 2.5 Flash results on ARC-AGI ARC-AGI-1 Semi Private: * Gemini 2.5 Flash: 33% * Thinking - 1K: 16% * Thinking - 8K: 26% * Thinking - 16K: 33% * Thinking - 24K: 32% ARC-AGI-2 Semi Private: * All: <3%
8
25
362
29,052
New ARC Prize 2025 High Score 27.08% by Giotto.ai (@podesta_aldo)
14
27
349
34,013
New ARC-AGI high score! 49% (Prize goal: 85%) Congratulations, MindsAI!
10
20
332
40,347
New ARC Prize 2025 High Score 17.6% by Giotto.ai (@podesta_aldo)
16
20
329
37,490
New ARC Prize 2025 High Score 24.58% by Giotto.ai (@podesta_aldo)
12
20
335
25,018
New ARC Prize 2025 High Score 21.6% by Giotto.ai (@podesta_aldo)
9
25
318
60,319
Read our full o3 testing report and @fchollet's perspective on this exciting breakthrough, the future of the ARC-AGI benchmark, and the path to AGI. arcprize.org/blog/oai-o3-pub… 4/4
3
28
306
40,512
New ARC-AGI high score! 47% (Prize goal: 85%) Congratulations, MindsAI!
9
22
281
31,587
3/24/2025
12
18
300
75,213
New ARC-AGI high score! 43% (Prize goal: 85%) Congratulations, MindsAI!
12
16
285
49,246
New ARC Prize 2025 High Score: 10.1% by @guille_bar
9
19
302
32,405
New ARC-AGI high score! 46% (Prize goal: 85%) Congratulations, MindsAI!
5
29
268
29,619
On Dec. 6... We'll announce the winners of ARC Prize 2024, including top score & paper award progress prizes. And we'll publish a paper documenting state-of-the-art approaches to ARC-AGI. We're now reviewing paper submissions and verifying the leaderboard. Stay tuned...
10
17
254
28,688
The Next Chapter: ARC Prize Foundation Beyond the benchmark - the North Star for AGI We're excited to announce important updates to our leadership, entity structure, and initiatives for 2025 1/5
9
20
252
38,882
Novel test-time-training method to solve ARC-AGI without pretraining "CompressARC achieves 34.75% on the training set and 20% on the evaluation set"
Introducing *ARC‑AGI Without Pretraining* – ❌ No pretraining. ❌ No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. 🧵 1/4
4
20
251
33,907
@jeremyberman's submission: While his previous submission wrote python programs, Jeremy altered his submission to write "natural language programs” in English - ARC-AGI-1: 79.6%, $8.42/task - ARC-AGI-2: 29.44%, $30.40/task Blog post: jeremyberman.substack.com/p/… Code: github.com/jerber/arc-lang-p… Kaggle: kaggle.com/code/jerber/jerem…
2
18
243
43,675
New ARC-AGI high score! 39% (Prize goal: 85%) Congratulations, MindsAI!
6
16
228
32,635
ARC Prize 2025 Leaders 2 weeks in, 7 months to go The Grand Prize is still unclaimed
11
19
238
17,895
New ARC-AGI high score! 38% (Prize goal: 85%) Congratulations, MindsAI!
6
14
225
49,001
Reported scores are from ARC-AGI-1 & 2 Semi Private Evaluation Set Learn more about ARC Prize Foundation: arcprize.org/ ARC-AGI Reasoning Frontier Blog Post: arcprize.org/blog/which-ai-r… View the leaderboard: arcprize.org/leaderboard Reproduce the results: github.com/arcprize/arc-agi-…
5
11
227
45,300
Finding #1: The "hierarchical" architecture had minimal performance impact when compared to a similarly sized transformer A drop-in transformer comes within a few points without any hyperparameter optimization. See our full post: arcprize.org/blog/hrm-analys…
2
18
227
54,494
Interactive Reasoning Benchmarks are the next step in frontier evaluations Hear @GregKamradt share why measuring human-like intelligence requires multi-turn environments Including a sneak peek of ARC-AGI-3 Want to help us build interactive evaluations? We're hiring
9
33
215
27,409
New ARC-AGI high score! 41% (Prize goal: 85%) Congratulations, MindsAI!
5
16
199
30,862
New ARC Prize 2025 High Score 15.3% by @MindsAI_Jack, @MohamedOsmanML, @tufalabs
7
14
210
9,870
ARC-AGI-3 Preview Event Recap @GregKamradt steps through our Interactive Reasoning Benchmark thesis * Why static benchmarks fall short measuring agentic capabilities * The ARC Prize approach to creating interactive benchmarks
11
23
211
22,543
Every ARC-AGI-2 task, however, is solved by at least two humans, quickly and easily. We know this because we tested 400 people live.
5
8
210
79,801
ARC Prize is now 3 months old - we're announcing: 🏆 +$100K Grand Prize (now $600k) 📜 +$25K Paper Awards (now $75k) And we're committing funds for a US university tour in October and the development of the next iteration of ARC-AGI. arcprize.org/blog/3-month-up…
6
28
198
38,288
Qwen3-235b-a22b Instruct-2507 ARC-AGI Semi Private Eval * ARC-AGI-1: 11%, $0.003/task * ARC-AGI-2: 1.3%, $0.004/task
15
15
213
173,826
New ARC-AGI high score! 42% (Prize goal: 85%) Congratulations, MindsAI!
5
8
199
26,197
One goal for ARC Prize was to provide a public measure of progress towards AGI. Here's what we see now when new models like o1 come out.
4
18
204
15,054
Base LLMs (no reasoning) are currently scoring 0% on ARC-AGI-2. Specialized AI reasoning systems (like R1 and o3-mini) score <4%. Even AI systems with high adaptation like o1 pro and o3 low score single-digits (est.)
7
14
203
22,736
Watch the finale of "12 Days of @OpenAI" livestream for a big announcement, starting in 3 minutes... openai.com/12-days/
7
11
196
14,309
Our belief is that once we can no longer come up with quantifiable problems that are relatively easy for humans, yet hard for AI, we have reached AGI. ARC-AGI-2 proves that we do not have AGI. New ideas are still needed!
3
9
199
14,595
ARC Prize Foundation @ MIT We're hosting an evening with top researchers to explore measuring sample efficiency in humans and machines Join us to hear from Francois Chollet along with a world-class panel: Josh Tenenbaum, Samuel Gershman, Laura Schulz, Jacob Andreas
9
30
110
25,796
ARC-AGI-1 was designed to challenge deep learning ARC-AGI-2 challenges reasoning systems – while still maintaining a 100% human solve rate Early results show frontier AI systems scoring 10-20% on ARC-AGI-2 and we're launching it March 2025 This gap demonstrates that we have not yet achieved AI systems that reach human-level general intelligence nitter.app/fchollet/status/187017…
Replying to @fchollet
Does this mean the ARC-AGI benchmark has saturated? Yes -- the v1 version of the benchmark is starting to saturate. There were already signs of this in the Kaggle competition this year -- an ensemble of all submissions would score 81%. The competition next year will run on ARC-AGI-2, an updated version of the dataset that keeps the same format as v1, but features fewer tasks that can be easily brute-forced. Early indications are that ARC-AGI-v2 will represent a complete reset of the state-of-the-art, and it will remain extremely difficult for o3. Meanwhile, a smart human or a small panel of average humans would still be able to score >95%.
10
16
193
38,430
ARC Prize will present at @OpenAI DevDay this October We'll be sharing ARC-AGI-3 progress, including first results on human performance and how interactive evaluations open a new axis for measuring intelligence
OpenAI DevDay Oct 6, 2025 in San Francisco Our biggest one yet: - 1500+ developers - Livestreamed opening keynote - Hands-on building with our latest models & tools - More stages & more demos devday.openai.com
6
15
186
20,681
NeurIPS 2025 - Google Code Golf Championship Based on ARC-AGI, create the shortest program that transforms input -> output $100,000 in prizes
📣 Competition Launch Alert! NeurIPS 2025 hosted by @GoogleDeepMind 🎯 To create Python programs that solve abstract reasoning tasks from the ARC-AGI benchmark 💰 $100,000 Prize Pool ⏰ Entry Deadline: October 23, 2025 kaggle.com/competitions/goog…
5
24
194
20,348
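To make "the shortest program that transforms input -> output" concrete, here is a minimal sketch. The task (mirroring each row of the grid), the example grids, and the names `solve_readable`, `GOLFED_SOURCE`, and `byte_length` are hypothetical. The entry-point name `p` and scoring by UTF-8 byte count are also assumptions, not the competition's confirmed rules.

```python
from typing import List

Grid = List[List[int]]  # ARC-style grid: rows of small integer color codes


def solve_readable(grid: Grid) -> Grid:
    """Readable reference solution for a hypothetical task: mirror each row."""
    return [list(reversed(row)) for row in grid]


# The same transformation written as compactly as possible (assumed entry point `p`).
GOLFED_SOURCE = "p=lambda g:[r[::-1]for r in g]"


def byte_length(source: str) -> int:
    """Assumed scoring rule for this sketch: fewer UTF-8 bytes is better."""
    return len(source.encode("utf-8"))


# Execute the golfed source in its own namespace and pull out `p`.
namespace: dict = {}
exec(GOLFED_SOURCE, namespace)
p = namespace["p"]

# Hypothetical input/output pair in the style of an ARC task.
example_input: Grid = [[1, 0, 0], [2, 3, 0]]
example_output: Grid = [[0, 0, 1], [0, 3, 2]]

if __name__ == "__main__":
    assert solve_readable(example_input) == example_output
    assert p(example_input) == example_output
    print("golfed length in bytes:", byte_length(GOLFED_SOURCE))
```

Both versions compute the same grids; only the golfed one is optimized for source length.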
Are You Smarter Than A.I.? An interactive article by @nytimes covers @arcprize and @fchollet "Some experts predict that A.I. will surpass human intelligence within the next few years. Play this puzzle to see how far the machines have to go."
5
29
183
45,335
Anthropic Haiku 4.5 on ARC-AGI-1 Semi Private Eval Haiku 4.5: 14.33%, $0.03/task - Thinking 1K: 16.8%, $0.03/task - Thinking 8K: 25.5%, $0.07/task - Thinking 16K: 37.3%, $0.10/task - Thinking 32K: 47.6%, $0.26/task
7
13
189
29,152
DeepSeek performance is on par with o1-preview, albeit slightly lower
1
12
180
16,833