Artificial Analysis · Sep 30, 2025 · 2:43 AM UTC

@ArtificialAnlys

Anthropic’s new Claude 4.5 Sonnet is now the #4 most intelligent model, beats 4.1 Opus, and places Anthropic in the top 3 in the race for frontier intelligence

Claude 4.5 Sonnet offers a clear upgrade for Claude 4.1 Opus and Claude 4 Sonnet users, with greater intelligence at the same price and token efficiency as Claude 4 Sonnet. Claude 4.5 Sonnet’s token efficiency, even in its maximum reasoning mode, makes it cheaper to use for many tasks than GPT-5, Grok 4 or Gemini 2.5 Pro.

Key benchmarking takeaways:

➤🧠 Anthropic’s most intelligent model: In reasoning mode, Claude 4.5 Sonnet scores 61 on the Artificial Analysis Intelligence Index. This is a jump of +4 points from Claude 4 Sonnet (Thinking), which was released in May 2025, and +2 points from Claude 4.1 Opus (Thinking). Claude 4.5 Sonnet (Thinking) now places ahead of Gemini 2.5 Pro (60) and Grok 4 Fast (60), but behind GPT-5 (high, 68) and Grok 4 (65).

➤📈 Largest increases: We see the biggest uplifts in individual evaluation scores in 𝜏²-Bench Telecom (+13 p.p.), Humanity's Last Exam (+14 p.p.) and Humanity's Last Exam (+7 p.p.). Claude 4.5 Sonnet achieves Anthropic’s best score yet on TerminalBench-Hard, but only gains +1 p.p. compared to Claude 4.1 Opus and remains behind Grok 4 and GPT-5 Codex (High). Interestingly, Claude 4.5 Sonnet does not yet achieve the highest score in any individual evaluation across the 10 evaluations in the Artificial Analysis Intelligence Index.

➤⚡ Non-reasoning performance: In non-reasoning mode, Claude 4.5 Sonnet jumped from 44 to 49 on the Artificial Analysis Intelligence Index. We see the largest improvement in Agentic Tool Use (an increase in 𝜏²-Bench Telecom score from 52% to 71%), with smaller improvements across other evals.

➤⚙️ Token efficiency: Anthropic have increased Claude’s evaluation scores without increasing output token usage, and the Claude models continue to be more token efficient than all other reasoning models. For Claude 4.5 Sonnet (Thinking), evaluated with a maximum reasoning budget of 64k tokens, we see a slight decrease in the token usage needed to run the Artificial Analysis Intelligence Index, from 43M to 42M, compared to Claude 4 Sonnet. This differs from other model upgrades we have seen, where an increase in intelligence is often correlated with an increase in output token usage.

➤💲 Pricing: Claude 4.5 Sonnet is priced the same as Claude 4 Sonnet at $3/$15 per 1M input/output tokens. This makes it a more compelling option than Claude 4.1 Opus, offering higher intelligence in thinking mode at 1/5th the blended price (3:1 input to output token ratio).

Key model details:

➤📏 Context window: 200K tokens
➤🪙 Max output tokens: 64K tokens
➤🌐 Availability: Claude 4.5 Sonnet is available via Anthropic’s API, Google Vertex and Amazon Bedrock. Claude 4.5 Sonnet is also available via Claude and Claude Code (v2 of which has also been released today).
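
The "1/5th the blended price" figure in the pricing takeaway can be reproduced with a quick calculation. This is only a sketch: it assumes "blended" means a weighted average with 3 parts input to 1 part output, and it assumes Claude 4.1 Opus list pricing of $15/$75 per 1M input/output tokens, which the post does not state. The function name `blended_price` is illustrative.

```python
def blended_price(input_usd_per_m: float, output_usd_per_m: float,
                  input_parts: float = 3.0, output_parts: float = 1.0) -> float:
    """Weighted average price per 1M tokens at a given input:output token ratio."""
    total_parts = input_parts + output_parts
    return (input_parts * input_usd_per_m + output_parts * output_usd_per_m) / total_parts


# Claude 4.5 Sonnet pricing as stated in the post: $3 input / $15 output per 1M tokens.
sonnet_blended = blended_price(3.0, 15.0)

# ASSUMPTION: Claude 4.1 Opus at $15 input / $75 output per 1M tokens (not given in the post).
opus_blended = blended_price(15.0, 75.0)

# Ratio of Sonnet's blended price to Opus's blended price.
ratio = sonnet_blended / opus_blended

print(f"Sonnet blended: ${sonnet_blended:.2f} per 1M tokens")
print(f"Opus blended:   ${opus_blended:.2f} per 1M tokens")
print(f"Sonnet / Opus:  {ratio:.2f}")
```

Under these assumptions the ratio works out to one fifth regardless of the blend weights, because both the input and output prices differ by the same factor of 5.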
