o4-mini independent evals: o4-mini (high) claims the highest Artificial Analysis Intelligence Index score to date (o3 evals in progress) and shows strong gains in coding ability

Key takeaways:
➤ o4-mini is a clear upgrade over o3-mini. While the gain is not as dramatic as the leap from o1-mini to o3-mini (+12pts), the model, with reasoning effort set to high, achieves a +4pt gain in the Artificial Analysis Intelligence Index
➤ o4-mini (high) made particular gains in coding intelligence, achieving the #1 position in our Coding Index. This was supported by a +7 percentage point gain in both LiveCodeBench and SciCode, where o4-mini is now the clear leader
➤ Pricing: o4-mini is priced in line with o3-mini ($1.10/$4.40 per 1M input/output tokens), though cached inputs are half the price of o3-mini's ($0.275 per 1M input tokens vs $0.55 per 1M)
➤ Context window: o4-mini's context window of 200k tokens is the same as o3-mini's. This is now notably smaller than GPT-4.1's massive 1M token context window
➤ Token usage: As a reasoning model, o4-mini used a high number of tokens compared to other models broadly, but marginally fewer than o3-mini (72M for o4-mini (high), 77M for o3-mini (high))

Evals for o3 are in progress. While we expect o3 to offer greater intelligence, o4-mini may be the more practical choice for most developers considering its substantially lower price and lower end-to-end latency
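
To make the pricing point concrete, here is a rough sketch of how the cached-input discount plays out per request. The prices are the ones quoted above; the `request_cost` helper, the assumption that cached tokens are a subset of input tokens billed at the cached rate, and the example workload are all hypothetical, for illustration only.

```python
# USD per 1M tokens, as quoted in the post.
PRICES = {
    "o4-mini": {"input": 1.10, "cached_input": 0.275, "output": 4.40},
    "o3-mini": {"input": 1.10, "cached_input": 0.55, "output": 4.40},
}


def request_cost(model: str, input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD for one workload (hypothetical helper).

    Assumes cached_tokens is the part of input_tokens billed at the
    cached-input rate; the rest is billed at the full input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    p = PRICES[model]
    uncached = input_tokens - cached_tokens
    total = (
        uncached * p["input"]
        + cached_tokens * p["cached_input"]
        + output_tokens * p["output"]
    )
    return total / 1_000_000


# Hypothetical workload: 1M input tokens (half of them cached), 200k output tokens.
for model in PRICES:
    print(model, round(request_cost(model, 1_000_000, 500_000, 200_000), 4))
```

With no cache hits the two models cost the same under these prices; the gap only grows with the share of input tokens served from cache.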

Apr 17, 2025 · 5:52 AM UTC
