This is a good time to reflect on the "AI effect". Before a benchmark is solved, people often think we'll need "real AGI" to solve it. Afterwards, we realize the benchmark can be solved using mere tricks.
Will this benchmark fall in the same way? Honestly, I'm not sure.🧵
1/10 Today we're launching FrontierMath, a benchmark for evaluating advanced mathematical reasoning in AI. We collaborated with 60+ leading mathematicians to create hundreds of original, exceptionally challenging math problems, of which current AI systems solve less than 2%.