OpenAI's O3 Model Falls Short on the FrontierMath Benchmark: What's the Real Score?

Apr 22

OpenAI's FrontierMath Fiasco: Unpacking the Controversy

Jan 20

OpenAI's Secret Support of FrontierMath Stirs Up Controversy in AI Community

Jan 20

The Evolution of Evaluating LLMs: From Traditional to FrontierMath & Beyond

Dec 27