On Friday, during Day 12 of its “12 days of OpenAI,” OpenAI CEO Sam Altman announced the company's latest AI “reasoning” models, o3 and o3-mini, which build upon the o1 models launched earlier this year. The company is not releasing them yet but will make these models available for public safety testing and research access today.
The models use what OpenAI calls “private chain of thought,” where the model pauses to examine its internal dialog and plan ahead before responding, which you might call “simulated reasoning” (SR), a form of AI that goes beyond basic large language models (LLMs).
The company named the model family “o3” instead of “o2” to avoid potential trademark conflicts with British telecom provider O2, according to The Information. During Friday’s livestream, Altman acknowledged his company’s naming foibles, saying, “In the grand tradition of OpenAI being really, truly bad at names, it’ll be called o3.”
According to OpenAI, the o3 model earned a record-breaking score on the ARC-AGI benchmark, a visual reasoning benchmark that has gone unbeaten since its creation in 2019. In low-compute scenarios, o3 scored 75.7 percent, while in high-compute testing, it reached 87.5 percent, comparable to human performance at an 85 percent threshold.
OpenAI also reported that o3 scored 96.7 percent on the 2024 American Invitational Mathematics Exam, missing just one question. The model also reached 87.7 percent on GPQA Diamond, which contains graduate-level biology, physics, and chemistry questions. On the Frontier Math benchmark by EpochAI, o3 solved 25.2 percent of problems, while no other model has exceeded 2 percent.