OpenAI announces new frontier models o3 and o3-mini
On the last day of “shipmas,” OpenAI saved what might be the biggest news for last, though the 1-800 number remains the most fun.
In a puzzling branding move, OpenAI CEO Sam Altman announced the company’s latest frontier models: “o3” and “o3-mini.” For some reason (possibly trademark-related), OpenAI is skipping “o2” altogether.
The models are not available to the public yet, but researchers can apply to participate in “public safety testing” of them, and wide release is expected at the end of January. The new models feature multistep “reasoning” like the current o1 model, but they also apply that reasoning process to safety, which, according to Altman, leads to a higher success rate at catching prohibited responses.
Altman announced the models on a livestream and revealed that the new models had achieved the highest scores on a benchmark test that has been notoriously difficult for AI models to solve.
The ARC-AGI benchmark is a visual test consisting of a series of patterns of squares on a grid; each puzzle demands a unique solution, so the model must effectively learn a new skill for every problem.
Altman said that the o3 model performed 20% better than the current o1 model on coding benchmarks, and highlighted the performance and cost improvements for the smaller o3-mini model.