AI researchers trained an OpenAI competitor in 26 minutes for less than $50
Researchers at Stanford and the University of Washington have developed an AI model that could compete with Big Tech rivals — and trained it in 26 minutes for less than $50 in cloud compute credits.
In a research paper published last Friday, the researchers showed that their new “s1” model performs comparably to advanced reasoning models such as OpenAI’s o1 and DeepSeek’s R1 on tests of mathematical problem-solving and coding ability.
Researchers said that s1 was distilled from “Gemini 2.0 Flash Thinking Experimental,” one of Google’s AI models, and that they used “test-time scaling,” that is, presenting a base model with a dataset of questions and giving it more time to think before it answers. While this technique is widely used, the researchers aimed for the “simplest approach,” relying on supervised fine-tuning, in which the model is explicitly trained to mimic certain behaviors.
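For a sense of what that looks like in practice, here is a minimal sketch of supervised fine-tuning on a distilled reasoning trace, written against Hugging Face’s transformers library. The model name, the single toy example, and the learning rate are illustrative stand-ins, not the authors’ setup (the paper fine-tunes a much larger Qwen model on roughly 1,000 curated examples):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model; the actual work used a far larger base model.
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# One (question, reasoning trace, answer) example of the kind distilled
# from a stronger model; a real run would loop over the whole dataset.
example = (
    "Question: What is 17 * 24?\n"
    "Reasoning: 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.\n"
    "Answer: 408"
)

# Standard next-token training: passing input_ids as labels makes the
# model learn to reproduce the full trace, reasoning steps included.
batch = tokenizer(example, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
optimizer.zero_grad()
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()
optimizer.step()
```

Nothing exotic is happening in the loop itself; the leverage comes from the quality of the distilled traces, not from the training code.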
In the paper, the researchers discuss using simple commands like “wait”:
“...by appending ‘Wait’ multiple times to the model’s generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps.”
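In code, that intervention amounts to intercepting the model’s attempt to stop reasoning, splicing in “Wait,” and letting it keep decoding. The sketch below illustrates the control flow only and is not the authors’ implementation: `generate` stands in for any text-completion call, and the end-of-thinking marker is a hypothetical delimiter, since models signal it differently.

```python
END_OF_THINKING = "</think>"  # hypothetical marker; real delimiters vary by model

def generate_with_wait(generate, prompt: str, extra_passes: int = 2) -> str:
    """Suppress the end of the model's reasoning a few times by appending 'Wait'."""
    trace = generate(prompt)  # assume generate() returns only the completion
    for _ in range(extra_passes):
        if END_OF_THINKING not in trace:
            break  # the model never tried to stop reasoning; nothing to suppress
        # Cut the trace where the model tried to finish thinking, append
        # 'Wait', and let it continue decoding from the extended trace.
        trace = trace.split(END_OF_THINKING)[0] + " Wait"
        trace += generate(prompt + trace)
    return trace

# Toy stand-in for a model call, just to show the control flow:
toy_model = lambda text: " double-checking... </think> The answer is 408."
print(generate_with_wait(toy_model, "Question: 17 * 24?"))
```

Each pass buys the model another stretch of “thinking,” which is where the extra test-time compute goes.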
With this methodology, the researchers report cheaply recreating an AI model’s “reasoning” abilities by fine-tuning an off-the-shelf base model on a relatively small dataset. Now, the s1 model, along with the data and code used to train it, is on GitHub… which will, presumably, not please big AI companies. (It was only days ago that OpenAI accused DeepSeek of ripping off ChatGPT to train its models.) Indeed, mounting concern about unauthorized distillation has given rise to the term “distealing” in the AI community.
The researchers said the fine-tuning was done on 16 Nvidia H100 GPUs.