On Thursday, OpenAI introduced its much-anticipated O1 models, bringing a new feature to ChatGPT users: the ability for the AI to pause and “think” before responding. The models, which generated considerable buzz ahead of their release, were known internally by the codename “Strawberry.” Their main selling point is that the AI works through a question before answering instead of replying immediately, which adds a layer of sophistication to the user experience.
Unlike earlier iterations, these O1 models seem designed to slow down and analyze queries in a more human-like fashion. The “thinking” pause offers the potential for more accurate and nuanced answers, mirroring how people often take a moment to gather their thoughts before responding. This thoughtful delay isn’t just about better responses; it could also represent a significant step forward in making AI interactions feel more natural and intuitive.
Compared with GPT-4o, OpenAI’s O1 models are a significant step forward in some respects and a disappointment in others. O1 brings notable advances, particularly in reasoning and handling complex questions, but it has drawbacks. A major concern is cost: O1 is roughly four times more expensive to use than GPT-4o, which makes it a tougher sell for many users, especially for everyday tasks.
Workera CEO and Stanford adjunct lecturer Kian Katanforoosh stated, “There’s a lot of excitement in the AI community. If you can train a reinforcement learning algorithm paired with some of the language model techniques that OpenAI has, you can technically create step-by-step thinking and allow the AI model to walk backwards from big ideas you’re trying to work through.”
The principles behind OpenAI’s O1 models are rooted in techniques that date back several years. Notably, Google DeepMind’s AlphaGo, which defeated a world champion in the board game Go in 2016, used similar methods. AlphaGo improved by playing against itself over countless iterations until it reached superhuman strength. This approach exemplifies what Andy Harrison, former Googler and CEO of the venture firm S32, describes as an agentic process: the AI improves through practice and iteration rather than through generalized reasoning.
Harrison brings up an enduring debate in the AI community. On one side, there’s the belief that automating workflows through these agentic processes is the way forward. On the other side, some argue that for AI to truly excel, it needs to possess generalized intelligence and reasoning, allowing it to make judgments akin to human decision-making. Harrison identifies himself with the first camp, suggesting that while agentic processes are effective, we are not yet at the stage where AI can consistently make nuanced judgments on its own.
In contrast, some view O1 not as a decision-maker but as a valuable tool for refining one’s own decision-making process. Katanforoosh shared an example in which he used O1 to assist with a critical hiring decision. He gave the AI specific constraints, such as a limited interview time and particular skills to assess, and used O1 to check whether his approach was on the right track. This kind of support highlights O1’s potential as a tool for enhancing human decision-making rather than replacing it.
The crux of the matter is whether O1’s advanced capabilities justify its steep price. At a time when AI models are generally becoming more accessible and affordable, O1 stands out as one of the few that costs significantly more than its predecessors. This raises the question of whether its benefits in reasoning and decision support are worth the extra investment. For now, O1 may offer sophisticated tools for complex problem-solving, but its value must be weighed against its cost, especially as more affordable alternatives continue to emerge.
While there’s always hype surrounding new tech releases, the O1 models seem to be living up to their promise. For users who’ve been eagerly awaiting a more refined conversational AI, this could be a game-changer. Imagine an AI that doesn’t just spit out an instant reply but seems to truly “consider” the question at hand. That adds a depth that was often missing from earlier models, which typically prioritized speed.