According to a TechCrunch report, OpenAI’s recently released o3 and o4-mini AI models are more likely than the company’s earlier reasoning models to hallucinate. The models, which are designed to pause and reason through queries before answering, were released by the ChatGPT maker on Wednesday, April 16.
However, OpenAI’s internal tests show the two new models hallucinate, or make things up, far more often than even non-reasoning models such as GPT-4o. The company does not know why this is happening.
In a technical report, OpenAI said “more research is needed” to understand why hallucinations are getting worse as it scales up reasoning models.
“Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines,” a former OpenAI employee told the publication.
Experts say that while hallucinations may help the models generate creative and interesting ideas, they could also make the models a tough sell for businesses in a market where accuracy is paramount.
OpenAI has been betting heavily on the new models to beat the likes of Google, Meta, xAI, Anthropic, and DeepSeek in the cutthroat global AI race. According to the Sam Altman-led company, o3 achieves state-of-the-art performance on SWE-bench Verified, a benchmark that measures coding ability, scoring 69.1 per cent. The o4-mini model performs similarly, scoring 68.1 per cent.
ChatGPT makes people lonely
According to a joint study by OpenAI and the MIT Media Lab earlier this month, ChatGPT may be increasing feelings of loneliness among its most regular users. The study’s authors concluded that participants who trusted and “bonded” with ChatGPT more were more likely to be lonely and to rely on it than others, even though feelings of loneliness and social isolation are shaped by many factors.
Researchers said the study, though still in its early stages, might help spark a discussion about the technology’s broader impact on users’ mental health.