Google’s AI Overviews have changed how people search, offering concise summaries of complex topics in place of a list of links. However, a recent analysis by the New York Times, in collaboration with AI startup Oumi, points to a concerning issue: despite reaching 91% accuracy in February, Google’s AI Overviews still produce millions of errors.
Google handles more than 5 trillion searches a year, so even a 9% error rate means tens of millions of answers may be incorrect. This raises important questions about the reliability of AI-generated content and its impact on the users who rely on it.
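To make the scale argument concrete, here is a minimal Python sketch of the arithmetic. The search volume and accuracy come from the figures above; the share of searches that actually show an AI Overview is not given in this article, so `overview_share` is a purely hypothetical parameter, and the function name is illustrative only.

```python
# Back-of-the-envelope estimate of wrong AI Overview answers per year.
SEARCHES_PER_YEAR = 5_000_000_000_000  # "over 5 trillion", treated as a lower bound
ACCURACY = 0.91                        # February accuracy reported in the analysis


def estimated_wrong_answers(searches: float, overview_share: float, accuracy: float) -> float:
    """Return searches * overview_share * (1 - accuracy).

    overview_share is an ASSUMPTION: the fraction of searches that display
    an AI Overview. The article does not report this number.
    """
    if not 0.0 <= overview_share <= 1.0:
        raise ValueError("overview_share must be between 0 and 1")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return searches * overview_share * (1.0 - accuracy)


if __name__ == "__main__":
    # Hypothetical shares, chosen only to show how the estimate scales.
    for share in (0.01, 0.10, 0.25):
        wrong = estimated_wrong_answers(SEARCHES_PER_YEAR, share, ACCURACY)
        print(f"share={share:.0%}: about {wrong:,.0f} wrong answers per year")
```

The estimate scales linearly with the assumed share, so the totals are only as good as that guess. The sketch also treats the benchmark accuracy as if it applied to everyday queries, which may not hold.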
The Rise of AI Overviews
Google’s shift towards AI-generated content has been gradual: over the past two years, the company has moved from linking to sources to summarizing them. The change was driven by the need to give users more concise and easily digestible information. However, as the New York Times analysis suggests, it has also introduced new challenges.
The Accuracy Conundrum
Oumi tested 4,326 Google searches using the SimpleQA benchmark, a widely used measure of factual accuracy in AI systems. The results showed that AI Overviews were accurate 85% of the time with Gemini 2 and 91% after an upgrade to Gemini 3. That is a significant improvement, but it still leaves roughly one answer in eleven wrong.
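The same percentages can be turned into approximate counts for the test set itself. The short Python sketch below does that, using only the figures quoted above. Because the published accuracy values are rounded, the counts are estimates and not Oumi's reported tallies, and the function names are illustrative.

```python
# Approximate wrong answers in the 4,326-query sample at each accuracy level.
SAMPLE_SIZE = 4326


def wrong_in_sample(sample_size: int, accuracy_pct: int) -> int:
    """Estimate the number of wrong answers from a rounded accuracy percentage."""
    return round(sample_size * (100 - accuracy_pct) / 100)


def relative_error_reduction(old_accuracy_pct: int, new_accuracy_pct: int) -> float:
    """Fraction by which the error rate fell between two accuracy levels."""
    old_error = 100 - old_accuracy_pct
    new_error = 100 - new_accuracy_pct
    return (old_error - new_error) / old_error


if __name__ == "__main__":
    before = wrong_in_sample(SAMPLE_SIZE, 85)  # Gemini 2
    after = wrong_in_sample(SAMPLE_SIZE, 91)   # Gemini 3
    print(f"Estimated wrong answers at 85%: {before}")
    print(f"Estimated wrong answers at 91%: {after}")
    print(f"Relative drop in error rate: {relative_error_reduction(85, 91):.0%}")
```

On these rounded figures, the upgrade cut the error rate by about 40%, from an estimated 649 wrong answers to about 389 in the sample.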
The bigger problem may not be the accuracy of the answers themselves, but rather the sourcing that underpins them. Oumi found that more than half of the correct February responses were
