I’ve written in the past about inaccuracies in Google’s Search AI Overviews (let’s call them AIOs) that so often seem to provide specific answers to search queries, usually in rather authoritative wording. And there have been many concerns about how accurate or not these AIOs actually are. I’ve publicly noted a number of examples where AIOs were just flat out wrong even with simple math questions, which certainly doesn’t give one much confidence in more complex kinds of queries.
Now a firm called Oumi, led by a team of former Google and Apple employees, has conducted a study that, combined with what Google does say about these issues, gives us considerable insight into some real statistics about Google AIOs. And those numbers are quite disturbing.
Of course Google has disclaimers all over the place saying that AI answers could be wrong and you should double-check them. But other studies have shown that hardly anybody bothers to do that. And why would you expect them to? What’s the point of being handed an “answer” if you can’t trust that it’s correct? And what does “correct” even really mean? Some answers from Google AIOs are fully correct, some are completely wrong, and some are only PARTLY correct — this last case can be particularly insidious. Even when Google AIOs provide links to what are supposedly the source sites for an answer, often those sites have the information wrong themselves, and Google unfortunately accepts it as correct and just stirs it into an AIO answer. Sometimes when you look, you can’t even find where on those sites Google supposedly got that information at all — it just doesn’t seem to be there.
So what’s the scope of this problem as determined by this new study? Keep in mind this IS a tough nut to crack, for a bunch of reasons, one being that Google AIOs are not particularly consistent. In fact, you can ask the same question twice just seconds apart and, as you may have noticed, get entirely different and even completely contradictory answers.
Another foundational point is that percentages alone don’t tell the whole story. The massive scale of Google — they have BILLIONS of users across their ecosystem — is absolutely crucial to consider. This applies whether we’re talking about their continuing account recovery policy failures that leave innocent users locked out of their accounts, or about this question of AIOs. The point is, at Google scale even relatively small percentages represent vast numbers of actual human beings.
Summarizing the numbers, it shakes out like this. There are a lot of factors involved, but let’s be generous and say that AIOs are inaccurate overall about 9% of the time. The inaccuracy numbers get significantly larger when you look at related Google Gemini AI statistics, and at whether particular answers are ungrounded relative to the sites they reference, or simply repeat inaccuracies present at those sites.
Now let’s cut to the chase. At Google scale of around 5 TRILLION search queries a year — that’s trillions with a “T” — the study says that we’re dealing with on the order of tens of millions of wrong Google AIO answers EVERY HOUR, hundreds of thousands PER MINUTE. To me, especially when you consider the impracticality of really double-checking such answers, this is a terrible situation.
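The back-of-the-envelope arithmetic behind those hourly and per-minute figures is easy to check. Here’s a minimal sketch, assuming roughly 5 trillion queries a year and the generous 9% inaccuracy figure from above, and treating every query as producing an AIO (an upper bound — real AIO coverage is some smaller fraction of searches, which is why “tens of millions” per hour is the right order of magnitude rather than an exact count):

```python
# Back-of-the-envelope check of the scale claim, NOT figures from the study itself.
# Assumptions (illustrative): ~5 trillion searches/year, ~9% of AIO answers
# inaccurate, and every query showing an AIO (an upper bound on coverage).

QUERIES_PER_YEAR = 5_000_000_000_000  # ~5 trillion
ERROR_RATE = 0.09                     # ~9% inaccurate

wrong_per_year = QUERIES_PER_YEAR * ERROR_RATE
wrong_per_hour = wrong_per_year / (365 * 24)
wrong_per_minute = wrong_per_hour / 60

print(f"~{wrong_per_hour:,.0f} wrong answers per hour")     # tens of millions
print(f"~{wrong_per_minute:,.0f} wrong answers per minute")  # hundreds of thousands
```

Even if AIOs appeared on only a fraction of those queries, the result would still land in the tens of millions of wrong answers per hour.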
Google has become a misinformation machine on a colossal scale, virtually a mirror image of the trust we used to be able to put in their traditional search engine results without AIOs. Because it’s NOT the CORRECT answers we have to worry about, it’s the damage potentially done by wrong or partly wrong answers that people nonetheless understandably accept and act upon as if those answers were correct.
SCALE MATTERS, and enormous scale like Google’s matters enormously. But it’s the human beings around the world being impacted by that scale whom Google should care about. The situation is made even worse since Google really won’t take responsibility for inaccuracies in their AIOs impacting those people. Google is making money hand over fist, but when it comes to really caring about their users, they’re failing more than ever.
–Lauren–