December 2016 – Lauren Weinstein's Blog

Fake News and Google: What Does a Top Google Search Result Really Mean?

Controversy continues to rage over how Holocaust denial sites and related YouTube videos have achieved multiple top and highly-ranked search positions on Google for various forms and permutations of the question “Did the Holocaust really happen?” — and what — if anything — Google intends to ultimately do about these outright, racist lies achieving such search results prominence.

If you’re like most Internet users, you’ve been searching on Google and viewing the resulting pages of blue links for many years now.

But here’s something to ponder that you may not have ever really stopped to think about in depth: What does a top or otherwise high search result on Google really mean?

This turns out to be a remarkably complex issue.

The ranking of search results is arguably the most crucial aspect of core search functionalities. I don’t know the details of how Google’s algorithms make those determinations, and if I did know I couldn’t tell you — this is getting into “crown jewel” territory. This is one of Google’s most important and best kept secrets.

It’s not just important from business and competitive aspects, but also in terms of serving users well.

Google is continually bombarded by folks trying to use all manner of “dirty tricks” to try to boost their search ranks and visibility: in the parlance of the trade, Black Hat SEO (Search Engine Optimization). Not all SEO per se is evil; simply having a well organized site using modern coding practices is essentially a kind of perfectly acceptable and recommended “White Hat” SEO.

But if details of Google’s ranking algorithms were known, it could theoretically help underhanded players use various technical tricks to try to “game” the system to achieve fraudulently high search ranks.

It’s crucial not to confuse search links that are the results of these Google algorithms — technically termed “organic” or “natural” search results — with paid ad links that may appear above those organic results. Google always clearly marks the latter as “Ad” or “Sponsored” and these must always be considered in the context of being paid insertions that are dependent on the advertisers’ continuing ability to pay for them.

Until a few years ago, Google’s organic search results always represented “simply” what Google felt were the “best” or “most relevant” link results for a given user’s query.

But the whole situation became enormously more complex when Google began offering what it deemed to be actual answers to questions posed in some queries, rather than only the familiar set of links.

In simple terms, such answers are typically displayed above (and/or to the right of) the usual search result links. These can come from a wide variety of sources, often related to the top organic search result, with one prominent source being Wikipedia.

Google’s philosophy about this — repeatedly stated publicly — is that if a user is asking a straightforward question and Google knows the straightforward answer, it can make sense to provide that answer directly rather than only the pages of blue links.

This makes an enormous amount of good sense.

Yet it also introduced a massive complication which is at the foundation of the Holocaust denial and other fake news, fake information controversies.

Google Search has earned enormous trust around the world. Users assume that when Google ranks organic results to a query, it does so based on a sound, scientific analysis.

And here’s the absolutely crucial point: It is my belief, based on continuing interactions with Google users and other data I’ve been collecting over an extended period, that most Google users do not commonly differentiate between what Google considers to be “answers” and what Google considers “merely” to be ordinary search result links.

That is, users overall have come to trust Google to such an extent that they assume Google would not respond to a specific question with highly ranked links that are outright lies and falsifications.

Again, Google doesn’t consider all of those to be “specific answers” — Google rather considers the vast majority to be simply the “best” or “most relevant” links based on the internal churning of their algorithm.

Most Google users don’t make this distinction. To them, the highest ranking organic links that appear in response to questions are assumed to likely be the “correct” answers, since they can’t imagine Google knowingly highly ranking fake news or false information in response to such queries.

As Strother Martin’s character “Captain” famously proclaimed in the 1967 film “Cool Hand Luke” – “What we’ve got here is failure to communicate.”

Part of the problem is that Google’s algorithms appear outwardly to be tuned toward topics where specific answers are not controversial. It’s one thing to see a range of user-perceived answers to a question like “What is the best flavor of ice cream?” But when it comes to the truth of the Holocaust for example, there is no room for maneuvering, any more than there is when answering other fact-based questions, such as “Is the moon made of green cheese?”

Many observers are calling for Google to manually eliminate or manually downrank outright lies like the Holocaust denials.

I am unenthusiastic about such approaches. I would much prefer that scalable, automated methods be employed in these contexts whenever possible. Some governments are already proposing false “solutions” that amount to horrific new censorship regimes (that could easily make the existing and terrible EU “Right To Be Forgotten” look like a veritable picnic by comparison).

I would much prefer to see this set of issues resolved via various forms of labeling to indicate highly ranked items that are definitively false (please see: Action Items: What Google, Facebook, and Others Should Be Doing RIGHT NOW About Fake News).

Also important could be explicit notices from Google indicating that they are not endorsing such links in any way and do not represent them as being “correct answers” to the associated queries. A general educational outreach by Google, to help users better understand Google’s view of what highly ranked search results actually represent, could also potentially be very useful.

As emotionally upsetting as the fake news and fake information situation has become, especially given the prominent rise of violent, racist, often politically motivated lies in this context, there are definitely ways forward out of this current set of dilemmas, so long as both we and the firms involved acknowledge that serious actions are needed and that the status quo is definitely no longer acceptable.

–Lauren–
I have consulted to Google, but I am not currently doing so — my opinions expressed here are mine alone.
– – –
The correct term is “Internet” NOT “internet” — please don’t fall into the trap of using the latter. It’s just plain wrong!

Administrivia: Observing Google: “Tough Love”

Lately I’ve been receiving a significant spike in email from readers asking various forms of the question:

What is your true stance regarding Google?

In particular, they seem unable to grasp how I can send out one blog post or other item that is significantly critical of some aspect of Google, then another post that is highly complimentary of a different aspect.

I view the question as frankly rather shallow and illogical. One might as well ask “What is your true opinion of life?”

Google is a great firm — a very large company of enormous complexity, operating at the leading edge of technology’s intersection with privacy, security, and one way or another, most other aspects of society.

It would be foolhardy in the extreme to evaluate Google as if it were some sort of monolithic whole (though the true “Google Haters” seem to do exactly that most of the time).

As for myself, when I believe that Google is making a mistake that is causing them to fall short of the high standards of which I feel they’re capable, I explicitly tell them so and I pull no punches in that analysis. When my view is that they’re doing great work (which is overwhelmingly more often the case) it’s my pleasure to say so clearly and explicitly.

If you wish to call this something akin to “tough love” regarding Google on my part, I won’t argue.

Be seeing you.

Action Items: What Google, Facebook, and Others Should Be Doing RIGHT NOW About Fake News

Today is action items day, and there isn’t a moment to lose before someone gets killed as a result of the fake news scourge. It nearly happened a couple of days ago, when some wacko invaded a pizza restaurant and shot it up looking for the youthful “sex slaves” that the fake “Pizzagate” story claims exist (a total fabrication created out of whole cloth and part of the complex of fake anti-Hillary sex stories even being promoted by highly-placed wackos in Trump’s White House circle). In fact, there are already new fake stories circulating regarding the shooting itself.

There are some ongoing efforts to begin dealing with fake and false news at the big firms. Facebook appears to be running an experiment asking some users to rate how “misleading” some link titles might be. This will no doubt collect some interesting data and may be a small portion of solutions, but of course cannot alone solve the underlying problems.

Having spent enough time inside Google to have some sense of how the world looks at Google Scale (i.e. “Big” with a Capital “B”), I am convinced that efforts to deal with the Fake/False News problem must primarily be based on algorithmic, automated systems. Humans will also still have important roles to play in this process in terms of tagging, flagging, and verification at least — especially for items that are suspected or verified fakes but are still trending upward very rapidly.

So, Action Item #1: We should be looking at automated systems for doing the bulk of the first level work to detect fakes, or else we’ll be swamped from the word go.
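
As a rough illustration of the division of labor described above, here is a minimal Python sketch. Everything in it is hypothetical: the `fake_score` input (assumed to come from some upstream automated classifier), the `share_growth` measure, and both thresholds are assumptions made up for this example, not a description of any system at Google or Facebook.

```python
# Hypothetical thresholds, chosen only for illustration.
SUSPECT_SCORE = 0.8   # automated score at or above which an item is a suspected fake
FAST_GROWTH = 2.0     # hour-over-hour share ratio that counts as "trending rapidly"


def triage(fake_score: float, share_growth: float) -> str:
    """Route one item: automation handles the bulk, people see the urgent cases.

    fake_score   -- assumed classifier output between 0.0 and 1.0
    share_growth -- assumed ratio of this hour's shares to the previous hour's
    """
    if fake_score < SUSPECT_SCORE:
        # Most items stop here without any human involvement.
        return "no_action"
    if share_growth >= FAST_GROWTH:
        # Suspected fake that is still spreading fast: escalate to a person.
        return "human_review"
    # Suspected fake that is not spreading quickly: flag automatically.
    return "auto_flag"
```

The point of the sketch is only the ordering: the automated check runs on everything first, and human attention is reserved for suspected fakes that are still climbing.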

And I believe that the foundational resources to get this done do exist. Google and Facebook (just to name two obvious examples) have powerful AI architectures that could be leveraged toward such tasks, given the will to do so.

Action Item #2: We must understand the true dynamics of how fake and false news are shared: how they rapidly reach large numbers of users and push high into search results. It’s popular to simply assert that everyone believing/sharing these fake stories is just evil or stupid (or both).

That’s way too simplistic an assertion. Even over the very short time that my factsquad.com fake news data collection effort has been active, obvious patterns in the data are already emerging.

One pattern that hits you in the face immediately is that the vast majority of users who share fake news are not stupid and not evil, but they are very much confused by the misinformation surrounding them. There’s a sense that “Well, if it looks professional, or if this ranks highly in search, or if Facebook showed it to me, or my friends shared it with me, it at least might be true, there might be something to it somehow, so I’ll share it too!”

This appears to be a far, far larger group of users than the ones who are actually generating and voluntarily wallowing in this trash. In fact, the latter group is voluntarily in their own “echo chambers” — and like with most any group of dedicated haters, Internet-based efforts to change their minds will likely be wasted.

But for a much larger segment of users who are misinformed, confused, and don’t even realize that they have become involuntarily trapped in echo chambers by fake and false news, there is definitely still hope.

This emphasizes a key point that various observers including myself have previously noted. Older users and other users with less Internet experience tend to believe items that look professional, that appear to be from sources that are visually attractive and seemingly structured in a more “news traditional” manner. On the other hand, younger users or other users with more Internet experience tend to care much less — or not at all — about the “professionalism” of the source and give much more credence to items that rank highly in search, are surfaced by services like Facebook, or are widely shared by their friends.

And this gets us to the crux of the matter. By and large, the Internet economy has evolved into a click-based popularity contest. Both in terms of search and social media, it is basically designed to surface content based on how many people appear to have interest in that content. That’s somewhat a simplification of course but it’s fairly close to the mark. And let’s face it, given two stories presented as accurate — one that discusses how people eat pizza, the other an actually fake story describing a nonexistent child sex ring — which is likely to get the most clicks — and so the most revenue?

While a variety of the big fake news sites are related to persons with political motives, a large number are operated by individuals who have no political motives at all — they are “merely” enriching themselves by creating false stories that they believe will get the most shares and “engagement” clicks for their own monetary enrichment.

On the other hand, I’ll tell you as one of the individuals involved in Internet development for decades that we did not build and grow the Net to be a tool for paying people to post fake news, nor to use such false content to help elect a lying sociopath as President of the United States.

Yet the click-based Internet economy is what it is, and alternative models such as subscriptions have seen only limited success. Other concepts such as micropayments even less so.

So what are we to do? This brings us to …

Action Item #3: I continue to strongly feel that censorship is not the best answer to this set of problems, and that more information — not less — is the path toward solutions. Downranking — where fake stories would still exist but no longer be so prominently featured in search results or system shares — can be a viable approach if handled with caution. In particular, only the most serious and dangerous fake content would typically be considered for manual downranking. For most fake news situations, organic (natural) downranking is a much more desirable procedure.

And that’s where labeling comes in. If fake news that has managed to reach high search results and massive sharing were labeled as fake or in some other relevant distinctive manner, I believe that this would give some pause to that large group of confused users, result in less sharing of fakes, and ultimately in the organic downranking of many such stories.

What’s more, in comments I’ve received it’s clear that many users are desperate for help in evaluating the truth of the content that comes pouring in at them now. How can we really blame them for accepting false stories as real when we don’t even make the effort to point out and label the fakes that we definitely know about?

Obviously it’s the case that detecting, evaluating, and labeling content on an Internet scale — even if we restrict our efforts to highly trending and highly ranked items — is a very significant undertaking, even with the best of AI resources doing the bulk of the work. Such issues as the exact wording of labels can also be complex. Do we actually want to label a known false story as “false” per se? Snopes does this successfully at their relatively limited scale, but they don’t have particularly deep pockets, either (ironically but predictably, all manner of fake news stories are written and widely promulgated against Snopes). Another approach as an alternative to a specific “false” label would be the assigning of a kind of “confidence rank” to such stories — with the known fakes perhaps getting a rank of zero.
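
To make the “confidence rank” idea a little more concrete, here is a minimal Python sketch. The 0 to 100 scale, the function names, the `known_fakes` set, and the label wording are all assumptions invented for this example; the only element taken from the discussion above is that known fakes get a rank of zero.

```python
def confidence_rank(url: str, known_fakes: set, estimated_rank: int) -> int:
    """Return a 0-100 confidence rank for a story (hypothetical scale).

    Stories already verified as fake always get 0. Everything else keeps
    its estimated rank, clamped to 1-100 so that 0 is reserved for
    verified fakes (a design assumption of this sketch).
    """
    if url in known_fakes:
        return 0
    return max(1, min(100, estimated_rank))


def rank_label(rank: int) -> str:
    """Turn a rank into label text; the wording is illustrative only."""
    if rank == 0:
        return "Confidence 0/100 (known to be false)"
    return f"Confidence {rank}/100"
```

For example, with `known_fakes = {"https://fake.example/story"}`, that URL would be labeled “Confidence 0/100 (known to be false)” no matter what estimate an automated system produced for it.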

As always, the devil is in the details, but I’m convinced that some combination of these or related concepts can be made to work, especially given that the status quo is no longer tenable.

Action Item #4: Parody as a test case. The ability of many (most?) people to recognize parody or satire on the Net (unless it is clearly labeled) can be very poor. I ran into this myself when I wrote April Fools’ columns for the CACM journal — even with that highly technical audience some readers assumed that what I thought was obvious and outrageous satire was actually real. The same thing happened with a satire video I released on YouTube years ago as well.

A significant number of the “fake news” stories are sourced from satire sites (that is, at least ostensibly satirical sites; many seem to call themselves satire in small print to try to cover fake items with clearly political motives, or mix fake and real items on their sites to cause even more confusion). Yet even items from known satire sources like “The Onion” and “Borowitz” from “The New Yorker” frequently explode into mass visibility without any indication that they aren’t “legit” articles.

In some cases this is just by virtue of the fact that typical sharing or search results may give no obvious indication that these are satire or parody — and such items may be innocently shared to large numbers of persons as if they were serious items. In other cases, the sharer knows that they’re dealing with satire but purposely promotes the items as non-satire if this fits with their political agenda of the moment.

In either case, if such stories were clearly marked (as parody or satire, referencing the original source) in search results or in Facebook shares, Twitter feeds, etc., the purposeful and/or accidental damage they can do when they’re inappropriately interpreted by users as serious items could be significantly reduced.

Such specific labeling of individual items that are known to be originally sourced from self-proclaimed satire/parody sites — irrespective of their current share or search results links — could provide something of an initial proving ground for the overall labeling concept. If such items could be identified in the various search and sharing systems as having such sites as their origins, it could help to demonstrate the usefulness of this labeling technique on this specific class of material that would be relatively straightforward to target. User reactions to these labels could then be studied toward the launch of a possible much broader labeling initiative dealing with fake/false news in a more comprehensive manner.
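
A minimal Python sketch of that proving-ground idea might look like the following. It assumes that the original source URL of an item is already known (tracing a reshared item back to its origin is the hard part and is not attempted here), and the domain list and label text are placeholders invented for this example.

```python
from urllib.parse import urlparse

# Placeholder list of self-proclaimed satire/parody sites (hypothetical).
SATIRE_DOMAINS = {"satire.example", "parody.example"}


def satire_label(origin_url: str):
    """Return a satire label if the item's origin is a listed satire site.

    origin_url is assumed to be the item's original source, regardless of
    the share or search link through which a user actually encountered it.
    Returns None when no label applies.
    """
    host = (urlparse(origin_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    for domain in SATIRE_DOMAINS:
        # Match the domain itself or any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return f"Satire/parody (source: {domain})"
    return None
```

Matching on the origin rather than on the current link is the design choice that matters here, since the same satire item may be reached through many different share and search links.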

None of this will be easy, nor are these the only possible approaches. But we must immediately begin vigorously moving down the paths towards practical solutions to the serious, rapidly escalating issues of fake news and related problems on the Internet, unless we’re satisfied to be increasingly suffocated under a growing and ultimately disastrous deluge of lies.
