Artificial Intelligence at the Crossroads

Suddenly there seems to be an enormous amount of political, regulatory, and legal activity regarding AI, especially generative AI. Much of this is uncharacteristically bipartisan in nature.

The reasons are clear. The big AI firms are relying largely on their traditional access to public website data to justify using that data for AI training and generative AI systems.

There is a strong possibility that this argument will ultimately fail miserably, if not under current laws then under new laws and regulations likely to be pushed through around the world. Those are quite likely to be rushed, with an array of negative collateral effects that could actually end up hurting many ordinary people.

Google for example notes that they have long had access to public website data for Search.

Absolutely true. The problem is that generative AI is wholly different in terms of its data usage than anything that has ever come before.

For example, ordinary Search provides direct value back to sites through the links on search results pages (colloquially, “the ten blue links”), something that the current Google CEO has said Google wants to de-emphasize in favor of providing “answers”.

Since the dawn of Internet search sites many years ago, search results links have represented a usually reasonable, fair exchange for public websites. The Robots Exclusion Protocol (robots.txt) gives websites relatively fine-grained access control that they can specify themselves, and at least the major search firms have generally honored it.
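As a rough illustration of that per-crawler control, here is a minimal sketch using Python's standard urllib.robotparser module. The crawler names (ExampleSearchBot, ExampleAIBot), the paths, and the example.com URLs are all hypothetical, invented only for this example, and nothing here implies that any particular crawler actually honors such rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the crawler names and paths below are
# made up for illustration and do not refer to real crawlers.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Disallow: /private/

User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""

# Parse the rules from text rather than fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the rules permit `agent` to fetch `path`."""
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent, path in [
        ("ExampleSearchBot", "/articles/1"),
        ("ExampleSearchBot", "/private/notes"),
        ("ExampleAIBot", "/articles/1"),
        ("SomeOtherBot", "/drafts/a"),
    ]:
        print(agent, path, allowed(agent, path))
```

In this sketch the site admits the hypothetical search crawler everywhere except /private/, shuts out the hypothetical AI crawler entirely, and keeps all other crawlers out of /drafts/. The protocol is purely advisory: it works only to the extent that crawlers choose to check and respect it.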

But generative AI answers eliminate the need for links or other “easy to see” references. Even if “Google it!” or other forms of “more information” links are available related to generative AI answers at any AI firm’s site, few users will bother to view them.

The result is that by and large, today’s generative AI systems by their very nature return essentially nothing of value to the sites that provide the raw knowledge, data, and other information that powers AI language/learning models. 

And typically, generative AI answers (leaving aside rampant inaccuracy problems for now) are like high school term papers that haven’t even included sufficient (if any) inline footnotes and comprehensive bibliographies with links.

A very quick “F” grade at many schools.

I have proposed extending robots.txt to help deal with some of these AI issues — and Google also very recently proposed discussions around this area.

Giving Creators and Websites Control Over Generative AI:

But ultimately, the “take — and give back virtually nothing in return” modality of many AI systems inevitably leads toward enormous pushback. And I do not sense that the firms involved fully understand the cliff that they’re running towards in a competitive rush to push out AI systems long before they or the world at large are ready for them.

These firms can either grasp the nettle themselves and rethink the problematic aspects of their current AI methodologies, or continue their current course and face the high probability that governmental and public concerns will result in major restrictions on their AI projects, restrictions that may seriously damage their operations and hobble positive AI applications for users around the world long into the future.