Last February, in:
Giving Creators and Websites Control Over Generative AI
I suggested expansion of the existing Robots Exclusion Protocol (e.g. “robots.txt”) as a path toward helping provide websites and creators control over how their contents are used by AI systems.
Shortly thereafter, Google publicly announced their own support for the robots.txt methodology as a useful mechanism in these contexts.
While it’s true that adherence to robots.txt (or related webpage Meta tags — also part of the Robots Exclusion Protocol) is voluntary, my view is that most large firms do honor its directives, and if ultimately moves toward a regulatory approach to this were deemed genuinely necessary, a more formal approach would be a possible option.
This morning Google ran a livestream discussing their progress in this entire area, emphasizing that we’re only at the beginning of a long road, and asking for a wide range of stakeholder inputs.
I believe of particular importance is Google’s desire for these content control systems to be as technologically straightforward as possible (so, building on the existing Robots Exclusion Protocol is clearly desirable rather than creating something entirely new), and for the effort to be industry-wide, not restricted to or controlled by only a few firms.
Also of note is Google’s endorsement of the excellent “AI taxonomy” concept for consideration in these regards. Essentially, the idea is that AI Web crawling exclusions could be specified by the type of use involved, rather than by which entity was doing the crawling. So, a set of directives could be defined that would apply to all AI-related crawlers, irrespective of who was doing the crawling, but permitting (for example) crawlers that are looking for content related to public interest AI research to proceed, but direct that content not be taken or used for commercial Generative AI chatbot systems.
Again, these are of course only the first few steps toward scalable solutions in this area, but this is all incredibly important, and I definitely support Google’s continuing progress in these regards.