More Regarding a Terrible Decision by the Internet Archive


Yesterday, in A Terrible Decision by the Internet Archive May Lead to Widespread Blocking, I discussed in detail why the Internet Archive’s decision to ignore Robots Exclusion Standard (RES) directives (in robots.txt files on websites) is terrible for the Internet community and users. I had expected a deluge of hate email in response. But I’ve received no negative reactions at all — rather a range of useful questions and comments — perhaps emphasizing the fact that the importance of the RES is widely recognized.

As I did yesterday, I’ll emphasize again here that the Archive has done a lot of good over many years, that it’s been an extremely valuable resource in more ways than I have time to list right now. Nor am I asserting that the Archive itself has evil motives for its decision. However, I strongly feel that their decision allies them with the dark players of the Net, and gives such scumbags comfort and encouragement.

One polite public message that I received was apparently authored by Internet Archive founder Brewster Kahle (since the message came in via my blog, I have not been able to immediately authenticate it, but the IP address seemed reasonable). He noted that the Archive accepts requests via email to have pages excluded.

This is of course useful, but entirely inadequate.

Most obviously, this technique fails miserably at scale. The whole point of the RES is to provide a publicly inspectable, unified, and comprehensively defined method for informing other sites (individually, en masse, or in various combinations) of your site's access preferences.

The “send an email note to this address” technique just can’t fly at Internet scale, even assuming that such emails are ever actually read at any given site. (Remember when “postmaster@” addresses would reliably reach human beings? Yeah, a long, long time ago.)

There’s also been some fascinating discussion regarding the existing legal status of the RES. While it apparently hasn’t been directly tested in court — at least here in the USA — judges have nonetheless recognized the importance of the RES in various decisions.

In 2006, Google was sued (“Field v. Google” — Nevada) for copyright infringement for spidering and caching a website. The court found for Google, noting in part that the site’s robots.txt file had permitted such access by Google.

The case of Century 21 v. Zoocasa (2011 — British Columbia) is also illuminating. In this case, the judge found against Zoocasa, noting that it had disregarded robots.txt directives prohibiting the copying of content from the Century 21 site.

So it appears that even today, ignoring RES robots.txt files could mean skating on very thin ice from a legal standpoint.

The best course all around would be for the Internet Archive to reverse their decision, and pledge to honor RES directives, as honorable players in the Internet ecosystem are expected to do. It would be a painful shame if the wonderful legacy of the Internet Archive were to be so seriously tarnished going forward by a single (but very serious) bad judgment call.


A Terrible Decision by the Internet Archive May Lead to Widespread Blocking


UPDATE (23 April 2017):  More Regarding a Terrible Decision by the Internet Archive

– – –

We can stipulate at the outset that the venerable Internet Archive and its associated systems like the Wayback Machine have done a lot of good for many years — for example, by providing chronological archives of websites that have chosen to participate in their efforts. But now, it appears that the Internet Archive has joined the dark side of the Internet, by announcing that they will no longer honor the access control requests of any websites.

For any given site, the decision to participate or not with the web scanning systems at the Internet Archive (or associated with any other “spidering” system) is indicated by use of the well established and very broadly affirmed “Robots Exclusion Standard” (RES) — a methodology that uses files named “robots.txt” to inform visiting scanning systems which parts of a given website should or should not be subject to spidering and/or archiving by automated scanners.

RES operates on the honor system. It requests that spidering systems follow its directives, which may be simple or detailed, depending on the situation — with those detailed directives defined comprehensively in the standard itself.
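To make this concrete, here’s a minimal sketch of how a well-behaved spider consults those directives, using Python’s standard-library RES implementation. The sample robots.txt and the “ia_archiver” agent name are illustrative assumptions, not an authoritative description of any particular crawler:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: it bars one archiving crawler ("ia_archiver",
# used here purely as an example agent name) from /private/, while leaving
# the rest of the site open to all other well-behaved spiders.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An honorable spider checks before fetching each path.
print(parser.can_fetch("ia_archiver", "/private/page.html"))   # False
print(parser.can_fetch("ia_archiver", "/index.html"))          # True
print(parser.can_fetch("SomeOtherBot", "/private/page.html"))  # True
```

In real use, a crawler would call `parser.set_url(...)` and `parser.read()` to fetch the live robots.txt from the target site — but the honor-system point is the same: nothing but good behavior compels the spider to respect the answer.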

While RES generally has no force of law, it has enormous legal implications. The existence of RES — that is, a recognized means for public sites to indicate access preferences — has been important for many years to help hold off efforts in various quarters to charge search engines and/or other classes of users for access that is free to everyone else. The straightforward argument that sites already have a way — via the RES — to indicate their access preferences has held a lot of rabid lawyers at bay.

And there are lots of completely legitimate reasons for sites to use RES to control spidering access, especially for (but by no means restricted to) sites with limited resources. These include technical issues (such as load considerations relating to resource-intensive databases and a range of other related situations), legal issues such as court orders, and a long list of other technical and policy concerns that most of us rarely think about, but that can be of existential importance to many sites.

Since adherence to the RES has usually been considered to be voluntary, an argument can be made (and we can pretty safely assume that the Archive’s reasoning falls into this category one way or another) that since “bad” players might choose to ignore the standard, this puts “good” players who abide by the standard at a disadvantage.

But this is a traditional, bogus argument that we hear whenever previously ethical entities feel the urge to start behaving unethically: “Hell, if the bad guys are breaking the law with impunity, why can’t we as well? After all, our motives are much better than theirs!”

Therein are the storied paths of “good intentions” that lead to hell, when the floodgates of such twisted illogic open wide, as a flood of other players decide that they must emulate the Internet Archive’s dismal reasoning to remain competitive.

There’s much more.

While RES is typically viewed as not having legal force today, that could be changed, perhaps with relative ease in many circumstances. There are no obvious First Amendment considerations in play, so it would seem quite feasible to roll “Adherence to properly published RES directives” into existing cybercrime-related site access authorization definitions.

Nor are individual sites entirely helpless against the Internet Archive’s apparent embracing of the dark side in this regard.

Unless the Archive intends to try to go completely into a “ghost” mode, their spidering agents will still be detectable at the HTTP/HTTPS protocol level, and could be blocked (most easily in their entirety) with relatively simple web server configuration directives. If the Archive attempted to cloak their agent names, individual sites could instead block the Archive by referencing the Archive’s known source IP addresses.
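The server-side logic involved is trivial, which is part of why this escalation path is so plausible. Here’s a minimal sketch of the user-agent and source-IP checks a site could apply to each incoming request — the agent names and the IP prefix are illustrative assumptions, not an authoritative block list:

```python
# Hypothetical crawler agent substrings and source-IP prefixes, for
# illustration only; a real deployment would use verified values.
BLOCKED_AGENTS = ("ia_archiver", "archive.org_bot")
BLOCKED_PREFIXES = ("207.241.",)  # example prefix, not an authoritative range

def is_blocked(user_agent: str, remote_addr: str) -> bool:
    """Return True if this request should be refused (e.g., with HTTP 403)."""
    agent = (user_agent or "").lower()
    # First line of defense: match on the announced agent name.
    if any(name in agent for name in BLOCKED_AGENTS):
        return True
    # Fallback if the agent name is cloaked: match on source address.
    return remote_addr.startswith(BLOCKED_PREFIXES)

print(is_blocked("Mozilla/5.0 (compatible; archive.org_bot)", "1.2.3.4"))  # True
print(is_blocked("SomeBrowser", "207.241.1.1"))                            # True
print(is_blocked("Mozilla/5.0", "1.2.3.4"))                                # False
```

In practice this same logic is usually expressed directly in web server configuration rather than application code — which is exactly why crude IP-prefix blocks so easily sweep up unrelated sites as collateral damage.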

It doesn’t take a lot of imagination to see how all of this could quickly turn into an escalating nightmare of “Whac-A-Mole” and expanding blocks, many of which would likely negatively impact unrelated sites as collateral damage.

Even before the Internet Archive’s decision, this class of access and archiving issues had been smoldering for quite some time. Perhaps the Internet Archive’s pouring of rocket fuel onto those embers may ultimately lead to a legally enforced Robots Exclusion Standard — with both the positive and negative ramifications that would then be involved. There are likely to be other associated legal battles as well.

But in the shorter term at least, the Internet Archive’s decision is likely to leave a lot of innocent sites and innocent users quite badly burned.


The Google Page That Google Haters Don’t Want You to Know About


UPDATE (24 April 2017):  Quick Tutorial: Deleting Your Data Using Google’s “My Activity”

– – –

There’s a page at Google that dedicated Google Haters don’t like to talk about. In fact, they’d prefer that you didn’t even know that it exists, because it seriously undermines the foundation of their hateful anti-Google fantasies.

A core principle of Google hatred is the set of false memes concerning Google and user data collection. This is frequently encapsulated in a fanciful “You are the product!” slogan, despite the fact that (unlike the dominant ISPs and many other large firms) Google never sells user data to third parties.

But the haters hate the idea that data is collected at all, despite the fact that such data is crucial for Google services to function at the quality levels that we have come to expect from Google.

I was thinking about this again today when I started hearing from users reacting to Google’s announcement of multiple user support for Google Home, who were expressing concerns about collection of more individualized voice data (without which — I would note — you couldn’t differentiate between different users).

We can stipulate that Google collects a lot of data to make all of this stuff work. But here’s the kicker that the haters don’t want you to think about — Google also gives you enormous control over that data, to a staggering degree that most Google users don’t fully realize.

The Golden Ticket gateway to this goodness is at:

There’s a lot to explore there — be sure to click on both the three vertical dots near the top and the three horizontal bars near the upper left to see the full range of options available.

This page is a portal to an incredible resource. Not only does it give you the opportunity to see in detail the data that Google has associated with you across the universe of Google products, but also the ability to delete that data (selectively or in its totality), and to determine how much of your data will be collected going forward for the various Google services.

On top of that, there are links over to other data-related systems that you can control, such as Takeout for downloading your data from Google, comprehensive ad preference settings (which you can use to adjust or even fully disable ad personalization), and an array of other goodies, all supported by excellent help pages — a lot of thought and work went into this.

I’m a pragmatist by nature. I worry about organizations that don’t give us control over the data they collect about us — like the government, like those giant ISPs and lots of other firms. And typically, these kinds of entities collect this data even though they don’t actually need it to provide the kinds of services that we want. All too often, they just do it because they can.

On the other hand, I have no problems with Google collecting the kinds of data that provide their advanced services, so long as I can choose when that data is collected, and I can inspect and delete it on demand.

The portal provides those abilities and a lot more.

This does imply taking some responsibility for managing your own data. Google gives you the tools to do so — you have nobody but yourself to blame if you refuse to avail yourself of those excellent tools.

Or to put it another way, if you want to use and benefit from 21st century technological magic, you really do need to be willing to learn at least a little bit about how to use the shiny wand that the wizard handed over to you.



Prosecute Burger King for Their Illegal Google Home Attacks in Their Ads


Someone — or more likely a bunch of someones — at Burger King and their advertising agency need to be arrested, tried, and spend some time in shackles and prison cells. They’ve likely been violating state and federal cybercrime laws with their obnoxious ad campaign purposely designed to trigger Google Home devices without the permission of those devices’ owners.

Not only has Burger King admitted that this was their purpose, they’ve been gloating about changing their ads to avoid blocks that Google reportedly put in place to try to protect Google Home device owners from being subjected to Burger King’s criminal intrusions.

For example, the federal CFAA (Computer Fraud and Abuse Act) broadly prohibits anyone from accessing a computer without authorization. There’s no doubt that Google Home and its associated Google-based systems are computers, and I know that I didn’t give Burger King permission to access and use my Google Home or my associated Google account. Nor did millions of other users. And it’s obvious that Google didn’t give that permission either. Yet the morons at Burger King and their affiliated advertising asses — in their search for social “buzz” regarding their nauseating fast food products — felt no compunction about literally hijacking the Google Home systems of potentially millions of people, interrupting other activities, and ideally (that is, ideally from their sick standpoint) interfering with people’s home environments on a massive scale.

This isn’t a case of a stray “Hey Google” triggering the devices. This was a targeted, specific attack on users, which Burger King then modified to bypass changes that Google apparently put in place when word of those ads circulated earlier.

Burger King has instantly become the “poster child” for mass, criminal abuse of these devices.  And with their lack of consideration for the sanctity of people’s homes, we might assume that they’re already making jokes about trying to find ways to bill burgers to your credit card without your permission as well. For other dark forces watching these events, this idea could be far more than a joke.

While there are some humorous aspects to this situation — like the anti-Burger King changes made on Wikipedia in response to news of these upcoming ads — the overall situation really isn’t funny at all.

In fact, it was a direct and voluntary violation of law. It was accessing and using computers without permission. Whether or not anyone associated with this illicit stunt actually gets prosecuted is a different matter, but I urge the appropriate authorities to seriously explore this possibility, both for the action itself and relating to the precedent it created for future attacks.

And of course, don’t buy anything from those jerks at Burger King. Ever.


You Can Make the New Google+ Work Better — If You’re Borg!


Recently, in Google+ and the Notifications Meltdown, I noted the abysmal user experience represented by the new Google+ unified desktop notifications panel — especially for users like me with many G+ followers and high numbers of notifications.

Since then, one observer mentioned to me that opening and closing the notifications panel seemed to load more notifications. I had noticed this myself earlier, but the technique appeared to be unreliable with erratic results, and with large numbers of notifications still being “orphaned” on the useless standalone G+ notifications page.

After a bunch more time wasted digging into this, I now seem to have a methodology that will (for now at least … maybe) reliably permit users to see all G+ notifications on the desktop notifications panel, in a manner that makes interacting with them much less of a hassle than the standalone notifications page.

There’s just one catch. You pretty much have to be Borg-like in your precision to make this work. You can just call me “One of One” for the remainder of this post.

Keeping in mind that this is a “How-to” guide, not a “What the hell is going on?” guide, let’s begin your assimilation.

The new notifications panel will typically display up to around 10 G+ notification “tiles” when it’s opened by clicking on the red G+ notification circle. If you interact in any way with any specific tile, G+ now usually considers it as “read” and you frequently can’t see it again unless you go to the even more painful standalone notifications page.

Here’s my full recommended procedure. Wander from this path at your own risk.

Open the panel on your desktop by clicking the red circle with the notifications count inside. Click on the bottom-most tile. That notification will open. Interact with it as you might desire — add comments, delete spam, etc.

Now, assuming that there’s more than one notification, click the up-arrow at the top of the panel to proceed upward to the next notification. You can also go back downward with the down-arrow, but do NOT at this time touch the left-arrow at the top of the panel — you do not want to return to those tiles yet.

Continue clicking upward through the notifications using that up-arrow — the notifications will open as you proceed. This can be done quite quickly if you don’t need to add comments of your own or otherwise manage the thread — e.g., you can plow rapidly through +1 notifications.

When you reach the last (that is, the top) notification on the current panel, the up-arrow will no longer be available to click.

NOW you can use the left arrow at the top of the panel to return to the notification tiles view. When you’re back on that view, be sure that you under NO circumstances click the “X” on any of those tiles, and do NOT click on the “hamburger” icon (three horizontal lines) that removes all of the tiles. If you interact with either of those icons, whether at this stage or before working your way up through the notifications, you stand a high probability of creating “orphan” notifications that will collect forever on the standalone notifications page rather than ever being presented by the panel!

So now you’re sitting on the tile view. Click on an empty area of the G+ window OUTSIDE the panel. The panel should close.

Assuming that there are more notifications pending, click again on the red circle. The panel will reopen, and if you’ve been a good Borg you’ll see the panel repopulate with a new batch of notifications.

This exact process can be repeated (again, for the time being at least) until all of your notifications have been dealt with. If you’ve done this all precisely right, you’ll likely end up with zero unread notifications on the standalone notifications page.

That’s all there is to it! A user interface technique that any well-trained Borg can master in no time at all! But at least it’s making my G+ notifications management relatively manageable again.

Yep, resistance IS futile.