Tuesday, March 22, 2005

Sidewalk Theory: Scraping RSS, and More Google

Apropos the Google/AFP contretemps, Kevin Reynen of the Sidewalk Theory blog recently blogged on the concept of data (or content) scraping, which he maintains is essentially what Google has been doing to AFP. Here is a thoughtful email from the Media Bloggers Association discussion list that Kevin has agreed to let me reproduce:

I wouldn't be so quick to defend Google. Is scraping content really fair use?

When researching the adoption rate of RSS by US daily newspapers, we found that a number of people had written code to scrape headlines from a site and reformat them as RSS. Their code temporarily caches the response to an HTTP request, parses that response for headlines, and displays the modified results as an RSS feed. In most cases the reformatted version exists only when it is requested and is never saved.

Is this considered fair use of headlines?

What if I started mixing ads into a feed I was creating from your content?

What if I weren't an individual reformatting your headlines for my own use, but the developer of a popular browser add-on like the Google Toolbar, reformatting your entire page by adding links to your content that pointed to commercial services and profiting from this?

Microsoft tried this with Smart Tags in 2001. Google is now starting to do it with the newest version of its Toolbar.

This move by Google has upset a few people, but not enough for Google to make the Toolbar's reformatting an opt-in service like AdSense instead of something you have to opt out of. There isn't even an easy nofollow or robots-tag method for opting out. Are you going to defend Google's right to reformat content until 25% of your readers are using a Google-branded version of Firefox that reformats your work before you get upset about this?

Scraping and reformatting content is a slippery slope. If it's acceptable to scrape headlines and images from your site to display on my site, is it also acceptable to add links to your content if I control the software that renders your page? Does it make a difference if that software is on the reader's machine or my server?

I think a fundamental principle of fair use is that value has been added to the source's work rather than that someone is simply trying to profit from it. Is converting headlines to RSS adding value? That's arguable. When Google scrapes and reformats your content, is it adding value to your work or simply trying to profit from it? IMHO, since Google went public, it has started ignoring its "Don't Be Evil" mantra to improve its bottom line.

Why does this sound familiar? Oh, yeah. It's fundamentally the argument Marty Schwimmer made when he pulled his blog feed off Bloglines.

UPDATE: Relevant discussion from Tech Law Advisor.
