Scraping and Content Theft, the reasons why the web is drowning in itself : 5ubliminal's TellinYa

Most web content is duplicated, triplicated, and replicated so many times it could make you dizzy. Scrapers steal (borrow) your content and post it as fresh, with more or fewer modifications. Why else do you think there are billions of pages in the search engines' indexes?

A single scraper site can inject thousands or even tens of thousands of pages into search engines' indexes. Think how much the page count of the internet is artificially inflated.

Where do scrapers get their content?

They get it from virtually anywhere they can find text. Scrapers can scrape many information sources, such as:

  • SERPs (Search Engine Results Pages)
  • RSS Feeds
  • News / Articles Sites
  • Regular Sites Returned by Searches
  • Product / Information Feeds
  • Forums / Discussion Boards
  • Chat Logs

Where there is accessible text, there is a scraper.

How do scrapers work?

After they decide on the targeted keywords, they pick one of the scraping sources above and leech all the content. Then they can simply republish it, or alter it to avoid duplicate-content filters.

There are methods of altering scraped content to make it look somewhat unique compared to the original sources.
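
To see why even light rewording can slip past a duplicate filter, here is a minimal sketch of one common way to compare two texts: split each into overlapping three-word "shingles" and measure how many they share (Jaccard similarity). This is only an illustration. The function names, the shingle size, and the sample sentences are my assumptions, and it is not a description of how any particular search engine works.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=3):
    """Jaccard similarity of the shingle sets of two texts (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)

# Hypothetical sample texts: a straight copy and a lightly reworded copy.
original = "Scrapers steal your content and post it as fresh with more or less modifications"
copied = original
altered = "Scrapers borrow your content and publish it as new with more or fewer changes"

print(jaccard(original, copied))   # a straight copy scores 1.0
print(jaccard(original, altered))  # swapping a few words drops the score sharply
```

Swapping only a handful of words breaks most of the shared shingles, which is exactly what a scraper altering content is counting on.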

Why are the scrapers dangerous?

Google and other search engines try to determine the original source of content by recording who posted it first. So if you are unlucky enough to have scrapers steal your content before search engines see it, you might end up looking like the offender.
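
The "first one seen wins" idea can be pictured with a toy index. This is a simplified sketch under my own assumptions (the class name, the exact-match fingerprint, and the example URLs are all hypothetical); real search engines use many more signals than crawl order.

```python
import hashlib

def fingerprint(text):
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class FirstSeenIndex:
    """Toy index that credits content to the first URL it was crawled at."""

    def __init__(self):
        self._first_url = {}

    def crawl(self, url, text):
        """Record the page and return the URL credited as the original."""
        return self._first_url.setdefault(fingerprint(text), url)

article = "Where there is accessible text, there is a scraper."
index = FirstSeenIndex()

# Hypothetical crawl order: the scraper's copy is found before the author's page.
credited_first = index.crawl("http://scraper.example/copy", article)
credited_later = index.crawl("http://author.example/post", article)
print(credited_later)  # the scraper's URL, because it was crawled first
```

In this toy model the real author is the one who looks like the copycat, simply because the crawler reached the scraper's page first.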

Scrapers also lower the quality of the web. Every search is flooded with irrelevant results from MFA (Made for AdSense) sites, sites whose only purpose is to generate revenue from PPC advertising.

How to protect yourself from scrapers?

There's no silver bullet for this, but one rule of thumb is to make sure search engines see your site first. Avoid pinging when you have new content, as many scrapers watch Blog News and grab content as soon as it appears.

Another method is to use CopyScape.com to see if your content has been stolen by other sites, then file DMCA complaints against the offenders and hope for the best.

Why are scrapers useful?

The only thing they are useful for is links. Links from scrapers are relevant, so they can be of some use to you. Make sure any RSS feed contains links back to your site and include at least one link inside your articles' text. If scrapers do not strip the HTML tags, you have a chance of a free yet valuable link.
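
A quick way to check that an article or feed item really carries a link back to you is to parse its HTML and look at the link targets. The sketch below uses only Python's standard library. The helper names and the sample snippets are my own assumptions, and it treats relative links as not counting, since once scraped they would point at the scraper's domain rather than yours.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def links_back_to(html, site_host):
    """True if any absolute link in html points at site_host or a subdomain."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    hosts = [urlparse(href).hostname or "" for href in parser.hrefs]
    return any(h == site_host or h.endswith("." + site_host) for h in hosts)

# Hypothetical feed item bodies.
with_link = '<p>Read <a href="http://www.tellinya.com/art2/45/">the original</a>.</p>'
relative_only = '<p>See <a href="/art2/45/">this article</a>.</p>'

print(links_back_to(with_link, "tellinya.com"))      # True
print(links_back_to(relative_only, "tellinya.com"))  # False
```

Running a check like this over every item before the feed is published catches articles that would give a scraper your text with no link attached.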

Example of a scraper

As you can see in my article GoArticles.com content scraper educational script, it is not difficult to actually write the scraping code. What you do with the content afterwards is what really matters.

5ubliminal's TellinYa.com SEM & SEO Blog © 2007 - All rights reserved unless mentioned otherwise .
Rendered On : [Friday, 21 November, 2008 - 11:19:04 GMT]   No Ajax / Flash Used Here