
Much of the web's content is duplicated, triplicated, and replicated so many times it could make you dizzy. Scrapers steal (borrow) your content and post it as fresh, with varying degrees of modification. Why do you think there are billions of pages in the search engines' indexes?
A single scraper site can inject thousands or even tens of thousands of pages into search engines' indexes. Think how much the page count of the internet is artificially inflated.
They get it from virtually anywhere they can find text. Scrapers can scrape many information sources such as:
Where there is accessible text, there is a scraper.
After they decide on the targeted keywords, they employ one of the scraping sources above and leech all the content. Then they can simply republish it, or alter it to avoid duplicate-content filters.
There are methods of altering scraped content to make it look somewhat unique compared to the original sources.
Google and other search engines try to identify the original by recording who posted a piece of content first. So if you are unlucky enough to have scrapers steal your content before search engines see it, you might end up being treated as the offender.
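To see why being crawled second hurts, here is a toy model of "first seen wins". It is only a sketch and not how any real search engine works: the class name FirstSeenIndex, the fingerprint helper, and the example URLs are all made up for illustration, and real duplicate detection is far more tolerant of small edits than an exact hash.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash the text after normalising case and whitespace (hypothetical helper)."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class FirstSeenIndex:
    """Toy index that credits content to the first URL it was crawled at."""

    def __init__(self):
        self._first_url = {}

    def record(self, url: str, text: str) -> str:
        """Record a crawled page and return the URL credited as the original."""
        # setdefault stores the URL only if this fingerprint is new.
        return self._first_url.setdefault(fingerprint(text), url)


index = FirstSeenIndex()
# The scraper's copy happens to be crawled first...
index.record("http://scraper.example/copy", "My new article about scrapers.")
# ...so the real author's page is the one treated as the duplicate.
credited = index.record("http://author.example/post", "My  new article about SCRAPERS.")
print(credited)
```

In this model whoever gets crawled first keeps the credit, no matter who actually wrote the text.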
Scrapers also lower the quality of the web. Every search is flooded with irrelevant results from MFA (Made for AdSense) sites, sites whose only purpose is to generate revenue from PPC advertising.
There's no silver bullet for this, but one rule of thumb is to make sure search engines see your site first. Avoid pinging when you have new content, as many scrapers watch Blog News and grab content as soon as it appears.
Another method is to use CopyScape.com to see if your content has been stolen by other sites, then file DMCA complaints against the offenders and hope for the best.
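If you already have a suspect page's text in hand, you can get a rough idea of how much of it overlaps with yours. The sketch below is not what CopyScape does internally (I don't know that); it is a common textbook approach that compares overlapping three-word "shingles" using Jaccard similarity. The function names and the sample sentences are my own, and the score at which you call something a copy is a judgment call.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0  # too short to compare
    return len(sa & sb) / len(sa | sb)


original = "Scrapers steal your content and post it as fresh with small modifications."
suspect = "Scrapers steal your content and post it as new with small modifications."
unrelated = "A recipe for baking bread at home with only four ingredients."

print(similarity(original, suspect))    # high: one word was swapped
print(similarity(original, unrelated))  # no shared three-word runs
```

A lightly altered copy still shares most of its shingles with the original, while unrelated text shares almost none.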
The only thing scrapers are useful for is links. Links from scrapers are relevant, so they can be of some use to you. Make sure any RSS feed contains links back to your site, and include at least one link inside your article's text. If scrapers do not strip the HTML tags, you have a chance at a free yet valuable link.
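A quick way to check your own feed is to look through each item's description for a link back to your domain. This is a minimal sketch that assumes a plain RSS 2.0 feed with the HTML escaped inside each item's description element; the function name, the sample feed, and example.com are placeholders, and a feed that puts the full text in another element would need adjusting.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Matches the URL inside href="..." or href='...'
HREF = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def items_missing_backlink(rss_xml: str, domain: str) -> list:
    """Return titles of feed items whose description has no link to `domain`."""
    missing = []
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        body = item.findtext("description", default="")
        hosts = {urlparse(url).netloc.lower() for url in HREF.findall(body)}
        # Accept the bare domain or any subdomain such as www.
        if not any(h == domain or h.endswith("." + domain) for h in hosts):
            missing.append(item.findtext("title", default="(untitled)"))
    return missing


SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example blog</title>
<item><title>With link</title>
<description>&lt;p&gt;Read more on &lt;a href="http://www.example.com/post"&gt;my site&lt;/a&gt;.&lt;/p&gt;</description></item>
<item><title>Without link</title>
<description>&lt;p&gt;No links in this one.&lt;/p&gt;</description></item>
</channel></rss>"""

print(items_missing_backlink(SAMPLE_FEED, "example.com"))
```

Any title this prints is an article that would give you nothing back if a scraper republished it as is.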
As you can see in my article "GoArticles.com content scraper educational script", it is not difficult to actually write the scraping code. What you do with the content afterwards is what really matters.