When Your Blog is Getting Scraped

8 October 2008

It’s upsetting to discover that nefarious individuals have scraped your RSS feed to make money off your hard work.

Many bloggers are also concerned that it will affect their rankings in the search engines. However, Google says it is quite good at identifying original content, so the originating site should see no negative effects.

According to Google:

Generally, we can differentiate between two major scenarios for issues related to duplicate content:

-  Within-your-domain-duplicate-content, i.e. identical content which (often unintentionally) appears in more than one place on your site

-  Cross-domain-duplicate-content, i.e. identical content of your site which appears (again, often unintentionally) on different external sites

Search engines identify duplicate results and will then filter out all but one. They try to figure out which is the original by checking which version was published first and which has the most links pointing to it. They also look at which site is the more authoritative one.

Usually, duplicate content will not affect your site negatively, but instead will get filtered out.

All that said, it’s still infuriating to find your own content on a scraper’s blog. There’s at least one blog that has been scraping the Diva for some time now. Contacting site owners who scrape usually does no good. One helpful plugin is WP-Ban, which lets you block the IP address of the scraping site.
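To make the idea of an IP ban concrete, here is a rough sketch in Python of the kind of check involved. It is not how WP-Ban itself is written; the function name and the ban list are made up for illustration, and the addresses come from ranges reserved for documentation, not from any real scraper.

```python
import ipaddress

# Hypothetical ban list (documentation-range addresses, not real scrapers).
# Entries can be single addresses or CIDR ranges.
BANNED = ["203.0.113.7", "198.51.100.0/24"]


def is_banned(client_ip, banned=BANNED):
    """Return True if client_ip matches any banned address or range."""
    addr = ipaddress.ip_address(client_ip)
    # A single address is treated as a network containing one host.
    return any(
        addr in ipaddress.ip_network(entry, strict=False) for entry in banned
    )
```

A site would run a check like this against the address requesting the feed and refuse to serve banned visitors. Keep in mind that a scraper can switch servers, so an IP ban is a speed bump, not a cure.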

One thing I’ve been considering is putting an absolute link within or at the end of my posts, so I at least have a link to the originating site (mine), like this:

SEO Diva
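For anyone who wants to see the mechanics, appending that link could look something like the sketch below. It is plain Python, not WordPress code; the function name and the "Originally published at" wording are my own assumptions, and a real blog would do this through a feed plugin or theme hook.

```python
from html import escape


def add_attribution(post_html, site_name, site_url):
    """Append an absolute link back to the originating site to a post body."""
    # Escape both values so odd characters cannot break the markup.
    link = '<p>Originally published at <a href="{url}">{name}</a>.</p>'.format(
        url=escape(site_url, quote=True),
        name=escape(site_name),
    )
    return post_html.rstrip() + "\n" + link
```

Calling `add_attribution(body, "SEO Diva", "http://www.seodiva.net")` returns the post with the link as its last paragraph. Because the URL is absolute, it still points home when the text turns up on someone else's domain.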

  • http://www.seodiva.net Seo Diva

    @Christoph – thanks for the info and suggestions. The process may be a PITA, but I think it would be worth it…getting ripped off like that is maddening.

  • http://www.itjobhunt.com pitagora

    The problem is that usually a bigger and more authoritative site will be looked at as being the original, and this has created a very bad breed of scrapers. Bigger sites with PR4 and PR5 look for fresh new articles from small sites and scrape them. Since they are bigger and more likely to be seen as the original authors in Google’s eyes, the poor little guy who actually did the work gets nothing at best and a penalty at worst.

  • http://www.daviddalka.com/createvalue/ David – Chicago

    If you’re getting scraped, someone really likes your content. But it can be highly annoying! I used to have this problem all the time.

    One day I made a completely unrelated change to my blog that changed this dynamic. I added a plugin to add related posts to the actual feed.

    Almost immediately, the scraping all but stopped. I guess scrapers don’t like giving linkbacks! :)

  • http://wickedaliens.com Wicked Aliens

    A link at the end is not such a bad idea, actually.

  • http://macgotme.com Brady – Mac Got Me

    In addition to my blog, I help maintain a couple others that have been scraped before – is there a way for search engines to know which content is actually the original content? Post date could be one way to tell, but even that won’t always reveal the original content.

  • http://www.oilpaintingsmarket.com Fabian Perez

    WP-Ban is useful, but it’s better just to find out where your article ends up and let Google know. Then they will not be able to use AdSense anymore, just like Christoph said.

  • http://macgotme.com Brady – Mac Got Me

    I’ll check out WP-Ban, but how do you let Google know? Is there a URL where you can report content thieves?

    • http://www.seodiva.net Seo Diva

      Here’s the link to the Google DMCA page on copyright infringement, with links to the various services where you can report violators: http://www.google.com/dmca.html.

  • http://www.neworleanscondotrends.com Eric- New Orleans Co

    Nice article to bring up. How do you even know if you are being scraped?

  • http://www.seodiva.net Seo Diva

    @Eric, if you’re using WP, it will show you what’s linking to your site, and sometimes they’ll show up that way. You can also do a Google search for your titles…scrapers don’t change them.

  • http://www.softwaretestinggenius.com Yogindernath@Software Testing

    Apart from that, if you are getting scraped, someone really likes your content. But it can be highly annoying! I used to have this problem all the time.

  • http://bio-genetix.blogspot.com Jackie

    Is there a URL where you can report content thieves?

  • http://snaggeries.blogspot.com Brett

    I have heard that there is a thing known as Google DMCA for filing a case against the person who copies your content. That’s too annoying.

    • http://www.seodiva.net Seo Diva

      @Brett – yes, there is. So far it takes more time than I want to spend, but if the scraping gets really bad I’ll take action. Having a copyright notice at the bottom of feeds helps. At least you get your link that way.

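The tip from the comments about searching Google for your post titles is easy to semi-automate. Here is a small Python sketch that builds an exact-phrase search URL for a title while excluding your own domain. The function name is made up for illustration, and it only builds a link to open in a browser; it does not fetch or parse any results.

```python
from urllib.parse import urlencode


def title_search_url(title, own_domain):
    """Build a Google search URL for an exact title, excluding own_domain."""
    # Quotes force an exact-phrase match; -site: drops your own pages.
    query = '"{}" -site:{}'.format(title, own_domain)
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `title_search_url("When Your Blog is Getting Scraped", "seodiva.net")` gives a link whose results, if any, are other sites carrying that exact title. Since scrapers rarely change titles, those are the pages worth a closer look.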