Did you know that duplicate content is one of the most common SEO problems on websites? It can keep Google from ranking your pages as well as it should. In this article I’ll show you how to find duplicate content on your site and how to fix it.
External vs. Internal Duplicate Content
When the same content is both on your site and on some other site, that won’t hurt you. Think about it: if it did, all you’d have to do to hurt your competitors is to take their content and put it on some other website.
The kind of duplicate content you need to worry about is internal duplicate content: the same content appearing on two pages of your site, or a single page that can be accessed through two different URLs.
Although external duplicate content won’t hurt you, it still helps to have a lot of unique content on your site. If all your content is borrowed from other sources, you’re adding very little value to the web, and Google won’t appreciate that.
Canonical URL Issues
Try typing YourSite.com in your browser’s address bar. Does it redirect to www.YourSite.com (with the “www”)? Or maybe www.YourSite.com redirects to YourSite.com. Either way, you want to make sure that one version of your domain redirects to the other with a permanent (301) redirect. I suggest having your non-www version redirect to your www version.
How you do this depends on what kind of server and hosting platform you use. Here’s a good tutorial.
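If your site happens to run on a Python web app, the redirect can also be done in code. Here is a minimal sketch, assuming a WSGI application and a hypothetical bare domain `yoursite.com`; it is not tied to any particular hosting platform, and your server or host may offer a simpler setting for the same thing:

```python
from urllib.parse import urlsplit, urlunsplit
from wsgiref.util import request_uri


def redirect_to_www(app, bare_host="yoursite.com"):
    """Wrap a WSGI app so requests for bare_host get a 301 to www.<bare_host>.

    bare_host is a hypothetical placeholder; replace it with your own domain.
    """
    def middleware(environ, start_response):
        if environ.get("HTTP_HOST", "").lower() == bare_host:
            # Rebuild the full requested URL (path and query string included),
            # then swap in the www host so nothing else about the URL changes.
            parts = urlsplit(request_uri(environ, include_query=True))
            location = urlunsplit(parts._replace(netloc="www." + bare_host))
            # 301 = permanent redirect, telling search engines to use the new URL.
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        # Requests for the www host (or any other host) go to the real app.
        return app(environ, start_response)

    return middleware
```

A request for yoursite.com/about?a=1 gets a 301 pointing to www.yoursite.com/about?a=1, so the path and query string survive the redirect. In this sketch, a Host header that includes a port (such as yoursite.com:8000) is passed through unchanged.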
Home Page URL
When you link to your home page, you should be linking to “/”, not to “/index.html” or “/index.php”. Check all the links pointing to your home page and fix any that aren’t pointing to “/”. In addition, create a 301 redirect so people who type /index.php are redirected to “/”. This tutorial explains how to create 301 redirects. Keep in mind that you can hire a good developer to fix all these issues for under $100, so it might not make a lot of sense for you to learn how to write code.
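If you’d rather not click through every page by hand, the link check can be scripted. This is a minimal sketch using Python’s standard `html.parser`; the host `www.yoursite.com` and the list of index filenames are assumptions to adjust for your site, and it only checks root-relative links (“/index.php”) and absolute links to your own host:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical index filenames that should be linked as "/" instead.
INDEX_PATHS = {"/index.html", "/index.htm", "/index.php"}


class IndexLinkFinder(HTMLParser):
    """Collect <a href> values that point at the home page's index file."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.bad_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlsplit(href)
        # Only look at links to our own site: root-relative paths or our host.
        if parts.netloc not in ("", self.site_host):
            return
        if parts.path in INDEX_PATHS:
            self.bad_links.append(href)


def find_index_links(html, site_host="www.yoursite.com"):
    """Return the links in html that point to /index.* instead of "/"."""
    finder = IndexLinkFinder(site_host)
    finder.feed(html)
    finder.close()
    return finder.bad_links
```

Run it on the HTML of each page, and every link it returns is one to change to “/”.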
Different Versions of the Same Content
Do you have printer-friendly versions of your pages? Or maybe a separate mobile version of your site? You have two choices:

  • You can use the robots.txt file to tell search engines not to crawl the folder where you keep your PDF files or printer-friendly pages.
  • If you have several versions of the same page, you can use the “rel canonical” tag. This is how it works: let’s say a certain piece of content can be accessed through three different URLs. Choose your main URL and make it the canonical version. Then, on the other two pages, add a rel canonical tag in the page’s <head>; it tells Google which URL is the “official” one for that content. This is how you use it:
    <link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/>
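To double-check that each duplicate page points at the main URL, you can generate the tag and then read it back from the page’s HTML. This is a minimal Python sketch using only the standard library; the function names are hypothetical, and it only looks at the first canonical link on a page:

```python
from html import escape
from html.parser import HTMLParser


def canonical_tag(url):
    """Build a <link rel="canonical"> element; escape() turns & into &amp; etc."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}"/>'


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> on a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            # HTMLParser has already unescaped the attribute value here.
            self.canonical = attrs.get("href")


def find_canonical(html):
    """Return the canonical URL declared in html, or None if there isn't one."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical
```

Running `find_canonical` on the HTML of each duplicate URL should return the same main URL every time; a `None` or a different URL means that page still needs fixing.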

Almost-Duplicate Content
This is a very common problem. Let’s say you have an e-commerce website with 100 different products, and you’ve written 20-word descriptions for each of them. Here’s the problem: each product page has 20 unique words plus a lot of content that doesn’t change across your whole website (navigation menus, headers, footers, etc.), so the pages look almost identical to one another.
There are two things you can do about this:

  • Reduce the amount of content that is common to all pages.
  • Add some unique content to each one of your pages. Write longer descriptions, explain how the products can be used, tell stories, share testimonials, and ask customers to leave reviews and comments. Get creative. You’ll get a better response if you offer a free giveaway to people who add content to your site.
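To get a rough sense of how “almost-duplicate” two pages are, you can compare the overlapping word sequences (shingles) on each page and compute their Jaccard similarity. This is a common general-purpose technique, not something Google has published about its own systems; the sketch assumes you have the visible text of each page as a plain string, and k=3 is an arbitrary choice:

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping runs of k words) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(page_a, page_b, k=3):
    """Jaccard similarity of two pages' shingle sets: 1.0 = identical, 0.0 = nothing shared."""
    a, b = shingles(page_a, k), shingles(page_b, k)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

A score close to 1.0 means the pages are mostly made of the same text. Shrinking the shared template or writing longer descriptions both push the score down.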

Duplicate Content Caused by CMSs Not Set Up Properly
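Most content management systems (CMS), such as WordPress, Drupal or Joomla, can create duplicate content out of the box, for example by making the same post available at its own URL and on category, tag or archive pages. The good news is that these issues are usually easy to fix.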

  1. Make a list of the types of pages on your site, such as home page, categories, products, member profiles, etc.
  2. Take one page of each type and copy some of the text on the page.
  3. Search for that text on Google (put it in quotation marks, and add site:YourSite.com to limit results to your own site) and see if the text appears on more than one page. If it does, you have a duplicate content issue you need to fix.
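If you already have the text of your pages (from a crawler export, for instance), you can also check for exact duplicates locally instead of running one search at a time. This sketch assumes a hypothetical `pages` dictionary mapping each URL to its main text; it only catches pages whose text is identical after ignoring case and whitespace, so near-duplicates still need a manual look:

```python
import hashlib
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicate_pages(pages):
    """Group URLs whose normalized text is identical; return only groups of 2+ URLs."""
    groups = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

Each group it returns is a set of URLs serving the same content, which you can then fix with robots.txt or rel canonical.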

In most cases, you’ll want to fix these issues by blocking folders with robots.txt (tutorial here) or by using the rel canonical technique I talked about before.
Fixing duplicate content issues can noticeably improve your Google rankings and organic traffic. Let me know if you have any questions.
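Before relying on a robots.txt rule, it’s worth confirming that it blocks what you think it blocks. Python’s standard `urllib.robotparser` module can evaluate rules locally; the `/print/` and `/pdf/` folder names below are hypothetical examples. Keep in mind that robots.txt controls crawling: a blocked URL that other pages link to can still show up in Google’s index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of printer-friendly and PDF folders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /pdf/
"""


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the robots.txt rules forbid user_agent from crawling url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

For example, `is_blocked(ROBOTS_TXT, "https://www.yoursite.com/print/product-1.html")` should be true, while a regular product URL outside those folders should not be blocked.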