Importance of avoiding duplicate content
Getting indexed and ranking with slightly less unique content!
Many webmasters have contacted me recently with the same problem. It’s an old problem that is still very important in all search engines. Their websites weren’t indexed entirely, or particular pages couldn’t rank even for their most unique text phrases although these pages were indexed. In this post I will give some pointers on effortlessly making pages more unique.
Large database-driven websites all have the same problem. Sometimes search engines do not index them fully. But even when they do, many pages still don’t rank in the search results. This is caused by the lack of uniqueness of the content and the low perceived importance of those pages. Take an average large database-driven website, for instance one with jobs. These sites contain large amounts of job descriptions that are all formatted in the same way. While the content is somewhat unique, the sheer amount causes part of that website to be seen as duplicate.
So how does duplicate content filtering work in, for instance, Google?
Duplicate content filtering isn’t a black-or-white issue. It has multiple shades of grey: in the worst case it penalizes an entire site, and in the best case it just affects the ranking of a page slightly. Because all forms of duplication across pages can affect your ranking, it is very important to know how to avoid duplicate content. To ensure perceived search result quality, removing duplicates is high on the agenda of every search engine.
Real focus on a search term is best given by dedicating an entire page to that search term. This means creating and filling pages can become a huge task. Computer-generated or scraped text is a very easy way to create pages, but that is where duplicate content filters often kick in. When you want to rank for combinations of “jobs in” and every city you can think of (for instance “jobs in amsterdam”), you probably generate many copies of the same page and swap in the city name wherever you can. And that is exactly what search engines want to combat.
Duplicate areas
Many search engines see a page as part of a website and they can distinguish between the header, footer, menu, content block, etc. In fixed blocks duplication is very common, because the header and main menu are usually the same across an entire site. In the content block duplication is less common, so any duplication there is something search engines look at more closely. While duplicate areas on the entire page should be limited, the percentage of duplicate text in the content block is extremely important.
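How search engines actually measure duplication is not public, but the idea of a “percentage of duplicate text” can be illustrated with a common near-duplicate technique: comparing overlapping word groups (shingles) with Jaccard similarity. The sketch below is only an illustration under that assumption, not Google's method; the function names and the template text are made up for the example.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or nothing at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def duplicate_ratio(block_a: str, block_b: str, size: int = 3) -> float:
    """Jaccard similarity of two content blocks: 0.0 = nothing shared, 1.0 = identical."""
    a, b = shingles(block_a, size), shingles(block_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical content-block template where only the city changes.
template = (
    "Looking for jobs in {city}? Our database lists new vacancies every day. "
    "Create a free profile, upload your resume and apply online to employers "
    "who are hiring right now."
)
amsterdam = template.format(city="Amsterdam")
rotterdam = template.format(city="Rotterdam")
ratio = duplicate_ratio(amsterdam, rotterdam)
```

With this template the two content blocks share roughly 80% of their three-word shingles, even though the city name differs. Only the content block should be compared this way; the header, menu and footer are expected to repeat across the site.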
The more important the page, the more duplication is condoned
If your homepage and a page just below it are near duplicates of each other, they can still rank on the part that makes them unique (even on more competitive terms). When the near duplicates are located further down the navigation and they receive little linkjuice, the chances of them not ranking or even being omitted from the index keep increasing. Linkjuice transfer is very important and optimizing it can fix many duplication issues.

The illustration above shows two navigational structures from the homepage. When the homepage gets an extra link on it, pages further down receive less linkjuice. Less linkjuice means a higher chance of getting caught by duplicate content filters. Put pages higher in your navigation or acquire external links directly to them when you want to make sure they rank in spite of duplication.
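To put rough numbers on this, here is a toy model of linkjuice flowing down one navigation path. It assumes each page passes its juice on evenly over its outgoing links with a damping factor of 0.85, as in the classic PageRank description; real search engines are far more complex, external links are ignored here, and the function names and link counts are invented for the example.

```python
DAMPING = 0.85  # assumed damping factor, borrowed from the classic PageRank model


def juice_per_link(page_juice: float, outgoing_links: int,
                   damping: float = DAMPING) -> float:
    """Split the juice a page can pass on evenly over its outgoing links."""
    if outgoing_links <= 0:
        return 0.0
    return page_juice * damping / outgoing_links


def juice_at_depth(home_juice: float, links_per_level: list[int]) -> float:
    """Follow one path down the navigation.

    links_per_level=[4, 10] means: 4 links on the homepage, then 10 links
    on the page one level below it.
    """
    juice = home_juice
    for links in links_per_level:
        juice = juice_per_link(juice, links)
    return juice


# Hypothetical site: a page two clicks away from the homepage.
before = juice_at_depth(1.0, [4, 10])  # homepage with 4 links
after = juice_at_depth(1.0, [5, 10])   # same site, one extra homepage link
```

In this model the deep page drops from about 0.018 to about 0.014 of the homepage's juice after a single extra homepage link, which is why moving a page one level up matters more than it seems.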
Unique mashups
The “jobs in …” example will be easily detected if the city is the only inserted text. So how can you make such a thing work without having to write loads of text? You create unique mashups!
A mashup is a page assembled from different types of collected content. When you write small pieces of unique text per page and collect all other content in small pieces from many different sources, search engines will love your pages!
In the “jobs in …” example: Write a fifty-word intro about “jobs” per city you want to focus on. Add a list of about 10 job descriptions per city from your database. Scrape a piece of city information from a city guide. Scrape extra pieces of additional information from other sources and finally randomize the order of those content blocks. Try to collect a total of about 300 words. Search engines are smart enough to detect this technique, but the people who use it have been ranking for ages. The linkjuice to those pages, the number of sources used and the amount of unique text you write determine whether you rank on all cities.
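The recipe above could be sketched roughly like this. It is a minimal example, not a finished generator: the function names, parameters and block contents are hypothetical, and fetching the jobs and scraped pieces is left out. The shuffle is seeded from the city name, so each city gets its own block order but that order stays the same every time the page is requested.

```python
import hashlib
import random


def build_mashup(city: str, intro: str, jobs: list[str], extras: list[str],
                 max_jobs: int = 10) -> list[str]:
    """Return the content blocks for one "jobs in <city>" page in a stable order."""
    # One block for the hand-written intro, one for the job list,
    # and one per extra piece collected from other sources.
    blocks = [intro, "\n".join(jobs[:max_jobs]), *extras]
    # Seed from the city name: the order differs per page, not per request.
    seed = int(hashlib.sha256(city.lower().encode("utf-8")).hexdigest(), 16)
    random.Random(seed).shuffle(blocks)
    return blocks


def word_count(blocks: list[str]) -> int:
    """Total number of words across all blocks (the post aims for about 300)."""
    return sum(len(block.split()) for block in blocks)


# Hypothetical input for one city page.
page = build_mashup(
    city="Amsterdam",
    intro="A short hand-written intro about jobs in Amsterdam.",
    jobs=["Job description one", "Job description two"],
    extras=["City information from a city guide.", "Another collected piece."],
)
```

Checking `word_count(page)` before publishing is an easy way to spot cities where the collected blocks fall well short of the target.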
Keep the good content above the fold
Unique text is very labour-intensive and quality text cannot be automated. But where do you need quality text? Just get your visitors to click a button before they start reading the entire page content and they won’t notice the low quality ;) Focus good usability and text quality on the top part of your page. People rarely scroll and read in detail if the function of the page is already clear and the navigation options are very obvious.
Lazy people can still score with automation, but I prefer using cheap copywriters!
February 5th, 2008 at 1:06 am
Good to read from you again! Great article.
Hopefully this will solve all duplicate content issues once and for all (for all you lazy webmasters out there).
February 7th, 2008 at 5:28 pm
This is a smart insight that can help a lot of websites with a lot of content in roughly the same format. The example of job sites is valid, as are many webshops and many, many affiliate sites. Keeping text unique above the fold and shuffling keyword-rich duplicate text in other parts adds body and keywords.
One step further would be to shuffle entirely new pages over and over again (don’t forget to keep an eye on your server load ;))
I think the most efficient way to deal with stuff like this is a combination of both your article as a whole and your statement at the end: hire cheap copywriters, let an experienced SEO editor review their work and shuffle the rest!
February 7th, 2008 at 6:09 pm
Another comment I’d like to add: Keep the content per URL consistent. So do not shuffle it around on every spider crawl.
March 11th, 2008 at 5:24 am
How about the problem when you have 90 offices of one main business that all want their own website so they can rank on specific, geographically targeted keywords, but you don’t want to write really unique content for every website?
March 11th, 2008 at 8:34 pm
You shouldn’t look at the content alone; you have to look at the full site (HTML). Is it unique?
Mostly yes. Affiliate sites have the same problem (the same script).
March 12th, 2008 at 10:37 am
@JW: yes they all require somewhat unique content for all 90 offices!
@berlin: Duplication is mainly about text. HTML differences don’t matter that much.
April 1st, 2008 at 8:09 pm
If you write your own content carefully (and let’s assume nobody *steals* it), chances of getting a penalty for duplicate content are small, right?
May 7th, 2008 at 1:05 am
On one of my websites I always write my content myself (in Dutch) and copy some English description into it. The website is now 4 months old and even on the long tail of some uncompetitive words I can’t find my website or its pages. So has Google penalized my whole website?