Exactly How Do You Define Duplicate Content And Should We Be Concerned About It?
As a webmaster, do you need to know exactly what duplicate content is, and will it make any significant difference to how your site performs?
The argument about precisely what is meant by duplicate content, and about whether or not it matters, has been going on for a long time now and shows no sign of going away. So what exactly constitutes duplicate content, and is it a problem?
Duplicate content is widely considered to be important and, although one highly respected search engine optimization expert recently wrote an article opposing this view, even a quick trawl through the mountain of material written on the subject shows that his is a minority opinion.
If we accept the view that duplicate content does matter, then how should we define it? For instance, if I create an article for an article directory and then re-work that same article for submission to a second directory, how are the search engines going to evaluate the two articles and decide whether or not they contain duplicate content? The answer, quite simply, is that we don't know, but here is this webmaster's opinion.
When duplicate content checking was first undertaken by the major search engines, it was very much a matter of comparing one web page in its entirety with another; there was no attempt to dissect the pages and compare individual elements. In those days you could take identical content and simply add an introduction and a conclusion to one of the pages to fly under the duplicate content radar. Unfortunately for many, those days have long since disappeared.
Today, the major search engines break the two pages apart in order to compare individual elements, and here is the core of today's discussion. It is generally thought that attention is now largely restricted to the central content of a web page rather than its structure. A large number of site designers use templates which set the structure of each page, including such things as navigation bars, headers and footers; this is widely believed to be acceptable, and the major search engines do not treat it as duplicate content. What they are looking at is the actual content in the body of the page. But exactly how do they check that content?
Some people argue that this comparison is done at 'block' level (in other words, at the level of individual paragraphs or sentences), while others believe that the filters search for phrases or possibly even individual words. None of us really knows the answer, although it seems reasonable to conclude that the most likely basis of examination is either sentence or phrase matching.
Sentence matching is quite clear-cut and simply means breaking both pages down into chunks based upon the page's punctuation. Take, for instance, this sentence:
It is fairly simple to find a good deal on a shower unit, as long as you know where to shop.
This would be seen either as one sentence or as two, depending on whether you take the strict view that only a full stop ends a sentence or adopt a more flexible approach and split on other punctuation marks, such as commas.
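To make the idea concrete, here is a minimal Python sketch of what sentence-level matching might look like. The splitting rules, function names and scoring are my own assumptions; no major search engine publishes how its filter actually works.

    import re

    def sentences(text):
        # Split on terminal punctuation: full stops, question and exclamation marks.
        parts = re.split(r"[.!?]+", text)
        # Normalize each chunk: lower-case, collapse whitespace, drop empties.
        return {" ".join(p.lower().split()) for p in parts if p.strip()}

    def duplicate_sentence_ratio(page_a, page_b):
        # Fraction of page_a's sentences that also appear, word for word, in page_b.
        a, b = sentences(page_a), sentences(page_b)
        return len(a & b) / len(a) if a else 0.0

    original = ("It is fairly simple to find a good deal on a shower unit, "
                "as long as you know where to shop. Prices vary widely.")
    rewrite = ("It is fairly simple to find a good deal on a shower unit, "
               "as long as you know where to shop. Always compare retailers.")
    print(duplicate_sentence_ratio(original, rewrite))  # 0.5 - one of the two sentences is shared

Under the more flexible definition you would simply add commas to the split pattern, which produces shorter chunks and therefore more matches.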
Matching at the phrase level is a little more complex. What is a phrase? Should it be 2, 3, 4 or 20 words long?
Just for now, let us define a phrase as 3 words. If that were the case, the following phrases would be flagged as duplicate content if they appeared on both of the pages being examined:
At that time
Take a look
Did you know
In the end
The answer is
You can get
In those days
One way to
Day to day
These nine phrases are all ordinary, everyday phrases that could appear on pages about growing vegetables, learning to swim, pay-per-click advertising or anything else you can think of. Yet there are a few people who would say that the major search engines do check pages down to this level. As an example, when I questioned the staff of one popular duplicate checker (DupeCop) about how their system checked for duplicate content, they replied:
"DupeCop compares both individual words and 3-word phrases. It also ignores all punctuation and scans across sentences"
I was not surprised, therefore, that when I ran several articles through this system (comparing articles about dogs against articles about Christmas décor) I discovered that they had an average of 25% duplicate content!
Bearing this in mind, I think it would be absurd to believe that the major search engines have their filters set this low. But how low could the filters be set? Should it be 4 words, or 5 words, or…? To be honest, your guess is as good as mine.
Over the last few years I have written and published hundreds of articles and watched the results for signs of duplicate content penalties, as far as it is possible for anybody to do this. On the basis of my own experience, I am satisfied that filtering is not carried out down to the level of short phrases but stops at the sentence level. As a consequence, provided you are changing your articles down to this level, you should have no problem escaping the duplicate content filters. Indeed, even if a couple of your sentences are duplicated, you should still be fine.
About the Author
WebMarketingCentre.com provides information on article writing and article submission. It is also an article directory where you can pick up a free online article for your website or ezine, and to which you can submit articles on a wide variety of topics, including SEO and much more.