Why Duplicate Content Causes Supplemental Results

Nearly a year after I had my tentative say about supplemental results, my thinking on what causes them has shifted away from duplicate content and toward a lack of inbound links and poor internal PageRank distribution, especially since Matt Cutts recently stated that PageRank is the primary factor in determining supplemental results.

Back around May 18, 2006, Matt Cutts had this to say about combating supplemental results by optimizing internal link structure:

typically the depth of the directory doesn’t make any difference for us; PageRank is a much larger factor. So without knowing your site, I’d look at trying to make sure that your site is using your PageRank well. A tree structure with a certain fanout at each level is usually a good way of doing it.
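
To put rough numbers behind that “tree with a certain fanout” idea, here’s a quick back-of-the-envelope sketch in Python (the 10,000-page figure and the levels_needed helper are my own illustration, not anything from Matt’s comment): even a modest fanout keeps every page within a few clicks of the home page, which is usually where most of a site’s PageRank enters.

```python
def levels_needed(total_pages, fanout):
    # Smallest depth d such that fanout ** d >= total_pages, i.e. how many
    # clicks from the home page the deepest pages sit in an ideal tree
    # where every page links out to `fanout` pages on the next level.
    depth, reach = 0, 1
    while reach < total_pages:
        reach *= fanout
        depth += 1
    return depth

# Illustrative numbers only.
for fanout in (5, 10, 25):
    print(f"fanout {fanout:>2}: 10,000 pages fit within "
          f"{levels_needed(10_000, fanout)} clicks of the home page")
```

The flatter the tree, the fewer hops the home page’s link juice has to survive before it reaches a deep page.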

Some people questioned my thinking when I advocated optimizing your internal link structure for better PageRank distribution, probably because they just hate hearing the word PageRank.

The Word PageRank Causes Friction (tangent)

So I’ve been replacing the word “PageRank” with words like “authority”, “trust”, “link juice”, “link value”, “visibility” or “link weight.” If I say “your page doesn’t have enough PageRank,” that makes some people:

  1. cringe, like they’re at the dentist when the dentist suddenly whips out a drill. I think it’s some kind of Pavlovian response - the word “PageRank” just makes some people feel ill.
  2. look at the toolbar and tell me, “hey, this PR 0 page is in the main index, while this PR 6 page is marked supplemental. So you’re obviously wrong.” First of all, please don’t say “PR.” PR means Public Relations or Puerto Rico. Second of all, when I say TBPR, I’m talking about Toolbar PageRank. When I say PageRank, I’m talking about internal PageRank, the one you can’t see. Just because the toolbar says 0 doesn’t mean there are no links to that page.
  3. argue with me because they’re conditioned to think PageRank doesn’t matter. They might tell me “PR is just one of 100 factors in Google’s algorithm and you should ignore it and focus on getting quality links by publishing quality content and marketing it well instead of spending hours nofollowing links to your privacy policy.” No wonder people claim “SEO is easy” when SEOs advocate ignoring internal link structure optimization in favor of “quality content.” Not that I disagree with that recommendation, but if I’m optimizing a site, my focus lies more on the nuts and bolts than on copywriting or buying ad space.

Seriously, some people just get stupid when they hear the word “PageRank”, so to get around that debacle, I say “your page doesn’t have enough trust” or “your page doesn’t have enough quality inbound links” or “your page lacks visibility.” But really, all I’m saying is your page doesn’t have enough internal PageRank.

/tangent

Why Duplicate Content Is Responsible for Supplemental Results

Anyway, back to duplicate content. I’m starting to see some people claim that duplicate content doesn’t matter at all when it comes to supplemental results. I disagree.

If you have two identical/similar content pages, both with thousands of authority inbound links, those pages are not supplemental-index-bait. Google’s solution to duplicate content is not the supplemental index. On the contrary, Googlers insist that they filter out duplicate content.

But when you’ve got links to both domain.com and domain.com/index.php, you’re splitting link juice between two pages instead of one. When you’ve got multiple URLs like index.php?sessionId=290302342, index.php?sessionId=20343400, and index.php?sessionId=023123321 all generating identical content, and you’ve got links scattered across those URLs, again you’re splitting link weight between all those pages. The URLs with just a couple of low-value inbound links end up “going supplemental,” while the URLs with tons of other sites linking to them get to sit pretty in the main index.

So yeah, you can have multiple duplicate content pages in the main index as long as they’re well-linked-to. But links to multiple URLs hosting identical content within your own site will dilute PageRank and cause some link-starved pages to “go supplemental.”
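
To make that concrete, here’s a minimal Python sketch of the kind of URL canonicalization that stops the splitting. The TRACKING_PARAMS set and the canonical_url helper are hypothetical names I’m using for illustration - Google doesn’t expose anything like this - and in practice you’d enforce the canonical form with 301 redirects and consistent internal linking rather than hoping the engines sort it out.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical example: the session/tracking parameters your CMS appends.
TRACKING_PARAMS = {"sessionid", "sid", "phpsessid"}

def canonical_url(url):
    # Collapse the duplicate-URL variants from the post onto one canonical
    # form so inbound links stop splitting across them.
    scheme, netloc, path, query, _fragment = urlsplit(url)
    # domain.com, domain.com/index.php and domain.com/ are the same page.
    if path.lower() in ("", "/index.php", "/index.html"):
        path = "/"
    # Drop session-style parameters that spawn endless URL variants.
    kept = [(k, v) for k, v in parse_qsl(query)
            if k.lower() not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc.lower(), path, urlencode(kept), ""))

for u in ("http://domain.com/index.php?sessionId=290302342",
          "http://domain.com/index.php?sessionId=20343400",
          "http://domain.com/index.php",
          "http://domain.com/"):
    print(canonical_url(u))  # every line prints http://domain.com/
```

If every inbound link - internal and external - resolves to that one URL, the link weight that was being split three or four ways lands on a single page instead.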


6 Responses to “Why Duplicate Content Causes Supplemental Results”

  1. Supplementals are a difficult concept. I think that they are like most of Google’s algorithm - context sensitive. By that I mean that two almost-duplicate pages at PR0 may cause a supplemental problem, whereas the same two pages at PR4 probably wouldn’t.

    I think the way to conceptualise it is to start from the other end and say that pages more than X% duplicate will cause a supplemental problem. And then to realise that the value of X gets higher the more PageRank the pages have.

    I’d not be surprised if anchor text played a role here too, in terms of differentiating similar pages.

  2. Vince, interesting point. I remember some other guy posting that authority sites can get away with duplicate content while sites with low trust get dumped in the supplemental index. I see it a little differently than you though, plus keep in mind Google claims it doesn’t use a % similarity to detect duplicate content (check out my earlier duplicate content myth post for more on that).

    Sites with low trust (aka PageRank) not making it into the main index isn’t all that surprising - that fits Matt Cutts’ explanation of the supplemental index perfectly. Pages with high internal PageRank maintaining a strong presence in the main index isn’t surprising either. What is surprising is that despite Googlers insisting that Google is “pretty good” at detecting and filtering out duplicate content, I just don’t see it happening. For any given song, you’ll find dozens of sites with nearly identical lyrics sitting in the main index.

  3. I have a ratings and review site. Prior to official launch, Googlebot crawled several thousand dynamic pages and classified them as supplemental. A user will most likely never even see those pages in the search results, since they’re buried who knows where - how can that improve a user’s experience, from Google’s perspective? I mean, we could all start writing scripts to generate thousands of static pages with unique URLs and no URL variables, with a rock-solid internal link structure, but does that help the internet in any way? It certainly doesn’t help me maintain the site. Whatever. Google, the new MS.

  4. Yes, though Google tells you to build for people, not search engines, the reality is that we must build for both people AND search engines.

    Anyway, the same thing happened to one of my sites last year, so I can relate. In fact, it’s the site I cared about most and the one with the biggest potential to make money.

    Each page on a blog is designed to attract organic links. A CMS site with thousands of product pages, on the other hand, must generally rely on other means to increase visibility. Unless you’re amazon.com, you may have 90% of your backlinks pointing at your home page. That’s why internal link structure for CMS sites is key. Even a TBPR 10 site can have thousands of supplemental pages if not enough PageRank flows to deep pages.

  5. Definitely agree with your rant on PageRank…

  6. I agree with all that was said, above.

    Duplicate Content, Pagerank, and Supplemental are all intricately bound together in several different ways.
