Supplemental Listings - How To Avoid Them

One of the biggest problems for new and veteran webmasters alike is avoiding Google's supplemental index. Pages tend to "go supplemental" especially when you launch an e-commerce site with thousands of similar, thin content pages, or if you leave your Blogspot blog's meta tags unattended (I'm guilty as charged). So, how do you avoid supplemental hell -- or if you're already there, how do you get out? There's no silver bullet, unfortunately, but here are some things you can do to improve your chances.

Update (5/11/2007): Since the release of Big Daddy, the concept of supplemental results has gotten a whole lot clearer. Sorry, I'm still too busy right now to fully update this page, but I strongly recommend you read the following blog posts first:

If you have any questions, just send me an email (halfdeck at gmail.com).

UPDATE: July 12, 2007:

According to Jill Whalen, Dan Crow, director of crawl systems at Google, recently said that "basically the supplemental index is where they put pages that have low PageRank (the real kind) or ones that don’t change very often." NOTE: That's Jill's take on what Dan said, not what Dan said. A Googler would never refer to internal PageRank as the "real" PageRank. TBPR is just as "real" as internal PageRank. The difference is TBPR is on a delay and is much less granular than internal PageRank. Internal PageRank is updated daily and is calculated using floating point numbers.

UPDATE: Aug 1, 2007:

Google hides the supplemental results label to shift webmasters' focus to bigger and better things. site:domain.com/& (shows supplemental results, though unlabeled) and site:domain.com/* (shows pages in the main index) can still be used. In the announcement over on Google's blog, an engineer added that URL complexity is another factor in determining "supplemental status."

UPDATE: Aug 8, 2007:

Tedster on WMW posts a link to a patent called System and method for selectively searching partitions of a database, originally mentioned by Bill Slawski.

Supplemental Results Detector Tool

A site can never be 100% free of supplemental results. If you publish a lot of pages, some of them will be supplemental. But the Supplemental Results Detector tool can be helpful in getting important pages back into the main index.

Consulting

If you think your site is clean and you've tried absolutely everything, shoot me an email. I'm good at uncovering stuff even the top SEOs miss. I'm no marketing guru, so don't ask me for viral marketing or link building advice, but I can help you make the best of the visibility you already have in getting rid of supplemental results. The solution isn't simply to "get more backlinks." SEOs will give you a lot of advice, but you need someone who will help you with the implementation.

Right now, my standard fee is $175 / hour (though it's negotiable), and I guarantee a refund of $115 / hour if my services don't help you achieve results. Why do I guarantee a refund? Because I'm sick and tired of unethical, greedy SEO firms charging $200/hour, not delivering results, and running away with the dough, hiding behind the "you can't guarantee search engine ranking" BS. When I get a contract, my goal is to deliver results. Plain and simple. I value my reputation and I don't care to be known as another slimy SEO snake-oil salesman.

Understanding Supplemental Results

If you get value-passing, trustworthy links from trusted sites that point directly at a supplemental url, that url will pop back into the main index. Getting too many of the wrong kinds of links (excessive reciprocal links, paid links, links injected into .edu domains) can make Google devalue the PageRank passed by your inbound links (IBLs) and send your site into "Google Hell." During Big Daddy, Google began devaluing artificial links more aggressively, which explains why many site owners complained about supplemental results during Big Daddy's release in the spring of 2006. A recent Forbes article (published in May 2007) mentions a site that went supplemental after Google discovered excessive reciprocal links pointing at the site.

Avoiding Supplemental Hell - The Short Answer

The number of supplemental results you have on your site depends on the quantity and quality of links from other domains linking to your site - otherwise known as PageRank - and how that juice is distributed throughout your site.

  1. Improve a page's PageRank by getting more quality, relevant inbound links.
  2. Improve PageRank distribution by optimizing internal links.
  3. Maintain a natural link profile to prevent inbound link devaluation.

Simply put, improve a page's PageRank by getting more quality, relevant inbound links.

How to Get Out of Supplemental Hell - One Way Ticket to Zero Traffic?

Once you're in there, it's not so easy to get out, or is it? Provided you took care of every item on the list above, you still probably won't see any positive changes for a while. That's because Google uses a special bot called supplemental Googlebot to refresh its supplemental cache. It's said it comes around every 6 months, but for one of my sites, the wait was around 12 months. That doesn't necessarily mean pages won't return to the main index for a year. GoogleGuy/Matt Cutts recently hinted Google may be running supplemental cache refreshes more frequently - so we have reason for hope. But how can you speed things up? Improve your trust with Google. That means:

Supplemental Index Key Ideas

Types of Supplemental Results

These are some of the types of supplemental results you may encounter when running a site:domain.com search:

Hacks

To see an estimate of the number of supplemental pages listed under a domain, try the query site:www.mydomain.com *** (I recommend you not take the results too seriously, though).

You Got a Link Out of Supplemental Results?

Don't get too excited just yet. Consider the page still on the crawl fringe. If you see a page pop into the main index, act fast because if the page falls back into the supplemental index, the page's cache will freeze for months.

Take another look at your site's internal PageRank distribution. Focus on the PageRank distributed to your page and make sure most of that PageRank stays within the site. (You can do this just by adding more internal links on a page.) Note: You can't always tell how much PageRank is going into a page by looking at your toolbar. We're talking about the difference between PageRank 0.15 and PageRank 0.9812 here, both of which look like PR0 on the toolbar. You need to use a script to figure out what's going on.
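A minimal sketch of that kind of script, assuming the classic published PageRank formula, PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)), with a damping factor d of 0.85. Google's internal calculation isn't public, so treat the numbers as relative hints only. The function name and the four-page `site` link map are hypothetical.

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank over an internal link map {page: [pages it links to]}."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Each linking page splits its rank evenly across its outlinks.
            incoming = sum(
                rank[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - damping) + damping * incoming
        rank = new_rank
    return rank


# Hypothetical four-page site: nothing links to /orphan.
site = {
    "/": ["/products", "/about"],
    "/products": ["/", "/about"],
    "/about": ["/"],
    "/orphan": ["/"],
}
ranks = internal_pagerank(site)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.4f}")
```

A page with no inbound links at all sits at the 0.15 floor under this formula, which is the kind of difference the toolbar won't show you.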

Point more links to those pages so they stay on the right side of the fence. With a little shift in PageRank, the pages can drop right back into the supplemental index. Keep in mind, PageRank isn't a straight algorithmic calculation anymore. You want PageRank from links Google won't discount. Preferably, you want organic, not-paid-for citations from reputable sites.

My Older Supplemental Listing Notes

You might find some conflicting information, unpolished ideas, or something I've already said above repeated here in my notes (which I wrote around Feb 2006 - yeah, a long time ago). I'll hopefully clean it up later on (though I doubt it... I'm up to here with work - sucks to be my own boss sometimes). For even more info on general SEO stuff, visit my blog. You can also email specific questions to halfdeck AT gmail.com.

Oh yeah..if you want your site checked over, there are tons of willing webmasters over at Google Group Webmaster Help, including me (30 min ~ an hour a day, as of 9/10/2006 - who knows how long that'll last? So far I'm enjoying it though). You'll also occasionally come across a few Googlers including Adam Lasnik and Vanessa "Buffy" Fox. Unlike some other SEO related forums, you can post specifics, so there's a better chance people will help you iron out obvious problems with your site, if any. Beware - there are some noobies and trolls in there posting inaccurate/misleading information too (no surprise, right?), so I recommend you consider all angles before going with just one guy's advice. Even a really knowledgeable SEO can be wrong sometimes, so the best policy is play your odds (e.g. avoiding hidden text because it may get you banned) and optimize via process of elimination (e.g. getting rid of "possible" problems).

Note: according to Googleguy, "the supplemental results are a new experimental feature to augment the results for obscure queries."

How do you avoid getting pages listed as supplemental? Here are a few of my guesses that'll be tested. Note: I'm not talking about canonical problems here, like www/non-www or / versus /index.html or sites with query strings that generate duplicate pages. I'm talking about regular pages going supplemental because somehow Google/Yahoo thinks they're similar to other pages on the web.

My guess is there are several factors that come into play when deciding whether a page belongs in the main or supplemental index. But the goal of these tests is to determine how much we can get away with before a page is flagged as supplemental, so that there's no guesswork involved when publishing pages and wondering if they'll end up in the supplemental index.

 

By the way, till Google temporarily dropped this page from their index a few nights ago, I was listed under WMW, digitalpoint, and Jim Boykin on Google for "supplemental hell." Don't ask me how that happened; probably the "fresh factor" kicking in, because I'm not optimizing this page for anything at all.

Also, I just noticed the domain's /index.html got crawled. Damn Dreamweaver. This teaches me never to link to pages using the link tool. Also, I added these lines to my .htaccess (3/22/2006). Not the cleanest mod_rewrite but it gets the job done. A good reminder to have a fail-safe .htaccess installed before a domain is ever crawled.

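# In .htaccess the rule pattern has no leading slash; THE_REQUEST avoids a loop with DirectoryIndex.
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.html[?\s] [NC]
RewriteRule ^index\.html$ / [R=301,L]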

Also, this SERP shows the / and index.html as similar. They're identical pages with cache dates 3 days apart (page text was not modified during that time). This might mean one of those pages is on its way to the supplemental index, or Google loosened up its supplemental filter(?). Time will tell.
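If you have a list of crawled URLs, a rough way to spot this kind of duplication is to collapse the www/non-www and /index.html variants into one key and see which URLs collide. This is only a sketch under my own assumptions: the function names are made up, query strings are ignored, and the example URLs are placeholders.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_key(url):
    """Collapse www/non-www and /index.html variants into one comparison key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    if path.endswith("/index.html"):
        # Keep the trailing slash, drop the file name.
        path = path[: -len("index.html")]
    return host + path


def find_duplicate_urls(urls):
    """Group URLs by canonical key and keep only groups with 2+ members."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical crawl output for a placeholder domain.
crawled = [
    "http://www.xyz.com/",
    "http://www.xyz.com/index.html",
    "http://xyz.com/",
    "http://www.xyz.com/about.html",
]
duplicates = find_duplicate_urls(crawled)
for key, group in duplicates.items():
    print(key, group)
```

Any group it reports is a candidate for a 301 redirect to a single preferred URL.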

Last thing, if you're using WordPress, their default .htaccess will rewrite /blog/ to /blog/index.php. I'll see how that pans out on Google.

Possible Reasons for Winding Up in Supplemental Hell

  1. Canonical problems (I'll deal with this elsewhere since most forums cover it pretty well, and they don't necessarily have a negative effect on the domain as long as the key pages are indexed correctly).
    • www / non-www
    • http://www.xyz.com/ vs http://www.xyz.com/index.html
    • /dynamic.cgi?id=x&sessionid=y&options=z generating similar or identical content.
    • "Sloppy webmastering" and misconfiguration of dynamic sites can easily generate multiple urls that generate the same page.
  2. Duplicate content (text taken from some other page on the web).
  3. No content (i.e. thin pages). The usual advice is to aim for 200 ~ 250 words per page. From what I've seen, this is not true. Uniqueness of the content seems to matter more than filesize/wordcount.
  4. Orphaned pages.
  5. HTML head element not closed; or, body not opened. Again, not true from looking at my test pages. I left broken <head><body> tags but the pages were indexed correctly.
  6. Similar header/footer/side nav. This may be a factor especially if the navigation links, footers, and header text comprise a big percentage of the page, making dynamically generated pages very similar to each other.
  7. Content is buried in the bottom half of HTML code.
  8. Large percentage of reciprocal links. (Hearsay.)
  9. Identical title/description. This is the easiest way to create supplemental pages.
  10. Lack of description meta. This comes into the picture if the on-page code is similar/identical to other pages.
  11. Similar descriptions across a site.
  12. Lack of incoming links. I doubt this. Why? This site has pages with only one incoming link but the pages are indexed correctly. On the other hand, if you are trying to get a page listed as supplemental back into the main index, adding more incoming links will probably help increase crawl frequency.
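Items 9 and 11 in the list above are easy to check mechanically. Here's a rough sketch that groups pages sharing the same title and description meta, using only Python's standard library. The class and function names and the sample pages are hypothetical, and it assumes you already have each page's HTML in hand.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadParser(HTMLParser):
    """Collect the <title> text and the description meta tag of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def duplicate_heads(pages):
    """Map each (title, description) pair used by 2+ URLs to those URLs."""
    seen = defaultdict(list)
    for url, html in pages.items():
        parser = HeadParser()
        parser.feed(html)
        parser.close()
        seen[(parser.title.strip(), parser.description)].append(url)
    return {head: urls for head, urls in seen.items() if len(urls) > 1}


# Hypothetical pages: two product pages share one title and description.
head = "<head><title>Widgets</title><meta name='description' content='Buy widgets'></head>"
pages = {
    "/widgets/red": "<html>" + head + "<body>Red</body></html>",
    "/widgets/blue": "<html>" + head + "<body>Blue</body></html>",
    "/about": "<html><head><title>About us</title></head><body>Hi</body></html>",
}
duplicates = duplicate_heads(pages)
for (title, description), urls in duplicates.items():
    print(title, "|", description, "->", urls)
```

Pages with no description meta at all come through with an empty description, so several of those sharing a title will be grouped too.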

Test Results

How to Get Rid of Supplementals?

Supplemental Database and Main Index Are Two Separate Databases

From following WMW's Supplemental Club thread, I'm convinced that 1) the main index and the supplemental index are two separate databases, and 2) the supplemental index database is structured to be add-only. Once a docID gets flagged as supplemental, it will stay there forever. If a page goes back into the main index, the supplemental listing in Google will disappear, but the page's docID and HTML text still remain in the supplemental database, undeleted. Aside: If you check the cache copy of your supplemental page in all datacenters, you should see that all of the copies have an identical timestamp.

My supplemental test pages underscored that duplicate content pages don't necessarily go supplemental. Now that I've cut off juice to those test pages, they should go supplemental.

Want more stuff to read? Check out other people's thoughts on supplemental results.
