<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Half's SEO Notebook</title>
	<atom:link href="/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://seo4fun.com/blog</link>
	<description>Search Engines &#124; Blogs &#124; Marketing &#124; PHP/MYSQL &#124; CSS</description>
	<pubDate>Wed, 17 Jun 2009 17:22:13 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
	<language>en</language>
			<item>
		<title>Your Obsession to Rank Higher is the Final Nail in Your Coffin</title>
		<link>http://seo4fun.com/blog/2008/05/28/your-obsession-to-rank-higher-is-the-final-nail-in-your-coffin.html</link>
		<comments>http://seo4fun.com/blog/2008/05/28/your-obsession-to-rank-higher-is-the-final-nail-in-your-coffin.html#comments</comments>
		<pubDate>Wed, 28 May 2008 13:03:06 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/?p=443</guid>
		<description><![CDATA[What kind of recording artist says stuff like &#8220;my goal is to make the Billboard Top 10&#8221;? Sure, recording labels may set goals like that, but did Lou Reed ever sit down and write songs just to win a Grammy? If making money is your only goal on the web, I don&#8217;t want you as [...]]]></description>
			<content:encoded><![CDATA[<p>What kind of recording artist says stuff like &#8220;my goal is to make the Billboard Top 10&#8221;? Sure, recording labels may set goals like that, but did Lou Reed ever sit down and write songs just to win a Grammy? If making money is your only goal on the web, I don&#8217;t want you as a client.</p>
<p>I get no kick out of promoting a piece-of-crap-boring-as-hell-cookie-cutter-turnkey-site. I&#8217;ve got to be infatuated with a site I&#8217;m promoting. It&#8217;s got to blow me away. Simon Cowell may call you &#8220;karaoke&#8221;, &#8220;cabaret&#8221;, &#8220;cruise ship&#8221;, whatever - I only live once. I don&#8217;t have time to waste promoting mediocrity.</p>
<p>Udi Manber <a href="http://googleblog.blogspot.com/2008/05/introduction-to-google-search-quality.html">recently wrote</a>:</p>
<blockquote><p>[Google&#8217;s] goal is always the same: improve the user experience. This is not the main goal, it is the only goal.</p></blockquote>
<p>Don&#8217;t even think &#8220;if you build it, they won&#8217;t just come,&#8221; which is just a poor excuse marketers use to swipe more of your dough. If those words don&#8217;t strike a chord in your brain somewhere, nothing I do or say will help you.</p>
<p>Recently, several real estate agents were <a href="http://www.bloodhoundrealty.com/BloodhoundBlog/?p=3062">up in arms about Trulia using widgets</a> and nofollows, accusing Trulia of employing &#8220;aggressive&#8221; SEO (as if that&#8217;s somehow a bad thing). The truth is Trulia is just following the SEO rule book. The real threat is the amount of money and human resources Trulia has at its fingertips. Trulia&#8217;s technology will rapidly evolve. Meanwhile an agent is boxed in by an uninspiring &#8220;SEO-friendly&#8221; template thousands of other people are using and a limited marketing budget. There&#8217;s no competition. Taking collective action against widgets is just delaying the inevitable. Even if realtors stopped linking to Trulia, sites like businessweek.com <span class='nf' title="http://search.yahoo.com/search?p=linkdomain%3Ahttp%3A%2F%2Fwww.trulia.com+site%3Abusinessweek.com">will continue to link in</span>. If you want to talk massive inlinks, how about nearly <span class='nf' title="http://search.yahoo.com/search;_ylt=A0geu9J1UT1I8FUBOetXNyoA?p=linkdomain%3Ahttp%3A%2F%2Fwww.trulia.com+site%3Acnn.com&#038;y=Search&#038;fr=&#038;ei=UTF-8&#038;rd=pref">4,000 dofollow links to Trulia from CNN.com</span>?</p>
<p>So far, the one and perhaps only edge agents have over Trulia is comprehensive, up-to-date property listings. If Trulia somehow gains access to that, what then? When will people wake up and realize SEO isn&#8217;t about worrying about SPAM (site positioned above mine), it&#8217;s about pushing their sites to the next level? If Microsoft didn&#8217;t bother creating the Xbox 360, what would their market share look like now? If you refuse to evolve, your days are numbered on the web.</p>
<p>Your move. <strong>The clock is ticking.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2008/05/28/your-obsession-to-rank-higher-is-the-final-nail-in-your-coffin.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>I Lurve Shari Thurow Too</title>
		<link>http://seo4fun.com/blog/2008/03/07/i-lurve-shari-thurow-too.html</link>
		<comments>http://seo4fun.com/blog/2008/03/07/i-lurve-shari-thurow-too.html#comments</comments>
		<pubDate>Fri, 07 Mar 2008 13:27:17 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<category><![CDATA[internal nofollow]]></category>

		<category><![CDATA[black hat]]></category>

		<category><![CDATA[PPC]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2008/03/07/i-lurve-shari-thurow-too.html</guid>
		<description><![CDATA[Recently, Shari got some flak for the condescending tone of her anti-nofollow post on SEL. But guess what? I agree with her.
Parasites of the Interweb
There are short-sighted people out there always looking for short cuts. They won&#8217;t hesitate to pay you $600/hour to hear you say &#8220;you need to write unique META descriptions.&#8221; Ya know, [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, Shari got some flak for the condescending tone of her <a href="http://searchengineland.com/080306-083414.php">anti-nofollow post on SEL</a>. But guess what? I agree with her.</p>
<h2>Parasites of the Interweb</h2>
<p>There are short-sighted people out there always looking for short cuts. They won&#8217;t hesitate to pay you $600/hour to hear you say &#8220;you need to write unique META descriptions.&#8221; Ya know, easy fixes and promises of big returns. It&#8217;s all about high ROI. Contribute to the community <em>as little as possible</em> and milk it till it&#8217;s drier than <a href="http://en.wikipedia.org/wiki/Erg_Chebbi">Erg Chebbi</a>. It&#8217;s &#8220;you reap 10,000,000 times what you sow&#8221; syndrome. It&#8217;s about living off the web like a parasite. It&#8217;s about putting money in your pocket, and screw everybody else.</p>
<p>For those people, internal nofollow is an easy sell because it&#8217;s easy to implement. You don&#8217;t have to spend hours writing blog posts. You don&#8217;t have to share valuable ideas with other people. You don&#8217;t have to come up with anything original. You don&#8217;t have to dump your template site someone else spoon fed you and design a truly compelling site from the ground up. All you have to do is spend a couple of hours adding rel=nofollow to your pages.</p>
<p>I&#8217;m not saying short cuts don&#8217;t exist. After making thousands of bucks and 5,000+ visits/day per pseudo-spam domain using a script that took less than two hours to code, and having PPC campaigns that make me $10,000 for every $500 I put in, I know there are short cuts.</p>
<p>Still, there&#8217;s a time and place for every SEO tactic. We&#8217;re always short on time, and given multiple choices, we are forced to choose which path to take. Clients want instant gratification. Don&#8217;t give it to them - unless the client is willing to settle for an 80% long term/20% short term strategy. If a client is unwilling to do the right thing, warn your client in advance that what you&#8217;re going to do for him/her is probably going to be a complete waste of your time and their money. That way, a few months down the road, your client won&#8217;t come back to you and bitch that nothing is happening. The appropriate response in that scenario is &#8220;I told you so.&#8221; Don&#8217;t take the blame for your client&#8217;s bad judgement calls.</p>
<h2>What should you change before you touch internal nofollow?</h2>
<p>First order of the day has always got to be injecting value into your website so that it&#8217;s by far the best in your niche. If it isn&#8217;t the best, forget SEO, forget marketing, forget everything else. Work on improving your site.</p>
<p>How do I know that something I built is going to sell? I know because <strong>I use it every single day.</strong></p>
<p>Second, increase visibility. No, forget &#8220;authority links from high PageRank pages.&#8221; Get noticed on Craigslist, Myspace, whatever, it doesn&#8217;t matter. If your competitor is dominating Google Maps results in your area, for example, you know what you gotta do. If you run a search on Trulia and get a bunch of listings by your competitor, you&#8217;ve got work to do.</p>
<p>Many big dogs in my vertical don&#8217;t depend on Google. Sure, they get 80,000+ Google hits/day, but that doesn&#8217;t represent the majority of their traffic. They leech traffic off other sites, they buy traffic, they get on top lists, they submit videos to Youtube &#8212; they do whatever it takes to generate traffic. Is the traffic not converting? Filter it, trade it, sell ads - as long as your site is visible, there&#8217;s money to be made. You fixate on rankings and high TBPR links and guess what? You&#8217;re probably going to end up just scraping by.</p>
<p>Internal nofollow isn&#8217;t evil. But unless you have an unlimited amount of time, you gotta prioritize your SEO campaign. Internal nofollow should not be at the top of your to-do list.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2008/03/07/i-lurve-shari-thurow-too.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Scraping 101: Extracting Anchor Text with Regexp</title>
		<link>http://seo4fun.com/blog/2008/02/08/scraping-101-extracting-anchor-text-with-regexp.html</link>
		<comments>http://seo4fun.com/blog/2008/02/08/scraping-101-extracting-anchor-text-with-regexp.html#comments</comments>
		<pubDate>Sat, 09 Feb 2008 03:23:18 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Coding]]></category>

		<category><![CDATA[Regexp]]></category>

		<category><![CDATA[Scraping]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2008/02/08/scraping-101-extracting-anchor-text-with-regexp.html</guid>
		<description><![CDATA[There are many ways to skin a cat, but when it comes to scraping websites, I like parsing content with regexp. One of the biggest problems I bumped into when parsing HTML is matching opening and closing tags.
For example:
(&#60;a [^>]+&#62;)(.*)&#60;/a&#62;
Ok let&#8217;s try that in English:

(&#60;a [^>]+&#62;) matches &#60;a href=&#8220;&#8230;&#8221;&#62;.
(.*) *should* match anchor text (I&#8217;ll [...]]]></description>
			<content:encoded><![CDATA[<p>There are many ways to skin a cat, but when it comes to scraping websites, I like parsing content with regexp. One of the biggest problems I bumped into when parsing HTML is matching opening and closing tags.</p>
<p>For example:</p>
<p><strong>(&lt;a [^>]+&gt;)(.*)&lt;/a&gt;</strong></p>
<p>Ok let&#8217;s try that in English:</p>
<ol>
<li>(&lt;a [^>]+&gt;) matches &lt;a href=&#8220;&#8230;&#8221;&gt;.</li>
<li>(.*) *should* match anchor text (I&#8217;ll elaborate on that).</li>
<li>&lt;/a&gt; matches the closing A tag.</li>
</ol>
<p><strong>&lt;a href=&#8221;http://www.searchengineland.com&#8221; rel=&#8221;notpaid&#8221;&gt;search engine land&lt;/a&gt;</strong></p>
<p>will correctly extract the anchor text &#8220;search engine land.&#8221; BUT because (.*) is greedy,</p>
<p><strong>&lt;a href=&#8221;http://www.searchengineland.com&#8221; rel=&#8221;notpaid&#8221;&gt;search engine land&lt;/a&gt; is cool because vanessa fox posts there.&lt;/a&gt;</strong></p>
<p>will incorrectly extract:</p>
<p><strong>search engine land&lt;/a&gt; is cool because vanessa fox posts there.</strong></p>
<p>as anchor text. Hmm..</p>
<p>So how do you fix this? Instead of using a .*, use .*? or other non-greedy modifiers like +?, ??, or {m,n}? (I haven&#8217;t tested the last three, I assume they work).</p>
<p><strong>(&lt;a [^>]+&gt;)(.*?)&lt;/a&gt;</strong> will correctly extract anchor text from web pages.</p>
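<p>The greedy-vs-lazy difference above is easy to verify in a few lines of Python (a sketch using Python&#8217;s re module rather than PHP; the sample HTML string is made up to reproduce the stray closing tag problem):</p>

```python
import re

# A link followed by a stray closing </a> later in the text.
html = ('<a href="http://www.searchengineland.com" rel="notpaid">'
        'search engine land</a> is cool because vanessa fox posts there.</a>')

greedy = re.search(r'<a [^>]+>(.*)</a>', html)   # (.*) runs to the LAST </a>
lazy = re.search(r'<a [^>]+>(.*?)</a>', html)    # (.*?) stops at the FIRST </a>

print(greedy.group(1))  # search engine land</a> is cool because vanessa fox posts there.
print(lazy.group(1))    # search engine land
```

<p>The lazy version captures just the anchor text; the greedy one swallows everything up to the last closing tag, exactly as described above.</p>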
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2008/02/08/scraping-101-extracting-anchor-text-with-regexp.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Top 7 Reasons Why Optimizing Porn Sites is Hard</title>
		<link>http://seo4fun.com/blog/2008/01/16/top-7-reasons-why-optimizing-porn-sites-is-hard.html</link>
		<comments>http://seo4fun.com/blog/2008/01/16/top-7-reasons-why-optimizing-porn-sites-is-hard.html#comments</comments>
		<pubDate>Wed, 16 Jan 2008 19:23:06 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[AdWords]]></category>

		<category><![CDATA[Google Images]]></category>

		<category><![CDATA[Porn SEO]]></category>

		<category><![CDATA[Spammy Linking]]></category>

		<category><![CDATA[Yahoo Video]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2008/01/16/top-7-reasons-why-optimizing-porn-sites-is-hard.html</guid>
		<description><![CDATA[Sebastian recently fired up an experiment so I thought I&#8217;d post this up to give his experiment some link juice. While I&#8217;m at it, I&#8217;ll write a short rant about why optimizing porn sites isn&#8217;t easy.

Digg/reddit/stumbleupon are nearly useless. Of course, there are alternatives and workarounds, but really, life would be easier if digg had [...]]]></description>
			<content:encoded><![CDATA[<p>Sebastian recently <a href="http://sebastians-pamphlets.com/seo-test-do-search-engines-index-password-protected-urls/">fired up an experiment</a> so I thought I&#8217;d post this up to give his experiment some link juice. While I&#8217;m at it, I&#8217;ll write a short rant about why optimizing porn sites isn&#8217;t easy.</p>
<ol>
<li><strong>Digg/reddit/stumbleupon are nearly useless.</strong> Of course, there are alternatives and workarounds, but really, life would be easier if digg had a nsfw section.</li>
<li><strong>Everyone is scared of linking to you.</strong> Since almost everyone uses either affiliate content shared by 1,000 people or Matrix Content that all looks the same, there&#8217;s no such thing as unique content. Mainstream people are scared of linking to porn sites. And adult webmasters will never link to each other unless you set up a link exchange. There is no such thing as an editorial link in the porn niche (except for sex blogs, but those are for female bloggers; men generally suck at talking on and on about sex). You either have a traffic link, an affiliate link, a reciprocal link, or an internal link. Nobody links out for free.</li>
<li><strong>90% of adult webmasters still don&#8217;t get the concept of one-way links.</strong> Major adult sites like penisbot were built on reciprocal linking done on a mass scale. They accept dozens of sites a day, which are required to link to them, and they in turn link to those sites. Those webmasters don&#8217;t see massive recip link networks as a link scheme because the links are &#8220;relevant&#8221; - as if Google could tell the difference between a page about [girl+girl action] and [slippery dildos].</li>
<li><strong>There&#8217;s nothing to write about.</strong> Porn is about pics and video - not text. It&#8217;s like optimizing for flash - it&#8217;s a nightmare because you have to spend hours typing bullshit just so you rank for something. Yeah, you can write paysite reviews, publish chat logs with Brooke Banner or some other hot starlet that does weekly cam shows, post <a href="http://www.youtube.com/watch?v=djOWyCtqayM">barely sfw videos on youtube</a>, and talk about a girlfriend that&#8217;s been annoying you. But on the whole, you&#8217;ll end up writing a lot of gibberish and saying stuff you don&#8217;t mean.</li>
<li><strong>No matter who links to you and who you link to, you&#8217;re in a bad neighborhood.</strong> If real estate sites got bitchslapped for excessive reciprocal linking, well, porn sites have been in hot water for years. Google doesn&#8217;t trust links in adult; even <a href="https://siteexplorer.search.yahoo.com/advsearch?p=http%3A%2F%2Fwww.penisbot.com%2F&#038;bwm=i&#038;bwmo=d&#038;bwmf=s">a site with over 300,000 links</a> is stuck at TBPR 5, a sign that most of those links aren&#8217;t passing much value. But even if each link only passes a trickle of juice, 300k crap links add up to a #1 ranking for terms that drive over 80,000 visits/day from Google. This happens because even though you&#8217;re in a bad neighborhood and you got a busload of crap links pointed at you, everyone else is also in the same rut, with a lot fewer links in their profile. So you still end up on top even if you break all the Google rules in the book.</li>
<li><strong>Running AdWord ads is a challenge.</strong> Conversions are phenomenal; but the teen porn flag is easy to trip (different reviewers have different opinions on what is and isn&#8217;t compliant) and one disapproval too many can get your account killed.</li>
<li><strong>A Yahoo! Directory link costs a small fortune.</strong> Apparently Yahoo has never heard of a level playing field, or they assume, like VISA does, that porn sites make way more money than mainstream sites, so adult webmasters can afford to pay more every year.</li>
</ol>
<p>And a few reasons why optimizing porn is easier than mainstream:</p>
<ol>
<li><strong>Google image search generates a ton of sales.</strong> I have a few images on this domain but none of them makes me any money. Not so with adult traffic. If you know how to nail top position, you really don&#8217;t even need organic search traffic. One key is to have as many thumbs on a page as possible (and of course the page has to be in the main index - unless Google changed that up). Post a ton of thumbs on your home page (yeah, Google image search is primitive - PageRank is a bigger factor here, since anchor text doesn&#8217;t work). Use huge pics - which helps you rank higher if a surfer filters a search by image size, and use framebreaker JS to prevent people from just looking at one pic then backpedaling to Google porn TGP.</li>
<li><strong>Yahoo Video.</strong> You can generate 600~1000 uniques a day per set of videos thanks to Yahoo Video. The trick is to make sure the traffic converts; otherwise your bandwidth bill may eat into your profits, especially if each vid is big (if each vid is 5 MB, 1,000 downloads = 5 gigs/day).</li>
<li><strong>There are thousands of long tails you can optimize for.</strong> Model names, celebrity names, paysite names, niches, superniches, DVD titles - an endless stream of keywords you can monetize, some with little to no competition.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2008/01/16/top-7-reasons-why-optimizing-porn-sites-is-hard.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>How to Check for Position 6 Penalty</title>
		<link>http://seo4fun.com/blog/2008/01/11/how-to-check-for-position-6-penalty.html</link>
		<comments>http://seo4fun.com/blog/2008/01/11/how-to-check-for-position-6-penalty.html#comments</comments>
		<pubDate>Fri, 11 Jan 2008 05:40:37 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2008/01/11/how-to-check-for-position-6-penalty.html</guid>
		<description><![CDATA[One of my clients is convinced he&#8217;s been smacked with a position 6 penalty. In some cases, he is coming up 7th or 8th, often when sites above him have indented listings. Where would he rank if those indented listings weren&#8217;t there?
Usually, Google pulls up indented listings if the secondary URL is relevant enough to [...]]]></description>
			<content:encoded><![CDATA[<p>One of my clients is convinced he&#8217;s been smacked with a <a href="http://www.seobook.com/google-ranking-6-penalty-filter">position 6 penalty</a>. In some cases, he is coming up 7th or 8th, often when sites above him have indented listings. Where would he rank if those indented listings weren&#8217;t there?</p>
<p>Usually, Google pulls up indented listings if the secondary URL is relevant enough to rank on the same page as the primary URL. As tedster on WMW noted, &#8220;You can see this mechanism at work by changing your preferences to 50 or 100 results per page - that opens up the opportunity for more urls to be clustered.&#8221;</p>
<p>Sandboxsam, a new user on Webmasterworld, noted a trick you can use to filter out those indented listings.</p>
<p>1. Go to Advanced Search on Google<br />
2. Set the number of results to 20<br />
3. Change num=20 to num=6 in the address bar</p>
<p>That filters secondary URLs out of the search results (unless a secondary URL ranks 6th or higher).</p>
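<p>The URL tweak in step 3 can be sketched in a couple of lines of Python (a sketch, not part of the original trick; the query string here is a made-up placeholder, and num is the results-per-page parameter described above):</p>

```python
from urllib.parse import urlencode

query = "your target keyword"  # hypothetical query
# num=6 caps the page at six results, which suppresses most indented
# (clustered) secondary listings, per the trick above.
url = "http://www.google.com/search?" + urlencode({"q": query, "num": 6})
print(url)  # http://www.google.com/search?q=your+target+keyword&num=6
```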
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2008/01/11/how-to-check-for-position-6-penalty.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Paid Review: Internet Marketing Ninjas</title>
		<link>http://seo4fun.com/blog/2008/01/10/paid-review-internet-marketing-ninjas.html</link>
		<comments>http://seo4fun.com/blog/2008/01/10/paid-review-internet-marketing-ninjas.html#comments</comments>
		<pubDate>Thu, 10 Jan 2008 11:09:17 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2008/01/10/paid-review-internet-marketing-ninjas.html</guid>
		<description><![CDATA[You Can&#8217;t Judge a Book by its Cover
A while back, Fantomaster advertised Brad Callen&#8217;s PPC Arbitrage ebook on Sphinn with a blog post containing an affiliate link, which made some people think&#160;Fantomaster was trying to make a buck by leveraging his reputation on Sphinn. His post went hot,&#160;then a mod later pulled it off the [...]]]></description>
			<content:encoded><![CDATA[<h2>You Can&#8217;t Judge a Book by its Cover</h2>
<p>A while back, Fantomaster advertised <a href="http://www.netfrontiermarketing.com/adsense-arbitrage-an-overview.html">Brad Callen&#8217;s PPC Arbitrage ebook</a> on Sphinn with a blog post containing an affiliate link, which made some people think Fantomaster was trying to make a buck by leveraging his reputation on Sphinn. His post went hot, then a mod later pulled it off the front page. I got myself into an argument with Ralph after I called the ebook &#8220;crap.&#8221; He said (paraphrasing) &#8220;how the fuck do you know it&#8217;s crap if you haven&#8217;t read it? You&#8217;re judging the content based on the way it&#8217;s marketed. Don&#8217;t talk till you read the book.&#8221; A day later, I downloaded the ebook. It was well-written, containing how-to information anyone can follow.</p>
<p>Now I&#8217;m in a similar situation where I&#8217;m asked to review something that I haven&#8217;t dug my teeth into. I don&#8217;t have access to Internet Marketing Ninjas&#8217; members area. I could tell you it&#8217;s a great program because <a href="http://www.shoemoney.com/">Shoe</a> explains step by step how he made his first mill, or that it sucks because the information you pay for is stuff you already know, like how to set up a 301 redirect. But the fact is, no one can tell you whether or not the program is worth 3K unless they&#8217;ve tried it. All I&#8217;m going to tell you is: decide for yourself. Jim has his reputation riding on the success of this program - that by itself should tell you something.</p>
<h2>Facts About Internet Marketing Ninjas</h2>
<p><a href="http://www.internetmarketingninjas.com/" rel="nofollow">Internet Marketing Ninjas</a> is the brainchild of <a title="Jim Boykin" href="http://www.webuildpages.com/">Jim Boykin</a>, who recently won Search Engine Journal&#8217;s <a title="The Best Link Building Blog of 2007 award" href="http://www.searchenginejournal.com/best-link-building-blogs-of-2007-jim-boykins-blog-the-link-spiel/6184/">The Best Link Building Blog of 2007 award</a>. Here are the facts:</p>
<ul>
<li>15 hours worth of video, featuring Aaron Wall, Jill Whalen, Lee Odden, Todd Malicoat, Jeremy Shoemaker, Jim Boykin, Neil Patel, Cameron Olthuis, Bill Slawski, Christine Churchill, and Jim Gilbert</li>
<li>More authority SEOs will join to contribute their knowledge, and more videos will be added to the members area throughout 2008.</li>
<li>Access to Webuildpages tools that used to be public but are now private, including Quick Backlink Checker, Backlink Anchor Text Analysis, and Strongest Pages Tool.</li>
<li>Topics include link buying, keyword research, PPC, linkbait, Digg, link building, and affiliate marketing.</li>
<li>Membership: $3k/year</li>
</ul>
<p><a href="http://www.internetmarketingninjas.com/" rel="nofollow">Visit the site</a> to see free preview vids.</p>
<h2>The Buzz</h2>
<p>Want more dirt? Check out the buzz. Since the program just launched, none of the articles really give you any specifics. Still, Hobo-web&#8217;s interview is worth a read; SEO Book&#8217;s comments contain back-and-forth arguments about the program&#8217;s high price point; and you also might wanna check out the link to Webmaster Radio&#8217;s podcast.</p>
<p><a href="http://www.seodisco.com/internet-marketing-ninjas/">http://www.seodisco.com/internet-marketing-ninjas/</a></p>
<blockquote>&#8220;I heard one of the best <em>testimonials</em> of Internet Marketing Ninjas from <a title="Stuntdubl" href="http://www.stuntdubl.com/">Todd Malicoat</a>, when we were chillin&#8217; at PubCon. When I asked him if he would advise me to shell out the money for the membership, he enthusiastically said that <a title="Jeremy Shoemaker" href="http://www.shoemoney.com/">Shoemoney</a>&#8217;s videos alone are worth the dough!&#8221; - Kid Disco</blockquote>
<p><a href="http://www.semscholar.com/2008/01/03/internet-marketing-ninjas-unleash-the-power/">http://www.semscholar.com/2008/01/03/internet-marketing-ninjas-unleash-the-power/</a><br />
<a href="http://www.hobo-web.co.uk/seo-blog/index.php/seo-ninja-linkbuilding-jim-boykin/">http://www.hobo-web.co.uk/seo-blog/index.php/seo-ninja-linkbuilding-jim-boykin/</a><br />
<a href="http://www.toprankblog.com/2008/01/internet-marketing-ninja-interview-with-jim-boykin/">http://www.toprankblog.com/2008/01/internet-marketing-ninja-interview-with-jim-boykin/</a><br />
<a href="http://www.seroundtable.com/archives/015863.html">http://www.seroundtable.com/archives/015863.html</a></p>
<blockquote>&#8220;If you&#8217;re in doubt of the price tag, Barry has vouched for it.&#8221; - tamar</blockquote>
<p><a href="http://www.cartoonbarry.com/2008/01/sem_education_videos_internet.html">http://www.cartoonbarry.com/2008/01/sem_education_videos_internet.html</a></p>
<blockquote>&#8220;It is pretty expensive, but it seems to me to be well worth the price tag.&#8221; - Barry</blockquote>
<p><a href="http://sphinn.com/story/21112">http://sphinn.com/story/21112</a><br />
<a href="http://sphinn.com/story/21180">http://sphinn.com/story/21180</a><br />
<a href="http://www.jimboykin.com/internet-marketing-training-seo-tools/">http://www.jimboykin.com/internet-marketing-training-seo-tools/</a><br />
<a href="http://www.marketingpilgrim.com/2008/01/internet-marketing-training-course.html">http://www.marketingpilgrim.com/2008/01/internet-marketing-training-course.html</a><br />
<a href="http://www.netbusinessblog.com/review-internet-marketing-ninjas/">http://www.netbusinessblog.com/review-internet-marketing-ninjas/</a> (paid review)<br />
<a href="http://www.ilovejackdaniels.com/reviewme-reviews/internet-marketing-ninjas/">http://www.ilovejackdaniels.com/reviewme-reviews/internet-marketing-ninjas/</a> (paid review)<br />
<a href="http://www.brucecat.com/internet-marketing-ninjas.html">http://www.brucecat.com/internet-marketing-ninjas.html</a></p>
<blockquote>&#8220;For one year recurring membership fee of $2999.00 you will get access to some of the <strong>most amazing SEO tools around</strong> plus 15 hours of <strong>free</strong> video interview&#8221; - Bruce Cat</blockquote>
<p>(emphasis mine)</p>
<p><a href="http://forums.digitalpoint.com/showthread.php?t=632542">http://forums.digitalpoint.com/showthread.php?t=632542</a><br />
<a href="http://raven-seo-tools.com/blog/68/internet-marketing-ninjas-and-their-value-to-seo">http://raven-seo-tools.com/blog/68/internet-marketing-ninjas-and-their-value-to-seo</a><br />
<a href="http://www.webmasterradio.fm/Search-Engine-Optimization/Webcology/Internet-Marketing-Ninjas.htm">http://www.webmasterradio.fm/Search-Engine-Optimization/Webcology/Internet-Marketing-Ninjas.htm</a><br />
<a href="http://www.seobook.com/jim-boykin-launches-internet-marketing-training-tools-combo">http://www.seobook.com/jim-boykin-launches-internet-marketing-training-tools-combo</a></p>
<blockquote>&#8220;I agree that Jim could have shown more value upfront with some of the stuff he put in this package. Some of the tools have &#8220;never been released&#8221; next to them without stating what they do. As Jim gets feedback I am sure he will start offering more info about those.&#8221; - Aaron Wall</blockquote>
<p><a href="http://www.stumbleupon.com/url/www.internetmarketingninjas.com/">http://www.stumbleupon.com/url/www.internetmarketingninjas.com/</a></p>
<blockquote>&#8220;Jim is a friend of mine, and I know he always underpromises and overdelivers.&#8221; - Sebastian</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2008/01/10/paid-review-internet-marketing-ninjas.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>YAPAPL: Yet Another Damn Post About Paid Links</title>
		<link>http://seo4fun.com/blog/2007/10/13/yapapl-yet-another-damn-post-about-paid-links.html</link>
		<comments>http://seo4fun.com/blog/2007/10/13/yapapl-yet-another-damn-post-about-paid-links.html#comments</comments>
		<pubDate>Sat, 13 Oct 2007 15:06:11 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/10/13/yapapl-yet-another-damn-post-about-paid-links.html</guid>
		<description><![CDATA[&#8220;I do what I do best, I take scores. You do what you do best, try to stop guys like me.&#8221;
&#8211;Neil McCauley, Heat (1995)
Unless you were living under a rock for the last couple of weeks, you know Google&#8217;s been busy lately in its War Against Paid Links. According to Danny Sullivan, Google is now [...]]]></description>
			<content:encoded><![CDATA[<p>&#8220;I do what I do best, I take scores. You do what you do best, try to stop guys like me.&#8221;</p>
<p>&#8211;Neil McCauley, Heat (1995)</p>
<p>Unless you were living under a rock for the last couple of weeks, you know Google&#8217;s been busy lately in its War Against Paid Links. According to <a href="http://searchengineland.com/071007-173841.php">Danny Sullivan</a>, Google is now bitchslapping the &#8220;guilty high-rollers&#8221; to make an example out of them and to put the fear of God into link sellers and link buyers.</p>
<p>While this move triggered a hailstorm of debates on ethics in the SEO community, the reality is this: Google will continue to tighten its security system. Whether Google&#8217;s stock price is $625 or $125, Google will continue to walk that path. Google will never be completely &#8220;hacker-proof.&#8221; Then again, it doesn&#8217;t need to be. Its goal isn&#8217;t to detect all paid links on the face of the Interweb.</p>
<p>See, the cops don&#8217;t work the streets expecting to catch every grocery-robber, rapist, and gangbanger in town (don&#8217;t get your SEO handbook in a twist because I&#8217;m comparing link sellers to criminals; it&#8217;s just a damn example, not a full-fledged analogy).</p>
<p>For example, according to ~1990 stats, only <a href="http://sa.rochester.edu/masa/stats.php">16% of rapes are reported to the police</a>. (Now I can see some people are gonna start asking, &#8220;but how do you define forced sex? If I tell my GF she can&#8217;t have my cream bagel unless she has sex with me, is that considered forced sex? Or what about if I rape someone but she ends up having 10 orgasms and begs me to marry her, steals my phone number and won&#8217;t stop calling me and says if I don&#8217;t marry her she&#8217;ll report me to the police for forced sex? What if she just felt a little uncomfortable for the first two minutes and then started really liking it? There&#8217;s so much gray area around the definition of forced sex maybe I&#8217;m a rapist and don&#8217;t even know it? I mean, some girls say no and then when I stop they say hey, why the hell did you stop? Keep going dammit! Do I really deserve to spend 7 years in jail and pay a $200,000 fine for using a cream bagel as a sex-bait tool? Isn&#8217;t that excessive? If my best friend rapes someone, why do I have to rat on him? Isn&#8217;t it unethical to snitch on a friend? He was drunk and he was horny. He couldn&#8217;t help himself. She was asking for it anyway, with that low cut dress exposing her boobs. She really shouldn&#8217;t dress like that. Yeah, it&#8217;s HER fault! Why do women make such a big deal about sex anyway? The government shouldn&#8217;t tell me what to do; it&#8217;s a free country. I should be able to do whatever I want!&#8221;)</p>
<p>In the United States, 1 out of every 5 women in college is raped (1995 National College Health Risk Behavior Survey). Less than half of those arrested for rape are convicted, 54% of all rape prosecutions end in either dismissal or acquittal, 21% of convicted rapists are never sentenced to jail or prison time, and 24% receive time in local jail, which means they spend an average of less than 11 months behind bars.</p>
<p>So in an ideal world, 649,733 rapes per year might lead to around 600,000 rapists sent to jail every year (give or take a few, considering cases where a group of men rapes the same woman or one man rapes more than one woman). In reality, only around 103,957 rapes are reported, and even if all of the perpetrators for those rapes are prosecuted, 56,136 rapists are dismissed/acquitted, and 10,042 convicted rapists don&#8217;t see jail time. In the end, we&#8217;re left with 37,779 rapists in jail out of ~600,000 - or a 6% success rate.</p>
<p>If you rape someone, you have a 94% chance of getting away with it.</p>
<p>A cop, unless he/she&#8217;s smoking crack, doesn&#8217;t expect to get from a 6% success rate (or 94% failure rate, however you want to look at it) to a 100% success rate overnight. What he expects is to see that 6% inch up to something like 9%. A crackdown on rape will not stop rape from happening. Rape is <a href="http://www.searchenginejournal.com/paid-links-simply-arent-going-anywhere/5817/">simply not going anywhere</a>. And if you rape <a href="http://www.jimboykin.com/buying-links-under-the-radar-so-matt-cant-find-them/">under the radar</a>, there&#8217;s a very low chance of ever spending time in prison. Police efforts may ultimately fall short. But just because the government cannot stop rape from ever happening doesn&#8217;t mean the government is going to just let it happen.</p>
<p>Google is a bank with a high-tech security system. Any security system, however, has its weaknesses. Matt Cutts and his Anti-Spam squad continue to plug holes in the system. They resort to scare tactics to lower the number of people trying to beat the system. They are trying to go from 6% to 9%.</p>
<p>An <a href="http://www.ericward.com/">SEO professional</a>&#8217;s job is to continue to find ways to bypass that system and &#8220;get the cheese&#8221;, as an old 9-ball hustler pal back in the Big Apple used to say.</p>
<p>But it&#8217;s not my job to whine about the system. It&#8217;s not my job to waste time questioning the legality of that system.</p>
<p>It&#8217;s my job to</p>
<p>: understand the system<br />
: exploit any weaknesses of that system</p>
<p>Identify tactics, draw up a plan, execute, re-evaluate, and make another run. If you&#8217;re in this for profit, everything else is digital vapour.</p>
<p>&#8220;Got. Got. What do we got? *What do we got?* Bon voyage, motherfucker. You were good. I&#8217;m going to the hotel. I&#8217;m going to take a shower. I&#8217;m going to sleep, for a month. &#8221;</p>
<p>&#8211;Vincent Hanna, Heat (1995)</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/10/13/yapapl-yet-another-damn-post-about-paid-links.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Third Level Push (modified Siloing) For Deeper Index Penetration</title>
		<link>http://seo4fun.com/blog/2007/08/22/third-level-push-modified-siloing-for-deeper-index-penetration.html</link>
		<comments>http://seo4fun.com/blog/2007/08/22/third-level-push-modified-siloing-for-deeper-index-penetration.html#comments</comments>
		<pubDate>Thu, 23 Aug 2007 03:32:47 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/08/22/third-level-push-modified-siloing-for-deeper-index-penetration.html</guid>
		<description><![CDATA[Third-Level Push (aka &#8220;siloing&#8221;), according to Dan Thies (who regained my attention after his recent article on Google proxy hacking), helps you get third-tier pages (e.g. article/product detail pages) in the main index and ranking higher by &#8220;taking more of the PageRank from your second tier, and pushing it down into the third tier.&#8221;
Dan explains:
In [...]]]></description>
			<content:encoded><![CDATA[<p>Third-Level Push (aka &#8220;siloing&#8221;), according to <a href="http://www.seofaststart.com/">Dan Thies</a> (who regained my attention after his recent article on <a href="http://www.seofaststart.com/blog/google-proxy-hacking">Google proxy hacking</a>), helps you get third-tier pages (e.g. article/product detail pages) in the main index and ranking higher by &#8220;taking more of the PageRank from your second tier, and pushing it down into the third tier.&#8221;</p>
<p>Dan explains:</p>
<blockquote><p>In most sites, your global navigation links to the entire second tier from every page, including the home page. This causes the second tier pages to accumulate a lot of PageRank, at the expense of your third tier.</p></blockquote>
<p>Makes perfect sense. Sites with slightly low link popularity (home page TBPR 3-4) often have no problem getting the home page and most of the category pages in the main index, but they often can&#8217;t get some of the product detail pages to stick. Why? Often it&#8217;s because of exactly what Dan said: the internal navigation makes the home page and second-level pages PageRank-hogs, leaving the third-level pages high and dry.</p>
<p>Some SEOs call Dan&#8217;s tactic &#8220;siloing&#8221;, and attribute its benefits to better themed internal linking. For example, <a href="http://www.bruceclay.com/newsletter/0505/silo.html">Haylie from Bruce Clay talks about siloing</a>, albeit with a focus on ranking, not index penetration. Siloing, in this case, is done by setting up thematic pyramids via links or directory structure. Just imagine a tree hierarchy, where leaf nodes link up to their parent, then a set of parents link up to their parent, and so on, till you reach the root node.</p>
<p>Dan disagrees: &#8220;At the time we all assumed this had something to do with the topics of the pages not being closely related, but we were wrong.&#8221; According to him, the increase in site traffic is due to increased PageRank at the third tier.</p>
<p>So how do you implement Third-Level Push? In brief:</p>
<p>1. Use nofollow to prevent second-level pages from passing PageRank to each other. This forces PageRank downwards to the third-level.</p>
<p>2. Use nofollow on links on third-level pages to second-level pages so that a third-level page passes PageRank to its parent page but not to any other pages in the second-tier.</p>
<p>3. <strong>Tiered Pairing</strong>: To prevent second-level pages from losing too much PageRank, you can link them in pairs: e.g. page A with B, C with D, and so on.</p>
<p>4. <strong>Circular Navigation</strong>: To circulate more PageRank on the leaf level, link them up in circular fashion, so page A links to B and C, B links to C and D, etc.</p>
<p>That&#8217;s third-level push in a nutshell.</p>
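<p>To see why these steps move PageRank where you want it, here&#8217;s a toy simulation: a hypothetical 7-page site (home, two category pages, four posts), scored with the textbook PageRank formula at 0.85 damping, and a nofollowed link modeled simply as a removed edge. This is an illustration, not Google&#8217;s actual math.</p>

```python
# Toy PageRank power iteration for a hypothetical 7-page, 3-tier site.
# "before" has sitewide nav (every page links to home and both categories);
# "after" applies steps 1-2: no category-to-category links, and each post
# passes PageRank only to home and its own parent category.

DAMPING = 0.85

def pagerank(links, n_iter=200):
    pages = list(links)
    n = len(pages)
    pr = dict.fromkeys(pages, 1.0 / n)
    for _ in range(n_iter):
        pr = {p: (1 - DAMPING) / n
                 + DAMPING * sum(pr[q] / len(links[q])
                                 for q in pages if p in links[q])
              for p in pages}
    return pr

before = {
    "home":  ["cat1", "cat2"],
    "cat1":  ["home", "cat2", "post1", "post2"],
    "cat2":  ["home", "cat1", "post3", "post4"],
    "post1": ["home", "cat1", "cat2"], "post2": ["home", "cat1", "cat2"],
    "post3": ["home", "cat1", "cat2"], "post4": ["home", "cat1", "cat2"],
}

after = {
    "home":  ["cat1", "cat2"],
    "cat1":  ["home", "post1", "post2"],
    "cat2":  ["home", "post3", "post4"],
    "post1": ["home", "cat1"], "post2": ["home", "cat1"],
    "post3": ["home", "cat2"], "post4": ["home", "cat2"],
}

posts = ["post1", "post2", "post3", "post4"]
pr_before, pr_after = pagerank(before), pagerank(after)
share_before = sum(pr_before[p] for p in posts)
share_after = sum(pr_after[p] for p in posts)
print(round(share_before, 3), round(share_after, 3))  # third-tier share rises
```

<p>On this toy graph, the third tier&#8217;s share of the site&#8217;s total PageRank goes up after the push; the exact numbers depend entirely on the link graph you feed in.</p>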
<p><strong>Note:</strong> Avoid deleting or adding links to do this (like I did); instead, just use nofollow. There&#8217;s no bigger sin than compromising user experience for the sake of SEO (well, there probably is, but let&#8217;s not get into that).</p>
<h2>A Third Level Push Implementation for Wordpress</h2>
<p>Does third-level push really work? I decided to use this blog as a guinea pig. But how do I implement third-level push on Wordpress? Sure, you can just Google for an &#8220;SEO Siloing&#8221; Wordpress Plugin, but that&#8217;s no fun. My solution requires a little hacking, but if you're used to PHP it takes only a second. <strong>Warning to the faint of heart</strong>: don&#8217;t try this at home:</p>
<p>1. Open template-functions-category.php.<br />
2. Find the function wp_list_cats($args = '') declaration.<br />
3. Around line 236, under the parse_str($args, $r); line, enter:<br />
<strong><br />
if ( !isset($r['nofollow']) )<br />
		$r['nofollow'] = FALSE;</strong></p>
<p>// That sets the default nofollow value, in case no value is passed.</p>
<p>4. Look for the return list_cats(...) line.<br />
5. At the end of the long argument list (after $r['hierarchical']), type <strong>, $r['nofollow']</strong>.<br />
6. Find the function list_cats(...) declaration.<br />
7. At the end of the function declaration argument list, around line 279, after $hierarchical=FALSE, type: &#8220;<strong>, $nofollow=FALSE</strong>&#8221;<br />
8. Now look for the A HREF echo statement, around line 327.<br />
9. Replace $link = '&lt;a href="'.get_category_link($category->cat_ID).'" '; with:</p>
<p><strong>if ($nofollow == FALSE) $link = '&lt;a href="'.get_category_link($category->cat_ID).'" ';<br />
			else $link = '&lt;a href="'.get_category_link($category->cat_ID).'" rel="nofollow" ';</strong></p>
<p>10. Finally, open sidebar.php. Look for the wp_list_cats line for single posts (not the home page), around line 104, that looks like: wp_list_cats('sort_column=name&#038;optioncount=1&#038;hierarchical=0');</p>
<p>Replace that with <strong>wp_list_cats('sort_column=name&#038;optioncount=1&#038;hierarchical=0&#038;nofollow=TRUE');</strong></p>
<p>That&#8217;s it.</p>
<p><strong>Potential negative side effects:</strong> If your blog doesn&#8217;t have a lot of backlinks, your category pages might go supplemental. In that case, try Tiered Pairing.</p>
<p><strong>UPDATE:</strong> Joost apparently <a href="http://www.joostdevalk.nl/wordpress-seo-robots-meta-update/">incorporated my idea into his Robots Meta Plugin</a>. Check it out.</p>
<h2>Does Siloing/Third-Level Push Really Work?</h2>
<p>So what happens to PageRank flow after I implement a third-level push?</p>
<p>Here&#8217;s a before and after:</p>
<p><strong>Before:</strong></p>
<p><img src="/images/seo4fun.jpg" alt="before" /></p>
<p>(Pages in the main index are green. Notice I channeled most of my site&#8217;s PageRanks to only those URLs I want to rank, so I had some supplemental URLs, but none that I cared about.)</p>
<p><strong>After:</strong></p>
<p><img src="/images/after.jpg" alt="after" /></p>
<p>Hmm&#8230;so basically I lost PageRank to some of my unpopular category pages. But where did all that PageRank go? To just a handful of recently-published posts, which had high PageRanks to begin with. So it doesn&#8217;t really look like I gained anything, does it? In fact, it looks to me like a whole bunch of pages might go supplemental.</p>
<p>See, there&#8217;s no point in having pages with too much PageRank (at least for getting pages indexed). You want <em>a moderate amount of PageRank on as many pages as possible.</em></p>
<p>Nah, instead I want something more like this:</p>
<p><img src="/images/smooth-curves.jpg" alt="smooth curves" /></p>
<p>Notice now PageRanks are more evenly spread throughout my site.</p>
<p>(screenshots generated by <a href="/php/pagerankbot.php">PageRankBot</a>).</p>
<h2>Sitewides&#8217; Gotta Go (Modified Third-Level Push)</h2>
<p>The problem was I had other sitewide links besides links to category pages, like &#8220;recent posts&#8221;, &#8220;top posts&#8221;, and links to the home page. Those URLs stole the PageRanks the category pages gave up.</p>
<p>In short, <strong>sitewide links are bad</strong>. So what did I do?</p>
<p>1. Dumped sitewide links: Got rid of sitewide links to recent posts and top posts (better to nofollow them but I was in a rush) and nofollowed links to the blog home page. Used third-level push (nofollowed sitewide links to category pages), except I prevented blog articles from linking back to their parent category pages to keep those pages from accumulating too much PageRank.</p>
<p>2. Added related posts plugin to circulate PageRanks to internal pages &#8220;randomly&#8221; instead of sitewide.</p>
<blockquote><p>
<strong>Aside:</strong> Some SEOs will tell you you should never, ever, ever nofollow links to your internal pages because it sends a negative quality signal to Google. First, Vanessa Fox, an ex-Googler, confirmed that a reason nofollowing internal links may be a bad idea is that the target URLs will still be indexed if other people link to them (not that I take her word blindly as Gospel but hey, I don&#8217;t have time to test everything a Googler says). <strong>My policy is to use nofollow on internal links only when I want to control the amount of PageRank flowing into a URL but I still want to keep the URL in the main index.</strong> For example, I wouldn&#8217;t nofollow a link to my privacy policy or TOS; I would just use robots.txt disallow (sure, robots.txt doesn&#8217;t guarantee that a URL stays out of the main index, but I don&#8217;t care about that; I just don&#8217;t want internal link juice to flow to my TOS page. But if you really wanted to get rid of a URL from Google&#8217;s index, use META noindex instead of robots.txt).</p></blockquote>
<h2>Conclusion</h2>
<p>Will this setup help me or hurt me? Time will tell. The main problem is that now my site&#8217;s PageRanks are unfocused; my top posts aren&#8217;t getting any special attention. I can modify Wordpress so that <strong>X% of links to top posts are nofollowed instead of nofollowing every single sitewide link</strong>. That way, my most important pages will have the highest PageRanks but they won&#8217;t be PageRank hogs.</p>
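<p>That &#8220;nofollow X% of links&#8221; idea can be sketched like this (a hypothetical Python helper, not actual Wordpress code; hashing the URL makes the followed/nofollowed choice stable across page loads instead of random):</p>

```python
# Sketch: nofollow a fixed percentage of sitewide links to top posts.
# Hashing each URL into a stable 0-99 bucket means a given link is
# consistently followed or nofollowed on every page render.
import zlib

def render_link(url, title, nofollow_pct=60):
    bucket = zlib.crc32(url.encode()) % 100  # stable per-URL bucket
    rel = ' rel="nofollow"' if bucket < nofollow_pct else ''
    return '<a href="%s"%s>%s</a>' % (url, rel, title)

top_posts = [("/blog/post-%d.html" % i, "Post %d" % i) for i in range(10)]
anchors = [render_link(u, t) for u, t in top_posts]
followed = sum('nofollow' not in a for a in anchors)
print(followed, "of", len(anchors), "links pass PageRank")
```

<p>The same logic would be trivial to port into a Wordpress template function in PHP (e.g. crc32() modulo 100 on the permalink).</p>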
<p> I do believe that third-level push can work, as long as you have some kind of tool to make sure PageRanks are actually being pushed down to the third-tier pages. Nofollowing links to category pages alone won&#8217;t guarantee that, though preventing sitewide links from flowing juice will probably do the trick.</p>
<p>Why should you care about this stuff?</p>
<p>As <a href="http://www.searchenginejournal.com/google-pagerank-play-doh/5504/#comment-561206">Matt Cutts recently explained</a> (emphasis mine):</p>
<blockquote><p>You could do a similar post with a bunch of Play-Doh and show how you have a certain amount of Play-Doh (your PageRank), and you choose with your internal linking how to spread that Play-Doh throughout your site. <strong>If a given page has enough PageRank</strong> (reasonable-sized ball of Play-Doh), it can be in our main web index. If it has not-very-much PageRank (tiny ball of Play-Doh), it might be a supplemental result. And if only a miniscule iota of Play-Doh makes it to a page, then we might not get a chance to crawl that page.
</p></blockquote>
<p>The Play-Doh / PageRank metaphor is kinda disturbing, but hey, whatever works.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/08/22/third-level-push-modified-siloing-for-deeper-index-penetration.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>How To Exploit The PageRankBot Tool</title>
		<link>http://seo4fun.com/blog/2007/08/08/how-to-exploit-the-pagerankbot-tool.html</link>
		<comments>http://seo4fun.com/blog/2007/08/08/how-to-exploit-the-pagerankbot-tool.html#comments</comments>
		<pubDate>Wed, 08 Aug 2007 17:09:08 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/08/08/how-to-exploit-the-pagerankbot-tool.html</guid>
		<description><![CDATA[Building a good house means more than buying a pine dining table or 1080p Plasma TV (more &#8220;quality&#8221; content) or telling your friends about the new house you&#8217;re building (marketing). You gotta know how to use hammers, drills and nails too.
If you&#8217;d rather build a good site than worry about supplemental results, why are you [...]]]></description>
			<content:encoded><![CDATA[<p>Building a good house means more than buying a pine dining table or 1080p Plasma TV (more &#8220;quality&#8221; content) or telling your friends about the new house you&#8217;re building (marketing). You gotta know how to use hammers, drills and nails too.</p>
<p>If you&#8217;d rather build a good site than worry about supplemental results, why are you reading SEO blogs? Come on, be honest. When&#8217;s the last time you read an SEO blog that talked in-depth about optimizing a dynamic page for fast page loads or repeating graphic elements on a page to create a sense of unity or using element size and position to establish a visual hierarchy?</p>
<p>Never, right?</p>
<p>But if you&#8217;re a control freak like me, read on. </p>
<h2>WTH Does It Do?</h2>
<p>Though some of you guys gave me positive feedback via comments and email about <a href="/php/pagerankbot.php">PageRankBot</a>, I&#8217;m not sure if all of you know exactly what to do with it.</p>
<p>Despite the misleading name &#8220;Supplemental Results Detector&#8221;, it&#8217;s not a tool for detecting supplemental results. You have site:www.domain.com/&#038; and site:www.domain.com/* for that. There are also other tools out there (I think Aaron Wall has one and <a href="http://www.sitemost.com.au/supplemental-results.php">sitemost just came out with a new tool</a>).</p>
<p>I don&#8217;t really care how many of my pages are supplemental, but I do care when a page that deserves to rank in the SERP goes supplemental. One way to address that problem is PageRank distribution management. That&#8217;s what I built this tool for.</p>
<h2>Tactics</h2>
<p>First, figure out which pages on your site are important and which pages aren&#8217;t. Ask yourself <em>is this page valuable to my visitors?</em> If the answer is no, the page can go. You might also ask yourself <em>what is this page supposed to rank for?</em> If the answer is &#8220;contact me&#8221; or &#8220;privacy policy&#8221; then ask yourself <em>why the hell would I want traffic for &#8220;privacy policy&#8221; and am I out of my mind thinking I can rank on the first page for &#8220;privacy policy&#8221; alongside Google, Sun, Apple, Adobe, and NY Times?</em></p>
<p>But if your &#8220;contact me&#8221; page contains your email address or IM information and your clients find you by Googling for your contact info, I would keep the page in the main index.</p>
<p>To mark unimportant URLs, multi-select them, then go to Edit > Toggle Importance.</p>
<p><img src="/images/toggle-importance.jpg" alt="" /></p>
<p>Now flag supplemental URLs. Some of you wish the tool did this for you automatically. It doesn&#8217;t. Instead, label the URLs returned by the site:www.domain.com/* command by going to Edit > Mark Page As > Main Index.</p>
<p><img src="/images/mark-as-supp.jpg" alt="" /></p>
<p>You can use the search tool to find URLs. For example, the following image shows a search on seo4fun.com for urls containing the word &#8220;pagerank&#8221;:</p>
<p><img src="/images/search.jpg" alt="supplemental results tool search feature" /></p>
<p>Now go to View > Filters > Hide Marked, which hides all the URLs you just marked. Select all the URLs you see, and then set their status to supplemental.</p>
<h2>Find Your Link Targets</h2>
<p>To manage internal PageRank flow, you add internal links to your site. Decide which page you&#8217;re going to add a link on (link source) and which page you want that link to point to (link target).</p>
<p>To fish out your link &#8220;targets&#8221;, view only supplemental pages and sort them by PageRank (View > Filters > Show Supplementals and then click on the PageRank column). The topmost URL marked &#8220;important&#8221; is your best candidate:</p>
<p>1. The page is important to you (you feel the page deserves to rank in the SERPs).<br />
2. The page is supplemental.<br />
3. The page with the highest PageRank = easiest url to pull back into the main index.</p>
<p><img src="/images/supps-by-pagerank.jpg" alt="Supplemental results sorted by pagerank" /></p>
<p>There&#8217;s your link &#8220;target.&#8221;</p>
<p><strong>Note:</strong> If your site has multiple &#8220;entry points&#8221; (i.e. not all inbounds point to the home page), PageRank flowing into your site from those entry points will change the dynamics of how PageRank is distributed. In that case, take the PageRank values this tool gives you with a grain of salt.</p>
<p>If you&#8217;re anal enough to want to account for IBLs pointing at specific pages, then you can &#8220;add juice&#8221; by going to Tools > Simulate Backlinks. First, set the home page TBPR (use a float, like 4.2 for more accuracy). Go to View > Column Filters > Approximate TBPR. That will show you approximate TBPR numbers translated from raw PageRank numbers. Choose a URL, and adjust as needed using the + and - keys.</p>
<p><img src="/images/add-juice.jpg" alt="add juice" /></p>
<h2>Find Your Link Sources</h2>
<p>There are a few ways to figure out your link &#8220;sources.&#8221; One way is to find the page with the most PageRank bleed. (Don&#8217;t believe PageRank bleeds? We&#8217;ll argue about that in another post). The amount of PageRank bleed depends on the ratio of outbound links to internal links and on a URL&#8217;s (non-visible) PageRank. For example, a PageRank X URL with two outbound links and two internal links would bleed (X/4)*2 PageRank. Bigger X (increased number/quality of IBLs pointing to a URL) means more PageRank bleed. More internal links means less PageRank bleed, even if the number of outbound links stays the same.</p>
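<p>That bleed arithmetic is easy to sanity-check under the same equal-split assumption (every link on a page carries an equal share of its PageRank; a simplification, not Google&#8217;s actual formula):</p>

```python
# PageRank "bleed" under a simple equal-split model: each link carries
# PR/total_links, and bleed is the share leaving through outbound links.

def bleed(pagerank, outbound_links, internal_links):
    total = outbound_links + internal_links
    return (pagerank / total) * outbound_links

# A PageRank-X page with 2 outbound and 2 internal links bleeds (X/4)*2:
print(bleed(4.0, 2, 2))  # 2.0 - half the page's PageRank leaves the site

# More internal links cut the bleed even with the same 2 outbound links:
print(bleed(4.0, 2, 6))  # 1.0 - only a quarter leaves now
```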
<p>Let&#8217;s not get too obsessed with PageRank bleeds though. You can solidify Google&#8217;s trust in your links by linking out organically. A site that doesn&#8217;t link out needs a strong set of credible, trusted IBLs to &#8220;validate&#8221; with Google (e.g. amazon.com). Consider your outbound links a part of your link profile and a key ingredient in proving to Google that your linking habits are 100% natural with no artificial colors, flavors or sweeteners (yeah, I know that was bad).</p>
<h2>Link from Pages with the Highest PageRank Bleed</h2>
<p>First, limit results to URLs in the main index by going to View > Filter > Main Index Only, so you only link from URLs in the main index. Then sort by Outbound PageRank (click on the &#8220;Outbound PageRank&#8221; column header. If you don&#8217;t see the column displayed, go to View > Column Filter to activate). The topmost URL with the biggest outbound PageRank is your link &#8220;source.&#8221;</p>
<p><img src="/images/outbound-pagerank.jpg" alt="outbound pagerank" /></p>
<h2>Link from Pages that Flow the Most PageRank</h2>
<p>Another way is to find a page that flows the most PageRank with each link. Go to View > Filter > Main Index Only. Then click on the &#8220;Increment&#8221; column header, which sorts the result in the order of PageRank flowing per link. The topmost URL with the biggest Increment bar is your link &#8220;source.&#8221;</p>
<p><img src="/images/increment.jpg" alt="pagerank increment" /></p>
<h2>Connect the Dots</h2>
<p>Finally, point a link from your link source to your link target.</p>
<p>If your modification isn&#8217;t sitewide, select the URL you just updated and recrawl that URL only instead of recrawling the entire site to update the site&#8217;s PageRanks.</p>
<p><img src="/images/recrawl.jpg" alt="recrawl url feature" /></p>
<p>You can also try flattening out your site&#8217;s PageRank curve (see the two graphs in my previous post about <a href="/blog/2007/08/01/how-google-failed-to-hide-supplemental-results.html">Google hiding supplemental results</a>).</p>
<h2>Take It Slow</h2>
<p>If your site has enough PageRank, Google should update your pages in the main index every 3-4 days, if not sooner (e.g. if you show up for Google News) - though dramatic on-page edits like rewriting a TITLE tag might make Google sit on a page for a week or two. It should take you no more than 3 days to get a URL out of the supplemental index, as long as you have enough URLs in the main index to play around with.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/08/08/how-to-exploit-the-pagerankbot-tool.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>How Google Failed to Hide Supplemental Results</title>
		<link>http://seo4fun.com/blog/2007/08/01/how-google-failed-to-hide-supplemental-results.html</link>
		<comments>http://seo4fun.com/blog/2007/08/01/how-google-failed-to-hide-supplemental-results.html#comments</comments>
		<pubDate>Wed, 01 Aug 2007 08:48:34 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/08/01/how-google-failed-to-hide-supplemental-results.html</guid>
		<description><![CDATA[If you&#8217;re an SEO with clients that are worried about supplemental results, your job just got a whole lot harder. It&#8217;s like having a patient dying of a disease that shows no visible symptoms. Not only does he believe he isn&#8217;t sick anymore, but you can&#8217;t tell what he&#8217;s sick with.
First, your clients should know that just [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re an SEO with clients that are worried about supplemental results, your job just got a whole lot harder. It&#8217;s like having a patient dying of a disease that shows no visible symptoms. Not only does he believe he isn&#8217;t sick anymore, but you can&#8217;t tell what he&#8217;s sick with.</p>
<p>First, your clients should know that just because they don&#8217;t see the supplemental results label anymore, it doesn&#8217;t mean their worries are over. Their supplemental pages are still supplemental. Google is just trying to hide the fact.</p>
<p>They should also know that just because the label is gone doesn&#8217;t mean you can&#8217;t detect supplemental results. You can, and here&#8217;s how:</p>
<ul>
<li><strong>site:www.domain.com/&#038; hack</strong>, which seems to pull up urls that used to be labeled supplemental. Of course now that <a href="http://searchengineland.com/070731-215828.php">Danny Sullivan blogged about it</a>, that hack probably won&#8217;t last another week. (UPDATE: The hack was covered last week, according to Danny, the same week I pulled almost all my SEO feeds from Google Reader so I&#8217;m not bombarded by SEO news. Bad timing, I guess)</li>
<li><strong>site:www.domain.com/*</strong> shows pages in the main index.</li>
<li><strong>Old cache date.</strong> If a page&#8217;s cache date is old, it&#8217;s a sign that the page may be supplemental. Why? Because Google doesn&#8217;t refresh a supplemental result&#8217;s cache all that often. For example, my blog&#8217;s main URLs have cache dates ranging from Jul 25~26, 2007 (today&#8217;s date: Aug 1, 2007) while my old supplemental pages have cache dates as old as Jul 6-7, 2007.</li>
<li><strong>Low-to-none competitive term traffic.</strong> If you&#8217;re not getting Google hits for two-word queries or getting no traffic at all to a specific URL, it may be supplemental.</li>
<li><strong>Uneven PageRank distribution</strong>, which you can control by downloading the <a href="/php/pagerankbot.php">Supplemental Results Detector Tool</a>. See how sugarrae.com and seo4fun.com distribute PageRank?<br />
<img src="/images/sugarrae.jpg" alt="sugarrae pagerank distribution" /><br />
See how sugarrae&#8217;s PageRank distribution is pretty even, so that there isn&#8217;t a huge gap between the home page and the deep pages? The site is 99% supplemental results free. Yeah, it&#8217;s a high TBPR site with only ~100 pages (which means plenty of PageRank to go around for each page) but so is vanessafox.com (TBPR 7 with ~100 pages), which has more supplemental results than sugarrae.com.<br />
<img src="/images/seo4fun.jpg" alt="seo4fun pagerank distribution" /><br />
(orange urls are supplemental)<br />
In contrast, seo4fun.com concentrates PageRank on just a handful of pages while the rest of the site gets very little attention. Consequently, some of the urls near the bottom of the chart with low link popularity are supplemental.
</li>
<li><strong>Low Toolbar PageRank</strong> (0 ~ 3). The toolbar is a weak indicator due to update lag, but more green generally means less chance of a page being supplemental.</li>
</ul>
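<p>If you want to put a rough number on how uneven your site&#8217;s distribution is (my own heuristic, not anything Google publishes), check what share of total PageRank your top 10% of pages hold - close to 10% means a flat curve, much higher means a few PageRank hogs:</p>

```python
# Rough evenness check for a site's PageRank distribution: the share of
# total PageRank held by the top 10% of pages. ~0.10 = flat curve; near
# 1.0 = concentrated on a few pages (a heuristic, not a Google metric).

def top_decile_share(pageranks):
    ranked = sorted(pageranks, reverse=True)
    k = max(1, len(ranked) // 10)
    return sum(ranked[:k]) / sum(ranked)

flat = [1.0] * 100                   # perfectly even distribution
spiky = [50.0] * 5 + [0.1] * 95      # a handful of PageRank hogs

print(round(top_decile_share(flat), 2))   # 0.1 - even
print(round(top_decile_share(spiky), 2))  # ~0.97 - concentrated
```

<p>Feed it the raw PageRank column PageRankBot gives you and compare your site against a flatter one.</p>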
<p>TANGENT:</p>
<p>(One interesting tidbit I found in <a href="http://googlewebmastercentral.blogspot.com/2007/07/supplemental-goes-mainstream.html">the recent Google blog post</a> is Matt/Prashanth Koppula saying a url with complicated query strings also might go supplemental. At this point (considering the fact that Dave said stale pages go supplemental as well) it&#8217;s probably safe to assume a myriad of minor factors are involved)</p>
<p>(After reading the post, reasons why Google likes supplemental results are pretty clear:</p>
<p>1. Crawl the web more fully to serve ~1000 results (or maybe Google&#8217;s satisfied with just 10-100) for every possible search query, which means a) a happier user and b) more pages to display AdWords on.<br />
2. Improve efficiency by taking advantage of prioritized crawling: crawl important, frequently updated pages more often while crawling less important, rarely updated pages less frequently. Unfortunately, this often means only home page/top-level nav pages get indexed while pages with actual content fail to make it into the main index. I often get frustrated by a search result that lands me on a blog category page with 40+ blog post links instead of the blog post itself.)</p>
<p>Wrapping up:</p>
<p>A site with many pages in the main index receives traffic for competitive two-word queries. Traffic lands not just on a handful of pages but on thousands of pages. Googlers promise that, by the end of the summer, supplemental results will generate more traffic and will rank for more terms. We&#8217;ll see. There are a ton of spam pages in the supplemental index, so Google will have to walk a thin line - otherwise odd query terms will be swamped with low PageRank spam while legitimate supplemental results never see the light of day.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/08/01/how-google-failed-to-hide-supplemental-results.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Hey, it Just Sounds to Me Like You Need to Unplug, Man.</title>
		<link>http://seo4fun.com/blog/2007/06/22/hey-it-just-sounds-to-me-like-you-need-to-unplug-man.html</link>
		<comments>http://seo4fun.com/blog/2007/06/22/hey-it-just-sounds-to-me-like-you-need-to-unplug-man.html#comments</comments>
		<pubDate>Fri, 22 Jun 2007 14:50:49 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/06/22/hey-it-just-sounds-to-me-like-you-need-to-unplug-man.html</guid>
		<description><![CDATA[
Image courtesy of What Is the Matrix
If you spend hours a day surfing blogs, you&#8217;re trapped in the Matrix. Being an ex-hardcore MMORPG addict makes me an expert on getting sucked in. In that world, I met thousands of people,  I slew dragons, I saved lives.
Meanwhile, in the real world, I sat in front [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/images/reloaded_01.jpg" alt="Matrix Reloaded Neo and Trinity" /></p>
<p><strong style="font-size:11px;">Image courtesy of <a href="http://whatisthematrix.warnerbros.com/">What Is the Matrix</a></strong></p>
<p>If you spend hours a day surfing blogs, you&#8217;re trapped in the Matrix. Being an ex-hardcore MMORPG addict makes me an expert on getting sucked in. In that world, I met thousands of people,  I slew dragons, I saved lives.</p>
<p>Meanwhile, <em>in the real world</em>, I sat in front of my computer for hours tapping on my keyboard and staring into my monitor.</p>
<blockquote><p>I know why you&#8217;re here, Neo. I know what you&#8217;ve been doing&#8230; why you hardly sleep, why you live alone, and why night after night, you sit by your computer.</p></blockquote>
<p>Living in a <a href="http://www.newswise.com/articles/view/505712/">cocooning world</a>, you give in to your urge to socialize - to connect to other people, to not feel alone, to feel important, to feel wanted.</p>
<p>Like throwing a rock into a lake and seeing ripples on the water or shouting at a mountain to hear the echoes of your voice, you feel a need to confirm your own existence.</p>
<p>But does reading about Google policing paid links help you get your laundry done? Does knowing that Yahoo redesigned its home page take care of your phone bills? Does leaving a comment about why Jason Calacanis is wrong pay for your baby&#8217;s diapers?</p>
<blockquote><p>You know, I know this steak doesn&#8217;t exist. I know that when I put it in my mouth, the Matrix is telling my brain that it is juicy and delicious. After nine years, you know what I realize?</p></blockquote>
<p>Ignorance is bliss.</p>
<p>When I wake up in the morning, I instinctively fire up Google Reader to jump-start my brain. Since I&#8217;m my own boss, sometimes that quick peek turns into hours of reading and commenting and posting, and I get no work done.</p>
<p>Here&#8217;s a thought du jour:</p>
<p>You already know enough. </p>
<p>&#8220;Hey, it just sounds to me like you need to unplug, man. You know, get some R and R&#8221; - The Matrix</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/06/22/hey-it-just-sounds-to-me-like-you-need-to-unplug-man.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>JDBC ClassNotFoundException (NetBeans, Classpath, Java)</title>
		<link>http://seo4fun.com/blog/2007/06/16/jdbc-classnotfoundexception-netbeans-classpath-java.html</link>
		<comments>http://seo4fun.com/blog/2007/06/16/jdbc-classnotfoundexception-netbeans-classpath-java.html#comments</comments>
		<pubDate>Sat, 16 Jun 2007 12:48:20 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Coding]]></category>

		<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/06/16/jdbc-classnotfoundexception-netbeans-classpath-java.html</guid>
		<description><![CDATA[If you get a java.lang.ClassNotFoundException error when loading a database driver using the statement:
Class.forName({nameOfYourDriverWhateverItIs}).newInstance();
You can either:
Set CLASSPATH in DOS
Go into DOS (Start/Run/cmd.exe):
set CLASSPATH=.;{pathToYourJarFile}
For example, if your jar file is at: C:/Program
Files/java/jdk1.6.0_01/lib/mysql-connector-java-5.0.6-bin.jar,
Type:
set CLASSPATH=.;C:/Program Files/java/jdk1.6.0_01/lib/mysql-connector-java-5.0.6-bin.jar
Now,
javac YourJavaFile.java
java YourJavaFile
That&#8217;s all. But it won&#8217;t work if you&#8217;re trying to run code in Netbeans.
Set Your Project&#8217;s Classpath in Netbeans
If you&#8217;re using [...]]]></description>
			<content:encoded><![CDATA[<p>If you get a java.lang.ClassNotFoundException error when loading a database driver using the statement:</p>
<p>Class.forName({nameOfYourDriverWhateverItIs}).newInstance();</p>
<p>You can either:</p>
<h2>Set CLASSPATH in DOS</h2>
<p>Go into DOS (Start/Run/cmd.exe):</p>
<p>set CLASSPATH=.;{pathToYourJarFile}</p>
<p>For example, if your jar file is at: C:/Program<br />
Files/java/jdk1.6.0_01/lib/mysql-connector-java-5.0.6-bin.jar,</p>
<p>Type:</p>
<p>set CLASSPATH=.;C:/Program Files/java/jdk1.6.0_01/lib/mysql-connector-java-5.0.6-bin.jar</p>
<p>Now,</p>
<p>javac YourJavaFile.java<br />
java YourJavaFile</p>
<p>That&#8217;s all. But <strong>it won&#8217;t work if you&#8217;re trying to run code in Netbeans.</strong></p>
<h2>Set Your Project&#8217;s Classpath in Netbeans</h2>
<p>If you&#8217;re using NetBeans, <a href="http://www.netbeans.org/kb/41/using-netbeans/project_setup.html#manageclasspath">set your project&#8217;s classpath</a>:</p>
<p>1. Right-click on your project.<br />
2. Select &#8220;Properties.&#8221;<br />
3. Click &#8220;Libraries.&#8221;<br />
4. Click &#8220;Add JAR/Folder.&#8221;<br />
5. Choose your MySQL driver JAR file.</p>
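If you want to verify the fix, you can probe the classpath directly. This is a minimal sketch, not the blog&#8217;s original code: the driver class name below is a placeholder (for the 5.x MySQL Connector/J it was com.mysql.jdbc.Driver; substitute whatever your JAR provides).

```java
public class DriverCheck {
    /** Returns true if the named class can be loaded from the classpath. */
    static boolean onClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder driver class name - substitute the one from your JAR
        // (e.g. com.mysql.jdbc.Driver for the 5.x MySQL Connector/J).
        String driver = args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        System.out.println(driver + " on classpath: " + onClasspath(driver));
    }
}
```

If this prints false inside NetBeans but true from the command line, the problem is your project&#8217;s Libraries list, not your code.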
<p><b>HEY. Did you find this post useful as hell? </b> I know you did. Then link to this page, Stumble it, Del.icio.us it, do whatever you can to pump this page up the Google SERPs so other people looking for this info can find it easier. Thanks.</p>
<h2>Why This Has Anything to Do With SEO</h2>
<p><img style="float:right;margin:7px;" src="/images/java-s.jpg" alt="Java netbeans classnotfoundexception" /></p>
<p>It took me over 30 minutes to find this solution. I was <em>this close</em> to throwing in the towel. So once I figured out the solution, I wrote the page I wish I had found at the top of Google&#8217;s search results when I looked for &#8220;jdbc classnotfoundexception.&#8221; That way, other people looking for the same information won&#8217;t get frustrated like I was. Because I&#8217;m working in a long-tail space, to get this page to rank high I just optimized the content. But I didn&#8217;t keyword-spam or pepper my H tags with related terms. Instead, I optimized for <em>you</em>.</p>
<p>Yep, you.</p>
<p>Ok, so it&#8217;s not perfectly optimized for you if you&#8217;re an RSS subscriber of mine, since you&#8217;re interested in SEO, not Java. But if you found this page through Google, you&#8217;re thanking me now because:</p>
<ul>
<li><em>I give you exactly what you were looking for.</em></li>
<li><em>I get straight to the point.</em> Instead of starting off with a long irrelevant opening paragraph, I explain the problem and give you the answer. <strong>Instant gratification baby</strong>.</li>
<li><em>I keep it short.</em> I use as few words as I can to save you time.</li>
<li><em>I keep it simple.</em> Instead of trying to crack you up with dumb jokes or titillate you with fancy metaphors, I use simple words so even a caveman can &#8220;get it.&#8221;</li>
</ul>
<p>Some people say ranking is about links, not content. Is it better to be tall and rich, or loyal and charismatic? Why are marketers trying to convince you it&#8217;s all about looks? (Yeah, just think about that for a minute :D)</p>
<p>Given two websites with equal visibility, the site that publishes the most compelling content will always win, showering the site with even more visibility. If marketing were all it took, Paris Hilton would have sold more CDs.</p>
<h2>Related Articles:</h2>
<p>A few articles that both helped me and frustrated me (hint: natural outbound links to relevant, authoritative pages will help you win Google&#8217;s trust):</p>
<ul>
<li><a href="http://java.sun.com/docs/books/tutorial/jdbc/basics/connecting.html">Establishing a Connection (The Java Tutorials)</a></li>
<li><a href="http://dev.mysql.com/doc/refman/5.0/en/connector-j-usagenotes-basic.html#connector-j-examples-connection-drivermanager">Basic JDBC Concepts</a></li>
<li><a href="http://java.sun.com/docs/books/tutorial/jdbc/basics/index.html">Lesson: JDBC Basics</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/06/16/jdbc-classnotfoundexception-netbeans-classpath-java.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Got Supplementals? Accepting PageRank is Only The Beginning</title>
		<link>http://seo4fun.com/blog/2007/06/07/got-supplementals-accepting-pagerank-is-only-the-beginning.html</link>
		<comments>http://seo4fun.com/blog/2007/06/07/got-supplementals-accepting-pagerank-is-only-the-beginning.html#comments</comments>
		<pubDate>Thu, 07 Jun 2007 15:28:34 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/06/07/got-supplementals-accepting-pagerank-is-only-the-beginning.html</guid>
		<description><![CDATA[
Nowadays, supplemental results aren&#8217;t much of a mystery anymore to most people. As Matt Cutts replied to Michael Martinez at Seattle SMX, people know all they need to do is to get more &#8220;quality links.&#8221; The answer is so simple, so easily digestible that I&#8217;m starting to see people answer supplemental threads with just three [...]]]></description>
			<content:encoded><![CDATA[<p><img style="float:left;margin:10px;" src="/images/supplemental-pizza.jpg" alt="supplemental results pizza" /></p>
<p>Nowadays, supplemental results aren&#8217;t much of a mystery anymore to most people. As <a href="http://seo-theory.com/wordpress/2007/06/05/seo-theory-and-smx-advanced-2007/">Matt Cutts replied to Michael Martinez at Seattle SMX</a>, people know all they need to do is to get more &#8220;quality links.&#8221; The answer is so simple, so easily digestible that I&#8217;m starting to see people answer supplemental threads with just three words - <em>PageRank</em> and <em>duplicate content</em>. As if.</p>
<p>What makes you gain weight? <em>Eating too much.</em> Duh. Knowing that doesn&#8217;t help you much, does it? How can you lose weight and keep it off? Knowledge isn&#8217;t power. <strong>Actionable knowledge</strong> is power.</p>
<h2>Understanding Supplemental Results and PageRank Distribution</h2>
<p>You have countless internal-link-based tactics at your disposal to combat supplemental results. In a thread on WMW titled <a href="http://www.webmasterworld.com/google/3352115.htm">Supplemental Page Count Formula?</a>, I summarized:</p>
<blockquote><p>If your site is largely supplemental, it means 1) not enough quality inbound links to your site, 2) you have too many pages, 3) you link out too much, 4) Google may think your IBLs are artificial, or 5) canonical issues are causing PageRank to split.</p></blockquote>
<p>Bouncybunny replies:</p>
<blockquote><p> I&#8217;ve never heard points 2 + 3 being relevant for pages falling into the supplementals. </p></blockquote>
<p>I&#8217;ll leave the discussion about PageRank leaks for another day. As for point 2, I explained (in geek speak):</p>
<blockquote><p>Think of total PageRank X (sum of all inbound PageRank to your domain) split between Y number of pages. Roughly speaking, bigger page count = lower average PageRank per page (depending on your site structure). We know that a page with PageRank below minimum threshold &#8220;goes&#8221; supplemental. With excessively high page count, average falls too low, and you&#8217;ll end up with many pages in the supplemental index. By reducing the number of pages, you slightly increase average PageRank per url. That can result in several supp pages popping back into the main index.</p></blockquote>
<p>As a matter of fact, Shoemoney claims he got rid of some of his supplemental results by following Aaron Wall&#8217;s advice: disallow noisy pages in robots.txt.</p>
<h2>Don&#8217;t Cut Up Your Pizza Into 60,000 slices if You Ordered a Small Pie</h2>
<p>Thanks to Andy Beal, <a href="http://www.marketingpilgrim.com/2007/06/smx-video-matt-cutts-explains-how-to-get-out-of-googles-supplemental-index.html">you can hear Matt Cutts</a> say basically the same thing:</p>
<blockquote><p>
If you got 60,000 pages, and you only got &#8220;this much&#8221; PageRank, and you divide it [&#8230;he mumbles], some of them are going to be in the supplemental index. Given &#8220;this many people&#8221; who link to you, we&#8217;re willing to include &#8220;this many&#8221; pages in the main index.</p></blockquote>
<p>The picture below shows you how PageRank is distributed on this domain, assuming an artificial scenario where all inbound links are ignored:</p>
<p><a href="/images/pagerank-distribution-suppl.jpg"><img src="/images/supplemental-results-tiny-t.jpg" alt="supplemental results pagerank distribution" /></a></p>
<p>(click on the thumbnail to see the details)</p>
<blockquote><p><strong>Aside:</strong> I generated this chart using Google Docs, Photoshop, and my <a href="/php/pagerankbot.php">Supplemental Results Detector</a>, which a few of you guys might remember I wrote back in December 2006. It&#8217;s a simple script that emulates Google&#8217;s PageRank iteration. Though it ignores inbound links and uses the original PageRank formula, where all PageRanks add up to the total number of pages on a domain instead of 1, it&#8217;s a pretty good indicator of which pages on your site are prone to go supplemental. It&#8217;s a gift from above for PageRank hoarders, but at the same time, it helps you organize your internal links more strategically.</p></blockquote>
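The iteration such a script emulates can be sketched in a few lines. This is a toy version under the assumptions stated in the aside (the original un-normalized formula where scores sum to the page count, inbound links ignored, no dangling-page handling), and the 3-page graph below is hypothetical, not this domain:

```java
import java.util.Arrays;

public class PageRankSketch {
    // Original formula: PR(p) = (1 - d) + d * sum over inlinking pages q of PR(q)/outdeg(q).
    // In this form all scores sum to N (the number of pages), not to 1.
    static double[] pageRank(int[][] outLinks, int iterations, double d) {
        int n = outLinks.length;
        double[] pr = new double[n];
        Arrays.fill(pr, 1.0); // every page starts at 1.0
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, 1.0 - d);
            for (int q = 0; q < n; q++) {
                for (int p : outLinks[q]) {
                    next[p] += d * pr[q] / outLinks[q].length; // q passes a share of its rank to p
                }
            }
            pr = next;
        }
        return pr;
    }

    public static void main(String[] args) {
        // Hypothetical 3-page site: page 0 (home) links to pages 1 and 2; both link back home.
        int[][] links = { {1, 2}, {0}, {0} };
        System.out.println(Arrays.toString(pageRank(links, 50, 0.85)));
    }
}
```

Add more pages to the graph without adding any new rank and the per-page averages drop - the shrinking-slices effect: the home page hogs the pie while deep pages starve.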
<p>Notice:</p>
<ul>
<li><strong>More inbound links = bigger pie.</strong> As you guys link to me more often (hint hint), my pizza gets bigger, which makes all my slices grow. Phatter slices mean fewer supplemental results.</li>
<li><strong>More artificial links = smaller pie.</strong> Reciprocal links, cheap directory links, link injections, easily detectable paid links - all of these work to some extent if you play it like <a href="http://en.wikipedia.org/wiki/Sam_Fisher">Sam Fisher</a>. Uline.com ranks on the first page for &#8220;cardboard boxes&#8221;, Thisisouryear ranks first for &#8220;website directory&#8221;, and customermagnetism ranks 13th for &#8220;search engine optimization&#8221;, all thanks in part to paid links. Major adult sites also dominate competitive porn terms using hundreds of thousands of reciprocal links. But having tons of artificial links pointing at your site makes them easier for Google to detect. If Google decides to devalue the PageRanks passed by those links, your large pizza turns into a medium, sometimes causing your site to enter the realm of <a href="http://www.seroundtable.com/archives/013340.html">Google Hell</a>.</li>
<li><strong>More pages = smaller slices.</strong> As I publish more posts, I create more slices, causing all my slices to shrink. What happens to a commercial site that publishes 100,000 new pages in one day? Or what about a blogger who publishes 10 posts a day but gets completely ignored by the linkerati? They often go 99% supplemental because they&#8217;re adding a ton more slices while the pie stays the same size. More slices mean smaller slices. And if they&#8217;re too small, they &#8220;go supplemental.&#8221; But if I publish something useful, people will link to me, increasing the size of my pie.</li>
<li><strong>Fewer pages = bigger slices.</strong> Conversely, deleting pages causes the size of my other slices to grow. It&#8217;s like a page &#8220;taking one for the team.&#8221;</li>
</ul>
<h2>Move Away from the Default WordPress Setup for Better PageRank Flow</h2>
<p>The chart also gives you an idea of how the default WordPress template distributes PageRank. The blog home page gets the most love; category pages are second in line; recent posts are third. The second page of your category pages (e.g. /category/seo/2/) and old posts have the least internal link juice flowing into them by default.</p>
<p>If you use the Recent Posts plugin, your recently published post gets love from the blog home page, the first page of your category and archive pages, and every single post page. If you install a Top Posts plugin, you can direct extra juice (and traffic) to your favorite posts (see <a href="http://www.jimboykin.com/">Jim Boykin&#8217;s blog</a>, though the way he has it set up is kinda fugly). The Related Posts plugin can help maintain internal linkage between old posts.</p>
<h2>Internal Linking Tactics Are Half the Battle</h2>
<p>As Adam Lasnik would probably tell you, the best cure for supplemental results is to create original, compelling content, market it, and earn the links you deserve. But you can also get mileage out of working with the PageRank you already have. Think of it like tweaking one of your landing pages to improve your CTR. A 2% increase can add up to a lot of money. Remember, people used to believe duplicate text caused supplemental results. But <strong>it&#8217;s duplicate URLs creating more slices than you need</strong> that are partly to blame, as <a href="http://www.vanessafoxnude.com/2007/06/06/buffy-in-duplicate/">Vanessa Fox recently confirmed</a> on her blog:</p>
<blockquote><p>Does having duplicate content cause sites to be placed there? Nope, that’s mostly an indirect effect. If you have pages that are duplicates or very similar, then <strong>your backlinks are likely distributed among those pages</strong>, so your PageRank may be more diluted than if you had one consolidated page that all the backlinks pointed to. And lower PageRank may cause pages to be supplemental.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/06/07/got-supplementals-accepting-pagerank-is-only-the-beginning.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>How Will People Find Your Site If Search Engines Didn&#8217;t Exist?</title>
		<link>http://seo4fun.com/blog/2007/05/22/how-will-people-find-your-site-if-search-engines-didnt-exist.html</link>
		<comments>http://seo4fun.com/blog/2007/05/22/how-will-people-find-your-site-if-search-engines-didnt-exist.html#comments</comments>
		<pubDate>Tue, 22 May 2007 15:55:17 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/05/22/how-will-people-find-your-site-if-search-engines-didnt-exist.html</guid>
		<description><![CDATA[Summary (for people who don&#8217;t have time to read blogs all day. We should be building stuff instead of bitching about Google, yeah?): If you depend on Google to survive, your site sucks. If your site gets the same level of traffic from Google a day but your daily visits isn&#8217;t rising, your site sucks. [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Summary</strong> (for people who don&#8217;t have time to read blogs all day. We should be building stuff instead of bitching about Google, yeah?): If you depend on Google to survive, your site sucks. If your site gets the same level of traffic from Google each day but your daily visits aren&#8217;t rising, your site sucks. For big companies, SEO is just an afterthought. Universalstudios.com has a few SEO flaws, but it doesn&#8217;t matter.</p>
<h2>Big Companies Are Already Highly Visible</h2>
<p>In the white hat world, you need to develop a valuable product and then launch a marketing campaign to bring eyeballs to your product. If you&#8217;re a newbie webmaster with a domain name no one&#8217;s ever heard of, you have a long road ahead of you. First, you need to build a product to sell (a.k.a. a website). Pour millions of visitors on a piece of crap and your conversion ratios will look worse than plentyoffish&#8217;s, and traffic will go right through you like you weren&#8217;t even there. Once you&#8217;re done developing a great product, you need to go on a marketing blitz, because the greatest website in the world will sound like a tree that fell in a forest if no one knows your site exists.</p>
<p>On the other hand, Universal Studios, MGM, McDonalds, IBM, Apple &#8230; these companies existed long before the birth of search engines. These companies have access to mass media (TV news, commercials, billboards, radio, newspapers, magazines, movies). They branded their names permanently into our collective psyche. Even if Google didn&#8217;t exist, people know how to find them on the web.</p>
<h2>Google is Just a Middle Man</h2>
<p>When do you need Google? You need Google when you&#8217;re looking for something but don&#8217;t know where to find it. But as time goes on, you figure out where to find whatever you&#8217;re looking for, so that Google becomes a middle man you no longer need. If you were looking for <a href="http://www.nbc.com/Heroes/bios/niki.shtml" rel="nofollow">Ali Larter</a>&#8217;s pics, for example, <a href="http://images.google.com/images?svnum=10&#038;um=1&#038;hl=en&#038;safe=off&#038;client=firefox-a&#038;rls=org.mozilla%3Aen-US%3Aofficial&#038;hs=2P&#038;q=%22ali+larters%22&#038;btnG=Search+Images">you might first surf Google Images</a>. If you were looking for <a href="http://www.imdb.com/name/nm0005123/" rel="nofollow">a list of movies she starred</a> in, you go to IMDb. If you wanted to know <a href="http://en.wikipedia.org/wiki/Asia_Carrera" rel="nofollow">the names of Asia Carrera&#8217;s kids</a> or if she&#8217;s still having financial problems, you either go to <a href="http://www.asiacarrera.com/" rel="nofollow">her official website</a> or go to Wikipedia. If you wanted to buy a DVI adapter or a new PSU, you go to Newegg. If you wanted to download MTV videos on YouTube that might disappear by tomorrow, you visit KeepVid. If you felt like <a href="http://twitter.com/vanessafox">stalking Vanessa Fox</a>, you&#8217;d go to Twitter (Vanessa, don&#8217;t worry. I&#8217;m too busy to stalk anyone :D).</p>
<p>See? If you know where to find what you&#8217;re looking for, you don&#8217;t really need Google, do you? (unless you&#8217;re thinking of sites like WebmasterWorld with lousy search features)</p>
<h2>For Big Brand Names, SEO is An Afterthought</h2>
<p>For companies like Universal Studios, SEO is almost an afterthought. The bulk of their &#8220;marketing&#8221; that&#8217;s been going on for decades penetrates households that don&#8217;t even own a computer or don&#8217;t have enough money to pay for internet access.</p>
<p>If you&#8217;ve already got high visibility, all you really need to do is build a razzle-dazzle 59-points-out-of-60 website that <strong>turns your visitors into marketers</strong>.</p>
<h2>Building a Website with Compounding Traffic</h2>
<p>Have you ever lost thousands of daily uniques because your Google ranking suddenly tanked?</p>
<p>You have? Ok. But what happened to the 30,000 people who visited your site last week?</p>
<p>If your site&#8217;s stickier than <a href="http://en.wikipedia.org/wiki/Glue" rel="nofollow">cyanoacrylate</a>, 100 daily uniques from Google would pile up into 1,000 visitors/day after 10 days. Do you see that happening with your site? If your site&#8217;s traffic isn&#8217;t rising, you need to work on content, not SEO. I&#8217;m not just talking about building more pages. I&#8217;m talking about injecting value into your site, making it amazing, mind-blowing, unforgettable. Sure, adding more pages targeting more phrases might increase the number of incoming daily hits, but what good is a million pennies in your pocket if it has a gaping hole in it?</p>
<p>In investing, you look for high interest rates that compound year after year after year. Investing $4K a year in a Roth IRA at an 8% average annual return, for example, will turn you into a millionaire in about 40 years, all thanks to compounding interest.</p>
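The compounding arithmetic is easy to check for yourself. A quick sketch, with illustrative numbers (a $4K deposit at the start of each year, a constant 8% annual return - real returns obviously vary):

```java
public class Compounding {
    // Future value of depositing `annual` at the start of each year,
    // growing at rate r, after `years` years (an annuity due).
    static double futureValue(double annual, double r, int years) {
        double total = 0.0;
        for (int y = 0; y < years; y++) {
            total = (total + annual) * (1 + r); // deposit, then a year of growth
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.printf("30 years: $%,.0f%n", futureValue(4000, 0.08, 30));
        System.out.printf("40 years: $%,.0f%n", futureValue(4000, 0.08, 40));
    }
}
```

Run it and you see the compounding curve is slow at first and steep later - roughly $490K after 30 years, crossing $1M around year 40. Same shape you want from your traffic.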
<p>Most people know this, yet they often don&#8217;t put much effort into building a website with <strong>compounding traffic</strong>.</p>
<h2>Universal Studios Couldn&#8217;t Care Less About SEO</h2>
<p>Universal Studios obviously hasn&#8217;t bothered to SEO their official website, universalstudios.com:</p>
<ul>
<li><a href="http://www.universalvod.net/index.html?__source=USMN.GNAV" rel="nofollow">100% Flash pages like this</a> make some SEOs frown. But see, it&#8217;s a contradiction: on the one hand, people say SEO is 99.9% about links; on the other, they say all Flash and no text is bad SEO. Makes you wonder if the people spreading these ideas are capable of logical thinking.
</li>
<li><a href="http://www.universalstudios.com/" rel="nofollow">Home page redirects</a> to index.php. That wouldn&#8217;t necessarily be bad if it were a 301 redirect. But it&#8217;s a 302.</li>
<li>The <a href="http://www.google.com/search?hl=en&#038;safe=off&#038;client=firefox-a&#038;rls=org.mozilla%3Aen-US%3Aofficial&#038;hs=naU&#038;q=site%3Ahttp%3A%2F%2Fwww.universalstudios.com%2F&#038;btnG=Search" rel="nofollow">second site: search result</a> triggers a 404.</li>
<li>According to the home page META keywords, the home page wants to rank for matt damon, ppv, vod, on demand, pay-per-view, and jerry springer. Despite the TBPR 7, universalstudios.com ranks for none of those words because the IBL anchor texts are untargeted. According to SEO Digger, the home page does rank for over 637 terms.</li>
</ul>
<p>It doesn&#8217;t matter, though. If Google vanished tomorrow, people would still visit universalstudios.com. What would happen to your site if Google vanished tomorrow?</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/05/22/how-will-people-find-your-site-if-search-engines-didnt-exist.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Stop Blowing Money on Text Link Ads</title>
		<link>http://seo4fun.com/blog/2007/05/14/stop-blowing-money-on-text-link-ads.html</link>
		<comments>http://seo4fun.com/blog/2007/05/14/stop-blowing-money-on-text-link-ads.html#comments</comments>
		<pubDate>Mon, 14 May 2007 16:20:03 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/05/14/stop-blowing-money-on-text-link-ads.html</guid>
		<description><![CDATA[Not all paid links (don&#8217;t forget to read the update) are easy to detect, but if you think Google can&#8217;t detect blatant schemes like Text Link Ads you&#8217;re dreaming. They stick out like a geek on American Idol because:

Anchor text targets money terms. Anchor text like &#8220;buy viagra&#8221;, &#8220;car insurance&#8221;, and &#8220;search engine optimization&#8221; are [...]]]></description>
			<content:encoded><![CDATA[<p>Not all <a href="http://www.mattcutts.com/blog/how-to-report-paid-links/">paid links</a> (don&#8217;t forget to read the update) are easy to detect, but if you think Google can&#8217;t detect blatant schemes like Text Link Ads you&#8217;re dreaming. They stick out like a geek on American Idol because:</p>
<ol>
<li><strong>Anchor text targets money terms.</strong> Anchor text like &#8220;buy viagra&#8221;, &#8220;car insurance&#8221;, and &#8220;search engine optimization&#8221; are more likely to raise Google&#8217;s suspicion than if you used anchor text like &#8220;click here for more info on why Text Link Ads suck.&#8221;</li>
<li><strong>Links are off-topic.</strong> (e.g. &#8220;online casino&#8221; link on a &#8220;paris hotel&#8221; site)</li>
<li><strong>No context.</strong> Anchor text without context is like a fish out of water. Sure, there are legit reasons for linking out from your sidebar. But the reason is unclear because there&#8217;s no surrounding text to provide context.</li>
</ol>
<p>If you&#8217;re buying links through link brokers like Text Link Ads, <strong>stop wasting your money</strong>. Your chance of avoiding detection is lower than Boris Yeltsin scoring a one-night stand with Monica Bellucci. Instead, surf on over to Technorati, search for topics related to your niche, sort by authority, and start at the top of your list. Email the owner of every blog you see there, offering them money for blogging about your site. Remember, <strong>every blogger has a price</strong>, so don&#8217;t take no for an answer.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/05/14/stop-blowing-money-on-text-link-ads.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>If You Lack Motivation, Read This</title>
		<link>http://seo4fun.com/blog/2007/04/29/if-you-lack-motivation-read-this.html</link>
		<comments>http://seo4fun.com/blog/2007/04/29/if-you-lack-motivation-read-this.html#comments</comments>
		<pubDate>Sun, 29 Apr 2007 06:00:59 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/04/29/if-you-lack-motivation-read-this.html</guid>
		<description><![CDATA[Success is going from failure to failure without a loss of enthusiasm. - Winston Churchill
Work is the refuge of people who have nothing better to do. - Oscar Wilde
There&#8217;s a difference between knowing the path and walking the path - Morpheus, The Matrix
If we keep doing what we&#8217;re doing, we&#8217;re going to keep getting what [...]]]></description>
			<content:encoded><![CDATA[<p>Success is going from failure to failure without a loss of enthusiasm. - Winston Churchill</p>
<p>Work is the refuge of people who have nothing better to do. - Oscar Wilde</p>
<p>There&#8217;s a difference between knowing the path and walking the path - Morpheus, The Matrix</p>
<p>If we keep doing what we&#8217;re doing, we&#8217;re going to keep getting what we&#8217;re getting. - Stephen R. Covey</p>
<p>If you are failing to plan, you are planning to fail.</p>
<p>Don&#8217;t sweat the small stuff&#8230;and it&#8217;s all small stuff.</p>
<p>The measure of success is not whether you have a tough problem to deal with, but whether it&#8217;s the same problem you had last year.</p>
<p>Everything that irritates us about others can lead us to an understanding of ourselves.</p>
<p>Work is love made visible. And if you cannot work with love but only with distaste, it is better that you should leave your work and sit at the gate of the temple and take alms of those who work with joy.</p>
<p>If you don&#8217;t do it excellently, don&#8217;t do it at all. Because if it&#8217;s not excellent, it won&#8217;t be profitable or fun, and if you&#8217;re not in business for fun or profit, what the hell are you doing there?</p>
<p>Many attempts to communicate are nullified by saying too much.</p>
<p>I am grateful for all my problems. I became stronger and more able to meet those that were still to come. - J.C. Penney</p>
<p>The vision must be followed by the venture. It is not enough to stare up the steps - we must step up the stairs.</p>
<p>If each of us sweeps in front of our own steps, the whole world would be clean.</p>
<p>Flow with whatever is happening and let your mind be free. Stay centered by accepting whatever you are doing. This is the ultimate.</p>
<p>If the world seems cold to you, kindle fires to warm it.</p>
<p>The heart of a fool is in his mouth, but the mouth of the wise man is in his heart.</p>
<p>All that is gold does not glitter; not all those that wander are lost.</p>
<p>More <a href="http://www.heartquotes.net/Business.html">wise sayings</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/04/29/if-you-lack-motivation-read-this.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google&#8217;s Motives Are Selfish - So Are Yours and Mine</title>
		<link>http://seo4fun.com/blog/2007/04/17/googles-motives-are-selfish-so-are-yours-and-mine.html</link>
		<comments>http://seo4fun.com/blog/2007/04/17/googles-motives-are-selfish-so-are-yours-and-mine.html#comments</comments>
		<pubDate>Tue, 17 Apr 2007 23:52:24 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/04/17/googles-motives-are-selfish-so-are-yours-and-mine.html</guid>
		<description><![CDATA[As Graywolf said, Google&#8217;s motive for cleaning up the SERPs is self-serving and revenue-driven. Robert Scoble explains in more detail:
Why does Google care? Well, Google’s relevancy rankings will be hurt if people can buy their way onto their pages instead of earn their way to those search results pages by doing the best content, etc. [...]]]></description>
			<content:encoded><![CDATA[<p>As <a href="http://www.wolf-howl.com/google/how-can-so-many-phds-be-so-wrong/">Graywolf said</a>, Google&#8217;s motive for cleaning up the SERPs is self-serving and revenue-driven. <a href="http://scobleizer.com/2007/04/14/google-to-penalize-bloggers-selling-links/">Robert Scoble</a> explains in more detail:</p>
<blockquote><p>Why does Google care? Well, Google’s relevancy rankings will be hurt if people can buy their way onto their pages instead of earn their way to those search results pages by doing the best content, etc. Lots of people are doing comparisons of Google’s search results to Yahoo, Ask, and Microsoft’s search engines. If Google’s result set isn’t the best Google’s market share will start to go down as people figure out there are better engines out there. That, in turn, will hurt Google’s advertising business.</p>
<p>Not to mention that if advertisers know there’s a cheaper way to get onto Google’s search engine than by buying an ad, they’ll go with that system. So, Google has a LOT of incentive to swat down PayPerPost and pay-per-link style systems.</p></blockquote>
<p><a href="http://www.seobook.com/archives/002163.shtml">Aaron Wall</a> also points out that Google would make more money on Adwords if it made search results harder to game:</p>
<blockquote><p>
The more I think about it the more I realize why Google doesn&#8217;t like the various flavors of paid links. It has <strong>nothing</strong> to do with organic search relevancy. </p></blockquote>
<p>Not quite. Google cares <strong>a lot</strong> about organic search relevancy. But here&#8217;s the catch. As I commented on Aaron&#8217;s blog,</p>
<blockquote><p>
Google wants informational sites on the organic results front page while forcing commercial sites to battle it out in the right column. Searchers looking for information on &#8220;coffee&#8221; will be happy with the organic results (Wikipedia, nationalgeographic, coffeereview, coffeeuniverse), and searchers looking to buy coffee will click on adwords and buy.</p>
<p>If commercial sites show up in organic results, there&#8217;s no reason for people to click on Adwords.</p>
<p>That means Google not only wants highly relevant organic results, but highly <strong>relevant results that are also non-commercial.</strong></p></blockquote>
<p>(Time to write meatier, more &#8220;informational&#8221; articles and dump those aff links from thin pages, huh?)</p>
<p>So Google is cracking down on paid links to increase its profit margin.</p>
<p>So what?</p>
<p>Do you read SEO blogs because they&#8217;re fun to read? Do you buy Adword ads because you want to help people find better products? Give me a break. I run ads to make money - pure and simple. And just because I have selfish reasons for giving you a great product doesn&#8217;t take away from the value of my product. Sure, Sony makes money off selling PS3s - so what? You want them to make them for free?</p>
<p>We&#8217;re all in it for the money, so the &#8220;Google is selfish&#8221; objection doesn&#8217;t wash.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/04/17/googles-motives-are-selfish-so-are-yours-and-mine.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>21 Reasons Why Anti-Nofollow SEOs Can&#8217;t Think Straight</title>
		<link>http://seo4fun.com/blog/2007/04/16/21-reasons-why-anti-nofollow-seos-cant-think-straight.html</link>
		<comments>http://seo4fun.com/blog/2007/04/16/21-reasons-why-anti-nofollow-seos-cant-think-straight.html#comments</comments>
		<pubDate>Mon, 16 Apr 2007 18:30:46 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/04/16/21-reasons-why-anti-nofollow-seos-cant-think-straight.html</guid>
		<description><![CDATA[Paid link buyers and sellers are nothing but black hat spammers (though you&#8217;ll never catch me saying spamming is right or wrong - to quote Vlad from Max Payne 2, &#8220;you have to do what you have to do.&#8221;). Now I&#8217;m hearing a lot of regurgitation going on, so I compiled a list of 21 major [...]]]></description>
			<content:encoded><![CDATA[<p>Paid link buyers and sellers are nothing but <a href="http://seoblackhat.com/2007/04/15/a-hearty-welcome-to-all-the-new-search-engine-spammers/">black hat</a> <a href="http://www.calacanis.com/2007/04/15/google-checkmates-payperpost/">spammers</a> (though you&#8217;ll never catch me saying spamming is right or wrong - to quote Vlad from <a href="http://www.rockstargames.com/maxpayne2/">Max Payne 2</a>, &#8220;<strong>you have to do what you have to do.</strong>&#8221;). Now I&#8217;m hearing a lot of regurgitation going on, so I compiled a list of 21 major compelling (and not so compelling) anti-Google-paid-link-policy objections I came across on the Net in the last few days.</p>
<h2>And the nominees are&#8230;</h2>
<ol>
<li>Google, you&#8217;re just <strong>trying to <a href="http://www.seobook.com/archives/002163.shtml">make more money on Adwords.</a></strong><br />
<em>Rebuttal:</em> Just because I make money on a product <a href="/blog/2007/04/17/googles-motives-are-selfish-so-are-yours-and-mine.html">doesn&#8217;t take away from the value of my product</a>. You&#8217;re gonna tell me Sony&#8217;s evil because it doesn&#8217;t make PS3s for free?</li>
<li><strong>Innocent sites will get penalized</strong> if Google guesses wrong or if a site owner isn&#8217;t familiar with <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35769">Google&#8217;s Guidelines</a>.<br />
<em>Rebuttal:</em> None. This is a fair point and Google needs to address it.</li>
<li>It&#8217;s Google&#8217;s own fault for building <strong>a link-dependent algo.</strong><br />
<em>Rebuttal:</em> There&#8217;s no way Google can judge the quality or accuracy of content on a page unless they build a machine that can read and think. News flash - We&#8217;re not there yet. Google HAS to rely on links. There&#8217;s no way around it. You might bash NASA for not being able to build space colonies, but I got no respect for armchair quarterbacks that don&#8217;t even understand the game.</li>
<li><strong>There&#8217;s no other way for some niche sites to link build</strong>.<br />
<em>Rebuttal:</em> <a href="http://www.wolf-howl.com/seo/how-to-make-sea-slugs-cool/">Graywolf argues</a> against people &#8220;who don’t think linkbaiting can be used to their boring clients such as carpet cleaners, when I came up with five ideas in less than 15 minutes.&#8221; Porn is one tough niche to crack because no one links for free, but other niches in comparison aren&#8217;t so tough. Is no one linking to you? Maybe your website isn&#8217;t offering anything new or valuable.</li>
<li>Google is telling webmasters to <strong>build for search engines, not for people.</strong><br />
<em>Rebuttal:</em> Have we all forgotten that Matt Cutts said <a href="http://video.google.com/videoplay?docid=8553629667451959310&#038;hl=en">build for both search engines and people?</a> Now I agree rel=nofollow is building for search engines. Absolutely. Most users won&#8217;t even know it&#8217;s there. So we&#8217;re stuck in a grey area. But so what? Building for search engines is what SEOs have always done. Why do you pay $10,000 a month for links that send you 2 hits a day? Why do you waste hours writing unique META description tags? Why do you charge $600 an hour for SEO services? To improve usability? Gimme a break. And if you really have issues with building for search engines, maybe it&#8217;s time you quit your profession, because SEO isn&#8217;t about content building.</li>
<li>Paid links are whiter than other spam tactics like cloaking or hidden text, so <strong>why doesn&#8217;t Google go after more nefarious tactics first?</strong><br />
<em>Rebuttal:</em> Who says Google isn&#8217;t working on a solution for better cloak detection? If you assumed paid links are the only thing Google&#8217;s spam team is working on, you assumed wrong.</li>
<li><strong>Small site owners who make a living off selling links will go broke.</strong> You don&#8217;t wanna see them get evicted or something, do you Matt?<br />
<em>Rebuttal:</em> You&#8217;re making a living off contributing to spam and my heart should bleed because &#8230; why? Find something better to do with your websites.</li>
<li><strong>This is all FUD.</strong> Google is lousy at paid link detection.<br />
<em>Rebuttal:</em> Yeah, some of it is FUD. If Google could detect paid links, they wouldn&#8217;t need site owners to tag paid links with nofollow; they&#8217;d just auto-devalue paid links without all this media hype and move on. And for easy-to-detect links (can you say Text Link Ads?) they probably already do. If a big chunk of your paid links are automated or above the radar, this isn&#8217;t FUD. You are fucked. If all your paid links are contextual, relevant, and point to high quality sites, then yeah, it&#8217;s FUD.</li>
<li><strong>Most links involve some sort of compensation</strong>, even if money doesn&#8217;t exchange hands.<br />
<em>Rebuttal:</em> Compensation isn&#8217;t the problem. Even marriage is a kind of a trade. I get to have sex every night with a beautiful wife in exchange for providing a roof over her head, helping her make babies, and giving her money to buy expensive jewelry and clothes. Ok so did she marry me for my money or did she marry me because she loves me? Compensation is a non-issue; almost everything in life is a trade. It&#8217;s the intent that&#8217;s in question.</li>
<li>I ain&#8217;t worried. <strong><a href="http://www.sugarrae.com/blog/why-google-shouldnt-penalize-me-for-their-incompetence/">Some paid links are impossible to detect.</a></strong><br />
<em>Rebuttal:</em> Yeah, some individual links are undetectable. But many aren&#8217;t that hard to detect. And if Google detects a pattern of manipulative intent, your entire set of IBLs will become suspect.</li>
<li>Google, you&#8217;re not being realistic. <strong>You can&#8217;t expect dishonest people to behave honestly.</strong><br />
<em>Rebuttal:</em> None. Google needs to find a completely automated paid link detection method that doesn&#8217;t depend on people&#8217;s good will. <a href="http://www.people.com/people/article/0,,20034923,00.html">Rule breakers will always break rules</a>. From that POV, nofollow doesn&#8217;t work.</li>
<li><strong>Google makes money off link sellers</strong> like Text Link Ads by letting them run Google ads.<br />
<em>Rebuttal:</em> Google&#8217;s out to make money like everyone else. Besides, don&#8217;t blame Google&#8217;s Spam Team for what Google&#8217;s Adwords people do. They&#8217;re two completely different breeds of people.</li>
<li><strong>Pay Per Action doesn&#8217;t offer disclosure</strong> until you mouseover.<br />
<em>Rebuttal:</em> Big deal.</li>
<li><strong>Reporting paid links is snitching.</strong><br />
<em>Rebuttal:</em> Spam Report&#8217;s been available to the public for years, and most people (except Sugarrae) have used it at least once. You&#8217;ve even bitched about <a href="http://www.google.com/search?hl=en&#038;safe=off&#038;client=firefox-a&#038;rls=org.mozilla%3Aen-US%3Aofficial&#038;hs=RoR&#038;q=google+spam+report&#038;btnG=Search" rel="nofollow">Googlers not acting on your spam reports fast enough</a> (34 million results? Wow). So why the sudden uproar? There&#8217;s no bad karma in filing a spam report on a spammer. If you buy links, yeah, that makes you a spammer. But why worry? Google goes out of its way not to manually penalize sites because manual bans don&#8217;t scale. That means hours of tweaking and testing before you see any spammer get penalized. As for the &#8220;paidlinks&#8221; report, Google doesn&#8217;t even have a working algo in place yet. So why are you panicking?</li>
<li><strong>Matt, be clearer about what&#8217;s a paid link and what isn&#8217;t.</strong> How about charities that link to a list of donors? Are those paid links?<br />
<em>Rebuttal:</em> None. Even though some links are obviously paid for, others aren&#8217;t so obvious. Google needs to clearly define what constitutes paid and what doesn&#8217;t.</li>
<li><strong>You can <a href="http://www.10e20.com/2007/04/15/google-wants-you-to-report-paid-links/">damage your competitor</a></strong> by buying links to his site then reporting those links to Google.<br />
<em>Rebuttal:</em> Google is running a beta test on an algorithm that targets thousands of sites, not just one particular site. And history says Google punishes link sellers, not buyers. That <a href="http://www.marketingpilgrim.com/2007/03/googles-lasnik-wishes-nofollow-didnt-exist.html">may change over time</a> (Andy Beal: &#8220;Lasnik explains, why penalize hundreds of sites that sell just a single link, when it’s the recipient that is clearly benefiting?&#8221;) but I don&#8217;t see that day coming anytime soon. This is SEO FUD defense against Google&#8217;s FUD - and it ain&#8217;t pretty.</li>
<li>Using paid link reports to spot spam <strong>introduces a human factor</strong> in Google&#8217;s algo.<br />
<em>Rebuttal:</em> <a href="http://www.webmasterworld.com/google/3311622.htm">According to tedster</a>, a WMW mod, &#8220;Google is already using human input to a degree, and they&#8217;ve even patented a more scalable method for integrating editorial oversight without needing to rely on it for everything.&#8221; I don&#8217;t see Google fully automating everything. There are always going to be Google Adwords reviewers, Google Video submission reviewers, people who read spam reports and site reinclusion requests, engineers who think up new algorithms, PhDs who develop new BETA products&#8230; Sure, Google would like to automate everything, but a human factor isn&#8217;t being &#8220;introduced&#8221; - it&#8217;s always been a factor.</li>
<li><strong>Google, why are you cramming the Ten Commandments down webmasters&#8217; throats?</strong><br />
<em>Rebuttal:</em> Wanna cloak? Keyword spam? Use hidden text? Build doorway pages? Buy links? Go ahead. No one&#8217;s stopping ya. Do whatever you want with your site - it&#8217;s your site. But when you walk into someone else&#8217;s house you respect their rules or you&#8217;ll be asked to leave. It&#8217;s that simple. I like what Linkmoses said on Matt&#8217;s blog:</p>
<blockquote><p>
Ultimately we only have one choice to make. We either follow Google’s reco’s or we don’t. Nobody is forcing anything on us. Like the speed limit, we can -choose- to drive faster, and usually don’t get caught. Like the speed limit, we have no right to act shocked if we are pulled over.</p></blockquote>
</li>
<li><strong>If a link points to a relevant, quality site then compensation is irrelevant.</strong><br />
<em>Rebuttal:</em> Everyone has a price. Anyone who insists he/she won&#8217;t link to a crap site for any amount of money is blowing smoke. If I offered you one million dollars to link to a page that said something really nasty about your mother, you&#8217;d not only link to it but send 10K uniques/day to it using Flash banners and Adbrite.</li>
<li><strong>Paid links improve search results.</strong> Successful companies with quality products and the baddest buying power deserve top rankings.<br />
<em>Rebuttal:</em> Let me introduce the players in this game. <em>SEOs:</em> These guys make money off Fortune 500 companies who pay them $550/hour to buy up links. Without this tool, SEOs lose their edge. As <a href="http://www.seomoz.org/blog/i-disagree-with-danny-the-google-engineers-about-link-buying-practices">Rand Fishkin says</a>, &#8220;you&#8217;d be at <strong>a huge competitive disadvantage</strong> to your slightly less pointy-white-hat competitor.&#8221; <em>Link sellers:</em> these guys live off this monster of a marketplace; they do not want to lose their ability to make thousands of bucks a month on links. <em>Big companies:</em> their ranking <a href="http://www.cre8asiteforums.com/forums/index.php?s=2db884db5b41162ec0c2f84cbb42c1d3&#038;showtopic=48187&#038;pid=224718&#038;st=40&#entry224718">depends heavily on paid links</a> - these guys don&#8217;t want paid links to go away either. <em>Mom and Pop website owners (yeah, you):</em> you guys are basically screwed. The top 10 spots will be dominated by companies with millions to blow on links, and small website owners can&#8217;t compete. The delusion is that every webmaster is made to believe that by buying links, he can someday rank in the top 10, or at least a few spots higher. The reality is that no matter how much money you spend, if 10 websites outspend you, you&#8217;re never gonna show up on the front page. If the 10 richest companies dominate top SERP positions, <em>99% of you are screwed.</em></li>
<li><strong>Google, you made PageRank a commodity</strong> by displaying it in the Toolbar.<br />
<em>Rebuttal:</em> Only novice link buyers rely on the toolbar to find potential link sources. I assume Jim Boykin does a lot of link buying, but I doubt he ever looks at the toolbar when measuring up a potential buy.</li>
<li><strong>Google, you&#8217;re <a href="http://www.wolf-howl.com/google/how-can-so-many-phds-be-so-wrong/">being hypocritical</a>.</strong> You said Yahoo Directory is ok because people pay for the review, not the link. So if someone pays me, I review his/her link, and then add the link to my site, why should I get penalized?<br />
<em>Rebuttal:</em> None. I can&#8217;t wrap my head around this one. I understand Google needs expert pages like Yahoo! Directory or DMOZ to calculate topic-dependent authority scores or calculate TrustRank, but to me it sounds like you&#8217;re skirting the issue. A lot of people who sell links review and reject link requests.</li>
<li><strong>Aren&#8217;t Adwords and adsense paid links?</strong><br />
<em>Rebuttal:</em> First, <strong>Google has no problem with paid links</strong> for traffic/advertisement. Get that through your thick skull. Second, neither Adsense nor Adwords pass PageRank. <a href="http://google.com/robots.txt" rel="nofollow">Google&#8217;s search pages are disallowed</a>. Notice the line &#8220;Disallow: /search&#8221;?</li>
<li><strong>It&#8217;s not our job to police the internet.</strong><br />
<em>Rebuttal:</em> Did Matt Cutts offer you money to report paid links? If not, I don&#8217;t consider that a job. As for policing the internet, you&#8217;re not policing unless you spot a cheater and get him banned. Google isn&#8217;t interested in banning anyone. They&#8217;re interested in BETA testing their new algorithms. So it&#8217;s more like collecting guinea pigs than playing the town sheriff.</li>
</ol>
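<p>Side note on the robots.txt objection above: you don&#8217;t have to take anyone&#8217;s word for what a Disallow line does. Here&#8217;s a minimal Python sketch - the two-line robots.txt below is a hand-typed illustration of the &#8220;Disallow: /search&#8221; rule quoted in the post, not a live fetch - showing that a crawler honoring it never fetches search-result pages, so any links on those pages can&#8217;t pass PageRank:</p>

```python
from urllib.robotparser import RobotFileParser

# Hand-typed excerpt illustrating the "Disallow: /search" rule the post
# quotes from http://google.com/robots.txt (not fetched live here).
ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /search",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)

# A compliant crawler skips search-result URLs entirely, so any links
# rendered on those pages are never seen, let alone counted.
print(parser.can_fetch("*", "http://google.com/search?q=coffee"))     # False
print(parser.can_fetch("*", "http://google.com/intl/en/about.html"))  # True
```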
<p><strike>You want to see me try to counter them, right? I might later, but it&#8217;s Monday and I&#8217;ve got a lot of other stuff to do. *ducks*</strike></p>
<p>For now, here&#8217;s my off-the-cuff advice - something you already know. If you&#8217;re shopping around for links, <a href="http://www.jimboykin.com/">buy them under the radar</a>. There are some paid links Google will never be able to detect, but a service like Text Link Ads isn&#8217;t one of them. Those links scream &#8220;paid links&#8221;, and the company is too visible. Assuming Text Link Ads links still carry some juice, they&#8217;ll be the first sinking ship among many if Google has its way.</p>
<p><strong>UPDATE:</strong> This list is growing by the minute - 24 objections and counting. I updated this post this morning with rebuttals so people like Nick will have something to sink their teeth into.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/04/16/21-reasons-why-anti-nofollow-seos-cant-think-straight.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Why Does Viagra.com 302 Its Home Page?</title>
		<link>http://seo4fun.com/blog/2007/04/11/why-does-viagracom-302-its-home-page.html</link>
		<comments>http://seo4fun.com/blog/2007/04/11/why-does-viagracom-302-its-home-page.html#comments</comments>
		<pubDate>Wed, 11 Apr 2007 16:50:30 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/04/11/why-does-viagracom-302-its-home-page.html</guid>
		<description><![CDATA[Why did viagra.com drill a 302 into their home page? And why does the redirect dump me on this cryptic, Google-unfriendly URL:
http://www.viagra.com/content/index.jsp?setShowOn=../content/index.jsp
&#038;setShowHighlightOn=../content/index.jsp
No wonder viagra.com is beat by .edu spam for &#8220;buy viagra.&#8221; It seems the company opted out of hiring a competent webmaster (no you don&#8217;t need an SEO to figure out redirects). Or is [...]]]></description>
			<content:encoded><![CDATA[<p>Why did viagra.com drill a 302 into their home page? And why does the redirect dump me on this cryptic, Google-unfriendly URL:</p>
<p>http://www.viagra.com/content/index.jsp?setShowOn=../content/index.jsp<br />
&#038;setShowHighlightOn=../content/index.jsp</p>
<p>No wonder viagra.com is beat by .edu spam for &#8220;buy viagra.&#8221; It seems the company opted out of hiring a competent webmaster (no, you don&#8217;t need an SEO to figure out redirects). Or is there some major usability-oriented thingie I&#8217;m missing? Nah, I can&#8217;t think of one legit explanation. Can you?</p>
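<p>Checking this yourself takes one HEAD request: the status line tells you whether a home page answers 301 (permanent, pass the link signals along) or 302 (temporary, keep revisiting the old URL). A minimal Python sketch - the canned response below is a hand-typed illustration of the redirect described, not a live capture:</p>

```python
def parse_redirect(raw_response):
    """Pull the status code and Location header out of a raw HTTP response."""
    lines = raw_response.splitlines()
    status = int(lines[0].split()[1])  # e.g. "HTTP/1.1 302 Found" -> 302
    location = None
    for line in lines[1:]:
        if line.lower().startswith("location:"):
            location = line.split(":", 1)[1].strip()
    return status, location

# Hand-typed illustration of the kind of response viagra.com's home page served:
RAW = (
    "HTTP/1.1 302 Found\r\n"
    "Location: http://www.viagra.com/content/index.jsp"
    "?setShowOn=../content/index.jsp&setShowHighlightOn=../content/index.jsp\r\n"
    "Content-Type: text/html\r\n"
)

status, location = parse_redirect(RAW)
print(status)  # 302 - a temporary redirect where a 301 (or no redirect) belongs
```

<p>On the command line, <code>curl -sI http://www.viagra.com/</code> shows the same status line and Location header against the live site.</p>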
<p>BTW, <a href="http://www.vanessafoxnude.com/">Vanessa Fox Nude</a> is now 5th for &#8220;vanessa fox&#8221;, and 1st for &#8220;vanessa fox nude.&#8221; Her new blog outranks DaveN and Search Engine Land, two very &#8220;authoritative&#8221; sites. That&#8217;s reality folks. On the other hand, she won&#8217;t have an easy time outranking webmastercentral, but if she reveals more skin (er..how Google really works), she&#8217;ll make it to the top.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/04/11/why-does-viagracom-302-its-home-page.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Why in Elvis&#8217; Name Do I Blog?</title>
		<link>http://seo4fun.com/blog/2007/04/10/why-in-elvis-name-do-i-blog.html</link>
		<comments>http://seo4fun.com/blog/2007/04/10/why-in-elvis-name-do-i-blog.html#comments</comments>
		<pubDate>Tue, 10 Apr 2007 06:04:54 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/04/10/why-in-elvis-name-do-i-blog.html</guid>
		<description><![CDATA[Michael Goldberg tagged me. I was also tagged earlier by SEM Zone and I even wrote a response but I forgot all about it. UPDATE: JLH and John aka Softplus also tagged me. John says I&#8217;m not revealing enough about myself, hmmm&#8230;
So, should I post a serious reply, try to be funny like Wayne Knight [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.googleorganic.com/wordpress/">Michael Goldberg</a> tagged me. <a href="http://www.thesemzone.com/2007/02/tagged-why-i-blog.html">I was also tagged earlier</a> by SEM Zone and I even wrote a response but I forgot all about it. UPDATE: <a href="http://www.jlh-design.com/">JLH</a> and <a href="http://seside.net/">John aka Softplus</a> also tagged me. John says I&#8217;m not revealing enough about myself, hmmm&#8230;</p>
<p>So, should I post a serious reply, try to be funny like Wayne Knight fumbling through <a href="http://www.nbc.com/TGYH/">Thank God You&#8217;re Here</a>, or write a link bait piece like I was Rand Fishkin Jr.?</p>
<p>I suck at talking about my cats, so I&#8217;ll let my honesty bore you to death.</p>
<p>In December 2005, one of my money sites went completely supplemental. I&#8217;m talkin&#8217; a 2000+ page site reduced to 3 pages. This was back before Big Daddy when no one had a clue about the supplemental index. On WMW, everyone was fixated on duplicate content, g1smd leading that discussion with a ton of insights into combating canonical issues. The SEO &#8220;experts&#8221; outside of WMW didn&#8217;t have much of a clue. For example, <a href="http://www.jimboykin.com/damned-to-google-hell-supplemental-results/">this is the first article</a> I ever read about supplemental results, written by Jim Boykin:</p>
<blockquote><p>Now, for the dirt - how to get out.<br />
1. If you stole content - change it.<br />
2. If there’s no content - add some.<br />
3. If it’s orphaned - link to it.</p></blockquote>
<p>(Love your blog Jim, but first impressions die hard :D) On Sept 6, 2006, Ammon over on cre8 <a href="http://www.cre8asiteforums.com/forums/index.php?showtopic=41208">said this</a> about the supplemental index:</p>
<blockquote><p>
Supplemental usually means “Google knows of this URL, but has not spidered the document recently for it to be in the main index”.</p></blockquote>
<p>As late as Nov 2006, in reaction to my post claiming low PageRank was the primary factor producing supplemental results, Rand Fishkin <a href="http://www.seomoz.org/blog/duplicate-content-revisited">responded</a>:</p>
<blockquote><p>  I think it’s bogus - maybe it’s the primary factor in that a huge number of pages that are no longer linked to (in site structures from large sites) drop into supplemental, but for most of the real pages that webmasters want in the index that get dropped, I don’t think PageRank is playing a big role.
</p></blockquote>
<p>Now, Michael Martinez bitched about my <a href="/notes/supplementals.html">Supplemental Results page</a> being irrelevant, out of date, and first tier (which, in retrospect, isn&#8217;t completely untrue)</p>
<blockquote><p>The article may very well create a buzz and go on to become one of the SEO community’s standard references on how to deal with Google’s Supplemental Index. And the irony is that it’s wrong, even though the correct answer (as far as what I have seen work in the past few weeks) is buried amidst all the bad/good advice.</p></blockquote>
<p>but what he doesn&#8217;t know is there was virtually nothing on the web about supplemental results in the Spring of 2006 which is when I started writing that page. Even though the page feels near obsolete now, back then it was ahead of the curve.</p>
<p>Anyway, during Big Daddy&#8217;s release, Matt Cutts mentioned that lack of trust in the inlinks/outlinks of a site leads to PageRank devaluation, which leads to low overall PageRank for a domain, which leads to pages dropping out of the main index - which <em>exposes</em> supplemental results.</p>
<p>But guess what? No one was listening, or didn&#8217;t want to listen, because</p>
<p><strong>They resisted letting go of the idea that PageRank is dead.</strong></p>
<p>When you look at the mechanics behind any piece of code, you discover function calls, loops, if/then statements, variables. Like it or not, PageRank is one of those variables. While it remains inside Google&#8217;s code, it maintains its influence, however slight, regardless of what anyone outside of Googleplex wants to believe.</p>
<p>Marketers who excel at writing digg-happy headlines will tell you what sounds cool - but what do they know? For example, marketers say &#8220;trust&#8221; a lot (yeah, I know, I do too). Trust in the scope of supplemental results isn&#8217;t about domains; it&#8217;s about links, exchanged links, paid links. It&#8217;s about pattern detection, not authority. It&#8217;s about link devaluation, not a ranking boost. Trust in the scope of TrustRank has to do with high PageRank sites being penalized in search results when they have lousy link profiles. TrustRank doesn&#8217;t affect low PageRank sites.</p>
<p>But stuff like that bores the crap out of most readers. You want to read stuff that gets you more sales. You want to know how to game Digg. You don&#8217;t want to waste time trying to postulate theories about an uncracked algorithm.</p>
<p>If you look at my archive links, you&#8217;ll notice I started this blog in March 2006, right around the release of Big Daddy.</p>
<p>So, I guess the one and only reason I started this blog is a selfish one - I used this blog like a sailor uses a compass while lost at sea.</p>
<p>Then again, I was never really that lost to begin with.</p>
<p>I&#8217;ll tag these guys:</p>
<p><a href="http://www.irelandseomarketing.com/">Ireland SEO Marketing</a><br />
<a href="http://www.johnon.com/">John Andrews</a><br />
<a href="http://petertdavis.net/">Peter T Davis</a><br />
<a href="http://www.redcardinal.ie/">Red Cardinal</a><br />
<a href="http://www.scoreboard-media.com/">Scoreboard Media Group</a></p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/04/10/why-in-elvis-name-do-i-blog.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Proof is in the SERPS - Overestimating Domain Authority, Take 2</title>
		<link>http://seo4fun.com/blog/2007/04/06/proof-is-in-the-serps-overestimating-domain-authority-take-2.html</link>
		<comments>http://seo4fun.com/blog/2007/04/06/proof-is-in-the-serps-overestimating-domain-authority-take-2.html#comments</comments>
		<pubDate>Fri, 06 Apr 2007 05:03:00 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/04/06/proof-is-in-the-serps-overestimating-domain-authority-take-2.html</guid>
		<description><![CDATA[In my previous long-ass post titled Free SEO Course - Overestimating Domain Authority, I dared to outrank Peter Da Vanzo from V7n  blog for the keyphrases &#8220;free seo course&#8221; and &#8220;seo course.&#8221; More than anything, I felt like challenging a slick sound bite that - while holding a morsel of truth - distorted the [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous long-ass post titled <a href="/blog/2007/03/20/free-seo-course-offering-expert-training-overestimating-domain-authority.html">Free SEO Course - Overestimating Domain Authority</a>, I dared to outrank Peter Da Vanzo from <a href="http://blog.v7n.com/">V7n blog</a> for the keyphrases &#8220;free seo course&#8221; and &#8220;seo course.&#8221; More than anything, I felt like challenging a slick sound bite that - while holding a morsel of truth - distorted the picture beyond recognition:</p>
<blockquote><p>It&#8217;s not what you publish, it&#8217;s where.</p></blockquote>
<h2>Authority Matters</h2>
<p><strong>Where you publish a post matters.</strong> No question about it. V7n ranking high with just a handful of links speaks volumes about how authority scores still remain a force to be reckoned with. <a href="http://www.redcardinal.ie/">Red Cardinal</a> outranking my post with very few backlinks also proves that content - even anchor text, if there aren&#8217;t enough links from the right sources - isn&#8217;t everything.</p>
<h2>Nothing Really Matters</h2>
<p>But &#8220;trust&#8221; and &#8220;reputation&#8221; aren&#8217;t everything either. Peter is telling me <strong>all that really matters is a big rep.</strong> Is SEO that simple? Is driving in NASCAR just about having <a href="http://www.nascar.com/drivers/dps/jgordon00/cup/">Jeff Gordon</a> behind the wheel? Is a <a href="http://billiards.about.com/od/stroketechniques/ss/01_02_06power.htm">9-ball power break</a> just about hitting a rack hard as hell?</p>
<p>If reputation is everything, I dare Peter to change the title of his post to &#8220;<a href="http://www.trumpgolf.com/trumplosangeles/index.asp">Expensive Golf Course</a>&#8221; and see if it can maintain its position for &#8220;seo course.&#8221; All the <a href="http://www.seomoz.org/article/search-ranking-factors">so-called SEO&#8217;s in-the-know</a> will tell you the post will tank.</p>
<p>There are hundreds of factors. We hear that 24/7 - but let&#8217;s say it again.</p>
<p><strong>There are hundreds of factors.</strong> Yeah, only a few of them really make a big difference. But there&#8217;s no factor that is so dominant that you can forsake all the rest.</p>
<p><img src="/images/free-seo-course-2.gif" alt="free SEO course SERP" /></p>
<p><em>SERP for &#8220;free seo course&#8221;. Screen cap for posterity (plus this SERP flip-flops more often than John Kerry)</em></p>
<p><img src="/images/seo-course.gif" alt="seo course SERP" /></p>
<p><em>SERP for &#8220;seo course.&#8221; V7n is now off the front page.</em></p>
<h2>Don&#8217;t Think You Can&#8217;t Outrank Wikipedia</h2>
<p><strong>What does this prove?</strong> First, it proves that even if your website doesn&#8217;t have that visibility you need to break into the &#8220;SEO circle of trust&#8221;, you can still outrank authoritative sites like V7N, SE Roundtable, and even Wikipedia. A gazillion links to a domain make internal links that much more powerful, but you only need to outrank a page, not the entire domain. Put another way - even if a domain has millions of links pointing at it, if one of its pages has a weak link profile, you can beat it.</p>
<p>Second, it tells me anchor text creates relevance, and enough relevance will trump &#8220;reputation,&#8221; PageRank, or what have you.</p>
<p>Thanks to everyone that linked to me, but for the final push that put me on the map, the credit goes to <a href="http://seo-theory.com/">SEO Theory</a>.</p>
<p>UPDATE: Since I&#8217;m done with this test, I asked Michael to remove links to my post.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/04/06/proof-is-in-the-serps-overestimating-domain-authority-take-2.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Matt Cutts Defaces Dark SEO Team - or Does He?</title>
		<link>http://seo4fun.com/blog/2007/04/02/matt-cutts-defaces-dark-seo-team-or-does-he.html</link>
		<comments>http://seo4fun.com/blog/2007/04/02/matt-cutts-defaces-dark-seo-team-or-does-he.html#comments</comments>
		<pubDate>Mon, 02 Apr 2007 16:34:54 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/04/02/matt-cutts-defaces-dark-seo-team-or-does-he.html</guid>
		<description><![CDATA[Yesterday, you probably noticed Matt Cutts&#8217; blog supposedly defaced by the Dark SEO team  (ya know, those guys who have all those fake TBPR pages up), as reported by Search Engine Land in a piece titled Matt Cutts gets hacked.
The Dark SEO Team has had a bit of a beef with Google&#8217;s Matt Cutts [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday, you probably noticed <a href="http://www.mattcutts.com/blog/">Matt Cutts&#8217; blog</a> supposedly defaced by the Dark SEO team  (ya know, those guys who have all those fake TBPR pages up), as reported by Search Engine Land in a piece titled <a href="http://searchengineland.com/070331-225536.php">Matt Cutts gets hacked</a>.</p>
<blockquote><p>The Dark SEO Team has had a bit of a beef with Google&#8217;s Matt Cutts from back in 2005 over URL hijacking. Looks like they&#8217;ve pulled a prank on him today. Matt&#8217;s blog is down, hacked</p></blockquote>
<p>Well it turns out the prank was <a href="http://www.mattcutts.com/blog/april-fools-day-2007/">Matt Cutts&#8217; own doing</a>, inspired by his wife. To tell ya the truth, he got me too because I really believed his site got hacked. A WMW member even pointed out <a href="http://www.webmasterworld.com/google/3298808-3-20.htm">there&#8217;s some French in the source code</a>, and that he didn&#8217;t think Matt spoke French.</p>
<p>But now check out Dark seo team&#8217;s <a href="http://www.darkseoteam.com/" rel="nofollow">home page</a>. Who&#8217;s responsible for that one?</p>
<p>If you view source, you&#8217;ll find stuff like &#8220;phe4r !! THIS IS NOT KEYWORDS STUFFING !! Matt just try to be funny and that&#8217;s work, all the seo world is loud of laugh !&#8221;</p>
<p>The funniest line on the page is &#8220;Google is just a stupid algorithm relying on spammy backlinks. But you guys had no right to let everyone know.&#8221;</p>
<p>The photo you see is <a href="http://infolab.stanford.edu/pub/voy/museum/pictures/display/0-4-Google.htm">The Original GOOGLE Computer Storage</a> wrapped in a wall of rainbow-colored LEGO:</p>
<blockquote><p>
Crawling the web to obtain its link structure required an enormous amount of storage in comparison with typical student projects at that time. We show here the original storage assembly, containing 10 4 Gigabyte disk drives, giving 40 Gbytes total.</p></blockquote>
<p>40 Gigs! (we&#8217;re talking 1996) That&#8217;s about how much space I try to keep open for periodic defrag. Come to think of it, I got 200 GB inside my rig and 200 GB on an external drive (yeah, I should buy a cheap 500GB drive). So I could theoretically crawl the entire web without overflowing my hard drive.</p>
<p>Anyway, I hope you had a fun April Fools Day.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/04/02/matt-cutts-defaces-dark-seo-team-or-does-he.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Free SEO Course Offering  Expert Training - Overestimating Domain Authority</title>
		<link>http://seo4fun.com/blog/2007/03/20/free-seo-course-offering-expert-training-overestimating-domain-authority.html</link>
		<comments>http://seo4fun.com/blog/2007/03/20/free-seo-course-offering-expert-training-overestimating-domain-authority.html#comments</comments>
		<pubDate>Tue, 20 Mar 2007 11:47:49 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/03/20/free-seo-course-offering-expert-training-overestimating-domain-authority.html</guid>
		<description><![CDATA[If you ever paid for an SEO course run by an expert SEO training you to conquer the Intarweb, you&#8217;d be taught what you&#8217;ve known all along - that the two major factors Google uses to rank a page are relevance and value.
(Hey, if you decide to skip reading this long-ass free expert seo course [...]]]></description>
			<content:encoded><![CDATA[<p>If you ever paid for an SEO course run by an expert SEO training you to conquer the Intarweb, you&#8217;d be taught what you&#8217;ve known all along - that the two major factors Google uses to rank a page are <em>relevance</em> and <em>value</em>.</p>
<p>(Hey, if you decide to skip reading this long-ass free expert seo course training post<em>, please do me a favor and at least read the last two short paragraphs.</em>)</p>
<h2>Relevance, How You Build It</h2>
<p>For the uninitiated SEO course newbie, relevance is determined in part by what a page claims it&#8217;s about (e.g. page TITLE, on-page text, keywords in domain name, keywords in url) and what other people say it&#8217;s about (anchor text). If a page claims to be about &#8220;minestrone soup&#8221; but everybody links to the page with the anchor text &#8220;cold pizza bought two days ago&#8221;, it&#8217;s not going to rank well for either of those terms. You can&#8217;t just claim to be <a href="http://abc.go.com/latenight/jimmykimmel/index?pn=index">Jimmy Kimmel</a> if you&#8217;re calling up Mastercard; you need to prove it (e.g. social security number, driver&#8217;s license number, your favorite rock band while sleeping through classes at <a href="http://en.wikipedia.org/wiki/Ed_W._Clark_High_School" rel="nofollow">Ed W. Clark High School</a>). <strong>Anchor text verifies who you say you are.</strong> Any basic free SEO training course worth its salt will teach you that.</p>
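<p>To make the relevance idea concrete, here&#8217;s a toy sketch - my own illustration with made-up weights, not Google&#8217;s actual formula - that scores a page from what it claims about itself (TITLE, on-page text) and what inbound anchor text says about it:</p>

```python
from collections import Counter

def relevance_score(query, title, body, anchor_texts):
    # Toy model: what the page claims it's about (TITLE, on-page text)
    # plus what other people say it's about (inbound anchor text).
    # The weights are arbitrary illustration, not a real algorithm.
    q = query.lower()
    score = 0.0
    if q in title.lower():
        score += 3.0                      # the page's own claim
    score += body.lower().count(q) * 0.5  # on-page repetition
    anchors = Counter(a.lower() for a in anchor_texts)
    score += anchors[q] * 2.0             # anchor text "verifies" the claim
    return score
```

<p>Run it on the minestrone example above and the mismatch shows up immediately: a page titled one thing but linked to as another scores poorly for both phrases.</p>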
<h2>Value, the other side of the Coin</h2>
<p>Value, on the other hand, is determined primarily by PageRank. Put simply, Jim Boykin recommending one of your blog posts, for example, is a greater indication of value than some nameless guy on 12lzafksaldf0234.wordpress.org mentioning you.</p>
<p>According to a Google patent filed in 2004 titled <a href="http://www.google.com/patents?vid=USPAT7028029&#038;id=UTh4AAAAEBAJ&#038;pg=RA1-PA1&#038;dq=pagerank#PRA1-PA6,M1">Adaptive computation of ranking</a>, calculating relevance is more computationally intensive than calculating value:</p>
<blockquote><p>
One approach to ranking documents involves examining the intrinsic content of each document or the back-link anchor text in parents to each document. <strong>This approach can be computationally intensive and often fails to assign highest ranks to the most important documents.</strong> Another approach to ranking involves examining the extrinsic  relationships between documents, i.e. from the link structure of the directed graph&#8230;</p>
<p>Although link-based ranking techniques are improvements over prior techniques, in the case of an extremely large database, such as the World Wide Web which contains billions of pages, the computation of the ranks for all the pages can take considerable time.</p></blockquote>
<p>(Emphasis mine)</p>
<p>That&#8217;s one strike against relevance.</p>
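<p>For reference, here&#8217;s what the link-based side looks like in miniature - a bare-bones PageRank power iteration over a tiny link graph. This is just the textbook recurrence, not Google&#8217;s production algorithm:</p>

```python
def pagerank(links, d=0.85, iters=50):
    # Bare-bones power iteration. `links` maps each page to the pages
    # it links out to; every page must appear as a key.
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share       # each outlink passes an equal share
            else:
                for q in pages:           # dangling page: spread rank evenly
                    new[q] += d * rank[p] / n
        rank = new
    return rank
```

<p>Feed it a toy graph and the best-linked-to page ends up with the biggest share of rank - that&#8217;s the &#8220;value&#8221; half of the equation in one loop.</p>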
<h2>Domain Strength Inflates Page Value?</h2>
<p>In addition, some SEO experts claim that <strong>domain strength adds ranking power to a page</strong>. Rand Fishkin <a href="http://www.seomoz.org/blog/two-clarifications-on-how-search-engines-interpret-links">recently claimed</a> that migrating (301 redirecting) his Web 2.0 piece from a relatively new domain to a directory under the wings of SEOmoz.org dramatically improved the page&#8217;s ranking. <a href="http://www.seomoz.org/web2.0/">His page</a> currently sits at 5th position for &#8220;Web 2.0.&#8221;</p>
<p>Peter Da Vanzo from V7N <a href="http://blog.v7n.com/2007/03/19/top-ten-in-serps-in-a-couple-of-days/">published</a> a post today also supporting that claim, citing his recently published <a href="http://blog.v7n.com/2007/03/11/free-seo-course-what-is-search-marketing/">piece</a> titled <em>Free SEO Course: What Is Search Marketing?</em>, which currently ranks 7th on Google for the term &#8220;seo course&#8221;, as evidence.</p>
<p>Peter didn&#8217;t mention the fact that his page is heavily optimized for the term &#8220;SEO course.&#8221; He&#8217;s got the term &#8220;FREE SEO course&#8221; in the blog title. He also has &#8220;SEO course&#8221; embedded in two H tags. There are 7 on-page occurrences of the word &#8220;course&#8221; and 5 occurrences of the term &#8220;SEO course.&#8221; He&#8217;s got the words &#8220;free&#8221;, &#8220;seo&#8221;, &#8220;course&#8221; in the URL. He also has a couple of other domains (Search Engine Land included) linking to his post using the title, which validates the page&#8217;s own assertion as being about &#8220;free SEO course.&#8221; Finally, there&#8217;s an internal link from <em>free SEO Course Part Six: Advanced Tips and Tricks</em> pointing to the first installment of his SEO course series with the anchor text &#8220;free SEO course.&#8221;</p>
<p>I agree that a page published on a strong domain is likely to be more trusted. A new page on a new domain is like a new kid on the block that needs to pass a few tests before being accepted by the community. But does domain strength improve a page&#8217;s value? I&#8217;m not too sure.</p>
<p>PageRank per page on a domain with massive IBLs also tends to average out higher. In the Web 2.0 page&#8217;s case, Rand claims IBLs remained basically unchanged.</p>
<h2>Page Value or Relevance? No-brainer: Be Valuable and Relevant</h2>
<p>Many folks shooting down PageRank on forums and attributing power to relevance instead still seem to maintain that both trust and domain authority are important. However, domain authority, trust, and PageRank are metrics of page value, not relevance. Google uses both to rank pages, <strong>but which gives you more bang for your buck?</strong></p>
<p>According to this <a href="http://www.cs.cornell.edu/home/kleinber/auth.pdf">TrustRank paper (PDF)</a>, as I posted over on <a href="http://www.cre8asiteforums.com/forums/">cre8asiteforums</a>, <strong>pages with higher PageRank tend to be displayed higher in search results</strong>. The TrustRank algorithm targets high PageRank spam pages because they are more likely to be seen by users:</p>
<blockquote><p>For example, say we have four pages p, q, r, and s, whose <em>contents match a given set of query terms equally well</em> [they&#8217;re equally relevant]. If the search engine uses PageRank to order the results, the page with highest rank, say p, will be displayed first, followed by the page with next highest rank, say q, and so on. Since it is more likely the user will be interested in pages p and q, as opposed to pages r and s (pages r and s may even appear on later result pages and may not even be seen by the user), it seems more useful to obtain accurate trust scores for pages p and q rather than for r and s.</p></blockquote>
<p>In other words, the TrustRank paper attributes importance to both relevance and PageRank, where <strong>PageRank acts as a tie breaker when pages are equally relevant</strong>. How often Google finds equally relevant documents is another question, though. If that scenario seldom comes up, PageRank becomes a non-issue, because there are no ties to break.</p>
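<p>The tie-breaker scenario from the excerpt is easy to sketch: order results by relevance first, and let PageRank decide only between pages whose relevance is equal (the numbers below are illustrative, obviously):</p>

```python
def rank_results(pages):
    # `pages` is a list of (url, relevance, pagerank) tuples.
    # Relevance orders the results; PageRank only breaks ties,
    # per the p/q/r/s scenario in the TrustRank excerpt.
    return sorted(pages, key=lambda t: (-t[1], -t[2]))
```

<p>With equally relevant p and q, the one with higher PageRank lands first; a more relevant page beats both regardless of its PageRank.</p>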
<p>But wait a minute - as Mike Grehan points out in his <a href="http://www.searchguild.com/topic_distillation.pdf">Topic Distillation paper</a> (PDF):</p>
<blockquote><p>
..just because they are relevant doesn&#8217;t mean they are the most useful, or for that matter, the most important. Kleinberg himself calls this the &#8220;abundance problem&#8221; and states that <strong>the number of pages that could reasonably be returned as relevant is far too large for a human user to digest.</strong></p></blockquote>
<h2>What You Get from this Free SEO Course Expert Training Post</h2>
<p>My SEO blog is invisible compared to blogs like v7n, so if Peter is right, I can&#8217;t outrank him just by keyword spamming &#8220;seo course&#8221; all over this post or by gaining links to this post with &#8220;seo course&#8221; in the anchor text, because, according to him, &#8220;it&#8217;s not what you publish, it’s where.&#8221;</p>
<p>I disagree. If you&#8217;re still reading, <strong>please help me prove it</strong>. So far, I see 7+ links pointing to Peter&#8217;s free seo course post (according to live.com). If I can get at least 8 links pointing to this post with &#8220;seo course&#8221; somewhere in your anchor text, that should be enough to either prove or disprove my claim that <strong>what you publish and what people think of your page matter as much as, if not more than, where a page is published</strong>.</p>
<p><strong>UPDATE:</strong> Red Cardinal left me kinda <a href="http://www.redcardinal.ie/">speechless with a generous post</a>. Also thanks to <strong>megabluewave</strong> and <strong>irelandseomarketing</strong> for linking up. (Trackbacks are nofollow-free). </p>
<p>3/24/2007: Hmm, looks like Google dropped this page (guess I tripped a filter); it was ranking 7th for &#8220;free seo course&#8221; and 19th for &#8220;seo course.&#8221; But notice now Red Cardinal is ranking 7th for &#8220;free seo course&#8221; and 15th for &#8220;seo course.&#8221; :) Is his TBPR 6 home page helping him out? My TBPR 4 blog root is still ranking, but is ranking low, IMO mainly due to lack of keywords in the TITLE.</p>
<p>UPDATE: 4/9/2007. I outrank v7n for both keywords, at least temporarily. <a href="/blog/2007/04/06/proof-is-in-the-serps-overestimating-domain-authority-take-2.html">Read this</a> for the full update.<br />
UPDATE: I asked some IBLS to be removed, so don&#8217;t be surprised to see my site fall back in the SERPs. </p>
<p>P.S. For a stronger counterargument, see <a href="http://www.seomoz.org/blog/parasite-hosting-now-dominating-spam-results">Rand&#8217;s piece</a> on Parasite Hosting.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/03/20/free-seo-course-offering-expert-training-overestimating-domain-authority.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>How Do 6 SEOs Miss The Obvious?</title>
		<link>http://seo4fun.com/blog/2007/03/07/how-does-6-seos-miss-the-obvious.html</link>
		<comments>http://seo4fun.com/blog/2007/03/07/how-does-6-seos-miss-the-obvious.html#comments</comments>
		<pubDate>Wed, 07 Mar 2007 07:31:06 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/03/07/how-does-6-seos-miss-the-obvious.html</guid>
		<description><![CDATA[I used to spend some time looking at sites people posted up on Google Group Webmaster Help as a form of recreation. I enjoyed looking at real examples instead of listening to assertions on WMW people could not back up. So when I read Search Engine Journals 6 SEOs-of-caliber write their take on techsmith.com, I [...]]]></description>
			<content:encoded><![CDATA[<p>I used to spend some time looking at sites people posted up on Google Group Webmaster Help as a form of recreation. I enjoyed looking at real examples instead of listening to assertions on WMW people could not back up. So when I read Search Engine Journals 6 SEOs-of-caliber write <a href="http://www.searchenginejournal.com/?p=4488">their take on techsmith.com</a>, I was hoping to be amazed. Its kind of like the SEO-version of American Idol, where 6 expert judges look at your site and tell you what&#8217;s hot, what&#8217;s not, and what you can do to make it better. It&#8217;s potentially a huge plus for site owners looking for SEO advice and a huge plus for the SEO community as well which has been defending itself recently against badly-uninformed claims that SEO is bullshit.</p>
<p>By the time I was done reading, I thought to myself &#8220;these guys got it covered.&#8221; In fact, Michael Martinez commented: &#8220;That was one of the most thorough, informative online analyses I have read in a very, very long time.&#8221; That&#8217;s like hearing <a href="http://www.house.gov/pelosi/">Nancy Pelosi</a> say something nice about the Bush Administration.</p>
<p>To be frank, I wasn&#8217;t too impressed with the ALT tag and H element suggestions, because those are not the prime on-page factors (keyword frequency and keywords at the beginning of TITLE is all you need), but otherwise my initial reaction was that the article was spot on.</p>
<p>But that was before I ran a link: search on techsmith and found several major domains like snagit.com and camtasia.com 302 redirecting into techsmith.com. Now 302 redirects are useful for linking out to affiliate sites, but if you&#8217;re trying to transfer link juice from an old domain to a new directory, you should use a 301 redirect instead. Because they&#8217;re using a 302, that link juice isn&#8217;t getting transferred and the target urls aren&#8217;t benefiting from the redirect.</p>
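<p>If you want to check this yourself, a HEAD request with redirect-following disabled shows whether a domain answers with a 301 or a 302. Here&#8217;s a quick sketch using Python&#8217;s standard library (the example URL and the juice-passing rule of thumb are my assumptions, not anything Google has published):</p>

```python
import http.client
from urllib.parse import urlparse

def redirect_status(url):
    # HEAD request without following redirects: returns the raw
    # status code and the Location header (None if not a redirect).
    parts = urlparse(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    conn.request("HEAD", parts.path or "/")
    resp = conn.getresponse()
    location = resp.getheader("Location")
    conn.close()
    return resp.status, location

def passes_link_juice(status):
    # Rule of thumb from this post: only a permanent (301) redirect
    # is treated as a move; a 302 leaves the credit with the old URL.
    return status == 301

# e.g. status, target = redirect_status("http://example.com/")  # hypothetical URL
```

<p>Had the techsmith crew run something like this against snagit.com, the 302 would have jumped out immediately.</p>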
<p>More importantly, when I dug up techsmith.com&#8217;s outlinks, I was a little baffled to find  <a href="http://www.techsmith.com/community/blogcomments.asp?thread=12" rel="nofollow">this</a>:</p>
<p><img src="/images/spam-comments.jpg" alt="spam comments" /></p>
<p>That&#8217;s right. Outbound blog links to porn spam. And yeah, those links are linking without protection. The URL <a href="http://72.14.203.104/search?q=cache%3Ahttp%3A%2F%2Fwww.techsmith.com%2Fcommunity%2Fblogcomments.asp%3Fthread%3D12&#038;ie=utf-8&#038;oe=utf-8&#038;aq=t&#038;rls=org.mozilla:en-US:official&#038;client=firefox-a" rel="nofollow">isn&#8217;t cached yet</a>, but <a href="http://www.techsmith.com/robots.txt" rel="nofollow">the directory isn&#8217;t blocked via robots.txt</a> either, so the links will do some harm once the page gets indexed.</p>
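<p>By the way, checking whether robots.txt covers a path is scriptable too. A small sketch using Python&#8217;s stdlib robotparser - keeping in mind that blocking crawl is not the same as noindexing; a disallowed URL can still show up in the index via external links:</p>

```python
from urllib.robotparser import RobotFileParser

def is_blocked(robots_txt, path, agent="Googlebot"):
    # True if robots.txt disallows crawling `path` for `agent`.
    # Note: this blocks crawling, not indexing - a disallowed URL
    # can still be indexed from external links pointing at it.
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return not rp.can_fetch(agent, path)
```

<p>Feed it the live robots.txt and the blog comments path, and you know in one line whether the spam directory is even off-limits to crawlers.</p>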
<p>Am I the only guy who bothers to look at the outlinks of a site to make sure a site isn&#8217;t accidentally linking out to a bunch of crap? To be fair, the site does have a load of outbound links, but it only took me a few seconds to find these links, and there aren&#8217;t so many links on that post that it&#8217;s like looking for a needle in a haystack. Anyway, I&#8217;m mostly impressed and a little disappointed, which reminds me I wanna watch American Idol tomorrow night, but I&#8217;ll probably watch <a href="http://www.nbc.com/Friday_Night_Lights/">Friday Night Lights</a> instead.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/03/07/how-does-6-seos-miss-the-obvious.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>YouTube Taking Down More Music Videos</title>
		<link>http://seo4fun.com/blog/2007/02/25/youtube-taking-down-more-music-videos.html</link>
		<comments>http://seo4fun.com/blog/2007/02/25/youtube-taking-down-more-music-videos.html#comments</comments>
		<pubDate>Sun, 25 Feb 2007 16:28:52 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/02/25/youtube-taking-down-more-music-videos.html</guid>
		<description><![CDATA[Looking to download videos from YouTube for free?
1. Download a free FLV player from http://www.jeroenwijering.com/?item=Flash_video_Player.
2. Go to Keepvid.com.
Pretty simple huh?
So I woke up today to find out YouTube&#8217;s pulling more and more music videos. Out of 20 videos I bookmarked (yeah I know, not a whole lot), around 16 of them are now gone. It [...]]]></description>
			<content:encoded><![CDATA[<p>Looking to download videos from YouTube for free?</p>
<p>1. Download a free FLV player from http://www.jeroenwijering.com/?item=Flash_video_Player.<br />
2. Go to Keepvid.com.</p>
<p>Pretty simple huh?</p>
<p>So I woke up today to find out YouTube&#8217;s pulling more and more music videos. Out of 20 videos I bookmarked (yeah I know, not a whole lot), around 16 of them are now gone. It was just a matter of time and I understand why they gotta do this but it doesn&#8217;t mean I gotta like it. There are other ways to get my hands on my favorite music vids like P2P but I prefer watching videos on sites like AOL, MSN or YouTube where I don&#8217;t have to wait for songs to download and I don&#8217;t have to bend over backwards hunting them down. It&#8217;s easy to download songs that are still available on YouTube, though a lot of the songs that used to be on there are either going or gone. Yeah, I know, I&#8217;m such a damn freeloader. Anyway, in the next few months, YouTube will go from your ultimate source of copyrighted material to a place where you can only find silly people doing silly things.</p>
<p>UPDATE: According to the Washington Post, YouTube failed to hammer out a deal with Viacom, who owns MTV and Comedy Central. Viacom instead struck up a deal with Joost. With YouTube barred from airing the videos, websites like MTV.com and comedycentral.com are seeing a rise in traffic.</p>
<p>But some people don&#8217;t think Viacom is being too smart &#8220;because its[YouTube&#8217;s] audience surpasses that of any other video site, with 100 million video streams a day. &#8220;</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/02/25/youtube-taking-down-more-music-videos.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Why Duplicate Content Causes Supplemental Results</title>
		<link>http://seo4fun.com/blog/2007/02/19/why-duplicate-content-causes-supplimental-results.html</link>
		<comments>http://seo4fun.com/blog/2007/02/19/why-duplicate-content-causes-supplimental-results.html#comments</comments>
		<pubDate>Mon, 19 Feb 2007 10:55:28 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/02/19/why-duplicate-content-causes-supplimental-results.html</guid>
		<description><![CDATA[Nearly a year after I had my tentative say about supplemental results, my thinking on what causes supplemental results has shifted away from duplicate content and moved more toward lack of inbounds and internal PageRank distribution, as Matt Cutts recently stated that PageRank is the primary factor in determining supplemental results.
Back around May 18, 2006, [...]]]></description>
			<content:encoded><![CDATA[<p>Nearly a year after I had my tentative say about <a href="/notes/supplementals.html">supplemental results</a>, my thinking on what causes supplemental results has <strong>shifted away from duplicate content</strong> and moved more toward lack of inbounds and internal PageRank distribution, as Matt Cutts recently stated that <a href="/blog/2006/10/12/matt-cutts-pagerank-primary-factor-determining-supplemental-results.html">PageRank is the primary factor in determining supplemental results</a>.</p>
<p>Back around May 18, 2006, Matt Cutts had this to say about combatting supplemental results by optimizing internal link structure:</p>
<blockquote><p>typically the depth of the directory doesn’t make any difference for us; PageRank is a much larger factor. So without knowing your site, <strong>I’d look at trying to make sure that your site is using your PageRank well.</strong> A tree structure with a certain fanout at each level is usually a good way of doing it.</p></blockquote>
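<p>Matt&#8217;s &#8220;tree structure with a certain fanout&#8221; advice is really just geometry: with a fanout of f, the pages within k clicks of the home page number f + f^2 + &#8230; + f^k, so a modest fanout keeps every page close to your best-linked URL. A two-line sketch of that sum:</p>

```python
def reachable_pages(fanout, depth):
    # Pages within `depth` clicks of the home page when every page
    # links to `fanout` children (a pure tree): f + f^2 + ... + f^depth.
    return sum(fanout ** level for level in range(1, depth + 1))
```

<p>With a fanout of 10, three clicks already cover over a thousand pages - which is why flattening your link structure spreads PageRank so much further than burying pages in deep directories.</p>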
<p>Some people questioned my thinking when I advocated optimizing your internal link structure for better PageRank distribution, probably because they just hate hearing the word <em>PageRank</em>.</p>
<h2>The Word PageRank Causes Friction (tangent)</h2>
<p>So I&#8217;ve been replacing the word &#8220;PageRank&#8221; with words like &#8220;authority&#8221;, &#8220;trust&#8221;, &#8220;link juice&#8221;, &#8220;link value&#8221;, &#8220;visibility&#8221; or &#8220;link weight.&#8221; If I say &#8220;your page doesn&#8217;t have enough PageRank,&#8221; that makes some people:</p>
<ol>
<li><strong>cringe, like they&#8217;re at the dentist, when suddenly the dentist whips out a dental drill.</strong> I think it&#8217;s some kinda Pavlovian response - the word &#8220;PageRank&#8221; just makes some people feel ill.</li>
<li><strong>look at the toolbar</strong>, and tell me &#8220;hey, this PR 0 page is in the main index, while this PR 6 page is marked supplemental. So you&#8217;re obviously wrong.&#8221; First of all, please don&#8217;t say &#8220;PR.&#8221; PR means Public Relations or Puerto Rico. Second of all, when I say TBPR, I&#8217;m talking about the Toolbar PR. When I say PageRank, I&#8217;m talking about internal PageRank, the one you can&#8217;t see. Just because the toolbar says 0 doesn&#8217;t mean there are no links to that page.</li>
<li><strong>argue with me because he/she is conditioned to think <a href="/blog/2006/11/24/pagerank-doesnt-matter-is-now-officially-an-seo-myth.html">PageRank doesn&#8217;t matter</a>.</strong> She might tell me &#8220;PR is just one of 100 factors in Google&#8217;s algorithm and you should ignore it and focus on getting quality links by publishing quality content and marketing it well instead of spending hours nofollowing links to your privacy policy.&#8221; No wonder people claim &#8220;SEO is easy&#8221; when SEOs advocate ignoring internal link structure optimization in favor of &#8220;quality content.&#8221; Not that I disagree with that recommendation, but I think if I&#8217;m <em>optimizing</em> a site, my focus would lie more on the nuts and bolts instead of on copywriting or buying ad space.</li>
</ol>
<p>Seriously, some people just get stupid when they hear the word &#8220;PageRank&#8221;, so to get around that debacle, I say &#8220;your page doesn&#8217;t have enough trust&#8221; or &#8220;your page doesn&#8217;t have enough quality inbound links&#8221; or &#8220;your page lacks visibility.&#8221; But really, <strong>all I&#8217;m saying is your page doesn&#8217;t have enough internal PageRank.</strong></p>
<p>/tangent</p>
<h2>Why Duplicate Content Is Responsible for Supplemental Results</h2>
<p>Anyway, back to duplicate content. I&#8217;m starting to read some people saying <strong>duplicate content doesn&#8217;t matter at all when it comes to supplemental results.</strong> I disagree.</p>
<p>If you have two identical/similar content pages, both with thousands of authority inbound links, those pages are not supplemental-index-bait. Google&#8217;s solution to duplicate content is not the supplemental index. On the contrary, <strong>Googlers insist that they <a href="http://videos.webpronews.com/2006/12/06/vanessa-fox-clarifies-the-role-of-google-sitemaps/">filter out duplicate content.</a></strong></p>
<p>But when you&#8217;ve got links to domain.com and domain.com/index.php, you&#8217;re splitting link juice between two pages instead of one. When you&#8217;ve got multiple urls like index.php?sessionId=290302342, index.php?sessionId=20343400, and index.php?sessionId=023123321 generating identical content and you&#8217;ve got links to all of those urls, again, you&#8217;re splitting link weight between all those pages. The urls with just a couple of low value inbound links end up &#8220;going supplemental&#8221; while urls with tons of other sites linking get to sit pretty in the main index.</p>
<p>So yeah, you can have <a href="http://www.google.com/search?hl=en&#038;safe=off&#038;client=firefox-a&#038;rls=org.mozilla%3Aen-US%3Aofficial&#038;hs=MKk&#038;q=%22It%27s+been+a+mystery+and+still+they+try+to+see%22&#038;btnG=Search" rel="nofollow">multiple duplicate content pages in the main index</a> as long as they&#8217;re well-linked-to. But links to multiple urls hosting identical content within your own site will dilute PageRank and cause some link-starved pages to &#8220;go supplemental.&#8221;</p>
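<p>One practical fix for the session-ID mess is canonicalizing your URLs so the duplicates collapse into one. A sketch of the idea (the session parameter names below are my assumptions - use whatever your site actually emits):</p>

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}  # assumed names

def canonicalize(url):
    # Drop session-style query parameters so duplicate URLs collapse
    # into one canonical form instead of splitting link weight.
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SESSION_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))
```

<p>Every sessionId variant maps to the same URL, so inbound links pile onto one page instead of being scattered across a dozen link-starved duplicates.</p>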
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/02/19/why-duplicate-content-causes-supplimental-results.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Results Don&#8217;t Always Say They&#8217;re Personalized</title>
		<link>http://seo4fun.com/blog/2007/02/03/results-dont-always-say-theyre-personalized.html</link>
		<comments>http://seo4fun.com/blog/2007/02/03/results-dont-always-say-theyre-personalized.html#comments</comments>
		<pubDate>Sat, 03 Feb 2007 17:47:02 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/02/03/results-dont-always-say-theyre-personalized.html</guid>
		<description><![CDATA[Aaron Wall and Graywolf reported that Google is turning off personalization notification:
In the past they typically placed a turn off personalized results whenever your results were personalized, but now they do not disclose when they are personalizing the results - Aaron Wall
If Aaron’s correct that they are increasing personalized search and turning off notification, and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.seobook.com/archives/002023.shtml" rel="nofollow">Aaron Wall</a> and <a href="http://www.wolf-howl.com/google/google-is-not-my-mother-and-should-kill-personalized-search/" rel="nofollow">Graywolf</a> reported that Google is turning off personalization notification:</p>
<blockquote><p>In the past they typically placed a turn off personalized results whenever your results were personalized, but <strong>now they do not disclose when they are personalizing the results</strong> - Aaron Wall</p></blockquote>
<blockquote><p>If Aaron’s correct that they are increasing personalized search and turning off notification, and I have no reason to doubt him&#8230; - Graywolf</p></blockquote>
<p>Has Google gone completely mad? I ran a few searches while logged into my Google account. Here&#8217;s a couple screenies of what I found:</p>
<p><img src="/images/personalized.jpg" alt="Google personalized search" /></p>
<p>A search for &#8220;search engines&#8221; returns personalized search results. You can see the text &#8220;Personalized results 1-100 of about&#8230;&#8221; above the digg subscribed link. The text &#8220;Turn off Personalized Search (Beta) for these results&#8221; is gone.</p>
<p><img src="/images/not-personalized.jpg" alt="Google personalized search turned off" /></p>
<p>A search for &#8220;paris hilton&#8221; returns ordinary results. Notice the usual text &#8220;Results 1-100 of about&#8230;&#8221; above the results.</p>
<p>I ran a few other searches too; these were personalized:</p>
<p>&#8220;mysql tree&#8221;<br />
&#8220;ajax&#8221;<br />
&#8220;youtube&#8221;<br />
&#8220;wired magazine&#8221;</p>
<p>These aren&#8217;t personalized:</p>
<p>&#8220;britney spears&#8221;<br />
&#8220;cpr&#8221;<br />
&#8220;how to bake cake&#8221;<br />
&#8220;is baking cake easy&#8221;<br />
&#8220;is seo easier than baking cake&#8221;</p>
<p>It&#8217;s true that personalization used to be an option I could opt out of while remaining logged into my Google account, but now it&#8217;s no longer optional. Like <a href="http://googleblog.blogspot.com/2007/02/personally-speaking.html">Sep says on the Google Blog</a>, &#8220;If you don&#8217;t want to see personalized results, just <strong>sign out of your Google Account.</strong>&#8221; I do think personalization should remain optional; I shouldn&#8217;t have to log out of my Google account to get rid of it. But Google still does tell you when results are being personalized. Of course, if you use Google a lot, after a while more results will be personalized than not.</p>
<h2>Is Google being Evil?</h2>
<p>First, repeat after me: <a href="http://en.wikipedia.org/wiki/Wall_Street_(film)">Greed, for lack of a better word, is good</a>. Greed is right; greed works. Greed clarifies, cuts through, and captures the essence of the evolutionary spirit.</p>
<p>But seriously, Google isn&#8217;t restricting your choices by personalizing results. Google is just shifting the order of items on the menu, moving down sites you hate and ranking sites you like slightly higher. So saying personalization is like being force-fed by Google is like saying Paris Hilton is the best woman I&#8217;ve ever seen.</p>
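To make the &#8220;reordering the menu&#8221; idea concrete, here is a hypothetical sketch of personalized re-ranking. The function, scores, sites, and boost factor are all made up for illustration; Google&#8217;s real personalization algorithm is not public:

```python
# Hypothetical sketch: personalization as re-ranking, not filtering.
# Each result keeps its base relevance score; sites the user has clicked
# before get a small boost, shifting the order without removing anything.

def personalize(results, click_history, boost=0.1):
    """Re-rank (score, site) pairs using per-site click counts."""
    def adjusted(item):
        score, site = item
        return score + boost * click_history.get(site, 0)
    return sorted(results, key=adjusted, reverse=True)

results = [(0.9, "bigbrand.com"), (0.85, "blog-i-read.com"), (0.8, "other.com")]
history = {"blog-i-read.com": 3}  # the user clicked this site 3 times before

print(personalize(results, history))  # blog-i-read.com moves up to first
```

The point of the sketch: every result stays on the menu; only the order shifts.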
<p>Still, I don&#8217;t quite get why search history is dependent on personalization. Like <a href="http://searchengineland.com/070202-224617.php">Danny Sullivan says</a>, personalized results are dependent on search history, not the other way around. Technically, search history doesn&#8217;t need the personalized results option activated in order to work:</p>
<blockquote><p>
&#8220;Creating a Google Account will enable Search History. Search History is a feature that will provide you with a more personalized experience on Google that includes more relevant search results and recommendations.&#8221;</p></blockquote>
<h2>So How Does this Personalization Stuff Work?</h2>
<p>This Personalization-is-the-enemy-of-all-SEOs-either-living-or-dead debate got me kind of curious about my search history and how it relates to personalized results. Here are some of the queries I used since yesterday:</p>
<p>&#8220;wikipedia long term&#8221; <strong>personalized</strong><br />
&#8220;javascript get element TD&#8221; <strong>Not</strong><br />
&#8220;david bowie changes lyrics&#8221; <strong>Not personalized</strong><br />
&#8220;fire hydrant &#8221; <strong>not personalized</strong>.<br />
&#8220;fire hydrant&#8221; <strong>personalized</strong>.  WTH? But this effect disappeared after I submitted the query a few more times.<br />
&#8220;Modified Preorder tree traversal&#8221; <strong>nope</strong><br />
&#8220;Modified Preorder tree&#8221; <strong>personalized</strong><br />
&#8220;Modified Preorder posts&#8221; <strong>personalized</strong><br />
&#8220;Modified Preorder algorithm&#8221; <strong>personalized</strong><br />
&#8220;Modified Preorder algorithm posts&#8221; <strong>nay</strong></p>
<p>Does Google not like queries longer than 3 words when it comes to personalization?</p>
<p>Let&#8217;s try this again:</p>
<p>&#8220;wikipedia long term&#8221; <strong>personalized</strong><br />
&#8220;wikipedia long&#8221; <strong>personalized</strong><br />
&#8220;wikipedia long term memory&#8221; <strong>niet</strong></p>
<p>&#8220;half life 2 entanglement&#8221; <strong>nicht personifiziert</strong><br />
&#8220;half life entanglement&#8221; <strong>personalizado</strong></p>
<p>One last set:</p>
<p>&#8220;google base&#8221; <strong>персонализировано</strong><br />
&#8220;google base store&#8221; <strong>personnalisé</strong><br />
&#8220;google base store help&#8221; <strong>ordinary results</strong>.</p>
<p>Ok, my head is starting to hurt.</p>
<p>Interested in digging a little deeper? <a href="http://www.seobythesea.com/?cat=22">A few personalization patents</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/02/03/results-dont-always-say-theyre-personalized.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Long Tail De Jour: what&#8217;s the point in adding more pages to my site if google doesn&#8217;t index them or puts them into the supplemental index?</title>
		<link>http://seo4fun.com/blog/2007/01/23/long-tail-de-jour-whats-the-point-in-adding-more-pages-to-my-site-if-google-doesnt-index-them-or-puts-them-into-the-supplemental-index.html</link>
		<comments>http://seo4fun.com/blog/2007/01/23/long-tail-de-jour-whats-the-point-in-adding-more-pages-to-my-site-if-google-doesnt-index-them-or-puts-them-into-the-supplemental-index.html#comments</comments>
		<pubDate>Tue, 23 Jan 2007 11:12:27 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/01/23/long-tail-de-jour-whats-the-point-in-adding-more-pages-to-my-site-if-google-doesnt-index-them-or-puts-them-into-the-supplemental-index.html</guid>
		<description><![CDATA[A couple days ago, someone hit my site with this longtail:
what&#8217;s the point in adding more pages to my site if google doesn&#8217;t index them or puts them into the supplemental index
Crazy, long huh? :) When I start talking to myself, my friends start to worry. When I start talking to Google, I need a [...]]]></description>
			<content:encoded><![CDATA[<p>A couple days ago, someone hit my site with this longtail:</p>
<blockquote><p>what&#8217;s the point in adding more pages to my site if google doesn&#8217;t index them or puts them into the supplemental index</p></blockquote>
<p>Crazy long, huh? :) When I start talking to myself, my friends start to worry. When I start talking to Google, I need a doctor :D</p>
<p><em>The obvious answer:</em> Forget Google. Build your site for visitors.<br />
<em>Reply:</em> What visitors? Where am I gonna get the traffic if I don&#8217;t show up on Google?<br />
<em>Counter reply:</em> There are tons of other ways of generating traffic besides Google. There&#8217;s always MSN and Yahoo. Besides, people can see you instantly on Technorati if you publish a post on a blog, or *gasp* bookmark yourself on del.icio.us.<br />
<em>Rebuttal:</em> Yeah right. MSN sends me like 1 hit a week. That&#8217;ll buy me a pair of shoes maybe if I wait like 5 months.<br />
<em>Me:</em> That&#8217;s your problem. You&#8217;ve got the unenlightened self-interest bug.<br />
<em>Reply:</em> The&#8230;.what??</p>
<p>Definition of <strong>selfishness</strong>, according to Wikipedia:</p>
<blockquote><p>a selfish person deliberately <strong>focuses on his own agenda</strong>, rather than that of others.</p></blockquote>
<p>But wait a minute:</p>
<blockquote><p>un-selfishness is <strong>a deliberate act</strong>, rather than selfishness, which tends to occur naturally. <strong>In the animal kingdom, few species exhibit unselfish behavior.</strong></p></blockquote>
<p>Not to mention selfishness isn&#8217;t all bad:</p>
<blockquote><p>Selfishness has <strong>some good qualities</strong> such as productivity or the taking of personal responsibility. One view is that since one needs to act in a mainly self-interested way in order to advance in life doing so should not be regarded as wrong, or labelled as harmful or inappropriate.</p></blockquote>
<p>Now check out &#8220;rational selfishness&#8221; or &#8220;rational self-interest&#8221;:</p>
<blockquote><p>
The philosophy holds that individuals should not act on momentary self-interested whims but on what is in their long-term self-interest, which is defined to require respecting the individual liberty of others by refraining from initiating coercion against them.
</p></blockquote>
<p>And we finally end up with <strong>Enlightened self-interest</strong>:</p>
<blockquote><p>Enlightened self-interest is a philosophy in ethics which states that persons who act <strong>to further the interests of others</strong> (or the interests of the group or groups to which they belong), ultimately serve their own self-interest.</p></blockquote>
<p>So what&#8217;s <a href="http://en.wikipedia.org/wiki/Enlightened_self-interest" rel="nofollow">unenlightened self-interest</a>?</p>
<blockquote><p>Unenlightened self-interest, in which it is argued that when most or <strong>all persons act according to their own myopic selfishness</strong>, that the group suffers loss as a result of conflict, decreased efficiency because of lack of cooperation, and the increased expense each individual pays for the protection of their own interests.</p></blockquote>
<p>Oook, so now you&#8217;re probably thinking: what the hell does this unenlightened self-interest thingie have to do with getting indexed in Google?</p>
<p>Here&#8217;s a shocking revelation: some people care more about money than about putting out <strong>the best product</strong> in their niche. If you&#8217;re Shoemoney, you don&#8217;t need organic ranking to make big bucks. You really don&#8217;t even need a website to make money with PPC. But if you want to rank high in organic search results, what in God&#8217;s name do you think you&#8217;re doing trying to get to the top with a mediocre site no one in their right mind wants to recommend? :)</p>
<p>What&#8217;s a good business model? <a href="http://www.seobomb.com/how-to-amaze-your-clients-with-a-killer-viral-campaign/">Amazing your clients</a>. You <a href="http://www.johnchow.com/time-for-a-commercial-break/">under promise and over deliver</a>. Want to rank #1 on Google? <strong>Build a site that deserves to be #1.</strong> Focus on what other people want and contribute to society instead of obsessing about how much money you need to pay your mortgage or buy another game on your PS3. What&#8217;s that old saying?</p>
<p>You reap what you sow.</p>
<p>P.S. Yeah I realize this post sounds preachy, but it&#8217;s good to get hit over the head with a hammer once in a while to maintain perspective, don&#8217;t you think? :)</p>
<p><strong>UPDATE:</strong> Matt Cutts asks &#8220;<a href="http://www.mattcutts.com/blog/algorithm-to-reduce-googlebomb-impact/#comment-95036">What value is my site offering to users?</a>&#8220;</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/01/23/long-tail-de-jour-whats-the-point-in-adding-more-pages-to-my-site-if-google-doesnt-index-them-or-puts-them-into-the-supplemental-index.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>SEO Myth: There is No Duplicate Content Penalty</title>
		<link>http://seo4fun.com/blog/2007/01/05/seo-myth-there-is-no-duplicate-content-penalty.html</link>
		<comments>http://seo4fun.com/blog/2007/01/05/seo-myth-there-is-no-duplicate-content-penalty.html#comments</comments>
		<pubDate>Fri, 05 Jan 2007 09:59:52 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Duplicate Content]]></category>

		<category><![CDATA[Google]]></category>

		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2007/01/05/seo-myth-there-is-no-duplicate-content-penalty.html</guid>
		<description><![CDATA[This is probably old news to black hats, but I often hear people say &#8220;there&#8217;s no duplicate content penalty.&#8221; Newbies worry they&#8217;ll incur some kind of penalty for having identical copyright text across 100 pages or something, and other people like me jump in to alleviate their fears: &#8220;Google doesn&#8217;t penalize duplicate content; it filters [...]]]></description>
			<content:encoded><![CDATA[<p>This is probably old news to black hats, but I often hear people say &#8220;there&#8217;s no duplicate content penalty.&#8221; Newbies worry they&#8217;ll incur some kind of penalty for having identical copyright text across 100 pages or something, and other people like me jump in to alleviate their fears: &#8220;Google doesn&#8217;t penalize duplicate content; it filters them out.&#8221;</p>
<p>However, a few months ago, back when I still believed supplemental results were largely due to duplicate content, I ran a test to try to figure out exactly what % similarity I had to hit for pages to squeeze into the main index. I created a directory with several pages: one original page, then several other similar pages of varying similarity, from 90% similar down to 20%. Initially, all the pages got into the main index. Then after a few months, the entire directory poofed. If Google was filtering duplicate content, then I&#8217;d expect the original page, at least, to remain indexed. I also expected a page that was only 20% similar to stay in the index. But no. Every page in that directory disappeared from both the main and supplemental index. At that point, I suspected that Googlebot was refusing to index any page in that directory.</p>
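As an aside, one crude way to put a number on &#8220;% similarity&#8221; between two pages is word-shingle overlap (Jaccard similarity). This is a hypothetical sketch, purely for illustration; Google&#8217;s actual duplicate detection is not public and is certainly more sophisticated:

```python
# Hypothetical sketch: estimate "% similarity" between two texts using
# word-level n-gram shingles and Jaccard similarity. Illustration only -
# this is not how Google's duplicate detection actually works.

def shingles(text, n=3):
    """Return the set of n-word shingles in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard similarity of two texts' shingle sets, as a percentage."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 100.0
    return 100.0 * len(sa & sb) / len(sa | sb)

original = "the quick brown fox jumps over the lazy dog near the river bank"
near_dup = "the quick brown fox jumps over the lazy dog near the river bend"
distinct = "completely different words about search engine optimization topics"

print(round(similarity(original, near_dup)))  # high - only one word differs
print(round(similarity(original, distinct)))  # 0 - no shared shingles
```

A real system would also normalize markup, ignore boilerplate like navigation and copyright text, and use hashing tricks to compare millions of pages cheaply.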
<p>Here&#8217;s the logic. Say I have a site with 245,230,991 pages and at least 60% of those pages are very similar. Does Googlebot really want to spend the time and effort to crawl all those pages? Keep in mind, since Big Daddy, Google&#8217;s been very picky with what pages to crawl and index. PageRank became an anti-spammer weapon built to protect Googlebot from crawling thousands of low-value spam pages with nothing but guest book links pointing at them. So if Googlebot thinks that a good number of pages within a directory are too similar, then it would make sense to not only filter those pages out but to prevent future crawling of any pages in that directory.</p>
<p>Caveman says something similar in <a href="http://www.webmasterworld.com/forum30/31430.htm">this post</a> started way back in Sep. 29, 2005 (several months before Big Daddy):</p>
<blockquote><p>The fact that even within a single site, when pages are deemed too similar, G is not throwing out the dups - they&#8217;re throwing out ALL the similar pages&#8230;if they find four pages on the same site about a certain kind of bee, and the four pages are similarly structured, and one is a main page for that bee, and the other three are subpages about the same bee, each reflecting a variation of that bee, the site owner now seems to run the risk that they will find all of the pages too similar, and filter them all, not just the three subpages.</p></blockquote>
<p>Anyway, today I was re-reading a <a href="http://www.webmasterworld.com/google/3192967-3-50.htm">Webmasterworld thread</a> regarding Adam Lasnik&#8217;s Duplicate Content post, and happened on a few interesting comments Adam wrote:</p>
<p>Some guy asked: Why not build into your webmaster toolkit something like a &#8220;Duplicate Content&#8221; threshold meter?</p>
<p>Adam responds:</p>
<blockquote><p>
The fact that duplicate content isn&#8217;t very cut and dry for us either (e.g., it&#8217;s not &#8220;if more than [x]% of words on page A match page B&#8230;&#8221;) makes this a complicated prospect.</p></blockquote>
<p><a href="http://www.stuntdubl.com/2006/06/12/dupe-content/">Todd</a> wrote about this 6 months ago:</p>
<blockquote><p>If it was as easy as saying that any page with more than 42% duplicate content will be filtered from the search results, then all site owners and SEO’s would probably grab 40% duplicate content for every page filler. It IS NOT a percentage.</p></blockquote>
<p>Maybe I should go over to <a href="http://www.seobythesea.com/">Bill Slawski&#8217;s blog</a> and search for <a href="http://www.google.com/search?q=site%3Aseobythesea.com+duplicate+content&#038;ie=utf-8&#038;oe=utf-8&#038;rls=org.mozilla:en-US:official&#038;client=firefox-a">some duplicate content patents</a>.</p>
<p>In regards to duplicate content penalty (emphasis mine), Adam says:</p>
<blockquote><p>As I noted in the original post, penalties in the context of duplicate content <strong>are rare</strong>. Ignoring duplicate content or just picking a canonical version is MUCH more typical&#8230;Again, this very, very rarely triggers a penalty. <strong>I can only recall seeing penalties</strong> when a site is perceived to be particularly &#8220;empty&#8221; + redundant; e.g., a reasonable person looking at it would cry &#8220;krikey! it&#8217;s all basically the same junk on every page!&#8221;</p></blockquote>
<p>So if I take Adam&#8217;s word for it, Google does penalize sites for duplicate content, though it&#8217;s a once-on-a-DVD-night kinda thing (I cancelled my Netflix like a year ago).</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2007/01/05/seo-myth-there-is-no-duplicate-content-penalty.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Sex Blogs Tank Over Christmas - Matt Cutts Responds</title>
		<link>http://seo4fun.com/blog/2006/12/28/sex-blogs-tank-in-organic-serps-over-christmas.html</link>
		<comments>http://seo4fun.com/blog/2006/12/28/sex-blogs-tank-in-organic-serps-over-christmas.html#comments</comments>
		<pubDate>Thu, 28 Dec 2006 15:00:04 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/12/28/sex-blogs-tank-in-organic-serps-over-christmas.html</guid>
		<description><![CDATA[Just an early morning post about a Boing Boing post titled Google &#8220;disappears&#8221; sex blogs? Something&#8217;s broken. For the main dish, you can go read their post, but I did find an interesting trail left by Matt Cutts on Comstock Films comment section, dated Dec. 27, 2006:
Hey, I’m an engineer at Google. I just wanted [...]]]></description>
			<content:encoded><![CDATA[<p>Just an early morning post about a Boing Boing post titled <a href="http://www.boingboing.net/2006/12/27/google_disappears_se.html" rel="nofollow">Google &#8220;disappears&#8221; sex blogs? Something&#8217;s broken</a>. For the main dish, you can go read their post, but I did find an interesting trail left by Matt Cutts on <a href="http://www.comstockfilms.com/blog/tony/2006/12/27/will-google-kill-comstock-films" rel="nofollow">Comstock Films</a> comment section, dated Dec. 27, 2006:</p>
<blockquote><p>Hey, I’m an engineer at Google. I just wanted to say that different people at Google saw this and were asking about it, so we’ll check out these reports from the sites such as tiny nibbles and others.</p></blockquote>
<p>On the 23rd, Matt said in a blog post: </p>
<blockquote><p>I know for a fact that there haven’t been any major algorithm updates to our scoring in the last few days</p></blockquote>
<p>I doubt Google did anything major between Dec 23 and Dec 27th, and I must say the hype I&#8217;ve read so far is a little over the top, but I&#8217;m not in the habit of jumping to conclusions either way.</p>
<p>Since no one is actively competing against these sex blogs for terms like &#8220;Violet Blue&#8221;, &#8220;Tiny Nibbles&#8221; or &#8220;Comstock Films&#8221;, a drop in ranking is a big signal - but of what? Algorithm shift? I doubt it. A bug at the Plex? More likely. Also, it&#8217;s interesting to note that these blogs are already back at #1 for their site names, which was their biggest complaint.</p>
<p>I don&#8217;t buy the Adsense / organic search conspiracy theory and I also don&#8217;t buy the SEO Blackhat theory about Google actively suppressing adult sites from page one, though I do see it happening if people who ran Google Ads ever took over Google&#8217;s Search Quality team.</p>
<p>BTW, did you notice SEO Blackhat is <a href="http://www.google.com/search?q=free+porn&#038;ie=utf-8&#038;oe=utf-8&#038;rls=org.mozilla:en-US:official&#038;client=firefox-a" rel="nofollow">on page one for &#8220;free porn&#8221;</a> for a blog post that&#8217;s no more than one month old (published Nov 21, 2006)? <a href="http://search.msn.com/results.aspx?q=link%3Ahttp%3A%2F%2Fseoblackhat.com%2F2006%2F11%2F21%2Ffree-porn%2F&#038;FORM=MSNH" rel="nofollow">350+ clean IBLs</a> and bam, you leave hundreds of thousands of cheap, bad-neighborhood link trades in the dust.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/12/28/sex-blogs-tank-in-organic-serps-over-christmas.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Happy Holidays Guys</title>
		<link>http://seo4fun.com/blog/2006/12/24/happy-holidays-guys.html</link>
		<comments>http://seo4fun.com/blog/2006/12/24/happy-holidays-guys.html#comments</comments>
		<pubDate>Mon, 25 Dec 2006 01:50:05 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/12/24/happy-holidays-guys.html</guid>
		<description><![CDATA[I&#8217;m still kinda irked at myself for not buying myself a PS3 this Xmas, but I guess I&#8217;ll get over it. Anyway, I hope you guys are having a great holiday.
The real reason I&#8217;m posting though, is I was looking through my traffic tracker log tonight, and I saw another hit on this page I [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m still kinda irked at myself for not buying myself a PS3 this Xmas, but I guess I&#8217;ll get over it. Anyway, I hope you guys are having a great holiday.</p>
<p>The real reason I&#8217;m posting, though, is that I was looking through my traffic tracker log tonight, and I saw another hit on <a href="/notes/supplementals.html">this page I wrote about supplemental results</a> for the term &#8220;supplemental hell.&#8221; It&#8217;s a zero-competition (~525 websites referencing the exact phrase), close-to-nil-traffic query term (0 searches a month according to Overture) that makes me absolutely no money, but I&#8217;ve been stuck in 2nd place below Jim Boykin for a while, and I wanted to try ranking one spot higher.</p>
<p><img src="/images/supplemental-hell-1.jpg" alt="supplemental results" /></p>
<p>You might have noticed I recently added &#8220;aka supplemental hell&#8221; to the page title, though I doubt that was enough, since ranking stayed the same when Google refreshed the page&#8217;s cache. And those cheap blogdrive links have been there since, like, forever.</p>
<p>Ok, so at least temporarily, I got what I wanted. But this is a good reminder not to neglect keyword research - because, the truth is, ranking for &#8220;supplemental hell&#8221; doesn&#8217;t do me any good - at least in the short term - for paying my bills. In contrast, I&#8217;ve got a different non-SEO related page ranking 1st for a money term right now, and with conversion at ~1:150 (last period: 1:161; previous period: 1:197; period before that: 1:62), sales just keep coming and coming (knock on wood). And the difference in conversions once the page hit 1st place was rather dramatic. I was like, &#8220;where the hell are all these sales coming from? And what&#8217;s up with the ultra-low ratios?&#8221; (Previously, I was converting at ~1:600+, and the niche average ratio can fall anywhere between 1:300 and 1:1200. 1:62 is, like, insanely good, though I admit I have another page averaging 1:35ish over the last 3 months.)</p>
<p>So I strongly disagree with those who think SEO is not about ranking. It&#8217;s not JUST about ranking - hopefully we can all agree on that.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/12/24/happy-holidays-guys.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>WMW Google Bashing Over Reciprocal Links</title>
		<link>http://seo4fun.com/blog/2006/12/19/wmw-google-bashing-over-reciprocal-links.html</link>
		<comments>http://seo4fun.com/blog/2006/12/19/wmw-google-bashing-over-reciprocal-links.html#comments</comments>
		<pubDate>Tue, 19 Dec 2006 13:57:47 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/12/19/wmw-google-bashing-over-reciprocal-links.html</guid>
		<description><![CDATA[Some people over at WMW are going ballistic over Stephanie&#8217;s recent article on the Google Webmaster Central Blog about &#8220;non-earned&#8221; links, titled Building link-based popularity. These guys who, I guess, pay their bills by swapping and buying links are miffed over the thought of losing their Google ranking and are shooting the messenger. A big chunk [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.webmasterworld.com/link_development/3190790.htm">Some people</a> over at WMW are going ballistic over Stephanie&#8217;s recent article on the Google Webmaster Central Blog about <a href="/blog/2006/12/16/still-plenty-of-loopholes-in-googles-paid-link-detection-algo.html">&#8220;non-earned&#8221; links</a>, titled <em>Building link-based popularity</em>. These guys who, I guess, pay their bills by swapping and buying links are miffed over the thought of losing their Google ranking and are shooting the messenger. A big chunk of the thread is pure anti-Google noise, but here are a few &#8220;note-to-self&#8221; kinda quotes I found interesting.</p>
<p>For example, Adam Lasnik steps in to clear up the confusion in a thread inaptly titled <em>It&#8217;s Official: Google Discounting <strong>Reciprocal Link Exchanges</strong></em> , started by martinibuster, a mod for the WMW link building forum:</p>
<blockquote><p> Whoa!<br />
This is a lot of speculation about reciprocal linking in response to an official blog entry, when <strong>there&#8217;s not even one mention of &#8220;reciprocal&#8221; on the entire page</strong> ;-). Take a step back, look at the bigger picture, take a deep breath!
</p></blockquote>
<p>He continues, debunking a claim that the blogspot post was just a Googler&#8217;s opinion:</p>
<blockquote><p>
What part of &#8220;Official&#8221; in the title didn&#8217;t resonate with you? :). The people who write on our Webmaster blog are either <strong>engineers</strong> or <strong>product managers</strong> &#8212; or those who work directly with them &#8212; in Search Quality and Webmaster Tools.</p></blockquote>
<p>Ok, so it&#8217;s official. Got it.</p>
<p>Then he elaborates on the difference between a reciprocal link <em>intended as a genuine citation</em> versus a reciprocal link <em>intended to increase PageRank</em>:</p>
<blockquote><p>If a Webmaster is engaging in reciprocal linking <strong>in a way that clearly indicates to us that he or she is doing so to garner PageRank</strong>, not out of a genuine interest for that other site&#8230; well, that&#8217;s the sort of linking scheme we don&#8217;t see as very user-friendly. Are we apt to ban that Webmaster&#8217;s site? I highly doubt it. Are we likely to value those links less? Quite possibly.</p></blockquote>
<p>Glengara, a senior WMW member, makes a good distinction: </p>
<blockquote><p><strong>A reciprocal link can just be coincidental, an exchanged one denotes some deliberation</strong>, and it&#8217;s the deliberate targeting of the PR algo through linkage that the blogpost is all about.</p></blockquote>
<p><strong>Note to self:</strong> From now on, I&#8217;m not going to use the term &#8220;reciprocal&#8221; links to refer to traded links. I&#8217;m going to say &#8220;exchanged&#8221; or &#8220;traded&#8221; links instead.</p>
<p>In regards to Google penalizing sites for exchanged links, Adam calms all fears:</p>
<blockquote><p>&#8220;I hope not. That isn&#8217;t reality. Our aim isn&#8217;t to penalize sites, it&#8217;s to deftly determine when <strong>and to what extent</strong> a link is indeed a &#8220;vote&#8221; for a site.&#8221;</p></blockquote>
<p>Marcia, another Senior WMW member, also reminds people of an Adam Lasnik quote posted in that long-ass Supplemental Thread in Google Groups: </p>
<blockquote><p>The key here is, indeed, moderation :). If, say, 90% of your backlinks are reciprocal, that&#8217;s probably not going to improve how our algorithms view your site. Or worse, if 90% of your backlinks are reciprocal and not likely to be of interest to your user.</p>
<p>But exchanging links here and there &#8212; *especially* when<br />
done with clear editorial judgement (e.g., you&#8217;re not just<br />
accepting dozens of link exchanges willy-nilly) &#8212; that&#8217;s<br />
not the sort of thing Google looks down upon. </p></blockquote>
<p>To sum up, <strong>reciprocal linking isn&#8217;t dead.</strong> To what extent Google devalues those types of links depends, based on what we&#8217;ve heard from Googlers so far, on a few factors, including 1) intent, 2) topical relevance, 3) percentage of reciprocal links to the total number of inbound links, and 4) most likely the level of trust Google has with the sites you&#8217;re linking out to / linked from (number of pages a site has in Google&#8217;s main index versus number of actual pages is one indicator).</p>
<p><strong>UPDATE:</strong> I just read <a href="http://www.seomoz.org/blogdetail.php?ID=1597" rel="nofollow">Rand&#8217;s take</a> on the whole issue (Yeah, I nofollowed it - it feels like he&#8217;s baiting). To be frank, I&#8217;m a little surprised by his attempt at working the tired &#8220;there&#8217;s nothing wrong with paid links&#8221; angle. Like I posted over at SEOmoz, buying links is gaming the system. &#8220;I admit paying for links skews search results in my favor, but as long as I don&#8217;t get caught, I&#8217;ll keep doing it&#8221; - a statement like that I&#8217;d have no problem with.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/12/19/wmw-google-bashing-over-reciprocal-links.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Blog Tag: Five Things You Didn&#8217;t Know about Me</title>
		<link>http://seo4fun.com/blog/2006/12/17/blog-tag-five-things-you-didnt-know-about-me.html</link>
		<comments>http://seo4fun.com/blog/2006/12/17/blog-tag-five-things-you-didnt-know-about-me.html#comments</comments>
		<pubDate>Sun, 17 Dec 2006 13:40:48 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/12/17/blog-tag-five-things-you-didnt-know-about-me.html</guid>
		<description><![CDATA[Thanks a lot John for tagging me so I can spend my Sunday morning in front of a computer - before the sun even has a chance to rise :D Here are five things you didn&#8217;t know about me and would have never ever guessed:
1. I never got drunk, smoked cigs, or did any drugs till I [...]]]></description>
			<content:encoded><![CDATA[<p>Thanks a lot <a href="http://seside.net/">John</a> for tagging me so I can spend my Sunday morning in front of a computer - before the sun even has a chance to rise :D Here are five things you didn&#8217;t know about me and would have never ever guessed:</p>
<p>1. <strong>I never got drunk, smoked cigs, or did any drugs till I went to <a href="http://admissions.vassar.edu/tour/tourlaunch.html">Vassar College</a></strong>. I started out from &#8220;what&#8217;s sex on the beach?&#8221; to stone drunk every night by the end of my first semester. Oh, and yeah, coffee IS a drug.</p>
<p>2. <strong>I majored in Comp Sci / Cognitive Science because I didn&#8217;t believe in God or the existence of a human soul.</strong> By creating a machine that can think, I&#8217;d prove that man is just a complex machine. I guess I was in part rebelling against my Catholic background. BTW, back then, I was answering emails on a VAX, writing papers on MacWrite, and the internet didn&#8217;t even exist.</p>
<p>3. <strong>When I was 21, the local media reported I was missing and believed to be dead.</strong> On their front page, they wrote: &#8220;a straight-A Vassar Student (close to flunking out is more like it) has been missing for 3 days and was last seen with a major drug kingpin, who at this time is in police custody. He claims he did not kill [me] and doesn&#8217;t know of his whereabouts.&#8221; </p>
<p>It&#8217;s weird to see yourself on the front cover of a newspaper, to see yourself when you turn on the evening news, to have kids run up to you asking for autographs, or to hear a Yale English teacher call you a celebrity.</p>
<p>But it&#8217;s a long story. Plus this is an SEO blog, remember?</p>
<p>4. <strong>Before working on the Web, I played pool.</strong> I&#8217;m ranked 7 in <a href="http://www.poolplayers.com/">APA</a> National Team Championships, and unless you&#8217;re an A~Open player, I&#8217;ll give you a beating on the pool table :p I won a free trip to Vegas a few years ago, which included a roundtrip ticket and a few nights&#8217; stay at the Riviera (probably the crappiest hotel there - a teammate gave up his free room and blew 5K on a room at the Bellagio). I went to NYC and gambled away $200 the night before the trip, so I arrived with only $100 in my pocket. I ran 11 racks of 9 ball on a bar table in a 2 am tourney and won a couple hundred from that, plus my team earned me a couple hundred more. The food was near-free with all the buffets, so the trip ended up costing me nothing (Nope, I didn&#8217;t gamble or blow money away in strip joints).</p>
<p>5. <strong>I met the love of my life when I was 21.</strong> Unlike Todd or Aaron, my relationship started off in a zoo (literally).</p>
<p>Now, I&#8217;m gonna tag&#8230;<a href="http://seobloguk.blogspot.com/">Lisa Ditlefsen</a> (let&#8217;s just all admit it - her avatar is beyond hott),  <a href="http://www.jimwestergren.com/">Jim Westergren</a>, <a href="http://www.jimboykin.com/">Jim Boykin</a> (someone musta tagged him already..but whatev), <a href="http://www.seo-scoop.com/">Donna Fontenot</a>, and <a href="http://seo-theory.blogspot.com/">Michael Martinez</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/12/17/blog-tag-five-things-you-didnt-know-about-me.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Still Plenty of Loopholes in Google&#8217;s Paid Link Detection Algo</title>
		<link>http://seo4fun.com/blog/2006/12/16/still-plenty-of-loopholes-in-googles-paid-link-detection-algo.html</link>
		<comments>http://seo4fun.com/blog/2006/12/16/still-plenty-of-loopholes-in-googles-paid-link-detection-algo.html#comments</comments>
		<pubDate>Sat, 16 Dec 2006 14:25:41 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/12/16/still-plenty-of-loopholes-in-googles-paid-link-detection-algo.html</guid>
		<description><![CDATA[Warning: I&#8217;m just rambling to get a free link on Google&#8217;s blog. Why do I want that? Cuz I&#8217;m bored? :)
There&#8217;s not much new in Google&#8217;s recent post, Building Link-Based Popularity, in which Stephanie Ulrike underscores Google&#8217;s increasing ability to find and devalue artificial links. Matt Cutts has mentioned it during the release of Big [...]]]></description>
			<content:encoded><![CDATA[<p>Warning: I&#8217;m just rambling to get a free link on Google&#8217;s blog. Why do I want that? Cuz I&#8217;m bored? :)</p>
<p>There&#8217;s not much new in Google&#8217;s recent post, <a href="http://googlewebmastercentral.blogspot.com/2006/12/building-link-based-popularity.html">Building Link-Based Popularity</a>, in which Stephanie Ulrike underscores Google&#8217;s increasing ability to find and devalue artificial links. Matt Cutts mentioned the same thing during the release of Big Daddy earlier this year. In my sector, I still see reciprocal links remaining effective to a large extent.</p>
<p>Honestly, I think it&#8217;ll take a while before Google reaches a point where only organic inbounds count toward a site&#8217;s popularity. Why? Guessing the intent of a link isn&#8217;t always easy. For example, I linked to a blog I read every day, just because having the link on my blog&#8217;s front page made it easier for me to visit. The blogger noticed my link and linked back to my blog. To Google that may look like reciprocal linking, but if Google discounts the links as such, it&#8217;s clearly missing the intention of those links.</p>
<p>There are also instances where an artificial link can be made to look like a citation. For example, write a fake article and bury a paid link in one of the paragraphs. How would Google figure out that link was paid for? I doubt it can. What about this? Instead of paying for a link, you pay for an &#8220;organic link generating service&#8221;, where a company uses several people to generate Diggs for your site or bookmark you via del.icio.us. You&#8217;re still paying for links, except the company you&#8217;re paying doesn&#8217;t link to you directly anymore.</p>
<p>I don&#8217;t know. I still think there are a lot of loopholes out there. Anyway, I think I&#8217;ve rambled on long enough.</p>
<p><strong>P.S.</strong> My link popped up on their blog in like under a minute. That&#8217;s faster than I expected. I do see their blog home page is TBPR 0. Though the links aren&#8217;t nofollowed, I doubt I&#8217;m getting anything besides a traffic/anchor text benefit out of my link appearing on their page.</p>
<p>UPDATE: <a href="http://seo-theory.blogspot.com/2006/12/google-takes-on-rand-fishkin-and-link.html">Michael Martinez</a> wrote the most interesting response to Google&#8217;s announcement I read so far, though I&#8217;m not sure how he goes from the ability to link &#8220;spam&#8221; via Digg to Google VS Link Baiters:</p>
<blockquote><p>I don&#8217;t rely on social link spamming, either, though Rand has publicly admitted to seeding DIGG and other social linking sites with stories and links&#8230;.Call this the first round of &#8220;Google versus the Link Baiters&#8221;.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/12/16/still-plenty-of-loopholes-in-googles-paid-link-detection-algo.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Why Google Will Not Move Away From PageRank</title>
		<link>http://seo4fun.com/blog/2006/12/13/why-google-will-not-move-away-from-pagerank.html</link>
		<comments>http://seo4fun.com/blog/2006/12/13/why-google-will-not-move-away-from-pagerank.html#comments</comments>
		<pubDate>Wed, 13 Dec 2006 10:18:10 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/12/13/why-google-will-not-move-away-from-pagerank.html</guid>
		<description><![CDATA[Recently, I&#8217;ve gotten a little flak in Google Group Webmaster Help for coming down hard on people in the &#8220;Google is broke&#8221; camp. Basically, some of them were upset that the supplemental index was based heavily on PageRank because Google&#8217;s PageRank paradigm is broken and unfair:

Google may misread the intent of a link. For example, [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I&#8217;ve gotten a little flak in Google Group Webmaster Help for coming down hard on people in the &#8220;Google is broke&#8221; camp. Basically, some of them were upset that the supplemental index was based heavily on PageRank because Google&#8217;s PageRank paradigm is broken and unfair:</p>
<ul>
<li><strong>Google may misread the intent of a link.</strong> For example, due partly to many people linking to domain.com, it&#8217;s amassed a TBPR 8. While those links aren&#8217;t meant to be citations, Google apparently counts them as such.</li>
<li><strong>A site can&#8217;t get organic links unless it already has links</strong> (a.k.a Mike Grehan&#8217;s &#8220;rich get richer&#8221; syndrome). This would be true IF Google were the only source of a site&#8217;s visibility. However, we&#8217;ve got Technorati, RSS, Yahoo, MSN, Reddit, Digg, Myspace, YouTube, paid advertising&#8230;True - in some niches (e.g. porn), Reddit or Digg isn&#8217;t going to work, and people are more hesitant to link to you. But in general, although &#8220;rich get richer&#8221; is a fact of life, there are many ways around it. Like Adam Lasnik pointed out, once upon a time, YouTube.com was TBPR 0.</li>
<li><strong>The guy with the deepest pockets wins.</strong> The guy who can afford to spend the most money on paid advertising and paid links will in the end come out on top. Thus mom-and-pop sites will never have a chance in Google Search, or so they argue.</li>
<li><strong>The Paris Hilton syndrome.</strong> PageRank paradigm degrades search into a popularity contest. People link to what&#8217;s popular, even if  it has no value or it&#8217;s completely untrue. </li>
</ul>
<p>Basing a page&#8217;s value on links thus has several potential downsides (like anything else). Instead of PageRank, they argue, indexing should be based on the quality of on-page text - a statement that begs the question: &#8220;Are you on crack?&#8221;</p>
<p>Answer me this. How can a computer program read, understand, and judge the quality of an article in comparison to other articles written on the same topic? It can&#8217;t - until Google discovers Artificial Intelligence. Sure - there are ways to look for on-page spam fingerprints (e.g. illogical sentence structures, excessively high keyword density, overuse of bold and italics). But given two well-written articles, how does a machine decide - based solely on on-page text - which article is more valuable?</p>
<p>It can&#8217;t.</p>
<p>Relevance for a keyword can, of course, be guessed at by looking at things like the TITLE tag, keyword frequency, keyword location on the page, and keywords in H1. Relevancy, however, has nada to do with page value or page quality.</p>
<p>How can a program judge the value of a page using on-page text alone when, from its POV, everything looks like a random string of symbols? To gauge a page&#8217;s value, there is simply no other option than to analyze off-page factors.</p>
<p>Ok, so it sucks that without inbound links or without decent internal link structure, Google will chuck a potentially great site into the supplemental index. Been there, done that. But if you think Google should base their indexing on on-page text quality instead of inbound links, I suggest you try spending a few days coding your own search engine. Then you&#8217;ll eventually realize what you want Google to do, at the present state of technology, is like wanting WordPress to write posts for you, or like wanting your wife to become a rock legend overnight, or like wanting a billion-dollar <em>white-hat</em> website that builds and markets itself (and all you have to do is deposit checks in the bank every month).</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/12/13/why-google-will-not-move-away-from-pagerank.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>A Quick Update: AJAX/PHP/MYSQL and Google Stealin&#8217;</title>
		<link>http://seo4fun.com/blog/2006/12/13/a-quick-update-ajaxphpmysql-and-google-stealin.html</link>
		<comments>http://seo4fun.com/blog/2006/12/13/a-quick-update-ajaxphpmysql-and-google-stealin.html#comments</comments>
		<pubDate>Wed, 13 Dec 2006 08:42:41 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/12/13/a-quick-update-ajaxphpmysql-and-google-stealin.html</guid>
		<description><![CDATA[Just to let you know if you&#8217;re reading this blog, I&#8217;m working on a non-SEO-related script right now and I won&#8217;t really be posting for another week or two (Stop reading here if you&#8217;re not a Javascript geek).
What kind of script is it? Well, like other scripts I wrote, its mainly for my own use, [...]]]></description>
			<content:encoded><![CDATA[<p>Just to let you know if you&#8217;re reading this blog, I&#8217;m working on a non-SEO-related script right now and I won&#8217;t really be posting for another week or two (Stop reading here if you&#8217;re not a Javascript geek).</p>
<p>What kind of script is it? Well, like other scripts I wrote, it&#8217;s mainly for my own use, though it would be cool if I could somehow monetize it, which is one reason I&#8217;m gonna be pretty tight-lipped about it (at least till it&#8217;s done). But I can say with absolute certainty that in its present state it already beats the pants off all the apps Google came out with this year (and I&#8217;m not just saying that because I wrote it).</p>
<p>I&#8217;ll also add, in case you&#8217;re into Javascript, that till this year I wrote everything in PHP/MYSQL, and never even realized how limiting that 100% server-side model was. This should come as no surprise to anyone, but if you&#8217;re doing a huge amount of manual data entry (in cases where you hit a wall with a mass automation approach), you want as few keystrokes, mouse clicks, page scrolls, and page reloads as possible to get whatever you want done. Less input means less time wasted. If I were just designing a blog, I wouldn&#8217;t really care. But when I need to save thousands of work hours, good UI design can&#8217;t be overlooked. Client-side functionality provided by AJAX and Javascript makes it all possible.</p>
<p>Another thing I&#8217;ve come to appreciate while hacking away for a few days is coding event handlers in Javascript instead of using onclick in HTML (ya know, the CSS/HTML/Javascript = design/content/functionality model). I still prefer onclick in situations where I generate the HTML via Javascript, but when I do View Source, it feels nice if all I see is HTML.</p>
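<p>For what it&#8217;s worth, here&#8217;s a minimal sketch of the pattern I mean - behavior wired up from script instead of baked into the markup. The <code>FakeElement</code> class is just a stand-in for a DOM node so the idea runs anywhere; it&#8217;s not part of any real API.</p>

```javascript
// A tiny stand-in for a DOM node, just enough to show the pattern.
// In a real page you'd call addEventListener on an actual element.
class FakeElement {
  constructor() { this.handlers = {}; }
  addEventListener(type, fn) {
    (this.handlers[type] = this.handlers[type] || []).push(fn);
  }
  dispatchEvent(type) {
    (this.handlers[type] || []).forEach(fn => fn());
  }
}

// Instead of <button onclick="count++"> in the markup,
// the HTML stays plain and the handler is attached here:
let count = 0;
const button = new FakeElement();
button.addEventListener("click", () => { count += 1; });

button.dispatchEvent("click");
button.dispatchEvent("click");
// count is now 2
```

<p>The payoff is exactly the one above: View Source shows clean HTML, and all the behavior lives in one script file.</p>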
<p>And yeah, I&#8217;ve been reading about the Yahoo IE7 VS Google IE7 nonsense but it seriously doesn&#8217;t interest me. Am I going to get richer by knowing about that design theft, or is it supposed to be the spark that lights the &#8220;Google is Evil&#8221; firestorm that brings way too many &#8220;Google fucked up my site, I tried everything, I give up, I hate Google&#8221; whiners out of the woodwork? Just fire whoever is responsible (or give him/her a good spanking), and put some kind of approval process in place so things like this, done by (I assume) a Googler newbie, don&#8217;t go unchecked. I&#8217;m still waiting for an admission from Google that someone did in fact steal Yahoo&#8217;s design (as if removing the page wasn&#8217;t good enough - I guess it is - but I&#8217;d still like to hear it), and I&#8217;m also curious as to why they decided to steal it, but like I said, talking about it isn&#8217;t going to make me any richer.</p>
<p>P.S. The other day, I noticed I was No. 1 on Google for &#8220;Pagerank doesn&#8217;t matter.&#8221; (Hehe) I know it&#8217;s a low-traffic, non-competitive term and I&#8217;ll lose position as I post more stuff on here, but I did take a screenshot just to remind myself I was there (too lazy at 3:33 in the morning to post it, but I got it). I keep imagining PageRank haters typing in &#8220;pagerank doesn&#8217;t matter&#8221; in Google and reading my post that proves they were wrong for way too long. Of course, I also noticed Andy Hagans posted a similar post way back in 2005, so I&#8217;m no SEO pioneer on that, but still. Too many people either say &#8220;it doesn&#8217;t matter&#8221; or obsess about it. The healthy POV is to look at it like your TITLE tag.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/12/13/a-quick-update-ajaxphpmysql-and-google-stealin.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>&#8220;PageRank Doesn&#8217;t Matter&#8221; is Now Officially an SEO Myth</title>
		<link>http://seo4fun.com/blog/2006/11/24/pagerank-doesnt-matter-is-now-officially-an-seo-myth.html</link>
		<comments>http://seo4fun.com/blog/2006/11/24/pagerank-doesnt-matter-is-now-officially-an-seo-myth.html#comments</comments>
		<pubDate>Fri, 24 Nov 2006 16:51:32 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/11/24/pagerank-doesnt-matter-is-now-officially-an-seo-myth.html</guid>
		<description><![CDATA[Back in August, after having fixed my 99% supplemental site to the point where duplicate content could not possibly be an issue, Google nonchalantly stuck my original pages back in the supplemental index. At that point, I claimed the existence of a PageRank hurdle preventing supplemental pages from getting back into the main index:
On gfe-eh.google.com [...]]]></description>
			<content:encoded><![CDATA[<p>Back in August, after having fixed my 99% supplemental site to the point where duplicate content could not possibly be an issue, Google nonchalantly stuck my original pages back in the supplemental index. At that point, I claimed <a href="/blog/2006/08/17/supplemental-index-fuzzier-than-ever.html">the existence of a PageRank hurdle preventing supplemental pages from getting back into the main index</a>:</p>
<blockquote><p>On gfe-eh.google.com and other DCs, the supplemental pages cache dates no longer go all the way back to Aug 2005. But the new “system” makes it even harder to tell why a page is listed in the supplemental index, because now you’re required to jump over at least two major hurdles to break out of supplemental hell: 1) duplicate content issues (i.e. identical meta tags, multiple urls resolving to the same content, www/non-www, etc) and 2) “Trust” / PageRank. A perfectly structured page with original content could remain stuck in the supplemental index if a domain lacks juice.</p></blockquote>
<p>On October 11, 2006, <a href="/blog/2006/10/12/matt-cutts-pagerank-primary-factor-determining-supplemental-results.html">Matt Cutts confirmed that PageRank is the primary factor used to determine whether or not to chuck pages into supplemental hell</a>. He later reiterated his point at Pubcon, and Adam Lasnik also recently echoed the same idea in Google Group Webmaster Help.</p>
<p>That PageRank still matters in this day and age didn&#8217;t go over too well with some SEOs, who often de-emphasize PageRank in favor of other factors like &#8220;trust&#8221;, domain age, user data, links from authority sites, link neighborhood, co-citation, link history, link age, webmaster profile, SERP CTR, dmoz listing, and link topical relevance. Some black hats, G-Man for example, claimed supplemental results are mainly due to duplicate content issues. <a href="http://www.seomoz.org/blogdetail.php?ID=1516">Rand Fishkin also claimed</a> that PageRank has a limited role in determining a page&#8217;s supplemental status. He reasoned orphaned pages with no links to them may turn supplemental, but that in most cases, duplicate content is the predominant factor.</p>
<p>This anti-PageRank attitude stems from several commonly-held beliefs: 1) PageRank has been devalued during the past few years; 2) PageRank by itself will not guarantee high rankings; and 3) PageRank is just one of hundreds of factors Google considers when evaluating ranking for a search term. Some webmasters also obsess too much over PageRank, paying hundreds of dollars to buy links on high-TBPR pages in hopes of boosting their own TBPR.</p>
<p>Still, PageRank isn&#8217;t quite dead. Google still cares about the quantity and quality of links pointing to a page.  How does Google keep track of that? You guessed it.</p>
<p>The supplemental index is Google&#8217;s version of the Interweb junkyard. It&#8217;s the holding space for what ends up on Google&#8217;s cutting-room floor. It&#8217;s where your pages end up if Google thinks they lack value. Again, how does Google measure page value? As far as dupe issues are concerned, it uses on-page text. Otherwise, Google looks at links. The primary metric Google uses to size up links pointing to a page, of course, is PageRank.</p>
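<p>If &#8220;PageRank is just a metric for links&#8221; sounds abstract, here&#8217;s a toy sketch of the published PageRank iteration (damping factor 0.85) over a made-up four-page site. This is the textbook formula, not Google&#8217;s production system, and the link graph is invented purely for illustration.</p>

```javascript
// Toy power iteration of the published PageRank formula over a
// made-up link graph. Every page starts equal; each round, score
// flows along outlinks, damped by d = 0.85.
function pagerank(links, iterations = 50, d = 0.85) {
  const pages = Object.keys(links);
  const n = pages.length;
  let rank = {};
  pages.forEach(p => { rank[p] = 1 / n; });
  for (let i = 0; i < iterations; i++) {
    const next = {};
    pages.forEach(p => { next[p] = (1 - d) / n; });
    pages.forEach(p => {
      const outs = links[p];
      outs.forEach(q => { next[q] += d * rank[p] / outs.length; });
    });
    rank = next;
  }
  return rank;
}

// A tiny site where every page links back to the front page:
const ranks = pagerank({
  home:  ["about"],
  about: ["home"],
  post1: ["home"],
  post2: ["home"],
});
// ranks.home comes out highest; post1/post2, with no inbounds,
// keep only the (1 - d) / n baseline.
```

<p>After a few dozen iterations the front page, which everything links to, ends up with the highest score - which is all the supplemental-index argument needs: pages nobody links to accumulate almost nothing.</p>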
<p>That doesn&#8217;t mean low PageRank is the only reason a page is in the supplemental index. For example, Adam Lasnik&#8217;s following comment implies duplicate content is clearly another major factor:</p>
<blockquote><p> Joe Parts, I took a look at the examples you gave (thanks!) and &#8212; aside from the PR Toolbar issue I noted above &#8212; I did notice that at least a few of the URLs you noted come up directly in searches for text on the page.  But there&#8217;s not much text on those pages for us to go by, unfortunately, and so it&#8217;s not surprising that some seem to be perceived as similar content to other pages on the net.</p></blockquote>
<p>So it&#8217;s not just about low PageRank, even if that&#8217;s the main reason pages &#8220;go&#8221; supplemental. In Joe&#8217;s case, several other reasons, a few of which Adam alludes to, include 1) similar shingles across pages, 2) devalued inbound links, and 3) outdated TBPR.</p>
<p>&#8220;PageRank (toolbar PR) doesn&#8217;t matter (much anymore (in ranking))&#8221; is now officially an SEO myth and a misleading statement at best. Links matter and have always mattered. PageRank is just a metric for links. It used to be an inaccurate and spam-prone metric, but by nuking sold links, devaluing link schemes, blocking PageRank to spam sites with nofollow, and updating PageRank on a daily basis, PageRank has become a more reliable metric that better reflects page value.</p>
<p>The fact that Google already uses PageRank to determine crawl frequency, crawl depth, and supplemental indexing shows how confident Googlers feel (rightly or wrongly) about the accuracy of PageRank in place today.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/11/24/pagerank-doesnt-matter-is-now-officially-an-seo-myth.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>NSFW Reddit is No More</title>
		<link>http://seo4fun.com/blog/2006/11/13/nsfw-reddit-is-no-more.html</link>
		<comments>http://seo4fun.com/blog/2006/11/13/nsfw-reddit-is-no-more.html#comments</comments>
		<pubDate>Mon, 13 Nov 2006 23:38:36 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/11/13/nsfw-reddit-is-no-more.html</guid>
		<description><![CDATA[Just a few minutes ago, I noticed reddit dropped nsfw.reddit.com. If you haven&#8217;t already, you can read a loooong thread about it posted 13 days ago here. Why did they kill NSFW Reddit?

Lack of community compared to regular reddit.
It was never intended for porn: &#8220;Our intention for nsfw.reddit was for adolescent humor that, when on [...]]]></description>
			<content:encoded><![CDATA[<p>Just a few minutes ago, I noticed reddit dropped nsfw.reddit.com. If you haven&#8217;t already, you can read a loooong thread about it posted 13 days ago <a href="http://reddit.com/info/olu8/comments/coly3" rel="nofollow">here</a>. Why did they kill NSFW Reddit?</p>
<ul>
<li>Lack of community compared to regular reddit.</li>
<li>It was never intended for porn: &#8220;Our intention for nsfw.reddit was for adolescent humor that, when on reddit itself, was causing users trouble at work. Unfortunately, it quickly degenerated into porn.&#8221;</li>
<li>NSFW content scares away advertisers: &#8220;Unfortunately, having NSFW material available on a site means that it gets blocked by many filtering programs/gateways.&#8221;</li>
<li>&#8220;unrestrained fake voting from sockpuppets&#8221; - Probably because most NSFW surfers&#8217; hands are too tied up to cast a vote, so posters vote for themselves to fatten their bottom line.</li>
</ul>
<p>What a fine example of users steering a Web 2.0 site in a completely wrong direction. Anyway, if you are dying for a nsfw.reddit replacement, you might want to check out <a href="http://www.mosexindex.com/" rel="nofollow">MoSexIndex</a>, which btw, I&#8217;m not affiliated with in any way whatsoever (thus, the nofollow).</p>
<p>Meanwhile, <a href="http://groups.google.com/group/Google_Webmaster_Help-Indexing/browse_thread/thread/a4a866327b1172a1">Adam Lasnik is doing some detective work right now</a> trying to nail down the <a href="/blog/2006/10/17/wackiest-google-site-command-bug-ive-ever-seen.html">weird site: bug</a> I posted a while ago.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/11/13/nsfw-reddit-is-no-more.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Adam Lasnik Posting More Often?</title>
		<link>http://seo4fun.com/blog/2006/11/10/adam-lasnik-posting-more-often.html</link>
		<comments>http://seo4fun.com/blog/2006/11/10/adam-lasnik-posting-more-often.html#comments</comments>
		<pubDate>Fri, 10 Nov 2006 09:50:55 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/11/10/adam-lasnik-posting-more-often.html</guid>
		<description><![CDATA[I&#8217;ve been taking a break from SEO for the last two weeks, instead spending more time working on my own sites, writing up project plans for the next few months, working on a possibly groundbreaking SEO script, and doing a bit of SEO consulting on the side.
Though tonight, I couldn&#8217;t sleep, so here I am at [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been taking a break from SEO for the last two weeks, instead spending more time working on my own sites, writing up project plans for the next few months, working on a possibly groundbreaking SEO script, and doing a bit of SEO consulting on the side.</p>
<p>Though tonight I couldn&#8217;t sleep, so here I am at 4:20 in the morning reading through <a href="http://groups.google.com/group/Google_Webmaster_Help">Google Group Webmaster Help</a>, when I noticed Adam Lasnik has been posting a hell of a lot more in there recently. He doesn&#8217;t give away any closely kept secrets, mind you, but he does dish out some incredibly helpful feedback - enough that webmasters who post in that group don&#8217;t feel like they&#8217;re talking to a wall, like I sometimes feel when I chat with Google AdWords reviewers, who I think are completely braindead and can&#8217;t tell right from wrong (/rant).</p>
<p>Anyway, if you&#8217;re interested, you can look for some of his comments by following the recent post links in <a href="http://groups.google.com/groups/profile?enc_user=Cc3iUTUAAAC0ZCEBAysSlShC_gPAdXUZ6uYgiBXIh5DaeqWCtoTJbOW2QCIjchDn6OYwMZe7JK3IgJw9QHTrXt9e__Js8L8H">his profile</a>. Don&#8217;t expect any new posting for a couple of weeks though, since pubcon is coming up.</p>
<p>UPDATE: If you wanna keep track of Vanessa Fox&#8217;s comments on Webmaster Help GG, <a href="http://groups.google.com/groups/profile?enc_user=a4E6IzgAAAC0ZCEBAysSlShC_gPAdXUZNvh6ZzwqbrDNmlK6sEGnNLvDmyeMYWPeTK9k6LE--aSJvIqbWFYuzPTfMg11G_KA" rel="nofollow">bookmark this link</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/11/10/adam-lasnik-posting-more-often.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Hack: Installing Rel=Nofollow in PHPAdsNew</title>
		<link>http://seo4fun.com/blog/2006/11/07/hack-installing-relnofollow-in-phpadsnew.html</link>
		<comments>http://seo4fun.com/blog/2006/11/07/hack-installing-relnofollow-in-phpadsnew.html#comments</comments>
		<pubDate>Tue, 07 Nov 2006 16:36:47 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/11/07/hack-installing-relnofollow-in-phpadsnew.html</guid>
		<description><![CDATA[If you use PHPAdsNew to serve ads on a relatively low FBPR (foolbarPR) website and your site is suffering from supplemental problems, installing rel=nofollow on your ad links is a tiny tweak that will help tighten your internal links and get some pages back into Google&#8217;s main index:
around line 366 in libraries/lib-view-main.inc.php, just before these [...]]]></description>
			<content:encoded><![CDATA[<p>If you use PHPAdsNew to serve ads on a relatively low FBPR (foolbarPR) website and your site is suffering from supplemental problems, installing rel=nofollow on your ad links is a tiny tweak that will help tighten your internal links and get some pages back into Google&#8217;s main index:</p>
<p>Around line 366 in libraries/lib-view-main.inc.php, just before these lines:</p>
<blockquote><p>
// Return banner<br />
	return( array('html' => $outputbuffer,</p></blockquote>
<p>Type this:</p>
<blockquote><p>
$outputbuffer = preg_replace("/&lt;a href/i", "&lt;a rel='nofollow' href", $outputbuffer);</p></blockquote>
<p>I&#8217;ve only tested this on HTML banners (a combination of image and text), so your mileage may vary.</p>
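<p>If you want to see what that one-liner actually does to a banner&#8217;s markup, here&#8217;s the same rewrite sketched in Javascript (the hack itself is PHP; this is just the idea, with a made-up banner string). Like the original, it&#8217;s a naive string replace - it assumes anchors are written as <code>&lt;a href</code> and don&#8217;t already carry a rel attribute.</p>

```javascript
// Same idea as the PHP preg_replace in the post: tag every anchor
// in an ad's HTML with rel='nofollow' before it's output. Naive on
// purpose, like the original; a real HTML parser would be sturdier.
function nofollowAds(html) {
  return html.replace(/<a href/gi, "<a rel='nofollow' href");
}

// A made-up banner snippet for illustration:
const banner = '<a href="http://sponsor.example/">Sponsor</a>';
const tagged = nofollowAds(banner);
// tagged now starts with: <a rel='nofollow' href=
```

<p>It also shows the hack&#8217;s limits: a link written as <code>&lt;a class="ad" href</code> wouldn&#8217;t match, which is another reason your mileage may vary.</p>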
<p>Although some SEOs disregard the <a href="/blog/2006/10/12/matt-cutts-pagerank-primary-factor-determining-supplemental-results.html">role of PageRank in combating supplemental pages</a>, it&#8217;s something you do not want to rule out when looking for solutions.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/11/07/hack-installing-relnofollow-in-phpadsnew.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>SEO Surgery: resistinc.org - Banned Without a Clue</title>
		<link>http://seo4fun.com/blog/2006/10/25/196.html</link>
		<comments>http://seo4fun.com/blog/2006/10/25/196.html#comments</comments>
		<pubDate>Wed, 25 Oct 2006 15:39:32 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/10/25/196.html</guid>
		<description><![CDATA[This morning, I saw a guy on Google Webmaster Help asking why his domain is banned. Sure enough, site: query returns no results. According to domain tools, resistinc.org was created back in 1997 with 2 DMOZ listings and 3 Y! Directory listings. A site: command on MSN, though, turned up this cache page (dated 10/20/2006) [...]]]></description>
			<content:encoded><![CDATA[<p>This morning, I saw a guy on Google Webmaster Help asking <a href="http://groups.google.com/group/Google_Webmaster_Help-Indexing/browse_thread/thread/44e93ba18b9e6308" title="why his domain is banned">why his domain is banned</a>. Sure enough, a site: query returns no results. According to domain tools, resistinc.org was created back in 1997 with 2 DMOZ listings and 3 Y! Directory listings. A site: command on MSN, though, turned up <a href="http://cc.msnscache.com/cache.aspx?q=4168773605629&#038;lang=en-US&#038;mkt=en-US&#038;FORM=CVRE" title="this cache page">this cache page</a> (dated 10/20/2006) with a Javascript snippet at the bottom of the source code:</p>
<blockquote><p>&lt;script&gt;document.write('&lt;ma' + 'rq' + 'uee wid' + 'th=1 height=1&gt;&lt;fo' + 'nt style="col' + 'or:#000' + '000;<strong>fon' + 't-size:1px</strong>;"&gt;')&lt;/script&gt;</p></blockquote>
<p>What follows are hidden links to stuff like &#8220;brewster ny honda&#8221;, &#8220;drawn horse&#8221;, &#8220;yahoo game back door,&#8221; &#8220;hip hop cartel&#8221;, and &#8220;bronx bankruptcy lawyer&#8221;. Google sees these links because only the marquee element is in Javascript. Also keep in mind that DMOZ.org&#8217;s description of resistinc.org has nothing to do with hip hop cartels, as far as I know:</p>
<blockquote><p>
&#8220;For more than 30 years, Resist has funded progressive organizations in the United States that are actively part of a movement for social change.&#8221;</p></blockquote>
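<p>In case the obfuscated script above looks mysterious: it&#8217;s nothing but string concatenation. The tag name is split across literals so a casual grep for &#8220;marquee&#8221; finds nothing, but the pieces reassemble into an ordinary tag. A shortened reconstruction (not the exact payload):</p>

```javascript
// The spam script splits the tag name across string literals;
// concatenation rebuilds it. The hidden links that followed were
// plain HTML, which is why Google could still see them.
const opener = '<ma' + 'rq' + 'uee wid' + 'th=1 height=1>';
// opener === '<marquee width=1 height=1>'
```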
<p>Though the link to &#8220;hip hop cartel&#8221; seems to 404 now, <a title="MSN site: search" href="http://search.msn.com/results.aspx?q=site%3Ahttp%3A%2F%2Fwww.resistinc.org%2F+&#038;FORM=MSNH">MSN site: search</a> reveals some other cliché URLs (out of 7,172 URLs):</p>
<blockquote><p>www.resistinc.org/free-online-poker.dhtml<br />
www.resistinc.org/auto-insurance.dhtml<br />
www.resistinc.org/stud-poker-tracker.dhtml<br />
www.resistinc.org/buy&#8212;viagra.dhtml<br />
www.resistinc.org/cheap-landlord-insurance.dhtml</p></blockquote>
<p>The list goes on.</p>
<p>Were they hacked? Who knows. What I find puzzling is what this guy posted (dated Tues, Oct 24 2006 11:12 am - Just a day ago):</p>
<blockquote><p>Hi:</p>
<p>I&#8217;m a volunteer for:</p>
<p><a href="http://www.resistinc.org/" rel="nofollow">http://www.resistinc.org/</a></p>
<p><span style="font-weight: bold;">We think we&#8217;ve been banned</span>, since we get no results for searches with site:resistinc.org </p>
<p><span style="font-weight: bold;">We have reviewed the webmaster guidelines and cannot think of how we might have vioated any of them.</span> We&#8217;d appreciate any feedback or guideance about how to determine what might be wrong and/or how to proceed.</p></blockquote>
<p>The last advice on that Google Help thread is &#8220;install Sitemaps so you can get some Google feedback.&#8221; If he realized he&#8217;d been hacked, I&#8217;d expect him to post &#8220;we were hacked and now we&#8217;re banned.&#8221; The spammy pages (the ones I checked) 404, so someone must have removed them. I don&#8217;t know, it smells kinda fishy to me (not that I don&#8217;t like seafood).</p>
<p><strong>P.S.</strong> I tried publishing this via Google Docs, and somehow the TITLE of the post turned up blank. Of course &lt; and &gt; didn&#8217;t translate. Google also uses BR instead of P so I get a chunk of code in Wordpress with no paragraph breaks. Sorry to those who read me on Bloglines (since I keep republishing this one after multiple edits). My advice? Switch to Google Reader.</p>
<p><strong>UPDATE:</strong> As you might have noticed, the original poster later added:</p>
<blockquote><p>While looking into this yesterday afternoon, I discovered some PHP files on the site that appear to have been placed there illegitimately.</p>
<p>I removed those, but it&#8217;s still too early to say that the site has been &#8220;cleaned up.&#8221; I am working on that.</p>
<p>To state what I hope is obvious: neither the owners of this site nor I had knowledge of the fact that the website had been hijacked into serving these advertisements.</p>
<p>In defense of the organization that owns this site, I will say that like many small nonprofits, they cannot afford in-house tech support. In my own defense, I will say that I am not in charge of the day-to-day operation of this website; I responded to a call for help.</p>
<p>We appreciate the information that has been provided, and will of course appreciate any further constructive suggestions about how to proceed.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/10/25/196.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Wackiest Google Site Command Bug I&#8217;ve Ever Seen</title>
		<link>http://seo4fun.com/blog/2006/10/17/wackiest-google-site-command-bug-ive-ever-seen.html</link>
		<comments>http://seo4fun.com/blog/2006/10/17/wackiest-google-site-command-bug-ive-ever-seen.html#comments</comments>
		<pubDate>Tue, 17 Oct 2006 09:27:10 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/10/17/wackiest-google-site-command-bug-ive-ever-seen.html</guid>
		<description><![CDATA[Google&#8217;s site: command&#8217;s been acting up lately, and Googlers, I assume, are working feverishly day and night trying to fix the damn thing on various datacenters, but check this out. While running a site: command tonight, I came across this bizarre SERP:

Google site search returns ginormous serp snippets
Here&#8217;s the search url I used (broken in [...]]]></description>
			<content:encoded><![CDATA[<p>Google&#8217;s site: command&#8217;s been acting up lately, and Googlers, I assume, are working feverishly day and night trying to fix the damn thing on various datacenters, but check this out. While running a site: command tonight, I came across this bizarre SERP:</p>
<p><img src="/images/site-command.jpg" alt="google site command bugged" /></p>
<p><strong>Google site search returns ginormous serp snippets</strong></p>
<p>Here&#8217;s the search url I used (broken in two parts, so it doesn&#8217;t break this blog&#8217;s template):</p>
<p>http://www.google.com/search?hl=en&#038;lr=&#038;safe=off&#038;client=firefox-a<br />
&#038;rls=org.mozilla%3Aen-US%3Aofficial&#038;q=site%3Agembaby.com&#038;btnG=Search</p>
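<p>If you ever want to pull the query back out of a URL like that, the Python standard library does it in a couple of lines. A minimal sketch:</p>

```python
from urllib.parse import parse_qs, urlparse

# The same search URL, rejoined; parse_qs decodes %3A back to ":" so you can
# confirm the query string Google actually received.
url = ("http://www.google.com/search?hl=en&lr=&safe=off&client=firefox-a"
       "&rls=org.mozilla%3Aen-US%3Aofficial&q=site%3Agembaby.com&btnG=Search")
params = parse_qs(urlparse(url).query)
print(params["q"][0])  # site:gembaby.com
```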
<p>(EDIT: The search result is back to normal now &#8212; Too bad.)</p>
<p>Huge description snippets, right? Where are they coming from?</p>
<p>The META description, repeated 12 times.</p>
<p>So, um..Google, what&#8217;s up with that?</p>
<p><strong>UPDATE:</strong> I removed snippets I previously quoted off the meta description and the SERP, since I didn&#8217;t want to accidentally create a duplicate content type scenario.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/10/17/wackiest-google-site-command-bug-ive-ever-seen.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Matt Cutts: PageRank Primary Factor Determining Supplemental Results</title>
		<link>http://seo4fun.com/blog/2006/10/12/matt-cutts-pagerank-primary-factor-determining-supplemental-results.html</link>
		<comments>http://seo4fun.com/blog/2006/10/12/matt-cutts-pagerank-primary-factor-determining-supplemental-results.html#comments</comments>
		<pubDate>Thu, 12 Oct 2006 13:25:06 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/10/12/matt-cutts-pagerank-primary-factor-determining-supplemental-results.html</guid>
		<description><![CDATA[I was going to write a long post about this Matt Cutts quote, but I think it speaks for itself.
PageRank is the primary factor determining whether a url is in the main web index vs. the supplemental results.
I predict many seasoned SEOs and newbies alike will dismiss his statement. Firstly, PageRank doesn&#8217;t matter, right? Or [...]]]></description>
			<content:encoded><![CDATA[<p>I was going to write a long post about this <a href="http://www.mattcutts.com/blog/fall-weather-forecast/#comment-87795">Matt Cutts quote</a>, but I think it speaks for itself.</p>
<blockquote><p><strong>PageRank is the primary factor</strong> determining whether a url is in the main web index vs. the supplemental results.</p></blockquote>
<p>I predict many seasoned SEOs and newbies alike will dismiss his statement. Firstly, PageRank doesn&#8217;t matter, right? Or were you talking about TBPR? But as Matt Cutts has made clear, TBPR is internal PageRank exported and translated onto a 0-10 scale. The only inaccuracies with TBPR are that PageRank updates continuously while TBPR is exported only every few months, and that internal PageRank is more granular than TBPR.</p>
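<p>To see why the granularity gap matters, here is a hypothetical sketch. Google has never published the mapping or the log base; the point is only that any logarithmic bucketing onto 0-10 collapses huge ranges of internal PageRank into a single toolbar value:</p>

```python
import math

def toolbar_pr(internal_pr, base=8.0):
    # Hypothetical mapping: the real scale and base are unpublished. The
    # point is only that a log bucketing makes TBPR far coarser than the
    # continuously updated internal score.
    if internal_pr <= 0:
        return 0
    return min(10, int(math.log(internal_pr, base)))

print(toolbar_pr(100))      # 2, with the assumed base of 8
print(toolbar_pr(10 ** 20)) # clamped to 10
```

<p>With an assumed base of 8, every internal value from 64 up to (but not including) 512 shows the same toolbar 2, which is why two pages with identical green bars can carry very different amounts of PageRank.</p>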
<p>But seriously, how can PageRank be the primary factor of anything? Well, here&#8217;s a reality check for folks who like to turn SEO into a fairy tale where you believe what you want to believe in the face of irrefutable facts.</p>
<p>Another quote I found interesting, written by Marcia on a recent featured homepage WMW thread about <a href="http://www.webmasterworld.com/google/3107196-10-10.htm">supplemental results and inbound links</a>:</p>
<blockquote><p>I&#8217;m absolutely in agreement with that and it&#8217;s stood the test of time - and IBLs &#038; PR. Nope, the toolbar ain&#8217;t dead yet; the reports of its demise are grossly exaggerated and contra-indicated by the Supp results and indicators.</p>
<p>Not one single speck of duplication, what&#8217;s Supplemental and what isn&#8217;t on the test site(s) is 100% dependent on the amount of link love the pages are getting.</p>
<p>People who are looking for dup issues where none exist are, unfortunately, chasing their tails. </p></blockquote>
<p>g1smd (whom I consider to be well versed on issues revolving around duplicate content and supplemental results) responds:</p>
<blockquote><p>Heh, Marcia, you&#8217;re gonna love this Matt Cutts comment:</p>
<p>>> PageRank is the primary factor determining&#8230;.</p>
<p>Note: The &#8220;primary&#8221; factor.</p>
<p>Jeez. No mention of Duplicate Content, and Redirects and 404 URLs at all.</p>
<p>Ah, but maybe he only means for &#8220;live&#8221; URLs, or maybe redirects and 404s no longer have any PageRank associated with them.</p>
<p>Whatever, it agrees with what you&#8217;re saying: and I guess that fuels another link frenzy to start all over again. </p></blockquote>
<p>Caveman follows up with an interesting comment as well, but I don&#8217;t want to turn this page into a duplicate content page by overquoting, ya know what I mean? :D</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/10/12/matt-cutts-pagerank-primary-factor-determining-supplemental-results.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Docs Gone Bad - Google, Fix the Tags</title>
		<link>http://seo4fun.com/blog/2006/10/12/google-docs-gone-bad.html</link>
		<comments>http://seo4fun.com/blog/2006/10/12/google-docs-gone-bad.html#comments</comments>
		<pubDate>Thu, 12 Oct 2006 08:46:27 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/10/12/google-docs-gone-bad.html</guid>
		<description><![CDATA[Google Docs tags aren&#8217;t working right now, which means all the new articles you write in Google Docs will end up in a big disorganized pile of whatev, and you get to go back later (when tags are working again) and clean up the mess. This may not be news to you if you never [...]]]></description>
			<content:encoded><![CDATA[<p>Google Docs tags aren&#8217;t working right now, which means all the new articles you write in Google Docs will end up in a big disorganized pile of whatev, and you get to go back later (when tags are working again) and clean up the mess. This may not be news to you if you never used Writely, but if you did, I guarantee it&#8217;s going to annoy you to no end. So much for smooth transitions. I tell ya, if I were working for Google, this kind of thing would NEVER happen. I mean, how do they screw up something that was working perfectly just a few days ago?</p>
<p>Anyhoo, I hope Googlers are aware of this bug and are taking it with the utmost seriousness, because without tags, Google Docs will be just Google Notebooks Deluxe for you, with no organizational functionality save sorting by last-mod timestamp, which forces you to scan through a long list of notebooks every time you want to write something down.</p>
<p>I posted about this over at <a href="http://groups.google.com/group/GoogleDocsSpreadsheets">Google Docs &#038; Spreadsheets Help Group</a> just to help this issue see the light of day. It&#8217;s good to know I&#8217;m not the only guy having this problem.</p>
<h3>Google Docs Broken Tags Bug Google Groups Threads</h3>
<ul>
<li><a rel="nofollow" href="http://groups.google.com/group/Something-in-Writely-is-Broken/browse_thread/thread/9be308121250ad46">Tagging</a></li>
<li><a rel="nofollow" href="http://groups.google.com/group/Something-in-Writely-is-Broken/browse_thread/thread/7e0dbd45f83fe549">Untagging doesn&#8217;t work</a></li>
<li><a rel="nofollow" href="http://groups.google.com/group/Something-in-Writely-is-Broken/browse_thread/thread/00014b0267b1ea82">Adding a tag doesn&#8217;t work anymore</a></li>
<li><a rel="nofollow" href="http://groups.google.com/group/Something-in-Writely-is-Broken/browse_thread/thread/9cce5150734d5c49">Missing tags</a></li>
<li><a rel="nofollow" href="http://groups.google.com/group/Suggestions-and-Ideas-Writely/browse_thread/thread/de0516f42effedd1">Tags</a></li>
<li><a rel="nofollow" href="http://groups.google.com/group/Something-in-Writely-is-Broken/browse_thread/thread/f2f44df267bb1ef0">Unable to Tag Documents from Documents List</a></li>
<li><a rel="nofollow" href="http://groups.google.com/group/Something-in-Writely-is-Broken/browse_thread/thread/0f9b13fecad5fb1f">Don&#8217;t Work &#8220;Remove Tags&#8221;</a></li>
<li><a rel="nofollow" href="http://groups.google.com/group/Something-in-Writely-is-Broken/browse_thread/thread/3dbf1e398c961b24">Remove Tag</a></li>
<li><a rel="nofollow" href="http://groups.google.com/group/Something-in-Writely-is-Broken/browse_thread/thread/561925087e75b35a">Not Able to Tag do Documents</a></li>
<li><a rel="nofollow" href="http://groups.google.com/group/Something-in-Writely-is-Broken/browse_thread/thread/7838378a64951289">Documents: Can&#8217;t add tags</a></li>
</ul>
<p>You get the picture.</p>
<p><strong>UPDATE:</strong> It looks like Google finally fixed the bug. Let&#8217;s hope it&#8217;s a permanent fix.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/10/12/google-docs-gone-bad.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Lost and Grey&#8217;s Anatomy on My Google Calendar</title>
		<link>http://seo4fun.com/blog/2006/09/28/lost-and-greys-anatomy-on-my-google-calendar.html</link>
		<comments>http://seo4fun.com/blog/2006/09/28/lost-and-greys-anatomy-on-my-google-calendar.html#comments</comments>
		<pubDate>Thu, 28 Sep 2006 20:14:36 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/09/28/lost-and-greys-anatomy-on-my-google-calendar.html</guid>
		<description><![CDATA[We all read Google blog&#8217;s recent announcement about a new Google Calendar feature that lets you add web content events, like &#8220;weather forecasts, moon phases, and even Google doodles.&#8221; Well, to tell you the truth, I don&#8217;t care about all that, though I&#8217;ve already added holidays to my calendar. What I desperately need are TV [...]]]></description>
			<content:encoded><![CDATA[<p>We all read Google blog&#8217;s recent announcement about <a href="http://googleblog.blogspot.com/2006/09/google-calendar-does-something-about.html">a new Google Calendar feature</a> that lets you add web content events, like &#8220;weather forecasts, moon phases, and even Google doodles.&#8221; Well, to tell you the truth, I don&#8217;t care about all that, though I&#8217;ve already added holidays to my calendar. What I desperately need are TV programming schedules for shows like Lost, Dancing with the Stars, Grey&#8217;s Anatomy, and Ugly Betty. Seriously, if I could have those things automatically show up on my calendar, it would save me lots of time (I think?). And nope, I don&#8217;t wanna be a developer.</p>
<p>UPDATE: Now I got Fall TV Season (2006) on my Calendar. Very distracting, but at least it lets me schedule my work around my favorite shows. BTW, since I have a tendency to rant about bugs, I&#8217;m posting a link to <a href="http://groups.google.com/group/Google-Calendar-Help/">Google Groups: Google Calendar Help</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/09/28/lost-and-greys-anatomy-on-my-google-calendar.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Rel NOFOLLOW = Granular META ROBOTS NOFOLLOW</title>
		<link>http://seo4fun.com/blog/2006/09/27/rel-nofollow-granular-meta-robots-nofollow.html</link>
		<comments>http://seo4fun.com/blog/2006/09/27/rel-nofollow-granular-meta-robots-nofollow.html#comments</comments>
		<pubDate>Wed, 27 Sep 2006 20:35:21 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/09/27/rel-nofollow-granular-meta-robots-nofollow.html</guid>
		<description><![CDATA[In a response to John Battelle&#8217;s recently released interview with Matt Cutts, Danny Sullivan asks Matt:
Matt, in the interview, you suggest that Google now views meta robots with a nofollow value as being the same as the completely different nofollow attribute, in terms of flagging links as not trusted. Is that now the case?
Matt&#8217;s response:
Danny, [...]]]></description>
			<content:encoded><![CDATA[<p>In a response to <a href="http://battellemedia.com/archives/002917.php">John Battelle&#8217;s recently released interview with Matt Cutts</a>, Danny Sullivan asks Matt:</p>
<blockquote><p>Matt, in the interview, you suggest that Google now views meta robots with a nofollow value as being the same as the completely different nofollow attribute, in terms of flagging links as not trusted. Is that now the case?</p></blockquote>
<p>Matt&#8217;s response:</p>
<blockquote><p>Danny, that’s always been the case–sorry if I haven’t explained that well. There are many techniques to sell visitors/traffic without selling PageRank or affecting search engines. As long as a link doesn’t affect search engines, there is no problem with selling that link from Google’s perspective. The nofollow attribute on links is the most granular because it’s on a link level, but something like a sponsor page is a fine opportunity to use the nofollow meta tag instead of marking each link.</p></blockquote>
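<p>The page-level vs. link-level distinction is easy to check mechanically. A small sketch using Python&#8217;s html.parser (the class name is mine, not any real tool): a link counts as nofollowed if either the page-level META ROBOTS tag or its own rel attribute says so.</p>

```python
from html.parser import HTMLParser

class NofollowAudit(HTMLParser):
    """Flags links a crawler would treat as nofollow, whether the hint
    comes from a page-level META ROBOTS tag or a per-link rel attribute."""
    def __init__(self):
        super().__init__()
        self.page_nofollow = False
        self.links = []  # (href, nofollowed) pairs

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            if "nofollow" in a.get("content", "").lower():
                self.page_nofollow = True
        elif tag == "a" and "href" in a:
            per_link = "nofollow" in a.get("rel", "").lower()
            self.links.append((a["href"], per_link or self.page_nofollow))

# Example: a sponsor page using the meta tag instead of marking each link.
html = ('<meta name="robots" content="index,nofollow">'
        '<a href="/sponsor">sponsor</a>')
audit = NofollowAudit()
audit.feed(html)
print(audit.links)  # [('/sponsor', True)]
```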
<p>I asked (though I expect Matt to dodge):</p>
<blockquote><p>I think there’s still skepticism out there as to Adam’s recent statement “Google senses much” in terms of link selling. For example, was the wc3 page detected algorithmically or from word of mouth? Also, what would be the likely consequence if wc3 decided not to use the nofollow tag?</p></blockquote>
<p>P.S.</p>
<p>In the original interview, some guy asked:</p>
<blockquote><p>Matt, I have one line of questions. Google&#8217;s job is to measure the relevancy of web pages with algorithms/human computation or other means Google have or could have.</p>
<p>But is it fair for Google to put restriction on what people could do with web?</p></blockquote>
<p>I hear this a lot, and for some reason, it always rubs me the wrong way.</p>
<p>Here&#8217;s a scenario where Google puts no restrictions on websites:</p>
<p><strong>SERP in random order.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/09/27/rel-nofollow-granular-meta-robots-nofollow.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>SEO Surgery: Monx007</title>
		<link>http://seo4fun.com/blog/2006/09/22/seo-surgery-monx007.html</link>
		<comments>http://seo4fun.com/blog/2006/09/22/seo-surgery-monx007.html#comments</comments>
		<pubDate>Fri, 22 Sep 2006 13:10:27 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/09/22/seo-surgery-monx007.html</guid>
		<description><![CDATA[I&#8217;ve been spending the last couple weeks checking out posts in Google Webmasters Help. One thing that these posts keep hammering home to me is the fact that there&#8217;s a huge gap between what&#8217;s being posted on SEO forums and reality in cases where either people are not allowed to link drop or they choose [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been spending the last couple weeks checking out posts in <a href="http://groups.google.com/group/Google_Webmaster_Help">Google Webmasters Help</a>. One thing these posts keep hammering home is that there&#8217;s a huge gap between <a href="http://www.webmasterworld.com/google/3091380.htm">what&#8217;s being posted on SEO forums</a> and reality, in cases where people either are not allowed to link drop or choose to keep their URLs to themselves. And right now I&#8217;m getting a kick out of talking about real sites instead of theories. Even a simple question like &#8220;why is my site not indexed?&#8221; often leads to unexpected answers.</p>
<p>A random example (just fished this one out a few minutes ago):</p>
<p><em>&#8220;I&#8217;m very confuse now, before August, my site got PR 4. But around August, my site PR drop to 0.&#8221;</em></p>
<p>Without a URL, typical responses might be:</p>
<ul>
<li>Maybe you&#8217;re banned. Check to make sure your site is clean. You might want to do a re-inclusion request.</li>
<li>The little green bar doesn&#8217;t matter; don&#8217;t worry about it.</li>
<li>Check urls that are linking to you. Changes in their PageRank will affect yours.</li>
</ul>
<p>But the moment the guy posts his url, I have something I can sink my teeth into: <a href="http://games.monx007.com/" rel="nofollow">http://games.monx007.com/</a> What do you think?</p>
<p>Inbound links, according to Yahoo Site Explorer:</p>
<p><a href="http://technohack.blogspot.com/" rel="nofollow">http://technohack.blogspot.com/</a> TBPR 0<br />
http://www.klubforbiz.com/info-kerja-penawaran-kerjasama/t-offline-jobsvacany-at-telkom-963.html (forum - no connection) TBPR 0<br />
<a href="http://www.friendster.com/user.php?uid=7459848"  rel="nofollow">http://www.friendster.com/user.php?uid=7459848</a> TBPR 0<br />
<a href="http://www.womensshoescritic.com/links/games%5B99%5D-1.htm"  rel="nofollow">http://www.womensshoescritic.com/links/games%5B99%5D-1.htm</a> TBPR 0</p>
<p>Interesting. Hidden text using CSS font-size: 0em.</p>
<p><a href="http://www.ringcards.com/resources/recreation%5B8%5D-56.htm"  rel="nofollow">http://www.ringcards.com/resources/recreation%5B8%5D-56.htm</a> (directory) TBPR 0<br />
http://www.myrtle-beach-sc-vacation.com/metro/recreation%5B3%5D-12.htm (directory - now dead) TBPR 0<br />
<a href="http://www.cellphoneswow.com/links/recreation%5B8%5D-52.htm"  rel="nofollow">http://www.cellphoneswow.com/links/recreation%5B8%5D-52.htm</a> (another similar looking directory) TBPR 0<br />
<a href="http://archive.freespaces.com/"  rel="nofollow">http://archive.freespaces.com/</a> TBPR 0</p>
<p>What&#8217;s this? Links pointing to all his subdomains click-proofed with external JS? Real slick.</p>
<p><a href="http://www.lake-tahoe-local.com/partners/games%5B143%5D.htm"  rel="nofollow">http://www.lake-tahoe-local.com/partners/games%5B143%5D.htm</a> (games directory) TBPR 0<br />
<a href="http://www.shohouyaku.com/links/playstation%5B43%5D.htm"  rel="nofollow">http://www.shohouyaku.com/links/playstation%5B43%5D.htm</a> (another crap directory), TBPR 2<br />
<a href="http://www.yellowsurveys.com/resources/index176.htm"  rel="nofollow">http://www.yellowsurveys.com/resources/index176.htm</a> (crap directory) TBPR 0<br />
http://www.paid2lotto.com/links/recreation%5B8%5D-10.htmm (dead link) TBPR 0</p>
<p>A few things that stand out:</p>
<ul>
<li>A lot of TBPR 0s (that doesn&#8217;t mean those pages were penalized, though. I didn&#8217;t check their backlinks or see whether they are supplemental).</li>
<li>No &#8220;natural&#8221; citation links. Low quality links, including hidden cross links, forum sig, and cheap directory listings.</li>
</ul>
<p>The cheap directories especially look too much alike when you compare their url structures:</p>
<p>resources/recreation%5B8%5D-56.htm<br />
metro/recreation%5B3%5D-12.htm<br />
links/recreation%5B8%5D-52.htm<br />
partners/games%5B143%5D.htm<br />
links/playstation%5B43%5D.htm<br />
links/recreation%5B8%5D-10.htm</p>
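<p>You can make the shared footprint explicit with a one-line regular expression. A quick sketch over the paths quoted above (the pattern is my own generalization, not anything the directory script publishes):</p>

```python
import re

# Every path follows the same "<dir>/<keyword>%5B<n>%5D[-<n>].htm" template,
# the kind of footprint that suggests one directory script is behind all of
# these "independent" sites.
paths = [
    "resources/recreation%5B8%5D-56.htm",
    "metro/recreation%5B3%5D-12.htm",
    "links/recreation%5B8%5D-52.htm",
    "partners/games%5B143%5D.htm",
    "links/playstation%5B43%5D.htm",
    "links/recreation%5B8%5D-10.htm",
]
template = re.compile(r"^\w+/[a-z]+%5B\d+%5D(-\d+)?\.htm$")
print(all(template.match(p) for p in paths))  # True
```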
<p>If the guy didn&#8217;t supply the URL, we&#8217;d still be talking about a &#8220;sandbox&#8221; or &#8220;trustbox&#8221; or whatever-else-box. BTW, I have no moral qualms about &#8220;outing&#8221; a crap network or a lousy link building campaign. Robbers don&#8217;t out each other, but I&#8217;m no robber. As The Clash used to sing:</p>
<p>YOU BETTER CHEAT CHEAT<br />
NO REASON TO PLAY FAIR<br />
CHEAT CHEAT OR DON&#8217;T GET ANYWHERE<br />
CHEAT CHEAT <strong>IF YOU CAN&#8217;T WIN</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/09/22/seo-surgery-monx007.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>SEO Quotes of the Week September 6, 2006</title>
		<link>http://seo4fun.com/blog/2006/09/06/seo-quotes-of-the-week-september-6-2006.html</link>
		<comments>http://seo4fun.com/blog/2006/09/06/seo-quotes-of-the-week-september-6-2006.html#comments</comments>
		<pubDate>Wed, 06 Sep 2006 11:50:16 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/09/06/seo-quotes-of-the-week-september-6-2006.html</guid>
		<description><![CDATA[Some memorable quotes I came across so far this week in SEO forums and the blogosphere. None of them may be revolutionary or news breaking, but they resonated with me and hopefully with you too.
the reason some did not believe it for a long time is that &#8212; as is true with so many [...]]]></description>
			<content:encoded><![CDATA[<p>Some memorable quotes I came across so far this week in SEO forums and the blogosphere. None of them may be revolutionary or news breaking, but they resonated with me and hopefully with you too.</p>
<blockquote><p>the reason some did not believe it for a long time is that &#8212; as is true with so many of G&#8217;s algo elements &#8212; things are co-dependent. So the addition of x 1,000 pages for site A will not have the same effect as the addition of 1,000 pages for site B, and the difference is not just limted to the pre-existing number of pages on each site. Many other factors involved.</p></blockquote>
<p>- caveman, <a href="http://www.webmasterworld.com/google/3070792-6-10.htm">on the fallacy of ignoring the &#8220;co-dependency&#8221; of factors when determining the effect of one</a></p>
<blockquote><p>
 I think many people who listen to Matt are those who are trying to rank &#8220;legitimately&#8221; in Google. That is, they are mostly concerned about sending false positives, or getting caught accidentally in a Google net intended to catch spammers. This is one good reason to listen to what Matt says &#8212; there is usually enough meat there to at least help you steer clear of big problem areas, and sometimes problem areas that only recently rose in importance.</p>
<p>In this case, one comment that is worth paying some attention to, I think, is</p>
<blockquote><p>&#8220;So this is not something that a typical site owner needs to think about or worry about if they&#8217;re not adding hundreds of thousands or millions of URLs very quickly.&#8221;&#8211;Matt Cutts/<a href="http://www.seroundtable.com/archives/006065.html#comments">SER</a></p></blockquote>
<p>Here he gives some sense of the scale that would trip this particular flag &#8212; and it&#8217;s not in the thousands of urls, or even the tens of thousands. Certainly seasonal changes at completely legitimate ecommerce sites can require a couple thousand new urls at once. And Google would also have a historical record of that kind of seasonal change to help boost their confidence and trust&#8230;.</p></blockquote>
<p>- Tedster, <a href="http://www.webmasterworld.com/google/3070792-6-10.htm">Matt Cutts&#8217; Adding Too Many URLS Raise a Flag</a></p>
<blockquote><p>Probably just index churn. The Supplemental folks already had a fix for this ready, so it&#8217;s a matter of when some executables will be pushed. I&#8217;d expect highlighting to be working again within the next few weeks. Thanks again for mentioning this..</p></blockquote>
<p>- Matt Cutts, <a href="http://www.seroundtable.com/archives/006062.html">regarding text in cached supplemental pages not being highlighted</a></p>
<blockquote><p>Since I took Kimberly to task in a public blog I feel that the apology should come from the same place. </p>
<p>So, Kimberly and anyone else I offended with that particular post, I apologize.  I never intended any harm but I think I may have caused some.  I&#8217;ll be more careful in the future.</p></blockquote>
<p>- G-Man, <a href="http://www.seomoz.org/blogdetail.php?ID=1358">A Tip for your life&#8230;</a></p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/09/06/seo-quotes-of-the-week-september-6-2006.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Gotta Love The Web</title>
		<link>http://seo4fun.com/blog/2006/09/06/gotta-love-the-web.html</link>
		<comments>http://seo4fun.com/blog/2006/09/06/gotta-love-the-web.html#comments</comments>
		<pubDate>Wed, 06 Sep 2006 11:12:07 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/09/06/gotta-love-the-web.html</guid>
		<description><![CDATA[Hey, I just made a brand new tape. It could definitely be my stairway to riches and fame. And for your information, I hit the low notes better than Danny. I kid you not. So if I go head to head against him on American Idol, look out!
You betta lose yourself in my music the [...]]]></description>
			<content:encoded><![CDATA[<p>Hey, I just made a brand new tape. It could definitely be my stairway to riches and fame. And for your information, I hit the low notes better than Danny. I kid you not. So if I go head to head against him on American Idol, <em>look out!</em></p>
<p>You betta lose yourself in my music the moment you own it<br />
you better never let it go<br />
You only get one shot, do not miss your chance to blow<br />
this opportunity comes once in a lifetime</p>
<p><img src="/images/why-supplemental.jpg" alt="Google Supplemental Index" /></p>
<p><strong>UPDATE:</strong> Yeah, going through my old feeds, I just noticed <a href="http://www.jimboykin.com/need-an-official-seal-seo-code-of-ethics/">Jim already blogged about this</a> 2 days ago. God dangit&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/09/06/gotta-love-the-web.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Supplemental Cache Refresh Revisited</title>
		<link>http://seo4fun.com/blog/2006/09/05/supplemental-cache-refresh-revisited.html</link>
		<comments>http://seo4fun.com/blog/2006/09/05/supplemental-cache-refresh-revisited.html#comments</comments>
		<pubDate>Tue, 05 Sep 2006 17:30:07 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/09/05/supplemental-cache-refresh-revisited.html</guid>
		<description><![CDATA[After the recent supplemental cache refresh, more than a few webmasters asked: &#8220;why are my perfectly structured, original content pages still supplemental?&#8221; and &#8220;Do I have to wait another 12 months for Google to get it right?&#8221;
We fixed our META description tags. We got rid of thin content pages. We installed 301 redirects and even [...]]]></description>
			<content:encoded><![CDATA[<p>After the recent supplemental cache refresh, more than a few webmasters asked: &#8220;why are my perfectly structured, original content pages still supplemental?&#8221; and &#8220;Do I have to wait another 12 months for Google to get it right?&#8221;</p>
<p>We fixed our META description tags. We got rid of thin content pages. We installed 301 redirects and even validated our pages to death. But the pages with fresh cache were still showing up as supplemental for no apparent reason.</p>
<p>What I assumed was that supplemental pages show up in the SERP <em>after</em> Google looked them over for quality, duplicates, trust, etc. Nope. Things happened the other way around:</p>
<p>1. Google recrawled all existing supplemental pages by looking up urls stored in its supplemental database. One clean sweep. The timestamps were updated and the cache contained the most recent version of each page. Basically, a database update &#8212; with zero movement of pages from the supplemental index into the main index.</p>
<p>2. After Google updated its supplemental database, it began a long process of evaluating each URL to decide if it&#8217;s spam, duplicate content, or A1. This is still going on.</p>
<p>I hope you&#8217;re starting to see more pages reappear in the main index.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/09/05/supplemental-cache-refresh-revisited.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Engineer Toys with 64.233.187.whatever</title>
		<link>http://seo4fun.com/blog/2006/08/27/google-engineer-toys-with-64233187whatever.html</link>
		<comments>http://seo4fun.com/blog/2006/08/27/google-engineer-toys-with-64233187whatever.html#comments</comments>
		<pubDate>Mon, 28 Aug 2006 04:42:24 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/08/27/google-engineer-toys-with-64233187whatever.html</guid>
		<description><![CDATA[At the risk of sounding like another SEW echo chamber post, here&#8217;s a recent Googleguy quote (now a random Googler dude, or is it Matt Cutts pretending to be a random Googler dude? You be the judge) regarding recent reports of backlink updates on several DCs:

It&#8217;s Brett&#8217;s decision to call something an update, but I [...]]]></description>
			<content:encoded><![CDATA[<p>At the risk of sounding like another SEW echo chamber post, here&#8217;s a <a href="http://www.webmasterworld.com/google/3058623-6-10.htm#msg3061576">recent Googleguy quote</a> (now a random Googler dude, or is it Matt Cutts pretending to be a random Googler dude? You be the judge) regarding recent reports of backlink updates on several DCs:</p>
<blockquote><p>
It&#8217;s Brett&#8217;s decision to call something an update, but I agree this one isn&#8217;t anything to write home about. Like Pluto, I think this would be a shrinking update; the SERPs aren&#8217;t really changing. :)</p>
<p>I talked to one of the engineers who would know, and it turns out that it&#8217;s an engineer who has grabbed the 64.233.187.whatever datacenter for himself to tinker around with making info: queries slightly more accurate. I don&#8217;t expect the visible/external info: or backlink data to spread to other data centers (or if it did, not for a long time).</p>
<p>So have fun poking around over there if you really enjoy monitoring IP addresses, but just bear in mind that it&#8217;s an engineer tinkering by himself, not part of a larger trend or an upcoming update. </p></blockquote>
<p>Earlier, <a href="http://www.webmasterworld.com/google/3058623-5-10.htm#msg3061445">he says</a>:</p>
<blockquote><p>
 I&#8217;ll ask around with a couple engineers who keep a closer eye on external backlinks/PR and see if they have anything they&#8217;d like to mention.</p>
<p>Bear in mind that backlinks/PageRank updates happen at their own rate, and we&#8217;re continuously finding and incorporating new backlinks and computing new PageRank all the time. So an update of visible backlinks doesn&#8217;t really cause an &#8220;update&#8221; or big change in rankings, because we&#8217;ve already known about those links for a while.</p>
<p>Anyway, I&#8217;ll ask around and if anything interesting comes out of it, I&#8217;ll let you know. </p></blockquote>
<p>BTW, if you&#8217;re still suffering from supplemental problems, there&#8217;s also a <a href="http://www.webmasterworld.com/google/3060898.htm">thread developing over at WMW started by Whitey</a> that you may want to take a look at (just because I posted twice there, yeah&#8230;).</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/08/27/google-engineer-toys-with-64233187whatever.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Long Tail: &#8220;how many words to make page unique google supplemental&#8221;</title>
		<link>http://seo4fun.com/blog/2006/08/25/long-tail-how-many-words-to-make-page-unique-google-supplemental.html</link>
		<comments>http://seo4fun.com/blog/2006/08/25/long-tail-how-many-words-to-make-page-unique-google-supplemental.html#comments</comments>
		<pubDate>Fri, 25 Aug 2006 14:07:42 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/08/25/long-tail-how-many-words-to-make-page-unique-google-supplemental.html</guid>
		<description><![CDATA[I figured I&#8217;d serve the public occasionally by answering a few questions people are arriving here with, including this lengthy one: &#8220;how many words to make page unique google supplemental.&#8221; I know these questions will probably bore the hell out of most readers, since they&#8217;re un-news-worthy and fall under the beating-a-dead-horse umbrella, but hey, a [...]]]></description>
			<content:encoded><![CDATA[<p>I figured I&#8217;d serve the public occasionally by answering a few questions people are arriving here with, including this lengthy one: &#8220;how many words to make page unique google supplemental.&#8221; I know these questions will probably bore the hell out of most readers, since they&#8217;re un-news-worthy and fall under the beating-a-dead-horse umbrella, but hey, a few &#8220;timeless&#8221; posts wouldn&#8217;t hurt, right? Future posts like this will be labeled &#8220;long tail&#8221;, so it&#8217;s easy for you to avoid reading them if the topic doesn&#8217;t interest you.</p>
<p>Also, keep in mind, I don&#8217;t pretend to be an SEO and I don&#8217;t play one on TV or Google Video either, so I&#8217;d recommend you verify whatever I post or at least take it with a dose of salt. If I knew everything about SEO, I wouldn&#8217;t be blogging about it :)  Anyway, if you happen to disagree with me, it&#8217;d be awesome if you leave me a comment (concrete examples and a decent rationale behind it preferred). I&#8217;ve been coming across a few lapse-in-logic type statements on forums and Google Groups like &#8220;sitemaps killed my site!&#8221; - but I&#8217;m sure you&#8217;re way smarter than that.</p>
<p>So there you have it, my &#8220;long tail&#8221; posts&#8217; official disclaimer.</p>
<p>Anyhoo, this guy is wondering <strong>how many words he should have on a page to keep it from turning supplemental.</strong> I wish I knew. From looking at my domains, low PageRank pages with 110 words or less (not counting anchor text) largely end up supplemental, especially if they&#8217;re template-based. <a href="http://groups.google.com/group/Google_Webmaster_Help-Indexing/browse_thread/thread/735486618dae8b0d">Vanessa Fox also recently mentioned</a> that thin pages aren&#8217;t a good thing:</p>
<blockquote><p>You should also take a look at your site and make sure it provides unique content. <strong>Most of your categories don&#8217;t seem to have any content.</strong> You&#8217;ll need your pages to have value in order to get them indexed.</p></blockquote>
<p>(Google is more about philosophy, passion, and marketing than tinkering with code.)</p>
<p>Since I see other pages on the same low PageRank domain with 150+ words remaining in the main index, my money&#8217;s on 200+ words per page. This is a debatable figure, of course. Some people will tell you unique content pages with 300+ words have gone supplemental, therefore page size doesn&#8217;t matter. I&#8217;ve seen both extremes - a page with 10 words staying indexed by Google, and a page with 300+ words winding up in the supplemental index. But as I recently posted on WMW, would you believe me if I told you being 300 lb. overweight isn&#8217;t a factor in getting hot dates because I have three fat relatives that got married to supermodels before they hit 30?</p>
<p>No, I didn&#8217;t think so.</p>
<p>If a 10 word page stays in Google for 5 years, it&#8217;s there in spite of the low word count. The fact that it&#8217;s not supplemental isn&#8217;t a proof that low word count doesn&#8217;t matter. It just means there are other factors that make that page valuable.</p>
<p>BTW, if you need a tool that counts words (except for text in HREF), check out my <a href="/php/text-size.php">word count tool</a>. It&#8217;s buggy and rudimentary, but I use it, so it&#8217;s gotta be good.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/08/25/long-tail-how-many-words-to-make-page-unique-google-supplemental.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Reactive Blogging</title>
		<link>http://seo4fun.com/blog/2006/08/23/reactive-blogging.html</link>
		<comments>http://seo4fun.com/blog/2006/08/23/reactive-blogging.html#comments</comments>
		<pubDate>Wed, 23 Aug 2006 13:12:54 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/08/23/reactive-blogging.html</guid>
		<description><![CDATA[A few years ago, I used to play America&#8217;s Army Online, a squad-based shooter developed by the U.S. Army. To me it&#8217;s a thinking man&#8217;s game, where aim and luck have less to do with winning than exploiting the elements of surprise and forcing your enemy&#8217;s next move.
Never underestimate the element of surprise. Repeating yourself [...]]]></description>
			<content:encoded><![CDATA[<p>A few years ago, I used to play America&#8217;s Army Online, a squad-based shooter developed by the U.S. Army. To me it&#8217;s a thinking man&#8217;s game, where aim and luck have less to do with winning than exploiting the elements of surprise and forcing your enemy&#8217;s next move.</p>
<p><strong>Never underestimate the element of surprise.</strong> Repeating yourself makes you transparent, which gives your enemies the upper hand. Say you rush Oil Entrance with SAW and kill 6 terrorists on Assault on Round 1. They will expect you there the next round. They may park a 203 outside to blow you up if you decide to take a peek outside, or throw a flash through the door to blind you, after which they&#8217;ll toy with you for a few seconds till they put a bullet through your head. The lesson? What worked yesterday won&#8217;t necessarily work today.</p>
<p><strong>While your opponents force your hand, your odds of winning are low.</strong> Say you&#8217;re on Defense and you hear someone turning EXT pump. You run to EXT room and open the door expecting an easy kill, only to get a round of bullets sprayed on your face with a SAW before you even know what hit you. Someone yanked your chain and you fell for it.</p>
<p>I see the same thing happening on the Interweb. It&#8217;s less effective to blog in reaction to what someone else said than be the pebble in the water that starts a ripple effect.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/08/23/reactive-blogging.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Supplemental Index Fuzzier Than Ever</title>
		<link>http://seo4fun.com/blog/2006/08/17/supplemental-index-fuzzier-than-ever.html</link>
		<comments>http://seo4fun.com/blog/2006/08/17/supplemental-index-fuzzier-than-ever.html#comments</comments>
		<pubDate>Thu, 17 Aug 2006 23:51:59 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/08/17/supplemental-index-fuzzier-than-ever.html</guid>
		<description><![CDATA[On gfe-eh.google.com and other DCs, the supplemental pages cache dates no longer go all the way back to Aug 2005. But the new &#8220;system&#8221; makes it even harder to tell why a page is listed in the supplemental index, because now you&#8217;re required to jump over at least two major hurdles to break out of [...]]]></description>
			<content:encoded><![CDATA[<p>On gfe-eh.google.com and other DCs, the supplemental pages cache dates no longer go all the way back to Aug 2005. But the new &#8220;system&#8221; makes it even harder to tell why a page is listed in the supplemental index, because now you&#8217;re required to jump over at least two major hurdles to break out of supplemental hell: 1) duplicate content issues (i.e. identical meta tags, multiple urls resolving to the same content, www/non-www, etc) and 2) &#8220;Trust&#8221; / PageRank. A perfectly structured page with original content could remain stuck in the supplemental index if a domain lacks juice.</p>
<p>Here&#8217;s how I see it going down (<strong>warning:</strong> pure speculation):</p>
<pre><code>function refresh_supplemental_hell_cache($domain_name) {
    $supplemental_urls = new Urls_in_Supplemental_Hell($domain_name);
    foreach ($supplemental_urls->urls as $screwed_url) {
        $page = file_get_contents($screwed_url->url);
        mysql_query("UPDATE supplemental_hell SET content = '"
            . mysql_real_escape_string($page) . "', cache_date = NOW()
            WHERE url = '" . mysql_real_escape_string($screwed_url->url) . "'");
    }
}
</code></pre>
<p>That&#8217;s all Google needs to run to refresh the supplemental cache. Then Google re-evaluates each page as if it just found it. Depending on PageRank, trust, number of pages, and a multitude of other factors, Google inserts some pages back into the main index, while keeping others out.</p>
<p><strong>Conclusion:</strong> When your mom &#8216;n pop site is 99.99% supplemental, it&#8217;s no longer enough just to &#8220;fix your site.&#8221; You also need to get Google to trust you enough to re-list those supplemental pages back in the main index.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/08/17/supplemental-index-fuzzier-than-ever.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Video Still Keeps its Hands Off Pornographic Material</title>
		<link>http://seo4fun.com/blog/2006/08/14/google-video-still-keeps-its-hands-off-pornographic-material.html</link>
		<comments>http://seo4fun.com/blog/2006/08/14/google-video-still-keeps-its-hands-off-pornographic-material.html#comments</comments>
		<pubDate>Mon, 14 Aug 2006 18:43:52 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/08/14/google-video-still-keeps-its-hands-off-pornographic-material.html</guid>
		<description><![CDATA[Philip Lensen reported a few days ago that &#8220;Google Video now* allows you to upload adult videos.&#8221; (with the added disclaimer that he&#8217;s not sure if the feature is new.) As Jimmy Ruska points out, the &#8220;Adult/Mature&#8221; option is not new; in fact it&#8217;s probably been there from the get-go. UI may have changed, but [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.outer-court.com/archive/2006-08-12-n20.html">Philip Lensen reported</a> a few days ago that &#8220;Google Video now* allows you to upload adult videos.&#8221; (with the added disclaimer that he&#8217;s not sure if the feature is new.) As <a href="http://www.jimmyr.com/blog/Google_Video_does_NOT_let_you_upload_Pr0n_223_2006.php">Jimmy Ruska points out</a>, the &#8220;Adult/Mature&#8221; option is not new; in fact it&#8217;s probably been there from the get-go. The UI may have changed, but Google&#8217;s position hasn&#8217;t. Pornographic videos are still off limits, as you can see from this screen capture:</p>
<p><img src="/images/google-video-adult.jpg" alt="Google Video Adult Mature Content" /></p>
<div align="center"><strong>No pornography allowed on Google Video just yet.</strong></div>
<p>Verify what you read on the Interweb.</p>
<p>So, anyway, what does the Adult/Mature category cover? <a href="http://www.techcrunch.com/2006/08/13/google-porn/">A guy on Techcrunch</a> linked to http://video.google.com/videoplay?docid=-8578885628445845834&#038;q=blowjob (obviously edited in Windows Movie Maker) as a porn site advertising on Google Video, but that clip shows no nudity. It just links to a spammy landing page with Yahoo ads plastered all over it. Sexual behavior and nudity alone aren&#8217;t enough to label a video as pornographic (think of nude / sex scenes in mainstream movies, i.e. the love making scene with Nicole Kidman in Cold Mountain - not porn). Now http://video.google.com/videoplay?docid=-7105575458677746453 <strike>is</strike> used to be porn on Google Video (NSFW), clearly in violation of <a href="https://upload.video.google.com/Terms?hl=en">Google Video TOS</a>, which still says &#8220;c) the Authorized Content is not, in whole or in part, <strong>pornographic or obscene</strong>&#8221;. I&#8217;m a little disappointed Google re-encoded my video, since the original is in mouth-watering HiDef. Let&#8217;s see how long that stays up, shall we?</p>
<p><strong>Update (8/15):</strong></p>
<p>I received an email from video.google letting me know the video I submitted one day earlier was rejected. I did tag the hell out of it to make it easier to spot, but this is as good a confirmation as any. The email reads:</p>
<p>Hi,</p>
<p>Your video &#8220;Sapphic Chicks High Def Promo&#8221; was rejected because it<br />
didn&#8217;t comply with our policies.</p>
<p>Videos submitted to our program are subject to an initial review to<br />
ensure that they comply with our guidelines. When videos do not meet<br />
our standards, we disapprove them. The following explains our content<br />
policies for uploading videos:</p>
<p>* You must have all necessary legal rights to the content.<br />
* The video must not contain <strong>pornographic, nude, or obscene</strong> material.<br />
* The subject matter in the video must not be illegal.<br />
* The video cannot contain invasions of personal privacy.<br />
* The video cannot contain promotions of hate or incitement of violence.<br />
* The video cannot contain graphic violence or other acts resulting in<br />
serious injury or death.</p>
<p>For more information regarding Google Video policies, please visit:<br />
http://video.google.com/support/bin/answer.py?answer=27737&#038;topic=1490<br />
<strong><br />
Follow UP (Aug 19):</strong></p>
<p>How did they spot my video? Did a Googler actually spend time watching it, as the words in the email &#8220;subject to an initial review&#8221; imply? Or did Google decide on the content by just using the video title, description, tags? I posted a second clip to test this. I&#8217;ll see how long this one stays up.</p>
<p><strong>Aug 22:</strong></p>
<p>Google took 3 days to reject the last video I uploaded. Since I completely misrepresented the content of my video, I&#8217;m pretty convinced Google is manually reviewing videos submitted to them.</p>
<p>I do think Google Video would be a step backwards for adult webmasters even if pornography were allowed, because the video quality is still on the low end.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/08/14/google-video-still-keeps-its-hands-off-pornographic-material.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Robots.txt Not Cumulative</title>
		<link>http://seo4fun.com/blog/2006/08/14/robotstxt-not-cumulative.html</link>
		<comments>http://seo4fun.com/blog/2006/08/14/robotstxt-not-cumulative.html#comments</comments>
		<pubDate>Mon, 14 Aug 2006 16:26:10 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/08/14/robotstxt-not-cumulative.html</guid>
		<description><![CDATA[Yesterday, gs1md wrote a meaty post on WMW regarding how Googlebot responds to robots.txt containing directives for both User-Agent:* and User-Agent: Googlebot. Both Googleguy and Vanessa Fox dropped by to clarify the situation: When you use both specifications, Googlebot will go with User-Agent: Googlebot.
Related links:

Google Webmaster Help:robots.txt
Google Webmaster Help: How Do I Block Googlebot?

]]></description>
			<content:encoded><![CDATA[<p>Yesterday, gs1md wrote a meaty post on WMW regarding <a href="http://www.webmasterworld.com/google/3044757.htm">how Googlebot responds to robots.txt</a> containing directives for both User-Agent:* and User-Agent: Googlebot. Both Googleguy and Vanessa Fox dropped by to clarify the situation: When you use both specifications, Googlebot will go with User-Agent: Googlebot.</p>
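<p>To make the precedence concrete, here&#8217;s a hypothetical robots.txt (the paths are made up for illustration):</p>
<pre><code>User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /archive/
</code></pre>
<p>Because the records aren&#8217;t cumulative, Googlebot obeys only the record addressed to it: it will skip /archive/ but happily crawl /private/, while other compliant bots do the opposite. If you want Googlebot to honor the general rules too, you have to repeat them under its own User-agent line.</p>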
<p>Related links:</p>
<ul>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40360&#038;topic=8846">Google Webmaster Help:robots.txt</a></li>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40364">Google Webmaster Help: How Do I Block Googlebot?</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/08/14/robotstxt-not-cumulative.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>GoogleGuy Announces Radically Fresher Supplemental Index</title>
		<link>http://seo4fun.com/blog/2006/08/08/googleguy-announces-radically-fresh-supplemental-index.html</link>
		<comments>http://seo4fun.com/blog/2006/08/08/googleguy-announces-radically-fresh-supplemental-index.html#comments</comments>
		<pubDate>Tue, 08 Aug 2006 15:03:52 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/08/08/googleguy-announces-radically-fresh-supplemental-index.html</guid>
		<description><![CDATA[Earlier this morning, GoogleGuy announced a supplemental index facelift:
 Okay, I believe most/all U.S. users should see radically fresher supplemental results now. The earliest page I saw was from Feb 2006, and most of the ones that I looked at averaged in the ~2 month old range.
As data gets copied to more places, the fresher [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier this morning, <a href="http://www.webmasterworld.com/google/3035462-7-10.htm">GoogleGuy announced</a> a supplemental index facelift:</p>
<blockquote><p> Okay, I believe most/all U.S. users should see <strong>radically fresher supplemental results</strong> now. The earliest page I saw was from Feb 2006, and most of the ones that I looked at averaged in the ~2 month old range.</p>
<p>As data gets copied to more places, the fresher supplemental results should eventually be visible everywhere, not just the U.S. </p></blockquote>
<p>Unfortunately, I&#8217;m either hitting the wrong DCs or I&#8217;m not seeing any change for my sites. So I won&#8217;t be doing cartwheels anytime soon&#8230;</p>
<p>Holy *** - checking 72.14.207.99 I&#8217;m seeing clean results and a whole load of new pages indexed correctly! If that spreads out to other DCs&#8230; well, I won&#8217;t count my chickens till they hatch, but I&#8217;m close to doing cartwheels. Thanks for the heads up Googleguy.</p>
<p>UPDATES:</p>
<p>Posted on <a href="http://www.webmasterworld.com/google/3035462-12-10.htm">Aug 11 at WMW</a>:</p>
<blockquote><p>
trinorthlighting, the refresh of supplemental results went out to a data center that serves North American and Asian traffic. If you went looking at specific data centers, you could probably find older supplemental pages&#8211;however, as the newer supplemental results roll out at more data centers in the next few weeks, they will completely replace the older supplemental results. </p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/08/08/googleguy-announces-radically-fresh-supplemental-index.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Vanessa Fox and Susie Bright Mingling at BlogHer?</title>
		<link>http://seo4fun.com/blog/2006/08/03/vanessa-fox-and-susie-bright-mingling-at-blogher.html</link>
		<comments>http://seo4fun.com/blog/2006/08/03/vanessa-fox-and-susie-bright-mingling-at-blogher.html#comments</comments>
		<pubDate>Thu, 03 Aug 2006 19:45:08 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/08/03/vanessa-fox-and-susie-bright-mingling-at-blogher.html</guid>
		<description><![CDATA[Earlier today I read Susie Bright talk about &#8220;the elephant in the Blogher living room&#8221;, so I was somewhat surprised to just find out Vanessa Fox was also there. That they have two completely different takes on the convention is obviously no shocker. But it doesn&#8217;t hurt to run a little comparison, does it? [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier today I <a href="http://susiebright.blogs.com/susie_brights_journal_/2006/07/blogher_sex_sur.html">read Susie Bright talk about &#8220;the elephant in the Blogher living room&#8221;</a>, so I was somewhat surprised to just find out <a href="http://sitemaps.blogspot.com/2006/08/back-from-blogher.html">Vanessa Fox was also there</a>. That they have two completely different takes on <a href="http://www.blogher.org/">the convention</a> is obviously no shocker. But it doesn&#8217;t hurt to run a little comparison, does it? This bit&#8217;s from a post written by Susie I dug up a few minutes ago off Google Blogsearch:</p>
<blockquote><p>
This post is for the women at the Blogher conference that I just attended. We had a session about sex and blogging that I thought was the best, and most controversial session of the gathering.</p>
<p>I decided to throw together  an  Un-Official, Un-Authorized, <a href="http://susiebright.blogs.com/susie_brights_journal_/2006/07/blogher_sex_sur.html">Blogher Sex Survey</a>!
</p></blockquote>
<p>And here&#8217;s the <a href="http://susiebright.blogs.com/susie_brights_journal_/2006/08/blog_the_cradle.html">post I was reading</a> earlier this morning:</p>
<blockquote><p>
I&#8217;m home from Blogher, the women&#8217;s blogging meet-up, that happened this past weekend. 750 women bloggers&#8230; kind of shocking. We all came out of our cubbyholes.</p>
<p>I&#8217;ve never been to an IT gathering of any kind before. I live on the coast next to Silicon Valley,but I&#8217;ve never been to <a href="http://maps.google.com/maps?q=San+Jose,+CA&#038;ie=UTF8&#038;ll=37.335224,-121.893311&#038;spn=5.869005,13.886719&#038;om=1">San Jose</a> except to drive to the airport&#8230;.I had no idea that a group of female computer nuts, blogging women, were so revolutionary&#8230;In our sex blog workshop at Blogher, several women raised their concern of writing about sex publicly, in any context. They were fearful of ridicule, discrimination, and dismissive stereotypes&#8230;The conference was astounding for the authority of its women speakers. You can find &#8220;pretty&#8221; girls anywhere— how often can you find ones who can rewire your whole world?</p></blockquote>
<p>Susie Bright made an appearance alongside Melissa Gira, Logan Levkoff (meow!), and Halley Suitt on <a href="http://blogher.org/node/7383">Let&#8217;s Talk about Sex</a> on Saturday July 29th, 2006. In her post, Susie goes on to quote Guy Kawasaki, who effectively tells readers who object to reading &#8220;non-business, non-tech, non-male subjects&#8221; to take a hike.</p>
<p>On the other hand, <a href="http://sitemaps.blogspot.com/2006/08/back-from-blogher.html">Vanessa Fox</a> seemed to carefully craft her post to meet her target audience - mostly geeky male webmasters trying to get our sites back in Google&#8217;s index, I suppose. Yet she sounds inspired. 750 women, many of whom &#8220;provide unique perspectives and content on topics&#8221;, of whom Susie Bright is an outstanding example, means that improving Google and helping female bloggers understand Google better will lead to their unique voices being exposed to a wider audience.</p>
<blockquote><p>I just got back from BlogHer, a conference primarily for women about the technical and community aspects of blogging. As a woman who blogs, I had a wonderful time. As a woman who blogs about topics of interest to site owners, I gained some new perspectives&#8230;The panelists understood visitor awareness. They told the crowd to look closely at their sites to determine how unique they were from other sites out there. They talked about getting visitors to care. This thoughtfulness leads to great sites, which in turn can lead to great search results&#8230;Given all of the dedication these bloggers put into their sites, they are of course interested in attracting visitors. Some want the joy of sharing; others are interested in making money from their writing. Here are a few tips to help make your site easier to find, whatever your motivation:</p></blockquote>
<p>She goes on (as you probably already read) to emphasize high quality links, crawlability (echoing Matt Cutts), and the ease of verifying site ownership for Google Sitemaps via <a href="http://sitemaps.blogspot.com/2006/05/more-about-meta-tag-verification.html">META tags</a>. What kinda touched me was the human passion I sensed behind all the techno talk, which to me is a hundred times more motivating than a few checks in the mail (though those don&#8217;t hurt either).</p>
<p><strong>P.S.</strong> Interesting though that the cocktail parties and such were sponsored by Windows Live Spaces and Yahoo, not Google. BTW, here are <a href="http://blogher.org/profile/profile_blogher06?from=0">some of the women who attended Blogher</a> - definitely some lookers in the crowd.</p>
<p><strong>Update Aug. 4</strong>: <a href="http://www.flickr.com/photos/tags/blogher2006/">BlogHer photos on Flickr</a>, tip from <a href="http://lustylady.blogspot.com/2006/08/okay-next-year-im-going-to-blogher.html">Village Voice&#8217;s Rachel Kramer</a>.</p>
<p><strong>Update Aug. 8</strong>: <a href="http://www.ysearchblog.com/archives/000340.html">Yahoo!&#8217;s Havi Hoffman on BlogHer</a>: &#8220;Personally, I think the noisy aftermath is a testimonial to the extraordinary range of people who participated.&#8221; One more tidbit: &#8220;Yahoo! slipped a purple pen and notebook into the schwag bag. We dressed the pool deck in purple and served plenty of our famous Yahootinis at the closing night cocktail party.&#8221; I want one of &#8216;em Yahootinis.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/08/03/vanessa-fox-and-susie-bright-mingling-at-blogher.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Guy, Matt Cutts, Vanessa Fox, and Adam Lasnik in One Thread?</title>
		<link>http://seo4fun.com/blog/2006/08/02/google-guy-matt-cutts-vanessa-fox-and-adam-lasnik-in-one-thread.html</link>
		<comments>http://seo4fun.com/blog/2006/08/02/google-guy-matt-cutts-vanessa-fox-and-adam-lasnik-in-one-thread.html#comments</comments>
		<pubDate>Wed, 02 Aug 2006 13:49:42 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/08/02/google-guy-matt-cutts-vanessa-fox-and-adam-lasnik-in-one-thread.html</guid>
		<description><![CDATA[In the last two days, both Vanessa Fox and Adam Lasnik made cameo appearances in the second installment of WMW&#8217;s Google Webmaster Communications thread. Now that&#8217;s Googleguy, Matt Cutts, Vanessa Fox, and Adam Lasnik all in one thread. Is this for real? Reading reseller greet them (&#8221;Its inspiring and a great pleasure to see our [...]]]></description>
			<content:encoded><![CDATA[<p>In the last two days, both Vanessa Fox and Adam Lasnik made cameo appearances in <a href="http://www.webmasterworld.com/google/3031050.htm">the second installment of WMW&#8217;s Google Webmaster Communications thread</a>. Now that&#8217;s Googleguy, Matt Cutts, Vanessa Fox, and Adam Lasnik all in one thread. Is this for real? Reading reseller greet them (&#8220;Its inspiring and a great pleasure to see our good Googlers friends and WebmasterWorld fellow members&#8230;&#8221;), I feel like I&#8217;m watching an SEO version of The Fellowship of the Ring. Still, there&#8217;s a lot of griping and grumbling going on, and there&#8217;s not much to take away from the thread so far except a hope that a good suggestion or a question by one of us may get some revealing responses in the days ahead.</p>
<p>I suggested Google Sitemaps show timestamps of when Supplemental Bot last visited a page, though I doubt Vanessa will bite. I&#8217;d also like to know if Supplemental Bot crawl depth or priority depends on trust / PR. I don&#8217;t plan to hold my breath for too long, but it would be great to know Supplemental Bot crawl priority is trust independent. Otherwise, a PR 0 site with no trust gone 100% supplemental will never see the light of day - which doesn&#8217;t exactly describe any of my sites. My domains are at least PR 2, dammit.</p>
<p><strong>UPDATE (8/29/2006):</strong> Since I&#8217;m getting Google hits from people looking for the dirt on Vanessa Fox, I&#8217;ll list a few relevant links here so you can track her down. First place to look for Vanessa is <a href="http://groups.google.com/group/Google_Webmaster_Help?lnk=li">Google Groups/ Google Webmaster Help</a>. Just look for the green &#8220;G&#8221; logo, and you&#8217;ll find either Adam Lasnik or Vanessa helping webmasters out of tight spots. You can also find Vanessa blogging away at either <a href="http://googleblog.blogspot.com/">The Official Google Blog</a> or <a href="http://googlewebmastercentral.blogspot.com/">Webmaster Central Blog</a>. She also drops by at Webmasterworld as well as other forums like cre8site (links on my sidebar). Finally, I also wrote a bit about <a href="/blog/2006/08/03/vanessa-fox-and-susie-bright-mingling-at-blogher.html">Vanessa Fox blogging about BlogHer in this post</a>.</p>
<p><strong>UPDATE (9/20/2006):</strong> Vanessa Fox and Amanda Camp in <a href="http://www.seattle24x7.com/up/googlecentral.htm">A Conversation with Google Webmaster Central</a>. Check out <a href="http://www.seattle24x7.com/people/people.htm">their pics here</a>. Wow&#8230; they&#8217;re kinda cute. Why the hell am I not working at Google?</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/08/02/google-guy-matt-cutts-vanessa-fox-and-adam-lasnik-in-one-thread.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Matt Cutts Releases SEO Video Quickies</title>
		<link>http://seo4fun.com/blog/2006/07/31/matt-cutts-releases-seo-video-quickies.html</link>
		<comments>http://seo4fun.com/blog/2006/07/31/matt-cutts-releases-seo-video-quickies.html#comments</comments>
		<pubDate>Mon, 31 Jul 2006 13:05:24 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/07/31/matt-cutts-releases-seo-video-quickies.html</guid>
		<description><![CDATA[Perhaps as part of an extended response to his recent cameo appearance on a WMW thread started by Reseller, Matt Cutts posted three videos answering questions asked previously on The Simple Life, er Grabbag Friday. Here&#8217;s the CliffsNotes version:

Do sitemap updates depend on page views? Matt: No, page views aren&#8217;t really a factor in when [...]]]></description>
			<content:encoded><![CDATA[<p>Perhaps as part of an extended response to his <a href="http://www.webmasterworld.com/google/3022278.htm">recent cameo appearance on a WMW thread</a> started by Reseller, <a href="http://www.mattcutts.com/blog/seo-answers-on-google-video/">Matt Cutts posted three videos</a> answering questions asked previously on The Simple Life, er <a href="http://www.mattcutts.com/blog/grabbag-friday/">Grabbag Friday</a>. Here&#8217;s the CliffsNotes version:</p>
<ul>
<li><strong>Do sitemap updates depend on page views?</strong> Matt: <em>No</em>, page views aren&#8217;t really a factor in when things are updated in Sitemaps.</li>
<li><strong>How do I improve my site&#8217;s visibility on Google?</strong> <em>(make sure it&#8217;s crawlable, then market it with a hook).</em> 1) Make sure the site is crawlable. Try viewing your site via a text browser, like Lynx. Use an HTML sitemap to help spiders along; use Google Sitemaps in addition to that. 2) Market your site - &#8220;think about the people who are relevant to your niche, and make sure they know about you. You also wanna be thinking about a hook - something that&#8217;s viral.&#8221; Good content, social bookmarking sites, etc. &#8220;Fundamentally you need something interesting that sets you apart from the pack.&#8221;</li>
<li><strong>What conditions cause Google to use the DMOZ snippet, when there is already a valid meta description tag on the page?</strong> <em>Result is query-dependent.</em> If you have a page about Christina Aguilera but your DMOZ listing says the page is about Britney Spears, a &#8220;britney spears&#8221; search will display the DMOZ description, whereas a &#8220;christina aguilera&#8221; search will display the page&#8217;s meta description snippet. Of course, if you don&#8217;t want the DMOZ description, you can use the NOODP meta tag.</li>
<li><strong>Does Google favor bold or strong?</strong> In general, <em>Google slightly favors bold</em>. Keyword: slightly. Recommendation: Just code your site the way you want. Update: In <a href="http://video.google.com/videoplay?docid=-1756437348670651505">his latest video</a>, Matt Cutts mentioned he followed up with a Google engineer about this and was shown that Google&#8217;s algo treats B exactly the same as STRONG, and EM exactly the same as I. So, there ya go.</li>
<li><strong>Does having many sites under one IP/server matter?</strong> There&#8217;s a range: <em>4-5 sites on a similar theme = no problem. 2000 sites = probably spam.</em></li>
<li><strong>If you&#8217;re launching a site with millions of pages, launch softly.</strong></li>
<li><strong>Don&#8217;t worry about including the same JS off different sites</strong>, Google Analytics or Google Adsense for example. People do this a lot, and unless you use the same sneaky code on 5000 sites, you shouldn&#8217;t worry.</li>
<li><strong>Google image index update happened last weekend.</strong> You can read Matt Cutts&#8217; comment regarding the update on <a href="http://www.webmasterworld.com/google/3018232.htm">a recent WMW thread titled Google: Has There Been an Image Update?</a> started by ianevans.</li>
<li><strong>Should I build for search engines, or for people?</strong> <em>Both SE optimization AND end-user optimization are important</em> - otherwise &#8220;you won&#8217;t do as well.&#8221; &#8220;The trick in my mind, is to try to see the world such that they&#8217;re the same thing. You want to make it so that your users&#8217; interests and the search engines&#8217; interests are aligned as you can.&#8221;</li>
<li><strong>How do I detect spam?</strong> Try <a href="http://siteexplorer.search.yahoo.com/">Yahoo site explorer</a>, which shows you backlinks. There are also tools that show you everything hosted on one IP address. Use <a href="https://www.google.com/webmasters/sitemaps/siteoverview?hl=en">Google Sitemaps</a> to find any problems with your own sites.</li>
<li><strong>W3C validation, does it matter?</strong> &#8220;40% of all HTML has syntax errors, and there&#8217;s no way search engines can remove 40% of their index just because somebody didn&#8217;t validate. I wouldn&#8217;t put it at the top of the list.&#8221; It could become a signal in the future, especially after the release of accessibility search.</li>
</ul>
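<p>Since Matt says Google&#8217;s algo treats B exactly like STRONG and I exactly like EM, that kind of tag normalization is easy to picture in code. Here&#8217;s a toy Python sketch - my own illustration of the idea, obviously not Google&#8217;s actual implementation:</p>

```python
from html.parser import HTMLParser

# Hypothetical normalization table: fold <b> into <strong> and <i> into <em>,
# the equivalence Matt describes. Purely illustrative.
EQUIV = {"b": "strong", "i": "em"}

class EmphasisCounter(HTMLParser):
    """Counts words inside emphasis tags, after folding equivalent tags."""
    def __init__(self):
        super().__init__()
        self.stack = []
        self.counts = {"strong": 0, "em": 0}

    def handle_starttag(self, tag, attrs):
        # Push the normalized tag name so <b> and <strong> look identical.
        self.stack.append(EQUIV.get(tag, tag))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] in self.counts:
            self.counts[self.stack[-1]] += len(data.split())

def emphasis_counts(html):
    p = EmphasisCounter()
    p.feed(html)
    return p.counts

# Words in <b> and <strong> land in the same bucket:
print(emphasis_counts("<p><b>bold words</b> and <strong>strong ones</strong></p>"))
# → {'strong': 4, 'em': 0}
```

<p>Once the tags are folded like this, there&#8217;s literally nothing to gain by picking one over the other - which is Matt&#8217;s recommendation: just code your site the way you want.</p>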
<p><strong>Follow Up:</strong> Matt posted <a href="http://www.mattcutts.com/blog/more-seo-answers-on-video/">another batch of videos</a> today. In the last video, he said Supplemental googlebot will be refreshing pages and following redirects more frequently as time goes on. I&#8217;ll believe it when I see it.</p>
<p><strong>Aug 2:</strong> In <a href="http://video.google.com/videoplay?docid=-9028425054136856586">this video released today</a>, where Matt makes me want to go grab a drink from the fridge, he mentions something that will have all adult webmasters&#8217; heads reeling. I know many people who like to put a whole bunch of adult keywords in their meta keywords and description tags. Those people are also usually not so fond of using ICRA tags. According to Matt, Google uses those tags to filter out adult content from Safe Search. Gotta love it.</p>
<p>He also goes into explaining <a href="http://video.google.com/videoplay?docid=8475081922887713591">what happened on June 27</a> and the difference between an index update, an algo update, and a data refresh. Matt mentions PageRank enough times in this video to make PR naysayers cringe. A few key points: Google&#8217;s index is in everflux (no more updates ala 2003); PageRank updates daily; over-optimization can hurt your site. Yes, we all knew that already, but what exactly is a data refresh again?</p>
<p><strong>Aug 3:</strong> Matt Cutts throws another curve ball at SEOs in <a href="http://video.google.com/videoplay?docid=-1756437348670651505">this new video</a>, where he dismisses a .gov myth and credits the power of .edu sites mainly to PageRank.  According to Matt, .gov/.org/.edu sites aren&#8217;t really treated differently from any other type of domains. &#8220;It&#8217;s just those sites tend to have higher PageRank&#8230;because more people link to them.&#8221;</p>
<p><strong>P.S.</strong></p>
<p>I left a question on his blog regarding W3C validation because I believe some errors result in crappy description snippets being displayed on a Google site: search.</p>
<p>BTW, here are a few GoogleGuy quotes from the same WMW Google communication thread, where a couple of people bitched about Google not responding to frustrated webmasters:</p>
<blockquote><p>There was a data refresh on June 27th that lots of people ask about, but there was also a data refresh in the last 1-2 days that refreshes the same data. Going forward, I&#8217;d expect that the cycle time would go down even more, possibly down to once a week for that particular algorithm.</p></blockquote>
<blockquote><p>72.14.207.104 has some newer infrastructure that makes site: queries more accurate, and in general that infrastructure also improves results for other queries too. But the infrastructure at 72.14.207.104 is orthogonal/independent of many other changes.</p></blockquote>
<p>BTW, thanks to <a href="http://blog.v7n.com/2006/07/31/cutt-lets-early-christmas-present/">Peter over on V7N Blog</a> for throwing me a bone. I hate to bitch, but writing a full-fledged post at 4 in the morning just to publish something that <a href="http://www.seroundtable.com/archives/004251.html">reads almost exactly like every other Matt Cutts video blog post</a> can get frustrating. No point in speedy delivery when you&#8217;ve got no visibility.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/07/31/matt-cutts-releases-seo-video-quickies.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Corrupt Titles and Short META Descriptions</title>
		<link>http://seo4fun.com/blog/2006/07/18/google-corrupt-titles-and-short-meta-descriptions.html</link>
		<comments>http://seo4fun.com/blog/2006/07/18/google-corrupt-titles-and-short-meta-descriptions.html#comments</comments>
		<pubDate>Tue, 18 Jul 2006 17:27:44 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/07/18/google-corrupt-titles-and-short-meta-descriptions.html</guid>
		<description><![CDATA[Marcia on WMW commented yesterday that pages with unlinked style declarations are turning up with corrupt TITLEs. Just Guessing claims he has the same problem but without a style declaration (mine does have a style declaration and its titles are still corrupt). Which brings us back to square one.
The thread got me wondering if having [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.webmasterworld.com/google/3011487.htm">Marcia on WMW</a> commented yesterday that pages with unlinked style declarations are turning up with corrupt TITLEs. Just Guessing claims he has the same problem but without a style declaration (mine does have a style declaration and its titles are still corrupt). Which brings us back to square one.</p>
<p>The thread got me wondering if having a short META description is one of the common denominators. Don&#8217;t get me wrong - I&#8217;m not saying short META descriptions are the CAUSE - just wondering if all pages with corrupt titles have META descriptions less than 50 chars long. (Disagree? Post a counterexample.)</p>
<p>Google&#8217;s been changing the way they snippetize pages, and they may have temporarily injected their spider with a bug - in which case we can forget it and move on. Still, this minor corrupt title bug can turn into a huge mess, since Supplemental Bot revisits websites supposedly around every 6 months (though for my sites it feels more like once a year) and most if not all pages with this corrupt title problem end up in the supplemental index.</p>
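<p>If you want to check my under-50-chars hunch against your own pages, here&#8217;s a quick Python sketch. The 50-char threshold is just my guess from above - nothing Google has confirmed:</p>

```python
from html.parser import HTMLParser

class MetaDescriptionFinder(HTMLParser):
    """Pulls the content of <meta name="description"> out of an HTML page."""
    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "description":
            self.description = a.get("content", "")

def short_meta_description(html, limit=50):
    """True if the page's META description is missing or under `limit` chars
    (the threshold this post speculates about - an assumption, not a fact)."""
    p = MetaDescriptionFinder()
    p.feed(html)
    return p.description is None or len(p.description) < limit

page = '<html><head><meta name="description" content="Too short"></head></html>'
print(short_meta_description(page))  # → True
```

<p>Run it over the pages showing corrupt titles and see whether they all get flagged. If somebody finds a counterexample, the theory dies right there.</p>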
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/07/18/google-corrupt-titles-and-short-meta-descriptions.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Notebook - Error in Loading User Data</title>
		<link>http://seo4fun.com/blog/2006/07/14/google-notebook-error-in-loading-user-data.html</link>
		<comments>http://seo4fun.com/blog/2006/07/14/google-notebook-error-in-loading-user-data.html#comments</comments>
		<pubDate>Sat, 15 Jul 2006 04:05:00 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/07/14/google-notebook-error-in-loading-user-data.html</guid>
		<description><![CDATA[I admit I love Google Notebook. I&#8217;ve got all sorts of clips saved up in this thing as I surf the web. But since 5 pm tonight I&#8217;ve been getting this error: &#8220;Error in loading user data&#8221; and it refuses to go away. Yesterday, gmail refused to load for a few minutes, so I thought [...]]]></description>
			<content:encoded><![CDATA[<p>I admit I love Google Notebook. I&#8217;ve got all sorts of clips saved up in this thing as I surf the web. But since 5 pm tonight I&#8217;ve been getting this error: &#8220;Error in loading user data&#8221; and it refuses to go away. Yesterday, gmail refused to load for a few minutes, so I thought maybe it was the same momentary lapse of service kinda deal, but 4 hours later, Google Notebook still refuses to fire up. What gives? I swear to God things like this make me wanna be DaveN. Recommendation: <strong>Send Google an email letting them know your problem</strong> - it worked for me.</p>
<div align="center">
<img src="/images/google-notebook.jpg" alt="google notebook error" /><br />
<strong>Google Notebook Error in Loading User Data</strong>
</div>
<p><strong>Update:</strong> Now Google Notebook is really acting screwy:</p>
<div align="center">
<img src="/images/google-notebook-screwup.jpg" alt="google notebook screw up" /><br />
<strong>Google Notebook screenshot. Nav links on the left are unclickable. I lowered the resolution to protect some sensitive data&#8230; so to speak.</strong>
</div>
<p><strong>7/19/2006:</strong> <strong>Google Notebook is back to working order.</strong> About the same time I noticed the bugs disappear, I received a reply to my Google labs feedback (bitching) informing me they fixed a few kinks in their app. I was afraid when Google apps bug out, those bugs go unnoticed by the Google folks, but in this case my fears turned out to be unfounded. Now I&#8217;m officially back to being a Google Notebook fanboy.</p>
<p><strong>8/25/2006:</strong> I&#8217;m still getting hits from Google (e.g. &#8220;error in loading user data&#8221;, &#8220;google notebook error in loading user data&#8221;), so people are still having problems with Google Notebook, though I haven&#8217;t had problems with it for a while. I think it has to do with keeping the Notebook open in multiple browsers and creating a synchronization problem (just a guess).</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/07/14/google-notebook-error-in-loading-user-data.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google&#8217;s Vanessa Fox Announces NOODP Tag</title>
		<link>http://seo4fun.com/blog/2006/07/13/googles-vanessa-fox-announces-noodp-tag.html</link>
		<comments>http://seo4fun.com/blog/2006/07/13/googles-vanessa-fox-announces-noodp-tag.html#comments</comments>
		<pubDate>Thu, 13 Jul 2006 21:19:49 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/07/13/googles-vanessa-fox-announces-noodp-tag.html</guid>
		<description><![CDATA[A couple of hours ago, Vanessa Fox announced a new META tag that allows webmasters to opt out of DMOZ title/descriptions appearing in the SERPs:
&#60;META NAME="GOOGLEBOT" CONTENT="NOODP"&#62;
or &#60;META NAME="ROBOTS" CONTENT="NOODP"&#62; to cover all search engines.
MSN was of course the first to get the ball rolling on this, back on May 22.
Vanessa also wrote:
The way we [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of hours ago, <a href="http://sitemaps.blogspot.com/2006/07/more-control-over-page-snippets.html">Vanessa Fox announced</a> a new META tag that allows webmasters to opt out of DMOZ title/descriptions appearing in the SERPs:</p>
<p>&lt;META NAME="GOOGLEBOT" CONTENT="NOODP"&gt;</p>
<p>or &lt;META NAME="ROBOTS" CONTENT="NOODP"&gt; to cover all search engines.</p>
<p><a href="http://blogs.msdn.com/livesearch/archive/2006/05/22/603917.aspx">MSN was of course the first</a> to get the ball rolling on this, back on May 22.</p>
<p>Vanessa also wrote:</p>
<blockquote><p>The way we generate the descriptions (snippets) that appear under a page in the search results is completely automated. The process uses both the content on a page as well as references to it that appear on other sites.</p></blockquote>
<p>In case you&#8217;re one of those unlucky people who&#8217;s seeing nav links pop up in the search results, I&#8217;ve written previously about <a href="/blog/2006/06/24/seo-101-feeding-google-juicy-description-snippets.html">how Google snippetizes a page</a>.</p>
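<p>BTW, if you manage a pile of templates and can&#8217;t remember which ones already opt out, here&#8217;s a throwaway Python check for the NOODP directive. My own helper, nothing official from Google:</p>

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            # Directives are comma-separated and case-insensitive.
            for d in a.get("content", "").split(","):
                self.directives.add(d.strip().lower())

def opts_out_of_odp(html):
    """True if the page carries the NOODP directive in a robots/googlebot meta tag."""
    p = RobotsMetaParser()
    p.feed(html)
    return "noodp" in p.directives

head = '<head><meta name="robots" content="index, follow, NOODP"></head>'
print(opts_out_of_odp(head))  # → True
```

<p>Point it at each template&#8217;s HEAD output and you&#8217;ll know in seconds which pages still leave Google free to grab the DMOZ description.</p>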
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/07/13/googles-vanessa-fox-announces-noodp-tag.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Log - Emailed Business.com and Built a New Rig</title>
		<link>http://seo4fun.com/blog/2006/07/12/log-emailed-businesscom-and-built-a-new-rig.html</link>
		<comments>http://seo4fun.com/blog/2006/07/12/log-emailed-businesscom-and-built-a-new-rig.html#comments</comments>
		<pubDate>Thu, 13 Jul 2006 00:07:02 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/07/12/log-emailed-businesscom-and-built-a-new-rig.html</guid>
		<description><![CDATA[I finally finished putting my new PC together and I gotta say after frying my old rig, it felt like walking on a tight rope without a break for 2 days (gimme a break, it&#8217;s my first try). Seriously, I connect the reset wire to my new ASUS ATX mobo in bullet time. First time [...]]]></description>
			<content:encoded><![CDATA[<p>I finally finished putting my new PC together, and I gotta say after frying my old rig, it felt like walking on a tightrope without a break for 2 days (gimme a break, it&#8217;s my first try). Seriously, I connected the reset wire to my new ASUS ATX mobo in bullet time. The first time I pressed the power button, I had one hand on the plug in case the fan didn&#8217;t spin. And when my DVD drive refused to show up, I was afraid I&#8217;d blown a fuse - but it turned out I just had to set the damn thing on slave instead of CS. One thing I was glad about: I didn&#8217;t have to reformat my hard drive. Windows XP fired up just fine - I just had to call up Microsoft tech support to retrieve my key. Windows bonks out like that whenever it detects a major alteration in your system - like upgrading everything except the drives, which is what I ended up doing. I wish I had a digital cam to snap a pic, since the back of my jet black case with a glowing blue PSU looks pretty damn cool. I have to remember never to throw away my old case though, since I&#8217;ve got my Microsoft key sticker on it.</p>
<p>After I was done, I spent a day playing F.E.A.R. and Far Cry (one good thing about being my own boss). My new Radeon X1800 XT / AMD Athlon 64 2.2GHz / 2GB Corsair RAM setup still isn&#8217;t quite good enough to max out FEAR settings, but Far Cry plays flawlessly with anti-aliasing on high. I wish I&#8217;d gone SLI with a pair of GeForce 7900s, but I didn&#8217;t feel like blowing $800+ on a pair of video cards when I could buy an Xbox 360 for around $300. Now I gotta go get me some new titles: probably Oblivion (though that sounds like it&#8217;ll put me in a timewarp and I won&#8217;t ever get back to my work), Doom 3 for supposedly great looking visuals (I&#8217;ll have to re-download the demo to decide), or Need for Speed: Most Wanted. Anything with tons of eye candy will do.</p>
<p>Anyway, that&#8217;s my excuse to myself for doing barely anything for the last few days. I did email Business.com early in the week, though, asking why they added nofollow tags to their links and whether their directory was a paid directory. They promptly emailed me back, referring me to Threadwatch, where Lane posted an elaborate response, which I was satisfied with. People go berserk when their paid links are at stake, I tell ya.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/07/12/log-emailed-businesscom-and-built-a-new-rig.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Snippetization Experiment Follow-Up</title>
		<link>http://seo4fun.com/blog/2006/07/07/snippetization-experiment-follow-up.html</link>
		<comments>http://seo4fun.com/blog/2006/07/07/snippetization-experiment-follow-up.html#comments</comments>
		<pubDate>Fri, 07 Jul 2006 07:41:10 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/07/07/snippetization-experiment-follow-up.html</guid>
		<description><![CDATA[Just a follow-up to my last post about description snippets, posted on July 3, 2006. Lucky for me, Google cached a few pages on the same day, including this category page:

This is what I predicted. BTW, all I did was remove an H2 element so that an H2 precedes a text block, instead of having an [...]]]></description>
			<content:encoded><![CDATA[<p>Just a follow-up to my <a href="/blog/2006/07/03/experiment-with-description-snippets.html">last post about description snippets</a>, posted on July 3, 2006. Lucky for me, Google cached a few pages on the same day, including this category page:</p>
<p><img src="/images/description-modified.jpg" alt="description snippet modified" /></p>
<p>This is what I predicted. BTW, all I did was remove an H2 element so that an H2 precedes a text block, instead of having one H2 immediately follow another H2.</p>
<p>Now you&#8217;re probably thinking, &#8220;What does this textbook seo voodoo have to do with viral marketing, increasing my reputation, or improving my CTR?&#8221;</p>
<p>In case you haven&#8217;t read it yet, here&#8217;s <a href="http://www.highrankings.com/advisor/lowbodycopy/">a recent remark Jill Whalen made</a> in her High Rankings Advisor newsletter regarding the importance of placing content high up in the source:</p>
<blockquote><p>This is an old SEO myth. It actually makes no difference where in the source code the copy of the page shows up.  The search engines have always known how to ignore the HTML code that is not important to them, and can easily find the “meat” that is important.</p></blockquote>
<p>Let&#8217;s break this down, shall we?</p>
<blockquote><p>It actually makes no difference where in the source code the copy of the page shows up.</p></blockquote>
<p>Take a look at this <a href="http://www.google.com/search?hs=d9f&#038;hl=en&#038;lr=&#038;safe=off&#038;client=firefox-a&#038;rls=org.mozilla%3Aen-US%3Aofficial&#038;q=site%3Abrooke-skye-videos.blogspot.com&#038;btnG=Search">site: search</a> (don&#8217;t click if you&#8217;re pornophobic, but hey, it&#8217;s just a Google SERP). Twenty-some pages, with plenty of unique wordy posts, all but one page supplemental. Tedster from WMW would tell you they&#8217;re supplemental because Google picked up identical description snippets for every page: &#8220;<em>Notify Blogger about objectionable content. What does this mean? Blogger. Send As SMS. Get your own blog Flag Blog Next blog. BlogThis !</em>&#8221; Agreed. Google is judging my blog based not on the page copy, but the text snippet right below BODY. If the blog template placed the &#8220;Notify Blogger&#8221; text below content, that blog wouldn&#8217;t be having a problem.</p>
<p>Don&#8217;t get me wrong. I&#8217;m not saying copy position in the source must always be high up. It depends on the HTML elements you use. But the deeper you bury your copy text in the source, the more likely Google is to get lost.</p>
<blockquote><p> The search engines have always known how to ignore the HTML code that is not important to them</p></blockquote>
<p>Google doesn&#8217;t ignore any HTML when parsing a page. NOSCRIPT, IMG ALT, TD, DIV, even HR - they all dictate how Google reacts to a page. HTML elements add meaning; Google uses them as markers to find the meat. When those elements are misarranged, you are making Google&#8217;s life more difficult.  An obvious sign of trouble is when you see Google defaulting to snippetizing the top of the source (e.g. nav link text, A HREF, image ALT, NOSCRIPT) instead of content.</p>
<p>Think of page copy position as the distance between you and a hoop on a basketball court. Move farther away from the hoop and you may still make the shot, but you&#8217;re decreasing your odds. If you know anything about games, winning is about playing your odds (though I&#8217;ve beat an open player in a 9-ball tournament finals making an 89-degree cut shot across the short side of the table instead of playing safe and forcing a ball in hand or a volley of safes, which would&#8217;ve been the smart thing to do).</p>
<blockquote><p>[search engines] can easily find the “meat” that is important.</p></blockquote>
<p>Clearly, Google can&#8217;t always find the important meat. If it could, identical description snippets wouldn&#8217;t throw it off and make it think it&#8217;s running into duplicate pages. But that&#8217;s what happens.</p>
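<p>To make the argument concrete, here&#8217;s a toy &#8220;snippetizer&#8221; in Python that just grabs the first run of non-link body text - my crude guess at the fallback behavior, nothing like Google&#8217;s real algorithm. Feed it a Blogger-style page and the boilerplate above the content wins:</p>

```python
from html.parser import HTMLParser

SKIP = {"script", "style", "noscript"}

class NaiveSnippetizer(HTMLParser):
    """Toy model of snippet fallback: take the first run of body text that is
    not inside a link. It only exists to show why boilerplate placed above the
    content tends to become the snippet."""
    def __init__(self):
        super().__init__()
        self.in_skip = 0
        self.in_link = 0
        self.snippet = None

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.in_skip += 1
        elif tag == "a":
            self.in_link += 1

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.in_skip = max(0, self.in_skip - 1)
        elif tag == "a":
            self.in_link = max(0, self.in_link - 1)

    def handle_data(self, data):
        text = data.strip()
        if text and not self.in_skip and not self.in_link and self.snippet is None:
            self.snippet = text

def snippetize(html):
    p = NaiveSnippetizer()
    p.feed(html)
    return p.snippet

page = ('<body><div>Notify Blogger about objectionable content.</div>'
        '<a href="/next">Next blog</a><p>The actual post copy.</p></body>')
print(snippetize(page))  # → Notify Blogger about objectionable content.
```

<p>The post copy never even gets a look - exactly the failure mode in the site: search above. Move the boilerplate below the content and the first text block becomes your copy.</p>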
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/07/07/snippetization-experiment-follow-up.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Experiment with Description Snippets</title>
		<link>http://seo4fun.com/blog/2006/07/03/experiment-with-description-snippets.html</link>
		<comments>http://seo4fun.com/blog/2006/07/03/experiment-with-description-snippets.html#comments</comments>
		<pubDate>Mon, 03 Jul 2006 20:52:56 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/07/03/experiment-with-description-snippets.html</guid>
		<description><![CDATA[I just removed H2 elements from archive page posts on this blog; I want to see what effect that has on how Google snippetizes my archive pages. HOPEFULLY I didn&#8217;t just shoot myself in the foot, but what the hell - I&#8217;m in the mood for a little experimentation. BTW, this is a follow up [...]]]></description>
			<content:encoded><![CDATA[<p>I just removed H2 elements from archive page posts on this blog; I want to see what effect that has on how Google snippetizes my archive pages. HOPEFULLY I didn&#8217;t just shoot myself in the foot, but what the hell - I&#8217;m in the mood for a little experimentation. BTW, this is a follow-up to my previous post about <a href="/blog/2006/06/24/seo-101-feeding-google-juicy-description-snippets.html">how I think Google snippetizes a page</a>. Note: The point of this exercise is not to rank higher. The point is 1) to learn how Google operates and 2) to avoid supplemental listings without using META descriptions (Look ma, no hands!).</p>
<p><strong>BEFORE</strong> (post titles are wrapped in H2):</p>
<p><img src="/images/archive-template.jpg" alt="Wordpress archive.php template" /></p>
<p><strong>AFTER</strong> (h2 removed):</p>
<p><img src="/images/archive-after.jpg" alt="Wordpress archive.php template" /></p>
<p>What effect will this have when I run site: on Google a few weeks down the road?</p>
<p><strong>BEFORE</strong> (this is how my listing actually looks right now):</p>
<p><img src="/images/before.jpg" alt="Google SERP description snippet before" /></p>
<p><strong>AFTER</strong> (image below was taken from my Google Notebook. I&#8217;ll have to wait a while to see the actual outcome):</p>
<p><img src="/images/after.jpg" alt="Google SERP description snippet after" /></p>
<p>&#8220;Archive for May, 2006&#8221; is the first H2 text on that page, but it&#8217;s followed by another H2, which I think cancels it out. By removing the post H2, I&#8217;m hoping the first H2 text gets included in the description snippet. &#8220;Google Co-op Annotation&#8230;&#8221; is the post title for the first post on that archive page. Since it&#8217;s separate from the post excerpt wrapped in P, I&#8217;m not sure if it&#8217;ll get skipped over or not. I&#8217;ll bet a buck and a half it won&#8217;t get skipped over.</p>
<p><strong>P.S.</strong> I realize now my archive pages are harder on the eyes. I&#8217;ll work on that later.</p>
<p>Since my main machine is fried and I don&#8217;t have access to Dreamweaver, I used FastStone, SmartFTP and Photoshop to work images into this post. Surprisingly, it&#8217;s way quicker this way than using Dreamweaver.</p>
<p>Ideally, I want to wrap everything (post title, date, excerpt) in P. But doing that requires an involved hack which I think is way too much hassle. I still wrote it down below, in case I change my mind later and implement it.</p>
<p>$post_excerpt is just a DB field. An easy hack here is to use str_replace() to strip out the P and /P in get_the_excerpt().</p>
<p>In wp-includes/template-functions-post.php, line 109:</p>
<p>$output = $post->post_excerpt;</p>
<p><strong>Replace that with:</strong></p>
<p>$replacement = array("&lt;p&gt;", "&lt;/p&gt;");<br />
$output = str_replace($replacement, "", $post->post_excerpt);</p>
<p>Now, just wrap excerpt text in P on your archive/index templates.</p>
<p><strong>WARNING:</strong> This hack is completely untested. I take absolutely no responsibility if using it completely wipes your blog off the face of Google. You&#8217;ve been warned.</p>
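<p>If you want to sanity-check the string replacement outside WordPress before hacking a core file, the same idea in Python (just the string munging, nothing WP-specific):</p>

```python
def strip_excerpt_paragraphs(excerpt):
    """Strip the <p>...</p> wrapper around an excerpt - the same string
    replacement the PHP hack above performs with str_replace()."""
    for tag in ("<p>", "</p>"):
        excerpt = excerpt.replace(tag, "")
    return excerpt.strip()

print(strip_excerpt_paragraphs("<p>Post excerpt text.</p>"))  # → Post excerpt text.
```

<p>Once the wrapper is gone, you&#8217;re free to wrap the whole title/date/excerpt unit in a single P in your templates.</p>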
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/07/03/experiment-with-description-snippets.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Mosaic Rambling: Kawasaki, FURL, Fried CPU</title>
		<link>http://seo4fun.com/blog/2006/07/03/mosaic-rambling-kawasaki-furl-fried-cpu.html</link>
		<comments>http://seo4fun.com/blog/2006/07/03/mosaic-rambling-kawasaki-furl-fried-cpu.html#comments</comments>
		<pubDate>Mon, 03 Jul 2006 14:12:50 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/07/03/mosaic-rambling-kawasaki-furl-fried-cpu.html</guid>
		<description><![CDATA[I just finished watching Guy Kawasaki speak. Nearly 40 minutes, but man.. do I feel inspired. &#8220;Don&#8217;t listen to the bozos!&#8221;
I&#8217;m seeing FURL redirect pages ranking high on Google. A double edged sword: More SE traffic, but original pages seem to be getting knocked off the index.
It&#8217;s a long story, but I fried my CPU [...]]]></description>
			<content:encoded><![CDATA[<p>I just finished watching <a href="http://blog.guykawasaki.com/2006/06/the_art_of_the_.html">Guy Kawasaki</a> speak. Nearly 40 minutes, but man.. do I feel inspired. &#8220;Don&#8217;t listen to the bozos!&#8221;</p>
<p>I&#8217;m seeing FURL redirect pages ranking high on Google. A double-edged sword: more SE traffic, but original pages seem to be getting knocked off the index.</p>
<p>It&#8217;s a long story, but I fried my CPU a few nights ago, so I&#8217;m a fish out of water on a 1GHz / 256MB RAM Windows ME box without access to forum passwords and whatnot. A few lessons I&#8217;ve learned:</p>
<ul>
<li>Don&#8217;t screw around with your main rig. Processors and mobos are cheap these days (found a 2.2GHz AMD 64 for $100).</li>
<li>Don&#8217;t use Windows Backup to back up files. That program compresses everything into one huge file, and you can&#8217;t get to individual files from another machine.</li>
<li>Google Notebook, Google Reader, Google Calendar, gmail - when a good chunk of my data is online, one fried machine won&#8217;t slow me down.</li>
<li>Last but not least - don&#8217;t jam anything into the CPU fan.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/07/03/mosaic-rambling-kawasaki-furl-fried-cpu.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>SEO 101: Feeding Google Juicy Description Snippets</title>
		<link>http://seo4fun.com/blog/2006/06/24/seo-101-feeding-google-juicy-description-snippets.html</link>
		<comments>http://seo4fun.com/blog/2006/06/24/seo-101-feeding-google-juicy-description-snippets.html#comments</comments>
		<pubDate>Sat, 24 Jun 2006 13:40:14 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[On Page Optimization]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/06/24/seo-101-feeding-google-juicy-description-snippets.html</guid>
		<description><![CDATA[How can you get Google to pick up the description snippet you want? And why should you care? If you use META descriptions, good. If not, read on. My views on this aren&#8217;t authoritative, but if you disagree, post a good counterexample.

Don&#8217;t use H tags just to make text look bigger. Enuf said.
Avoid wrapping text [...]]]></description>
			<content:encoded><![CDATA[<p>How can you get Google to pick up the description snippet you want? And why should you care? If you use META descriptions, good. If not, read on. My views on this aren&#8217;t authoritative, but if you disagree, post <strong>a good counterexample</strong>.</p>
<ul>
<li><strong>Don&#8217;t use H tags just to make text look bigger.</strong> Enuf said.</li>
<li><strong>Avoid wrapping text in HREF if you want it snippetized.</strong> Google skips over H1 and H2 on <a href="http://64.233.187.104/search?q=cache:CqDrxoI5a6cJ:horrorbrain.ign.com/+site:ign.com&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=8&#038;client=firefox-a">this page</a> (snippet: &#8220;Horrorbrain.com: Movies, Shows, Reviews and More. &#8230; Today&#8217;s Headlines.  From Beyond: Frightmare · Fan Fiction: Up To Snuff · Slasher Friday: Katiebird*&#8221;), including a chunk of text wrapped in HREF: &#8220;Horror Brain is very proud to present a wonderful screenplay written by Aaron Boehm.&#8221; Another example: on <a href="http://216.239.51.104/search?q=cache:_uez4MJnrH8J:www.philosophypages.com/locke/g00.htm+site:philosophypages.com&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=20&#038;client=firefox-a">this page</a> (snippet: &#8220;<font size="-1">A guide to Locke&#8217;s Essay. &#8230; Introduction. John Locke&#8217;s An Essay Concerning  Human Understanding is a classic statement of empiricist epistemology.</font>&#8221;), the first H tag wrapped in HREF - &#8220;A Guide to Locke&#8217;s Essay&#8221; - is ignored.</li>
<li><strong>Keep H and P together.</strong> Don&#8217;t insert HR between them, or wrap them individually in DIV. If there’s a huge chunk of text on a page, will Google always find it, even if it’s buried under MENUs, TABLEs and DIVs? No. <a href="http://64.233.187.104/search?q=cache:mNSuEBb4unUJ:www.nhc.noaa.gov/aboutnames_text.html+site:nhc.noaa.gov&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=88&#038;client=firefox-a">Here’s an example</a> (snippet: &#8220;<font size="-1">TPC&#8217;S List of World-wide Tropical Cyclone Names. <strong>&#8230;</strong> Atlantic Names: 2002 2003  2004 2005 2006 2007 Arthur Ana Alex Arlene Alberto Andrea Bertha Bill Bonnie&#8221;) </font>where Google doesn’t. Google starts snippetizing text right under H2. On <a href="http://64.233.179.104/search?q=cache:_bddtG89JIoJ:www.themystica.com/mystica/writings/the_journey.html+site:themystica.com&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=20&#038;client=firefox-a">this page</a> (snippet: &#8220;<font size="-1">by David Striar Years ago I found myself struggling with the question of what I  should do with my life. While I was drawn to the arts, especially to music&#8221;)</font>, the text “The Journey by David Striar” gets chopped in half because of an HR between “The Journey” and “by David Striar.&#8221; Google ignores a few leading P blocks of text with A HREF embedded and chooses text from a bigger P on <a href="http://64.233.179.104/search?q=cache:XGYhhCezPPYJ:www.themystica.com/mystica/writings/curious.html+site:themystica.com&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=17&#038;client=firefox-a">this page</a> (&#8220;<font size="-1">I rarely search for companions that understand my unearthly ways, but I do know  that you are out there, and that I alone do not possess the only key to the&#8221;)</font>, similar to the previous page. The only difference between the two pages is that on this page, the heading text is wrapped in its own P, and Google skips them because they&#8217;re too short.
<em>This tells me P is a separator just like DIV.</em></li>
<li><strong>Use H on every page</strong> to tell Google where content begins (as well as what the page is about). SEO 101 that fancy-lookin&#8217; sites sometimes ignore.</li>
<li><strong>Lots of TABLES suck.</strong> <a href="http://64.233.179.104/search?q=cache:1JsWUiWYCP0J:www.origin.fema.gov/+site:fema.gov&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=17&#038;client=firefox-a">Here’s an example</a>  (snippet: &#8220;<font size="-1">Skip navigations, DHS Seal, FEMA, Background image - Flag, Background image -  DHS seal, Background image - DHS seal. Disaster tab · Emergency tab&#8221;) </font>of a pretty densely structured page with lots of TABLE elements. Google can’t find a good snippet so it defaults to indexing ALT text.</li>
<li><strong>Use CSS to move content up in the source. </strong>Another obvious point, but some people believe Google has no problem finding text on a page. The answer is: it depends on your page layout. Check out <a href="http://64.233.179.104/search?q=cache:fGN4IY8ohNwJ:www.gismaps.fema.gov/+site:fema.gov&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=19&#038;client=firefox-a">this FEMA page</a> (snippet: &#8220;<font size="-1">Sorry but your browser does not support JavaScript. Please download the lastest  version of your browser. This JavaScript controls primary navigation &#8230;&#8221;)</font> where Google snippetizes NOSCRIPT text instead of the content. There’s only one H tag, and it’s followed by paragraphs of text. The content is inside a TABLE, though, and is buried deep at the bottom of the source. On <a href="http://216.239.51.104/search?q=cache:CJbGRkoSyeEJ:www.floridadisaster.org/about_the_division.htm+site:floridadisaster.org&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=6&#038;client=firefox-a">this page</a> (&#8220;<font size="-1">About The Division Of Emerergency Management. <strong>&#8230;</strong> State Emergency ResponseTeam.  Prepare and Stay Aware! FL Hazard Lookup, NATURAL, - Hurricane, - Lightning &#8230;&#8221;)</font>, IMG ALT and text in a SELECT listbox in the first TABLE get picked up; Google doesn’t even get to the content text. There’s a META description, but it’s not long enough for Google.</li>
<li><strong>Use lengthy META description tags</strong> (over 50 chars long). If Google finds that your META description is too short, it’ll scrape text off your page. This spells trouble if Google ends up snippetizing your nav links instead of content. Ideally, you want to design your page so that it doesn’t even require a META description tag.</li>
<li><strong>HTML Validate</strong> to get rid of nesting problems and other major errors. Replacing the XHTML declaration with HTML Transitional, for example, seems to whack out Google&#8217;s parser, and an ill-formed TABLE may also cause problems. Google indexes nav text <a href="http://216.239.51.104/search?q=cache:zV6BOtmlpq4J:www.fordham.edu/Alumni_Relations/+site:fordham.edu&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=17&#038;client=firefox-a">here</a> instead of content, probably because the content text is wrapped in TD and is missing a TABLE tag. IMG ALT text is <a href="http://64.233.179.104/search?q=cache:WULdC4z49UEJ:www.artchive.com/gallries.htm+site:artchive.com&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=20&#038;client=firefox-a">picked up here</a>; the first IMG ALT is ignored because the HTML is broken.</li>
<li><strong>Write full paragraphs.</strong> Google prefers full sentences and paragraphs to link text or short phrases. Google defaults to snippetizing the MENU links <a href="http://216.239.51.104/search?q=cache:wwARiIOSzzsJ:www.nhc.noaa.gov/1998MITCHadv_text.html+site:nhc.noaa.gov&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=56&#038;client=firefox-a">on this page</a>, because it doesn’t find any good chunk of text. On <a href="http://216.239.51.104/search?q=cache:oUncFu8AUYQJ:www.nhc.noaa.gov/aboutsshs_text.html+site:nhc.noaa.gov&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=45&#038;client=firefox-a">a similar page</a> with identical structure, Google correctly skips over the MENU links. <a href="http://64.233.187.104/search?q=cache:5UcrzbPvnUYJ:seo4fun.com/blog/contact-me/+site:seo4fun.com&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=17&#038;client=firefox-a">Here</a>, I used to have a short sentence and Google used to snippetize my blogroll, but after beefing up my text, Google indexed the right text.</li>
<li><strong>Wrap navigation links in a DIV, avoid any use of H tags in that section of code, and get rid of unlinked text.</strong> Not bulletproof, but worth a try.</li>
<li><strong>Don’t add ALT text for images if the image isn’t page specific.</strong> Use an empty ALT (ALT="") instead. This is especially important if your page is ill-formed (i.e. without an obvious content starting point), which increases the chance of meaningless ALT text for logos or whatever getting snippetized.</li>
</ul>
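<p>To audit the META description tip above across a pile of pages, something like this works. A minimal Python sketch using the stdlib html.parser; the 50-char threshold is my own rule of thumb from this post, not a number Google publishes:</p>

```python
from html.parser import HTMLParser

MIN_LEN = 50  # rough rule of thumb: shorter than this, Google may scrape page text instead

class MetaDescriptionChecker(HTMLParser):
    """Grabs the content of a <meta name="description"> tag, if any."""
    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "description":
            self.description = a.get("content", "")

def check_description(html):
    """Return (description, ok): ok means a description exists and is long enough."""
    p = MetaDescriptionChecker()
    p.feed(html)
    d = p.description
    return d, d is not None and len(d) >= MIN_LEN
```

Feed it the raw HTML of each page and flag anything where ok comes back False.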
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/06/24/seo-101-feeding-google-juicy-description-snippets.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Corrupt Titles in Google SERP</title>
		<link>http://seo4fun.com/blog/2006/06/20/corrupt-titles-in-google-serp.html</link>
		<comments>http://seo4fun.com/blog/2006/06/20/corrupt-titles-in-google-serp.html#comments</comments>
		<pubDate>Wed, 21 Jun 2006 00:04:51 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/06/20/corrupt-titles-in-google-serp.html</guid>
		<description><![CDATA[I&#8217;ve been keeping track of a WMW thread about Google displaying corrupt titles in their new SERP. Basically, Google is tacking on-page text snippets onto the end of the TITLE tag (or element, whatever). Since similar TITLE/description snippets throw a duplicate content flag, this little bug may end up causing major problems for some [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been keeping track of a <a href="http://www.webmasterworld.com/forum30/34822.htm">WMW thread</a> about Google displaying corrupt titles in their new SERP. Basically, Google is tacking on-page text snippets onto the end of the TITLE tag (or element, whatever). Since similar TITLE/description snippets throw a duplicate content flag, this little bug may end up causing major problems for some sites. In fact, at least one WMW poster claimed thousands of his pages with corrupt titles are now marked supplemental in Google&#8217;s index.</p>
<p>As of today, <a href="http://www.google.com/search?hl=en&#038;lr=&#038;safe=off&#038;q=site%3Abrooke-skye-videos.blogspot.com&#038;btnG=Search">the problem still hasn&#8217;t gone away</a>. Notice the &#8220;notify blogger about objectionable&#8230;&#8221; text at the end of my titles. Also notice (though this is somewhat off-topic) that due to that stupid blogger javascript nav at the top of the source code (and also probably due to me switching the XHTML declaration with HTML transitional), every description snippet for that blog is identical, throwing 99% of my blog into the supplemental index.</p>
<p>First question - is this intentional or is it a bug? Well, menu link text tagged on to the end of my titles doesn&#8217;t look all that pretty, and I really can&#8217;t think of any reason why Google would do this intentionally.</p>
<p>Second question - what&#8217;s causing it and how do I prevent Google from screwing up my listing? Well, on one of the pages, the cache date says May 30, 2006 03:24:19 GMT. But interestingly, another page listed correctly is showing practically the same cache date: May 30, 2006 03:24:18 GMT. What&#8217;s the difference between the two pages?</p>
<p>First off, in both cases, Google is completely ignoring my META description - assuming the cached page I see corresponds to the SERP snippet (which isn&#8217;t always the case). Both pages have HTML validation errors. The META description I used: &lt;meta name="description" content="dfdfsd" <strong>/&gt;</strong> on an HTML Transitional page. I dunno - both pages look pretty much the same to me.</p>
<p>All I can say at this point is if you see a page listed incorrectly, validate it. My guess is Google rewrote their parser used to build description snippets, and the new parser is parsing some pages incorrectly. Validating your pages should minimize the chance of that happening. &#8216;Course if Google is choking on a perfectly HTML valid page, then we&#8217;ll just have to sit it out.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/06/20/corrupt-titles-in-google-serp.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>How Do You Bait Without Looking Stupid?</title>
		<link>http://seo4fun.com/blog/2006/06/08/how-do-you-bait-without-looking-stupid.html</link>
		<comments>http://seo4fun.com/blog/2006/06/08/how-do-you-bait-without-looking-stupid.html#comments</comments>
		<pubDate>Thu, 08 Jun 2006 22:30:25 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/06/08/how-do-you-bait-without-looking-stupid.html</guid>
		<description><![CDATA[Dan Thies wrote a number last week titled Link Building: Are Bloggers a Bunch of Morons?, a title that got one of his foreign readers a little riled up. The intent of the title is plain as grey, but does it get the job done? Don&#8217;t cry wolf unless there&#8217;s really a wolf, and don&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>Dan Thies wrote a number last week titled <a href="http://www.seoresearchlabs.com/blog/2006/06/link-building-are-bloggers-bunch-of.html">Link Building: Are Bloggers a Bunch of Morons?</a>, a title that got one of his foreign readers a little riled up. The intent of the title is plain as grey, but does it get the job done? <strong>Don&#8217;t cry wolf unless there&#8217;s really a wolf, and don&#8217;t fret over a wrinkle on your forehead.</strong> <a href="http://www.google.com/search?q=shoemoney+plentyoffish&#038;hl=en&#038;lr=&#038;client=firefox-a&#038;rls=org.mozilla:en-US:official&#038;pws=0&#038;pwst=1&#038;start=20&#038;sa=N">In Shoemoney&#8217;s case</a> (a post that got at least 557 sites buzzing), the wolf was plentyoffish. At the end of the day, his post got more people questioning his own credibility than Markus Frind (one guy at Threadwatch asked: &#8220;Who is Shoemoney anyway?&#8221;, to which SEObook jokingly replied: &#8220;I don&#8217;t know Shoemoney. He is one of DaveN&#8217;s friends.&#8221;). In Dan&#8217;s case, the wolf is <em>natural links of pure hate</em>. What&#8217;s that?</p>
<blockquote><p>When a blogger cleverly links to Microsoft&#8217;s web site with words like &#8220;evil empire&#8221; in the text of the link, they think they&#8217;re sticking it to the man, but they&#8217;re actually passing link love to the man.</p></blockquote>
<p>Is that really a wolf? Or is it a rat in my girlfriend&#8217;s closet? If the point of Dan&#8217;s post is that linking to sites you hate is dumb - I&#8217;m not sure what else to take away from it - isn&#8217;t that like telling a 9-ball A player to chalk up after every shot?</p>
<p>When you shine a spotlight on yourself, deliver a good rendition of <a href="http://www.google.com/url?sa=t&#038;ct=res&#038;cd=1&#038;url=http%3A%2F%2Fwww.brave.com%2Fbo%2Flyrics%2Fstairhea.htm&#038;ei=THWIROLfGaLUqAKjzrC6DA&#038;sig2=IR2Vue4mkTbaiTjhaBTp3A">Stairway to Heaven</a>, not a &#8220;<a href="http://www.comedy-zone.net/jokes/laugh/insults/insult9.htm">yo momma was so ugly</a>, she looked out the window and got arrested for mooning.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/06/08/how-do-you-bait-without-looking-stupid.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Co-op Annotations for Dummies</title>
		<link>http://seo4fun.com/blog/2006/05/27/google-co-op-annotations-for-dummies.html</link>
		<comments>http://seo4fun.com/blog/2006/05/27/google-co-op-annotations-for-dummies.html#comments</comments>
		<pubDate>Sat, 27 May 2006 17:01:33 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/05/27/google-co-op-annotations-for-dummies.html</guid>
		<description><![CDATA[Just think of annotations like a del.icio.us or FURL save. Every annotation then consists of the URL you want to save, and tags. For example, if I wanted to save cnn.com, the data I&#8217;d be saving using del.icio.us would look like this:

URL: http://www.cnn.com/
tags: news, liberal-bullshit

All you have to do is translate that into Google Co-op [...]]]></description>
			<content:encoded><![CDATA[<p>Just think of annotations like a del.icio.us or FURL save. Every annotation then consists of the URL you want to save, and tags. For example, if I wanted to save cnn.com, the data I&#8217;d be saving using del.icio.us would look like this:</p>
<ul>
<li>URL: http://www.cnn.com/</li>
<li>tags: news, liberal-bullshit</li>
</ul>
<p>All you have to do is translate that into Google Co-op speak:</p>
<p>&lt;Annotation about="http://www.cnn.com/"&gt;<br />
&lt;Label name="news"&gt;&lt;/Label&gt;<br />
&lt;Label name="liberal_bullshit"&gt;&lt;/Label&gt;<br />
&lt;/Annotation&gt;</p>
<p>You also have the option to specify all urls under a specific directory, or all urls that match a number of keywords, instead of annotating a site page by page. You can also add additional information, like comments and score (only one url per domain will show up per query, so score helps to clarify which URL you prefer). And of course, for annotations to work, they need context. I&#8217;ll stop here though, since I need to grab some breakfast, plus I want to stay true to the title of this post.</p>
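<p>The translation from a del.icio.us-style save to Co-op speak is mechanical enough to script. A hypothetical Python helper - the hyphen-to-underscore substitution mirrors my example above, and Google&#8217;s actual label naming rules may be stricter:</p>

```python
from xml.sax.saxutils import quoteattr

def annotation_xml(url, labels):
    """Build a Google Co-op style <Annotation> block from a URL and a list of tags."""
    lines = [f"<Annotation about={quoteattr(url)}>"]
    for label in labels:
        # Co-op label names use underscores where a tag would have a hyphen
        lines.append(f"  <Label name={quoteattr(label.replace('-', '_'))}></Label>")
    lines.append("</Annotation>")
    return "\n".join(lines)
```

Calling annotation_xml("http://www.cnn.com/", ["news", "liberal-bullshit"]) reproduces the block above.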
<p>Google Co-op Annotation references: <a href="http://www.google.com/coop/docs/guide_topics.html">Google Co-op Topics Developers Guide</a></p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/05/27/google-co-op-annotations-for-dummies.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Sitewide Duplicate Content Checker Tool Useless?</title>
		<link>http://seo4fun.com/blog/2006/05/27/sitewide-duplicate-content-checker-tool-useless.html</link>
		<comments>http://seo4fun.com/blog/2006/05/27/sitewide-duplicate-content-checker-tool-useless.html#comments</comments>
		<pubDate>Sat, 27 May 2006 15:38:50 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Duplicate Content]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/05/27/sitewide-duplicate-content-checker-tool-useless.html</guid>
		<description><![CDATA[SEO Junkie just released a client-side application you can use to check for duplicate content on your site. I found a link to his app through Search Engine Watch yesterday while I was playing around with Google Co-op. The application is meant for small sites and he warns that there&#8217;s a lot of functionality missing.  [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.seojunkie.com/2006/05/24/site-wide-duplicate-content-analyzer/">SEO Junkie just released a client-side application</a> you can use to check for duplicate content on your site. I found a link to his app through Search Engine Watch yesterday while I was playing around with Google Co-op. The application is meant for small sites, and he warns that there&#8217;s a lot of functionality missing. When he said &#8220;small sites&#8221; I hoped he meant less than 10,000 pages, but in the end I walked away scratching my head.</p>
<p>First, even when checking just two urls, it&#8217;s incredibly slow compared to something like the <a href="http://www.webconfs.com/similar-page-checker.php">Similar Pages Checker</a>. Second, it indiscriminately crawls every URL it finds: affiliate redirect links, links to images, videos, etc. It took me a few minutes to figure out why I was suddenly getting hit with a barrage of pop-ups. Third, it crapped out in the middle of crawling my site with an error message: Runtime Error: Method &#8216;~&#8217; of object &#8216;~&#8217; failed. I went over to his blog and saw similar error messages being reported. It could be due to hitting a memory cap or something, but why not just elegantly stop crawling links when you get above a certain threshold?</p>
<p>My biggest complaint though, is the fact that you can&#8217;t use this to check large sites. Those are exactly the kind of sites I&#8217;d want to check for page similarity. I mean, why would I want to run my 30 page blog through something like this?</p>
<p>Suggestions:</p>
<ul>
<li>Reference robots.txt and/or check robots tag and ignore disallowed/noindex pages.</li>
<li>Do not follow redirect urls that lead outside of a given domain.</li>
<li>Automatically limit number of pages crawled to prevent the program from crashing.</li>
<li>Ignore image, audio, and video files (or does it already ignore them?)</li>
<li>Give users an option to compare urls in a subdirectory instead of comparing each page to every other page in a domain.</li>
</ul>
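<p>For what it&#8217;s worth, the core of such a tool isn&#8217;t complicated. Here&#8217;s a minimal Python sketch of page similarity using word shingles and Jaccard overlap - a rough stand-in for whatever the Similar Pages Checker does internally, not its actual algorithm:</p>

```python
def shingles(text, k=4):
    """Set of k-word shingles (overlapping word windows) from a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def similarity(a, b, k=4):
    """Jaccard similarity of two pages' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Strip the HTML first, compare every pair (or just pairs within a subdirectory, per my suggestion above), and flag anything scoring near 1.0.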
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/05/27/sitewide-duplicate-content-checker-tool-useless.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Sitemap Reporting 404s under Summary</title>
		<link>http://seo4fun.com/blog/2006/05/27/google-sitemap-reporting-404s-under-summary.html</link>
		<comments>http://seo4fun.com/blog/2006/05/27/google-sitemap-reporting-404s-under-summary.html#comments</comments>
		<pubDate>Sat, 27 May 2006 12:06:06 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[Google Sitemaps]]></category>

		<category><![CDATA[Googlebot]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Supplemental Bot]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/05/27/google-sitemap-reporting-404s-under-summary.html</guid>
		<description><![CDATA[Note: This post is a follow up to my earlier post, Googlebot Refreshing Supplementals.
I noticed this morning that Google Sitemap is reporting 404s on the Summary page. The pages Google Sitemap reports missing include some of the pages I&#8217;ve been getting emails for since May 20th.

Not all the urls showed up under HTTP errors, just [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Note:</strong> This post is a follow up to my earlier post, <a href="/blog/2006/05/25/googlebot-refreshing-supplementals.html">Googlebot Refreshing Supplementals</a>.</p>
<p>I noticed this morning that Google Sitemap is reporting 404s on the <strong>Summary page</strong>. The pages Google Sitemap reports missing include some of the pages I&#8217;ve been getting emails for since May 20th.<br />
<img title="Google Sitemap 404 Summary" alt="Google Sitemap 404 Summary" src="/images/google-sitemap-404.jpg" /></p>
<p>Not all the urls showed up under <strong>HTTP errors</strong>, just around 11 of them.</p>
<p>As I said earlier, the only interesting thing about these 404 pages is that they no longer have any incoming links, and they exist only in Google&#8217;s supplemental index. So, I&#8217;m kinda hoping Google is in the process of refreshing their supplemental index by checking urls in their database, to see if they&#8217;ve changed, return a 404/410, or 301 redirect to some other url.</p>
<p>In fact, <a href="http://www.mattcutts.com/blog/indexing-timeline">Matt Cutts has previously posted on his blog</a> (520 freaking comments, longer than any forum thread I&#8217;ve ever seen, never mind a blog post) that a supplemental refresh has been going on since before April: &#8220;In early April, we started showing some refreshed supplemental results to users.&#8221; He also said the index/crawl team&#8217;s turning their focus to refreshing supplementals: &#8220;Well, now that Bigdaddy is done, we’ve turned our focus to <strong>refreshing our supplemental results.</strong>&#8221; Another quote (this one I remember reading the day he posted it), which suggests we have a glimmer of hope of having all of our supplementals refreshed by September: &#8220;I believe that folks here intend to refresh all of the supplemental results over the summer months, although I’m not 100% sure.&#8221;</p>
<p>Well&#8230; this is what I&#8217;d like to know: will a lack of high quality/relevant incoming links or crappy outgoing links put a site at the back of the bus during the big summer supplemental refresh, or what?</p>
<p><strong>P.S.</strong> Anyway, assuming Google will refresh your supplemental pages by the end of the summer, if you or your client is having problems with supplemental pages, this is as good a time as any to make sure you don&#8217;t end up with a bunch of new supplemental pages and have to wait another year to have them refreshed. Beef up those product pages, make sure your title/description metas are unique across your entire site, tighten up your dynamic url handling, etc.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/05/27/google-sitemap-reporting-404s-under-summary.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Turning Blog RSS Feed into Google Co-Op Subscribed Links</title>
		<link>http://seo4fun.com/blog/2006/05/26/turning-blog-rss-feed-into-google-co-op-subscribed-links.html</link>
		<comments>http://seo4fun.com/blog/2006/05/26/turning-blog-rss-feed-into-google-co-op-subscribed-links.html#comments</comments>
		<pubDate>Fri, 26 May 2006 13:54:45 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/05/26/turning-blog-rss-feed-into-google-co-op-subscribed-links.html</guid>
		<description><![CDATA[If you&#8217;re looking for a way to submit your blog to Google co-op, I just wrote a script that you can use. It makes your blog posts appear on top of Google SERPs, using categories under which you submitted your post. For example, if a post was submitted under &#8220;SEO&#8221; or &#8220;Google coop&#8221;, it&#8217;ll show [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re looking for a way to submit your blog to Google co-op, I just wrote a script that you can use. It makes your blog posts appear on top of Google SERPs, using the categories under which you submitted your post. For example, if a post was submitted under &#8220;SEO&#8221; or &#8220;Google coop&#8221;, it&#8217;ll show up for those keyword searches. Assuming Google recrawls your submitted link occasionally (it seems to recrawl every few hours for me), the results will refresh automatically as you publish more posts.</p>
<p>Just a warning: this is a BETA script I wrote an hour after I woke up without coffee. I&#8217;m also not convinced submitting your blog posts to Google co-op is really productive, but no harm done until somebody loses an eye.</p>
<p>Anyway, back to my script. All you have to do is submit a link to Google Co-op using this format:</p>
<p><b>http://seo4fun.com/php/rss-parser.php?url=YOURRSSFEEDURL</b></p>
<p>The URL you submit must be RSS 2.0. The source is written in PHP 5, in case you&#8217;re wondering. There&#8217;s a ton of features missing (e.g. I&#8217;d want it to scan for technorati tags embedded in posts, though that requires full feeds) and it ignores some guidelines (80 chars per text1 line, no http:// in url, etc). Let me know if you&#8217;re interested in looking at the source code. <a href="http://www.google.com/coop/manage/subscribedlinks">Submit your subscribed links here</a></p>
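<p>My script is PHP 5, but the basic idea - pull each item&#8217;s categories out of an RSS 2.0 feed and use them as match keywords - looks roughly like this Python sketch. Illustrative only: this is not the actual rss-parser.php, and like mine it ignores some of the Co-op output guidelines:</p>

```python
import xml.etree.ElementTree as ET

def feed_to_results(rss_text):
    """Map each RSS 2.0 item's categories to its title/link - roughly the
    data a Google Co-op subscribed-links result would be built from."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", "")
        link = item.findtext("link", "")
        cats = [c.text.strip().lower() for c in item.findall("category") if c.text]
        if cats:  # items without categories have no keywords to match on
            results.append({"title": title, "url": link, "keywords": cats})
    return results
```

This is also why feeds like SEOMoz&#8217;s fail for me: no &lt;category&gt; elements, nothing to key the results on.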
<p><b>Updates:</b></p>
<ul>
<li>The script might timeout if your feed is huge, but I doubt it.</li>
<li>Posts with tabs/carriage returns (\t\n) in description fields weren&#8217;t getting parsed [fixed].</li>
<li>Some feeds, like seomoz.org&#8217;s, which lack categories, won&#8217;t work.</li>
<li>If your blog doesn&#8217;t run on WordPress, chances are my script will choke on your feed.</li>
<li>If your post urls use a &#8216;?&#8217;, it won&#8217;t parse [Fixed].</li>
<li>Barfed on titles with commas [fixed].</li>
</ul>
<p> <b>Blogs I&#8217;ve tested:</b></p>
<ul>
<li>First blog to go under the knife: <a href="/php/rss-parser.php?url=http://seo4fun.com/blog/feed/" rel="nofollow">My SEO notebook</a>.</li>
<li>SEO By the Sea (bulky category names but it seems to work): <a href="/php/rss-parser.php?url=http://www.seobythesea.com/?feed=rss2" rel="nofollow">Google Co-op XML</a>. Searches: (design, search engines and directories, usability, search engine optimization, culture, internet advertising)</li>
<li>SEOMoz (doesn&#8217;t work, found no categories).</li>
<li><a href="/php/rss-parser.php?url=http://www.mattcutts.com/blog/feed/" rel="nofollow">Matt Cutt&#8217;s Blog</a> works, but his categories are wacked (i.e. &#8220;weblog/blog&#8221;). searches: (&#8221;weblog blog&#8221;,&#8221;google seo&#8221;, &#8220;personal&#8221;, &#8220;productivity&#8221;, &#8220;web net&#8221;,&#8221;how to&#8221;&#8230;)</li>
<li><a href="/php/rss-parser.php?url=http://www.seobuzzbox.com/feed/" rel="nofollow">SEOBuzzbox</a> uses &lt;dc:subject&gt; instead of &lt;category&gt; [Fixed] Searches: (marketing news, dmoz, blackhat, matt cutts, v7ndotcom elursrebmem, internet marketing, google news&#8230;)</li>
<li><a href="/php/rss-parser.php?url=http://www.jimboykin.com/feed/" rel="nofollow">Jim Boykin&#8217;s Blog</a> seems to load without a hitch. Searches: (places, jim&#8217;s crazy ideas, yahoo, &#8220;CEO, not SEO stuff&#8221;, jim boykin, google, internet marketing tips, seo research, link building&#8230;)</li>
<li><a href="/php/rss-parser.php?url=http://damnjezebel.com/diary/?feed=rss2" rel="nofollow">Damn Jezebel&#8217;s Diary</a> searches: (conversations, randomness, bitching, what the fuck?)</li>
</ul>
<p><a href="http://www.google.com/coop/profile?user=017275478576632624117" rel="nofollow">Subscribe to my profile</a> instead of submitting those feeds individually, and run a few Google searches to see what happens.</p>
<p> <b> Notes</b></p>
<ul>
<li>I see every damn RSS has its nuances, so it&#8217;ll take a while before this script can handle 50% of the blogs out there.</li>
<li>What&#8217;s the point of subscribing to multiple blog feeds with Google Co-op? (I luv my Google Reader). SER type subscribed link display is cool, but individual blogs? Plus you need to remember categories to pull up results. Not practical.</li>
<li>Only one match per account. Say I&#8217;m subscribed to 10 links, and 5 of them match a search for &#8220;yahoo.&#8221; Only one will show in the Onebox. There&#8217;s also no ResultSpec timestamp element, so there&#8217;s no way to make the most recent item show. Also, subscriptions compete for the same keywords (e.g. for Google, digg, sew, and ser compete for the same onebox space), something that would steer me away from running a generic search like &#8220;seo.&#8221;</li>
<li>Some blogs use long tail categories or throw several words under one category (i.e. weblog/blog). I prefer using data objects to handle synonyms.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/05/26/turning-blog-rss-feed-into-google-co-op-subscribed-links.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Googlebot Refreshing Supplementals?</title>
		<link>http://seo4fun.com/blog/2006/05/25/googlebot-refreshing-supplementals.html</link>
		<comments>http://seo4fun.com/blog/2006/05/25/googlebot-refreshing-supplementals.html#comments</comments>
		<pubDate>Thu, 25 May 2006 10:27:18 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[Googlebot]]></category>

		<category><![CDATA[Supplemental Bot]]></category>

		<category><![CDATA[Supplementals]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/05/25/googlebot-refreshing-supplementals.html</guid>
		<description><![CDATA[Weird 404 error email I sent myself yesterday (url hidden to prevent linking to an adult domain):
HTTP_REFERER: [blank] HTTP_HOST: www.domain.com PHP_SELF: /fgdfgfert4534.html REQUEST_URI /NONEXISTENTURL.html REMOTE_ADDR: 66.249.65.69 TIMESTAMP: 5/24/2006 9:15 PM
Quick explanation: I rigged my dynamic pages, so a request to retrieve &#8220;maroon-widget.html&#8221; 404s and triggers an email if I don&#8217;t have &#8220;maroon widget&#8221; in my [...]]]></description>
			<content:encoded><![CDATA[<p>Weird 404 error email I sent myself yesterday (url hidden to prevent linking to an adult domain):</p>
<blockquote><p>HTTP_REFERER: [blank] HTTP_HOST: www.domain.com PHP_SELF: /fgdfgfert4534.html REQUEST_URI /NONEXISTENTURL.html REMOTE_ADDR: 66.249.65.69 TIMESTAMP: 5/24/2006 9:15 PM</p></blockquote>
<p><strong>Quick explanation:</strong> I rigged my dynamic pages, so a request to retrieve &#8220;maroon-widget.html&#8221; 404s and triggers an email if I don&#8217;t have &#8220;maroon widget&#8221; in my database.</p>
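<p>The rig boils down to a slug-to-keyword lookup with a notification on a miss. A minimal sketch in Python for illustration (the live site is PHP); <code>KNOWN_KEYWORDS</code> stands in for the database lookup and <code>notify</code> for the mail call, both hypothetical names:</p>

```python
# Minimal sketch of the 404-plus-email rig described above. KNOWN_KEYWORDS
# stands in for the database lookup; notify stands in for the mail call.
KNOWN_KEYWORDS = {"maroon widget", "blue widget"}

def handle_request(slug, remote_addr, notify):
    """Map 'maroon-widget.html' back to 'maroon widget'; 404 + email on a miss."""
    keyword = slug.rsplit(".", 1)[0].replace("-", " ")
    if keyword in KNOWN_KEYWORDS:
        return 200  # keyword exists in the database; serve the page
    notify("404: /%s requested by %s" % (slug, remote_addr))
    return 404

sent = []
status = handle_request("NONEXISTENTURL.html", "66.249.65.69", sent.append)
```

<p>A URL like the one Googlebot requested above would land in the miss branch and fire the email.</p>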
<p>REQUEST_URI is linked from nowhere; it exists solely in the supplemental index. I&#8217;ve seen Yahoo do this kinda thing, but this week I&#8217;m starting to see Google do the same thing. I guess Google&#8217;s basically crawling my site using its own database instead of following links. Is this a common behavior/part of a normal crawl, or is Google trying to clean up supplementals?</p>
<p>Looking up 66.249.65.69 in Google returns 208,000 gibberish results (mostly pages that display your IP). So I guess it&#8217;s just a regular bot, not the supplemental bot?</p>
<p>Update: After cleaning out my inbox, I found similar emails going back to May 20.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/05/25/googlebot-refreshing-supplementals.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Coop Blog RSS to XML Generator</title>
		<link>http://seo4fun.com/blog/2006/05/24/google-coop-blog-rss-to-xml-generator.html</link>
		<comments>http://seo4fun.com/blog/2006/05/24/google-coop-blog-rss-to-xml-generator.html#comments</comments>
		<pubDate>Wed, 24 May 2006 22:20:46 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/05/24/google-coop-blog-rss-to-xml-generator.html</guid>
		<description><![CDATA[UPDATE: Read this follow-up post to learn how to feed your Blog RSS to Google Co-op.
Scanning through Google Co-op Group, I came across a Google Co-op XML generator by 1000apps. Unfortunately, it uses words from the title instead of blog categories to generate the XML. So I&#8217;m going to have to rewrite it. This is [...]]]></description>
			<content:encoded><![CDATA[<p><strong>UPDATE:</strong> Read <a href="/blog/2006/05/26/turning-blog-rss-feed-into-google-co-op-subscribed-links.html">this follow-up post</a> to learn how to feed your Blog RSS to Google Co-op.</p>
<p>Scanning through <a href="http://groups.google.com/group/google-co-op">Google Co-op Group</a>, I came across a <a href="http://www.seo-search.net/rsstocoop/">Google Co-op XML generator by 1000apps</a>. Unfortunately, it uses words from the title instead of blog categories to generate the XML. So I&#8217;m going to have to rewrite it. This is the outline of my code (noting for myself, so I don&#8217;t have to think about it later):</p>
<ol>
<li>Parse Blog RSS using regexp; create a class for individual pages, store title, url, and categories per page. Store those classes in an array.</li>
<li>Foreach item in the array, loop through categories. For each category, generate a Google co-op XML entry.</li>
<li>Write to an output file, using the blog title as part of the URL to avoid duplicate URLs.</li>
<li>Done.</li>
</ol>
<p>Google XML entry structure:</p>
<p>&lt;ResultSpec id=&quot;CATCOUNT - TITLE&quot;&gt;<br />
&lt;Query&gt;CATEGORYNAME&lt;/Query&gt;<br />
&lt;Response&gt;<br />
&lt;Output name=&quot;title&quot;&gt;POSTORBLOGTITLE&lt;/Output&gt;<br />
&lt;Output name=&quot;more_url&quot;&gt;URL&lt;/Output&gt;<br />
&lt;Output name=&quot;text1&quot;&gt;DESCRIPTIONSNIPPET&lt;/Output&gt;<br />
&lt;/Response&gt;<br />
&lt;/ResultSpec&gt;</p>
<p>Oh yeah, bracket the page with &lt;Results&gt;, &lt;/Results&gt;<br />
Note to self: Make sure I block robots from the dir where I dump the XML files.</p>
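<p>The outline and entry structure above can be sketched roughly like this, in Python for illustration (the original script is PHP). It assumes plain, non-CDATA &lt;title&gt;/&lt;link&gt;/&lt;category&gt; elements; real feeds would need more tolerant regexps:</p>

```python
import re
from html import escape

# Rough sketch of the outline above: regexp-parse a blog RSS feed, store
# title/url/categories per item, then emit one ResultSpec per category.
ITEM_RE = re.compile(r"<item>(.*?)</item>", re.S)

def parse_items(rss):
    """Step 1: regexp-parse the RSS; store title, url, categories per item."""
    items = []
    for chunk in ITEM_RE.findall(rss):
        items.append({
            "title": re.search(r"<title>(.*?)</title>", chunk, re.S).group(1),
            "url": re.search(r"<link>(.*?)</link>", chunk, re.S).group(1),
            "categories": re.findall(r"<category>(.*?)</category>", chunk, re.S),
        })
    return items

def to_coop_xml(items, blog_title):
    """Steps 2-3: one ResultSpec per category, numbered to avoid dup ids."""
    specs = []
    for n, (item, cat) in enumerate(
            ((i, c) for i in items for c in i["categories"]), start=1):
        specs.append(
            '<ResultSpec id="%d - %s"><Query>%s</Query><Response>'
            '<Output name="title">%s</Output>'
            '<Output name="more_url">%s</Output>'
            '</Response></ResultSpec>'
            % (n, escape(blog_title), escape(cat),
               escape(item["title"]), escape(item["url"])))
    # Step 4: bracket the whole page with <Results>...</Results>.
    return "<Results>%s</Results>" % "".join(specs)
```

<p>The caller would write the returned string to the output file, one file per blog.</p>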
<p>It is kind of cool to show up at the top of Google search, even though only I can see it. If I do get around to rewriting that script, I&#8217;ll link to it from this blog.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/05/24/google-coop-blog-rss-to-xml-generator.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>My Next Obsession - Getting a Site Back in the Index</title>
		<link>http://seo4fun.com/blog/2006/05/23/my-next-obsession-getting-a-site-back-in-the-index.html</link>
		<comments>http://seo4fun.com/blog/2006/05/23/my-next-obsession-getting-a-site-back-in-the-index.html#comments</comments>
		<pubDate>Tue, 23 May 2006 13:24:22 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Adult Webmastering]]></category>

		<category><![CDATA[Google]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Supplementals]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/05/23/my-next-obsession-getting-a-site-back-in-the-index.html</guid>
		<description><![CDATA[For the last two months, I&#8217;ve been maintaining a holding pattern with one of my adult sites, to no avail. The general consensus was that something was up at Google and I shouldn&#8217;t do anything drastic. But now I&#8217;m going to start working on my site again. Till I see some progress, how to get [...]]]></description>
			<content:encoded><![CDATA[<p>For the last two months, I&#8217;ve been maintaining a holding pattern with one of my adult sites, to no avail. The general consensus was that something was up at Google and I shouldn&#8217;t do anything drastic. But now I&#8217;m going to start working on my site again. Until I see some progress, getting a screwed-up site reindexed in Google will be the primary focus of this blog.</p>
<p>List of Todos:</p>
<ul>
<li>Write more articles.</li>
<li>Take advantage of technorati and other sidekicks (though the site isn&#8217;t a blog).</li>
<li>Install a blog if necessary.</li>
<li>Optimize for MSN.</li>
<li>Streamline internal linking structure.</li>
<li>Beef up product pages.</li>
<li>Keep track of cache dates.</li>
<li>Periodically refresh content (the site&#8217;s dynamically generated, but uses static elements).</li>
<li>Linkbait by writing aggressively and getting noticed by people I can piss off.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/05/23/my-next-obsession-getting-a-site-back-in-the-index.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>What I&#8217;ve been Up To</title>
		<link>http://seo4fun.com/blog/2006/05/10/what-ive-been-up-to.html</link>
		<comments>http://seo4fun.com/blog/2006/05/10/what-ive-been-up-to.html#comments</comments>
		<pubDate>Wed, 10 May 2006 12:52:44 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/05/10/what-ive-been-up-to.html</guid>
		<description><![CDATA[I surfed over to Dan Thies&#8217; blog yesterday and noticed he&#8217;s been slacking off on his blog, like I have with this one. Matt&#8217;s also been absent for a couple of days, though now he seems to be back blogging in full force. I&#8217;ve started a couple of new blogs this week, one blog for [...]]]></description>
			<content:encoded><![CDATA[<p>I surfed over to Dan Thies&#8217; blog yesterday and noticed he&#8217;s been slacking off on his blog, like I have with this one. Matt&#8217;s also been absent for a couple of days, though now he seems to be back blogging in full force. I&#8217;ve started a couple of new blogs this week: one for generating sales with adult paysites, and another &#224; la ProBlogger, except my target audience is adult webmasters. Since Mike G wrote his anti-textbook-SEO article, I&#8217;ve gotten curious about how bloggers generate traffic, and I&#8217;m now spending some time playing around with sidekicks, Technorati tags and chicklets. So I write blogging notes on Adult Blogger while I try out my ideas on my other blog. I intentionally kept Google, Yahoo, and MSN away from my new blog just to see how much traffic I can generate without SEs. I won&#8217;t be linking to those blogs from here. I&#8217;m also thinking of building a time management blog just to help myself manage my days better.</p>
<p>Anyway, I&#8217;ve been pretty busy. I might finish up some of the drafts on this blog too if I get around to it, but with Google dropping pages every day (this domain went from 58 pages indexed back down to 3), it&#8217;s hard to say anything. Most of my sites rank well if I can get pages indexed. But since Matt said BD&#8217;s crawling priorities are different from the past, it&#8217;s all up in the air. I&#8217;ve actually gotten an email back from Google&#8217;s Search Quality Team telling me &#8220;Your site has not been manually penalized,&#8221; which is cool, but Google needs to start indexing some of my pages.</p>
<p>The biggest bomb of the week is Shoemoney&#8217;s post about plentyoffish (noooo I&#8217;m not gonna link to that). I&#8217;m actually surprised SEOMoz hasn&#8217;t blogged about it yet. It&#8217;s taken a nasty turn, and nothing against Shoe, but personally I believe Mark makes plenty of money off his site. All you ever need to make money with a site is traffic and clicks&#8230;and with AdSense and an 890ish Alexa rating, you can&#8217;t lose. With my sites, I have to not only generate clickthrus, but each click has to generate a sale for me to make money. So he really has it easy.</p>
<p>Anyway, that&#8217;s all I got.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/05/10/what-ive-been-up-to.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Feedster Feeds off del.icio.us</title>
		<link>http://seo4fun.com/blog/2006/05/03/feedster-feeds-off-delicious.html</link>
		<comments>http://seo4fun.com/blog/2006/05/03/feedster-feeds-off-delicious.html#comments</comments>
		<pubDate>Wed, 03 May 2006 20:31:11 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/05/03/feedster-feeds-off-delicious.html</guid>
		<description><![CDATA[You&#8217;re probably wondering what&#8217;s so great about feedster traffic, right? :)
Quick side note: I&#8217;ve been struggling to decide whether to break up this blog into several blogs, so that, say, people interested in reading about Google don&#8217;t get hit with RSS feeds about MySQL. But since I don&#8217;t run this blog for money, I&#8217;m going [...]]]></description>
			<content:encoded><![CDATA[<p>You&#8217;re probably wondering what&#8217;s so great about feedster traffic, right? :)</p>
<p><em>Quick side note:</em> I&#8217;ve been struggling to decide whether to break up this blog into several blogs, so that, say, people interested in reading about Google don&#8217;t get hit with RSS items about MySQL. But since I don&#8217;t run this blog for money, I&#8217;m just going to blog away about whatever interests me related to SEO. What I will do is use the most descriptive titles I can, so you can save yourself some time by scanning my post titles to decide what not to read.</p>
<p>Also, since this blog is mainly for my own benefit, expect me to go back to earlier posts and add new information instead of writing short new posts. That&#8217;s not good SEO (the more pages, the better), and not good blogging either, I know, but right now I have about a hundred unpublished posts on hold because I&#8217;m worried about publishing newbie/boring/old ideas that would be a waste of your time to read.</p>
<p>As part of an experiment to feed videos to Yahoo, I added del.icio.us tags to the video pages and socially bookmarked them using <strong>a batch of the most popular tags related to the content of each page</strong>. I didn&#8217;t get a whole lot of clicks from del.icio.us (20 hits after 12 hours to one page), and after a few hours/days I expect that to trickle off, but for an unindexed page that&#8217;s not too bad. I also noticed my page shows up on Feedster, though the listings lacked descriptions <strong>(This post does not have a description)</strong>, probably because I omitted descriptions from my del.icio.us entries.</p>
<p>My notes on del.icio.us, feedster, technorati and&#8230;Furl (so far):</p>
<ul>
<li><strong>Use as many popular tags related to a post as possible</strong> to maximize initial burst of traffic.</li>
<li><strong>Add bookmarking tags/RSS links</strong> on a page to generate repeat visits. (These weren&#8217;t blog pages, so I added them manually).</li>
<li><strong>Write descriptions for del.icio.us entries</strong> so they show up in feedster listings.</li>
<li><strong>Publish bookmarkable/linkbait pages</strong>.</li>
<li><strong>Bookmark with Furl using popular technorati tags as categories.</strong> Technorati Links feed off Furl, using Furl categories. Traffic is low unless you know your high-traffic tags, so ROI might not be all that. /tag/ pages have robots=nofollow on them too, so those pages get indexed but you get no link juice, which is no big deal since the link won&#8217;t stay on there forever anyway.</li>
<li><strong>Repeat a keyword a few times in a post</strong> for longer Technorati description in their search results, which equals bigger adspace.</li>
<li><strong>Tag your post before you ping.</strong> (Yep, I&#8217;m doing them by hand, lol. I&#8217;ll have to try out that Wordpress plugin in a bit.)</li>
<li><strong>Use relevant, but different set of tags per post</strong></li>
<li>For 40+ posts/day tags, hammer them in every post; with lower posts/day tags I can spread them around a bit.</li>
</ul>
<h3>Technorati Questions</h3>
<p>How do I know which technorati tag gets decent traffic (besides trial and error)? Does it correlate with Wordtracker/Overture?</p>
<p><strong>Google Hits:</strong> &#8220;how to get indexed by feedster&#8221; Page One, 5/26/2006. Talk about a long tail, no traffic listing.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/05/03/feedster-feeds-off-delicious.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Yahoo Video RSS</title>
		<link>http://seo4fun.com/blog/2006/05/03/yahoo-video-rss.html</link>
		<comments>http://seo4fun.com/blog/2006/05/03/yahoo-video-rss.html#comments</comments>
		<pubDate>Wed, 03 May 2006 07:54:58 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/05/03/yahoo-video-rss.html</guid>
		<description><![CDATA[WARNING: Unpolished note to self. I&#8217;ll modify it as I know more about video RSS.
I&#8217;m having some problems getting new videos indexed in Yahoo Video. I&#8217;m reusing an XML feed that worked in the past, but Yahoo is just not biting.  Here&#8217;s a list of what worked and what didn&#8217;t in the [...]]]></description>
			<content:encoded><![CDATA[<p><strong>WARNING:</strong> Unpolished note to self. I&#8217;ll modify it as I know more about video RSS.</p>
<p>I&#8217;m having some problems getting new videos indexed in Yahoo Video. I&#8217;m reusing an XML feed that worked in the past, but Yahoo is just not biting. Here&#8217;s a list of what worked and what didn&#8217;t in the past, so I can come up with some ideas. Getting indexed in Yahoo Video would be much easier if I hosted videos on the same domain, but that&#8217;s out of the question at the moment.</p>
<p>BTW, I also noticed that my usual RSS feed validator doesn&#8217;t detect Media RSS errors. Use the link at the bottom of this post for validation.</p>
<p>What seemed to work before:</p>
<ul>
<li>Embedding videos in HTML, and 302 redirecting.</li>
</ul>
<p>What didn&#8217;t work:</p>
<ul>
<li>Linking to video clips with url /xyz.wmv, then redirecting in .htaccess.</li>
</ul>
<p>With my main site, I&#8217;m linking straight to video clips, so that might be the problem.</p>
<p>One validation error I fixed was this:</p>
<p>&lt;guid isPermaLink=&quot;false&quot;&gt;video #x from example.com&lt;/guid&gt;</p>
<p>If isPermaLink=&quot;true&quot;, then a valid URL should be used instead of text.</p>
<p>Then again, my older RSS feeds don&#8217;t use guid but they worked.</p>
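<p>For the record, the two valid guid forms side by side (example.com stands in for the hidden domain; note that per the RSS 2.0 spec, isPermaLink defaults to true when omitted):</p>

```xml
<!-- isPermaLink="false": any unique string works as the guid -->
<guid isPermaLink="false">video #x from example.com</guid>

<!-- isPermaLink="true" (the default): the guid must itself be a URL -->
<guid isPermaLink="true">http://example.com/videos/x.html</guid>
```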
<p>How would I go about testing some stuff with Yahoo video?</p>
<table cellspacing="5" cellpadding="3" border="1">
<tr>
<th>Type of Link</th>
<th>Same Domain</th>
<th>mod_rewrite</th>
<th>Indexed?</th>
</tr>
<tr>
<td>href</td>
<td>Yes</td>
<td>No</td>
<td>-</td>
</tr>
<tr>
<td>href</td>
<td>No</td>
<td>No</td>
<td>-</td>
</tr>
<tr>
<td>href</td>
<td>No</td>
<td>Yes</td>
<td>Not Yet</td>
</tr>
<tr>
<td>embed</td>
<td>Yes</td>
<td>No</td>
<td>-</td>
</tr>
<tr>
<td>embed</td>
<td>No</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</table>
<p>Another possibility: the RSS feed sitting on domainx.com while the HTML sits on domainy.com.</p>
<h3>Systematic testing</h3>
<p>Upload 10+ video clips. Write a few different versions of Yahoo RSS feeds. Submit them all to Yahoo. Submitting one feed and waiting for results eats up too much time.</p>
<h2>How do I improve my placement in Yahoo Video Search?</h2>
<p>Right now, competition seems to be thin so placement isn&#8217;t a huge concern, but I&#8217;d still like to know the answer to this one.</p>
<p>Other Yahoo video related questions:</p>
<ul>
<li>If I give Yahoo a specific thumbnail, will Yahoo use that instead of scanning the video? (I can test this by giving them a thumbnail showing the middle of the video instead of the beginning)</li>
<li>Would it save me bandwidth if I specified video dimension, bitrate, filesize?  If I fed Yahoo an erroneous spec, will Yahoo believe me?  If not, I can assume no matter what I specify, Yahoo scans the videos anyway? (Yahoo will scan the video if the media:content url is given.  If only playerURL is mentioned, Yahoo won&#8217;t touch the video).</li>
<li>Can I submit my RSS to places like Technorati?</li>
<li>How do I &#8220;ping&#8221; Yahoo to re-read my RSS file?  Do I just resubmit the RSS url?  How quickly will Yahoo index a new video in an updated RSS?</li>
</ul>
<p>Also, it would be cool if I wrote a script for generating Video Search RSS.  Writing RSS by hand is a bit of a chore.</p>
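<p>A rough sketch of what that generator could look like, in Python for illustration. Element names follow the Media RSS spec (media:content, media:player, media:thumbnail); the clip URLs and function names here are made up, not a real submission:</p>

```python
from html import escape

# Hypothetical Media RSS generator sketch. media_item builds one <item>;
# media_feed wraps items in an <rss> envelope with the mrss namespace.
def media_item(title, page_url, video_url=None, thumb_url=None):
    parts = ["<item>",
             "<title>%s</title>" % escape(title),
             "<link>%s</link>" % escape(page_url)]
    if video_url:
        # Direct url: Yahoo can fetch the clip and derive metadata itself.
        parts.append('<media:content url="%s" />' % escape(video_url))
    # playerUrl alone: Yahoo never touches the media object.
    parts.append('<media:player url="%s" />' % escape(page_url))
    if thumb_url:
        parts.append('<media:thumbnail url="%s" />' % escape(thumb_url))
    parts.append("</item>")
    return "".join(parts)

def media_feed(items):
    return ('<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">'
            '<channel>%s</channel></rss>' % "".join(items))
```

<p>Writing ten test feeds for the systematic testing above then becomes a loop instead of a hand-editing chore.</p>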
<h3>Interesting quote regarding PlayerURL Field</h3>
<p>Hopefully, that makes the reasoning a bit more clear. One of the<br />
problems we want to solve is in the cases that a RSS feed can only<br />
publish the playerUrl (for whatever reason).</p>
<p>Our playerUrl is just a normal web page<br />
that shows the media content in the general sense of syndicated web<br />
content.</p>
<p>How do we efficiently<br />
allow the feed to include all interesting meta data about the media<br />
object (such as real height/width)? From the search perspective, we<br />
want this information since we can&#8217;t actually &#8220;touch&#8221; the media<br />
object to determine it ourselves.</p>
<p>Ideally, from the search perspective, we&#8217;d much prefer that the<br />
content provider include both a direct link (url) as well as a player<br />
link (playerUrl). That way, we could harvest all the meta information<br />
from the real content, while still delivering the experience to the<br />
end user that the content provider prefers and/or requires.</p>
<p><strong>So this means supplying a direct link will allow Yahoo to scan the video and generate meta data automatically, while just providing player URL will allow you to supply any url to Yahoo.</strong></p>
<p><a href="http://groups.yahoo.com/group/rss-media/message/26">Yahoo Group: PlayerURL</a></p>
<h2>Yahoo Video Hearsay</h2>
<h3>Some untested comments I picked up while surfing.</h3>
<p>&#8220;the example in the spec leads me to believe that urn:mpaa ratings are<br />
expected to be in lowercase.&#8221;</p>
<p>&#8220;The space in the media:thumbnail url needs to be replaced with %20&#8221;</p>
<p>&#8220;Since you are doing bold, italics, and line breaks inside the<br />
media:text, you want to indicate this by specifying &#8216;type=&#8221;html&#8221;&#8216; on<br />
that element.&#8221;</p>
<p>&#8220;This is pointing to the channel description, which is described in the<br />
RSS 2.0 spec as being a &#8220;Phrase or sentence describing the channel&#8221;.&#8221;</p>
<h2>Yahoo Video Search Resources and Links</h2>
<p><a href="http://search.yahoo.com/mrss/submit">Yahoo Video submit form</a><br />
<a href="http://search.yahoo.com/mrss">Yahoo Video Media RSS Specification</a><br />
<a href="http://groups.yahoo.com/group/rss-media/">Yahoo Group: RSS Media</a><br />
<a href="http://feedvalidator.org/testcases/ext/media/">Feed Validator Test Cases</a><br />
<a href="http://feedvalidator.org/">Feed Validator</a></p>
<p>For my main site, I want to decide on a page layout that will help me get my videos indexed while not throwing video pages into the supplementals. How can I do this?</p>
<p>One site I&#8217;ve seen links straight to videos on the same domain.</p>
<p><strong>Log:</strong> Last submitted a feed to Yahoo on 5/4/2006</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/05/03/yahoo-video-rss.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Update Wp-Config.php When Re-Installing Wordpress</title>
		<link>http://seo4fun.com/blog/2006/05/01/update-wp-configphp-when-re-installing-wordpress.html</link>
		<comments>http://seo4fun.com/blog/2006/05/01/update-wp-configphp-when-re-installing-wordpress.html#comments</comments>
		<pubDate>Mon, 01 May 2006 17:17:27 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/05/01/update-wp-configphp-when-re-installing-wordpress.html</guid>
		<description><![CDATA[After browsing Shoemoney&#8217;s Toronto pics (link&#8217;s dead so I removed it) at 7 am, I decided to start a new blog. Being half asleep, I used the same set of files I used to create this blog. After installation, I tried to load this blog and got a blank page. Oops. So I refreshed the [...]]]></description>
			<content:encoded><![CDATA[<p>After browsing Shoemoney&#8217;s Toronto pics (link&#8217;s dead so I removed it) at 7 am, I decided to start a new blog. Being half asleep, I used the same set of files I used to create this blog. After installation, I tried to load this blog and got a blank page. Oops. So I refreshed the /blog/ directory using Dreamweaver. That gave me a funky error: <strong>&#8220;Database error: [Unknown column &#8216;user_level&#8217; in &#8216;where clause&#8217;]&#8221;</strong>. After Googling that error message, I learned user_level in wordpress_user table is obsolete in 2.0. I must have had Wordpress 1.0 sitting in my local /blog/ directory! So I backed up .htaccess, deleted /blog/, and uploaded a fresh copy of Wordpress 2.0.</p>
<p>Phew.</p>
<p>REMINDER: When installing a new Wordpress blog, use a fresh copy of Wordpress, or go into wp-config.php and make sure the DB info doesn&#8217;t point to one of your other blogs.</p>
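<p>The lines to double-check, from a stock wp-config.php (the values here are placeholders):</p>

```php
<?php
// Stock wp-config.php DB settings; make sure these point at the NEW
// blog's database, not one recycled from another install.
define('DB_NAME', 'newblog_db');
define('DB_USER', 'newblog_user');
define('DB_PASSWORD', 'secret');
define('DB_HOST', 'localhost');
$table_prefix = 'wp_'; // change this if two blogs must share one DB
```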
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/05/01/update-wp-configphp-when-re-installing-wordpress.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Keyword surrounding Links</title>
		<link>http://seo4fun.com/blog/2006/04/28/78.html</link>
		<comments>http://seo4fun.com/blog/2006/04/28/78.html#comments</comments>
		<pubDate>Fri, 28 Apr 2006 15:36:04 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/04/28/78.html</guid>
		<description><![CDATA[Jim Westergren wrote an interesting post in his blog recently, concluding:
both Yahoo and Google uses the text surrounding links in their algo
So, I looked at this test page that has an outgoing link. I took a snippet next to that link and ran a Google search, but the target page refused to come up. What [...]]]></description>
			<content:encoded><![CDATA[<p>Jim Westergren wrote an <a href="http://www.jimwestergren.com/evidence-of-related-text-giving-rank-boost/">interesting post</a> in his blog recently, concluding:</p>
<blockquote><p>both Yahoo and Google uses the text surrounding links in their algo</p></blockquote>
<p>So, I looked at this <a href="http://72.14.203.104/search?q=cache:uJRK8UeV5ccJ:seo4fun.com/test-1/brandnewsacx1-sasahz.html+site:seo4fun.com+Maecenas+auctor+&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=1&#038;ie=UTF-8">test page</a> that has an outgoing link. I took a snippet next to that link and ran a <a href="http://www.google.com/search?hl=en&#038;lr=&#038;ie=ISO-8859-1&#038;safe=off&#038;rls=GGLD%2CGGLD%3A2004-45%2CGGLD%3Aen&#038;q=site%3Aseo4fun.com+Maecenas+auctor+&#038;btnG=Search">Google search</a>, but the target page refused to come up. What am I missing?</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/04/28/78.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Power of META tags in Yahoo</title>
		<link>http://seo4fun.com/blog/2006/04/27/power-of-meta-tags-in-yahoo.html</link>
		<comments>http://seo4fun.com/blog/2006/04/27/power-of-meta-tags-in-yahoo.html#comments</comments>
		<pubDate>Thu, 27 Apr 2006 16:00:10 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[On Page Optimization]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/04/27/power-of-meta-tags-in-yahoo.html</guid>
		<description><![CDATA[After two months, Yahoo is finally starting to show a few pages from this site in their index. This SERP shows pages with keywords in META keyword/description rank in Yahoo, even when keywords are absent from the page copy.  In fact, a page with keyword in just META description tag outranked this site&#8217;s home [...]]]></description>
			<content:encoded><![CDATA[<p>After two months, Yahoo is finally starting to show a few pages from this site in their index. <a href="http://search.yahoo.com/search?p=site%3Aseo4fun.com+Brandnewsacx1+sasahz&#038;prssweb=Search&#038;ei=UTF-8&#038;fr=FP-tab-vid-t&#038;x=wrt">This SERP</a> shows that pages with keywords in the META keyword/description tags rank in Yahoo, even when those keywords are absent from the page copy. In fact, a page with the keyword in just the META description tag outranked this site&#8217;s home page, though since only a few pages are competing for the same keyword, the SERP ranking order may not actually mean much. Both Google and MSN ignore keywords in META tags, at least when keywords don&#8217;t appear on a page. They may be useless - period - but I&#8217;ll have to confirm that with a test when I have more time to burn.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/04/27/power-of-meta-tags-in-yahoo.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Duplicate Content Revisited</title>
		<link>http://seo4fun.com/blog/2006/04/16/duplicate-content-according-to-tedster.html</link>
		<comments>http://seo4fun.com/blog/2006/04/16/duplicate-content-according-to-tedster.html#comments</comments>
		<pubDate>Sun, 16 Apr 2006 18:16:12 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[Supplementals]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/04/16/duplicate-content-according-to-tedster.html</guid>
		<description><![CDATA[Tedster wrote a meaty post in WMW concerning duplicate content:
Google tries to select the dupes and then put all but one of them into the &#8220;supplemental index&#8221;. If a domain has just a few instances of duplication like this in the Google index, things tend to go on as normal. But when many, many urls [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.webmasterworld.com/forum30/33899.htm">Tedster wrote a meaty post in WMW concerning duplicate content</a>:</p>
<blockquote><p>Google tries to select the dupes and then put all but one of them into the &#8220;supplemental index&#8221;. If a domain has just a few instances of duplication like this in the Google index, things tend to go on as normal. But when many, many urls start showing up, all with identical content, then something seems to get tripped at Google and a site can start to see trouble.</p></blockquote>
<p>One of my domains in supplemental hell (where not even my home page can be found on the first page of a site: query) uses a few directories to link off to sponsors - this adds up to a few thousand supplemental URLs out of 7,000 indexed. I also have session-ID-type seeds in URLs that result in thousands of supplementals. The result? I&#8217;m seeing even unique pages listed by Google as supplemental.</p>
<p>Still, didn&#8217;t Matt Cutts say there&#8217;s no such thing as &#8220;duplicate content penalty?&#8221; What tedster says doesn&#8217;t sound like a mere filter - it sounds like a site wide penalty. Then again, I never fully trust what anyone says. I mean&#8230;Google didn&#8217;t even notice the sandbox phenomenon until someone pointed it out to them :)</p>
<p>Why doesn&#8217;t Google just forget supplementals? (Speculation:) Because if any page points to them, Moz-bot will have to recrawl them later, and it&#8217;s better to keep records of those URLs to avoid wasting time recrawling and reindexing them.</p>
<p>Makes you want to start a few domains over from scratch, doesn&#8217;t it? My only hope at this point is to increase the number of incoming links, beef up PR sitewide and write more content. Still, seeing site:domain.com return the same SERP for the last 2-3 weeks isn&#8217;t encouraging.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/04/16/duplicate-content-according-to-tedster.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Supplemental Test Update April 13th, 2006</title>
		<link>http://seo4fun.com/blog/2006/04/13/supplemental-test-update-april-13th-2006.html</link>
		<comments>http://seo4fun.com/blog/2006/04/13/supplemental-test-update-april-13th-2006.html#comments</comments>
		<pubDate>Fri, 14 Apr 2006 03:51:10 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Supplementals]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/04/13/supplemental-test-update-april-13th-2006.html</guid>
		<description><![CDATA[I&#8217;ve been generally keeping up with WMW posts and other SEO blogs and even kept webmaster radio running since noon today, but nothing is really grabbing my interest. I do have a pile of blog post drafts sitting around unfinished, though some of it is so specific to my domains that I&#8217;m not sure who [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been generally keeping up with WMW posts and other SEO blogs and even kept webmaster radio running since noon today, but nothing is really grabbing my interest. I do have a pile of blog post drafts sitting around unfinished, though some of it is so specific to my domains that I&#8217;m not sure who will benefit from reading them. I opened up this blog to keep my thoughts organized, but I don&#8217;t want to go so far as to air dirty laundry in public (like the mess BD made after March 28). I&#8217;ve also spent some time today checking out Google&#8217;s calendar and Google Reader after Pluck started freezing IE. Because of that, I didn&#8217;t get a whole lot done today. A female buddy from Holland kept ICQing me throughout the day, which didn&#8217;t help either.</p>
<p>Anyway, here&#8217;s a few casual observations on how these <a href="http://www.google.com/search?hl=en&#038;lr=&#038;safe=off&#038;q=site%3Aseo4fun.com%2Ftest-2%2F&#038;btnG=Search">supplemental test pages</a> are doing on Google. The results are temporary, so what&#8217;s not supplemental today may not even be in the index a few days from now. Also, the fact that these test pages aren&#8217;t in English may play a part in keeping them from going supplemental. I may have to find a random text generator in English. It could also be that recently crawled pages are handled differently than before, and pages are either dropped or indexed instead of falling into the supplemental index (just a guess).</p>
<ul>
<li>Pages with unclosed head/body tags are indexed fine.</li>
<li>Pages 60%, 75%, and 80% similar to the original are not supplemental.</li>
<li>A page with unique title/description but the same body text as the original page is not supplemental.</li>
<li>Original page is indexed.</li>
<li>A tiny page with 20 words is indexed and not supplemental.</li>
<li>A previously indexed page 90% similar to the original is gone from the index. The root page http://seo4fun.com/ is also gone; only /index.html remains. Is this an indication that Google is dropping pages from the index instead of storing them in the supplemental index? Or are duplicate content pages just being filtered out from SERPs?</li>
<li>Notice that the optimization for these pages is identical (just one occurrence of the keyword), but <a href="http://www.google.com/search?hl=en&#038;lr=&#038;safe=off&#038;q=site%3Aseo4fun.com%2Ftest-2%2F+se4funsupp+testcases&#038;btnG=Search">this SERP</a> is making me think #1 is a direct result of the keyword being placed at the start of a sentence.</li>
<li><a href="http://www.google.com/search?hl=en&#038;lr=&#038;safe=off&#038;q=site%3Aseo4fun.com%2Ftest-2%2F+se4funsupp+testcases&#038;btnG=Search">This SERP</a> shows the 80% similar page getting dumped into the omitted results, because the paragraph containing the keyword is identical for those pages.</li>
<li>One weird thing I noticed: the normal SERP returned 9 of 13 (4 pages omitted) but 11 when I clicked the omitted results link. The 4 omitted pages are those starting a paragraph with the &#8220;Se4funsupp testcases nullam velit libero&#8221; phrase. In this case, Google somehow chose the 60% similar page and dumped the rest into the omitted results. The omitted pages are: 80% similar / the original page / unique title and description with identical body text / 70% similar. Now for different queries I can see pages getting omitted, but it&#8217;s interesting that Google picked the 60% similar page to keep in the SERPs. It could be due to something else, but only one incoming link goes to each page, so everything else should be equal. Also, when I click the omitted results link these 4 pages go back in, but why only 11? Which are the 2 missing pages? The page with broken head/body tags (2-15) and the (2-11) 100 Words page disappear. Not sure why they poof.</li>
</ul>
<p>By the way, a brief update to my previous MSN post. <a href="http://www.google.com/search?hl=en&#038;lr=&#038;safe=off&#038;q=site%3Aseo4fun.com+Brandnewsacx1+sasahz&#038;btnG=Search">This SERP</a> shows the home page and the SEO combined page in 1st/2nd on Google. The home page is probably ranking 1st merely due to its PR 3 (the other PR 3s are /blog/ and the /carcasher page). I&#8217;ll mention the keyword once on this page (&#8221;Brandnewsacx1 sasahz&#8221;) and see how high it ranks. Notice many of the keyword-stuffed pages ranking high. Though a page with the keyword in the title still ranked higher than any of the keyword-stuffed pages, I still got this SERP: keyword repeated 11x (4th), 10x (5th), 9x (6th). Coincidence? Probably, but it&#8217;s eerie.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/04/13/supplemental-test-update-april-13th-2006.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>MSN Optimization</title>
		<link>http://seo4fun.com/blog/2006/04/08/msn-optimization.html</link>
		<comments>http://seo4fun.com/blog/2006/04/08/msn-optimization.html#comments</comments>
		<pubDate>Sat, 08 Apr 2006 08:37:00 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/04/08/msn-optimization.html</guid>
		<description><![CDATA[Probably old news to most SEOs, but since I don&#8217;t bother with MSN all that much, the idea that on-page optimization is king for MSN is news to me. 	This SERP seems to confirm it: the page that repeats the keyword 11 times ranks the highest among all the other types of optimization.
Looking at my test pages, [...]]]></description>
			<content:encoded><![CDATA[<p>Probably old news to most SEOs, but since I don&#8217;t bother with MSN all that much, the idea that <a href="http://www.seochat.com/c/b/MSN-Optimization-Help/">on-page optimization is king for MSN</a> is news to me. 	<a href="http://search.msn.com/results.aspx?q=site%3Aseo4fun.com+Brandnewsacx1+sasahz&#038;FORM=QBRE">This SERP</a> seems to confirm it: the page that repeats the keyword 11 times ranks the highest among all the other types of optimization.</p>
<p>Looking at my test pages, the results are like night and day on MSN and Google. I guess I should really build 3 different sites running on the same database for the 3 search engines, which is actually ridiculously easy to do, except for building incoming links; splitting links among three versions of the same site would be the hard part. But it would make sense, at least in the short term, to build a site specifically targeting MSN.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/04/08/msn-optimization.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Googlebot Slowing to a Crawl</title>
		<link>http://seo4fun.com/blog/2006/04/07/googlebot-slowing-to-a-crawl.html</link>
		<comments>http://seo4fun.com/blog/2006/04/07/googlebot-slowing-to-a-crawl.html#comments</comments>
		<pubDate>Fri, 07 Apr 2006 19:55:15 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/04/07/googlebot-slowing-to-a-crawl.html</guid>
		<description><![CDATA[I thought I was the only one stuck in supplemental hell with no Googlebot to rectify the problem/pick up new pages but according to a thread in WMW it seems I&#8217;m not the only one experiencing this problem. Some guys report last visit by Googlebot at the end of March, which is pretty much the [...]]]></description>
			<content:encoded><![CDATA[<p>I thought I was the only one stuck in supplemental hell with no Googlebot to rectify the problem/pick up new pages, but according to a <a href="http://www.webmasterworld.com/forum30/33806.htm">thread in WMW</a> it seems I&#8217;m not the only one experiencing this problem. Some guys report their last visit by Googlebot at the end of March, which is pretty much the same as mine. In fact, this month I&#8217;ve had more visits from Inktomi Slurp than from Googlebot.</p>
<p>One thing I&#8217;d like to know is how to tell the Mozilla Googlebot and the regular Googlebot apart, but right now I&#8217;m too lazy to look it up :)</p>
<p>I&#8217;m also starting to read about a <a href="http://www.webmasterworld.com/forum30/33808.htm">Google rollback to Aug 2005</a>, with pages indexed from Jan 2006 up to March 28 lost in the transition to BD, which sounds like the problem with my site.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/04/07/googlebot-slowing-to-a-crawl.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Update Dynamic Pages like Walking on Glass</title>
		<link>http://seo4fun.com/blog/2006/04/04/update-dynamic-pages-like-walking-on-glass.html</link>
		<comments>http://seo4fun.com/blog/2006/04/04/update-dynamic-pages-like-walking-on-glass.html#comments</comments>
		<pubDate>Tue, 04 Apr 2006 07:49:41 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/04/04/update-dynamic-pages-like-walking-on-glass.html</guid>
		<description><![CDATA[Never change PHP code on a live page, or the underlying PHP classes it calls, without writing a test class/test page first. You can crash hundreds or thousands of pages at once with one false move and if spiders happen to be deep crawling your site, boom, you&#8217;ve created a mess you might not be able to [...]]]></description>
			<content:encoded><![CDATA[<p>Never change PHP code on a live page, or the underlying PHP classes it calls, without writing a test class/test page first. You can crash hundreds or thousands of pages at once with one false move, and if spiders happen to be deep-crawling your site, boom, you&#8217;ve created a mess you might never be able to recover from.</p>
<p>Here&#8217;s an example. Say you want to change livepage.html. Don&#8217;t touch it. Create livepage1.html and make your changes there. Test it to make sure it does what you want, then save it as livepage.html. Pretty basic stuff, but something you should never forget to do.</p>
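<p>As a minimal shell sketch of that workflow (the file names and contents are just examples):</p>

```shell
# Work on a copy, never on the live file.
echo "<p>old content</p>" > livepage.html    # stand-in for the live page
cp livepage.html livepage1.html              # make a scratch copy
echo "<p>new content</p>" >> livepage1.html  # edit the copy
# ...check livepage1.html in a browser here...
mv livepage1.html livepage.html              # promote only the tested copy
```

<p>The live URL is never broken in between: it keeps serving the old version until the tested copy replaces it in a single move.</p>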
<h2>Other PHP Good Practices</h2>
<ul>
<li>Make pages as PHP-independent as possible. Try to use cached elements instead of making live PHP calls.</li>
<li>Write clean PHP classes. Create sensible class extensions.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/04/04/update-dynamic-pages-like-walking-on-glass.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Dynamic / Fixed Tables</title>
		<link>http://seo4fun.com/blog/2006/04/01/dynamic-fixed-tables.html</link>
		<comments>http://seo4fun.com/blog/2006/04/01/dynamic-fixed-tables.html#comments</comments>
		<pubDate>Sat, 01 Apr 2006 14:03:32 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/04/01/dynamic-fixed-tables.html</guid>
		<description><![CDATA[Read an interesting post by AlexK on WMW on the pros and cons of fixed/dynamic MYSQL tables. I was looking for some info to decide whether to keep my multiple tables or join them. Joining them will simplify inserts/updates but what I&#8217;m also wondering is its impact on speed. According to AlexK, fixed tables will [...]]]></description>
			<content:encoded><![CDATA[<p>Read an interesting post by AlexK on WMW on the pros and cons of <a href="http://www.webmasterworld.com/forum88/9443.htm">fixed/dynamic MYSQL tables</a>. I was looking for some info to decide whether to keep my multiple tables or join them. Joining them will simplify inserts/updates, but I&#8217;m also wondering about the impact on speed. According to AlexK, a fixed table will auto-convert to dynamic when I add a TEXT/BLOB/VARCHAR column (which I usually do), so his recommendation is to export just those columns to a separate table and run LEFT JOINs.</p>
<p><a href="http://dev.mysql.com/doc/mysql/en/static-format.html">MYSQL reference regarding Fixed MYSQL tables</a></p>
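<p>AlexK&#8217;s suggestion might look something like this (a sketch only; the table and column names are made up for illustration):</p>

```sql
-- Fixed-format table: only fixed-width column types, no TEXT/BLOB/VARCHAR
CREATE TABLE pages (
  page_id INT NOT NULL PRIMARY KEY,
  created DATE,
  hits    INT
) ROW_FORMAT=FIXED;

-- Variable-length columns live in a separate (dynamic-format) table
CREATE TABLE pages_text (
  page_id INT NOT NULL PRIMARY KEY,
  body    TEXT
);

-- Recombine with a LEFT JOIN only when the text is actually needed
SELECT p.page_id, p.hits, t.body
FROM pages p
LEFT JOIN pages_text t ON t.page_id = p.page_id;
```

<p>Queries that never touch the text columns then scan the fast fixed-format table alone.</p>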
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/04/01/dynamic-fixed-tables.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Why I don&#8217;t Like Index.html</title>
		<link>http://seo4fun.com/blog/2006/04/01/why-i-dont-like-indexhtml.html</link>
		<comments>http://seo4fun.com/blog/2006/04/01/why-i-dont-like-indexhtml.html#comments</comments>
		<pubDate>Sat, 01 Apr 2006 11:22:22 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Supplementals]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/04/01/why-i-dont-like-indexhtml.html</guid>
		<description><![CDATA[The obvious answer is supplementals in Google. I&#8217;ve used index.html on about 3 of my domains and since I use Dreamweaver, and sooner or later I make the mistake of linking to a page using /index.html and boom&#8230; Google will index it. Even this domain has /index.html for root url. It&#8217;s a good thing that&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>The obvious answer is supplementals in Google. I&#8217;ve used index.html on about 3 of my domains, and since I use Dreamweaver, sooner or later I make the mistake of linking to a page as /index.html and boom&#8230; Google will index it. Even this domain has /index.html for its root URL. It&#8217;s a good thing that&#8217;s the only index page I have on this domain (besides the crap WordPress generates, but that&#8217;s another story).</p>
<p>BTW, I noticed both / and /index.html on this domain have the same cache date, and still both are managing to stay out of the supplementals. Could this mean a tweak in how pages are crawled? I&#8217;m not sure what it is, but one of them should be listed as supplemental. I&#8217;m sure later on it will be, but this tells me that depending on adjustments to Google&#8217;s duplicate content filter, which will come eventually, some pages now listed as unique will get thrown into the supplemental listings somewhere down the line.</p>
<p>So just because a page isn&#8217;t flagged as supplemental now doesn&#8217;t mean it won&#8217;t be later. Scary, huh?</p>
<p>Also, completely unrelated, but I got a hit for &#8220;why did i get deindexed by google&#8221; so I ran a search. I see this site listed under digitalpoint on page one, but I notice:</p>
<p>Did you mean: why did i get reindexed by google?</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/04/01/why-i-dont-like-indexhtml.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Doesn&#8217;t Parse Keywords in URLS</title>
		<link>http://seo4fun.com/blog/2006/03/13/google-doesnt-parse-keywords-in-urls.html</link>
		<comments>http://seo4fun.com/blog/2006/03/13/google-doesnt-parse-keywords-in-urls.html#comments</comments>
		<pubDate>Tue, 14 Mar 2006 02:35:27 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/03/13/google-doesnt-parse-keywords-in-urls.html</guid>
		<description><![CDATA[A few days ago, John Scott blogged that Google doesn&#8217;t parse keywords in urls. I was previously told by many reputable webmasters that Google can parse common keywords in urls. Here&#8217;s what would seem to be an indisputable proof: searching for &#8220;search engine optimization&#8221; returns searchenginewatch.com, with the words search engine in the domain name [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago, <a href="http://www.internet-marketing-blog.com/2006/03/10/parsing-keywords-in-urls/">John Scott blogged that Google doesn&#8217;t parse keywords in urls</a>. I was previously told by many reputable webmasters that Google can parse common keywords in urls. Here&#8217;s what would seem to be an indisputable proof: searching for &#8220;<a href="http://www.google.com/search?hl=en&#038;lr=&#038;q=search+engine+optimization&#038;btnG=Search">search engine optimization</a>&#8221; returns <strong>search</strong><strong>engine</strong>watch.com, with the words <em>search engine</em> in the domain name highlighted. So that proves Google does parse keywords in urls, right?</p>
<p>Wrong.  John Scott&#8217;s tests debunk this notion completely. <strong>He points out that highlighting does not mean parsing.  </strong>Even for common keywords like &#8220;sexual frustration&#8221;, Google fails to parse them in URLs.</p>
<p>Another proof is <a href="http://www.google.com/search?hl=en&#038;lr=&#038;ie=ISO-8859-1&#038;safe=off&#038;rls=GGLD%2CGGLD%3A2004-45%2CGGLD%3Aen&#038;q=site%3Aseo4fun.com+fun">this search</a> for the word &#8220;fun&#8221; on this site. If Google recognized the word fun in the domain name, it would return all pages in the domain; instead, it returns only the home page, which is the only page that actually contains the word fun in its text.</p>
<p>This teaches me a couple of things: 1) Never blindly buy into what an authority says. 2) When in doubt, test it out =)</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/03/13/google-doesnt-parse-keywords-in-urls.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Do Google&#8217;s Spam Reports Work?</title>
		<link>http://seo4fun.com/blog/2006/03/13/does-googles-spam-reports-work.html</link>
		<comments>http://seo4fun.com/blog/2006/03/13/does-googles-spam-reports-work.html#comments</comments>
		<pubDate>Mon, 13 Mar 2006 12:53:42 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/03/13/does-googles-spam-reports-work.html</guid>
		<description><![CDATA[A few days ago, Matt posted a request on his blog for people to file spam reports, especially for keyword stuffing and Asian spam sites. After reading that, I had a chat with another webmaster friend who thought spam reports were useless. Her point was that spammers injected hundreds of new spam domains every day, [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago, <a href="http://www.mattcutts.com/blog/send-more-spam-reports/#comments">Matt posted a request on his blog for people to file spam reports</a>, especially for keyword stuffing and Asian spam sites. After reading that, I had a chat with another webmaster friend who thought spam reports were useless. Her point was that spammers injected hundreds of new spam domains every day, and reporting each one, even if Google acted on those reports, was a waste of time. Her business also doesn&#8217;t rely much on Google traffic, so that does skew her perspective a bit.</p>
<p>On the other hand, I file spam reports often. I think spam reports are useful, and they&#8217;ve worked in the past to knock out hundreds of domains. The &#8220;Dissatisfied?&#8221; link can be used to report spam in batch mode. I also highly doubt Google just takes spam reports, verifies them, then blacklists individual sites. I&#8217;m sure they&#8217;re taking note of the spam tactics used to improve their spam filters.</p>
<p>Anyway, a few days after Matt posted the request on his blog, there are over 60 replies on it. Most are your average gripes and views like &#8220;spamming is bad&#8221;, &#8220;adsense created spam&#8221;, etc. Still, here&#8217;s a list of some comments that stood out in my mind:</p>
<blockquote><p>Google does absoluetly nothing against hidden noscript content (or content in hidden divs). I reported about 5 examples to Google and Matt about hidden div spam ages ago and nothing was done, so now I do it on all my sites and am making an absolute killing. I’ll keep doing it until I see the original spammer I reported get axed (that’s <a rel="nofollow" href="http://www.ambergreeninternetmarketing.com/">www.ambergreeninternetmarketing.com</a> and its clients).</p></blockquote>
<p>I admit after reading that I wanted to see a concrete example to test out on this site.</p>
<p>Rawalex adds:</p>
<blockquote><p>Harith, the response time between “spam in the index” and “spam removed from index” is very critical to how profitable spamming can be.</p></blockquote>
<p>PhilC writes:</p>
<blockquote><p>If I ran a search engine, and I wanted to deal with spam, I definitely wouldn’t remove sites that were reported as spamming. I’d want to write algos that would deal with them, and others that contain the same sort of spam, and I’d want them in the index so that I could see if the algos worked.</p></blockquote>
<p>This line made me chuckle:</p>
<blockquote><p>You’re sending Matt unsolicited job offers through spam reports? That’s sort  of ironic isn’t it?</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/03/13/does-googles-spam-reports-work.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Robots.txt Before Linking Up</title>
		<link>http://seo4fun.com/blog/2006/03/10/robotstxt-before-linking-up.html</link>
		<comments>http://seo4fun.com/blog/2006/03/10/robotstxt-before-linking-up.html#comments</comments>
		<pubDate>Fri, 10 Mar 2006 11:30:49 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Supplementals]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/03/10/robotstxt-before-linking-up.html</guid>
		<description><![CDATA[First thing you should do after you buy a domain is install an .htaccess file that deals with canonical issues like non-www and /index.html.
Second thing you should do is install a robots.txt that prevents Google from crawling anything except the domain root.
If a hacker decides to submit your urls with a Google url removal tool, this may [...]]]></description>
			<content:encoded><![CDATA[<p>First thing you should do after you buy a domain is install an .htaccess file that deals with canonical issues like non-www and /index.html.</p>
<p>Second thing you should do is install a robots.txt file that prevents Google from crawling anything except the domain root.</p>
<p>If a hacker decides to submit your URLs to Google&#8217;s URL removal tool, this may suck, but in general this will prevent a lot of headaches later down the road.</p>
<p>If your site has low PR, Google won&#8217;t come around to deep-crawl the site every day. If you screw up and send hundreds of pages to supplemental hell, you&#8217;ll have an even harder time getting them recrawled and back into the main index.</p>
<p>So imagine you get one shot at a clean crawl. Block spiders with robots.txt; build the site, check for dup problems, and only when your site&#8217;s ready, let Googlebot in.</p>
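<p>A minimal sketch of both files, assuming Apache&#8217;s mod_rewrite and Googlebot&#8217;s (nonstandard) support for the Allow directive and the $ end-anchor; example.com is a placeholder:</p>

```apache
# .htaccess - fix canonical issues first
RewriteEngine On
# non-www to www
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
# /index.html to /
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
```

```
# robots.txt - let Googlebot fetch only the domain root for now
User-agent: Googlebot
Allow: /$
Disallow: /
```

<p>Once the site is ready for its one clean crawl, drop the Disallow line and let Googlebot in.</p>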
<p>&#8212;</p>
<p>Disallowing Googlebot from crawling a page in robots.txt only seems to work if I put up the robots.txt file before Google ever spiders the page.  If I add robots.txt after the pages are already crawled, the disallowed pages just get thrown into the supplementals and stay there till the end of time.  If an entire site is blocked by robots.txt, it may be hidden from the SERPs, but I suspect Google will still have records of the site in its database.</p>
<p>Evidence?</p>
<p>I have two directories on one domain that I use to clicktrack out to sponsors.  On the first domain, I noticed these URLs were cluttering up my site: results in Google, so I put up a robots.txt to block them from getting indexed.  I also hoped that by putting up robots.txt, Google would eventually drop those URLs from its index.</p>
<p>I also included a new directory in the same robots.txt, and switched many of my outgoing sponsor links to use this directory instead.  After 6 months, none of the links under this directory show up when I run site:xyz.com/directory/.</p>
<p><strong>Well, what about WMW blocking their entire site using robots.txt?</strong>  That seemed to work.  <em>But did those pages really get deindexed</em>, or did they just stop showing?  My guess is they were just hidden from the SERPs.</p>
<p>Anyway, experience tells me <strong>robots.txt is not a tool for getting Google to drop pages from its index.</strong>  Your pages may be dropped from the main index, but most likely they&#8217;ll migrate into the supplementals and stay there indefinitely.</p>
<p><strong>If you want to keep Google out of a directory, put up robots.txt before that directory goes live. </strong></p>
<h2>Matt Cutts on Robots.txt</h2>
<p><a href="http://www.mattcutts.com/blog/googlebot-keep-out/">Matt Cutts wrote an interesting post on 3/17/2006 about robots.txt</a> and why pages sometimes show up in Google&#8217;s SERPs as URL-only results even when robots.txt forbade Google from crawling them:</p>
<blockquote><p>Obscure note #1: using the ‘googlebot=nocrawl’ technique would not be the preferred method in my mind. Why? Because it might still show ‘googlebot=nocrawl’ urls as uncrawled urls. <strong>You might wonder why Google will sometimes return an uncrawled url reference, even if Googlebot was forbidden from crawling that url by a robots.txt file.</strong> There’s a pretty good reason for that: back when I started at Google in 2000, several useful websites (eBay, the New York Times, the California DMV) had robots.txt files that forbade any page fetches whatsoever. Now I ask you, what are we supposed to return as a search result when someone does the query [california dmv]? We’d look pretty sad if we didn’t return www.dmv.ca.gov as the first result. But remember: we weren’t allowed to fetch pages from www.dmv.ca.gov at that point. The solution was to show the uncrawled link when we had a high level of confidence that it was the correct link. Sometimes we could even pull a description from the Open Directory Project, so that we could give a lot of info to users even without fetching the page. I’ve fielded questions about Nissan, Metallica, and the Library of Congress where someone believed that Google had crawled a page when in fact it hadn’t; a robots.txt forbade us from crawling, but Google was able to show enough information that someone assumed the page had been crawled. Happily, most major websites (including all the ones I’ve mentioned so far) let Google into more of their pages these days.</p>
<p>That’s why we might show uncrawled urls in response to a query, even if we can’t fetch a url because of robots.txt. So ‘googlebot=nocrawl’ pages might show up as uncrawled. The two preferred ways to have the pages not even show up in Google would be A) to use the “noindex” meta tag that I mentioned above, or B) to use the url removal tool that Google provides. I’ve seen too many people make a mistake with option B and shoot themselves in the foot, so I would recommend just going with the noindex meta tag if you don’t want a page indexed.
</p></blockquote>
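<p>For reference, the &#8220;noindex&#8221; meta tag Matt recommends is just a one-liner in the page&#8217;s head (a minimal example):</p>

```html
<head>
  <!-- keep this page out of Google's index entirely, even as a URL-only result -->
  <meta name="robots" content="noindex">
</head>
```

<p>Unlike robots.txt, this requires the page to be fetched, so don&#8217;t combine it with a Disallow rule that keeps the spider from ever seeing the tag.</p>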
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/03/10/robotstxt-before-linking-up.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Wrong SERP Snippet for Cache</title>
		<link>http://seo4fun.com/blog/2006/03/08/wrong-serp-snippet-for-cache.html</link>
		<comments>http://seo4fun.com/blog/2006/03/08/wrong-serp-snippet-for-cache.html#comments</comments>
		<pubDate>Wed, 08 Mar 2006 12:10:13 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Supplementals]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/03/08/wrong-serp-snippet-for-cache.html</guid>
		<description><![CDATA[I&#8217;ve always assumed title/description snippets displayed in the SERP reflects what&#8217;s in Google&#8217;s Cache.  But now, I&#8217;m starting to see at least one page where title/description doesn&#8217;t match what&#8217;s stored in the cache.
Here&#8217;s an example:
The cache of one page I&#8217;m looking at (I won&#8217;t post the url since its adult related) is dated 3/5/2006, [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve always assumed the title/description snippets displayed in the SERP reflect what&#8217;s in Google&#8217;s cache.  But now, <strong>I&#8217;m starting to see at least one page where the title/description doesn&#8217;t match what&#8217;s stored in the cache.</strong></p>
<p>Here&#8217;s an example:</p>
<p>The cache of one page I&#8217;m looking at (I won&#8217;t post the URL since it&#8217;s adult-related) is dated 3/5/2006, and the cached page is up to date, but the title/description that actually displays in the SERP is from an ancient version of the page.</p>
<p>I also noticed that for this particular page <strong>across a dozen+ DCs, there are at least 2 versions of the cache.</strong>  For a particular page I&#8217;m looking at, the dates on the cache are Aug 9 and Mar. 5.  There&#8217;s no way the March. 5 cache is stored in a supplemental database (the way I rewrote the page, there&#8217;s no way it&#8217;ll end up in there), so <strong>even though the page is listed as supplemental, the cache must be from a live repository.</strong></p>
<blockquote><p>My guess is <strong>the document&#8217;s hits aren&#8217;t re-counted yet and its snippet is still outdated.</strong> Though the page was crawled and cached recently, according to the document snippet the page is still supplemental.</p></blockquote>
<p>I&#8217;ll check a few more pages to confirm this.  But if this is true, <strong>I&#8217;m still inclined to believe Google has no more than one copy of a page in their supplemental database.</strong></p>
<p>This is slightly off topic, but here&#8217;s another note to myself: when looking at a domain listing in Google, pages are either:</p>
<ul>
<li>main index</li>
<li>supplemental</li>
<li>main index but similar</li>
<li><strong>supplemental and similar</strong> - these are the hardest to spot.</li>
</ul>
<p><em>So far, I found 2 urls with snippets not corresponding to the cache.</em></p>
<p><strong>Cached pages for a few other supplementals are identical (timestamps and stored HTML are the same) across all DCs, </strong>so this one page must have been a glitch.</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/03/08/wrong-serp-snippet-for-cache.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Cache</title>
		<link>http://seo4fun.com/blog/2006/03/06/google-cache.html</link>
		<comments>http://seo4fun.com/blog/2006/03/06/google-cache.html#comments</comments>
		<pubDate>Tue, 07 Mar 2006 03:03:57 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Supplementals]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/03/06/google-cache.html</guid>
		<description><![CDATA[I just checked my site at 216.239.59.147 and noticed a huge drop in pages indexed.  Either I&#8217;m still doing something wrong or Google is hiccupping again.
I need to check my pages on this DC and see how many of my pages including subdomains are indexed correctly.
Since Google keeps falling back to cache from August [...]]]></description>
			<content:encoded><![CDATA[<p>I just checked my site at 216.239.59.147 and noticed a huge drop in pages indexed.  Either I&#8217;m still doing something wrong or Google is hiccupping again.</p>
<p>I need to <strong>check my pages on this DC and see how many of my pages, including subdomains, are indexed correctly.</strong><br />
Since Google keeps falling back to the August cache these days, I&#8217;m wondering whether:</p>
<p>A. Google keeps several versions of a document in its cache per DC, or</p>
<p>B. Google keeps one version of a document per DC.</p>
<p>Also, considering BD seems to shift in and out of a DC, I&#8217;m not sure what the hell I&#8217;m seeing.</p>
<p>Maybe my site is lacking the PR to get crawled often, but considering the major supplemental &#8220;bug&#8221; hitting a lot of well-established sites out there, lack of PR may only be a small part of the problem.</p>
<p>Question: Are the pages listed as supplementals supposed to be supplemental? (Definitely, some pages are supposed to be supplemental; usually every domain naturally has a few supps.) Or were the pages close-call dupes that got tipped over into the supplementals due to a bug in the dup filter algo?</p>
<p>I&#8217;ll have to <strong>take a look at a few of my competitor sites to see how their pages are holding up in Google.</strong></p>
<p>I just added &#038;filter=0 to my site: query and <strong>the page count jumped from 700 to 10,900!?</strong>  Now what in the hell is going on?</p>
<p>Another thing: how does Google crawl/cache sites again?  It has a few Mozilla/regular crawlers (around 300?), and they cache the information where?  In a single repository, or do they dump their info on separate DCs (which would make no sense)?  They obviously <strong>must all share the same docID and url hash</strong>&#8230; but what about the HTML cache?  Is there some status field which prevents a page from being displayed depending on the DC?<br />
Why would a site show on one DC and not display at all on another?  I&#8217;m missing some basic SEO 101 info here&#8230;</p>
<p>Here&#8217;s an interesting speculation by <a href="http://www.webmasterworld.com/forum30/33386-7-10.htm">lammert on supplementals at wmw</a> (msg #61):</p>
<p><font size="2" face="verdana" color="#000000" class="mo">The search engine index is primarily designed to store pages, pages and even more pages. The rate at which new pages occur is higher than the rate at which old pages disappear. This was at least the situation a few years ago. As a programmer I wouldn&#8217;t be surprised if Google designed the database to be add-only, and solved the delete problem just as DBase did, by marking unwanted records with a flag.</font></p>
<p>According to <a href="http://www.webmasterworld.com/forum30/33386-5-10.htm">g1smd at wmw</a>, Google has a database for supplementals apart from the live cache, and once a page gets into that supplemental cache it stays there permanently, even after hosting is taken down and the domain no longer exists.</p>
<p>Think of this in terms of MYSQL tables:</p>
<p>google_supplemental: DocID / url hash / cache date / cache content</p>
<p>google_live: DocID / url hash / cache date / cache content</p>
<p>Whatever record gets added in google_supplemental is never deleted.</p>
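<p>To make that speculation concrete, here&#8217;s what those two tables might look like as actual MYSQL DDL. This is pure guesswork on my part; every table and column name is invented, and nothing here is confirmed by Google:</p>

```sql
-- Hypothetical schema for the speculated supplemental/live split.
CREATE TABLE google_supplemental (
  doc_id        BIGINT UNSIGNED NOT NULL,  -- shared docID
  url_hash      CHAR(32)        NOT NULL,  -- shared url hash
  cache_date    DATETIME        NOT NULL,
  cache_content MEDIUMBLOB,
  PRIMARY KEY (doc_id)
  -- per the speculation above, rows here are never deleted
);

CREATE TABLE google_live (
  doc_id        BIGINT UNSIGNED NOT NULL,
  url_hash      CHAR(32)        NOT NULL,
  cache_date    DATETIME        NOT NULL,
  cache_content MEDIUMBLOB,
  deleted_flag  TINYINT(1)      NOT NULL DEFAULT 0,  -- DBase-style delete marker, as lammert suggests
  PRIMARY KEY (doc_id)
);
```

<p>Under this model, &#8220;going supplemental&#8221; would just mean the live row is flagged or missing while the supplemental row lives on.</p>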
<p>Interestingly, steveb added: &#8220;there are two sets of supplementals on different datacenter groups, so depending  on what datacenter you hit you could see a different batch.&#8221;</p>
<p>I can&#8217;t confirm this, but it&#8217;s worth noting.</p>
<p>Another quote, this one from Dayo_UK:</p>
<blockquote><p>These pages have been crawled and cached in the supplemental index but not been crawled or cached in the normal index&#8230;. The question that people should be asking themselves is why Google is now not listing their pages in the normal crawl, as these have disappeared, rather than the pages going supplemental (<strong>as a supplemental copy was probably already there</strong>).</p></blockquote>
<p>So&#8230;<strong>pages didn&#8217;t go supplemental; pages in the main index are just no longer being displayed(?)</strong></p>
<p>I think Dayo_UK is on to something here.  The pages that were crawled correctly (e.g. the homepage, which never had a supplemental problem) are showing perfectly in BD DCs.</p>
<p>I ran another subdomain page count: a few minutes ago I saw 584 indexed on 64.233.179.104, and now I see 684 on the same DC.  Even with filter=0, the resulting number is the same.  I guess it could be a timing thing&#8230;but where is that 10,900 number coming from?  Is it just an approximation?  I ran page counts on every directory and they don&#8217;t add up anywhere close to that huge number.</p>
<p>I know Google sometimes hides pages, but usually if I do a site: targeting a directory, it will show the pages that are being suppressed.  I just can&#8217;t figure it out.</p>
<p>Even with a 684 page count, Google is only displaying 276, which is pretty much identical to the number of pages Yahoo has indexed.  Am I missing something?  And why is the rest of the stuff not being displayed? Are those pages supplemental? filter=0 won&#8217;t display them.</p>
<p>Is this some kind of subdomain penalty??</p>
<p>site:janesguide.com -inurl:www returns 607 pages today on the same DC, with around 86 urls shown as unique. Let&#8217;s see why they could be supplemental.</p>
<ul>
<li>Some urls with 5 query strings are supplementals (they should be).</li>
<li>Some urls are not supplemental but are hidden as &#8220;similar pages&#8221; due to identical META descriptions: <em>&#8220;Since 1997, JanesGuide and Jane Duvall have been your guide &#8230;&#8221;</em></li>
<li>All subdomain root urls are cached correctly, and 607 out of 607 show up in the SERP.</li>
<li>Interesting to note that the pages are all light, around 7k</li>
<li>The nav menus are below content, and there are 240 unique words on the page.</li>
</ul>
<p>At around 350 urls, the rest are going supplemental.</p>
<p>I refresh a few minutes later and now the page count is 718, and a 10k page count with filter=0. 306 documents shown.</p>
<p>I only use a few &#038;query type urls, and I have most of them blocked by NOINDEX.  Could be Google indexed a lot of them and is completely suppressing them.</p>
<p>I&#8217;ll have to <strong>email myself when generating 404 pages just to see if bots are crawling some non-existent pages.</strong></p>
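<p>A quick sketch of how I&#8217;d wire that up, assuming 404s are routed to a custom PHP error page (the email address and the ErrorDocument path are placeholders):</p>

```php
<?php
// 404.php -- point Apache at it with: ErrorDocument 404 /404.php
// Emails me whenever a bot or visitor requests a non-existent URL.
header('HTTP/1.1 404 Not Found');

$url     = $_SERVER['REQUEST_URI'];
$agent   = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'unknown';
$referer = isset($_SERVER['HTTP_REFERER'])    ? $_SERVER['HTTP_REFERER']    : 'none';

mail('me@example.com',
     '404 hit: ' . $url,
     "URL: $url\nUser-Agent: $agent\nReferer: $referer");
?>
<html><body><h1>404 Not Found</h1></body></html>
```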
<p>What really bugs me is out of say 724 pages, only around 250 are actually displayed in the SERPS.  Some of those hidden pages are supplementals.  But what about the rest of the pages?</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/03/06/google-cache.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Sitemap FAQ</title>
		<link>http://seo4fun.com/blog/2006/03/06/google-sitemap-faq.html</link>
		<comments>http://seo4fun.com/blog/2006/03/06/google-sitemap-faq.html#comments</comments>
		<pubDate>Mon, 06 Mar 2006 13:05:33 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[Google Sitemaps]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/03/06/google-sitemap-faq.html</guid>
		<description><![CDATA[This is my list of things about Google Sitemaps I&#8217;m pretty certain about.

&#8220;A Sitemap file can contain no more than 50,000 URLs and be no larger than 10MB when uncompressed.&#8221; Read more about Google Sitemaps.
Sitemaps are used to let spiders know about pages on the site that is hard for spiders to get to.  [...]]]></description>
			<content:encoded><![CDATA[<p>This is my list of things about Google Sitemaps I&#8217;m pretty certain about.</p>
<ul>
<li>&#8220;A Sitemap file can contain no more than <strong>50,000 URLs</strong> and be <strong>no larger than 10MB</strong> when uncompressed.&#8221; Read more about <a href="https://www.google.com/webmasters/sitemaps/docs/en/overview.html">Google Sitemaps</a>.</li>
<li>Sitemaps are used to let spiders know about pages on the site that are hard for spiders to get to.  <strong>Don&#8217;t just submit top-level directories.</strong></li>
<li><strong>Google also makes use of the sitemap metadata.</strong>  <strong>Priority</strong> and refresh cycle (changefreq) are important in telling the G spider how often to crawl the page.  The refresh cycle is useful in preventing Google from spidering the page too often.</li>
<li>Add your <strong>images and videos to the sitemap.</strong></li>
<li>Add <strong>RSS feeds</strong> to the sitemap.</li>
<li><a href="http://www.smart-it-consulting.com/internet/google/submit-validate-sitemap/"><strong>Validate your sitemap</strong></a>.</li>
<li>If you really need to know when the sitemap based crawling begins, then <strong>create a non-indexable page (having a NOINDEX robots META tag) which is not linked from anywhere, and put its URI in your sitemap.</strong></li>
<li>Use <strong>relative paths for ErrorDocument</strong> to enable verification.  Relative paths do not redirect. Read more about <a href="http://64.233.179.104/search?q=cache:GSmngU2mxrAJ:www.smart-it-consulting.com/article.htm?node=133&#038;page=101+google+sitemap+errordocument&#038;hl=en&#038;gl=us&#038;ct=clnk&#038;cd=1">sitemap verification</a>.</li>
<li><strong>Google Sitemaps will not work with subdomains.</strong></li>
</ul>
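<p>For reference, here&#8217;s what a minimal sitemap entry using that metadata looks like (the URL and values are made up):</p>

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/deep/hard-to-find-page.html</loc>
    <lastmod>2006-03-06</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```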
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/03/06/google-sitemap-faq.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google Sitemap Scripts</title>
		<link>http://seo4fun.com/blog/2006/03/06/google-sitemap-scripts.html</link>
		<comments>http://seo4fun.com/blog/2006/03/06/google-sitemap-scripts.html#comments</comments>
		<pubDate>Mon, 06 Mar 2006 12:34:58 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[Google Sitemaps]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/03/06/google-sitemap-scripts.html</guid>
		<description><![CDATA[I&#8217;m in the habit of writing own scripts when I can, and since I hear alot of positive things about Google Sitemaps, and I didn&#8217;t find a script I liked out there, I decided to write a script to crawl one of my domains.  Right now, it&#8217;s site-specific, since I&#8217;m excluding certain paths (e.g. [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m in the habit of writing my own scripts when I can, and since I hear a lot of positive things about Google Sitemaps, and I didn&#8217;t find a script I liked out there, I decided to write a script to crawl one of my domains.  Right now, it&#8217;s site-specific, since I&#8217;m excluding certain paths (e.g. AVI, RSS, #, and certain directories), but I should be able to modify it so I can use it for all of my domains. Once I add a UI, anyone should be able to use it to generate a sitemap.</p>
<p>One thing I overlooked is that the generated sitemap XML is static, so <strong>lastmod will always be outdated</strong>. An article suggested making the XML file dynamic: for example, hardcode  &lt;? echo $lastmod ?&gt; and re-evaluate $lastmod in the XML file header.<br />
P.S. So far, Google seems to like the dynamic sitemap I created. There&#8217;s one URL that&#8217;s broken; not sure where it&#8217;s coming from&#8230;since I deleted that link from the site, it could be from Google&#8217;s cache. Now I have no clue how to track down this broken link, since I never saw the error using Xenu.</p>
<p>By the way, To <strong>enable PHP parsing of XML files:</strong></p>
<ol>
<li>Use: <strong>AddType application/x-httpd-php .php .htm .html .xml</strong>. Note: you can do the same for .rss files and generate them dynamically.</li>
<li>Echo the XML declaration from within PHP, since its &lt;? ?&gt; delimiters would otherwise be parsed as a PHP open tag.</li>
</ol>
<p>One annoyance is XML with PHP parsing enabled won&#8217;t display correctly in NN or FF.</p>
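<p>Here&#8217;s a sketch of the dynamic-lastmod trick from the list above, once .xml files are parsed as PHP (the file paths and URL are examples):</p>

```php
<?php
// sitemap.xml, parsed as PHP thanks to the AddType line above.
// Echo the XML declaration so its <? isn't swallowed as a PHP open tag.
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
// Re-evaluate lastmod on every request from the page file's mtime.
$lastmod = date('Y-m-d', filemtime($_SERVER['DOCUMENT_ROOT'] . '/page.html'));
?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/page.html</loc>
    <lastmod><?php echo $lastmod; ?></lastmod>
  </url>
</urlset>
```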
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/03/06/google-sitemap-scripts.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>HTACCESS Code (General Purpose)</title>
		<link>http://seo4fun.com/blog/2006/03/06/htaccess-code-general-purpose.html</link>
		<comments>http://seo4fun.com/blog/2006/03/06/htaccess-code-general-purpose.html#comments</comments>
		<pubDate>Mon, 06 Mar 2006 11:58:44 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Mod Rewrite]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/03/06/htaccess-code-general-purpose/</guid>
		<description><![CDATA[This is basically what I use to manage dynamic URLs.

RewriteCond %{QUERY_STRING} ^(.+)$ [NC] RewriteRule ^(.*)$ - [F,L] Protects against bogus query strings attached to the end of a URL. This code is customized to issue a forbidden instead of a 404.
RewriteCond %{REQUEST_URI} ([a-z0-9]+)\.html$ [NC]
RewriteRule ^([a-z0-9]+)\.html /script.file?p_path=$1 [L] This allows dynamic xyz.html under a directory and [...]]]></description>
			<content:encoded><![CDATA[<p>This is basically what I use to manage dynamic URLs.</p>
<ul>
<li><strong>RewriteCond %{QUERY_STRING} ^(.+)$ [NC] RewriteRule ^(.*)$ - [F,L]</strong> Protects against bogus query strings attached to the end of a URL. This code is customized to issue a forbidden instead of a 404.</li>
<li>RewriteCond %{REQUEST_URI} ([a-z0-9]+)\.html$ [NC]<br />
RewriteRule ^([a-z0-9]+)\.html /script.file?p_path=$1 [L] This allows dynamic xyz.html under a directory and block any links to xyz.html%20, xyz.html^20, etc.</li>
<li>I can also cover multiple &#8220;root files&#8221; (urls that are visible to visitors and should not take query strings):</li>
</ul>
<blockquote><p>RewriteCond %{QUERY_STRING} ^(.+)$ [NC]<br />
RewriteCond %{REQUEST_URI} ^/dynamicpage1\.html [OR]<br />
RewriteCond %{REQUEST_URI} ^/dynamicpage2\.html<br />
RewriteRule ^(.*)$ - [F,L]</p></blockquote>
<p>(Note the [OR] flag: without it the two REQUEST_URI conditions are ANDed, so they can never both match and the rule never fires.)</p>
<ul>
<li><strong>RewriteRule ^([^/]+)$ http://www.domain.com/path/$1/ [R=301,L]</strong> 301 redirect for paths missing the trailing /. I use this in cases where urls end in / (as opposed to, say, .html). Just a rewrite may work, but use this if you’re worried about duplicates.</li>
<li><strong>RewriteRule ^([^/]+)/$ /encryptedfile04ha8fksdasd.html?query=$1 [L]</strong> This rewrites URL to the actual dynamic URL. The html file name is encrypted to make it difficult to link directly to the actual dynamic page.</li>
</ul>
<p>Additionally, in rare cases where I can&#8217;t catch bad query strings externally or I need to scan query strings for bad query values, I use this PHP 5+ code, where redirectfunction() is a custom function that will generate a 404 page:</p>
<blockquote><p>foreach ($_REQUEST as $key => $value) {<br />
// whitelist check: allow only expected characters (adjust the pattern to your own valid query values)<br />
if (!preg_match('/^[a-z0-9]+$/i', $value)) redirectfunction();<br />
}</p></blockquote>
<p>Last thing: in addition to encrypting dynamic page names, you might want to block them via robots.txt to prevent them from getting indexed. Of course, doing so can expose those urls to your competitors, so personally, I would just make sure no internal links point to the actual dynamic page.</p>
<p>More .htaccess / mod_rewrite links:</p>
<ul>
<li><a href="http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html">Apache Module mod_rewrite</a></li>
<li><a href="http://web-sniffer.net/">HTTP Request and Response Header</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/03/06/htaccess-code-general-purpose.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>META Tag Analyzer Check List</title>
		<link>http://seo4fun.com/blog/2006/03/06/meta-tag-analyzer-check-list.html</link>
		<comments>http://seo4fun.com/blog/2006/03/06/meta-tag-analyzer-check-list.html#comments</comments>
		<pubDate>Mon, 06 Mar 2006 11:41:02 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/03/06/meta-tag-analyzer-check-list/</guid>
		<description><![CDATA[General things I look for when I run a page through a meta tag analyzer.

Title/descriptions/keyword are unique ENOUGH for each page. To be on the safe side, build pages with completely different titles/META descriptions with as few repeating words as possible.
Source code starts off with unique H1 and P 
100 or less links on the [...]]]></description>
			<content:encoded><![CDATA[<p>General things I look for when I run a page through a meta tag analyzer.</p>
<ul>
<li>Title/description/keywords are unique ENOUGH for each page. To be on the safe side, <strong>build pages with completely different titles/META descriptions with as few repeating words as possible.</strong></li>
<li><strong>Source code starts off with unique H1 and P </strong></li>
<li><strong>100 or fewer links on the page</strong> - There&#8217;s nothing wrong with having more than 100 links on a page, but if you want to transfer as much PR as possible per link, it&#8217;s best to limit the number of links per page.  Of course, if you have keyword-rich outgoing links, this might balance out the PR lost to excessive linking.</li>
<li><strong>At least 20kb file size.</strong> (I’ve seen 700-byte pages indexed, but that’s not a whole lot of spider food, and a change in the filter may bump them into the supplementals.)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/03/06/meta-tag-analyzer-check-list.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Handling Erroneous Dynamic URLS</title>
		<link>http://seo4fun.com/blog/2006/03/06/handling-erroneous-dynamic-urls.html</link>
		<comments>http://seo4fun.com/blog/2006/03/06/handling-erroneous-dynamic-urls.html#comments</comments>
		<pubDate>Mon, 06 Mar 2006 11:23:47 +0000</pubDate>
		<dc:creator>Halfdeck</dc:creator>
		
		<category><![CDATA[Mod Rewrite]]></category>

		<guid isPermaLink="false">http://seo4fun.com/blog/2006/03/06/handling-erroneous-dynamic-urls/</guid>
		<description><![CDATA[When creating dynamic pages, make sure you handle weird urls so that malicious linking doesn’t lead to the indexing of non-existent urls. Use HTTP header checker to see server responses. Catch all invalid requests with a custom 404 page. What URLS to check? Try to let .htaccess handle as much of the 404s as possible [...]]]></description>
			<content:encoded><![CDATA[<p>When creating dynamic pages, make sure you <strong>handle weird urls</strong> so that malicious linking doesn’t lead to the indexing of non-existent urls. <strong>Use an HTTP header checker</strong> to see server responses. Catch all invalid requests with a custom 404 page. Which URLs to check? Try to let .htaccess handle as much of the 404s as possible; PHP should be reserved for checking query values. You don’t want a chain of 301s where invalid URLs return 301s.  <strong>If the bad urls already exist, I would replace the 404s below with 301s.</strong></p>
<ul>
<li>/index.html => 404</li>
<li>/index.htm => 404</li>
<li>/?junksdso => 404</li>
<li>// => 404</li>
<li>/index => 404</li>
<li>/%20 => 404 (One time a guy linked me with this and the url got indexed)</li>
<li>/scriptfile.html => 404 Direct access to the file that generates dynamic content should generate a 404. Better yet, rename the file to something difficult to crack. <strong>Don’t forget to delete unencrypted versions of files on your server.</strong></li>
<li><strong>404 any invalid query values.</strong> For example, if a script produces unique pages for values 1-5, a value of 6 should 404.</li>
<li>If you’re <strong>not using unencrypted files: </strong>
<ul>
<li><strong>/validurl.html?junk=query =>404</strong></li>
<li><strong>/validurl.html?junk= should also 404.</strong></li>
</ul>
</li>
<li><strong>/path 301=> /path/</strong> For urls that end in /, the version missing the trailing / must 301 to url/. Yahoo will drop the /, so I can’t have that URL 404.</li>
<li>If a page is not dynamic, make sure page.html?bogusquery=string => 404</li>
<li>Capitalization should not matter. <strong>Use [NC] in RewriteCond</strong> statements to handle this.</li>
<li><strong>urls with +, _ or - should be handled correctly.</strong></li>
</ul>
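<p>Here&#8217;s how a couple of the checks above might look in .htaccess. The paths are examples, and the rewrite target is a deliberately non-existent file, so Apache falls through to the 404 page:</p>

```apache
RewriteEngine On

# /index.html, /index.htm and /index => 404
RewriteRule ^index(\.html?)?$ /no-such-page.404 [L]

# A non-dynamic page must not take query strings: page.html?anything => 404
RewriteCond %{QUERY_STRING} ^(.+)$
RewriteRule ^page\.html$ /no-such-page.404 [L]
```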
<p><strong>Question:</strong> If I link to a url like this: index.html? (with nothing following the question mark) will Google index it?</p>
]]></content:encoded>
			<wfw:commentRss>http://seo4fun.com/blog/2006/03/06/handling-erroneous-dynamic-urls.html/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
