Google Bot Herding, PageRank Sculpting and Manipulation


DESCRIPTION

Presentation by Stephan Spencer, Founder & President of Netconcepts, about Google Bot Herding, PageRank Sculpting and Manipulation.

TRANSCRIPT

Page 1: Google Bot Herding, PageRank Sculpting and Manipulation

© 2008 Stephan M Spencer Netconcepts www.netconcepts.com [email protected]

Bot Herding

presented by Stephan Spencer, Founder & President, Netconcepts

Page 2: Google Bot Herding, PageRank Sculpting and Manipulation

Duplicate Content Mitigation

Duplicate content is rampant on blogs.
Herd bots to the permalink URL, and lead in everywhere else (Archives by Date pages, Category pages, Tag pages, Home page, etc.) with a paraphrased “Optional Excerpt”
– Not just the first couple of paragraphs, i.e. the <!--more--> tag!
– Requires you to revise your Main Index Template theme file:

    // Show the full post on single posts/pages, or when no Optional Excerpt was written
    if (empty($post->post_excerpt) || is_single() || is_page()) {
        the_content();
    } else {
        // Everywhere else, show the excerpt plus a link to the permalink
        the_excerpt();
        echo "<a href='";
        the_permalink();
        echo "' rel='nofollow'>Continue reading &raquo;</a>";
    }

Page 3: Google Bot Herding, PageRank Sculpting and Manipulation

Duplicate Content Mitigation

Include sig line (& headshot photo!) at bottom of post/article.
Link to original article/post permalink URL!
– http://www.naturalsearchblog.com/archives/2008/06/03/syndicating-your-articles/
– http://www.businessblogconsulting.com/2008/05/brand-yourself-with-photo-sig-line

Page 4: Google Bot Herding, PageRank Sculpting and Manipulation

Duplicate Content Mitigation

On ecommerce sites, dup content is also rampant:
– Manufacturer-provided product descriptions, inconsistent order of query string parameters, “guided navigation”, pagination within categories, tracking parameters
Selectively append tracking codes for humans w/ “white hat cloaking” or use JavaScript to append the codes
– REI.com used to append a "vcat" parameter on all brand links on their Shop By Brand page (see http://web.archive.org/web/20060823085548/www.rei.com/rei/sales_and_events/brands.html)
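
Two of the duplicate-content sources listed on this slide (inconsistent parameter order and tracking parameters) can be collapsed by normalizing URLs before comparing or linking them. The following is a minimal Python sketch, not something from the presentation: the function name and the list of tracking parameters are assumptions for illustration ("vcat" is the REI parameter mentioned on the slide).

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of parameters that only track visitors and never change the page.
TRACKING_PARAMS = {"vcat", "utm_source", "utm_medium", "utm_campaign"}

def canonical_url(url: str) -> str:
    """Drop tracking parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    params.sort()  # a fixed parameter order removes the "inconsistent order" duplicates
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))
```

With this sketch, `http://www.example.com/shop?size=9&brand=acme&vcat=REI_SSHP` and `http://www.example.com/shop?brand=acme&size=9` normalize to the same string.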

Page 5: Google Bot Herding, PageRank Sculpting and Manipulation

Pagination

Pagination not only creates many pages that share the same keyword theme; in very large categories with thousands of products, it also results in hundreds of pages of product listings not getting crawled. The result is lower product page indexation.
Herd bots through keyword-rich subcat links or “View All” link or both?
How to display page number links?
Optimal # of products to display/link per page?
Test!

Page 6: Google Bot Herding, PageRank Sculpting and Manipulation

PageRank Leakage?

If you’re using Robots.txt Disallow, you’re probably leaking PageRank
Robots.txt Disallow & Meta Robots Noindex both accumulate and pass PageRank
– Meta Noindex tag on a Master sitemap page will de-index the page but still pass PageRank to linked sub-sitemap pages
Meta Robots Nofollow blocks the flow of PageRank
– http://www.stonetemple.com/articles/interview-matt-cutts.shtml

Page 7: Google Bot Herding, PageRank Sculpting and Manipulation

Rewriting Spider-Unfriendly URLs

3 approaches:
1) Use a “URL rewriting” server module / plugin, such as mod_rewrite for Apache or ISAPI_Rewrite for IIS Server
2) Recode your scripts to extract variables out of the “path_info” part of the URL instead of the “query_string”
3) Or, if IT department involvement must be minimized, use a proxy server based solution (e.g. Netconcepts' GravityStream)
– With (1) and (2), replace all occurrences of your old URLs in links on your site with your new search-friendly URLs. 301 redirect the old to new URLs too, so no link juice is lost.
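
Approach (2) can be pictured with a small Python sketch. It assumes a hypothetical URL layout of alternating name/value path segments (for example `/category/207/product/42`); the slide does not prescribe any particular layout, and the function name is made up for illustration.

```python
def parse_path_info(path_info: str) -> dict:
    """Read variables from the path instead of the query string.

    Assumed layout: /name1/value1/name2/value2/...
    """
    segments = [segment for segment in path_info.split("/") if segment]
    if len(segments) % 2 != 0:
        raise ValueError(f"expected name/value pairs, got {path_info!r}")
    # Even positions are names, odd positions are their values.
    return dict(zip(segments[0::2], segments[1::2]))
```

A script that used to read `?category=207&product=42` would then call `parse_path_info("/category/207/product/42")` and get the same two variables.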

Page 8: Google Bot Herding, PageRank Sculpting and Manipulation

mod_rewrite – the Foundation for URL Rewriting, Remapping & Redirecting

Works with Apache and IBM HTTP Server
Place “rules” within .htaccess or your Apache config file (e.g. httpd.conf, sites_conf/…)
– RewriteEngine on
– RewriteBase /
– RewriteRule ^products/([0-9]+)/?$ /get_product.php?id=$1 [L]
– RewriteRule ^([^/]+)/([^/]+)\.htm$ /webapp/wcs/stores/servlet/ProductDisplay?storeId=10001&catalogId=10001&langId=-1&categoryID=$1&productID=$2 [QSA,P,L]
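
To see what the first rule on this slide does to a request path, the pattern can be tried out in Python. This is only an approximation of mod_rewrite for experimenting with the regular expression (Python writes the backreference `$1` as `\1`), and the `rewrite` helper is a name invented here.

```python
import re

# Pattern and target taken from the first RewriteRule on the slide.
PRODUCT_RULE = re.compile(r"^products/([0-9]+)/?$")

def rewrite(path: str) -> str:
    """Return the internal URL for a friendly path, or the path unchanged if the rule does not match."""
    return PRODUCT_RULE.sub(r"/get_product.php?id=\1", path)
```

Paths such as `products/123` and `products/123/` both map to `/get_product.php?id=123`, while `products/abc` is left alone because `[0-9]+` only accepts digits.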

Page 9: Google Bot Herding, PageRank Sculpting and Manipulation

Regular Expressions

The magic of regular expressions / pattern matching
– * means 0 or more of the immediately preceding character
– + means 1 or more of the immediately preceding character
– ? means 0 or 1 occurrence of the immediately preceding char
– ^ means the beginning of the string, $ means the end of it
– . means any character (i.e. wildcard)
– \ “escapes” the character that follows, e.g. \. means dot
– [ ] is for character ranges, e.g. [A-Za-z]
– ^ inside [] brackets means “not”, e.g. [^/]
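
Each of the metacharacters above can be checked interactively. The sketch below uses Python's `re` module, whose syntax agrees with mod_rewrite's for these basic constructs; the sample strings are made up for illustration.

```python
import re

checks = {
    "star":     bool(re.fullmatch(r"ab*", "a")),            # * allows zero "b"s
    "plus":     bool(re.fullmatch(r"ab+", "a")),            # + needs at least one "b"
    "optional": bool(re.fullmatch(r"colou?r", "color")),    # ? makes the "u" optional
    "anchors":  bool(re.search(r"^cat$", "concat")),        # ^ and $ pin both ends
    "dot":      bool(re.fullmatch(r"c.t", "c-t")),          # . matches any one character
    "escape":   bool(re.fullmatch(r"a\.htm", "aXhtm")),     # \. matches only a literal dot
    "range":    bool(re.fullmatch(r"[A-Za-z]+", "Spencer")),  # letters only
    "negated":  re.match(r"[^/]+", "shoes/red").group(0),   # everything up to the first slash
}
```

Here `star`, `optional`, `dot` and `range` come out true, `plus`, `anchors` and `escape` come out false, and `negated` holds `"shoes"`.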

Page 10: Google Bot Herding, PageRank Sculpting and Manipulation

Regular Expressions

– () puts whatever is wrapped within it into memory
– Access what’s in memory with $1 (what’s in first set of parens), $2 (what’s in second set of parens), and so on
Gotchas to beware of:
– “Greedy” expressions. Use [^ instead of .*
– .* can match on nothing. Use .+ instead
– Unintentional substring matches because ^ or $ wasn’t specified
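
The three gotchas can be reproduced in a few lines. This is a Python sketch with invented sample paths; the behavior shown (greedy matching, empty matches, unanchored substring matches) is the same in the regular expressions mod_rewrite uses.

```python
import re

url = "shoes/red/42.htm"

# Gotcha 1: .* is greedy and runs to the LAST slash; [^/]+ stops at the first one.
greedy = re.match(r"(.*)/", url).group(1)        # "shoes/red"
bounded = re.match(r"([^/]+)/", url).group(1)    # "shoes"

# Gotcha 2: .* happily matches nothing at all; .+ insists on at least one character.
empty_accepted = bool(re.fullmatch(r"(.*)\.htm", ".htm"))   # True
empty_rejected = bool(re.fullmatch(r"(.+)\.htm", ".htm"))   # False

# Gotcha 3: without ^ and $ the pattern matches in the middle of a longer path.
unanchored = bool(re.search(r"products/[0-9]+", "old-products/123/reviews"))    # True
anchored = bool(re.search(r"^products/[0-9]+$", "old-products/123/reviews"))    # False
```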

Page 11: Google Bot Herding, PageRank Sculpting and Manipulation

mod_rewrite Specifics

Proxy page using [P] flag
– RewriteRule /blah\.html$ http://www.google.com/ [P]
[QSA] flag is for when you don’t want query string params dropped (like when you want a tracking param preserved)
[L] flag saves on server processing
Got a huge pile of rewrites? Use RewriteMap and have a lookup table as a text file
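
The lookup-table idea behind RewriteMap can be sketched in Python as follows. This only mimics the concept of a plain-text key/value map (one "key value" pair per line, `#` lines as comments); the sample paths and function names are invented here, and the sketch is not a substitute for Apache's own map handling.

```python
SAMPLE_MAP = """\
# old path        new path
/old_url.htm      /new_url.htm
/summer-sale.htm  /sale/
"""

def load_rewrite_map(text: str) -> dict:
    """Parse 'key value' lines into a dict, skipping blank lines and # comments."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # ignore malformed lines that have a key but no value
        table[parts[0]] = parts[1]
    return table

def lookup(table: dict, path: str, default=None):
    """Return the mapped target for a path, or the default when there is no entry."""
    return table.get(path, default)
```

One map file then replaces a long list of individual rules: adding a redirect means adding a line of text rather than another regular expression.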

Page 12: Google Bot Herding, PageRank Sculpting and Manipulation

IIS? ISAPI_Rewrite!

What if your site is running Microsoft IIS Server?
ISAPI_Rewrite plugin! Not that different from mod_rewrite
In httpd.ini:
– [ISAPI_Rewrite]
  RewriteRule ^/category/([0-9]+)\.htm$ /index.asp?PageAction=VIEWCATS&Category=$1 [L]
– Lets a URL like http://www.example.com/index.asp?PageAction=VIEWCATS&Category=207 be served as something like http://www.example.com/category/207.htm

Page 13: Google Bot Herding, PageRank Sculpting and Manipulation

Implementing 301 Redirects Using Redirect Directives

In .htaccess (or httpd.conf), you can redirect individual URLs, the contents of directories, entire domains…:
– Redirect 301 /old_url.htm http://www.example.com/new_url.htm
– Redirect 301 /old_dir/ http://www.example.com/new_dir/
– Redirect 301 / http://www.example.com/
Pattern matching can be done with RedirectMatch 301
– RedirectMatch 301 ^/(.+)/index\.html$ http://www.example.com/$1/
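
The two styles of redirect on this slide behave differently: a prefix redirect appends whatever follows the matched prefix to the target, while a pattern redirect substitutes captured groups. The Python sketch below approximates both for experimentation; the function names are invented, and real Apache matching has details (such as path-segment boundaries) that this does not model.

```python
import re

def prefix_redirect(path: str, old_prefix: str, new_base: str):
    """Approximate 'Redirect 301 old_prefix new_base': the rest of the path is appended."""
    if not path.startswith(old_prefix):
        return None
    return 301, new_base + path[len(old_prefix):]

# Pattern taken from the RedirectMatch example on the slide.
INDEX_RULE = re.compile(r"^/(.+)/index\.html$")

def index_redirect(path: str):
    """Approximate the RedirectMatch rule: /dir/index.html goes to /dir/."""
    match = INDEX_RULE.match(path)
    if match is None:
        return None
    return 301, f"http://www.example.com/{match.group(1)}/"
```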

Page 14: Google Bot Herding, PageRank Sculpting and Manipulation

Implementing 301 Redirects Using Rewrite Rules

Or use a rewrite rule with the [R=301] flag
– RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
– RewriteRule ^(.*)$ http://www.example.com/$1 [L,QSA,R=301]
[NC] flag makes the rewrite condition case-insensitive
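
The logic of this host-canonicalization rule, written out as a Python sketch: any request whose host is not the canonical one gets a 301 to the same path and query string on `www.example.com`. The function name and signature are assumptions for illustration; the case-insensitive comparison plays the role of the [NC] flag.

```python
CANONICAL_HOST = "www.example.com"

def canonical_redirect(host: str, path: str, query: str = ""):
    """Return (301, url) when the host is not canonical, else None."""
    if host.lower() == CANONICAL_HOST:   # case-insensitive, like [NC]
        return None
    url = f"http://{CANONICAL_HOST}/{path.lstrip('/')}"
    if query:
        url += "?" + query               # keep the query string, like [QSA]
    return 301, url
```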

Page 15: Google Bot Herding, PageRank Sculpting and Manipulation

Conditional Redirects

Conditional 301 for bots – great for capturing the link juice from inbound affiliate links
Only works if you manage your own affiliate program
Most are outsourced and 302 (e.g. C.J.)
By outsourcing your affiliate marketing, none of your deep affiliate links are counting
If Amazon’s doing it, why can’t you?
– (Credit to Brian Klais for hypothesizing Amazon was doing this)
– http://tinyurl.com/5ubc28

Page 16: Google Bot Herding, PageRank Sculpting and Manipulation

Status Code 200 for humans

Page 17: Google Bot Herding, PageRank Sculpting and Manipulation

301 for all bots. Muahaha!!

Page 18: Google Bot Herding, PageRank Sculpting and Manipulation

Implementing Conditional Redirects Using Rewrite Rules

Selectively redirect bots that request URLs with session IDs to the URL sans session ID:
– RewriteCond %{QUERY_STRING} PHPSESSID
  RewriteCond %{HTTP_USER_AGENT} Googlebot.* [OR]
  RewriteCond %{HTTP_USER_AGENT} ^msnbot.* [OR]
  RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
  RewriteCond %{HTTP_USER_AGENT} Ask\ Jeeves
  # The trailing ? drops the query string (session ID included); without it the redirect would loop
  RewriteRule ^/(.*)$ /$1? [R=301,L]
Utilize browscap.ini instead of having to keep up with each spider’s name and version changes
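
The decision these conditions encode can be sketched in Python: redirect only when the request carries a session ID and the user agent looks like one of the listed spiders. The function name is invented, and one design choice here goes beyond the slide: the sketch removes only the `PHPSESSID` parameter and keeps any other query parameters.

```python
import re
from urllib.parse import parse_qsl, urlencode

# User-agent fragments taken from the RewriteCond lines on the slide.
BOT_PATTERN = re.compile(r"Googlebot|^msnbot|Slurp|Ask Jeeves")

def bot_redirect(path: str, query: str, user_agent: str):
    """Return (301, url) for a listed bot requesting a session-ID URL, else None."""
    params = parse_qsl(query, keep_blank_values=True)
    if not any(key == "PHPSESSID" for key, _ in params):
        return None                      # no session ID, nothing to clean up
    if not BOT_PATTERN.search(user_agent):
        return None                      # humans keep their session
    kept = urlencode([(key, value) for key, value in params if key != "PHPSESSID"])
    return 301, path + ("?" + kept if kept else "")
```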

Page 19: Google Bot Herding, PageRank Sculpting and Manipulation

Error Pages

Page 20: Google Bot Herding, PageRank Sculpting and Manipulation

Error Pages

Traditional approach is to serve up a 404, which drops that error page with the obsolete or wrong URL out of the search indexes. This squanders the link juice to that page.
But what if you return a 200 status code instead, so that the spiders follow the links? Then include a meta robots noindex so the error page itself doesn’t get indexed.
Or do a 301 redirect to something valuable (e.g. your home page) and dynamically include a small error notice
(Credit to Francois Planque for this clever approach.)
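
Both alternatives to a plain 404 can be sketched as small Python functions that return the status and payload a server would send. Everything named here is hypothetical: the function names, the page markup, and the `notfound` query parameter that the home page would read to show its error notice.

```python
from html import escape
from urllib.parse import quote

def soft_error_response(requested_path: str):
    """Option 1: answer 200 so spiders follow the links, but keep the page out of the index."""
    body = (
        "<html><head>"
        '<meta name="robots" content="noindex, follow">'
        "<title>Page not found</title></head><body>"
        f"<p>Sorry, {escape(requested_path)} is no longer here.</p>"
        '<p><a href="/">Home</a></p>'
        "</body></html>"
    )
    return 200, body

def redirecting_error_response(requested_path: str):
    """Option 2: 301 to the home page, passing along what was requested for the error notice."""
    location = "/?notfound=" + quote(requested_path, safe="")
    return 301, {"Location": location}
```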

Page 21: Google Bot Herding, PageRank Sculpting and Manipulation

URL Stability

An annually recurring feature, like a Holiday Gift Buying Guide, should have a stable, date-unspecified URL
– No need for any 301s
– When the current edition is to be retired and replaced with a new edition, assign a new URL to the archived edition
Otherwise link juice earned over time is not carried over to future years’ editions

Page 22: Google Bot Herding, PageRank Sculpting and Manipulation

URL Testing

URL affects searcher clickthrough rates
Short URLs get clicked on 2X as often as long URLs
(Source: MarketingSherpa, used with permission)

Page 23: Google Bot Herding, PageRank Sculpting and Manipulation

URL Testing

Further, long URLs appear to act as a deterrent to clicking, drawing attention away from their own listing and instead directing it to the listing below, which then gets clicked 2.5x more frequently.
– http://searchengineland.com/080515-084124.php
Don’t be complacent with search-friendly URLs. Test and optimize.
Make iterative improvements to URLs, but don’t lose link juice to previous URLs. 301 previous URLs to latest. No chains of 301s.
WordPress handles 301s automatically when renaming post slugs
Mass editing URLs (post slugs) in WordPress – announcement tomorrow in Give It Up session

Page 24: Google Bot Herding, PageRank Sculpting and Manipulation

Yank Competitor’s Grouped Results from Google page 1 SERPs

Knock out your competitor’s second indented (grouped) listing by directing link juice to other non-competitive listings (e.g. on page 2 SERPs, or directly below indented result’s true position)
First, find the true position of their indented result by appending &num=9 to the URL and see if the indented listing drops off. If not, append &num=8. Rinse and repeat until the indented listing falls away.
Indented listing is more susceptible the worse its true position.

Page 25: Google Bot Herding, PageRank Sculpting and Manipulation

This isn’t really #3

Page 26: Google Bot Herding, PageRank Sculpting and Manipulation

Nope, not yet

Page 27: Google Bot Herding, PageRank Sculpting and Manipulation

Gone! Its true position was #9

Page 28: Google Bot Herding, PageRank Sculpting and Manipulation

SEO the title of #12 to bump it up to page 1 – it will be grouped to #2. Then link to #11 and bump it up to page 1 to knock #4 to page 2

Page 29: Google Bot Herding, PageRank Sculpting and Manipulation

More Things I Wish I Had Time to Cover

Robots.txt gotchas
Webmaster Central tools (www vs no www, crawl rate, robots.txt builder, Sitemaps, etc.)
Yahoo's Dynamic URLs tab in Site Explorer
<div class="robots-nocontent">
If-Modified-Since
Status codes 404, 401, 500 etc.
PageRank transfer from PDFs, RSS feeds, Word docs etc.
Diagnostic tools (e.g. livehttpheaders, User Agent Switcher)

Page 30: Google Bot Herding, PageRank Sculpting and Manipulation

Thanks!

This PowerPoint can be downloaded from www.netconcepts.com/learn/bot-herding.ppt

For a 180-minute screencast (including 90 minutes of Q&A) on SEO for large dynamic websites (taught by Chris Smith and me), including transcripts, email [email protected]

Questions after the show? Email me at [email protected]