
Google Sitemaps Case Study

Eric Papczun
SES Chicago
Bulk Submit 2.0
December 5th, 2006

2

Building & Publishing Your Sitemap

• Get a complete and accurate list of URLs
   – If you have a small site, you can manually build the list
   – You will need some help with larger sites
   – Google has a list of tools available, including the Sitemap Generator, which will generate URLs from your access log files
   – Be careful not to add noise to the crawl by supplying multiple URLs for the same page

• Convert the file to the XML Sitemap protocol (see the sketch after this list)

• Pick your verification method
   – Add a META tag to the head of the home page
      • <META name="verify-v1" content="YTuWr8du6ftURLETtEP/qaCFXJbrfUu62IufL6Pa/mmI=" />
   – Upload an HTML file to the root directory
      • googlea4ee4a78316cab4e.html
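For reference, a minimal file in the joint sitemaps.org XML Sitemap protocol (version 0.9) looks roughly like the sketch below; the URL, date, and values are placeholders, not part of this case study:

   <?xml version="1.0" encoding="UTF-8"?>
   <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <!-- One <url> entry per canonical page; avoid listing duplicate URLs for the same page -->
     <url>
       <loc>http://www.example.com/</loc>
       <lastmod>2006-11-30</lastmod>
       <changefreq>weekly</changefreq>
       <priority>0.8</priority>
     </url>
   </urlset>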

3

Now that it’s Up

• Sitemaps usually picked up within 1-2 days

• The entire sitemap is crawled in 3-14 days

• The average time for a full crawl is 7 days

• Small sites with low PageRank will take longer
   – Refresh content regularly
   – Add external links if site is new

4

Sitemap Management Tips

• Have an optimized native sitemap
   – Link to it in your global footer

• Focus the crawler on the right content by excluding:
   – Redundant Content (printer-friendly pages)
   – Disembodied Content (like Flash objects)
   – Spammy Stuff

• Use the “preferred domain” tool to tell Google if you want www.domain.com or domain.com to appear in search results

• Include separate sitemaps for news and mobile content (see the index sketch after this list)
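One common way to manage separate sitemaps (for example, one each for the main site, news, and mobile content) is to list them in a sitemap index file and submit that. A sketch, with hypothetical file names:

   <?xml version="1.0" encoding="UTF-8"?>
   <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <!-- Each <sitemap> entry points to one of the site's separate sitemap files -->
     <sitemap>
       <loc>http://www.example.com/sitemap-main.xml</loc>
       <lastmod>2006-12-01</lastmod>
     </sitemap>
     <sitemap>
       <loc>http://www.example.com/sitemap-news.xml</loc>
     </sitemap>
     <sitemap>
       <loc>http://www.example.com/sitemap-mobile.xml</loc>
     </sitemap>
   </sitemapindex>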

5

What to Expect

• We’ve seen two effects from index optimization
   – The number of pages indexed goes up
      • Large specialty retailer with ugly, parameter-filled URLs
         – Indexation went from 61,000 URLs to 133,000 URLs
   – …Or the number of pages indexed goes down
      • Very large retailer with multiple URLs per page
         – Indexation went from 2,500,000 URLs to 520,000 URLs

• Both instances are successes

• Google Sitemaps is just a tool; use it to help you accomplish your objectives

6

Selecting URLs for More Frequent Crawls

• Using the crawl priority XML tag allows you to tell Google what pages are most important (see the sketch after this list)

• We use this tag to spotlight:
   – Frequently updated pages
      • Category and section pages
      • Recently optimized pages
   – New pages
      • New product & content pages
      • News releases
      • Promotions

• We have found that Google is responsive to this tag, crawling these URLs on subsequent crawls
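In the sitemap protocol, the <priority> tag takes a value from 0.0 to 1.0 and is a hint about a page's importance relative to other pages on your own site. A sketch with hypothetical URLs:

   <?xml version="1.0" encoding="UTF-8"?>
   <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <!-- Frequently updated category page: high priority, flagged for frequent crawls -->
     <url>
       <loc>http://www.example.com/category/widgets/</loc>
       <changefreq>daily</changefreq>
       <priority>0.9</priority>
     </url>
     <!-- New product page spotlighted for the next crawl -->
     <url>
       <loc>http://www.example.com/products/new-widget.html</loc>
       <priority>0.8</priority>
     </url>
     <!-- Older, rarely changed page left at a lower priority -->
     <url>
       <loc>http://www.example.com/archive/2004/press-release.html</loc>
       <changefreq>yearly</changefreq>
       <priority>0.3</priority>
     </url>
   </urlset>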

7

Handling Crawl Errors

• HTTP Errors

- syntax not understood

• 404 – Page not found

- Broken link or bad URL

• URL not followed

• Redirect & cookie errors

• Robots.txt restricted

• Timed out

• Unreachable robots.txt file

8

404 – URL Not Found Errors

Google could not find these pages

These will probably be the majority of your crawl errors

Problems tend to be:

– Typo in URL

– Broken site link

– Your server returned the error

9

URLs Restricted by Robots.txt

Make sure that these are pages you want hidden from searchers

In this example, one URL is a news release (great for searchers), while the others are Flash files (bad experiences on their own)

10

Unreachable URLs

Robots.txt file could not be reached by Google

Google assumes that if you don’t have a robots.txt file then you are OK with a full crawl of the site

In this example, the problem is the URL was redirected to a domain that could not be accessed by the crawler

A redirect-checking tool can be found at: http://www.internetofficer.com/redirect-check.html

11

Crawl Rate Report

Number of Pages Crawled per Day

Number of Kilobytes Downloaded per Day

Downloading Times

12

Pages Crawled per Day

On large dynamic sites, we typically see about 5% of the site crawled per visit

13

Kilobytes Downloaded per Day

Number of Kilobytes Downloaded per Day

14

Downloading Times

Time it takes (in ms) for Google to download your pages

Might help you spot performance issues

15

Advanced Image Search

Opt-in service to provide better metadata for images

16

Yahoo! Site Explorer

• Shows pages indexed

• Submit and track feeds for your sites (RSS, Atom and URL lists)

• Great tool to show internal and external links

• Like Google’s tool, this is constantly evolving

Thank You for Your Time!

Eric Papczun
SES Chicago
Bulk Submit 2.0
December 5th, 2006