BrightonSEO: 5 Critical Questions Your Log Files Can Answer (September 2016)


Upload: mark-thomas

Post on 07-Jan-2017


TRANSCRIPT

Page 1: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

LOG FILE ANALYSIS

5 CRITICAL TECH SEO QUESTIONS YOUR LOGS CAN ANSWER

#BrightonSEO | @SearchMATH

Page 2: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

#BrightonSEO

As used by…

About Botify

Page 3: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Here’s the problem…

> Google doesn’t crawl every page of your website
>> If a page isn’t crawled, it won’t be indexed
>>> If a page isn’t indexed, it won’t make you money

Page 4: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

New Initiative Planning Process

• Identify Desired Outcomes and Objectives
• Information Gathering
• Action Planning
• Implementation and Review

This presentation will focus on the “Information Gathering” stage of the process.

Page 5: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

New Initiative Planning Process

• Identify Desired Outcomes and Objectives
• Information Gathering
• Action Planning
• Implementation and Review

Dawn Anderson’s slide-deck “BRINGING IN THE FAMILY DURING CRAWLING” is an insightful guide to help you identify crawl budget opportunities. Dawn also suggests powerful actions you should explore.

Page 6: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Log File 101

Page 7: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Hypertext Transfer Protocol (HTTP)

Client | Server

HTTP Request
    GET /index.html HTTP/1.1
    Host: www.exampleshop.com
    User-Agent: Mozilla 5.0

HTTP Response
    HTTP/1.1 200 OK
    Date: Mon, 11 Jul 2016 08:06:45 GMT
    Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
    Last-Modified: Wed, 04 Feb 2016 23:11:55 GMT
    Etag: "3f84f-1b9-3elcd16b"
    Accept-Ranges: bytes
    Content-Length: 458
    Connection: close
    Content-Type: text/html; charset=UTF-8

Fig 1: HTTP Client/Server Communication

This is a standard HTTP/1.1 exchange between Client (e.g. Browser or Googlebot) & your Server.

Page 8: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Server Log Files

Server

188.65.114.122 - - [19/Jul/2016:08:07:05 -0400] "GET /women/shoes/converse14579/ HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Fields: Client IP | Timestamp (date & time) | Method (GET / POST) | Request URI | HTTP status code | User-agent

Fig 2: Example Server Output

WHO’S REQUESTING? | WHEN? | HOW? | WHAT FILE? | SERVER RESPONSE

Server Logs are the SINGLE SOURCE OF TRUTH when it comes to seeing how search engine crawlers, such as Googlebot, assess your website.

Your web server keeps a record of every hit it receives, capturing the exchange shown on the previous slide. Your very own data treasure chest.
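To make those fields concrete, here is a minimal sketch of splitting one log line into its parts. It assumes your server writes the standard Apache "combined" log format; the slide's example omits the response-size field, so the sample line below adds an illustrative value of 458 bytes. The names `LOG_PATTERN` and `parse_line` are my own, not part of any tool.

```python
import re
from datetime import datetime

# Assumed layout (Apache combined format):
# host ident user [time] "method uri protocol" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Timestamps look like 19/Jul/2016:08:07:05 -0400
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


# Illustrative line based on the slide; the 458-byte size is an assumption.
sample = (
    '188.65.114.122 - - [19/Jul/2016:08:07:05 -0400] '
    '"GET /women/shoes/converse14579/ HTTP/1.1" 200 458 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

entry = parse_line(sample)
print(entry["ip"], entry["method"], entry["uri"], entry["status"])
```

If your server uses a custom log format, the pattern will need adjusting to match it.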

Page 9: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

“[Clean up your architecture because] we get lost crawling unnecessary URLs and we might not be able to crawl and index your new and updated content as quickly as we would otherwise… There are a number of crawlers you can use to crawl your website on your own, to run across your website.”
Google Webmaster Central office hours hangout, 16 Oct 2015

@JohnMu | Crawl your website with a THIRD-PARTY CRAWLER

@JohnMu | Conduct LOG FILE ANALYSIS

Page 10: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

How does Log File Analysis differ from Web Crawl Analysis?

Page 11: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Let’s consider how the information is collected…

Web Crawl: systematically fetch, retrieve, and validate the HTML on every page of your website to simulate Googlebot’s/Bingbot’s analysis of your pages.

[Figure: site hierarchy crawled level by level: Home → Category → Subcategory → Detail]

This is great for optimising your HTML code and helps you try to produce a best-in-class website.

Page 12: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

But that’s not how search engines operate, and crawling alone lacks the evidence to back up your strategy.

For example, Googlebot might enter through a popular category and crawl the same pages time after time. Search Console won’t tell you this and neither does simulating a crawl from your homepage.

So, you need to crawl your architecture and compare the data to Google’s activity (via your log files) to gain an insight into how you’ll get more of your money-making pages crawled and indexed.
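As a rough sketch of that comparison, the snippet below splits URLs into three groups using plain Python sets: pages found by both your crawler and Googlebot, pages in your structure that Googlebot never requested, and pages Googlebot requested that your crawler never found. The function name, the group labels, and the sample URLs are all hypothetical.

```python
def compare_crawl_to_logs(site_crawl_urls, googlebot_urls):
    """Split URLs into the three regions of a crawl-vs-logs comparison."""
    in_structure = set(site_crawl_urls)    # URLs your own crawler found
    seen_by_google = set(googlebot_urls)   # URLs Googlebot requested (from logs)
    return {
        "crawled_by_both": in_structure & seen_by_google,
        "not_crawled_by_google": in_structure - seen_by_google,
        "only_seen_in_logs": seen_by_google - in_structure,
    }


# Made-up data for illustration only.
site_crawl = ["/", "/women/", "/women/shoes/", "/men/"]
log_urls = ["/", "/women/", "/old-campaign/"]

segments = compare_crawl_to_logs(site_crawl, log_urls)
for name, urls in segments.items():
    print(name, sorted(urls))
```

The "not crawled by Google" group is where money-making pages may be missing out, and the "only seen in logs" group shows where Googlebot spends time outside the structure you intended.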

Page 13: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

What barriers do people face when trying to study this vital information?

• Access to Server Logs
• File Sizes
• Misplacing trust in Search Console
• Time required to process the data

But I don’t think you should be deterred and here’s why…

Page 14: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Accessing your logs is simpler than you think. Your organisation is probably already using them.

Common Log Analysis use cases for eCommerce organisations include:
>> Application Management
>> Access Management
>> Network Forensics
>> Compliance

Popular products used by Applications and Security teams at major Enterprise companies include: LogRhythm, Loggly, and Splunk.

Page 15: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Splunk (a log file storage and processing company): Market Cap $8.6bn, 11,000 Customers

Page 16: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

http://www.slideshare.net/Splunk/splunklive-london-john-lewis

This is a picture from a presentation I watched at SplunkLive in London 2016. John Lewis visualise their operational intelligence from log files. You can get your logs!!

Page 17: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

It’s true that the volume of data involved can make working with the files prohibitive.

For example, if a site receives 50,000 visitors a day browsing an average of 5 pages per session, that’s 250,000 log entries per day for the HTML: 7.5M entries per month.

Now add 10 assets requested from the server for each page: 75,000,000 asset lines in your Log Files per month, on top of the HTML entries.
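The arithmetic is easy to rerun with your own traffic figures. This sketch uses the slide's numbers and assumes a 30-day month and one session per visitor per day; the variable names are mine.

```python
visitors_per_day = 50_000
pages_per_session = 5
assets_per_page = 10
days_per_month = 30  # assumption: a 30-day month

html_entries_per_day = visitors_per_day * pages_per_session
html_entries_per_month = html_entries_per_day * days_per_month
asset_entries_per_month = html_entries_per_month * assets_per_page
total_entries_per_month = html_entries_per_month + asset_entries_per_month

print(f"HTML entries per day:    {html_entries_per_day:,}")
print(f"HTML entries per month:  {html_entries_per_month:,}")
print(f"Asset entries per month: {asset_entries_per_month:,}")
print(f"All entries per month:   {total_entries_per_month:,}")
```

Counting the HTML requests and the asset requests together gives 82.5M lines per month for this example.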

Page 18: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

SEOs regularly monitor and trend site architecture data (HTTP codes, etc.) in third-party apps. Search Console’s crawling and indexing charts can’t be scrutinised in the same way, yet that is exactly the data you should be scrutinising.

Page 19: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

So, how is engineering helping us overcome these barriers and expand our knowledge?

>>> Secure File Transfer Protocol (SFTP)
>>> Storing and trending Log Data thanks to cloud services
>>> Processing Automation (saving TIME)
>>> Diffing Log Data with Simulated Crawl Data for greater insights

Page 20: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Let’s move onto the questions I think you should be looking to answer.

Page 21: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

What are the typical questions SEOs try and answer with Log Analysis?

• Where do I have accessibility errors?
• Which pages are being spidered most frequently?
• Is spammer activity proving detrimental to performance?
• Which pages haven’t been crawled by search engines?
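Several of these questions come down to counting. The sketch below assumes your log lines are already parsed into dictionaries with `uri`, `status` and `agent` keys (a hypothetical shape) and that matching "Googlebot" in the user-agent is good enough for a first pass; user-agents can be spoofed, so treat the match as an assumption rather than proof.

```python
from collections import Counter


def summarise_bot_hits(entries, bot_token="Googlebot"):
    """Count bot requests per URL, and bot requests that returned 4xx/5xx."""
    hits = Counter()
    errors = Counter()
    for entry in entries:
        if bot_token not in entry["agent"]:
            continue  # ignore browsers and other bots
        hits[entry["uri"]] += 1
        if entry["status"] >= 400:
            errors[entry["uri"]] += 1
    return hits, errors


# Made-up parsed entries for illustration only.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
entries = [
    {"uri": "/women/shoes/", "status": 200, "agent": GOOGLEBOT},
    {"uri": "/women/shoes/", "status": 200, "agent": GOOGLEBOT},
    {"uri": "/old-page/", "status": 404, "agent": GOOGLEBOT},
    {"uri": "/men/boots/", "status": 200, "agent": "Mozilla/5.0 (Windows NT 10.0)"},
]
known_urls = {"/women/shoes/", "/old-page/", "/men/boots/"}

hits, errors = summarise_bot_hits(entries)
never_crawled = known_urls - set(hits)

print("Most crawled:", hits.most_common(1))
print("Error URLs:", dict(errors))
print("Never crawled:", never_crawled)
```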

And these are all very valid and helpful but I suggest looking at the next list too…

Page 22: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

# 5 Critical Questions / KPIs Score

1 What is my ‘Crawl Ratio’?

2 What percentage of my compliant pages (2xx & unique) will Google crawl each month?

3 How deep will Google crawl into my site architecture?

4 What does Google consider to be my Top, Middle and Long Tail pages?

5 What is my ‘Crawl Window’ score?

HOW MANY MORE PAGES NOW HAVE THE POTENTIAL TO MAKE US MONEY? (THANKS TO MY EFFORTS OVER THE PAST 30 DAYS?)

INDEX

Page 23: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Crawl Term | Definition | Score

Crawl Rate | requests per second Googlebot makes to your site when it is crawling it |

Crawl Budget | the maximum number of pages that Google crawls on a website |

Crawl Frequency | program determining which sites to crawl, how often, and how many pages to fetch from each site |

Crawl Rank | the frequency a page is crawled compared with the ranking position of that page |

Crawl Space | the totality of possible URLs for a website |

Crawl Ratio | the percentage of my website structure Google is crawling every 30 days |

Crawl Window | the percentage of the compliant (unique & 200) pages on my website Google usually crawls in a 14 day period |

I’ve mentioned a few terms you might not be familiar with so here’s a list of old friends with a couple of new additions.

Page 24: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Critical Question 1 – What is my ‘Crawl Ratio’?

Page 25: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Crawl Ratio: the percentage of my website structure Google is crawling every 30 days

Crawl Ratio = (Total Pages in the website structure crawled by Google in 30 days ÷ Total Pages in the website structure) × 100
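Here is a minimal sketch of that calculation, assuming you have two URL lists to hand: the pages in your website structure (from your own crawl) and the URLs Googlebot requested in the last 30 days (from your logs). The function name and the sample URLs are hypothetical.

```python
def crawl_ratio(structure_urls, googlebot_urls_30d):
    """Percentage of the website structure that Googlebot requested in 30 days."""
    structure = set(structure_urls)
    if not structure:
        raise ValueError("the website structure contains no URLs")
    # Only count Googlebot hits on pages that belong to the structure.
    crawled = structure & set(googlebot_urls_30d)
    return 100 * len(crawled) / len(structure)


# Made-up data for illustration only.
structure = ["/", "/women/", "/women/shoes/", "/men/"]
googlebot_30d = ["/", "/women/", "/men/", "/old-campaign/"]

ratio = crawl_ratio(structure, googlebot_30d)
print(f"Crawl Ratio: {ratio:.1f}%")
```

URLs that appear in the logs but not in your structure, such as `/old-campaign/` here, are deliberately left out of the numerator.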

Page 26: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Organic Growth Opportunities

Lifestyle Publisher | Business Equipment Retail | Real Estate Classified

The Venn diagram clearly illustrates the mismatch between the URLs you hope Google is looking at and the accurate picture from your server logs.

Page 27: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Critical Question 2 – What percentage of my compliant pages (200 & unique) will Google crawl each month?

Page 28: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

% of key pages crawled

% Potential = (Total Compliant Pages Crawled By Google in 30 days ÷ Total Compliant Pages in the website structure) × 100

Page 29: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Lifestyle Publisher: 76.4%

Business Equipment Retail: 92.2%

Real Estate Classified: 42%

These examples reflect just how varied Google’s crawling of compliant pages can be.

Organic Growth Measure

Page 30: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Critical Question 3 – how deep will Google crawl into my website architecture?

Page 31: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

What depths will Google plunge?

Lifestyle Publisher | Business Equipment Retail | Real Estate Classified

This chart indicates the correlation between the depth of your content and Google’s crawling activity
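To build a similar view for your own site, one possible approach is to group pages by click depth and work out the share Googlebot requested at each level. The sketch assumes your crawler reports a depth per URL (homepage = 0) and that you have the set of URLs Googlebot requested from your logs; the names and data are hypothetical.

```python
from collections import defaultdict


def crawl_rate_by_depth(page_depths, googlebot_urls):
    """Return {depth: percentage of pages at that depth requested by Googlebot}."""
    seen_by_google = set(googlebot_urls)
    totals = defaultdict(int)
    crawled = defaultdict(int)
    for url, depth in page_depths.items():
        totals[depth] += 1
        if url in seen_by_google:
            crawled[depth] += 1
    return {depth: 100 * crawled[depth] / totals[depth] for depth in sorted(totals)}


# Made-up data for illustration only.
page_depths = {
    "/": 0,
    "/a/": 1, "/b/": 1,
    "/a/x/": 2, "/a/y/": 2, "/b/z/": 2, "/b/w/": 2,
}
googlebot_urls = ["/", "/a/", "/a/x/"]

by_depth = crawl_rate_by_depth(page_depths, googlebot_urls)
for depth, pct in by_depth.items():
    print(f"Depth {depth}: {pct:.0f}% crawled")
```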

Page 32: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Lifestyle Publisher | Business Equipment Retail | Real Estate Classified

This chart indicates Google's crawl rate (URL crawled or not by any bot) by internal PageRank.

How can I more effectively use PageRank to increase visibility?

Page 33: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Critical Question 4 – what does Google consider to be my Top, Middle and Long Tail pages?

Page 34: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

This graph details visit frequency from Google search result pages for all URLs analysed by the crawler: how often URLs get organic visits from Google.

Lifestyle Publisher | Business Equipment Retail | Real Estate Classified

Page 35: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Then compare Organic Traffic with a measure of how often URLs are crawled by any Google bot. 

Increase your Middle Tail

Lifestyle Publisher | Business Equipment Retail | Real Estate Classified

Page 36: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Critical Question 5 – what is my ‘Crawl Window’?

Page 37: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

Crawl Window: the percentage of my compliant URLs Google usually crawls in a 14 day period*

When a change appears on the website, either voluntary or involuntary, understanding your Crawl Window value will help you know precisely how long it will take to identify a positive/negative impact.

*This is a simplified calculation of Botify’s Crawl Window metric.

Real Estate Classified: 25.5%

Business Equipment Retail: 80.8%

Lifestyle Publisher: 66.3%
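A rough sketch of the simplified definition above, not Botify's actual calculation: take your compliant URLs, look at Googlebot hits in the most recent 14 days, and report the share of compliant URLs that were requested at least once. The function name, the input shapes, and the dates are assumptions for illustration.

```python
from datetime import datetime, timedelta


def crawl_window(compliant_urls, bot_hits, window_end, days=14):
    """Percentage of compliant URLs Googlebot requested within the window.

    bot_hits is an iterable of (url, timestamp) pairs taken from the logs.
    """
    compliant = set(compliant_urls)
    if not compliant:
        raise ValueError("no compliant URLs supplied")
    window_start = window_end - timedelta(days=days)
    crawled = {
        url
        for url, timestamp in bot_hits
        if window_start <= timestamp <= window_end and url in compliant
    }
    return 100 * len(crawled) / len(compliant)


# Made-up data for illustration only.
compliant = ["/a", "/b", "/c", "/d"]
hits = [
    ("/a", datetime(2016, 7, 10)),
    ("/b", datetime(2016, 7, 18)),
    ("/a", datetime(2016, 7, 19)),
    ("/c", datetime(2016, 6, 20)),    # outside the 14-day window
    ("/zzz", datetime(2016, 7, 15)),  # not a compliant URL
]

score = crawl_window(compliant, hits, window_end=datetime(2016, 7, 19))
print(f"Crawl Window: {score:.1f}%")
```

Running this over consecutive windows gives you a trend, which is more useful than a single reading.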

Page 38: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

# 5 Critical Questions / KPIs Score

1 What is my ‘Crawl Ratio’?

2 What percentage of my compliant pages (2xx & unique) will Google crawl each month?

3 How deep will Google crawl into my site architecture?

4 What does Google consider to be my Top, Middle and Long Tail pages?

5 What is my ‘Crawl Window’ score?

HOW MANY MORE PAGES NOW HAVE THE POTENTIAL TO MAKE US MONEY? (THANKS TO MY EFFORTS OVER THE PAST 30 DAYS?)

INDEX

You might find this checklist helpful.

Page 39: BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016

THANK YOU!

Take a Free Trial via www.botify.com

#BrightonSEO | @SearchMATH