searchlove london 2016 | dom woodman | how to get insight from your logs

Post on 07-Jan-2017


TRANSCRIPT

2009

Dominic Woodman
double back on this is so important - and you can do this
Dominic Woodman
focus here on those two things. being able to do it and why you should do it
Dominic Woodman
https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-notes-september-9th-2016/
Dominic Woodman
graph
Dominic Woodman
show disproportionate
Dominic Woodman
there's more detail here.
Dominic Woodman
too long
Dominic Woodman
mention mess
Dominic Woodman
raise hand and keep raised
Dominic Woodman
As a messy person, I can see that this is really just efficiency. The ironing board is next to the suitcase.
Dominic Woodman
I want you to raise your hand if you couldn't stand having the room you're seeing in your house. If you'd have to go over and do something about it. Wouldn't it be wonderful if any rooms in our houses looked that good?
Dominic Woodman
This is more likely. We're trying.
Dominic Woodman
one word
Dominic Woodman
move powerful to top possibly remove
Dominic Woodman
emphasise why you do want it
Dominic Woodman
possibly cut all this content
Dominic Woodman
time poor
Dominic Woodman
angle of structured business question - or it's easy to fish
Dominic Woodman
little work shows up interesting insights
Dominic Woodman
try putting on other data
Dominic Woodman
show all 5
Dominic Woodman
lost it
Dominic Woodman
double-check DeepCrawl can't do this
Dominic Woodman
gifs
Dominic Woodman
shows table
Dominic Woodman
don't talk about the concept of a query language
Dominic Woodman
remove
Dominic Woodman
move this to after BQ
Dominic Woodman
kill sentence explanation
Dominic Woodman
too many bullets
Dominic Woodman
windows has stopped working
Dominic Woodman
busy not lazy
Dominic Woodman
possibly rotate background
Dominic Woodman
remove this slide
Dominic Woodman
drop
Dominic Woodman
you might be missing logs
Dominic Woodman
site health
Dominic Woodman
should just be resources
Dominic Woodman
bible example
Dominic Woodman
15 words per line
Dominic Woodman
8700 bibles
Dominic Woodman
redo this section to mention possible use cases where you might use other things
Dominic Woodman
Move to after this section on tools
Dominic Woodman
Change of plan: move to beginning as to why query languages are great.
Dominic Woodman
Turn these into tables ticking off the other points
Dominic Woodman
more pace change
Dominic Woodman
Let's travel back in time to 2009.

God it’s bad.

Dominic Woodman
And I had just got this truly terrible haircut. I say haircut, but it really was the lack of it that was so shocking.

-$1.5 Billion

Dominic Woodman
larger pause
Dominic Woodman
there is a disconnect between what people say and how they behave - particularly if you're asking

Why hasn’t Google seen the changes on my page?

How should I prioritise errors in Search Console?

Are my canonicals being respected?

Does Google think this page is important?

Dominic Woodman
acknowledge we'll explain this

What can you do with logs?

PART 1: THE WHY

Getting logs

Analysing Logs

Processing Logs

PART 2: THE HOW

Dominic Woodman
sell this as hard as possible - this is the biggest possible opportunity

What is a log?

Dominic Woodman
watch out for mentions of log files
Dominic Woodman
says log files too much

What does a log look like?

123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

IP Address

What does a log look like?

123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Timestamp

What does a log look like?

123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Request type

Dominic Woodman
remove this explanation

What does a log look like?

123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Homepage

What does a log look like?

123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Protocol

Dominic Woodman
possibility of things
Dominic Woodman
you don't need to know it all now; hammer in how easy it is
Dominic Woodman
factually incorrect

What does a log look like?

123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Status Code

What does a log look like?

123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Size of the page (in bytes)

What does a log look like?

123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

User Agent
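Each field the slides label can be pulled out programmatically. A minimal sketch in Python, assuming the Combined Log Format shown above (the field names are my own, not from the talk):

```python
import re

# One named group per field the slides walk through.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] '
        '"GET /my_homepage HTTP/1.1" 200 2262 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

hit = LOG_PATTERN.match(line)
print(hit.group('ip'))      # 123.65.150.10
print(hit.group('path'))    # /my_homepage
print(hit.group('status'))  # 200
```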

What can you do with logs?

PART 1: THE WHY

Getting logs

Analysing Logs

Processing Logs

PART 2: THE HOW

5 things

1 Diagnose crawling & indexation issues


Dominic Woodman
throwaway comments
Dominic Woodman
how many of you

Five folders Googlebot crawled the most (number of requests)

% of organic sessions vs % of crawl budget

2 Prioritisation


example.com/article

Dominic Woodman
don't mention it's an example - obvious

Prioritizing

1

Full

Print

example.com/article/full

example.com/article/print

Prioritizing

2

example.com/article/pdf

Prioritizing

3

Dominic Woodman
fuzzy

Prioritizing

1

Full

Print

3 Spot bugs & view site health


Delayed errors with a limit of 1000

4 How important does Google think parts of your site are?


My SEO was as bad as my design

Dominic Woodman
but my hair was better
Dominic Woodman
zoom in on hair, cheap laugh

But at least my hair was better

teflsearch.com

teflsearch.com/job-results

teflsearch.com/job-results/country/china

teflsearch.com/job-advert3455

Average number of times Googlebot crawled a template

1. teflsearch.com

2. teflsearch.com/job-results

3. teflsearch.com/job-results/country/china

4. teflsearch.com/job-advert3455


teflsearch.com/job-results

Average number of times Googlebot crawled a template

35%

Dominic Woodman
label over 40% - perhaps extra slide
Dominic Woodman
new graph
Dominic Woodman
reverse graph order

5 How fresh does it think your content is?


Dominic Woodman
show more screen shots

bit.ly/moz-fresh

Average number of times a page template is crawled by Googlebot

Dominic Woodman
more detail on this
Dominic Woodman
more emphasis on this
Dominic Woodman
dotted red line
Dominic Woodman
make point of results
Dominic Woodman
clarity problem

● Improve our internal linking
● Build trust with last modified date in sitemap


What can you do with logs?

PART 1: THE WHY

Getting logs

Analysing Logs

Processing Logs

PART 2: THE HOW

Talk to a developer and ask for information

Are all the logs in one place?

Hi {x},

I'm {x} from {y} and we've been asked to do some log analysis to understand better how Google is behaving on the website, and I was hoping you could help with some questions about the log set-up (as well as with getting the logs!).

What we'd ideally like is 3-6 months of historical logs for the website. Our goal is to look at all the different pages search engines are crawling on our website, discover where they're spending their time, the status code errors they're finding etc. There are also some things that are really helpful for us to know when getting logs.

Do the logs have any personal information in them?
We're only concerned with the various search crawler bots like Google and Bing; we don't need any logs from users, so any logs with emails, telephone numbers etc. can be removed.

Do you have any sort of caching which would create separate sets of logs?
Is there anything like Varnish running on the server, or a CDN which might create logs in a different location to the rest of your server? If so, we will need those logs as well as the ones from the server. (Although we're only concerned about a CDN if it's caching pages or serving from the same hostname; if you're just using Cloudflare, for example, to cache external images, then we don't need it.)

Are there any sub-parts of your site which log to a different place?
Have you got anything like an embedded Wordpress blog which logs to a different location? If so, we'll need those logs as well.

Do you log hostname?
It's really useful for us to be able to see the hostname in the logs. By default, a lot of common server logging set-ups don't log hostname, so if it's not turned on, it would be very useful to have it turned on now for any future analysis.

Is there anything else we should know?

Best,
{x}

Email for a developer

So we might have something that looks like this

What can you do with logs?

PART 1: THE WHY

Getting logs

Analysing Logs

Processing Logs

PART 2: THE HOW

How should we analyse our logs?

Dominic Woodman
possibly hammered too much

BigQuery

Dominic Woodman
also say why

BigQuery

Google’s online database for data analysis.

1. Ask powerful questions
2. Repeatable
3. Scaleable
4. Combine with crawl data
5. Easy to set-up
6. Easy to learn

What do we want from analysing our logs?

Dominic Woodman
quote how much it is
Dominic Woodman
more pause
Dominic Woodman
emphasise that once you've written it you can copy-paste
Dominic Woodman
change gif
Dominic Woodman
use same book
Dominic Woodman
practice transition into this

9,000,000 rows of data for 2 months.

400 - 800 queries

What can you do with logs?

PART 1: THE WHY

Getting logs

Analysing Logs

Processing Logs

PART 2: THE HOW

Format the logs so we can import them into BigQuery

Separate the Googlebot logs from all the other logs

Screaming Frog Log Analyser

Code something

Screaming Frog Log Analyser

Dominic Woodman
remove the other one
Dominic Woodman
change to video slide

Code something

bit.ly/logs-code
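If you go the "code something" route, the first processing pass is just filtering on the user agent string. A rough sketch with invented log lines; note that user agents can be spoofed, so real analysis should also verify the IPs with a reverse DNS lookup resolving to googlebot.com or google.com:

```python
def is_googlebot(log_line):
    """Cheap first pass: keep lines whose user agent claims to be Googlebot."""
    return 'Googlebot' in log_line

# Invented example lines: one Googlebot hit, one ordinary browser hit.
lines = [
    '66.249.66.1 - - [23/Aug/2010:03:50:59 +0000] "GET /a HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.1 - - [23/Aug/2010:03:51:02 +0000] "GET /a HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

googlebot_hits = [line for line in lines if is_googlebot(line)]
print(len(googlebot_hits))  # 1
```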

What can you do with logs?

PART 1: THE WHY

Getting logs

Analysing Logs

Processing Logs

PART 2: THE HOW

Our data in BQ

We make sure we got what we wanted

THE QUESTION: What is the total number of requests Googlebot makes each day to our site?

Our first SQL query

SELECT timestamp
FROM [mydata.log_analysis]

Our first SQL query

SELECT DATE(timestamp)
FROM [mydata.log_analysis]

Our first SQL query

SELECT DATE(timestamp) as date
FROM [mydata.log_analysis]

Our first SQL query

SELECT DATE(timestamp) as date, count(*)
FROM [mydata.log_analysis]

Our first SQL query

SELECT DATE(timestamp) as date, count(*)
FROM [mydata.log_analysis]
GROUP BY date

Our first SQL query

SELECT DATE(timestamp) as date, count(*) as number_of_requests
FROM [mydata.log_analysis]
GROUP BY date
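The finished query can be sanity-checked locally before pointing it at BigQuery. A sketch using Python's built-in SQLite (standard SQL rather than BigQuery's legacy `[dataset.table]` syntax; the table name and rows are invented):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE log_analysis (timestamp TEXT)')
conn.executemany(
    'INSERT INTO log_analysis VALUES (?)',
    [('2016-08-23 03:50:59',), ('2016-08-23 09:12:01',), ('2016-08-24 11:00:00',)],
)

# Same shape as the final slide query: one row per day with a request count.
rows = conn.execute(
    'SELECT DATE(timestamp) AS date, COUNT(*) AS number_of_requests '
    'FROM log_analysis GROUP BY date ORDER BY date'
).fetchall()
print(rows)  # [('2016-08-23', 2), ('2016-08-24', 1)]
```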

Comparing logs to GSC crawl volume (number of requests)

Dominic Woodman
put in similar slides

Run queries

Find something weird

Go look at crawl & website

Dominic Woodman
add visual interest
Dominic Woodman
icons

Our data in BQ

1 Diagnose crawling & indexation issues

2 Prioritisation

3 Spot bugs & view site health

4 How important does Google think parts of your site are?

5 How fresh does it think your content is?

1 Diagnose crawling & indexation issues

4 How important does Google think parts of your site are?

Dominic Woodman
loop back to beginning verbally

What are the top 20 URLs crawled by Google over our logs?

Dominic Woodman
make tiny stories

Login is my top crawled page and then search?

What are the top 20 page_path_1 folders crawled by Google over our logs?

Location folders are taking more than 70% of my budget

Getting data by the day

Page  | Number of Googlebot Requests
page1 | 200,000
page2 | 120,000

Number of Googlebot requests day by day

Dominic Woodman
add more lines

3 Spot bugs & view site health

How many of each status code does Google find per day over our logs?

Number of Googlebot requests day by day

Dominic Woodman
stories

What are the most requested 404 URLs by Googlebot over the past 30 days?

Boy does it want that ad-tech snippet
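Counting the top 404s doesn't even need SQL once the lines are parsed. A sketch with invented (path, status) pairs:

```python
from collections import Counter

# (path, status) pairs pulled from parsed Googlebot log lines - invented data.
requests = [
    ('/old-page', 404), ('/old-page', 404), ('/ad-snippet.js', 404),
    ('/home', 200), ('/ad-snippet.js', 404), ('/ad-snippet.js', 404),
]

top_404s = Counter(path for path, status in requests if status == 404)
print(top_404s.most_common(2))  # [('/ad-snippet.js', 3), ('/old-page', 2)]
```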

5 How fresh does it think your content is?

How many times on average is each page in a page template crawled a day?

Average number of times a page template is crawled by Googlebot

How long does it take for a page to be discovered after being published?

Dominic Woodman
put to multiple slides

How long does it take for a page to be discovered after being published?

What are the top 20 combinations of page_path_1 & page_path_2 folders crawled by Google over the time period of our logs?

Which pages have requests from Googlebot, which don’t appear in our crawl?

What are the top non-canonical pages being crawled?

Which are the most crawled parameters on the website?

How often are the most visited parameters crawled each day?

Which directories have the most 301 & 404 error codes?

Which pages are crawled with parameters and without parameters?

Which pages are only partly downloaded?

How many hits does each section get, when the sections are classified in an external dataset?

What percentage of a directory was crawled over the past 30 days?

What is the total number of requests across two different time periods?

That’s a lot of questions
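The first of those questions - time from publish to first crawl - only needs a publish date per URL (e.g. from the CMS) joined against the earliest Googlebot timestamp in the logs. A hypothetical sketch, with invented names and values:

```python
from datetime import datetime

# Publish dates from the CMS and each URL's earliest Googlebot request
# from the logs - all invented for illustration.
published = {'/job-advert3455': datetime(2016, 9, 1, 8, 0)}
first_crawl = {'/job-advert3455': datetime(2016, 9, 3, 14, 30)}

for path, pub_date in published.items():
    lag = first_crawl[path] - pub_date
    print(path, 'discovered after', lag.days, 'days')
```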

bit.ly/logs-resource


In Summary

This is the thing you’re probably not doing

bit.ly/logs-resource
@dom_woodman

Dominic Woodman
drive back down

