site search analytics workshop presentation
DESCRIPTION
Workshop presented at Webdagene 2013 (http://webdagene.no/en/), September 9, 2013; UX Lisbon (http://www.ux-lx.com), May 12, 2011; UX Hong Kong (http://www.uxhongkong.com/), February 17, 2011.
TRANSCRIPT
Workshop: Search Analytics for Your Site
Louis Rosenfeld
[email protected] • @louisrosenfeld
Webdagene • 9 September 2013
Hello, my name is Lou
www.louisrosenfeld.com | www.rosenfeldmedia.com
Agenda
1. The basics of Site Search Analytics (SSA)
2. Exercise 1 (pattern analysis)
3. Things you can do with SSA
4. Exercise 2 (longitudinal analysis)
5. More things you can do with SSA
6. A case study
7. More on metrics
8. Things you can do today
9. Discussion
Let’s look at the data
No, let’s look at the real data
Critical elements in bold: IP address, time/date stamp, query, and # of results:
XXX.XXX.X.104 - - [10/Jul/2011:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02
XXX.XXX.X.104 - - [10/Jul/2011:10:25:48 -0800] "GET /searchaccess=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF-8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
What are users searching?
How often are users failing?
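A minimal sketch of pulling those four elements out of log lines like the ones above. The regular expression and field positions are assumptions read off the two sample lines; in particular, it treats the second-to-last number as the result count (0 for the misspelled query, 146 for the corrected one). The names `parse_line` and `failure_rate` are hypothetical.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs

# Assumed layout, inferred from the sample lines on the slide:
# IP - - [timestamp] "GET /search?<params> HTTP/1.1" status bytes results seconds
# The "?" is optional because one sample line lacks it.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"GET /search\??(?P<params>\S*) HTTP/[\d.]+" '
    r'\d{3} \d+ (?P<results>\d+) [\d.]+'
)


def parse_line(line):
    """Return IP, timestamp, query, and result count, or None if no match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    params = parse_qs(match.group("params"))
    return {
        "ip": match.group("ip"),
        "time": datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
        "query": params.get("q", [""])[0],  # parse_qs turns "+" into a space
        "results": int(match.group("results")),
    }


def failure_rate(records):
    """Share of searches that returned zero results."""
    if not records:
        return 0.0
    return sum(1 for r in records if r["results"] == 0) / len(records)


# Shortened versions of the two slide lines (most URL parameters omitted).
lines = [
    'XXX.XXX.X.104 - - [10/Jul/2011:10:25:46 -0800] '
    '"GET /search?site=AllSites&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" '
    '200 971 0 0.02',
    'XXX.XXX.X.104 - - [10/Jul/2011:10:25:48 -0800] '
    '"GET /search?site=AllSites&q=license+plate&spell=1 HTTP/1.1" '
    '200 8283 146 0.16',
]
records = [parse_line(line) for line in lines]
```

Other search engines log different fields in a different order, so the pattern would need adjusting to the log format at hand.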
SSA is semantically rich data, and...
Queries sorted by frequency
...what users want--in their own words
A little goes a long way
A handful of queries/tasks/ways to navigate/features/documents meet the needs of your most important audiences
Not all queries are distributed equally
Nor do they diminish gradually
80/20 rule isn’t quite accurate
(and the tail is quite long)
The Long Tail is much longer than you’d suspect
The Zipf Distribution, textually
Insert Long Tail here
Exercise 1 (pattern analysis)
Work in pairs
• Each pair should have a laptop with Microsoft Excel
• Laptop platform (Mac, PC) doesn’t matter
Download data files: 2005-October.xls
Refer to exercise sheet
No right answers
Have fun!
Tune site-wide navigation
Nailing the basics in top-down navigation
Tune contextual navigation
Start with basic SSA data: queries and query frequency
Percent: volume of search activity for a unique query during a particular time period
Cumulative Percent: running sum of percentages
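The three columns described above can be sketched as follows, assuming the raw input is simply a list of query strings for one time period. The function name `frequency_report` and the lowercasing step are illustrative choices, not part of the workshop materials.

```python
from collections import Counter


def frequency_report(queries):
    """Rank unique queries by count, with percent and cumulative percent."""
    counts = Counter(q.strip().lower() for q in queries)
    total = sum(counts.values())
    report = []
    cumulative = 0.0
    for rank, (query, count) in enumerate(counts.most_common(), start=1):
        percent = 100.0 * count / total  # share of all search activity
        cumulative += percent            # running sum of percentages
        report.append((rank, query, count, round(percent, 2), round(cumulative, 2)))
    return report


# Hypothetical sample: 10 searches across 3 unique queries.
sample = ["map"] * 5 + ["library hours"] * 3 + ["parking"] * 2
rows = frequency_report(sample)
```

Reading down the cumulative column shows how few unique queries account for a large share of all searches.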
Tease out common content types
Took an hour to...
• Analyze top 50 queries (20% of all search activity)
• Ask and iterate: “what kind of content would users be looking for when they searched these terms?”
• Add cumulative percentages
Result: prioritized list of potential content types
#1) application: 11.77%
#2) reference: 10.5%
#3) instructions: 8.6%
#4) main/navigation pages: 5.91%
#5) contact info: 5.79%
#6) news/announcements: 4.27%
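One way to get from hand-tagged queries to a prioritized list like this is to sum each query's percent by the content type assigned to it. This is a sketch under that assumption; the tagging itself is manual, and the sample rows and the name `share_by_content_type` are hypothetical.

```python
from collections import defaultdict


def share_by_content_type(tagged_queries):
    """Sum query percentages per content type, largest share first."""
    totals = defaultdict(float)
    for _query, percent, content_type in tagged_queries:
        totals[content_type] += percent
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows: (query, percent of search activity, content type).
tagged = [
    ("apply", 1.5, "application"),
    ("campus map", 2.5, "reference"),
    ("application form", 2.0, "application"),
    ("how to register", 1.0, "instructions"),
]
priorities = share_by_content_type(tagged)
```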
Clear content types lead to better contextual navigation
artist descriptions
album reviews
album pages
artist bios
discography
TV listings
Make search smarter
Clear content types improve search performance
Content objects related to products
Raw search results
Enabling filtering/faceted search
Contextualizing “advanced” features
Session data suggest progression and context
search session patterns: 1. solar energy 2. how solar energy works
search session patterns: 1. solar energy 2. energy
search session patterns: 1. solar energy 2. solar energy charts
search session patterns: 1. solar energy 2. explain solar energy
search session patterns: 1. solar energy 2. solar energy news
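A rough way to sort query pairs like these is to compare the words of the first query with the words of the follow-up. The labels below ("narrowing", "broadening", and so on) are my own shorthand rather than terms from the workshop, and the word-set comparison is a deliberately crude assumption.

```python
def classify_reformulation(first, second):
    """Label how the second query in a session relates to the first."""
    a = set(first.lower().split())
    b = set(second.lower().split())
    if a == b:
        return "repeat"
    if a < b:   # every original word kept, more words added
        return "narrowing"
    if b < a:   # words dropped, nothing added
        return "broadening"
    return "other"


pairs = [
    ("solar energy", "how solar energy works"),
    ("solar energy", "energy"),
    ("solar energy", "solar energy charts"),
    ("solar energy", "explain solar energy"),
    ("solar energy", "solar energy news"),
]
labels = [classify_reformulation(first, second) for first, second in pairs]
```

The added words in the narrowing cases ("how", "charts", "news") hint at the kind of content the user expected to find.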
Recognizing proper nouns, dates, and unique ID#s
©2010 Louis Rosenfeld, LLC (www.louisrosenfeld.com). All rights reserved.
Identifying a need for a glossary
Smarter best bets
Best bets without guessing
Frequent keywords
“recycled” best bets
Learn how audiences differ
Who cares about what? (AIGA.org)
Who cares about what? (Open U)
Why analyze queries by audience?
Fortify your personas with data
Learn about differences between audiences
• Open University “Enquirers”: 16 of 25 queries are for subjects not taught at OU
• Open University Students: search for course codes, topics dealing with completing program
Determine what’s commonly important to all audiences (these queries better work well)
Reduce jargon
Save the brand by killing jargon
Jargon related to online education: FlexEd, COD, College on Demand
Marketing’s solution: expensive campaign to educate public (via posters, brochures)
Result: content relabeled, money saved
query rank   query
#22          online*
#101         COD
#259         College on Demand
#389         FlexTrack
* “online” part of 213 queries
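A count like "part of 213 queries" can be reproduced by matching a term as a whole word or phrase against the list of unique queries. This is a minimal sketch; the sample queries and the name `queries_mentioning` are hypothetical.

```python
import re


def queries_mentioning(term, unique_queries):
    """Return the unique queries that contain term as a whole word or phrase."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [q for q in unique_queries if pattern.search(q)]


# Hypothetical unique queries from a college site.
unique_queries = ["online courses", "COD", "apply online", "college on demand", "codes"]
online_hits = queries_mentioning("online", unique_queries)
cod_hits = queries_mentioning("COD", unique_queries)
```

Comparing the counts for plain-language terms against branded ones is the kind of evidence the relabeling decision rested on.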
Exercise 2 (longitudinal analysis)
Work in pairs
• Each pair should have a laptop with Microsoft Excel
• Laptop platform (Mac, PC) doesn’t matter
Download data files: 2006-February.xls + 2006-June.xls
Refer to exercise sheet
No right answers
Have fun!
Know when to publish what
Interest in the football team:
going...
...going...
gone
Time to study!
Before Tax Day
After Tax Day
Identify trends
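A simple longitudinal comparison is to compute each query's share of search volume in two periods and look at the difference. The sketch below assumes two dictionaries of query counts; the numbers and the name `share_changes` are made up for illustration.

```python
def share_changes(before, after):
    """Change in each query's share of search volume, in percentage points."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    changes = {}
    for query in set(before) | set(after):
        share_before = 100.0 * before.get(query, 0) / total_before if total_before else 0.0
        share_after = 100.0 * after.get(query, 0) / total_after if total_after else 0.0
        changes[query] = round(share_after - share_before, 2)
    # Biggest drops first, biggest gains last.
    return sorted(changes.items(), key=lambda item: item[1])


# Hypothetical counts for two periods.
before = {"football tickets": 40, "library hours": 60}
after = {"football tickets": 10, "library hours": 90}
trend = share_changes(before, after)
```

Using shares rather than raw counts keeps the comparison fair when overall search volume differs between the two periods.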
Learn from failure
Failed navigation?
Examining unexpected searching
Look for places searches happen beyond main page
What’s going on?
• Navigational failure?
• Content failure?
• Something else?
Where navigation is failing (“Professional Resources” page)
Do users and AIGA mean different things by “Professional Resources”?
Comparing what users find and what they want
Failed business goals?
Developing custom metrics
Netflix asks
1. Which movies most frequently searched? (query count)
2. Which of them most frequently clicked through? (MDP views)
3. Which of them least frequently added to queue? (queue adds)
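The three questions read like successive filters, which can be sketched as below. This is only an illustration of that reading: the data layout, cut-off sizes, titles, and the name `searched_clicked_not_queued` are all assumptions, not a description of what Netflix actually runs.

```python
def searched_clicked_not_queued(stats, top_searched=100, top_clicked=20, least_added=5):
    """Apply the three questions as successive filters over per-title stats."""
    # 1. Most frequently searched titles (query count).
    by_searches = sorted(stats, key=lambda t: stats[t]["queries"], reverse=True)[:top_searched]
    # 2. Of those, the most frequently clicked through (MDP views).
    by_clicks = sorted(by_searches, key=lambda t: stats[t]["mdp_views"], reverse=True)[:top_clicked]
    # 3. Of those, the least frequently added to the queue.
    return sorted(by_clicks, key=lambda t: stats[t]["queue_adds"])[:least_added]


# Hypothetical per-title numbers.
stats = {
    "Movie A": {"queries": 900, "mdp_views": 700, "queue_adds": 400},
    "Movie B": {"queries": 800, "mdp_views": 650, "queue_adds": 20},
    "Movie C": {"queries": 700, "mdp_views": 100, "queue_adds": 5},
    "Movie D": {"queries": 50, "mdp_views": 40, "queue_adds": 1},
}
suspects = searched_clicked_not_queued(stats, top_searched=3, top_clicked=2, least_added=1)
```

Titles that survive all three filters are searched and viewed often but rarely queued, which points at a problem after the click.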
Learn from search sessions
Sample search session (Teach for America intranet)
Session analysis
These queries co-occur within sessions: why?
TFAnet session analysis results
• Searches for “delta ICEG” perform poorly (way below the fold)
• Users then try an (incorrect) alternative (“delta learning team”)
Identify content gaps
0 results report (from behaviortracking.com)
Are we missing something?
Are we missing a type of something?
Identifying gaps helps force an issue
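If your analytics tool has no built-in zero-results report, one can be approximated from parsed log records. The sketch assumes each record is a (query, result count) pair; the sample data and the name `zero_results_report` are hypothetical.

```python
from collections import Counter


def zero_results_report(records, top_n=20):
    """Most frequent queries that returned no results."""
    failed = Counter(
        query.strip().lower() for query, results in records if results == 0
    )
    return failed.most_common(top_n)


# Hypothetical (query, number of results) pairs.
records = [
    ("lincense plate", 0),
    ("license plate", 146),
    ("Lincense Plate", 0),
    ("boat permit", 0),
]
report = zero_results_report(records)
```

Frequent entries are either misspellings the engine should handle or content the site does not have.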
Identify failed content
1. Choose a content type (e.g., events)
2. Ask: “Where should users go from here?”
3. Analyze the frequent queries from this content type
from aiga.org
Analyze frequent queries generated from each content sample
Make content owners into stakeholders
Sandia National Labs
• Regularly record which documents came up at position #1 for 50 most frequent queries
• If and when that top document falls out of position #1, document's owner is alerted
• Result: healthy dialogue (often about following policies and procedures and their value)
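The monitoring loop described above can be sketched as a comparison of two snapshots of top-ranked documents. This is an assumption about how such a check might be built, not Sandia's implementation; the paths, owner names, and `displaced_top_documents` are hypothetical.

```python
def displaced_top_documents(baseline, current, owners):
    """List queries whose position #1 document has changed since the baseline."""
    alerts = []
    for query, old_doc in baseline.items():
        new_doc = current.get(query)
        if new_doc != old_doc:
            alerts.append({
                "query": query,
                "was": old_doc,
                "now": new_doc,
                "owner": owners.get(old_doc, "unknown"),  # who to notify
            })
    return alerts


# Hypothetical snapshots: query -> document at position #1.
baseline = {"benefits": "hr/benefits.html", "travel": "travel/policy.html"}
current = {"benefits": "hr/benefits.html", "travel": "blog/old-trip.html"}
owners = {"travel/policy.html": "travel-office"}
alerts = displaced_top_documents(baseline, current, owners)
```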
Connecting pages (and their owners) that are found through search...
...with how those pages were found
Predict the future
Shaping the Financial Times’ editorial agenda
FT compares these
• Spiking queries for proper nouns (i.e., people and companies)
• Recent editorial coverage of people and companies
Discrepancy?
• Breaking story?!
• Let the editors know!
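The slides do not say how the FT detects a spike, so the following is only a sketch of the comparison under assumed rules: a name "spikes" when today's count is at least three times its baseline and at least 10, and names are assumed to have been recognized as proper nouns upstream. All names, thresholds, and counts are hypothetical.

```python
def uncovered_spikes(today, baseline, recently_covered, ratio=3.0, min_count=10):
    """Spiking names that have no recent editorial coverage."""
    covered = {name.lower() for name in recently_covered}
    spikes = []
    for name, count in today.items():
        usual = max(baseline.get(name, 0), 1)  # avoid a zero baseline
        if count >= min_count and count >= ratio * usual and name.lower() not in covered:
            spikes.append(name)
    # Largest spike first, so editors see the strongest signal at the top.
    return sorted(spikes, key=lambda name: today[name], reverse=True)


# Hypothetical query counts for people and companies.
today = {"Acme Corp": 90, "Jane Doe": 40, "Globex": 50}
baseline = {"Acme Corp": 10, "Jane Doe": 35, "Globex": 5}
recently_covered = ["Globex"]
to_flag = uncovered_spikes(today, baseline, recently_covered)
```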
Avoiding a disaster at Vanguard
Vanguard used SSA to help benchmark existing search engine’s performance and help select new engine
New search engine “performed” poorly
But IT needed convincing to delay launch
Information Architect & Dev Team Meeting
Search seems to have a few problems…
Nah.
Where’s the proof?
You can’t tell for sure.
What to do?
Test performance of most frequent queries
Measure using original two sets of metrics
1. relevance: how reliably the search engine returns the best matches first
2. precision: proportion of relevant and irrelevant results clustered at the top of the list
Relevance: 5 metrics (queries tested have “best” result)
Mean: Average distance from the top
Median: Less sensitive to outliers, but not useful once at least half are ranked #1
Count – Below 1st: How often is the best target something other than 1st?
Count – Below 5th: How often is the best target outside the critical area?
Count – Below 10th: How often is the best target beyond the first page?
OK! Hmmm...
Uh oh
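The five relevance metrics can be computed from one number per tested query: the position at which the known "best" result appeared. In this sketch the rank itself stands in for "distance from the top", which is an assumption; the sample ranks and the name `relevance_metrics` are hypothetical.

```python
from statistics import mean, median


def relevance_metrics(best_ranks):
    """Summarize where the best result ranked (1 = top) across tested queries."""
    return {
        "mean": mean(best_ranks),
        "median": median(best_ranks),
        "below_1st": sum(1 for r in best_ranks if r > 1),    # not the top hit
        "below_5th": sum(1 for r in best_ranks if r > 5),    # outside the critical area
        "below_10th": sum(1 for r in best_ranks if r > 10),  # beyond the first page
    }


# Hypothetical ranks of the best result for five frequent queries.
metrics = relevance_metrics([1, 1, 2, 7, 12])
```

Lower is better on all five, which is why an engine can look fine on the mean and still fail the counts.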
Precision: rating scale
Evaluate frequent queries’ top search results on this scale
• r / Relevant: Based on the information the user provided, the page's ranking is completely relevant
• n / Near: The page is not a perfect match, but it’s clearly reasonable for it to be ranked highly
• m / Misplaced: You can see why the search engine returned it, but it should not be ranked highly
• i / Irrelevant: The result has no apparent relationship to the user’s search
Precision: three metrics
Metrics based on degrees of permissiveness
1. strict: only counts completely relevant results
2. loose: counts relevant and near results
3. permissive: counts relevant, near, and misplaced results
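Given the r/n/m/i ratings for a query's top results, the three precision scores follow directly. This sketch assumes each score is the fraction of rated results that count at that level of permissiveness; the sample ratings and the name `precision_scores` are hypothetical.

```python
def precision_scores(ratings):
    """Strict, loose, and permissive precision for a list of r/n/m/i ratings."""
    total = len(ratings)
    if total == 0:
        return {"strict": 0.0, "loose": 0.0, "permissive": 0.0}
    return {
        "strict": sum(1 for r in ratings if r == "r") / total,
        "loose": sum(1 for r in ratings if r in ("r", "n")) / total,
        "permissive": sum(1 for r in ratings if r in ("r", "n", "m")) / total,
    }


# Hypothetical ratings for the top five results of one query.
scores = precision_scores(["r", "n", "m", "i", "r"])
```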
Putting it all together: old engine (target) and new
Note: low relevance and high precision scores are optimal
More on Vanguard case study: http://bit.ly/D3B8c
Mapping KPI and metrics: A generic “search success” KPI
Search Metrics: general examples (Lee Romero, blog.leeromero.org)
• Total searches for a given time period
• Total distinct search terms for a given time period
• Total distinct words for a given time period
• Average words per search
• Top searches for a given time period
• Top Searches over time
• Not found searches
• Error searches
• Ratio of searches performed each reporting period to the number of visits for that same time period
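Several of these general metrics fall out of a plain list of queries plus a visit count for the same period. The sketch below covers only those; "not found" and "error" searches would need result counts and status codes from the log. The sample data and the name `general_search_metrics` are hypothetical.

```python
from collections import Counter


def general_search_metrics(queries, visits, top_n=10):
    """Basic search metrics for one reporting period."""
    normalized = [q.strip().lower() for q in queries if q.strip()]
    words = [word for q in normalized for word in q.split()]
    return {
        "total_searches": len(normalized),
        "distinct_terms": len(set(normalized)),
        "distinct_words": len(set(words)),
        "avg_words_per_search": len(words) / len(normalized) if normalized else 0.0,
        "top_searches": Counter(normalized).most_common(top_n),
        "searches_per_visit": len(normalized) / visits if visits else 0.0,
    }


# Hypothetical period: four searches across eight visits.
period = general_search_metrics(
    ["license plate", "License Plate", "boat permit", "taxes"], visits=8
)
```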
Search Metrics: search engine tuning (Jeannine Bartlett, earley.com)
Do users not find what they want because the search engine and its ranking and relevance algorithms have not been adequately tuned?
Example Benchmarks and Metrics
• # of valid queries returning no results / total unique queries
• Relative % search results per data source
• Relative % click throughs per data source
• Pass/fail % for queries using stemming
• Pass/fail % for queries with misspellings
• Precision measures of “seed” documents sent through the tagging process
Search Metrics: query entry (Jeannine Bartlett, earley.com)
Do users not find what they want because the UI for expressing search terms is inadequate or unintuitive?
Example Benchmarks and Metrics
• % queries in the bottom set of the Zipf Curve (flat vs. hockey-stick distribution)
• % queries with no click throughs
• % queries using syntactic metadata filtering (date, author, source, document type, geography, etc.)
• % queries using Boolean search grammar
• % queries using type-ahead against taxonomy terms and synonyms
• % queries using faceted semantic refinement
• % pages from which search is available
Search Metrics: result sets (Jeannine Bartlett, earley.com)
Do users not find what they want because the UI for visualizing result sets is inadequate or unintuitive?
Example Benchmarks and Metrics
• % queries utilizing multiple results views
• % queries with drill-down through clusters
• % queries using iterative syntactic metadata filtering (date range, sorting, type or source inclusion/exclusion, etc.)
• % queries suggesting broader/narrower terms
• % queries suggesting “Best Bets” or “See Also”
• % queries using iterative semantic term filtering, inclusion or exclusion
Things to do today
1. Set up SSA in Google Analytics
2. Query your queries
3. Start developing a site report card
4. Start incorporating SSA into your user research program
Turn on SSA in Google Analytics
Set up GA for your site if you haven’t already
Then teach it to parse and capture your search engine’s queries (not set by default)
References
• http://is.gd/cR0qr
• http://is.gd/cR0qP
Seed your analysis by querying your queries
Starter questions
1. What are the most frequent unique queries?
2. Are frequent queries retrieving quality results?
3. Click-through rates per frequent query?
4. Most frequently clicked result per query?
5. Which frequent queries retrieve zero results?
6. What are the referrer pages for frequent queries?
7. Which queries retrieve popular documents?
8. What interesting patterns emerge in general?
Use SSA to start work on a site report card
SSA helps determine common information needs
From Christian Rohrer, xdstrategy.com
Augment personas and audience profiles with frequent queries
Persona example (from Adaptive Path)
Frequent queries added (in green)
Comparing referral queries with local queries
Long tail queries:Longer, more complex (from Vanguard)
Short head: common queries
• Beneficiary form
• 401(k)
• beneficiary
• career
• forms
• amt
• money market
• location
• loans
• calculator
Long tail: common queries
• 403(b)(7) account asset transfer authorization
• automatic investing
• Wire transfer instructions
• adoption agreement
• international wire transfers
• socially responsible investing
• Vanguard tax identification number
• IRA Asset Transfer form
• fdic insured account
• early withdrawal penalties
Now on sale
Search Analytics for Your Site: Conversations with Your Customers by Louis Rosenfeld (Rosenfeld Media, 2011)
www.rosenfeldmedia.com
Use code WEBDAGENE2013 for 20% off all Rosenfeld Media books
Louis Rosenfeld [email protected]
www.louisrosenfeld.com | www.rosenfeldmedia.com
@louisrosenfeld | @rosenfeldmedia
This presentation available on SlideShare: http://slidesha.re/otzE2t
Say hello