copyright 2011 sematext int'l. all rights reserved. 1 search analytics what? why? how? otis...
TRANSCRIPT
![Page 1: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/1.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.1
Search Analytics
What? Why? How?
Otis Gospodnetić – Sematext International
![Page 2: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/2.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.2
About Otis Gospodnetić
• Member: Apache Lucene, Solr, Nutch, Mahout
• Author: Lucene in Action 1 & 2
• Entrepreneur: Sematext, Simpy
![Page 3: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/3.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.3
About Sematext
Products & ServicesConsulting, Development, Tech Support:
Search (Lucene, Solr, Elastic Search...) Big Data (Hadoop, HBase, Voldemort...) Web Crawling (Nutch) Machine Learning (Mahout)
![Page 4: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/4.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.4
Agenda
Intro: Otis & Sematext - DONE What Why Specific Reports & their value
![Page 5: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/5.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.5
What is Search Analytics?
Input: queries and clicks Subsequent: actions / xactions / conversions Output: reports – over time
The means, not the goal Ongoing, not one-off
![Page 6: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/6.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.6
Search Analytics and SEO
Not the same SA can help with SEO
![Page 7: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/7.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.7
Search vs. Web Analytics
User intent and information needs vs. inferring Hand in hand Ideally you can relate data from both or even
unify it
![Page 8: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/8.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.8
Why Search Analytics?
Measure and monitor everything. Introspection. Supports (re)design, navigation choices Helps with content acquisition & enhancement Improve search experience Mula
![Page 9: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/9.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.9
Report Groups
Failures vs. non-failures Actionable vs. non-actionable
![Page 10: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/10.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.10
Failures
Be aware of failures, but don't be one.
Zero hits Low query CTR High search exit rate Irrelevant results Over N refinements
![Page 11: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/11.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.11
Report: Zero Hit Queries
Overall pct. (not raw count) vs. popular queries Misspellings? Synonyms? No matching content? Need (different) tagging? Bad analysis? Multilingual issue?
![Page 12: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/12.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.12
![Page 13: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/13.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.13
Report: Zero Hit Queries (cont.)
Use Query Spellchecker (aka DYM) Using AutoComplete - $MM improvement Using DYM ReSearcher Designing No Results page
![Page 14: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/14.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.14
![Page 15: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/15.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.15
![Page 16: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/16.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.16
Report: High Exit Rate Queries
Disappointed, frustrated users Major revenue loss, no second chance Marriage with Web Analytics Relevance bad? Default ordering bad? Titles of hits need adjusting? Search terms highlighting looking bad? Bad thumbnails? Need thumbnails?
![Page 17: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/17.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.17
Report: Irrelevant Result Queries
Manual: top N hits of top N queries Judge relevance of each hit and assign score Per-query score: sum scores of top N hits Cumulative top N query score: sum per-query
scores
Automated: Mean Reciprocal Rank (MRR)
![Page 18: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/18.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.18
Report: Total Queries
Search vs. navigation/browsing Search vs. overall site usage Related report: % of visits with search Segment: new users vs. return users, etc.
Questions: do you count paging? Facet selection? Re-sorting?
![Page 19: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/19.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.19
Report: Total Distinct Queries
What's distinct? Car vs. Cars # Total Queries / # Distinct Queries = Avg. # Tied to performance and query cache utilization Extension: Total distinct words in queries
![Page 20: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/20.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.20
Report: Words Per Query
Informative, slowly changing, not terribly actionable
Can affect search box size Use AutoComplete if queries are long
![Page 21: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/21.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.21
Report: Top Queries
User intent and information needs Ensure good results Calculate MRR for top N queries Calculate MRR for each top N query Compare to global MRR
![Page 22: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/22.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.22
Report: Top Queries (cont.)
New top queries – new trend? New demand? Best Bets (aka Query Elevation in Solr) Expose before search is needed Seasonality – hour of day, day of the week, etc.
Adjust content presentation and availability (e.g. week vs. weekend, business vs. personal)
Anticipate demand in the next cycle
![Page 23: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/23.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.23
![Page 24: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/24.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.24
Clickstream Analysis
Query analysis is not a complete story: Queries Clicks Actions / Transactions / Conversions
![Page 25: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/25.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.25
Query and Hit Valuation
Query: by popularity (count) Query: by CTR Query: by subsequent (trans)action count/pct.
Hit: by click count Hit: by subsequent (trans)action count/pct.
![Page 26: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/26.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.26
Query and Hit Valuation (cont.)
Maximize:pop(query) + ctr(query) + action(query)clicks(hit) + action(hit)
Failures:high pop(q), yet low ctr(q)
high pop(q), high ctr(q), yet low action(q)
Integration with backend required
![Page 27: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/27.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.27
Report: Low CTR Queries
Percentage (not raw count) vs. popular queries Relevance bad? Default ordering bad? Titles of hits need adjusting? Search terms highlighting looking bad? Bad thumbnails? Need thumbnails?
![Page 28: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/28.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.28
![Page 29: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/29.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.29
Report: Queries with Most Clicks
i.e. Queries with Highest CTR Informative? Yes Actionable? Somewhat: expose relevant
content outside of search
![Page 30: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/30.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.30
![Page 31: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/31.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.31
Search Session
Search activity aimed at satisfying a specific information need in a some limited amount of time.
i.e. it's very fuzzy
![Page 32: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/32.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.32
Interesting Search Sessions
More than N queries in M minutes Sessions that end in a failure Sessions for specific type of info (e.g. person
name, product name, event)
![Page 33: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/33.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.33
Segmentation
Searches that resulted in conversion vs. not Search metrics for:
New vs. returning visitors English vs. French vs. Spanish vs. … Chrome vs. IE ...
![Page 34: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/34.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.34
More SA Reports/Questions
% of queries from DYM vs. AC vs. typed Most common queries per clicked hit Which hits are generally popular? Which hits are trending up? Are there docs that are never ever clicked on? Average number of queries per session Breakdown of queries by number of hits
![Page 35: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/35.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.35
More SA Reports/Questions
Breakdown of queries by latency Frequently used facets or sort criteria Avg number of clicks per query Time spent on site before/after searching Search initiation pages How deep into SERPs are people drilling? Are too many clicks on pages other than 1st? ...
![Page 36: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/36.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.36
Data Collection
Details in Search Analytics with Flume and HBase on http://slideshare.net/sematext
![Page 37: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/37.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.37
Sematext's Search Analytics
Built with Flume, HBase, Hadoop, etc. Resulted in 2 open-source projects:
https://github.com/sematext/HBaseWD
https://github.com/sematext/HBaseHUT http://sematext.com/open-source/index.html
Resulted in patches for Flume and HBase
![Page 38: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/38.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.38
We're Hiring
Dig Search?Dig Analytics?Dig Big Data?Dig Performance?
Dig working with and in open-source?
We're hiring world-wide!
http://sematext.com/about/jobs.html
![Page 39: Copyright 2011 Sematext Int'l. All rights reserved. 1 Search Analytics What? Why? How? Otis Gospodnetić – Sematext International](https://reader035.vdocuments.us/reader035/viewer/2022062408/56649e965503460f94b99f6c/html5/thumbnails/39.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.39
sematext.com blog.sematext.com @sematext @otisg [email protected]
Want SA? Grab me! http://sematext.com/search-analytics
Contact