Search Analytics at Enterprise Search Summit Fall 2011
DESCRIPTION
This presentation describes what Search Analytics is, what value it brings, how it can be used, and what additional functionality and value can be built with search data.
TRANSCRIPT
Search Analytics
What? Why? How?
Otis Gospodnetić – Sematext International
@otisg ◦ @sematext ◦ sematext.com
sematext.com/search-analytics
Copyright 2011 Sematext Int'l. All rights reserved.2
About Otis Gospodnetić
• ASF Member: Lucene, Solr, Nutch, Mahout
• Author: Lucene in Action 1 & 2
• Entrepreneur: Sematext, Simpy
Sematext Metrics
• 100% organic: no GMO, no VC
• 4 years old
• < 10 people
• 7 countries
• 3 timezones
• 2 continents
• > 100 customers
About Sematext
Products & Services
Consulting, Development, Tech Support:
• Search (Lucene, Solr, ElasticSearch...)
• Big Data (Hadoop, HBase, Voldemort...)
• Web Crawling (Nutch, Droids)
• Machine Learning (Mahout)
Agenda
• What is Search Analytics and why it matters
• Example reports and their value
• Optional: Search Analytics in the Cloud
Communication
• twitter.com/sematext
• twitter.com/otisg
• Hash tags: #stsa or #stanalytics
• http://sematext.com/search-analytics/index.html
• Raise your hand!
• [email protected]
Why
search users
search providers
search experience
Why Oh Why
search providers
search experience
This search sucks! It takes 17 tries to find anything here!
F!?@#$%^&?!?
search users
Cool, the latest search tweaks made our site really sticky!
Awesome!
Fill in the Missing Piece
Search Analytics
Performance Monitoring
Quality Assurance
Tuning UI
Blind Leading the Blind
Analytics as Compass
Search logs are your Map
Search Analytics is your Compass
The Bottom Line Why
• Measure and monitor everything
• Supports (re)design, navigation choices
• Helps with content acquisition & enhancement
• Improve search experience
• Mula
The Moment of Truth
Question for the audience #1
What do you use for Search Analytics?
a) Home grown stuff
b) Google Analytics
c) Omniture
d) Webtrends
e) Other
f) Nothing
Search Analytics Basics
• Collect: queries & clicks & interactions & ...
• Analyze: actions / transactions / conversions
• Output: reports – over time
• Output++: feedback loop
• The means, not the goal
• Ongoing, not one-off
remember this
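The collect, analyze, output steps can be sketched in a few lines of Python. This is only an illustration: the `SearchEvent` schema and the `daily_report` function are hypothetical names, not part of any Sematext product.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SearchEvent:
    """One collected search interaction (hypothetical schema)."""
    day: date
    query: str
    clicked: bool  # True if the user clicked at least one result


def daily_report(events):
    """Analyze collected events into a per-day report, ordered by day."""
    by_day = defaultdict(list)
    for event in events:
        by_day[event.day].append(event)

    report = {}
    for day in sorted(by_day):
        day_events = by_day[day]
        # Most frequent query of the day
        top_query, _ = Counter(e.query for e in day_events).most_common(1)[0]
        report[day] = {
            "queries": len(day_events),
            "clicks": sum(e.clicked for e in day_events),
            "top_query": top_query,
        }
    return report
```

Running such a report on a schedule and comparing days is what turns a one-off log dump into the ongoing "reports over time" output.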
Search vs. Web Analytics
• Explicit user intent and information needs (search) vs. inferring them (web)
• Hand in hand
• Ideally you can relate data from both or even unify it
Report Types
Failures vs. non-failures
Actionable vs. non-actionable
Trends vs. summaries
Failures vs. Non-Failures
Failures:
• Zero hits
• Low CTR
• Low MRR
• High bounce rate
• Low conversion rate
• Deep paging
• Deep clicking
• High latency
Non-failures:
• Query rate
• Query volume
• Top seen & clicked docs
• Top queries
• Terms per query
• Search sessions
• Search users
• Distinct queries
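Three of the failure signals above can be computed directly from a click log. The sketch below assumes a hypothetical input format (one tuple per search) and uses one common set of definitions: CTR as the share of searches with at least one click, and MRR as the mean reciprocal rank of the first (highest) clicked result, counting searches without clicks as 0. Other definitions exist.

```python
def failure_metrics(searches):
    """Compute zero-hit rate, CTR and MRR.

    searches: list of (num_hits, clicked_ranks) tuples, where clicked_ranks
    is a list of 1-based result positions the user clicked (may be empty).
    """
    total = len(searches)
    if total == 0:
        return {"zero_hit_rate": 0.0, "ctr": 0.0, "mrr": 0.0}

    zero_hits = sum(1 for hits, _ in searches if hits == 0)
    with_click = sum(1 for _, ranks in searches if ranks)
    # Reciprocal rank of the best clicked position; no click contributes 0
    reciprocal_ranks = sum(1.0 / min(ranks) for _, ranks in searches if ranks)

    return {
        "zero_hit_rate": zero_hits / total,
        "ctr": with_click / total,
        "mrr": reciprocal_ranks / total,
    }
```

Tracked over time, a rising zero-hit rate or a falling MRR is the kind of failure report that points at something fixable.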
Value of Failure Fixes
• Zero hits
• Low CTR
• Low MRR
• High bounce rate
• Low conversion rate
• Deep paging
• Deep clicking
• High latency
Re-search
Findability
Relevance Tuning
Performance Tuning
Measure, then Fix
If you can't measure it, you can't fix it!
Relevance A/B Testing
Tracking Zero Hits
Watching Latency
Search Analytics & Measuring
If you can't measure it, you can't fix it!
You can't measure it if you don't have Analytics
Actionable vs. Non-Actionable
Actionable:
• Zero hits
• Low CTR
• Low MRR
• High bounce rate
• Low conversion rate
• Deep paging
• Deep clicking
• High latency
Non-actionable:
• Query rate
• Query volume
• Top seen & clicked docs
• Top queries
• Terms per query
• Search sessions
• Search users
• Distinct queries
More Fixin'
• Query rate
• Query volume
• Search sessions
• Search users
• Top seen & clicked docs
• Top queries
• Terms per query
• Distinct queries
Navigation & Design
Results Shuffling
Diversification
Recommendations
AutoComplete
Search box size
Output++: Data is Power
• AutoComplete - $MM improvement
• Better DYM (Did You Mean)
• Spellchecker
• Related Searches
• Recommendations
• Relevance Feedback
• ...
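As one example of feeding search data back into the product, logged queries can drive AutoComplete suggestions. This is a minimal sketch under simple assumptions (case-insensitive prefix match, ranking by query frequency); `build_suggester` is a hypothetical name, and a production system would likely also filter zero-hit and low-quality queries.

```python
from collections import Counter


def build_suggester(logged_queries, max_suggestions=5):
    """Return a suggest(prefix) function backed by query-log frequencies."""
    counts = Counter(q.strip().lower() for q in logged_queries if q.strip())
    # Most frequent first; ties broken alphabetically for stable output
    ranked = sorted(counts, key=lambda q: (-counts[q], q))

    def suggest(prefix):
        prefix = prefix.strip().lower()
        if not prefix:
            return []
        return [q for q in ranked if q.startswith(prefix)][:max_suggestions]

    return suggest
```

The same frequency table is a natural starting point for related searches and "Did You Mean" candidates.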
Closing the Loop
search users
search providers
search experience
Resources
http://rosenfeldmedia.com/books/searchanalytics/
Search Analytics for Your Site, Louis Rosenfeld
Search Analytics What? Why? How?
Search Analytics with Flume and HBase
Search Analytics Business Value & NoSQL Backend
http://blog.sematext.com/tag/analytics/
Key Take-aways
Without Analytics you are blind
If you can't measure it, you can't fix it
Use Search Analytics to understand, measure and improve search
Using Search Analytics means having a competitive advantage
Time permitting:
Behind the scenes of Sematext Search Analytics
Behind the Scenes
Contact
• sematext.com
• blog.sematext.com
• @sematext
• @otisg
• [email protected]
Want SA? Grab me or go to: sematext.com/search-analytics
Hash tags: #stsa or #stanalytics
What We've Built
• Search Analytics SaaS
• Numerous reports (e.g. query volume, rate, latency, term frequencies / comparisons, hit buckets, search origins, etc.)
• Trending over time
• Comparisons of time periods
• Top N reports
• Filter, slice and dice
Sematext Search Analytics
Big Dreams
• SaaS
• Multitenant
• Large Scale – Massive Data
• Cloud
Storage Choices
• RDBMS: MySQL, PostgreSQL
• HDFS
• Hive
• HBase
• Cassandra
SaaS vs. In-House
Question for the audience #2
SaaS vs in-house Search Analytics?
a) SaaS
b) in-house
Data Flow
See Search Analytics with Flume and HBase:
http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
Data Collection
See Search Analytics with Flume and HBase:
http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
Core Tech
• JavaScript Beacons
• Metric Capture Web App aka Receiver
• Flume Agents, Collectors, Sinks
• HBase
• MapReduce Aggregations
• Search Analytics Reporting Web App
What is Flume
• Distributed data/log collection service
• Scalable, configurable, extensible
• Centrally manageable, open source
• Agents get data from app, Collectors save it
• Abstractions: Source → Decorator(s) → Sink
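The Source → Decorator(s) → Sink idea can be illustrated with plain Python generators. This is a conceptual sketch only, not Flume's API or configuration syntax; all function names and the `host` field are made up for the example.

```python
def source(lines):
    """Source: produces raw events (here, from an in-memory list)."""
    for line in lines:
        yield {"body": line}


def drop_empty(events):
    """Decorator: filters out events with an empty body."""
    for event in events:
        if event["body"].strip():
            yield event


def add_host(events, host):
    """Decorator: annotates each event as it passes through."""
    for event in events:
        yield {**event, "host": host}


def list_sink(events, store):
    """Sink: persists events (here, appends to a list); returns the count."""
    count = 0
    for event in events:
        store.append(event)
        count += 1
    return count
```

A pipeline is then just a composition, for example `list_sink(add_host(drop_empty(source(lines)), "web01"), store)`, which mirrors how decorators sit between a source and a sink.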
What is HBase
• Scalable, reliable, distributed, column-oriented DB
• On top of HDFS
• MapReducable
Data Flow, Detailed
Why Flume
• Reliable delivery, e.g. queue msgs locally if destination unreachable
• Easy, centralized management via Web UI or console
• Good community, good progress, now @ASF
• But: more complex, more moving parts
• On Flume: slideshare.net/cloudera/inside-flume
• Alternatives: Kafka, Scribe...
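The "queue messages locally if the destination is unreachable" behavior can be sketched as follows. This is an illustration of the idea, not Flume code: `BufferingSender` is a hypothetical class, the `send` callable is assumed to raise `ConnectionError` on failure, and the queue here lives in memory, whereas a real agent would need a durable buffer to survive restarts.

```python
from collections import deque


class BufferingSender:
    """Queues messages locally and delivers them in order when possible."""

    def __init__(self, send):
        # send: callable taking one message; assumed to raise
        # ConnectionError when the destination is unreachable
        self._send = send
        self._pending = deque()

    def submit(self, message):
        """Queue a message, then try to deliver everything queued so far."""
        self._pending.append(message)
        return self.flush()

    def flush(self):
        """Deliver queued messages in order; stop at the first failure.

        Returns the number of messages delivered by this call.
        """
        delivered = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # keep the message queued and retry on the next flush
            self._pending.popleft()
            delivered += 1
        return delivered

    @property
    def pending(self):
        return len(self._pending)
```

A message is removed from the queue only after `send` returns, so a failure mid-delivery leaves it queued for the next attempt.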
Why HBase
• Scalable raw & aggregate data storage
• MapReduce data input
• Fast scans for time ranges, fast key lookups
• Easy storage and compute power expansion
• Good looking roadmap, community, progress
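Fast time-range scans follow from HBase keeping rows sorted by row key: if the timestamp is part of the key, a time range becomes a contiguous key range. The sketch below imitates that with a sorted in-memory store. The `tenant:timestamp` key layout is an assumption for illustration, not the schema Sematext uses, and `SortedStore` is a stand-in, not an HBase client.

```python
import bisect


def row_key(tenant, epoch_seconds):
    """Build a key whose lexicographic order matches time order per tenant."""
    # Zero-padding keeps string order consistent with numeric order
    return f"{tenant}:{epoch_seconds:010d}"


class SortedStore:
    """Tiny stand-in for a store that keeps rows sorted by key."""

    def __init__(self):
        self._keys = []    # sorted list of row keys
        self._values = {}  # row key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        """Point lookup by exact row key."""
        return self._values.get(key)

    def scan(self, start_key, stop_key):
        """Yield (key, value) for start_key <= key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]
```

With this layout, one tenant's events for a time window are read with a single scan between two keys, without touching other tenants' rows.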
Open Sourcing
2 open-source projects:
• github.com/sematext/HBaseWD
• github.com/sematext/HBaseHUT
See sematext.com/open-source/index.html
Patches for Flume and HBase: blog.sematext.com/tag/flume/
Challenges
• Data size. Solutions: compression (4-5x smaller with lzo), data pruning (variable levels)
• Query string distribution: very long-tail
• Lots of data to process, update, aggregate
• Young tools: Flume, HBase
• Poor IO on EC2
• Hadoop distributions
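One way to see how long-tailed a query log is, and how much pruning the tail would cost, is to measure what share of total volume the top N distinct queries cover. A minimal sketch, with `head_coverage` as a hypothetical helper name:

```python
from collections import Counter


def head_coverage(queries, top_n):
    """Fraction of total query volume covered by the top_n distinct queries."""
    counts = Counter(queries)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    head = sum(count for _, count in counts.most_common(top_n))
    return head / total
```

A low value even for a large `top_n` means most of the volume sits in rarely repeated queries, so aggregates have to track a very large number of distinct keys.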