Confidential and Proprietary © Copyright 2013
This Ain’t Your Parents’ Search Engine
Grant Ingersoll
CTO, LucidWorks
Twitter: @gsingers
Search is dead.
Long live search
Search is good for…
• Traditional: fast, fuzzy text matching across a large document collection
• De-normalized data
- “Light” relational
• Top N problems
- Key-value (n=1)
- Recommendations
- “Good enough” classification, clustering
• Faceting, aggregations, analytical slicing and dicing of data
• Spatial, record/event linkage, alerting
http://cheezburger.com/5243950080
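As a rough illustration of the "Top N" and faceting bullets, the sketch below builds a Solr `/select` URL that asks for the top few matches plus facet counts. The parameters `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr request parameters; the host, the `products` collection, the field names, and the helper name `facet_query_url` are assumptions made up for this example.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, text, facet_fields, rows=10):
    """Build a Solr /select URL returning the top `rows` hits plus facet counts.

    `base_url` and the field names are hypothetical; adjust to your own index.
    """
    params = [
        ("q", text),        # free-text query
        ("rows", rows),     # the "N" in top N
        ("facet", "true"),  # turn faceting on
        ("wt", "json"),     # ask for a JSON response
    ]
    # One facet.field parameter per field to count values for.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


url = facet_query_url(
    "http://localhost:8983/solr/products",  # assumed local Solr collection
    "laptop",
    ["brand", "category"],
    rows=5,
)
print(url)
```

Setting `rows=1` on a query against a unique key field is the key-value case (n=1) from the slide.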
Foundational Changes in Lucene/Solr 4
• Reduced memory usage
• Pluggable codecs/similarity
• FS(A|T): finite-state automata and transducers
• Doc Values (column oriented)
• Spatial upgrade
• New facets and functions
• Cursors (deep paging)
• Distributed capabilities
• Joins/Grouping
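The cursor bullet refers to Solr's `cursorMark` request parameter: the first request sends `cursorMark=*`, the sort must include the unique key field, and each response carries a `nextCursorMark` to send back on the next request. Paging is finished when the returned mark equals the one that was sent. The loop below is a minimal sketch of that protocol. `fetch_page` is a hypothetical callable standing in for an HTTP call to Solr, and `make_fake_solr` is an in-memory stand-in so the example runs without a server (its numeric cursor values are not what real Solr returns).

```python
def iter_all_docs(fetch_page, sort="id asc", rows=100):
    """Yield every matching document by following Solr cursor marks.

    `fetch_page(params)` is assumed to return the parsed JSON response.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        response = fetch_page(
            {"q": "*:*", "sort": sort, "rows": rows, "cursorMark": cursor}
        )
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # unchanged mark means no more results
            break
        cursor = next_cursor


def make_fake_solr(docs):
    """In-memory stand-in for Solr; the cursor is just an offset as a string."""
    def fetch_page(params):
        mark = params["cursorMark"]
        start = 0 if mark == "*" else int(mark)
        page = docs[start:start + params["rows"]]
        next_mark = str(start + len(page)) if page else mark
        return {"response": {"docs": page}, "nextCursorMark": next_mark}
    return fetch_page


docs = [{"id": str(i)} for i in range(5)]
ids = [doc["id"] for doc in iter_all_docs(make_fake_solr(docs), rows=2)]
print(ids)
```

Unlike `start`/`rows` paging, the cost of each request does not grow with how deep into the result set the client already is, which is the point of the "deep paging" label.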
Search + Hadoop
• What’s old is new again
• “Traditional” use cases:
- Build/store indexes
- https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
• Enrichment and signal processing
- PageRank, Statistically Interesting Phrases, etc.
LucidWorks + Hadoop
• Ingestion help
- Flexible Map-Reduce content ingestion supporting:
» Directory of files
» CSV, Writable, etc.
» LogStash
» Build your own
• Pig Load/Store and UDFs
• Hive 2-way support
• http://www.lucidworks.com/search-for-hadoop/
- Open source this summer
[Architecture diagram with three tiers labeled Data Sources, Connectors, and Servers. Component labels: LucidWorks SiLK, LucidWorks Search, JDBC Connector, Web/File System Crawl, Hadoop Connectors, Data Warehouse, Clickstream, Networking.]
Search Analytics: Data Ingestion & Visualization
[Diagram labels: Solr/Solr Cloud; Gateway (Reverse Proxy); Solr Output Writer for LogStash (HTTP); Search Logs; Visualization (Configurable Dashboards); Hadoop Connector; GrokIngestMapper; LogStash.]
LucidWorks Open Source
• Logstash for Solr: https://github.com/LucidWorks/solrlogmanager
• Banana (Kibana for Solr): https://github.com/LucidWorks/banana
• Effortless AWS deployment and monitoring: http://www.github.com/lucidworks/solr-scale-tk
• Data Quality Toolkit: https://github.com/LucidWorks/data-quality
Demos
Fly the friendly skies
http://www.ibm.com/developerworks/library/j-solr-lucene/index.html
Make $$$
• Leverage time series data and visualization using LucidWorks SiLK
• Monitor Social
• Traditional Research
https://github.com/lucidworks/lws-financial-demo
Cure what ails you
Space-Time Continuum
• Leverage Solr’s spatial capabilities to index non-spatial data, such as time ranges
- Useful for open hours, shifts, etc.
• Query using rectangle intersections
- q = shift:"Intersects(0 19 23 365)"
https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
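One way to read the query on this slide, sketched below under stated assumptions: each shift is indexed as a 2D point with x = start and y = end, and "which shifts overlap the window 19 to 23?" becomes a rectangle covering every point whose start is at most 23 and whose end is at least 19. The values 0 and 365 are assumed here to be the lower and upper bounds of the field. The function names are hypothetical, and the pure-Python `point_in_rect` check only mirrors the geometry; in practice Solr's spatial field does the matching.

```python
def overlap_rect(q_start, q_end, world_min, world_max):
    """Rectangle (min_x, min_y, max_x, max_y) matching ranges that overlap the query.

    A range (start, end) overlaps [q_start, q_end] when
    start <= q_end (x axis) and end >= q_start (y axis).
    """
    return (world_min, q_start, q_end, world_max)


def overlap_query(field, q_start, q_end, world_min, world_max):
    """Render the rectangle in the Intersects(...) syntax shown on the slide."""
    min_x, min_y, max_x, max_y = overlap_rect(q_start, q_end, world_min, world_max)
    return f'{field}:"Intersects({min_x} {min_y} {max_x} {max_y})"'


def point_in_rect(point, rect):
    """Check the same condition locally: is the (start, end) point in the rectangle?"""
    x, y = point
    min_x, min_y, max_x, max_y = rect
    return min_x <= x <= max_x and min_y <= y <= max_y


rect = overlap_rect(19, 23, 0, 365)
print(overlap_query("shift", 19, 23, 0, 365))
print(point_in_rect((8, 20), rect))   # shift 8 to 20 overlaps the 19 to 23 window
print(point_in_rect((0, 18), rect))   # shift 0 to 18 ends before the window starts
```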
Signal Processing for Search and Discovery
• Signals power modern relevance
– Clicks, conversions, sharing, history, signatures
• LucidWorks 5 makes it easy to capture and leverage signals
– Recommendations, analytics, discovery
• Simplifies your data workflow
• Simplifies your operational footprint
Solr Powered Signal Processing
• Use Case: eCommerce
• Data:
– Product catalog (~1.2M items)
– Click data (~3.9M clicks)
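The slides do not show how the click data is processed, so the following is only a toy sketch of one common kind of signal aggregation: counting clicks per (query, product) pair and keeping the most-clicked products for each query. The `(query, product_id)` record shape, the sample data, and the function name `top_clicked` are assumptions for illustration, not the demo's actual pipeline.

```python
from collections import Counter, defaultdict


def top_clicked(clicks, n=3):
    """Return the n most-clicked product ids for each normalized query string.

    `clicks` is an iterable of (query, product_id) pairs (hypothetical format).
    """
    counts = defaultdict(Counter)
    for query, product_id in clicks:
        # Normalize lightly so "iPad" and "ipad " count as the same query.
        counts[query.strip().lower()][product_id] += 1
    return {query: counter.most_common(n) for query, counter in counts.items()}


sample_clicks = [
    ("ipad", "p1"),
    ("iPad", "p1"),
    ("ipad", "p2"),
    ("tv", "p9"),
]
print(top_clicked(sample_clicks, n=1))
```

Aggregates like these could then be indexed alongside the product catalog and used for boosting or recommendations.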
Meta
• http://www.lucidworks.com
– [email protected]
– @gsingers
• Sales
– Steve Drane (based here in Chicago)
– [email protected]
• Lucene/Solr Revolution
– Washington DC, Nov 11-14
– http://www.lucenerevolution.org