This Ain't Your Parent's Search Engine
DESCRIPTION
In just a few short years, search has evolved from a small text box in the nether regions of a website to being front and center in our lives. Increasingly, search engine technology is also being used for practical, real-time recommendations, event processing, complex spatial functionality, and time series analysis, capable not only of matching users' queries in text but also of driving real-time decision making and analytics. In fact, open source Apache Lucene/Solr can do all of this and more by taking advantage of new data structures and algorithms that complement more traditional IR approaches. In this demo-driven talk, Lucene committer Grant Ingersoll will take a look at some of the new and exciting ways users are leveraging Lucene/Solr and related technology to drive deeper insight into information needs that go beyond keywords in a text box.
TRANSCRIPT
Confidential and Proprietary © Copyright 2013
This ain’t your Parent’s Search Engine
Grant Ingersoll
CTO, LucidWorks
Twitter: @gsingers
Search is dead.
Long live search
Search is good for…
• Traditional: Fast, fuzzy text matching across a large document collection
• De-normalized data
- “Light” relational
• Top N problems
- Key-value (n=1)
- Recommendations
- “Good enough” classification, clustering
• Faceting, aggregations, analytical slicing and dicing of data
• Spatial, record/event linkage, alerting
http://cheezburger.com/5243950080
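As a hedged illustration of the faceting bullet above, the sketch below builds the query string for a Solr field-facet request. The field name `category` and the helper `facet_params` are hypothetical and not from the talk; `rows`, `facet`, `facet.field`, and `facet.limit` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def facet_params(query, field, limit=10):
    """Build Solr request parameters for a field facet (counts only)."""
    return [
        ("q", query),
        ("rows", 0),             # skip document results; only counts are wanted
        ("facet", "true"),
        ("facet.field", field),  # hypothetical field name supplied by the caller
        ("facet.limit", limit),  # keep only the top-N facet values
        ("wt", "json"),
    ]


# "category" is an assumed field, used purely for illustration.
query_string = urlencode(facet_params("*:*", "category"))
print(query_string)
```

Setting `rows` to 0 is a common choice when only the aggregated counts matter, since it avoids fetching any documents.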
Foundational Changes in Lucene/Solr 4
• Reduced memory usage
• Pluggable Codecs/similarity
• FS(A|T)
• Doc Values (column oriented)
• Spatial upgrade
• New facets and functions
• Cursors (deep paging)
• Distributed capabilities
• Joins/Grouping
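To make the "Cursors (deep paging)" item concrete, here is a minimal sketch of a cursor loop. It assumes `id` is the collection's uniqueKey field (Solr requires the sort to include the uniqueKey when a cursor is used). `fetch_page` is a hypothetical caller-supplied function that sends the parameters to a `/select` handler and returns the parsed JSON; `fake_solr` is an in-memory stand-in, not a real client.

```python
def iterate_with_cursor(fetch_page, query, sort="id asc", rows=100):
    """Yield every matching document by following Solr cursor marks."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {"q": query, "sort": sort, "rows": rows,
                  "cursorMark": cursor, "wt": "json"}
        result = fetch_page(params)
        for doc in result["response"]["docs"]:
            yield doc
        next_cursor = result["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged mark means no more results
            break
        cursor = next_cursor


def fake_solr(params):
    """In-memory stand-in for Solr, used only to exercise the loop.

    Real cursor marks are opaque strings; here the mark is just an offset.
    """
    ids = ["a", "b", "c", "d", "e"]
    start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
    page = ids[start:start + params["rows"]]
    return {"response": {"docs": [{"id": i} for i in page]},
            "nextCursorMark": str(start + len(page))}


all_ids = [doc["id"] for doc in iterate_with_cursor(fake_solr, "*:*", rows=2)]
print(all_ids)
```

Unlike `start`/`rows` paging, the cursor carries the position of the last document returned, so the cost of each request does not grow with how deep into the result set the client has gone.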
Search + Hadoop
• What’s Old is New Again
• “Traditional” Use Cases:
- Build/Store indexes
- https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
• Enrichment and Signal processing
- PageRank, Statistically Interesting Phrases, etc.
LucidWorks + Hadoop
• Ingestion Help
- Flexible Map-Reduce content ingestion supporting:
» Directory of files
» CSV, Writable, etc.
» LogStash
» Build Your Own
• Pig Load/Store and UDFs
• Hive 2-way support
• http://www.lucidworks.com/search-for-hadoop/
- Open source this summer
LucidWorks SiLK
[Diagram labels: LucidWorks Search; JDBC Connector; Web/File System Crawl; Data Warehouse; Hadoop Connectors; Clickstream; Networking; Data Sources; Connectors; Servers]
Search Analytics: Data Ingestion & Visualization
[Diagram labels: Solr/SolrCloud; Gateway (Reverse Proxy); Solr Output Writer for LogStash (HTTP); Search Logs; Visualization (Configurable Dashboards); Hadoop Connector; GrokIngestMapper; LogStash]
LucidWorks Open Source
• Logstash for Solr: https://github.com/LucidWorks/solrlogmanager
• Banana (Kibana for Solr): https://github.com/LucidWorks/banana
• Effortless AWS deployment and monitoring: http://www.github.com/lucidworks/solr-scale-tk
• Data Quality Toolkit: https://github.com/LucidWorks/data-quality
Demos
Fly the friendly skies
http://www.ibm.com/developerworks/library/j-solr-lucene/index.html
Make $$$
• Leverage time series data and visualization using LucidWorks SiLK
• Monitor Social
• Traditional Research
https://github.com/lucidworks/lws-financial-demo
Cure what ails you
Space-Time Continuum
• Leverage Solr’s spatial capabilities to index non-spatial data, such as time ranges
- Useful for open hours, shifts, etc.
• Query using rectangle intersections
- q = shift:"Intersects(0 19 23 365)"
https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
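The rectangle trick above can be sketched in a few lines. This is one reading of the slide's numbers, not a confirmed recipe from the talk: it assumes each time range is indexed as the point (x = range start, y = range end) in a world bounded by 0 and 365, and that the rectangle is written as "minX minY maxX maxY". The helpers `overlap_query` and `overlaps` are hypothetical names introduced for illustration.

```python
def overlap_query(field, start, end, world_min=0, world_max=365):
    """Build an Intersects query matching ranges that overlap [start, end].

    Assumes each indexed range is the point (x=range_start, y=range_end).
    A range overlaps the query when range_start <= end and range_end >= start,
    so its point lies in the rectangle x in [world_min, end],
    y in [start, world_max].
    """
    # Rectangle order assumed here: minX minY maxX maxY.
    return '%s:"Intersects(%d %d %d %d)"' % (
        field, world_min, start, end, world_max)


def overlaps(range_start, range_end, start, end, world_min=0, world_max=365):
    """Pure-Python version of the same rectangle test, to check the geometry."""
    return (world_min <= range_start <= end) and (start <= range_end <= world_max)


query = overlap_query("shift", 19, 23)
print(query)
print(overlaps(10, 20, 19, 23))  # range 10..20 overlaps 19..23
print(overlaps(1, 5, 19, 23))    # range 1..5 ends before 19
```

Under these assumptions, asking for ranges that overlap 19..23 reproduces the query string shown on the slide, which is why this mapping was chosen for the sketch.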
Signal Processing for Search and Discovery
• Signals power modern relevance
– Clicks, conversions, sharing, history, signatures
• LucidWorks 5 makes it easy to capture and leverage signals
– Recommendations, analytics, discovery
• Simplifies your data workflow
• Simplifies your operational footprint
Solr Powered Signal Processing
• Use Case: eCommerce
• Data:
– Product catalog (~1.2M items)
– Click data (~3.9M clicks)
Meta
• http://www.lucidworks.com
– [email protected]
– @gsingers
• Lucene/Solr Revolution
– Washington DC, Nov 11-14
– http://www.lucenerevolution.org