This Ain't Your Parent's Search Engine

Grant Ingersoll, CTO, LucidWorks
Twitter: @gsingers
Confidential and Proprietary © Copyright 2013

Upload: gsingers

Post on 27-Aug-2014

Category: Software


DESCRIPTION

In just a few short years, search has evolved from a small text box in the nether regions of a website to being front and center in our lives. Increasingly, search engine technology is also used for practical, real-time recommendations, event processing, complex spatial functionality, and time series analysis, capable not only of matching users' queries in text but also of driving real-time decision making and analytics. In fact, open source Apache Lucene/Solr can do all of this and more by taking advantage of new data structures and algorithms that complement more traditional IR approaches. In this demo-driven talk, Lucene committer Grant Ingersoll looks at some of the new and exciting ways users are leveraging Lucene/Solr and related technology to drive deeper insight into information needs that go beyond keywords in a text box.

TRANSCRIPT

Page 1: This Ain't Your Parent's Search Engine

This Ain’t Your Parent’s Search Engine

Grant Ingersoll, CTO, LucidWorks

Twitter: @gsingers

Page 2: This Ain't Your Parent's Search Engine

Search is dead.

Page 3: This Ain't Your Parent's Search Engine

Long live search

Page 4: This Ain't Your Parent's Search Engine

Search is good for…

• Traditional: Fast, fuzzy text matching across a large document collection

• De-normalized data
  - “Light” relational

• Top N problems
  - Key-value (n=1)
  - Recommendations
  - “Good enough” classification, clustering

• Faceting, aggregations, analytical slicing and dicing of data

• Spatial, record/event linkage, alerting

http://cheezburger.com/5243950080
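The "Top N problems" bullet can be illustrated with a minimal sketch, assuming a toy in-memory inverted index and plain term-frequency scoring (neither is how Lucene/Solr actually scores; the names `build_index` and `top_n` are hypothetical). It shows that a key-value lookup is the same retrieval with n=1.

```python
import heapq
from collections import Counter, defaultdict

def build_index(docs):
    """Map each lowercase term to a Counter of {doc_id: term frequency}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index

def top_n(index, query, n):
    """Score docs by summed term frequency; return the n best (doc_id, score) pairs."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by doc_id so results are deterministic.
    return heapq.nsmallest(n, scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical sample data, for illustration only.
docs = {
    "sku-1": "red running shoes",
    "sku-2": "blue running jacket",
    "sku-3": "red red wool socks",
}
index = build_index(docs)

ranked = top_n(index, "red shoes", 2)   # general top-N retrieval
single = top_n(index, "jacket", 1)      # n=1 behaves like a key lookup
```

The only difference between the "search" case and the "key-value" case here is the value of `n` and how selective the query term is.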

Page 5: This Ain't Your Parent's Search Engine

Foundational Changes in Lucene/Solr 4

• Reduced memory usage
• Pluggable codecs/similarity
• FS(A|T)
• Doc Values (column oriented)
• Spatial upgrade
• New facets and functions
• Cursors (deep paging)
• Distributed capabilities
• Joins/Grouping
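As a rough illustration of the "Cursors (deep paging)" bullet, the sketch below follows Solr's `cursorMark` / `nextCursorMark` request and response parameters. It assumes the sort includes the unique key field (here assumed to be `id`). The `fetch` callable and `make_fake_fetch` are hypothetical stand-ins so the loop can run without a server; real cursor marks are opaque strings, not the integer offsets faked here.

```python
def iter_all_docs(fetch, query="*:*", rows=100, sort="id asc"):
    """Yield every matching doc by following cursor marks.

    `fetch` is a hypothetical callable: it takes a dict of request
    parameters and returns the parsed JSON response as a dict.
    """
    cursor = "*"  # starting cursor value
    while True:
        response = fetch({"q": query, "rows": rows, "sort": sort, "cursorMark": cursor})
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor: nothing left to read
            break
        cursor = next_cursor

def make_fake_fetch(all_ids):
    """Stand-in for a Solr select handler, for local experimentation only."""
    def fetch(params):
        start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
        page = all_ids[start:start + params["rows"]]
        return {
            "response": {"docs": [{"id": doc_id} for doc_id in page]},
            "nextCursorMark": str(start + len(page)) if page else params["cursorMark"],
        }
    return fetch

ids = [doc["id"] for doc in iter_all_docs(make_fake_fetch(list("abcde")), rows=2)]
```

Unlike `start`/`rows` paging, each request only carries the cursor from the previous response, so the cost of reaching page 1,000 is not tied to skipping the first 999 pages.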

Page 6: This Ain't Your Parent's Search Engine

Search + Hadoop

• What’s Old is New Again

• “Traditional” use cases:
  - Build/store indexes
  - https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS

• Enrichment and signal processing
  - PageRank, Statistically Interesting Phrases, etc.

Page 7: This Ain't Your Parent's Search Engine

LucidWorks + Hadoop

• Ingestion help
  - Flexible Map-Reduce content ingestion supporting:
    » Directory of files
    » CSV, Writable, etc.
    » LogStash
    » Build Your Own

• Pig Load/Store and UDFs
• Hive 2-way support
• http://www.lucidworks.com/search-for-hadoop/
  - Open source this summer

Page 8: This Ain't Your Parent's Search Engine

[Architecture diagram. Layers: Data Sources, Connectors, Servers. Labels: LucidWorks SiLK, LucidWorks Search, JDBC Connector, Web/File System Crawl, Data Warehouse, Hadoop Connectors, Clickstream, Networking]

Page 9: This Ain't Your Parent's Search Engine

Search Analytics: Data Ingestion & Visualization

[Diagram. Labels: Solr/Solr Cloud; Gateway (Reverse Proxy); Solr Output Writer for LogStash (HTTP); Search Logs; Visualization, Configurable Dashboards; Hadoop Connector; GrokIngestMapper; LogStash]

Page 10: This Ain't Your Parent's Search Engine

LucidWorks Open Source

• Logstash for Solr: https://github.com/LucidWorks/solrlogmanager

• Banana (Kibana for Solr): https://github.com/LucidWorks/banana

• Effortless AWS deployment and monitoring: http://www.github.com/lucidworks/solr-scale-tk

• Data Quality Toolkit: https://github.com/LucidWorks/data-quality

Page 11: This Ain't Your Parent's Search Engine

Demos

Page 12: This Ain't Your Parent's Search Engine

Fly the friendly skies

http://www.ibm.com/developerworks/library/j-solr-lucene/index.html

Page 13: This Ain't Your Parent's Search Engine

Make $$$

• Leverage time series data and visualization using LucidWorks SiLK

• Monitor Social

• Traditional Research

https://github.com/lucidworks/lws-financial-demo

Page 14: This Ain't Your Parent's Search Engine

Cure what ails you

Page 15: This Ain't Your Parent's Search Engine

Space-Time Continuum

• Leverage Solr’s spatial capabilities to index non-spatial data, such as time ranges
  - Useful for Open Hours, Shifts, etc.

• Query using rectangle intersections
  - q = shift:"Intersects(0 19 23 365)"

https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
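One way to read the query on this slide is sketched below. It assumes each time range is indexed as a 2D point with x = start and y = end, and that the axis bounds are 0 and 365; both are assumptions inferred from the slide's numbers, and the helper names are hypothetical. Under that reading, a range overlaps the query range when start <= query_end and end >= query_start, which is exactly a rectangle test on the point.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    min_x: int
    min_y: int
    max_x: int
    max_y: int

def overlap_rect(query_start, query_end, lo=0, hi=365):
    """Rectangle holding every (start, end) point whose range overlaps the query range.

    Overlap condition: start <= query_end and end >= query_start.
    """
    return Rect(min_x=lo, min_y=query_start, max_x=query_end, max_y=hi)

def contains(rect, start, end):
    """True if the point (x=start, y=end) falls inside the rectangle."""
    return rect.min_x <= start <= rect.max_x and rect.min_y <= end <= rect.max_y

def to_query(field, rect):
    """Format the rectangle in the Intersects(minX minY maxX maxY) form shown on the slide."""
    return f'{field}:"Intersects({rect.min_x} {rect.min_y} {rect.max_x} {rect.max_y})"'

rect = overlap_rect(19, 23)          # "which shifts overlap 19 through 23?"
query = to_query("shift", rect)      # same shape as the slide's q parameter
```

With this mapping, a shift running 10 to 20 matches (`contains(rect, 10, 20)` is true), while shifts running 24 to 30 or 5 to 18 do not.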

Page 16: This Ain't Your Parent's Search Engine

Signal Processing for Search and Discovery

• Signals power modern relevance
  – Clicks, conversions, sharing, history, signatures

• LucidWorks 5 makes it easy to capture and leverage signals
  – Recommendations, analytics, discovery

• Simplifies your data workflow

• Simplifies your operational footprint

Page 17: This Ain't Your Parent's Search Engine

Solr Powered Signal Processing

• Use Case: eCommerce

• Data:
  – Product catalog (~1.2M items)
  – Click data (~3.9M clicks)
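The slides do not show how the click data is processed, so the following is only a minimal sketch of one common approach: count clicks per (query, product) pair and use the counts as candidate boosts. The function names, the sample clicks, and the normalization (trim and lowercase) are all hypothetical.

```python
from collections import Counter, defaultdict

def aggregate_clicks(clicks):
    """Count clicks per normalized query and product id."""
    counts = defaultdict(Counter)
    for query, product_id in clicks:
        counts[query.strip().lower()][product_id] += 1
    return counts

def top_boosts(counts, query, limit=3):
    """Return the most-clicked products for a query as (product_id, clicks) pairs."""
    per_query = counts.get(query.strip().lower(), Counter())
    # Most clicks first; ties broken by product id for stable output.
    return sorted(per_query.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

# Hypothetical click log: (query text, clicked product id).
clicks = [("ipad", "p1"), ("iPad ", "p1"), ("ipad", "p2"), ("laptop", "p3")]
counts = aggregate_clicks(clicks)
boosts = top_boosts(counts, "iPad")
```

The aggregated counts could then be stored alongside the catalog and applied as query-time boosts, so that products users actually click for a query rank higher for it.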

Page 18: This Ain't Your Parent's Search Engine

Meta

• http://www.lucidworks.com
  – [email protected]
  – @gsingers

• Lucene/Solr Revolution
  – Washington DC, Nov 11-14
  – http://www.lucenerevolution.org