This Ain't Your Parents' Search Engine

Confidential and Proprietary © Copyright 2013. This Ain’t Your Parents’ Search Engine. Grant Ingersoll, CTO, LucidWorks. Twitter: @gsingers

Upload: lucidworks

Post on 14-Jul-2015

Category: Technology


TRANSCRIPT

Page 1: This Ain't Your Parents' Search Engine

This Ain’t Your Parents’ Search Engine

Grant Ingersoll

CTO, LucidWorks

Twitter: @gsingers

Page 2: This Ain't Your Parents' Search Engine

Search is dead.

Page 3: This Ain't Your Parents' Search Engine

Long live search

Page 4: This Ain't Your Parents' Search Engine

Search is good for…

• Traditional: Fast, fuzzy text matching across a large document collection

• De-normalized data

- “Light” relational

• Top N problems

- Key-value (n=1)

- Recommendations

- “Good enough” classification, clustering

• Faceting, aggregations, analytical slicing and dicing of data

• Spatial, record/event linkage, alerting

http://cheezburger.com/5243950080

Page 5: This Ain't Your Parents' Search Engine

Foundational Changes in Lucene/Solr 4

• Reduced memory usage

• Pluggable codecs/similarity

• FS(A|T): finite state automata and transducers

• Doc Values (column oriented)

• Spatial upgrade

• New facets and functions

• Cursors (deep paging)

• Distributed capabilities

• Joins/Grouping
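
As a rough illustration of the "Cursors (deep paging)" bullet, the sketch below walks a full result set with Solr's `cursorMark` parameter. It assumes the collection's uniqueKey field is named `id`, and `fetch_page` is a hypothetical callable (not part of any Solr client) that sends the given parameters to a `/select` handler and returns the parsed JSON response.

```python
from typing import Callable, Dict, Iterator


def iter_all_docs(fetch_page: Callable[[Dict[str, str]], dict],
                  query: str = "*:*",
                  rows: int = 100,
                  sort: str = "id asc") -> Iterator[dict]:
    """Yield every matching document by following Solr cursor marks.

    fetch_page is a hypothetical helper supplied by the caller; the sort
    must include the uniqueKey field (assumed here to be `id`).
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {"q": query, "rows": str(rows),
                  "sort": sort, "cursorMark": cursor}
        resp = fetch_page(params)
        yield from resp["response"]["docs"]
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged mark means no more results
            break
        cursor = next_cursor
```

Unlike increasing `start` offsets, each request resumes from the last sort position, so the cost of a page does not grow with its depth.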

Page 6: This Ain't Your Parents' Search Engine

Search + Hadoop

• What’s Old is New Again

• “Traditional” Use Cases:

- Build/Store indexes

- https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS

• Enrichment and Signal processing

- PageRank, Statistically Interesting Phrases, etc.

Page 7: This Ain't Your Parents' Search Engine

LucidWorks + Hadoop

• Ingestion Help

- Flexible Map-Reduce content ingestion supporting:

» Directory of files

» CSV, Writable, etc.

» LogStash

» Build Your Own

• Pig Load/Store and UDFs

• Hive 2-way support

• http://www.lucidworks.com/search-for-hadoop/

- Open source this summer

Page 8: This Ain't Your Parents' Search Engine

[Diagram: Data Sources, Connectors, Servers. Component labels: LucidWorks SiLK; LucidWorks Search; JDBC Connector; Web/File System Crawl; Data Warehouse; Hadoop Connectors; Clickstream Networking]

Page 9: This Ain't Your Parents' Search Engine

Search Analytics: Data Ingestion & Visualization

[Diagram labels: Solr/Solr Cloud; Gateway (Reverse Proxy); Solr Output Writer for LogStash (Http); Search Logs; Visualization Configurable Dashboards; Hadoop Connector; GrokIngestMapper; LogStash]

Page 10: This Ain't Your Parents' Search Engine

LucidWorks Open Source

• Logstash for Solr: https://github.com/LucidWorks/solrlogmanager

• Banana (Kibana for Solr): https://github.com/LucidWorks/banana

• Effortless AWS deployment and monitoring: http://www.github.com/lucidworks/solr-scale-tk

• Data Quality Toolkit: https://github.com/LucidWorks/data-quality

Page 11: This Ain't Your Parents' Search Engine

Demos

Page 12: This Ain't Your Parents' Search Engine

Fly the friendly skies

http://www.ibm.com/developerworks/library/j-solr-lucene/index.html

Page 13: This Ain't Your Parents' Search Engine

Make $$$

• Leverage time series data and visualization using LucidWorks SiLK

• Monitor Social

• Traditional Research

https://github.com/lucidworks/lws-financial-demo

Page 14: This Ain't Your Parents' Search Engine

Cure what ails you

Page 15: This Ain't Your Parents' Search Engine

Space-Time Continuum

• Leverage Solr’s spatial capabilities to index non-spatial data, such as time ranges

- Useful for Open Hours, Shifts, etc.

• Query using rectangle intersections

- q = shift:"Intersects(0 19 23 365)"

https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
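
One way to read the rectangle query on this slide, offered as an assumption rather than the talk's exact scheme: each time range is indexed as a 2D point with x = start and y = end, and a wanted range becomes the rectangle "start no later than the wanted end, end no earlier than the wanted start". The sketch below builds such a clause and checks the equivalent overlap test in plain Python. The field name `shift` comes from the slide; the world bounds 0 and 365 and the `TimeRange` helper are illustrative.

```python
from typing import NamedTuple


class TimeRange(NamedTuple):
    start: int  # when the range opens (units are up to the indexer)
    end: int    # when the range closes


def overlap_query(field: str, wanted: TimeRange,
                  world_min: int = 0, world_max: int = 365) -> str:
    """Build an Intersects(minX minY maxX maxY) clause for ranges
    that overlap `wanted` (assumed encoding: x = start, y = end)."""
    return (f'{field}:"Intersects({world_min} {wanted.start} '
            f'{wanted.end} {world_max})"')


def overlaps(stored: TimeRange, wanted: TimeRange) -> bool:
    """Interval overlap: starts before wanted ends, ends after wanted starts."""
    return stored.start <= wanted.end and stored.end >= wanted.start


def point_in_query_rect(stored: TimeRange, wanted: TimeRange,
                        world_min: int = 0, world_max: int = 365) -> bool:
    """The same test, phrased as a point-in-rectangle check."""
    x, y = stored.start, stored.end
    return world_min <= x <= wanted.end and wanted.start <= y <= world_max
```

Under this reading, `overlap_query("shift", TimeRange(19, 23))` reproduces the clause shown on the slide.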

Page 16: This Ain't Your Parents' Search Engine

Signal Processing for Search and Discovery

• Signals power modern relevance

– Clicks, conversions, sharing, history, signatures

• LucidWorks 5 makes it easy to capture and leverage signals

– Recommendations, analytics, discovery

• Simplifies your data workflow

• Simplifies your operational footprint

Page 17: This Ain't Your Parents' Search Engine

Solr Powered Signal Processing

• Use Case: eCommerce

• Data:

– Product catalog (~1.2M items)

– Click data (~3.9M clicks)
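
The deck does not say how the click data is turned into signals, so the following is only a hypothetical sketch of one simple approach: count clicks per (query, product) pair and keep the top N products for each query. The function name, the input shape, and the lowercase normalization are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_clicked(clicks: Iterable[Tuple[str, str]],
                n: int = 3) -> Dict[str, List[Tuple[str, int]]]:
    """Aggregate (query, product_id) click pairs into per-query top-N lists.

    Hypothetical helper: queries are trimmed and lowercased so that
    "iPad" and "ipad" count as the same query.
    """
    per_query: Dict[str, Counter] = defaultdict(Counter)
    for query, product_id in clicks:
        per_query[query.strip().lower()][product_id] += 1
    return {q: counts.most_common(n) for q, counts in per_query.items()}
```

The resulting counts could be indexed alongside the catalog and used as a boost or as a recommendation source; how they are weighted is a separate design choice.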

Page 18: This Ain't Your Parents' Search Engine

Meta

• http://www.lucidworks.com

– [email protected]

– @gsingers

• Sales

– Steve Drane (based here in Chicago)

– [email protected]

• Lucene/Solr Revolution

– Washington DC, Nov 11-14

– http://www.lucenerevolution.org