MapR and LucidWorks Joint Webinar 2012
DESCRIPTION
Slides from a webinar given by Ted Dunning and LucidWorks Chief Scientist Grant Ingersoll on how search technology can be abused to implement apparently intelligent systems
TRANSCRIPT
1©MapR Technologies - Confidential
Crowd Sourcing Reflected Intelligence Using Search and Big Data
Ted Dunning
Grant Ingersoll
Grant’s Background
Co-founder:
– LucidWorks (Chief Scientist)
– Apache Mahout
Long-time Lucene/Solr committer
Author: Taming Text
Background in IR and NLP
– Built CLIR, QA and a variety of other search-based apps
Ted’s Background
Academia, startups
– Aptex, MusicMatch, ID Analytics, Veoh
– Big data since before big
Open source
– since the dark ages before the internet
– Mahout, ZooKeeper, Drill
– bought the beer at the first HUG
MapR
– Chief Application Architect
Founding member of Apache Drill
Agenda
Intro
Search Evolution and Search Revolution
Reflected Intelligence Use Cases
Building a Next Generation Search and Discovery Platform
– MapR
– LucidWorks
1+1=3
Search is Dead, Long Live Search
Search is a system building block
– text is only a part of the story
If the algorithms fit, use them!
Embrace fuzziness!
Scoring features are everywhere
Content
User Interaction
Access
Content Relationships
Search (R)evolution
Search use leads to search abuse
– denormalization frees your mind
– scoring is just a sparse matrix multiply
Lucene/Solr evolution
– non-free-text usages abound
– many DB-like features
– noSQL before NoSQL was cool
– flexible indexing
– finite state transducers FTW!
Scale
“This ain’t your father’s relevance anymore”
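The claim that scoring is just a sparse matrix multiply can be made concrete with a toy sketch, assuming simple weighted-term scoring: documents are rows of a sparse term-weight matrix (plain Python dicts with hypothetical weights), the inverted index is the transpose of that matrix, and the scores for a query are a sparse matrix-vector product. Real Lucene/Solr scoring adds normalization and other factors that are not shown here.

```python
from collections import defaultdict

# Hypothetical documents as sparse rows: doc id -> {term: weight}.
docs = {
    "d1": {"hadoop": 2.0, "search": 1.0},
    "d2": {"search": 3.0, "solr": 1.5},
    "d3": {"mahout": 1.0, "hadoop": 0.5},
}

def transpose(rows):
    """Build the inverted index, i.e. the transpose of the doc-term matrix."""
    postings = defaultdict(dict)
    for doc_id, terms in rows.items():
        for term, weight in terms.items():
            postings[term][doc_id] = weight
    return postings

def score(query, postings):
    """Sparse matrix-vector product: only postings of query terms are visited."""
    scores = defaultdict(float)
    for term, query_weight in query.items():
        for doc_id, doc_weight in postings.get(term, {}).items():
            scores[doc_id] += query_weight * doc_weight
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

index = transpose(docs)
print(score({"search": 1.0, "hadoop": 1.0}, index))
```

Only the posting lists for the query's terms are touched, which is why the multiply stays cheap even when the full matrix is enormous.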
Add (Lots of) Water
Large-scale analysis is key to reflected intelligence
– correlation analysis
• based on queries, clicks, mouse tracks, even explicit feedback
• produce clusters, trends, topics, SIPs (statistically improbable phrases)
– start with engineered knowledge, refine with user feedback
Large-scale discovery features encourage experimentation
Always test, always enrich!
Search – Discovery – Analytics
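One common way to turn query and click logs into correlations and SIPs (statistically improbable phrases) is a log-likelihood ratio (G²) test on a 2×2 contingency table, the statistic Apache Mahout implements for this kind of co-occurrence analysis. The sketch below assumes the counts have already been gathered from logs: k11 counts sessions containing both events A and B, k12 A without B, k21 B without A, and k22 neither. The events and counts are hypothetical.

```python
import math

def x_log_x(x):
    """x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy (N * H) of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: A and B together   k12: A without B
    k21: B without A        k22: neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))

# Hypothetical counts: query term A and a click on item B.
print(llr(10, 0, 0, 10))    # strongly associated, about 27.7
print(llr(10, 10, 10, 10))  # independent, close to 0
```

Pairs that score high relative to the rest are the anomalous co-occurrences worth keeping; where to set the cutoff is a tuning decision.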
Social Media Analysis in Telecom
Correlate mobile traffic analysis with social media analysis
– events cause traffic micro-bursts
– participants tweet about the events ahead of time
Deploy operations faster to predict outages and better handle emergency situations
– high-cost bandwidth augmentation can be marshaled as the traffic appears
– anticipation beats reaction
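If social media mentions really do run ahead of traffic, a simple first check is to correlate the two time series at several lags and see which lag fits best. The sketch below uses plain Pearson correlation on hypothetical hourly counts. It illustrates lagged correlation in general and is not the analysis pipeline the slide refers to.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def best_lead(mentions, traffic, max_lag):
    """Return (lag, r) for the lag at which mentions best track traffic.

    A lag of k compares mentions[t] with traffic[t + k].
    """
    best = (0, pearson(mentions, traffic))
    for k in range(1, max_lag + 1):
        r = pearson(mentions[:-k], traffic[k:])
        if r > best[1]:
            best = (k, r)
    return best

# Hypothetical hourly counts: mentions spike two hours before traffic does.
mentions = [1, 1, 9, 2, 1, 1, 1, 1]
traffic = [5, 5, 5, 5, 40, 6, 5, 5]
print(best_lead(mentions, traffic, max_lag=3))
```

A consistent positive lead time is what would let bandwidth be marshaled before the micro-burst arrives.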
Provenance is 80% of value
Analysis of social media to determine advertising reach and response
In one case the same untargeted advertising was worth 5x if sold with supporting data.
Claims Analysis
Goal
– Insurance claims processing and analysis
– fraud analysis
Method
– Combine free-text search with metadata analysis to identify high-risk activities across the country
– Integrate with corporate workflows to detect and fix outliers in customer relations
Results
– Questions that took 24-48 hours now take seconds to answer
Virginia Tech - Help the World
Grab data around a crisis
Search immediately
Large-scale analysis enriches data to find ways to improve responses and understanding
http://www.ctrnet.net
Bright Planet - Catch the Bad Guys
Online drug counterfeit detection
Identify commonly used language indicating counterfeits
– you know it when you see it
– and you know you have seen it
Feed to analysts via a search-driven application
– enrich based on analysts' feedback
Veoh - Cross Recommendations
Cross recommendation as search
– with search used to build cross recommendation!
Recommend content to people who exhibit certain behaviors (clicks, query terms, other)
(Ab)use of a search engine
– but not as a search engine for content
– more like a search engine for behavior
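The "search engine for behavior" idea can be sketched in two steps, under stated assumptions. First, for each item, collect the behaviors (here, hypothetical query terms) that co-occur with it across users and store them as that item's indicator field. Second, treat a user's recent behavior as a query against those fields. A plain co-occurrence count threshold stands in for a proper significance test such as a log-likelihood ratio, and a Python set intersection stands in for the search engine; all names and data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical history: user -> (behaviors such as query terms, items watched).
history = {
    "u1": ({"q:cats", "q:funny"}, {"v1", "v2"}),
    "u2": ({"q:cats"}, {"v1"}),
    "u3": ({"q:cars"}, {"v3"}),
    "u4": ({"q:cars", "q:funny"}, {"v3", "v2"}),
}

def build_indicators(history, min_count=2):
    """Map each item to the behaviors that co-occur with it at least min_count times."""
    cooc = Counter()
    for behaviors, items in history.values():
        for behavior in behaviors:
            for item in items:
                cooc[(item, behavior)] += 1
    indicators = defaultdict(set)
    for (item, behavior), n in cooc.items():
        if n >= min_count:
            indicators[item].add(behavior)
    return indicators

def recommend(recent_behaviors, indicators):
    """Score items by how many of their indicators match the user's recent behavior."""
    scores = {item: len(ind & recent_behaviors) for item, ind in indicators.items()}
    hits = [(item, s) for item, s in scores.items() if s > 0]
    return sorted(hits, key=lambda kv: (-kv[1], kv[0]))

indicators = build_indicators(history)
print(recommend({"q:cats", "q:funny"}, indicators))
```

In a real deployment the indicator sets would be indexed as a field on each item document, so that recommendation becomes an ordinary search query and inherits the engine's speed, filtering and ranking.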
What Platform Do You Need?
Fast, efficient, scalable search
– bulk and near-real-time indexing
– handle billions of records with sub-second search and faceting
Large-scale, cost-effective storage and processing capabilities
NLP and machine learning tools that scale to enhance discovery and analysis
Integrated log analysis workflows that close the loop between the raw data and user interactions
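Search with faceting, and the claims-analysis pattern of combining free-text search with metadata, both come down to the same operation: match documents through an inverted index, then count metadata values over the matches. The toy in-memory sketch below uses hypothetical claim records and a hypothetical "state" field. A production engine such as Lucene/Solr does this with compressed postings spread across shards, none of which is modeled here.

```python
from collections import Counter, defaultdict

# Hypothetical claim records: free text plus one metadata field.
claims = [
    {"id": 1, "text": "water damage kitchen", "state": "TX"},
    {"id": 2, "text": "rear end collision", "state": "CA"},
    {"id": 3, "text": "water damage basement flood", "state": "TX"},
    {"id": 4, "text": "flood water damage", "state": "FL"},
]

def build_index(docs):
    """Inverted index: lowercase term -> set of document ids."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc["text"].lower().split():
            index[term].add(doc["id"])
    return index

def search(query, index, docs, facet_field):
    """AND-match all query terms, then count facet values over the hits."""
    terms = query.lower().split()
    if not terms:
        return [], Counter()
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    by_id = {d["id"]: d for d in docs}
    facets = Counter(by_id[i][facet_field] for i in hits)
    return sorted(hits), facets

index = build_index(claims)
hits, facets = search("water damage", index, claims, "state")
print(hits, dict(facets))
```

The facet counts are what turn a plain result list into something an analyst can drill into, for example spotting which states concentrate a suspicious kind of claim.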
Reference Architecture
– Content Acquisition: ETL, batch or near real-time
– Data: LucidWorks Search connectors, Push
– Document Store
– Search View: shards 1, 2, 3 … N
• Documents • Users • Logs
– Analytic Services
• View into numeric/historic data
– Personalization & Machine Learning Services
• Classification • Recommendation
• Classification models: in memory, replicated, multi-tenant
– Discovery & Enrichment: clustering, classification, NLP, topic identification, search log analysis, user behavior
– Access APIs
MapR
MapR provides the technology-leading Hadoop distribution
– full ecosystem distribution
– integrated data platform
– complete solution for data integrity
MapR clusters also provide tight integration with search technologies like LucidWorks
– integration is key for effective ops
LucidWorks
LucidWorks provides the leading packaging of Apache Lucene and Solr
– build your own, we support
– founded by the most prominent Lucene/Solr experts
LucidWorks Search
– “Solr++”
• UI, REST API, MapR connectors, relevance tools, much more
LucidWorks Big Data
– Big Data as a Service
– Integrated LucidWorks Search, Hadoop, machine learning with prebuilt workflows for many of these tasks
LucidWorks Big Data Architecture
Big Data Operating System
System Management
– Administration, Provisioning, Monitoring, Configuration
– Service Management, Data Management, Security
Uniform REST API
Search – Discovery – Analytics
– LucidWorks Search
– Machine Learning (classification, clustering, recommendations)
– Natural Language Processing
– SQL (Hive) Interface
– Data Workflows (ETL, log analysis, common metrics)
– Extensible
Content Acquisition
– Enterprise Repository, Social Media, Databases, HDFS, Cloud (S3), Push
Hadoop/HBase, Search Indexes, Search Logs
Easy Wins
Analyze application logs stored in MapR
Seamlessly store search indexes in MapR
– and feed them to Pig, Mahout and others
– use mirrors + NFS to deploy indexes directly
Snapshots make backups a snap
LucidWorks 2.5 (2013 Q1) easily connects with MapR
1 + 1 = 3
Learn More
More information: http://www.mapr.com/company/events/lucidworks-12-13-2012
Vote for this topic for Hadoop Summit EU: http://bit.ly/128tLQe
Talk to Ted
Talk to Grant: @gsingers
MapR and LucidWorks
http://www.mapr.com
http://www.lucidworks.com