Chicago Solr Meetup - June 10th: Exploring Hadoop with Search

Exploring Hadoop with Search Pritesh Patel, Principal Architect Search and Big Data Analytics @ Avalon Consulting, LLC

Upload: lucidworks-archived

Post on 11-May-2015


Page 1: Chicago Solr Meetup - June 10th: Exploring Hadoop with Search

Exploring Hadoop with Search

Pritesh Patel, Principal Architect Search and Big Data Analytics @ Avalon Consulting, LLC

Page 2: Chicago Solr Meetup - June 10th: Exploring Hadoop with Search

Hadoop Ecosystem

Page 3: Chicago Solr Meetup - June 10th: Exploring Hadoop with Search

Possible Integration Points

Page 4: Chicago Solr Meetup - June 10th: Exploring Hadoop with Search

Why Search + Big Data?

What Hadoop is good at      What Search is good at
Distributed file storage    Free text retrieval
Store large data sets       Index large data sets
Distributed processing      Textual analysis
                            Filtering and sorting

= Intelligence Discovery System of large textual data sets

Page 5: Chicago Solr Meetup - June 10th: Exploring Hadoop with Search

How We Integrated Search and Big Data: HBase Replication Facade

Take advantage of the results of analytical Pig and Hive jobs in Hadoop to make retrieval more intelligent

Done with built-in replication, and it scales

Fast access, since it is in memory

Push architecture, so CRUD is near real time

Store in HDFS and search in LW/Solr

Gives a reference to the source when integrated this way

HBase has a RESTful API to retrieve data given the ID that Solr would have after replication/indexing
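
As a rough illustration of the last point, the sketch below looks up an HBase row through the HBase REST gateway using the ID of a Solr hit. It is not the code from the demo. The base URL, the table name "tweets", the row key "tweet-1", and the column "d:url" are assumptions made up for the example. It relies only on two properties of the HBase REST gateway: a row is addressed as /<table>/<row key>, and in the JSON representation keys, column names, and cell values (under "$") are base64 encoded.

```python
import base64
import json
from urllib.parse import quote


def row_url(base_url, table, row_key):
    """Build the REST URL for one row: <base>/<table>/<row key>."""
    return "{}/{}/{}".format(
        base_url.rstrip("/"), quote(table, safe=""), quote(row_key, safe="")
    )


def decode_row(payload):
    """Turn a JSON row response into {row key: {column: value}}.

    The REST gateway base64-encodes keys, column names and cell values,
    so each of them is decoded here.
    """
    def b64(text):
        return base64.b64decode(text).decode("utf-8")

    rows = {}
    for row in json.loads(payload)["Row"]:
        rows[b64(row["key"])] = {
            b64(cell["column"]): b64(cell["$"]) for cell in row["Cell"]
        }
    return rows


# Hypothetical usage: the Solr document ID doubles as the HBase row key.
# The request itself (GET with "Accept: application/json") is left out so
# the sketch runs without a cluster.
print(row_url("http://localhost:8080", "tweets", "tweet-1"))
```

Keeping the Solr ID identical to the HBase row key is what makes the lookup a single GET. Solr answers the search, and HBase returns the full source record.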

Page 6: Chicago Solr Meetup - June 10th: Exploring Hadoop with Search

Our Demo Architecture

Diagram by Varun Rao @ Avalon Consulting, LLC

Page 7: Chicago Solr Meetup - June 10th: Exploring Hadoop with Search

A Use Case of this Architecture

Monitor tweets containing the words “Hadoop”, “Lucidworks”, and “Big Data”

Automatically extract the URLs mentioned when talking about these terms

In near real time, visualize which URLs seem to be mentioned with these terms

Discover URLs that are becoming the most popular when mentioned with the topics “Big Data”, “Lucidworks”, and “Hadoop”; those might be URLs you want to read
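
The core of this use case can be sketched in a few lines: keep only tweets that mention one of the monitored terms, pull out the URLs, and count them. This is an in-memory illustration, not the pipeline from the demo, which ran on Hadoop and Solr. The regular expression, the function names, and the sample tweets are all assumptions for the example.

```python
import re
from collections import Counter

# Terms monitored in the use case, lowercased for matching.
TERMS = ("hadoop", "lucidworks", "big data")

# Simplified URL pattern (an assumption): anything starting with
# http://, https:// or www. up to the next whitespace or quote.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def extract_urls(text):
    """Return the URLs in a tweet, without trailing punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_RE.findall(text)]


def count_urls(tweets):
    """Count URLs across tweets that mention at least one monitored term."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        if any(term in lowered for term in TERMS):
            counts.update(extract_urls(text))
    return counts


sample = [
    "Loving Hadoop, see www.avalonconsult.com.",
    "Big Data and Lucidworks: http://www.lucidworks.com www.avalonconsult.com",
    "No monitored term here: www.example.com",
]
print(count_urls(sample).most_common(2))
```

In a search-backed setup the counting step would more likely be a facet query over an indexed URL field, so the ranking stays current as new tweets arrive.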

Page 8: Chicago Solr Meetup - June 10th: Exploring Hadoop with Search

Demo

Anyone want to send a tweet? Just use one or more of the words “Hadoop”, “Lucidworks”, “Big Data”

Add any URL you'd like to share to the tweet. Try: www.avalonconsult.com or www.lucidworks.com

Page 9: Chicago Solr Meetup - June 10th: Exploring Hadoop with Search

So much potential

You can apply this to so many things.

Do intelligent entity extraction to discover topics with the UIMA integration of Solr

Do a similar analysis of popular mentions and people for the topics of your choice

Endless …

Any questions?

Page 10: Chicago Solr Meetup - June 10th: Exploring Hadoop with Search

Team

Client Implementation done by Kevin Risden @ Avalon ([email protected])

Demo Architecture Team

Varun Rao @ Avalon ([email protected])

Pritesh Patel @ Avalon ([email protected])