Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

INTERACTIVELY QUERY AND SEARCH YOUR BIG DATA Romain Rigaux

Upload: gethue

Post on 03-Aug-2015


TRANSCRIPT

Page 1: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

INTERACTIVELY QUERY AND SEARCH YOUR BIG DATA

Romain Rigaux

Page 2: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

GOALS

Build a Web app

Quickly explore data

… with Solr

Make Solr / Hadoop easier to use

+

Page 3: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

ARCHITECTURE

“Just a view” on top of the standard Solr API

REST

Page 4: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

HISTORY

V1 USER

Page 5: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

HISTORY

V1 ADMIN

Page 6: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

ARCHITECTURE

NEXT!

Lot of learning, UX boost needed

Simple, don’t know it is Solr

Page 7: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

HISTORY

V2 USER

Page 8: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

HISTORY

V2 ADMIN

Page 9: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

HISTORY

V2 BETTER UX

Page 10: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

ARCHITECTURE

/select /admin/collections /get /luke...

/add_widget /zoom_in /select_facet /select_range...

REST

AJAX

Templates + JS Model

www….

Page 11: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

ARCHITECTURE

UI FOR FACETS

Layout: All the 2D positioning (cell ids), visual, drag&drop

Collection: Dashboard, fields, template, widgets (ids)

Query: Search terms, selected facets (q, fqs)
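A minimal sketch of how these three pieces of dashboard state might be kept apart. The class and attribute names are assumptions made for illustration; they are not taken from Hue's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Layout:
    # Purely visual state: which widget sits in which grid cell (hypothetical shape).
    cells: Dict[str, str] = field(default_factory=dict)  # cell id -> widget id


@dataclass
class Collection:
    # What the dashboard is built on: the index, its fields, its widget ids.
    name: str
    fields: List[str] = field(default_factory=list)
    widget_ids: List[str] = field(default_factory=list)


@dataclass
class Query:
    # What the user is currently asking: search terms plus selected facets.
    q: str = ""
    fqs: List[str] = field(default_factory=list)

    def to_params(self) -> List[Tuple[str, str]]:
        """Return Solr-style (name, value) pairs; an empty q falls back to *:*."""
        params = [("q", self.q or "*:*")]
        params.extend(("fq", fq) for fq in self.fqs)
        return params


query = Query(q="Chrome", fqs=["bytes:[900000 TO 1800000]"])
params = query.to_params()
```

Keeping the query separate from the layout means that dragging a widget never has to trigger a new search, and changing the search terms never touches the positioning.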

Page 12: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

ADDING A WIDGET

LIFECYCLE

Load the initial page

Edit mode and Drag&Drop

/solr/zookeeper/clusterstate.json

/solr/admin/luke…

/get_collection

Page 13: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

ADDING A WIDGET

LIFECYCLE

/solr/select?stats=true

/new_facet

Select the field

Guess ranges (number or dates)

Rounding (number or dates)

Page 14: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

ADDING A WIDGET

LIFECYCLE

Query part 1

facet.range={!ex=bytes}bytes&f.bytes.facet.range.start=0&f.bytes.facet.range.end=9000000&f.bytes.facet.range.gap=900000&f.bytes.facet.mincount=0&f.bytes.facet.limit=10

Query part 2

q=Chrome&fq={!tag=bytes}bytes:[900000+TO+1800000]

Augment Solr response

{ 'facet_counts':{ 'facet_ranges':{ 'bytes':{ 'start':10000, 'counts':[ '900000', 3423, '1800000', 339, ... ] } } } }

{ ..., 'normalized_facets':[ { 'extraSeries':[ ], 'label':'bytes', 'field':'bytes', 'counts':[ { 'from':'900000', 'to':'1800000', 'selected':True, 'value':3423, 'field':'bytes', 'exclude':False } ], ... } ] }
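The augmentation step can be sketched as a small function that turns Solr's flat range counts into one dict per bucket. The function name and signature are hypothetical; only the output keys follow the slide.

```python
def normalize_range_facet(field_name, solr_counts, gap, selected_from=None):
    """Turn a flat [start, count, start, count, ...] list into bucket dicts."""
    buckets = []
    # Range counts arrive as a flat list alternating bucket start and count.
    for start, count in zip(solr_counts[0::2], solr_counts[1::2]):
        buckets.append({
            "from": start,
            "to": str(int(start) + gap),
            "value": count,
            "field": field_name,
            # Flag the bucket the user has already clicked on.
            "selected": start == selected_from,
            "exclude": False,
        })
    return buckets


counts = ["900000", 3423, "1800000", 339]
normalized = normalize_range_facet("bytes", counts, gap=900000, selected_from="900000")
```

Doing this on the server keeps the widget code simple: each bucket already carries its own bounds and selection state, so the browser only has to draw it.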

Page 15: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

JSON TO WIDGET

{ "field":"rate_code", "counts":[ { "count":97797, "exclude":true, "selected":false, "value":"1", "cat":"rate_code" } ...

{ "field":"medallion", "counts":[ { "count":159, "exclude":true, "selected":false, "value":"6CA28FC49A4C49A9A96", "cat":"medallion" } ...

{ "extraSeries":[ ], "label":"trip_time_in_secs", "field":"trip_time_in_secs", "counts":[ { "from":"0", "to":"10", "selected":false, "value":527, "field":"trip_time_in_secs", "exclude":true } ...

{ "field":"passenger_count", "counts":[ { "count":74766, "exclude":true, "selected":false, "value":"1", "cat":"passenger_count" } ...

Page 16: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

REPEAT UNTIL…

Page 17: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

GAME CHANGER!

Possibilities

5.1 / 5.2

Analytic Facets

Page 18: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

FACET FUNCTIONS

Count
Sum
Avg
Percentile
Max
...

Count(id)
Sum(bytes)
Avg(mul(price, quantity))
Percentile(salary, 50, 90)
Max(temperature)
...
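A minimal sketch of sending such functions to Solr through the `json.facet` request parameter. The output keys (`total_bytes` and so on) and the helper name are made up for illustration, and the lowercase function spellings follow Solr's JSON Facet API rather than the capitalised labels on the slide.

```python
import json
from urllib.parse import parse_qs, urlencode


def facet_function_params(q, functions):
    """Encode a query plus a dict of {output name: facet function}."""
    return urlencode({
        "q": q,
        "rows": 0,  # only the aggregates are wanted, not the documents
        "json.facet": json.dumps(functions),
    })


functions = {
    "total_bytes": "sum(bytes)",
    "avg_revenue": "avg(mul(price,quantity))",
    "salary_pct": "percentile(salary,50,90)",
    "max_temp": "max(temperature)",
}
query_string = facet_function_params("*:*", functions)

# Round trip: decoding the query string gives back the same facet request.
decoded = json.loads(parse_qs(query_string)["json.facet"][0])
```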

Page 19: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

FACET FUNCTIONS

Page 20: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

SUB “NESTED” FACETS

top_os {
    type: term,
    field: os,
    limit: 5
}

top_os {
    type: term,
    field: os,
    limit: 5,
    facet : {
        by_country: {
            type: term,
            field: country
        }
    }
}
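The nesting shown above composes naturally in code. This is a sketch with a hypothetical helper, `terms_facet`; it spells the type `terms`, as in Solr's JSON Facet API reference, where the slide writes `term`.

```python
import json


def terms_facet(field_name, limit=None, **sub_facets):
    """Build a terms facet, optionally with nested sub-facets."""
    facet = {"type": "terms", "field": field_name}
    if limit is not None:
        facet["limit"] = limit
    if sub_facets:
        # Sub-facets are computed inside each bucket of the parent facet.
        facet["facet"] = sub_facets
    return facet


request = {
    "top_os": terms_facet("os", limit=5, by_country=terms_facet("country")),
}
payload = json.dumps(request, sort_keys=True)
```

Because a sub-facet is just another facet, the same helper can be nested to any depth.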

Page 21: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

FUNCTION + NESTED = ANALYTICS

states {
    type: term,
    field: state,
    facet : {
        by_month : {
            type: range,
            field: time,
            start: "TODAY-6MONTHS",
            end: "TODAY",
            gap: "MONTH",
            facet : {
                avg_sal: "avg(salary)"
            }
        }
    }
}

states {
    type: term,
    field: state,
    facet : {
        avg_sal: "avg(salary)"
    }
}

Page 22: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

OPERATIONS ON BUCKETS OF DATA

Counts → Functions

Page 23: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

OPERATIONS ON BUCKETS OF DATA

Nested → nD functions
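To chart nested buckets, a client typically flattens them into rows, one per innermost bucket. The sketch below assumes the `buckets` / `val` response shape of Solr's JSON Facet API; the sample response is invented for illustration and the function name is hypothetical.

```python
def flatten_buckets(facet, path=()):
    """Yield (path of bucket values, metrics) for every innermost bucket."""
    for bucket in facet.get("buckets", []):
        here = path + (bucket["val"],)
        # Nested facets are the dict-valued entries that carry their own buckets.
        nested = {k: v for k, v in bucket.items()
                  if isinstance(v, dict) and "buckets" in v}
        metrics = {k: v for k, v in bucket.items()
                   if k != "val" and k not in nested}
        if nested:
            for child in nested.values():
                yield from flatten_buckets(child, here)
        else:
            yield here, metrics


# Invented sample shaped like a states -> by_month -> avg_sal response.
sample = {"buckets": [
    {"val": "CA", "count": 3, "by_month": {"buckets": [
        {"val": "2015-01", "count": 2, "avg_sal": 100.0},
        {"val": "2015-02", "count": 1, "avg_sal": 120.0},
    ]}},
]}
rows = list(flatten_buckets(sample))
```

Each row carries the full path of bucket values, so a two-level facet yields the (state, month, value) triples that a 2D chart needs.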

Page 24: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

ENTERPRISE FEATURES

- Access to Search App configurable, LDAP/SAML auths
- Share by link
- Solr Cloud (or non Cloud)
- Proxy user
    /solr/jobs_demo/select?user.name=hue&doAs=romain&q=
- Security
    Kerberos
- Sentry
    Collection level, Solr calls like /admin, /query, Solr UI, ZooKeeper
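The proxy-user call in the list above can be sketched as a URL builder. The function and argument names are hypothetical; the `user.name` and `doAs` parameters and the path are the ones shown on the slide.

```python
from urllib.parse import urlencode


def proxied_select_url(collection, q, service_user, end_user, base="/solr"):
    """Build a select URL where service_user acts on behalf of end_user."""
    params = urlencode({
        "user.name": service_user,  # the account the web app authenticates as
        "doAs": end_user,           # the logged-in user being impersonated
        "q": q,
    })
    return "%s/%s/select?%s" % (base, collection, params)


url = proxied_select_url("jobs_demo", "", service_user="hue", end_user="romain")
```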

Page 25: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

SEARCH AS ONLY APP IN HUE

gethue.com/solr-search-ui-only/

Page 26: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

SPARK INDEXING

WHAT

• Spark in your browser

• Notebooks

• New REST Server

Page 27: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

SPARK INDEXING

WHAT

• Open source REST for Spark Shell

• Runs locally or inside YARN

• Spark Scala, PySpark and jar/py submission

https://github.com/cloudera/hue/tree/master/apps/spark/java
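A sketch of what talking to such a REST server could look like from a client. The paths `/sessions` and `/sessions/{id}/statements`, the `kind` and `code` fields, and the host and port are all assumptions about the layout, not taken from the slides. The requests are only prepared here, never sent.

```python
import json
from urllib.request import Request

JSON_HEADERS = {"Content-Type": "application/json"}


def build_session_request(base_url, kind="pyspark"):
    """Prepare a POST that would open an interactive shell session (assumed API)."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return Request(base_url + "/sessions", data=body,
                   headers=JSON_HEADERS, method="POST")


def build_statement_request(base_url, session_id, code):
    """Prepare a POST that would run one snippet in a session (assumed API)."""
    body = json.dumps({"code": code}).encode("utf-8")
    url = "%s/sessions/%s/statements" % (base_url, session_id)
    return Request(url, data=body, headers=JSON_HEADERS, method="POST")


base = "http://localhost:8998"  # hypothetical server address
session_req = build_session_request(base)
statement_req = build_statement_request(base, 0, "sc.parallelize(range(10)).sum()")
```

Passing either object to `urllib.request.urlopen` would send it, which only makes sense against a running server that exposes these endpoints.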

Page 28: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

SPARK STREAMING

Real time!

Spark

Solr

Page 29: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

NOTEBOOKS / SHELL

WHAT

• Python

• Scala

• Charts

Page 30: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

DEMO TIME

• Analyze Bay Area bike share

• Visualize one year of data

• Know your users, predict behavior

Page 31: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

MISSED SOMETHING?

demo.gethue.com

Page 32: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

WHAT’S NEXT

NEW FEATURES

• Full Analytics

• Easier indexing

• Geo

• Export/Share results

• “More like this”

• Solr Joins, Solr SQL

• Spark, SQL... integration, Hue 4

Page 33: Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

TWITTER

@gethue

USER GROUP

hue-user@

WEBSITE

http://gethue.com

LEARN

http://learn.gethue.com

THANKS!