2013 Dec 9 Data Marketing 2013 - Hadoop

ELEPHANT AT THE DOOR: HADOOP AND NEXT GENERATION DATA Adam Muise – Solution Architect, Hortonworks

Upload: adam-muise

Post on 27-Jan-2015


DESCRIPTION

Data Marketing 2013 Presentation of Hadoop. The paradigm shift in 45 minutes or less. No, really.

TRANSCRIPT

Page 1: 2013 Dec 9 Data Marketing 2013 - Hadoop

ELEPHANT AT THE DOOR: HADOOP AND NEXT GENERATION DATA

Adam Muise – Solution Architect, Hortonworks

Page 2: 2013 Dec 9 Data Marketing 2013 - Hadoop

Who am I?

Page 3: 2013 Dec 9 Data Marketing 2013 - Hadoop

Who is [logo: Hortonworks]?

Page 4: 2013 Dec 9 Data Marketing 2013 - Hadoop

We do Hadoop

The leaders of Hadoop’s development

Community driven, Enterprise Focused

Drive Innovation in the platform – We lead the roadmap

100% Open Source – Democratized Access to Data

Page 5: 2013 Dec 9 Data Marketing 2013 - Hadoop

We do Hadoop successfully.

Support

Professional Services

Training

Page 6: 2013 Dec 9 Data Marketing 2013 - Hadoop

We do Hadoop successfully everywhere.

Page 7: 2013 Dec 9 Data Marketing 2013 - Hadoop

We do Hadoop successfully, everywhere, with partners.

Page 8: 2013 Dec 9 Data Marketing 2013 - Hadoop

What is Hadoop? What is everyone talking about?

Page 9: 2013 Dec 9 Data Marketing 2013 - Hadoop

Data

Page 10: 2013 Dec 9 Data Marketing 2013 - Hadoop

“Big Data” is the marketing term of the decade in IT

Page 11: 2013 Dec 9 Data Marketing 2013 - Hadoop

What lurks behind the hype is the democratization of Data.

Page 12: 2013 Dec 9 Data Marketing 2013 - Hadoop

You need data.

Page 13: 2013 Dec 9 Data Marketing 2013 - Hadoop

But what do you do with your data now?

Page 14: 2013 Dec 9 Data Marketing 2013 - Hadoop

We are obsessive compulsive about collecting and structuring our data.

Page 15: 2013 Dec 9 Data Marketing 2013 - Hadoop

Put it away, delete it, tweet it, compress it, shred it, wikileak-it, put it in a database, put it in SAN/NAS, put it in the cloud, hide it in tape…

Page 16: 2013 Dec 9 Data Marketing 2013 - Hadoop

You need data. Your customers expect you to know what they want before they do.

Page 17: 2013 Dec 9 Data Marketing 2013 - Hadoop

Let’s talk challenges…

Page 18: 2013 Dec 9 Data Marketing 2013 - Hadoop

[Slide graphic: the word "Volume" repeated four times]

Page 19: 2013 Dec 9 Data Marketing 2013 - Hadoop

[Slide graphic: the word "Volume" repeated, more times than on the previous slide]

Page 20: 2013 Dec 9 Data Marketing 2013 - Hadoop

[Slide graphic: the word "Volume" repeated, more times than on the previous slide]

Page 21: 2013 Dec 9 Data Marketing 2013 - Hadoop

[Slide graphic: the word "Volume" repeated until it fills the slide]

Page 22: 2013 Dec 9 Data Marketing 2013 - Hadoop

Storage, Management, Processing all become challenges with Data at Volume

Page 23: 2013 Dec 9 Data Marketing 2013 - Hadoop

Traditional technologies adopt a divide, drop, and conquer approach

Page 24: 2013 Dec 9 Data Marketing 2013 - Hadoop

The solution?

[Diagram: five separate data stores, each holding its own data: EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW]

Page 25: 2013 Dec 9 Data Marketing 2013 - Hadoop

Ummm…you dropped something

[Diagram: the same five data stores (EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW), shown alongside many additional unlabeled data blocks]

Page 26: 2013 Dec 9 Data Marketing 2013 - Hadoop

Analyzing the data usually raises more interesting questions…

Page 27: 2013 Dec 9 Data Marketing 2013 - Hadoop

…which leads to more data

Page 28: 2013 Dec 9 Data Marketing 2013 - Hadoop

Wait, you’ve seen this before.

[Diagram: many data blocks surrounding a box labeled "Analytics Sausage Factory"]

Page 29: 2013 Dec 9 Data Marketing 2013 - Hadoop

Data begets Data.

Page 30: 2013 Dec 9 Data Marketing 2013 - Hadoop

What keeps us from our Data?

Page 31: 2013 Dec 9 Data Marketing 2013 - Hadoop

“Prices, Stupid passwords, and Boring Statistics.” - Hans Rosling

http://www.youtube.com/watch?v=hVimVzgtD6w

Page 32: 2013 Dec 9 Data Marketing 2013 - Hadoop

Your data silos are lonely places.

[Diagram: four separate data silos: EDW, Accounts, Customers, Web Properties]

Page 33: 2013 Dec 9 Data Marketing 2013 - Hadoop

… Data likes to be together.

[Diagram: the four data silos again: EDW, Accounts, Customers, Web Properties]

Page 34: 2013 Dec 9 Data Marketing 2013 - Hadoop

Data likes to socialize too.

[Diagram: EDW, Accounts, Customers, and Web Properties joined by new sources: Machine Data, Twitter, Facebook, CDR, Weather Data]

Page 35: 2013 Dec 9 Data Marketing 2013 - Hadoop

New types of data don’t quite fit into your pristine view of the world.

[Diagram: "My Little Data Empire" next to Logs and Machine Data, with question marks between them]

Page 36: 2013 Dec 9 Data Marketing 2013 - Hadoop

To resolve this, some people take hints from Lord Of The Rings...

Page 37: 2013 Dec 9 Data Marketing 2013 - Hadoop

…and create One-Schema-To-Rule-Them-All…

[Diagram: an EDW holding data under a single Schema]

Page 38: 2013 Dec 9 Data Marketing 2013 - Hadoop

…but that has its problems too.

[Diagram: two EDWs, each with its own Schema and data, connected by multiple ETL jobs]

Page 39: 2013 Dec 9 Data Marketing 2013 - Hadoop

Fragile workflows make supporting the analytical models you want expensive and time-consuming.

[Diagram: two EDWs, each with its own Schema and data, connected by multiple ETL jobs]

Page 40: 2013 Dec 9 Data Marketing 2013 - Hadoop

What do you want to do with data?

Page 41: 2013 Dec 9 Data Marketing 2013 - Hadoop

Marketing Analytics needs data. Work with the population, not just a sample.

Page 42: 2013 Dec 9 Data Marketing 2013 - Hadoop

Your segmentation today.

Male

Female

Age: 25-30

Town/City

Middle Income Band

Product Category Preferences

Page 43: 2013 Dec 9 Data Marketing 2013 - Hadoop

Your segmentation with better data.

Male

Female

Age: 27 but feels old

GPS coordinates

$65-68k per year

Product recommendations

Tea Party Hippie

Looking to start a business

Walking into Starbucks right now…

A depressed Toronto Maple Leafs fan

Products left in basket indicate drunk Amazon shopper

Gene Expression for Risk Taker

Thinking about a new house

Unhappy with his cell phone plan

Pregnant

Spent 25 minutes looking at tea cozies

Page 44: 2013 Dec 9 Data Marketing 2013 - Hadoop

Pick up all of that data that was prohibitively expensive to store and use.

Page 45: 2013 Dec 9 Data Marketing 2013 - Hadoop

Why do viewer surveys…

Page 46: 2013 Dec 9 Data Marketing 2013 - Hadoop

…when raw data can tell you what button on the remote was pressed during what commercial for the entire viewer population?

Page 47: 2013 Dec 9 Data Marketing 2013 - Hadoop

To approach these use cases you need an affordable platform that stores, processes, and analyzes the data.

Page 48: 2013 Dec 9 Data Marketing 2013 - Hadoop

So what is the answer?

Page 49: 2013 Dec 9 Data Marketing 2013 - Hadoop

Enter the Hadoop.

http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/

Page 50: 2013 Dec 9 Data Marketing 2013 - Hadoop

Hadoop was created because traditional technologies never cut it for the Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn

Page 51: 2013 Dec 9 Data Marketing 2013 - Hadoop

Traditional architecture didn’t scale enough…

[Diagram: three separate stacks, each made up of App servers, DB servers, and a SAN]

Page 52: 2013 Dec 9 Data Marketing 2013 - Hadoop

Databases can become bloated and useless

Page 53: 2013 Dec 9 Data Marketing 2013 - Hadoop

Traditional architectures cost too much at that volume…

$/TB

$pecial Hardware

$upercomputing

Page 54: 2013 Dec 9 Data Marketing 2013 - Hadoop

So what is the answer?

Page 55: 2013 Dec 9 Data Marketing 2013 - Hadoop

If you could design a system that would handle this, what would it look like?

Page 56: 2013 Dec 9 Data Marketing 2013 - Hadoop

It would probably need a highly resilient, self-healing, cost-efficient, distributed file system…

[Diagram: a grid of nine Storage nodes]
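To make "resilient" and "self-healing" concrete, here is a minimal toy sketch in Python. It is not HDFS code and does not follow HDFS's real placement policy; the node names, the `place_blocks` and `heal` helpers, and the nine-node cluster are all assumptions made up for illustration. The only idea it shows is that each block is kept on several nodes, so when nodes die the surviving copies can be used to rebuild the missing ones.

```python
import random

REPLICATION = 3  # assumed number of copies kept for every block


def place_blocks(block_ids, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (random toy policy)."""
    rng = random.Random(seed)
    return {b: set(rng.sample(sorted(nodes), replication)) for b in block_ids}


def heal(placement, live_nodes, replication=REPLICATION, seed=0):
    """Forget replicas on dead nodes, then copy blocks onto live nodes
    until every block is back at the target replica count."""
    rng = random.Random(seed)
    healed = {}
    for block, holders in placement.items():
        alive = holders & live_nodes
        if not alive:
            raise RuntimeError(f"block {block} lost all replicas")
        spare = sorted(live_nodes - alive)
        missing = replication - len(alive)
        alive |= set(rng.sample(spare, min(missing, len(spare))))
        healed[block] = alive
    return healed


# Hypothetical nine-node cluster storing twenty blocks.
nodes = {f"node{i}" for i in range(9)}
placement = place_blocks(range(20), nodes)

# Two nodes fail; the remaining copies are used to re-replicate.
survivors = nodes - {"node0", "node4"}
placement = heal(placement, survivors)
```

Because every block starts with three copies, losing any two nodes never destroys the last copy, which is why cheap, failure-prone hardware can still add up to reliable storage.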

Page 57: 2013 Dec 9 Data Marketing 2013 - Hadoop

It would probably need a completely parallel processing framework that took tasks to the data…

[Diagram: the nine Storage nodes, each paired with a Processing unit]
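"Taking tasks to the data" can be sketched as a tiny word count in the MapReduce style. This is an illustration in plain Python, not the Hadoop API: the `node_data` dictionary stands in for data already stored on three hypothetical nodes, and a thread pool stands in for the cluster. Each map task reads only its own node's lines, and only the small per-node counts are moved and merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each node already holds its own slice of the text.
node_data = {
    "node1": ["hadoop stores data", "data begets data"],
    "node2": ["yarn runs apps", "hadoop runs yarn"],
    "node3": ["data likes data"],
}


def map_task(lines):
    """Runs 'on the node': counts words in the local lines only."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Merges the small per-node results into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# One map task per node, run in parallel; the raw lines never leave their node.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_task, node_data.values()))

word_counts = reduce_counts(partials)
```

The design point is that the work shipped across the network is the function and its compact result, not the bulk data itself.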

Page 58: 2013 Dec 9 Data Marketing 2013 - Hadoop

It would probably run on commodity hardware, virtualized machines, and common OS platforms

[Diagram: the nine Storage nodes, each paired with a Processing unit]

Page 59: 2013 Dec 9 Data Marketing 2013 - Hadoop

It would probably be open source so innovation could happen as quickly as possible

Page 60: 2013 Dec 9 Data Marketing 2013 - Hadoop

It would need a critical mass of users

Page 61: 2013 Dec 9 Data Marketing 2013 - Hadoop

Hadoop 2 just hit the ground: Introducing YARN

Page 62: 2013 Dec 9 Data Marketing 2013 - Hadoop

YARN lets you run more data apps than ever before

[Diagram: HDFS2 and YARN supporting MapReduce V2, MapReduce V?, STORM, MPI, Giraph, HBase, Tez, … and more]
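The idea of many kinds of data apps sharing one cluster can be sketched with a toy resource manager. This is not the YARN API; real YARN involves a ResourceManager, NodeManagers, per-application masters, and pluggable schedulers. The `ToyResourceManager` class, the application names, and the ten-container pool below are assumptions invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hands out containers from one shared pool (illustrative only)."""
    total_containers: int
    allocations: dict = field(default_factory=dict)

    @property
    def free(self):
        """Containers not currently held by any application."""
        return self.total_containers - sum(self.allocations.values())

    def request(self, app_name, wanted):
        """Grant up to `wanted` containers, limited by what is free."""
        granted = min(wanted, self.free)
        if granted:
            self.allocations[app_name] = self.allocations.get(app_name, 0) + granted
        return granted

    def release(self, app_name):
        """Return all of an application's containers to the pool."""
        return self.allocations.pop(app_name, 0)


rm = ToyResourceManager(total_containers=10)
batch = rm.request("mapreduce-job", 6)
stream = rm.request("storm-topology", 3)
graph = rm.request("giraph-job", 4)        # only one container is still free
rm.release("mapreduce-job")                # the batch job finishes
graph_retry = rm.request("giraph-job", 3)  # freed capacity goes to the graph job
```

Batch, streaming, and graph workloads all draw on the same pool here, which is the sense in which one cluster can host more than MapReduce.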

Page 63: 2013 Dec 9 Data Marketing 2013 - Hadoop

YARN turns Hadoop into a smart phone: An App Ecosystem

hortonworks.com/yarn/

Page 64: 2013 Dec 9 Data Marketing 2013 - Hadoop

YARN: Yeah, we did that too.

hortonworks.com/yarn/

Page 65: 2013 Dec 9 Data Marketing 2013 - Hadoop

Apache Hadoop

[Diagram of components: Flume, Ambari, HBase, Falcon, MapReduce, HDFS, Sqoop, HCatalog, Pig, Hive, Storm, YARN]

Page 66: 2013 Dec 9 Data Marketing 2013 - Hadoop

Hortonworks Data Platform

[Diagram of components: Flume, Ambari, HBase, Falcon, MapReduce, HDFS, Sqoop, HCatalog, Pig, Hive, Storm, YARN]

Page 67: 2013 Dec 9 Data Marketing 2013 - Hadoop

What else are we working on?

hortonworks.com/labs/

Page 68: 2013 Dec 9 Data Marketing 2013 - Hadoop

Hadoop is the new Data Operating System for the Enterprise

Page 69: 2013 Dec 9 Data Marketing 2013 - Hadoop

© Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION

There is NO second place

Hortonworks …the Bull Elephant of Hadoop Innovation