What is Hadoop? Oct 17 2013

WELCOME TO HADOOP Adam Muise – Hortonworks

Upload: adam-muise

Post on 27-Jan-2015





DESCRIPTION

A brief "What is Hadoop?" introduction for the Georgian Partners CTO Conference. It outlines the origins of open source Apache Hadoop and how Hortonworks fits into the picture. There is also a brief introduction to YARN, the new resource negotiation layer.

TRANSCRIPT

Page 1: What is Hadoop? Oct 17 2013

WELCOME TO HADOOP Adam Muise – Hortonworks

Page 2: What is Hadoop? Oct 17 2013

Who am I?

Page 3: What is Hadoop? Oct 17 2013

Why are we here?

Page 4: What is Hadoop? Oct 17 2013

Data  

Page 5: What is Hadoop? Oct 17 2013

“Big Data” is the marketing term of the decade

Page 6: What is Hadoop? Oct 17 2013

What lurks behind the marketing and hype is a legitimate movement forward in dealing with data

Page 7: What is Hadoop? Oct 17 2013

You need to deal with Data

Page 8: What is Hadoop? Oct 17 2013

Put it away, delete it, tweet it, compress it, shred it, wikileak-it, put it in a database, put it in SAN/NAS, put it in the cloud, hide it on tape…

Page 9: What is Hadoop? Oct 17 2013

Let’s talk challenges…

Page 10: What is Hadoop? Oct 17 2013

[Graphic: the word "Volume" repeated four times]

Page 11: What is Hadoop? Oct 17 2013

[Graphic: the word "Volume" repeated many more times, filling more of the slide]

Page 12: What is Hadoop? Oct 17 2013

[Graphic: the word "Volume" repeated still more times, crowding the slide]

Page 13: What is Hadoop? Oct 17 2013

[Graphic: the word "Volume" repeated until it covers the slide]

Page 14: What is Hadoop? Oct 17 2013

Storage, Management, Processing all become challenges with Data at Volume

Page 15: What is Hadoop? Oct 17 2013

Traditional technologies adopt a divide, drop, and conquer approach

Page 16: What is Hadoop? Oct 17 2013

The solution?

[Diagram: data divided among separate stores labeled EDW, Yet Another EDW, Analytical DB, OLTP, and Another EDW]

Page 17: What is Hadoop? Oct 17 2013

Ummm…you dropped something

[Diagram: the same stores (EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW), with many additional data items shown outside them]

Page 18: What is Hadoop? Oct 17 2013

Analyzing the data usually raises more interesting questions…

Page 19: What is Hadoop? Oct 17 2013

…which leads to more data

Page 20: What is Hadoop? Oct 17 2013

Wait, you’ve seen this before.

[Diagram: many data items around a store labeled Sausage Factory]

Page 21: What is Hadoop? Oct 17 2013

Your data silos are lonely places.

[Diagram: separate data stores labeled EDW, Accounts, Customers, and Web Properties]

Page 22: What is Hadoop? Oct 17 2013

…Data likes to be together.

[Diagram: the EDW, Accounts, Customers, and Web Properties data stores]

Page 23: What is Hadoop? Oct 17 2013

New types of data don’t quite fit your pristine view of the world

[Diagram: a store labeled My Little Data Empire, alongside Logs, CDR/SIP, and unidentified (?) data sources]

Page 24: What is Hadoop? Oct 17 2013

To resolve this, some people take hints from Lord Of The Rings…

Page 25: What is Hadoop? Oct 17 2013

…and create One-Schema-To-Rule-Them-All…

[Diagram: an EDW holding data under a single Schema]

Page 26: What is Hadoop? Oct 17 2013

…but that has its problems too.

[Diagram: two EDWs, each with a single Schema, each fed by several ETL steps]

Page 27: What is Hadoop? Oct 17 2013

So what is the answer?

Page 28: What is Hadoop? Oct 17 2013

Enter the Hadoop.

http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/

Page 29: What is Hadoop? Oct 17 2013

Hadoop was created because Big IT never cut it for the Internet Properties like Google, Yahoo, Facebook, Twitter, LinkedIn

Page 30: What is Hadoop? Oct 17 2013

Traditional architecture didn’t scale enough…

[Diagram: three groups of App and DB boxes, each group attached to a SAN]

Page 31: What is Hadoop? Oct 17 2013

Traditional architectures cost too much at that volume…

$/TB

$pecial Hardware

$upercomputing

Page 32: What is Hadoop? Oct 17 2013

So what is the answer?

Page 33: What is Hadoop? Oct 17 2013

If you could design a system that would handle this, what would it look like?

Page 34: What is Hadoop? Oct 17 2013

It would probably need a highly resilient, self-healing, cost-efficient, distributed file system…

[Diagram: a grid of nine Storage nodes]

Page 35: What is Hadoop? Oct 17 2013

It would probably need a completely parallel processing framework that took tasks to the data…

[Diagram: nine nodes, each pairing Storage with Processing]

Page 36: What is Hadoop? Oct 17 2013

It would probably run on commodity hardware, virtualized machines, and common OS platforms

[Diagram: nine nodes, each pairing Storage with Processing]

Page 37: What is Hadoop? Oct 17 2013

It would probably be open source so innovation could happen as quickly as possible

Page 38: What is Hadoop? Oct 17 2013

It would need a critical mass of users

Page 39: What is Hadoop? Oct 17 2013

{Processing + Storage} = {MapReduce/YARN + HDFS}

Page 40: What is Hadoop? Oct 17 2013

HDFS stores data in blocks and replicates those blocks

[Diagram: nine Storage/Processing nodes holding three copies each of block1, block2, and block3]

Page 41: What is Hadoop? Oct 17 2013

If a copy of a block fails, HDFS still has the other copies and heals itself

[Diagram: the same nodes and block copies, with an X marking a failure]
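The replicate-and-heal idea on these two slides can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the node names, the 4-byte block size, the round-robin placement, and the functions `split_into_blocks`, `place_replicas`, and `heal` are all hypothetical and are not HDFS APIs. Real HDFS uses far larger blocks and its own placement policy.

```python
BLOCK_SIZE = 4     # bytes per block; hypothetical and tiny, for illustration only
REPLICATION = 3    # copies per block, matching the three copies on the slide


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes len(nodes) >= replication. Returns {block index: set of node names}.
    """
    return {
        b: {nodes[(b + r) % len(nodes)] for r in range(replication)}
        for b in range(num_blocks)
    }


def heal(placement, failed_node, nodes, replication=REPLICATION):
    """Forget the failed node, then copy under-replicated blocks to healthy nodes."""
    healthy = [n for n in nodes if n != failed_node]
    for holders in placement.values():
        holders.discard(failed_node)
        for candidate in healthy:
            if len(holders) >= replication:
                break
            holders.add(candidate)  # no-op if the node already holds a copy
    return placement


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4", "n5"]
    blocks = split_into_blocks(b"hello hadoop")
    placement = place_replicas(len(blocks), nodes)
    print("before failure:", placement)
    print("after n3 fails:", heal(placement, "n3", nodes))
```

After the simulated failure, every block is back to three copies and none of them lives on the failed node, which is the "heals itself" behaviour the slide describes.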

Page 42: What is Hadoop? Oct 17 2013

MapReduce is a programming paradigm that is completely parallel

[Diagram: Data items flowing through five Mappers into three Reducers]

Page 43: What is Hadoop? Oct 17 2013

MapReduce has three phases: Map, Sort/Shuffle, Reduce

[Diagram: Key, Value pairs flowing through five Mappers into three Reducers]
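As an illustration of the three phases, here is a minimal single-process word count in Python. It is a sketch of the paradigm only, not Hadoop's MapReduce API: the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `run_job` are made up for this example, and a real job would run many mappers and reducers in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (key, value) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Sort/Shuffle: group all values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Chain the three phases over a list of input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "Big elephant"]))
```

Each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, which is what lets a framework spread both phases across many machines.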

Page 44: What is Hadoop? Oct 17 2013

MapReduce applies to a lot of data processing problems

[Diagram: Data items flowing through five Mappers into three Reducers]

Page 45: What is Hadoop? Oct 17 2013

Introducing YARN

Page 46: What is Hadoop? Oct 17 2013

YARN = Yet Another Resource Negotiator

Page 47: What is Hadoop? Oct 17 2013

YARN abstracts resource management so you can run more than just MapReduce

[Diagram: HDFS2 and YARN underneath MapReduce V2, MapReduce V?, STORM, MPI, Giraph, HBase, Tez, … and more]

Page 48: What is Hadoop? Oct 17 2013

YARN turns Hadoop into a smart phone: An App Ecosystem

hortonworks.com/yarn/

Page 49: What is Hadoop? Oct 17 2013

Check out the book too…

Preview at: hortonworks.com/yarn/

Page 50: What is Hadoop? Oct 17 2013

YARN is an essential part of a balanced breakfast in Hadoop 2.0

Oct 15 2013: Apache Community releases Hadoop 2.2.0

Halloween 2013: Hortonworks releases HDP 2.0 GA

Page 52: What is Hadoop? Oct 17 2013

Hadoop has other open source projects…

Page 53: What is Hadoop? Oct 17 2013

Hive = {SQL -> MapReduce} SQL-IN-HADOOP

Page 54: What is Hadoop? Oct 17 2013

Pig = {Pig Latin -> MapReduce}

Page 55: What is Hadoop? Oct 17 2013

HCatalog = {metadata* for MapReduce, Hive, Pig, HBase, etc.}

*metadata = tables, columns, partitions, types

Page 56: What is Hadoop? Oct 17 2013

Oozie = Job::{Task, Task, if Task, then Task, final Task}
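The job shape on this slide (a few tasks in order, one task that runs only if a condition holds, then a final task) can be mimicked with a small Python sketch. This is not Oozie's workflow format or API, since Oozie workflows are defined in XML. The function `run_job` and the task names below are hypothetical and only illustrate the control flow.

```python
def run_job(steps, condition, conditional_step, final_step):
    """Run `steps` in order, run `conditional_step` only when `condition()`
    is true, and always finish with `final_step`.

    Returns the names of the tasks that actually ran, in order.
    """
    executed = []
    for step in steps:
        step()
        executed.append(step.__name__)
    if condition():
        conditional_step()
        executed.append(conditional_step.__name__)
    final_step()
    executed.append(final_step.__name__)
    return executed


# Hypothetical tasks; a real workflow would launch Hive, Pig, or MapReduce jobs here.
def ingest():
    pass


def transform():
    pass


def repair_bad_records():
    pass


def publish():
    pass


if __name__ == "__main__":
    found_bad_records = True
    print(run_job([ingest, transform], lambda: found_bad_records,
                  repair_bad_records, publish))
```

A workflow engine adds what this sketch leaves out, such as scheduling, retries, and running each task as a cluster job.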

Page 57: What is Hadoop? Oct 17 2013

Falcon

[Diagram: Feeds moving between two Hadoop clusters, labeled DR and Replication]

Page 58: What is Hadoop? Oct 17 2013

Flume

[Diagram: JMS, Weblogs, Events, and Files flowing through several Flume agents into Hadoop]

Page 59: What is Hadoop? Oct 17 2013

Sqoop

[Diagram: Sqoop connecting two DBs with Hadoop]

Page 60: What is Hadoop? Oct 17 2013

Ambari = {install, manage, monitor}

Page 61: What is Hadoop? Oct 17 2013

HBase = {real-time, distributed-map, big-tables}

Page 62: What is Hadoop? Oct 17 2013

Storm = {Complex Event Processing, Near-Real-Time, Provisioned by YARN}

Page 63: What is Hadoop? Oct 17 2013

Apache Hadoop

Flume, Ambari, HBase, Falcon, MapReduce, HDFS, Sqoop, HCatalog, Pig, Hive, Storm, YARN

Page 64: What is Hadoop? Oct 17 2013

Hortonworks Data Platform

Flume, Ambari, HBase, Falcon, MapReduce, HDFS, Sqoop, HCatalog, Pig, Hive, Storm, YARN

Page 65: What is Hadoop? Oct 17 2013

What else are we working on?

hortonworks.com/labs/

Page 66: What is Hadoop? Oct 17 2013

Hadoop is the new Data Operating System for the Enterprise

Page 67: What is Hadoop? Oct 17 2013

There is NO second place

Hortonworks …the Bull Elephant of Hadoop Innovation