Cmu 2011 09.pptx

MapR, Implications for Integration – CMU, September 2011

TRANSCRIPT

Page 1: Cmu 2011 09.pptx

MapR, Implications for Integration

CMU – September 2011

Page 2: Cmu 2011 09.pptx

Outline

•  MapR system overview
•  Map-reduce review
•  MapR architecture
•  Performance results
•  Map-reduce on MapR
•  Architectural implications
•  Search indexing / deployment
•  EM algorithm for machine learning
•  … and more …

Page 3: Cmu 2011 09.pptx

Map-Reduce

[Figure: map-reduce data flow from Input through the Shuffle to the Output.]
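For readers who want the review in code form, here is a minimal, framework-free sketch of the map/shuffle/reduce pattern in Python (a toy word count; none of the names below come from Hadoop or MapR):

    from collections import defaultdict

    def map_phase(documents):
        # Map: emit (key, value) pairs, here (word, 1) for a word count
        for doc in documents:
            for word in doc.split():
                yield word, 1

    def shuffle(pairs):
        # Shuffle: group all values by key, as the framework does between map and reduce
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups.items()

    def reduce_phase(grouped):
        # Reduce: combine the values for each key, here by summing the counts
        for key, values in grouped:
            yield key, sum(values)

    docs = ["the quick brown fox", "the lazy dog"]
    for word, count in reduce_phase(shuffle(map_phase(docs))):
        print(word, count)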

Page 4: Cmu 2011 09.pptx

Bottlenecks and Issues

•  Read-only files
•  Many copies in I/O path
•  Shuffle based on HTTP
   -  Can't use new technologies
   -  Eats file descriptors
•  Spills go to local file space
   -  Bad for skewed distribution of sizes

Page 5: Cmu 2011 09.pptx

MapR Areas of Development

•  Map Reduce
•  Storage Services
•  Ecosystem
•  HBase
•  Management

Page 6: Cmu 2011 09.pptx

MapR Improvements

•  Faster file system
   -  Fewer copies
   -  Multiple NICs
   -  No file descriptor or page-buf competition
•  Faster map-reduce
   -  Uses distributed file system
   -  Direct RPC to receiver
   -  Very wide merges

Page 7: Cmu 2011 09.pptx

MapR Innovations

•  Volumes
   -  Distributed management
   -  Data placement
•  Read/write random-access file system
   -  Allows distributed meta-data
   -  Improved scaling
   -  Enables NFS access
•  Application-level NIC bonding
•  Transactionally correct snapshots and mirrors

Page 8: Cmu 2011 09.pptx

MapR's Containers

•  Each container contains
   -  Directories & files
   -  Data blocks
•  Replicated on servers
•  No need to manage directly

Files/directories are sharded into blocks, which are placed into mini NNs (containers) on disks.

Containers are 16-32 GB segments of disk, placed on nodes.

Page 9: Cmu 2011 09.pptx

MapR's Containers

•  Each container has a replication chain
•  Updates are transactional
•  Failures are handled by rearranging replication

Page 10: Cmu 2011 09.pptx

Container locations and replication

[Figure: a CLDB node and data nodes N1, N2, N3; the CLDB table maps each container to its replica nodes, e.g. (N1, N2), (N3, N2), (N1, N3).]

The container location database (CLDB) keeps track of the nodes hosting each container and the order of each replication chain.
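As a purely illustrative toy (the names and structures below are invented here, not MapR's actual data model or API), the CLDB's bookkeeping amounts to a map from container id to an ordered replication chain, rearranged when a node fails:

    # Toy model of CLDB bookkeeping: container id -> ordered replication chain.
    container_locations = {
        1: ["N1", "N2"],   # first entry is the head of the replication chain
        2: ["N3", "N2"],
        3: ["N1", "N3"],
    }

    def reroute_on_failure(locations, failed_node, replacement_node):
        # Handle a failed node by rearranging replication: drop it from every
        # chain it appears in and append a replacement replica.
        for chain in locations.values():
            if failed_node in chain:
                chain.remove(failed_node)
                chain.append(replacement_node)

    reroute_on_failure(container_locations, "N2", "N4")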

Page 11: Cmu 2011 09.pptx

MapR Scaling

Containers represent 16-32 GB of data
•  Each can hold up to 1 billion files and directories
•  100M containers = ~2 exabytes (a very large cluster)

250 bytes of DRAM to cache a container
•  25 GB to cache all containers for a 2 EB cluster
   -  But not necessary; can page to disk
•  A typical large 10 PB cluster needs 2 GB

Container reports are 100x-1000x smaller than HDFS block reports
•  Serve 100x more data nodes
•  Increase container size to 64 GB to serve a 4 EB cluster
•  Map/reduce not affected

(A quick arithmetic check of the 100M-container and 25 GB figures appears below.)
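A back-of-the-envelope check of those two figures, using the per-container numbers from the slide (a rough sketch; 20 GB is just a round value inside the stated 16-32 GB range):

    # Sanity check of the container-scaling numbers on this slide.
    container_size = 20e9          # bytes; containers are 16-32 GB, use ~20 GB
    dram_per_container = 250       # bytes of CLDB cache per container (from the slide)

    containers_for_2eb = 2e18 / container_size             # about 1e8 containers
    cache_for_2eb = containers_for_2eb * dram_per_container

    print(f"containers for ~2 EB: {containers_for_2eb:.0e}")         # ~1e+08
    print(f"CLDB cache for ~2 EB: {cache_for_2eb / 1e9:.0f} GB")     # ~25 GB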

Page 12: Cmu 2011 09.pptx

MapR's Streaming Performance

[Bar chart: throughput in MB per second for read and write, comparing hardware limit, MapR, and Hadoop; one panel for 11 x 7200 rpm SATA disks and one for 11 x 15K rpm SAS disks. Higher is better.]

Tests:  i. 16 streams x 120 GB    ii. 2000 streams x 1 GB

Page 13: Cmu 2011 09.pptx

Terasort on MapR

[Bar chart: elapsed time in minutes for 1.0 TB and 3.5 TB terasort runs, comparing MapR and Hadoop. Lower is better.]

10+1 nodes: 8 core, 24 GB DRAM, 11 x 1 TB SATA 7200 rpm

Page 14: Cmu 2011 09.pptx

HBase on MapR

[Bar chart: records per second for Zipfian and uniform key distributions, comparing MapR and Apache HBase. Higher is better.]

YCSB random read with 1 billion 1K records
10+1 node cluster: 8 core, 24 GB DRAM, 11 x 1 TB 7200 rpm

Page 15: Cmu 2011 09.pptx

Small Files (Apache Hadoop, 10 nodes)

[Chart: file-create rate (files/sec) versus number of files (millions), out of box vs. tuned.]

Op: create file, write 100 bytes, close

Notes:
-  NN not replicated
-  NN uses 20 GB DRAM
-  DN uses 2 GB DRAM

Page 16: Cmu 2011 09.pptx

MUCH faster for some operations

[Chart: create rate versus number of files (millions). Same 10 nodes …]

Page 17: Cmu 2011 09.pptx

What MapR is not

•  Volumes != federation
   -  MapR supports > 10,000 volumes, all with independent placement and defaults
   -  Volumes support snapshots and mirroring
•  NFS != FUSE
   -  Checksum and compress at the gateway
   -  IP fail-over
   -  Read/write/update semantics at full speed
•  MapR != maprfs

Page 18: Cmu 2011 09.pptx

New Capabilities

Page 19: Cmu 2011 09.pptx

Alternative NFS mounting models

•  Export to the world
   -  NFS gateway runs on selected gateway hosts
•  Local server
   -  NFS gateway runs on the local host
   -  Enables local compression and checksumming
•  Export to self
   -  NFS gateway runs on all data nodes, mounted from localhost

Page 20: Cmu 2011 09.pptx

Export to the world

[Figure: an external NFS client mounts the cluster through one of several NFS server gateways.]

Page 21: Cmu 2011 09.pptx

Local server

[Figure: the application and the NFS server run together on the client host, which talks to the cluster nodes.]

Page 22: Cmu 2011 09.pptx

Universal export to self

[Figure: a task on a cluster node reaches the cluster file system through an NFS server running on that same node.]

Page 23: Cmu 2011 09.pptx

[Figure: three identical cluster nodes, each running an NFS server, with tasks mounting from localhost.]

Nodes are identical

Page 24: Cmu 2011 09.pptx

Application architecture

•  High-performance map-reduce is nice
•  But algorithmic flexibility is even nicer

Page 25: Cmu 2011 09.pptx

Sharded text indexing

[Figure: map assigns documents to shards; reducers index text to local disk and then copy the index to the clustered index store; the search engine typically copies the index back to local disk before it can be loaded.]

Page 26: Cmu 2011 09.pptx

Sharded text indexing

•  Mapper assigns each document to a shard
   -  Shard is usually a hash of the document id
•  Reducer indexes all documents for a shard
   -  Indexes are created on local disk
   -  On success, copy the index to the DFS
   -  On failure, delete the local files
•  Must avoid directory collisions
   -  Can't use the shard id!
•  Must manage and reclaim local disk space

A small sketch of the shard assignment and directory naming follows below.
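Here is that sketch (everything in it, the hash choice, the shard count, and the paths, is a hypothetical illustration rather than code from the talk):

    import hashlib
    import os

    NUM_SHARDS = 16  # illustrative shard count

    def shard_for(doc_id):
        # A stable hash of the document id decides which shard (and reducer) gets it.
        digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % NUM_SHARDS

    def local_index_dir(shard, attempt_id):
        # Name the working directory by task attempt, not just shard id, so a
        # retried reducer never collides with files left behind by a failed attempt.
        return os.path.join("/tmp/index", "shard-%04d" % shard, attempt_id)

    print(shard_for("doc-42"), local_index_dir(shard_for("doc-42"), "attempt_001"))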

Page 27: Cmu 2011 09.pptx

Conventional data flow

[Figure: the same pipeline: map, reducer with local disk, clustered index storage, search engine with local disk.]

Failure of a reducer causes garbage to accumulate on the local disk.

Failure of the search engine requires another download of the index from clustered storage.

Page 28: Cmu 2011 09.pptx

Simplified NFS data flows

[Figure: map, reducer, clustered index storage, search engine; the reducer indexes directly to its task work directory via NFS, and the search engine reads the index directly from clustered storage.]

Failure of a reducer is cleaned up by the map-reduce framework.

The search engine reads the mirrored index directly.

Index to task work directory via NFS.
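"Index to task work directory via NFS" is just ordinary file I/O; a minimal sketch under the assumption that the cluster is mounted at /mapr/my.cluster (the path and the toy index format are invented for illustration):

    import os

    # Hypothetical NFS mount of the cluster file system; the path is an example only.
    TASK_WORK_DIR = "/mapr/my.cluster/var/tasks/attempt_0001"

    def write_shard_index(postings):
        # Write a toy index straight into the task work directory over NFS.
        # Because the work directory lives in the cluster file system, a failed
        # attempt is cleaned up by the map-reduce framework, and no separate
        # "copy to DFS on success" step is needed.
        os.makedirs(TASK_WORK_DIR, exist_ok=True)
        path = os.path.join(TASK_WORK_DIR, "index.txt")
        with open(path, "w") as out:
            for term in sorted(postings):
                out.write(term + "\t" + ",".join(str(d) for d in postings[term]) + "\n")
        return path

    write_shard_index({"hadoop": [1, 4], "mapr": [2]})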

Page 29: Cmu 2011 09.pptx

Simplified NFS data flows

[Figure: map and reducer build the index, which is mirrored to the search-engine nodes.]

Mirroring allows exact placement of the index data.

Arbitrary levels of replication are also possible.

Page 30: Cmu 2011 09.pptx

How about another one?

Page 31: Cmu 2011 09.pptx

K-means

•  Classic E-M based algorithm
•  Given cluster centroids:
   -  Assign each data point to the nearest centroid
   -  Accumulate new centroids
   -  Rinse, lather, repeat

A minimal sketch of one such iteration appears below.
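The sketch, in plain NumPy (illustrative only; this is not the Mahout or MapR implementation the talk has in mind):

    import numpy as np

    def kmeans_step(points, centroids):
        # E-step: distance from every point to every centroid, then nearest assignment.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignment = distances.argmin(axis=1)

        # M-step: new centroid = mean of the points assigned to it.
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = points[assignment == k]
            if len(members) > 0:
                new_centroids[k] = members.mean(axis=0)
        return new_centroids

    # Example: 100 random 2-D points, 3 centroids, iterate a few times.
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(100, 2))
    cent = pts[:3].copy()
    for _ in range(10):
        cent = kmeans_step(pts, cent)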

Page 32: Cmu 2011 09.pptx

K-means, the movie

[Figure: input feeds map tasks that assign each point to the nearest centroid, reading the current centroids; a reduce step aggregates the new centroids.]

Page 33: Cmu 2011 09.pptx

But …

Page 34: Cmu 2011 09.pptx

Parallel Stochastic Gradient Descent

[Figure: input splits each feed a "train sub-model" task; an "average models" step combines the sub-models into the final model.]
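The "average models" step is nothing more than a parameter-wise mean over the independently trained sub-models; a toy sketch (illustrative, not the talk's implementation):

    import numpy as np

    def average_models(sub_models):
        # Combine sub-models trained independently on separate input splits
        # by taking the element-wise mean of their parameters.
        return np.mean(np.stack(sub_models), axis=0)

    # e.g. three sub-models of the same shape, each trained by SGD on its own split
    print(average_models([np.ones(4), 2 * np.ones(4), 3 * np.ones(4)]))   # -> [2. 2. 2. 2.]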

Page 35: Cmu 2011 09.pptx

Variational Dirichlet Assignment

[Figure: input splits each feed a "gather sufficient statistics" task; an "update model" step produces the new model.]

Page 36: Cmu 2011 09.pptx

Old tricks, new dogs

•  Mapper
   -  Assign point to cluster
   -  Emit (cluster id, (1, point))
•  Combiner and reducer
   -  Sum counts, weighted sum of points
   -  Emit (cluster id, (n, sum/n))
•  Output to HDFS

Centroids are read from HDFS to local disk by the distributed cache, new centroids are written by map-reduce, and mappers read them back from local disk via the distributed cache.

Page 37: Cmu 2011 09.pptx

Old tricks, new dogs

•  Mapper
   -  Assign point to cluster
   -  Emit (cluster id, (1, point))
•  Combiner and reducer
   -  Sum counts, weighted sum of points
   -  Emit (cluster id, (n, sum/n))
•  Output to MapR FS (instead of HDFS)

New centroids are still written by map-reduce, but mappers simply read them back via NFS; no distributed-cache staging is needed. A toy sketch of the mapper/combiner contract follows below.
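That sketch, in plain Python/NumPy (a real job would use the Hadoop APIs, so treat the function names and shapes as illustrative):

    import numpy as np

    def kmeans_map(point, centroids):
        # Mapper: assign the point to the nearest centroid, emit (cluster id, (1, point)).
        cluster = int(np.linalg.norm(centroids - point, axis=1).argmin())
        yield cluster, (1, point)

    def kmeans_combine(cluster, values):
        # Combiner/reducer: weighted sum of (count, mean) pairs, emit (cluster id, (n, sum/n)).
        n, weighted_sum = 0, 0.0
        for count, mean in values:
            n += count
            weighted_sum = weighted_sum + count * mean
        yield cluster, (n, weighted_sum / n)

Because the combiner's output has the same (count, mean) form as its input, applying it again at the reducer gives the same result as a single pass.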

Page 38: Cmu 2011 09.pptx

Poor man's Pregel

•  Mapper:

    while not done:
        read and accumulate input models
        for each input:
            accumulate model
        write model
        synchronize
        reset input format
    emit summary

•  Lines in bold can use conventional I/O via NFS
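A rough sketch of what such a mapper loop could look like with conventional file I/O over an NFS mount (everything here, the path, the update rule, and the sleep standing in for synchronization, is a hypothetical illustration rather than the talk's actual code):

    import glob
    import os
    import time
    import numpy as np

    MODEL_DIR = "/mapr/my.cluster/user/ted/models"   # hypothetical NFS path
    ITERATIONS = 10

    def run_worker(worker_id, my_inputs):
        os.makedirs(MODEL_DIR, exist_ok=True)
        model = np.zeros(8)
        for step in range(ITERATIONS):                       # "while not done"
            # read and accumulate the models the other workers wrote last step
            previous = [np.load(p) for p in glob.glob("%s/step%d-*.npy" % (MODEL_DIR, step - 1))]
            if previous:
                model = np.mean(previous, axis=0)
            # for each input: accumulate into the model (stand-in update rule)
            for x in my_inputs:
                model += 0.1 * x
            # write model: ordinary file I/O, but it lands in the cluster FS via NFS
            np.save("%s/step%d-%d.npy" % (MODEL_DIR, step, worker_id), model)
            time.sleep(1)                                    # crude stand-in for "synchronize"
        print("worker", worker_id, "summary:", model)        # "emit summary"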

Page 39: Cmu 2011 09.pptx

Click modeling architecture

[Figure: input flows through feature extraction and down sampling, then a data join with side-data (both map-reduce), and finally sequential SGD learning, which is now fed via NFS.]

Page 40: Cmu 2011 09.pptx

Click modeling architecture

[Figure: the same pipeline, but with several sequential SGD learners running in parallel after the join; map-reduce cooperates with NFS.]

Page 41: Cmu 2011 09.pptx

And another…

Page 42: Cmu 2011 09.pptx

Hybrid model flow

[Figure: feature extraction and down sampling feeds an SVD (PageRank, spectral) step and downstream modeling, producing the deployed model; the first two stages are map-reduce and the last stage is marked "??".]

Page 43: Cmu 2011 09.pptx


Page 44: Cmu 2011 09.pptx

Hybrid model flow

[Figure: the same flow, with the feature extraction and SVD stages labeled "Map-reduce" and the downstream modeling stage labeled "Sequential".]

Page 45: Cmu 2011 09.pptx

And visualization…

Page 46: Cmu 2011 09.pptx

Trivial visualization interface

•  Map-reduce output is visible via NFS
•  Legacy visualization just works

    $ R
    > x <- read.csv("/mapr/my.cluster/home/ted/data/foo.out")
    > plot(error ~ t, x)
    > q(save='n')

Page 47: Cmu 2011 09.pptx

Conclusions

•  We used to know all this
•  Tab completion used to work
•  5 years of work-arounds have clouded our memories
•  We just have to remember the future