challengesandlessonslearnedin)) dataintensivecompung )...feb 15, 2012  ·...

38
Challenges and Lessons Learned in DataIntensive Compu7ng Ümit V. Çatalyürek Associate Professor Department of Biomedical Informa<cs Department of Electrical & Computer Engineering The Ohio State University 2012 SIAM Conference on Parallel Processing for Scien7fic Compu7ng

Upload: others

Post on 30-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

Challenges  and  Lessons  Learned  in    Data-­‐Intensive  Compu7ng  

Ümit  V.  Çatalyürek  Associate  Professor  

Department  of  Biomedical  Informa<cs  Department  of  Electrical  &  Computer  Engineering  

The  Ohio  State  University  

2012  SIAM  Conference  on  Parallel  Processing  for  Scien7fic  Compu7ng  

Page 2: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  1999-­‐2012:  Data-­‐Intensive  Applica<ons  •  Applica<ons;  Characteris<cs,  Commonali<es;  Middleware  

•  Mo<va<on  •  Hardware;    SoSware  

•  Dataflow  Applica<on  Model  and  Run<me  System  •  Run<me  System  Op<miza<ons  for    

•  Dynamic  Workloads;  Finitely  Divisible  Loads  

•  Conclusion  •  Summary;  Lessons  Learned;  Truth  Telling  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   2 SIAM PP 15 Feb 2012

Outline  

Page 3: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   3 SIAM PP 15 Feb 2012

Where  I  am  coming  from:  Applica7ons  associated  with  Large  Datasets  

Processing Remotely-Sensed DataNOAA Tiros-Nw/ AVHRR sensor

AVHRR Level 1 DataAVHRR Level 1 Data• As the TIROS-N satellite orbits, the Advanced Very High Resolution Radiometer (AVHRR)sensor scans perpendicular to the satellite’s track.• At regular intervals along a scan line measurementsare gathered to form an instantaneous field of view(IFOV).• Scan lines are aggregated into Level 1 data sets.

A single file of Global Area Coverage (GAC) data represents:• ~one full earth orbit.• ~110 minutes.• ~40 megabytes.• ~15,000 scan lines.

One scan line is 409 IFOV’s

Satellite Data Processing

DCE-MRI Analysis

Seismic Imaging

Digital Pathology

Managing Oilfields, Contaminant Transport Visualization -Virtual Human

Page 4: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Spa<o-­‐temporal  datasets  (generally  low  dimensional)  –  datasets  describe  physical  scenarios  •  Mul<-­‐dimensional,  Mul<-­‐resolu<on,  Mul<-­‐scale  

•  Very  large  file-­‐based  datasets  •  Hundreds  of  gigabytes  to  100+  TB  data  •  Data  is  stored  in  a  distributed  collec<on  of  files  •  Lots  of  datasets,  lots  of  files  

•  Data  products  oSen  involve  results  from  ensemble  of  spa<o-­‐temporal  datasets  

•  Some  applica<ons  require  interac<ve  explora<on  of  datasets  •  Common  opera<ons:  subsebng,  filtering,  interpola<ons,  

projec<ons,  comparisons,  frequency  counts  •  Modeling  and  management  of  data  analysis  workflows  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   4 SIAM PP 15 Feb 2012

Characteris7cs,  Commonali7es…  

Page 5: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Distributed  data  processing  support  •  Grid  based  data  virtualiza<on,  data  management,  query,  on  demand  data  product  genera<on  

•  Distributed  metadata  and  data  management    •  Track  metadata  associated  with  data  and  data  analysis  workflows  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   5 SIAM PP 15 Feb 2012

Data  Services  

Page 6: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Data  Analysis  on  Clusters:  Ac<ve  Data  Repository  (ADR)  •  Targets  storage,  retrieval  and  manipula<on  of  mul<-­‐dimensional  datasets  on  parallel  

machines  •  Several  services  are  customizable  for  various  applica<on  specific  processing  •  Data  retrieval,  memory  management,  and  scheduling  of  processing  •  Data  declustering/clustering,  Indexing  

•  Ac<ve  Seman<c  Data  Cache:  Ac<ve  Proxy  G  •  Employ  user  seman<cs  to  cache  and  retrieve  data  •  Store  and  reuse  results  of  computa<ons  

•  Applica<on  of  Grid  Technology  in  Cancer  Research:  caGrid  •  Driven  by  cancer  research  community  requirements  •  Services-­‐Oriented,  Metadata  driven  •  Crea<on  and  management  of  a  strongly-­‐typed,  mul<-­‐ins<tu<onal,  research  grid  •  Leverages  exis<ng  technology:  e.g.  caDSR,  EVS,  Mobius  GME,  Globus  

•  Distributed  Metadata  and  Data  Management:  Mobius  •  Create,  manage,  version  data  defini<ons    •  Management  of  metadata  and  data  instances  •  Data  integra<on  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   6 SIAM PP 15 Feb 2012

Middleware  Support  

Page 7: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Data  Virtualiza<on:  STORM    •  Large  data  querying  capabili<es,  layered  on  DataCufer  •  Distributed  data  virtualiza<on    •  Indexing,  Data  Cluster/Decluster,  Parallel  Data  Transfer    

 •  Data  Analysis/Processing  Workflows:  DataCufer/DataCufer-­‐Lite  and  Anthill  

•  Component  Framework  for  Combined  Task/Data  Parallelism  •  Filtering/Program  coupling  Service:  Distributed  C++/Java/Python  component  framework  •  On  demand  data  product  genera<on  

•  more  later..  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   7 SIAM PP 15 Feb 2012

Middleware  Support  (cont’d)  

SELECT <DataElements> FROM Dataset-1, Dataset-2,…, Dataset-n WHERE <Expression> AND <MyFilterFunc(<DataElement>)> GROUP-BY-PROCESSOR ComputeAttribute(<DataElement>)

Page 8: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  More  and  more  machines  composed  of  mul<-­‐core  and  many-­‐core  CPUs,  and  accelerators  

•  Top500’s  top  5  has  3,  Green500’s  top  10  has  6  supercomputer  with  GPUs  (5  Nvidia,  1  ATI)  

•  Some  examples:  •  Intel  MIC  Knights  Corner  

•  50+  x86  cores,  4-­‐way  SMT  per  core  •  512-­‐byte  SIMD  registers  

•  NVIDIA  Fermi  GPU  •  Up  to  512  cores  •  Up  to  600  GFLOPS  of  double-­‐precision  

•  IBM  POWER7  •  Up  to  8  cores,    4-­‐way  SMT  per  core  •  32  MB  on-­‐die  L3  eDRAM  cache  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   8 SIAM PP 15 Feb 2012

Hardware  Mo7va7on:  Current  &  Future  Systems  

Page 9: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

Application Tasks

Incr

easi

ng G

ranu

lari

ty

SoSware  Mo7va7on:  How  to  u7lize  all  resources  effec7vely?  

SIAM PP 15 Feb 2012 9 Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"  

Page 10: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  OSU/UMD-­‐DataCufer,  UMFG/OSU-­‐Anthill  •  Applica<on  decomposed  into  a  task-­‐graph  (filters  &  streams)  •  Task  graph  performs  computa<on  

•  Individual  tasks  perform  single  func<on  •  Tasks  (filters)  are  independent,  with  well-­‐defined  interfaces  •  Communica<on  using  streams  

•  Transparent  filter  instances  –  Single  stream  illusion  

•  Intra-­‐task  architecture-­‐specific  op<miza<ons  possible  •  Architectural  heterogeneity  abstracted  

•  Run<me  systems’  network  interface  matches  high-­‐bandwidth  networks’  communica<on  model  •  Applica<on  tasks  communicate  using  data  buffers  through  streams  •  Asynchronous  communica<ons  overlap  with  computa<on  

•  Robust  load  balancing  techniques    Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   10 SIAM PP

15 Feb 2012

Component-­‐Based  Dataflow  

Page 11: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Mul<ple  dimensions  of  parallelism  •  Task  parallelism  •  Data  parallelism  •  Pipeline  parallelism  

11

Component-­‐Based  Dataflow  

������

����

� �������

������ ������

� �

����

����

SIAM PP 15 Feb 2012 Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"  

Page 12: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  We  will  look  at  two  different  class  of  applica<ons:  •  Dynamic  workloads  with  non-­‐divisible  databuffer  sizes  are  given  by  the  applica<on  at  run<me  

•  Applica<ons  with  finitely  divisible  load  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   12 SIAM PP 15 Feb 2012

Run7me  Op7miza7on  Challenges  

Page 13: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"  

Dynamic  Workloads  

SIAM PP 15 Feb 2012

Page 14: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Classify  biopsy  <ssue  images  into  different  subtypes  of  prognos<c  significance  

•  Very  high  resolu<on  slides  •  Divided  into  smaller  <les  

•  Mul<-­‐resolu<on  image  analysis  •  Mimics  the  way  pathologists  perform  their  analysis  •  If  classifica<on  at  lower  resolu<on  is  not  sa<sfactory,  analysis  algorithm  is  

executed  at  higher  resolu<on(s),  hence  the  dynamic  workload.  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   14 SIAM PP 15 Feb 2012

Sample Application: Neuroblastoma Image Analysis

Page 15: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Improving  CPU/GPU  data  transfers  

•  Intra-­‐filter  task  assignment    •  Inter-­‐filter  op<miza<ons  

•  A  novel  stream  policy:  on-­‐demand  dynamic  selec<ve  stream  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   15 SIAM PP 15 Feb 2012

Run-time optimizations

Page 16: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Data  bus  bandwidth  between  CPU/GPU  is  a  cri<cal  barrier  

•  Overlap  the  data  transfer  with  useful  computa<on  •  GPUs  allow  mul<ple  concurrent  transfers  (events)  •  The  op<mal  number  of  concurrent  transfers  varies:  computa<on/communica<on  rate  

•  An  automated  approach  to  dynamically  configure  the  number  of  concurrent  copies  –  based  on  the  kernel  throughput  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   16 SIAM PP 15 Feb 2012

Improving CPU/GPU data transfers

Page 17: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  The  processing  rates  of  the  processors  are  data-­‐dependent  

•  Exploit  intra-­‐filter  task  heterogeneity  •  Schedule  appropriate  tasks  to  appropriate  processors  

•  DDWRR  –  Demand-­‐Driven  Weighted  Round  Robin  •  Events  shared  among  processors  •  Sorts  events  according  to  their  performance  on  each  device  •  Selects  the  event  with  the  highest  speedup  (out-­‐of-­‐order  fashion)  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   17 SIAM PP 15 Feb 2012

Intra-filter task assignment

Page 18: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Stream  premises  •  The  number  of  data  buffers  at  each  filter  copy  should  be  only  enough  to  

keep  its  processors  busy  •  Data  buffers  sent  to  a  filter  should  maximize  the  performance  of  processors  

allocated  to  each  filter  instance  

•  Dynamic  Queue  Adapta<on  Algorithm  •  Executed  at  the  receiver  side  •  Calculates  the  number  of  data  buffers  necessary  to  keep  processor  busy  

•  Ra<o  of  request  response  <me  to  data  buffer  processing  <me      

•  Data  Buffer  Selec<on  Algorithm  •  Executed  at  the  sender  side  •  Sends  the  data  buffer  with  the  highest  speedup  to  the  reques<ng  processor  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   18 SIAM PP 15 Feb 2012

Inter-filter optimization: On-Demand Dynamic Selective stream (ODDS)

Page 19: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Setup  •  Homogeneous  cluster  -­‐    14  nodes  w/  CPU/GPU  •  Heterogeneous  cluster  –  14  nodes  

•  7  use  both  the  CPU  and  the  GPU  •  7  nodes  do  not  use  the  GPU  

•  Stream  policies  

               *DDFCFS:  Anthill’s  default  “Demand-­‐driven  first-­‐come,  first-­‐served  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   19 SIAM PP 15 Feb 2012

Effect  of  run7me  op7miza7ons  

Demand-­‐driven  scheduling  policy  

Area  of  effect  

Queue  Policy   Size  of  request  for  data  buffers  Sender   Receiver  

DDFCFS*   Intra-­‐filter   Unsorted   Unsorted   Sta<c  

DDWRR   Intra-­‐filter    

Unsorted   Sorted  by  speedup   Sta<c  

ODDS   Inter-­‐filter    

Sorted  by  speedup    

Sorted  by  speedup   Dynamic  

Page 20: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   20 SIAM PP 15 Feb 2012

Effect  of  run7me  op7miza7ons  Homogeneous base case

Heterogeneous base case

Tile  recalcula<on  rate:  %  of  <les  recalculated  at  higher  resolu<on.      ODDS  improves  performance  even  in  the  base  case  

Using  an  addi<onal  CPU-­‐only  machine  is  more  than  3x  faster  than  GPU-­‐only  version  

Page 21: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

Homogeneous environment

Heterogeneous environment

Policies  with  sta<c  size  of  input  requests  fail  to  scale  in  heterogeneous  environments  

All  policies  scale  near  to  linear  in  homogeneous  environments  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   21 SIAM PP 15 Feb 2012

Strong  Scalability  

Page 22: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"  

Adap7ve  Dataflow  Middleware    for    

Finitely  Divisible  Load  

SIAM PP 15 Feb 2012

Page 23: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

Adap7ve  Dataflow  Middleware  for  Heterogeneous  Systems  •  Many  real-­‐world  applica<ons  follow  simple  model  

•  Independent  tasks  •  Mul<-­‐dimensional  (rectangular,  cuboid  etc.)  work  area  •  Finitely  divisible  (<les)  

•  Arbitrary  par<<ons,  within  minimum  and  maximum  constraints  

•  Even  modern  distributed  middleware  requires  developers  to  perform  manual  tuning  •  Minimum  execu<on  <me  dependent  on:  

•  System  configura<on  (#  nodes,  #  CPUs  /  node,  #  GPUs  /  node,  CPU  speed,  GPU  speed,  network  speed)  

•  Applica<on  specifics  

SIAM PP 15 Feb 2012 23 Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"  

Page 24: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

��������

� �

����

���

����������

����

•  A  component-­‐based  dataflow  applica<on  •  n  processors  •  One  copy  of  each  filter  pipeline  on  each  processor  

•  No  communica<on  between  filters  inside  pipeline  

•  Processors  can  be  heterogeneous  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   24 SIAM PP 15 Feb 2012

A  Simplified  Applica7on  Model  for  Finitely  Divisible  Load  Applica7ons  

Page 25: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Two  concerns:  •  Choosing  the  best  data  buffer  (work  <le)  size  

•  Accelerators'  performance  highly  dependent  on  amount  of  work  per  data  buffer  

•  Directly  related  to  the  applica<on  work  area  •  Allow  run<me  system  to  automa<cally  par<<on  the  work  area  into  ;les  

•  Keep  track  of  processors'  performance  with  performance  model  for  each  processor  

•  Balancing  the  load  •  Use  work-­‐stealing  to  give  processors  appropriate  work,  such  that  all  processors  finish  simultaneously  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   25 SIAM PP 15 Feb 2012

Adap7ve  Par77oning  Controller  (APC+)  

Page 26: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Four  parts  •  Performance  Model  

•  Tracks  execu<on  <me  performance  for  each  data  buffer  size  •  Dynamically  finds  the  highest-­‐performing  data  buffer  size  

•  Work  Par<<oner  •  Dynamically  par<<ons  and  schedules  data  buffers  

•  Distributed  Work-­‐Stealing  Layer  •  Lightweight  work-­‐stealing  technique  to  balance  the  load  across  the  nodes  

•  Storage  Layer  •  Enables  efficient  network  use  and  separa<on  of  stealing  of  work  and  movement  of  data  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   26 SIAM PP 15 Feb 2012

APC+  (cont’d)  

Page 27: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Three  phases  •  Bootstrap  

•  Ini<alize  performance  model  by  trying  some  data  buffer  sizes  

•  Steady-­‐state  •  Par<<on  work  area  using  maximum  processing  rates  •  Send  work  <les  which  have  the  maximum  processing  rate  •  Keep  processor  queues  full  to  hide  latencies  •  Steal  work  from  remote  peer  if  none  available  locally  

•  Flushout  •  If  total  work  area  is  low,  reduce  <le  sizes,  for  load  balance  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   27 SIAM PP 15 Feb 2012

APC+  Algorithm  

Page 28: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

�� ��

��������

��

��

� •  Storage  Layer  (SL),  Processing  Filter  (P)  

•  Successful  steal  (1,2)  schedules  one  data  buffer  for  transfer  (3)  

•  Upon  receipt  of  the  data  buffer  in  the  storage  layer  (4),  the  data  buffer  is  unlocked  (5),  and  scheduled  for  processing  (6)  

•  Acknowledgement  of  receipt  (7,8)  schedules  next  data  buffer    

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   28 SIAM PP 15 Feb 2012

APC+:  Work-­‐stealing  

Page 29: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Real-­‐World  Applica<ons  •  Synthe<c  Aperture  Radar  Image  Analysis  •  Histopathology  Image  Analysis  •  Black-­‐Scholes  European  Stock  Op<on  Pricing  Calcula<on  

•  Conduct  weak  scalability  experiments  •  Increase  work  commensurate  with  increased  computa<onal  resources  

•  Owens  cluster  •  32  nodes,  dual  quad-­‐core  Intel  Xeon  CPUs,  single  Nvidia  C2050  GPU  

•  20Gbps  Infiniband  connected  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   29 SIAM PP 15 Feb 2012

Experimental  Setup  

Page 30: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

State-­‐of-­‐the-­‐art  Distributed  Middleware  •  DataCufer  

•  Demand-­‐Driven  load  balancing  (DC-­‐DD)  •  Requires  manual  tuning  

•  Simple,  fast  

•  KAAPI/Athapascan-­‐1  •  Work-­‐stealing  load  balancing;  scheduling  with  dependencies  

•  Requires  manual  tuning  •  Theore<cally  asympto<c  op<mal  behavior,  in  certain  condi<ons  •  KAAPI  has  no  na<ve  IB  protocol  

•  MR-­‐MPI  •  Simple  MapReduce  programming  seman<c  

•  Requires  manual  tuning  •  Small  API,  no  parallel  programming  knowledge  is  required  

SIAM PP 15 Feb 2012 30 Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"  

Page 31: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

!"

#!"

$!!"

$#!"

%!!"

%#!"

$" %" &" '" $(")%" $" %" &" '" $(")%" $" %" &" '" $(")%" $" %" &" '" $(")%"

*+,-.+/" *+,**" 0--.1" 23,2.1"

!"#$%&

'()&*#)+,-)

4+."

5664"

6783"

9,125"

.36+"

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   31 SIAM PP 15 Feb 2012

Synthe7c  Aperture  Radar:  CPU+GPU  

•  KAAPI  perf.  due  to  network  bofleneck  (16  thr./node)  •  MR-­‐MPI  slow-­‐down  due  to  synchroniza<on  

Page 32: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

!"

#!"

$!!"

$#!"

%!!"

%#!"

&!!"

$" %" '" (" $)"&%" $" %" '" (" $)"&%" $" %" '" (" $)"&%" $" %" '" (" $)"&%"

*+,-.+/" *+,**" 0--.1" 23,2.1"

!"#$%&

'()&*#)+,-)

4+."

5664"

6783"

9,125"

.36+"

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   32 SIAM PP 15 Feb 2012

Histopathology  Image  Analysis:  CPU+GPU  

•  Endpoint  conten<on  on  node  0:  cannot  feed  whole  cluster  sufficiently,  especially  using  TCPoIB  

Page 33: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

!"

#!"

$!!"

$#!"

%!!"

%#!"

&!!"

$" %" '" (" $)"&%" $" %" '" (" $)"&%" $" %" '" (" $)"&%" $" %" '" (" $)"&%"

*+,-.+/" *+,**" 0--.1" 23,2.1"

!"#$%&

'()&*#)+,-)

4+."

5664"

6783"

694"

:,125"

.36+"

1;14"

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   33 SIAM PP 15 Feb 2012

Black-­‐Scholes:  CPU+GPU  

•  Data-­‐intensive  reduc<on  to  one  node;  KAAPI  cannot  pin  distributed  data  read  tasks;  MR-­‐MPI:  no  comm.  overlap  

Page 34: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Dataflow  systems  are  easy  to  program  and  good  for  efficiently  using  heterogeneous  systems  

•  Dynamic  run<me  system  op<miza<ons  were  useful  to  u<lize  all  compu<ng  resources  •  Don't  neglect  CPU  cores  even  if  you  have  powerful  accelerators  •  Adequate  use  of  CPU/GPU  doubled  GPU-­‐only  performance  in  several  

scenarios  

•  APC+  does  a  good  job  of  applica<on  tuning  and  load  balancing  •  Meets  or  beats  well-­‐tuned  demand-­‐driven  policy  

•  There  is  always  future  work  •  Expand  applica<on  and  work  area  models  for  APC+  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   34 SIAM PP 15 Feb 2012

Summary  

Page 35: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  Tradi<onal  low  emphasis  on  performance  –  major  end  to  end  performance  issues  

•  For  real  high-­‐performing  data-­‐intensive  applica<ons  •  Dynamic  resource  management  •  Data-­‐centric  compu<ng  

•  Treat  data  as  first-­‐class  ci<zen  •  Move  work  to  data  when  needed  

•  Global  dependency  graph  (DAGs?)  exposed  to  run<me  system  •  Smart  work-­‐queues  (distributed  work-­‐stealing)  

•  Separate  work-­‐stealing  and  data  transfer  •  Asynchronous  execu<on  (pipelined  execu;on)  

•  Not  surprising,  not  “much”  different  than  others’  recommenda<ons  for  future  roadmap  to  Exascale  

 Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   35 SIAM PP

15 Feb 2012

Lessons  Learned  

Page 36: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

Truth  Telling    

•  Large  scale  data  is  a  great  buzz  word  but  great  need  to  systema<cally  nail  down  user  requirements  now  that  we  have  plausible  data  systems  with  fast  Hard  disks  and  SSDs    

•  End  to  end  performance  characteriza<on  is  rare  and  valuable.  Tradi<onally  hindered  by  shared  nature  of  storage  systems    

•  Claim:  “I  can  do  all  in  MPI+X+Y  in  my  applica<on!”  •  Sure  you  can!  How  do  you  think  we  implemented  the  run<me  system!    

•  But  do  you  want  to  re-­‐do  in  all  of  your  applica<ons?  SIAM PP 15 Feb 2012 Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   36

Page 37: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  The  Ohio  State  University  •  Timothy  Hartley,  Erik  Saule,  Olcay  

Sertel,  Jun  Kong,  Me<n  Gurcan,  Barla  Cambazoglu,  Mike  Gray,  Shannon  Has<ngs,  Steve  Langella,  Scof  Oster,  Li  Weng,  Gagan  Agrawal,  Krishnan  Sivaramakrishnan,  Vijay  Kumar,  Benjamin  Ruf,  Tony  Pan,  Tahsin  Kurc  (@Emory),  Joel  Saltz  (@Emory)  

•  University  of  Maryland,  College  Park  •  Alan  Sussman,  Michael  Beynon,  

Chialin  Chang,  Henrique  Andrade  

•  Ohio  Supercomputer  Center  •  Don  Stredney,  Dennis  Sessanna    

•  UMFH,  Brazil:  •  Renato  Ferreira,  George  Teodoro  

•  University  of  Malaga,  Malaga,  Spain    •  Manuel  Ujaldon,  Antonio  Ruiz  

•  University  Jaume  I,  Castellon,  Spain  •  Rafael  Mayo,  Francisco  Igual  

•  Rutgers  University  •  Manish  Parashar  

•  University  of  Texas  at  Aus<n  •  Mary  Wheeler,  Hector  Klie,  Paul  

Stoffa,  Mrinal  Sen,  Malgorzata  Peszynska,  Ryan  Mar<no  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   37 SIAM PP 15 Feb 2012

Collaborators  (listed  with  their  affilia<ons  when  the  work  was  carried  out)  

Page 38: ChallengesandLessonsLearnedin)) DataIntensiveCompung )...Feb 15, 2012  · Çatalyürek"Challenges)and)Lessons)Learned)in)Data1Intensive)Compu7ng") 3 SIAM PP 15 Feb 2012 WhereIamcomingfrom:

•  For  more  informa<on  •  Email  [email protected]  •  Visit    hfp://bmi.osu.edu/~umit  or  hfp://bmi.osu.edu/hpc  

•  Research  at  the  HPC  Lab  is  funded  by  

Çatalyürek  "Challenges  and  Lessons  Learned  in  Data-­‐Intensive  Compu7ng"   38 SIAM PP 15 Feb 2012

Thanks