Large-Scale Data Analytics Workflow Support for Climate Change Experiments

S. Fiore, C. Doutriaux, D. Palazzo, A. D’Anca, Z. Shaheen, D. Elia, J. Boutte, V. Anantharaj, D. N. Williams, G. Aloisio


INDIGO DataCloud project

• The proposed case study is mainly related to the climate change community

• It is directly connected to the Coupled Model Intercomparison Project (CMIP) and to the Earth System Grid Federation (ESGF) infrastructure

• INDIGO…


INDIGO & the Climate Model Intercomparison Data Analysis case study

• The proposed case study is mainly related to the climate change community

• It is directly connected to the Coupled Model Intercomparison Project (CMIP) and to the Earth System Grid Federation (ESGF) infrastructure

• An EU/US testbed has been set up at CMCC, LLNL, ORNL and PSNC to demonstrate the feasibility of the approach and provide real feedback to end users

• Preliminary results have been presented by Valentine G. Anantharaj (ORNL) at the IEEE Big Data 2016 conference this week

• S. Fiore et al., “Distributed and cloud-based multi-model analytics experiments on large volumes of climate change data in the Earth System Grid Federation eco-system”, IEEE Big Data Conference 2016, December 5-8, 2016, Washington [to appear].


The context of the case study: ESGF and the CMIP5 data archive

[Figure: ESGF federation sites and the projects they host. Sites: DOE/ANL, DOE/PNNL, DOE/LLNL, DOE/ORNL, DOE/NERSC, NASA/JPL, NASA/NCCS, NOAA/ESRL, NOAA/GFDL, NSF/NCAR, MPI/DKRZ, BADC, CMCC, IPSL, ANU/NCI, plus nodes in Japan, Ireland, Norway, China, Canada and Russia. Projects: IPCC/CMIP, IPCC/CMIP5, CORDEX, PMIP3, obs4MIPs, MERRA, GMAO, DCMIP, C-LAMP, ARM, ACME. Image courtesy: Dean N. Williams (LLNL)]


Requirements analysis for the climate change case study


High-level view of the multi-model experiment on precipitation trend analysis

[Figure: two-stage experiment. Stage 1: single-model precipitation trend analysis. Stage 2: multi-model statistical analysis.]
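The two stages named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which trend estimator or which ensemble statistics the experiment uses, so a least-squares slope and a mean/spread summary are assumed here. The function names, model names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def linear_trend(years, values):
    """Stage 1 (per model): least-squares slope of values against years."""
    y_bar = mean(years)
    v_bar = mean(values)
    num = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den


def multi_model_stats(trends):
    """Stage 2 (across models): summary statistics of the per-model trends."""
    vals = list(trends.values())
    return {
        "mean": mean(vals),
        "std": pstdev(vals),
        "min": min(vals),
        "max": max(vals),
    }


# Hypothetical toy precipitation series, one list per model (not real CMIP5 data).
years = [2000, 2001, 2002, 2003, 2004]
series = {
    "model_a": [10.0, 10.5, 11.0, 11.5, 12.0],
    "model_b": [8.0, 8.0, 8.0, 8.0, 8.0],
    "model_c": [5.0, 6.0, 7.0, 8.0, 9.0],
}

trends = {name: linear_trend(years, vals) for name, vals in series.items()}
stats = multi_model_stats(trends)
```

In the real experiment stage 1 would run once per model, close to where that model's data is stored, and only the small per-model results would be combined in stage 2.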


Climate Model Intercomparison Data Analysis case study challenges & issues

• CMIP* experiments provide input for multi-model analytics experiments (e.g. trend analysis)

• Input data from multiple models needed

• Data distribution inherent in the infrastructure

• Data download is a big barrier for end-users (download can take from several days to weeks!)

• Current infrastructure mainly for data sharing

• Data analysis mainly performed using client-side approaches

• Complexity of the data analysis needs more robust end-to-end support
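A back-of-the-envelope calculation shows why download is such a barrier. The sketch below only converts a data volume and a sustained bandwidth into days. The function name and the input figures are hypothetical and are not measurements from the testbed.

```python
def download_days(volume_tb, bandwidth_mbit_s, efficiency=1.0):
    """Days needed to move volume_tb terabytes at a sustained bandwidth.

    efficiency (0..1] scales the nominal bandwidth to model protocol
    overhead or link sharing; 1.0 means the full nominal rate is achieved.
    """
    bits = volume_tb * 1e12 * 8                      # decimal terabytes to bits
    seconds = bits / (bandwidth_mbit_s * 1e6 * efficiency)
    return seconds / 86400                           # seconds per day


# Hypothetical inputs: 10 TB of model output over a 100 Mbit/s link.
days_at_100_mbit = download_days(10, 100)
# Same volume when only half of the nominal bandwidth is sustained.
days_at_half_rate = download_days(10, 100, efficiency=0.5)
```

With these assumed inputs the transfer takes roughly nine days at the full rate and about twice that at half rate, which is the order of magnitude the slide refers to.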


The current scientific workflow in ESGF (client-side)

[Figure: client-side workflow diagram. Recoverable labels: ESGF Nodes, INDIGO FGEngine + Kepler.]


The paradigm shift implemented in INDIGO (server-side)


Architectural solution: running the multi-model experiment

• Distributed experiments for climate data analysis

• Server-side processing

• Two-level workflow strategy to orchestrate multi-site experiments

• Three levels of parallelism
  • Inter-workflow, intra-workflow, intra-task

• Access through Kepler GUI

• INDIGO solutions: Kepler, FGEngine, Ophidia, INDIGO PaaS

• INDIGO complements, extends and interoperates with the ESGF stack
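The three levels of parallelism can be pictured with nested thread pools. This is a minimal sketch, not the Kepler/FGEngine/Ophidia implementation: all function names, site names and the toy "data fragments" are hypothetical, and plain Python threads stand in for the real workflow engines and the server-side analytics framework.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(fragments):
    """Intra-task level: the fragments of one task are processed concurrently."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each fragment is reduced independently, then partial results are merged.
        return sum(pool.map(sum, fragments))


def run_site_workflow(tasks):
    """Intra-workflow level: independent tasks of one site's workflow run concurrently."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_task, tasks))


def run_experiment(sites):
    """Inter-workflow level: one workflow per site, all sites run concurrently."""
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        results = pool.map(run_site_workflow, sites.values())
        return dict(zip(sites, results))


# Hypothetical layout: site -> list of tasks -> list of fragments -> numbers.
experiment = {
    "site_a": [[[1, 2], [3]], [[4]]],
    "site_b": [[[10]]],
}
result = run_experiment(experiment)
```

The outer function plays the role of the top-level workflow that fans out to the sites, while each site runs its own second-level workflow, which mirrors the two-level strategy listed above.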

[Figure: architecture for running the multi-model experiment. Legend: legacy components in green, INDIGO components in orange, external components in yellow. Recoverable labels: Kepler (big data WfMS); FG Engine with REST API and JSAGA; three ESGF nodes, each with an identity provider, search index, data access, .nc files and an Ophidia big data framework instance; big data analytics gateway for climate change (based on the FG framework); other science gateways & mobile apps; CLI apps; MyExperiment workflow marketplace; HTTP publication service; INDIGO PaaS; IDC. Flows: search & discovery, big data experiment submission, publication, long running services. Annotations: "Apps developed by external users by leveraging the provided APIs"; "Service deployed on demand through the INDIGO PaaS"; "Adaptors interacting with the INDIGO PaaS for dynamic instantiation of cloud resources through TOSCA interfaces".]


Running the multi-model experiment

• Application-domain oriented

• Strong requirements elicitation/validation

• Prototype running on a real testbed involving 3 ESGF sites + PSNC

• Integration of tools widely used by the community (UV-CDAT data viz.)

• Integrates multiple INDIGO components (FGEngine, Kepler, Ophidia)

• Planned IAM, Orchestrator, CLUES, IM

• Potential impact: very high

• We expect the time-to-solution for the multi-model experiment can go down from weeks to hours!


Architectural solution: flexible and dynamic deployment

• Dynamic instantiation of Ophidia and Kepler WfMS

• Automated deployment through TOSCA document

• Data locality key due to the large amount of data

• Interoperability with ESGF

• Integration of largely adopted community-based tools
  • UV-CDAT viz tool
  • OPeNDAP/THREDDS (publication services)

Legend: legacy components in green, INDIGO components in orange, external components in yellow

[Figure: deployment architecture at an ESGF site with INDIGO extensions. Recoverable labels: Kepler (big data WfMS); FG Engine with REST API and JSAGA; ESGF node with .nc files (mount); big data analytics gateway for climate change (based on the FG framework); other science gateways & mobile apps; CLI apps; MyExperiment workflow marketplace; HTTP publication service; IDC; Orchestrator; IM; TOSCA interface; TOSCA recipes for Ophidia/Kepler; cloud environment (e.g. OpenNebula); Ophidia cluster (server front-end, compute nodes, I/O nodes); legacy eco-system; other sites in the ESGF federation. Flows: search, big data experiment submission, publication, long running services. Annotations: "Apps developed by external users by leveraging the provided APIs"; "Service deployed on demand through the INDIGO PaaS"; "Adaptors interacting with the INDIGO PaaS for dynamic instantiation of cloud resources through TOSCA interfaces".]


Flexible and dynamic deployment

• Platform-as-a-Service level
  • Dynamic deployment of Ophidia through the INDIGO PaaS layer
  • Based on Ansible roles and a TOSCA document
  • Run through the Command Line Interface

• Dynamic and flexible deployment of an Ophidia cluster
  • Integrates multiple INDIGO components (IAM, CLUES, IM, Orchestrator, Ophidia)
  • Automates and simplifies the deployment of an Ophidia cluster
  • Time-to-solution (deployment/setup) from 1-2 days to less than 1 hour!
  • Enables the implementation of more “isolated” scenarios, where resources are deployed on demand on a per-experiment basis


Added value and innovation

Added value
• Paradigm shift from client- to server-side
• Intrinsic data movement reduction
• Lightweight end-user setup
• Re-usability of data, final/intermediate products, workflows, etc.
• Complements, extends and interoperates with the ESGF stack
• Provisioning of a “new and easy to use tool” for scientists
• Drastic time-to-solution reduction

Innovation
• Provisioning of a core infrastructural piece (based on big data and cloud technologies) enabling large-scale data analysis and strongly needed in the current climate research ecosystem


Exploitation: ESGF & RDA

• Research Data Alliance
  • Involvement in the Array-Database Assessment WG
  • RDA application with the aim of providing a provenance-aware analytics eco-system (ongoing evaluation – November 15, 2016)

• Earth System Grid Federation
  • Involvement in several ESGF Working Groups
  • Interaction with climate scientists from different ESGF sites
  • Testbed across EU/US involving 3 ESGF sites
  • Add new ESGF sites to the testbed

• Goal: increase exploitation and user engagement!

• If you want to join the testbed, please contact us ([email protected])


Dissemination events

• EGU 2015 (12-17 April 2015, Vienna, Austria)

• RDA Sixth Plenary Meeting (23-25 September 2015, Paris, France)

• EOScience2.0 (12-14 October 2015, Frascati, Italy)

• ESGF F2F Conference 2015 (7-11 December 2015, San Francisco, CA, USA)

• AGU2015 Conference (14-18 December 2015, San Francisco, CA, USA)

• Ophidia PlayDay (29 April 2016, Bologna, Italy)

• Invited presentation at LLNL (23 May 2016, Livermore, CA, USA)

• Invited presentation at ORNL (26 May 2016, Oak Ridge, TN, USA)

• CMCC Annual Meeting (30-31 May 2016, Lecce)

• Big Data and Extreme scale Computing (15-17 June 2016, Frankfurt, Germany)

• DI4R (28-30 September 2016, Krakow, Poland)

• ENES Community Meeting Reading 2016 (25-27 October 2016, Reading, UK)

• ESGF F2F 2016 Conference (Washington, December 6-9, 2016)


Thank you