
Page 1: Data and Analytics Strategy - National Energy Research ... · Data and Analytics Strategy 1 Prabhat Data and Analytics Group Lead February 23, 2015

Data and Analytics Strategy

Prabhat, Data and Analytics Group Lead
February 23, 2015

Talk Overview
• DAS Team and Goals
• Big Data Hardware
• Big Data Software
• Big Data Users

Data and Analytics Team

DAS Team Member      Technology Areas
Shreyas Cholia       Gateways, Web, Grid
Yushu Yao            Databases, Analytics
Annette Greiner      UI, Web
Joaquin Correa       Imaging, Machine Learning
Burlen Loring        Vis
Jeff Porter          Data Management
Oliver Ruebel        Vis, Analytics
Dani Ushizima        Imaging, R
R. K. Owen           NIM
Michael Urashka      Web


DAS Team Goal: “Enable Data-Centric Science at Scale”

• Big Data Software
   – Broad ecosystem of capabilities and technologies
   – Research and evaluate
   – Customize and optimize for NERSC/HPC platforms
   – Deploy and maintain

• Engaging NERSC Users
   – Broad user base support
   – 1-1 in-depth engagement

DOE Facilities are Facing a Data Deluge

[Figure: panels labeled Astronomy, Physics, Light Sources, Genomics, Climate]

We currently deploy separate Compute Intensive and Data Intensive Systems

[Figure: systems grouped under "Compute Intensive" and "Data Intensive"; system labels on the slide: Carver, Genepool, PDSF]

Cori: Unified architecture for HPC and Big Data

• 64 Cabinets of Cray XC System
   – 50 cabinets ‘Knights Landing’ manycore compute nodes
   – 10 cabinets ‘Haswell’ compute nodes for data partition
   – ~4 cabinets of Burst Buffer
   – 14 external login nodes
   – Aries Interconnect (same as on Edison)
• Lustre File system
   – 28 PB capacity, 432 GB/sec peak performance
• NVRAM “Burst Buffer” for I/O acceleration
• Significant Intel and Cray application transition support
• Delivery in mid-2016; installation in new LBNL CRT
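For a rough sense of scale, the two Lustre figures on this slide (28 PB capacity, 432 GB/sec peak) can be combined into a back-of-the-envelope estimate of how long one full pass over the file system would take at peak bandwidth. The sketch below is illustrative only: it assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes), which the slide does not state, and the function name is hypothetical. Real sustained rates would normally fall below peak.

```python
# Back-of-the-envelope check using only the two Lustre figures on the slide.
# Assumption (not stated on the slide): decimal units, 1 PB = 10**15 bytes
# and 1 GB = 10**9 bytes.

PB = 10**15
GB = 10**9


def full_sweep_seconds(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to stream the whole file system once at sustained peak bandwidth."""
    return capacity_bytes / bandwidth_bytes_per_s


capacity = 28 * PB          # 28 PB capacity (from the slide)
peak_bandwidth = 432 * GB   # 432 GB/sec peak performance (from the slide)

seconds = full_sweep_seconds(capacity, peak_bandwidth)
hours = seconds / 3600
print(f"{seconds:,.0f} s, about {hours:.1f} hours")
```

Under these assumptions, a single full read or write of the 28 PB file system at peak rate takes roughly 18 hours, which gives some context for pairing Lustre with an NVRAM burst buffer for I/O acceleration.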


Popular features of a data intensive system can be supported on Cori

Data Intensive Workload Need                     Cori Solution
Local Disk                                       NVRAM ‘burst buffer’
Large memory nodes                               128 GB/node on Haswell; option to purchase fat (1 TB) login node
Massive serial jobs                              NERSC serial queue prototype on Edison; MAMU
Complex workflows                                More (14) external login nodes; CCM mode for now
Communicate with databases from compute nodes    Proposed Compute Gateway Node COE
Stream data from observational facilities        Proposed Compute Gateway Node COE
Easy to customize environment                    Proposed User Defined Images COE
Policy Flexibility                               Improvements coming with Cori: rolling upgrades, CCM, MAMU; above COEs would also contribute

Big Data Software Portfolio

Capabilities             Technology Areas                       Tools, Libraries
Data Transfer + Access   Globus, Grid Stack, Authentication     Globus Online, Grid FTP
                         Portals, Gateways, RESTful APIs        NEWT
Data Processing          Workflows                              Swift, Fireworks, …
Data Management          Formats, Models, Databases             HDF5, NetCDF
                         Storage, I/O, Movement                 SRM
Data Analytics           Statistics, Machine Learning           python, R, ROOT
                         Imaging                                OMERO, Fiji, …
Data Visualization       SciVis, InfoVis                        VisIt, Paraview
Backend Infrastructure   Analytics Stack                        BDAS
                         Databases                              SciDB, MySQL, PostgreSQL, MongoDB
                         Virtualization                         Docker

Analytics Software Strategy

[Layered stack diagram, listed from top layer to bottom layer]

Science Applications     Climate, Cosmology, Kbase, Materials, BioImaging, …
Analytics Capabilities   Statistics, Machine Learning; Image Processing; Graph Analytics; Database Operations
Tools + Libraries        R, python, MLBase; MATLAB; GraphX; SQL
Runtime Framework        MPI, Spark, SciDB
Resource Management      Filesystems (Lustre), Batch/Queue Systems
Hardware                 SandyBridge/KNL chipset, Burst Buffers, Aries Interconnect

Current DAS Engagements

• Analytics:
   – Cray, UCB AMPLab, Databricks, SkyTree, Dato
   – Intel Research, Nervana Systems, UCB, Harvard, MIT, CMU
• Data Transfer, Access:
   – Globus
• Visualization:
   – Kitware
• Data Management:
   – HDF Group
   – Paradigm4, MongoDB

NERSC Users: How to get help?

• Documentation:
   – http://www.nersc.gov/users/software/data-visualization-and-analytics/
• Routine startup/troubleshooting questions:
   – Trouble ticket system
• In-depth 1-1 collaborations:
   – e-mail [email protected]

Top NERSC Production Workflows

• Advanced Light Source SPOT suite
   – Real time reconstruction, experimental steering
• Materials Project
• Cosmology Supernovae/Transient classification pipeline

Questions?

Contact: [email protected]