TRANSCRIPT
The U.S. National Strategic Computing Initiative and the DOE Exascale Computing Project
Presented to HP-CAST 27, Salt Lake City, Utah, November 11, 2016. Jim Ang, ECP Hardware Technology Director, Sandia National Laboratories. Acknowledgements: Rob Leland, Sandia; Paul Messina, ANL; Doug Kothe, ORNL; Rajeev Thakur, ANL; and John Shalf, LBNL.
SAND2016-11620 PE
3 Exascale Computing Project
NSCI Strategic Objectives
1. Accelerating delivery of a capable exascale computing system that integrates hardware and software capability to deliver approximately 100 times the performance of current 10-petaflop systems across a range of applications representing government needs.
2. Increasing coherence between the technology base used for modeling and simulation and that used for data analytic computing.
3. Establishing, over the next 15 years, a viable path forward for future HPC systems even after the limits of current semiconductor technology are reached (the "post-Moore's Law era").
4. Increasing the capacity and capability of an enduring national HPC ecosystem by employing a holistic approach that addresses relevant factors such as networking technology, workflow, downward scaling, foundational algorithms and software, accessibility, and workforce development.
5. Developing an enduring public-private collaboration to ensure that the benefits of the research and development advances are, to the greatest extent, shared between the United States Government and industrial and academic sectors.
4 Exascale Computing Project
DOE's Role in NSCI
• The National Strategic Computing Initiative (NSCI), created by President Obama's Executive Order, aims to maximize the benefits of HPC for US economic competitiveness and scientific discovery
  – Leadership in high-performance computing and large-scale data analysis will advance national competitiveness in a wide array of strategic sectors, including basic science, national security, energy technology, and economic prosperity
  – NSCI Strategic Objective (1): DOE Exascale Computing Project
  – NSCI Strategic Objective (2): Driver for inter-agency collaboration
• DOE is a lead agency within NSCI; DOE's role is focused on advanced simulation through capable exascale computing, emphasizing sustained performance on relevant applications and data analytic computing
5 Exascale Computing Project
The Exascale Computing Project (ECP)
• A collaborative effort of two US Department of Energy (DOE) organizations:
  – Office of Science (DOE-SC)
  – National Nuclear Security Administration (NNSA)
• A 10-year project to accelerate the development of capable exascale systems
  – Led by DOE laboratories
  – Executed in collaboration with academia and industry
A capable exascale computing system will leverage a balanced ecosystem (software, hardware, applications).
6 Exascale Computing Project
Exascale Computing Project (ECP) scope
ECP was established to accelerate development of capable exascale computing systems that integrate hardware and software capability to deliver approximately 50× more performance than today's 20 PF machines for mission-critical applications.
ECP will not procure exascale supercomputers; systems will still be acquired by the DOE lab facility procurements.
ECP will drive pre-exascale application development, and hardware and software R&D, to ensure that the US has a capable exascale ecosystem in the early 2023 time frame.
7 Exascale Computing Project
ECP leadership team Staff from 6 national laboratories, with combined experience of >300 years
Exascale Computing Project: Paul Messina, Project Director, ANL; Stephen Lee, Deputy Project Director, LANL
Project Management: Kathlyn Boudwin, Director, ORNL
Application Development: Doug Kothe, Director, ORNL; Bert Still, Deputy Director, LLNL
Software Technology: Rajeev Thakur, Director, ANL; Pat McCormick, Deputy Director, LANL
Hardware Technology: Jim Ang, Director, SNL; John Shalf, Deputy Director, LBNL
Exascale Systems: Terri Quinn, Director, LLNL; Susan Coghlan, Deputy Director, ANL
Chief Technology Officer: Al Geist, ORNL
Integration Manager: Julia White, ORNL
Communications Manager: Mike Bernhardt, ORNL
8 Exascale Computing Project
Exascale Computing Project Goals
• Foster application development: Develop scientific, engineering, and large-data applications that exploit the emerging, exascale-era computational trends caused by the end of Dennard scaling and Moore's law.
• Ease of use: Create software that makes exascale systems usable by a wide variety of scientists and engineers across a range of applications.
• ≥ Two diverse architectures: Enable by 2023 at least two diverse computing platforms with up to 50× more computational capability than today's 20 PF systems, within a similar size, cost, and power footprint.
• US HPC leadership: Help ensure continued American leadership in architecture, software, and applications to support scientific discovery, energy assurance, stockpile stewardship, and nonproliferation programs and policies.
9 Exascale Computing Project
What is a capable exascale computing system? A capable exascale computing system requires an entire computational ecosystem that:
• Delivers 50× the performance of today’s 20 PF systems, supporting applications that deliver high-fidelity solutions in less time and address problems of greater complexity
• Operates in a power envelope of 20–30 MW
• Is sufficiently resilient (average fault rate: ≤1/week)
• Includes a software stack that supports a broad spectrum of applications and workloads
This ecosystem will be developed using a co-design approach to deliver new software, applications, platforms, and computational science capabilities at heretofore unseen scale.
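The resilience bullet (a mean fault rate of at most one per week) ties directly to checkpoint/restart cost. As a hedged illustration — the slides give no checkpoint parameters, so the 30-minute checkpoint time below is an assumption — Young's classic approximation gives the optimal checkpoint interval:

```python
import math

def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's approximation for the optimal checkpoint interval:
    tau ~ sqrt(2 * C * MTBF), where C is the time to write one
    checkpoint and MTBF is the system mean time between failures."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

mtbf = 7 * 24 * 3600   # resilience target: roughly one fault per week
checkpoint = 30 * 60   # assumed 30-minute checkpoint cost (illustrative)
tau = young_interval(checkpoint, mtbf)
print(f"optimal checkpoint interval ~ {tau / 3600:.1f} hours")
```

With these assumed numbers the machine would checkpoint roughly twice a day; a worse fault rate or a slower I/O system shrinks the interval and grows the overhead, which is why the fault-rate target appears alongside the software-stack requirements.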
10 Exascale Computing Project
ECP has formulated a holistic approach that uses co-design and integration to achieve capable exascale. Its four focus areas map to the elements of the ecosystem:
• Application Development — science and mission applications
• Software Technology — scalable software stack
• Hardware Technology — hardware technology elements
• Exascale Systems — integrated exascale supercomputers
[Diagram: the conceptual software stack, spanning applications; co-design; correctness, visualization, and data analysis; programming models, development environment, and runtimes; tools; math libraries and frameworks; system software (resource management, threading, scheduling, monitoring, and control); memory and burst buffer; data management, I/O, and file system; node OS and runtimes; resilience; workflows; and the hardware interface.]
ECP Timeline
FY16 FY17 FY18 FY19 FY20 FY21 FY22 FY23 FY24 FY25
[Timeline chart: Exascale Systems (NRE and testbeds), Application Development, Software Technology, and Hardware Technology activities across three phases. Milestones: CD-0; award vendor PathForward contracts; CD-1/3a approval; fund first AD/ST testbed; release system RFP; award NRE contracts; notional testbed deliveries; CD-2/3b; CD-4.]
12 Exascale Computing Project
ECP WBS
1. Exascale Computing Project
1.1 Project Management: 1.1.1 Project Planning and Management; 1.1.2 Project Reporting & Controls, Risk Management; 1.1.3 Business Management; 1.1.4 Procurement Management; 1.1.5 Information Technology and Quality Management; 1.1.6 Communications & Outreach; 1.1.7 Integration
1.2 Application Development: 1.2.1 DOE Science and Energy Apps; 1.2.2 DOE NNSA Applications; 1.2.3 Other Agency Applications; 1.2.4 Developer Training and Productivity; 1.2.5 Co-Design and Integration
1.3 Software Technology: 1.3.1 Programming Models and Runtimes; 1.3.2 Tools; 1.3.3 Mathematical and Scientific Libraries and Frameworks; 1.3.4 Data Management and Workflows; 1.3.5 Data Analytics and Visualization; 1.3.6 System Software; 1.3.7 Resilience and Integrity; 1.3.8 Co-Design and Integration
1.4 Hardware Technology: 1.4.1 PathForward Vendor Node and System Design; 1.4.2 Design Space Evaluation; 1.4.3 Co-Design and Integration
1.5 Exascale Systems: 1.5.1 Site Preparation; 1.5.2 System Build Phase NRE; 1.5.3 Prototypes and Testbeds; 1.5.4 Base System Expansion; 1.5.5 Co-Design and Integration
13 Exascale Computing Project
Application Development Focus Area: RFI Responses Reflected Broad Domain Coverage of Mission Space
Domains represented: accelerator physics, aerospace, astrophysics, bioinformatics, chemical science, climate, combustion, cosmology, energy efficiency, energy storage, fossil energy, geoscience, grid modernization, high energy density physics, inertial fusion energy, magnetic fusion energy, manufacturing, materials science, national security, neuroscience, nuclear energy, nuclear physics, particle physics, plasma physics, precision medicine, renewable energy, and weather.
The RFI showed potential for broad exascale application impact:
• DOE Science and Energy Programs: SC (HEP, NP, FES, BES, BER); Energy (EERE, NE, FE, EDER)
• DOE NNSA Programs: Defense Programs, Defense Nuclear Nonproliferation, Naval Reactors
• Other Agencies: NSF, NOAA, NASA, NIH
14 Exascale Computing Project
Exascale Applications Will Address National Challenges
Summary of current DOE Science & Energy application development projects
• Nuclear Energy (NE): Accelerate design and commercialization of next-generation small modular reactors.* Drivers: Climate Action Plan; SMR licensing support; GAIN
• Climate (BER): Accurate regional impact assessment of climate change.* Drivers: Climate Action Plan
• Wind Energy (EERE): Increase efficiency and reduce cost of turbine wind plants sited in complex terrains.* Drivers: Climate Action Plan
• Combustion (BES): Design high-efficiency, low-emission combustion engines and gas turbines.* Drivers: 2020 greenhouse gas and 2030 carbon emission goals
• Chemical Science (BES, BER): Biofuel catalysts design; stress-resistant crops. Drivers: Climate Action Plan; MGI
* Scope includes a discernible data science component
15 Exascale Computing Project
Exascale Applications Will Address National Challenges
Summary of current DOE Science & Energy application development projects
• Materials Science (BES): Find, predict, and control materials and properties: property change due to hetero-interfaces and complex structures. Drivers: MGI
• Materials Science (BES): Protein structure and dynamics; 3D molecular structure design of engineering functional properties.* Drivers: MGI; LCLS-II 2025 Path Forward
• Nuclear Materials (BES, NE, FES): Extend nuclear reactor fuel burnup and develop fusion reactor plasma-facing materials.* Drivers: Climate Action Plan; MGI; Light Water Reactor Sustainability; ITER; Stockpile Stewardship Program
• Accelerator Physics (HEP): Practical economic design of a 1 TeV electron-positron high-energy collider with plasma wakefield acceleration.* Drivers: >30k accelerators today in industry, security, energy, environment, medicine
• Nuclear Physics (NP): QCD-based elucidation of fundamental laws of nature: SM validation and beyond-SM discoveries. Drivers: 2015 Long Range Plan for Nuclear Science; RHIC, CEBAF, FRIB
* Scope includes a discernible data science component
16 Exascale Computing Project
Exascale Applications Will Address National Challenges
Summary of current DOE Science & Energy and Other Agency application development projects
• Magnetic Fusion Energy (FES): Predict and guide stable ITER operational performance with an integrated whole device model.* Drivers: ITER; fusion experiments: NSTX, DIII-D, Alcator C-Mod
• Advanced Manufacturing (EERE): Additive manufacturing process design for qualifiable metal components.* Drivers: NNMIs; Clean Energy Manufacturing Initiative
• Cosmology (HEP): Cosmological probe of the standard model (SM) of particle physics: inflation, dark matter, dark energy.* Drivers: Particle Physics Project Prioritization Panel (P5)
• Geoscience (BES, BER, EERE, FE, NE): Safe and efficient use of the subsurface for carbon capture and storage, petroleum extraction, geothermal energy, nuclear waste.* Drivers: EERE Forge; FE NRAP; Energy-Water Nexus; SubTER Crosscut
• Precision Medicine for Cancer (NIH): Accelerate and translate cancer research in RAS pathways, drug responses, treatment strategies.* Drivers: Precision Medicine in Oncology; Cancer Moonshot
* Scope includes a discernible data science component
17 Exascale Computing Project
Application Co-Design (CD)
• Pulls ST and HT developments into applications
• Pushes application requirements into ST and HT RD&D
• Evolved from best practice to an essential element of the development cycle
• Motif: algorithmic method that drives a common pattern of computation and communication
• CD Centers must address all high-priority motifs invoked by ECP applications, including not only the 7 "classical" motifs but also the 6 additional motifs associated with data science applications
• Evaluate, deploy, and integrate exascale hardware-savvy software designs and technologies for key crosscutting algorithmic motifs into applications
• Essential to ensure that applications effectively utilize exascale systems
• Executed by several CD Centers, each focusing on a unique collection of algorithmic motifs invoked by ECP applications
• A game-changing mechanism for delivering next-generation community products with broad application impact
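A motif is easiest to see in miniature. As a hedged sketch — this toy kernel is illustrative, not an actual ECP proxy app — the structured-grid motif, one of the seven classical motifs, looks like this:

```python
def jacobi_step(grid):
    """One Jacobi relaxation sweep over a 2D structured grid:
    each interior point becomes the average of its 4 neighbors.
    The nested loops with fixed neighbor offsets are the signature
    of the structured-grid motif: regular, predictable data access."""
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                grid[i][j - 1] + grid[i][j + 1])
    return new

# Toy use: 4x4 grid with boundary values 1.0 and interior 0.0
g = [[1.0] * 4] + [[1.0, 0.0, 0.0, 1.0] for _ in range(2)] + [[1.0] * 4]
g = jacobi_step(g)
print(g[1][1])  # 0.25 * (1 + 0 + 1 + 0) = 0.5
```

Because the communication and computation pattern is shared by many codes, a CD Center can tune one proxy kernel like this and feed the lessons back into every application that invokes the same motif.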
18 Exascale Computing Project
Software Technology Focus Area
ST requirements are derived from:
• Analysis of the software needs of exascale applications
• An inventory of software environments at major DOE HPC facilities (ALCF, OLCF, NERSC, LLNL, LANL, SNL), for current systems and the next acquisition in 2–3 years
• The expected software environment for an exascale system
• Requirements beyond the software environment provided by vendors of HPC systems
19 Exascale Computing Project
Conceptual ECP Software Stack
[Diagram: layered stack. Top: Applications, with Co-Design and crosscutting Correctness, Visualization, and Data Analysis. Middle: Programming Models, Development Environment, Runtime; Tools; Math Libraries & Frameworks; System Software, Resource Management, Threading, Scheduling, Monitoring and Control; Data Management, I/O & File System; Memory & Burst Buffer; Node OS, Low-level Runtime. Crosscutting: Resilience and Workflows. Bottom: Hardware interfaces.]
21 Exascale Computing Project
Hardware Technology Focus Area
• Leverage our window of time to support advances in both system and node architectures
• Close gaps in vendors' technology roadmaps, or accelerate time to market, to address ECP performance targets while influencing and intercepting the 2019 Exascale System RFP
• Provide an opportunity for ECP Application Development and Software Technology efforts to influence the design of future node and system architectures
22 Exascale Computing Project
Hardware Technology Overview
Objective: Fund R&D to design hardware that meets ECP targets for application performance, power efficiency, and resilience.
Issue PathForward Hardware Architecture R&D contracts that deliver:
• Conceptual system and associated node designs
• Analysis of performance improvement relative to the level of effort required to migrate software to the conceptual system design
• Simulation, emulation, or test hardware and technology demonstrations to quantify performance gains over existing roadmaps
• Support for active engagement in ECP holistic co-design efforts
DOE labs engage to:
• Evaluate PathForward RFP responses
• Participate in reviews and evaluation of PathForward deliverables
• Develop architectural analysis, Abstract Machine Models, and Proxy Architectures of PathForward designs to support ECP co-design
23 Exascale Computing Project
Overarching Goals for PathForward
• Improve the quality and number of competitive offeror responses to the Exascale Systems RFP
• Improve offerors' confidence in the value and feasibility of aggressive advanced technology options that would be bid in response to the Exascale Systems RFP
• Improve DOE confidence in technology performance benefit, programmability, and the ability to integrate into a credible system platform acquisition
24 Exascale Computing Project
PathForward will drive improvements in vendor offerings that address ECP’s needs for scale, parallel simulations, and large scientific data analytics
PathForward addresses the disruptive trends in computing due to the power challenge
Power challenge
• End of Dennard scaling
• Today's technology: ~50 MW to 100 MW to power the largest systems
Processor/node trends
• GPUs/accelerators
• Simple in-order cores
• Unreliability at near-threshold voltages
• Lack of large-scale cache coherency
• Massive on-node parallelism
System trends
• Complex hardware
• Massive numbers of nodes
• Low bandwidth to memory
• Drop in platform resiliency
Disruptive changes
• New algorithms
• New programming models
• Less "fast" memory
• Managing for increasing system disruptions
• High power costs
4 challenges: power, memory, parallelism, and resiliency
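The power challenge can be put in rough numbers from the slides: an exaflop machine in the 20–30 MW envelope stated earlier, versus the ~50–100 MW that 2016-era technology would need. A quick back-of-envelope sketch (the exaflop figure is the nominal target, not a measured number):

```python
exaflop = 1e18  # FLOP/s target: ~50x today's 20 PF systems

# Efficiency required inside the ECP power envelope (20-30 MW)
for mw in (20, 30):
    gflops_per_w = exaflop / (mw * 1e6) / 1e9
    print(f"{mw} MW envelope -> {gflops_per_w:.0f} GFLOPS/W required")

# Efficiency implied by "today's technology: ~50 MW to 100 MW"
for mw in (50, 100):
    gflops_per_w = exaflop / (mw * 1e6) / 1e9
    print(f"{mw} MW (2016 tech) -> {gflops_per_w:.0f} GFLOPS/W")
```

The gap — roughly 50 GFLOPS/W required versus 10–20 GFLOPS/W implied by extrapolating 2016 technology — is the multiple-times efficiency improvement that PathForward R&D must help close.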
25 Exascale Computing Project
Example PathForward Project Description
Project descriptions include:
• Conceptual system design with associated node design
• Technical challenge
• Proposed remedy
• Value proposition
• Work plan
Key deliverables:
• Workshop to describe the conceptual hardware design and opportunities for software to influence hardware design priorities and decisions
• Architectural analysis to quantify the impact of conceptual designs on ECP performance goals
• Software development platforms
• Hardware technology demonstrators, with an R&D-grade software stack, at node or rack scale with test hardware: ASIC test chips or FPGA emulators
26 Exascale Computing Project
Capable exascale computing requires close coupling and coordination of key development and technology R&D areas: Application Development, Software Technology, Hardware Technology, and Exascale Systems. Integration and co-design are key.
27 Exascale Computing Project
Holistic co-design and culture change
• ECP is a very large DOE project, composed of over 80 separate projects
  – Many organizations: national labs, vendors, universities
  – Many technologies
  – At least two diverse system architectures
  – Different time frames (three phases)
• For ECP to be successful, the whole is more than the sum of the parts
• Culture change
  – AD and ST teams cannot assume that the node and system architectures are firmly defined and take these as fixed inputs to develop their project plans: scope, schedule, and budgets
  – HT PathForward projects also cannot assume that applications, benchmarks, and the software stack are fixed inputs for their project plans
  – Each ECP project needs to understand that it does not operate in a vacuum
  – Initial assumptions about inputs can lead to preliminary project plans with associated deliverables, but there needs to be flexibility
  – In holistic co-design, each project's output can be another project's input
28 Exascale Computing Project
Holistic co-design and ECP challenges
• Multi-disciplinary co-design teams
  – Funding for ECP projects arrives in technology-centric bins, e.g., the ECP focus areas
  – ECP leadership must foster integration of projects into collaborative co-design teams
  – Every ECP project's performance evaluation will include how well it plays with others
• The ECP leadership challenge
  – With ~25 AD teams, ~50 ST teams, ~5 CD teams, and ~5 PathForward teams, all-to-all communication is impractical
  – The co-design projects and the HT Design Space Evaluation project team will provide capabilities to help manage some of the communication workload: proxy applications and benchmarks; Abstract Machine Models and Proxy Architectures
  – The ECP leadership team will be actively working to identify the alignments of cross-cutting projects that will form natural co-design collaborations, and the projects that are orthogonal and probably do not need to expend time and energy trying to force an integration; this will need to be monitored to ensure a new enabling technology does not change this assessment
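The Abstract Machine Models mentioned above reduce a proposed architecture to a few parameters that AD and ST teams can plan against without all-to-all communication. A minimal sketch, assuming a hypothetical node with made-up numbers and the standard roofline bound (this is not an actual ECP model):

```python
from dataclasses import dataclass

@dataclass
class AbstractMachineModel:
    """Toy stand-in for an Abstract Machine Model / Proxy
    Architecture: a node reduced to two hypothetical parameters."""
    peak_gflops: float   # peak compute rate
    mem_bw_gbs: float    # sustained memory bandwidth

    def roofline(self, arithmetic_intensity):
        """Standard roofline bound: attainable GFLOP/s is the lesser
        of peak compute and bandwidth * flops-per-byte."""
        return min(self.peak_gflops, self.mem_bw_gbs * arithmetic_intensity)

# Hypothetical node: 2 TFLOP/s peak, 200 GB/s memory bandwidth
node = AbstractMachineModel(peak_gflops=2000.0, mem_bw_gbs=200.0)
print(node.roofline(0.25))   # 50.0   -> bandwidth bound (stencil-like)
print(node.roofline(50.0))   # 2000.0 -> compute bound (dense linear algebra)
```

Even a model this coarse lets an application team ask "is my motif bandwidth bound or compute bound on this proposed design?" — the kind of question the Design Space Evaluation project is meant to answer at scale.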
29 Exascale Computing Project
NSCI Strategic Objectives (recap)
30 Exascale Computing Project
Motivation for NSCI Objective 2: Increasing coherence between simulation and analytics
• For simulation (HPC):
  – HPC simulation must ride on some commodity curve
  – Larger market forces are behind analytics
  – Simulation can exploit commodity component technology from analytics
• For analytics (Large-Scale Data Analytics, LSDA):
  – LSDA problems are becoming ever more sophisticated, requiring more coupled methods
  – Analytics can exploit architectural lessons from HPC simulation
• For both: integration of simulation and analytics in the same workflow
  – Automation of analysis of data from simulation
  – Creation of synthetic data via simulation to augment analysis
  – Automated generation and testing of hypotheses
  – Exploration of new scientific and technical scenarios
Mutual inspiration, technical synergy, and economies of scale in the creation, deployment, and use of HPC resources.
31 Exascale Computing Project
Data structures describing simulation and analytics differ. Graphs from simulations may be irregular, but have more locality than those derived from analytics.
[Figures: computational simulation of physical phenomena (climate modeling, car crash) vs. large-scale data analytics (Internet connectivity, yeast protein interactions). Figures from Leland et al., courtesy of Yelick, LBNL.]
32 Exascale Computing Project
Memory performance demands differ: a key differentiator in the performance of simulation and analytics.
[Figure: simulation and analytics codes plotted by memory performance demand; the area of each circle is its relative data intensiveness, i.e., the total amount of unique data accessed over a fixed interval of instructions. Standard benchmarks shown include LINPACK (smallest data intensiveness; barely visible on the graph), STREAM, SPECfp, and SPECint. Figure from Murphy & Kogge, with the radius of the LINPACK data point doubled to make it visible.]
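The metric behind the circle areas — unique data accessed over a fixed interval — can be sketched directly. The traces below are synthetic assumptions, loosely in the spirit of STREAM-like and LINPACK-like behavior, not real benchmark measurements:

```python
def data_intensiveness(address_trace, window):
    """Average number of unique addresses touched per fixed-size
    window of memory references -- a toy proxy for 'relative data
    intensiveness' (unique data over a fixed interval)."""
    counts = []
    for start in range(0, len(address_trace) - window + 1, window):
        counts.append(len(set(address_trace[start:start + window])))
    return sum(counts) / len(counts)

# Synthetic traces: a streaming sweep vs. a small reused working set
stream = list(range(1000))               # STREAM-like: every address new
reuse  = [i % 10 for i in range(1000)]   # LINPACK-like: heavy blocking/reuse
print(data_intensiveness(stream, 100))   # 100.0 -> data intensive
print(data_intensiveness(reuse, 100))    # 10.0  -> cache friendly
```

The same window size gives very different footprints for the two traces, which is exactly why LINPACK sits at the small end of the figure and analytics workloads at the large end.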
33 Exascale Computing Project
Application code characteristics differ. Contrasting properties:
Application code property  | Simulation                       | Analytics
Spatial locality           | High                             | Low
Temporal locality          | Moderate                         | Low
Memory footprint           | Moderate                         | High
Computation type           | May be floating-point dominated* | Integer intensive
Input-output orientation   | Output dominated                 | Input dominated
* Increasingly, simulation work has become less floating-point dominated
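The spatial-locality row of this table can be illustrated with a toy measurement: the average address stride of a sequential sweep (simulation-like) versus a scattered, graph-style access pattern (analytics-like). Both traces are synthetic assumptions:

```python
import random

def mean_abs_stride(trace):
    """Average jump between consecutive memory addresses;
    small strides indicate high spatial locality."""
    return sum(abs(b - a) for a, b in zip(trace, trace[1:])) / (len(trace) - 1)

random.seed(0)
n = 10_000
simulation = list(range(n))                           # stencil-like sweep
analytics = [random.randrange(n) for _ in range(n)]   # graph-like scatter

print(mean_abs_stride(simulation))  # 1.0 -> high spatial locality
print(mean_abs_stride(analytics))   # roughly n/3 -> low spatial locality
```

Unit strides let hardware prefetchers and wide cache lines do their job; near-random strides defeat both, which is why the table's locality rows translate directly into the memory-system differences on the previous slide.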
34 Exascale Computing Project
So what do we really mean by "increasing coherence" between simulation and analytics?
• NOT one system ostensibly optimized for both simulation and analytics
• Greater commonality in underlying commodity computing components and design principles
• Greater interoperability, allowing interleaving of both types of computations
…a more common hardware and software roadmap between simulation and analytics.
35 Exascale Computing Project
Simulation and analytics are evolving to become more similar in their architectural needs.
• Current challenges for the LSDA community — similar to HPC simulation:
  – Data movement
  – Power consumption
  – Memory/interconnect bandwidth
  – Scaling efficiency
• Instruction mix for Sandia's HPC engineering codes — similar to LSDA:
  – Memory operations: 40%
  – Integer operations: 40%
  – Floating point: 10%
  – Other: 10%
• Common design impacts of energy cost trends:
  – Increased concurrency (processing threads, cores, memory depth)
  – Increased complexity and burden on system software, languages, tools, runtime support, and codes
36 Exascale Computing Project
Emerging architectural and system software synergies
Architectural characteristic   | Simulation                                                       | Analytics
Computation                    | Memory address generation dominated                              | Same
Primary memory                 | Low power, high bandwidth, semi-random access                    | Same
Secondary memory               | Emerging technologies may offset cost, allowing much more memory | …require extremely large memory spaces
Storage                        | Integration of another layer of memory hierarchy to support checkpoint/restart | …to support out-of-core data set access
Interconnect technology        | High bisection bandwidth (for relatively coarse-grained access)  | …(for fine-grained access)
System software (node-level)   | Low dependence on system services; increasingly adaptive, resource management for structured parallelism | …highly adaptive, resource management for unstructured parallelism
System software (system-level) | Increasingly irregular workflows                                 | Irregular workflows
38 Exascale Computing Project
NSCI Strategic Objectives (recap)
39 Exascale Computing Project
Motivation for NSCI Objective 3: Technology Scaling Trends
[Chart: performance vs. year (through 2020, 2025, and 2030) for transistor count, thread performance, clock frequency, power (watts), and number of cores. Figure courtesy of Kunle Olukotun, Lance Hammond, Herb Sutter, and Burton Smith.]
40 Exascale Computing Project
Preparing for the Post-Moore’s Law Era
See the article by John Shalf and Rob Leland in IEEE Computer, Vol. 48, Issue 12, December 2015.
41 Exascale Computing Project
Numerous Opportunities to Compute Beyond Moore's Law (but the winning solution is unclear)
[Chart: a taxonomy of post-Moore options along three axes — "More efficient architectures and packaging (first 10 years)", "New materials and efficient devices (10+ years, with a 10-year lead time)", and "New models of computation" — radiating from today's general-purpose CMOS toward revolutionary heterogeneous HPC architectures and software. Options shown include: photonic ICs, PETs, TFETs, carbon nanotubes and graphene, spintronics, superconducting, neuromorphic, adiabatic/reversible, dataflow, approximate computing, systems on chip, NTV, 3D stacking and advanced packaging, reconfigurable computing, quantum, analog, dark silicon, and new architectures and packaging.]
Acknowledgments
This research was supported by the Exascale Computing Project (http://www.exascaleproject.org), a joint U.S. Department of Energy and National Nuclear Security Administration project responsible for delivering a capable exascale ecosystem, including software, applications, hardware, and early testbed platforms, to support the nation’s exascale computing imperative.
U.S. Department of Energy Contract No [xxxxxxxxx]
www.ExascaleProject.org