
Page 1: Recent HPC Research Trends and Strategy in the United States

Recent HPC Research Trends and Strategy in the United States Professor William Kramer

National Center for Supercomputing Applications, University of Illinois http://bluewaters.ncsa.illinois.edu

Page 2: Recent HPC Research Trends and Strategy in the United States

Extreme Scale Motivations
• Science and research drivers require more investment in computing, but is more always the answer?
  • More investment
  • Better/bigger systems
  • More efficient
• National security, and leadership in HW and SW innovations
• National competitiveness for science, research, and industry
• Engagement/NRE funding for technology vendors

Impacts of Extreme Scale Computing - Oct 2017

Page 3: Recent HPC Research Trends and Strategy in the United States

National Strategic Computing Initiative
Executive Order signed July 29, 2015

Enhance U.S. strategic advantage in HPC for security, economic competitiveness, and scientific discovery

• National
  • “Whole of government” approach
  • Public/private partnership with industry and academia
• Strategic
  • Leverage beyond individual programs
  • Long time horizon (a decade or more)
• Computing
  • HPC as advanced, capable computing technology
  • Multiple styles of computing and all necessary infrastructure
  • Scope includes everything necessary for a fully integrated capability
• Initiative
  • Above-baseline effort
  • Link and lift efforts

https://www.nitrd.gov/nsci/index.aspx http://science.energy.gov/~/media/ascr/ascac/pdf/meetings/201512/Szulman_ASCAC_Briefing_120915.pdf

Page 5: Recent HPC Research Trends and Strategy in the United States

NSCI Objectives
1. Accelerate delivery of a capable exascale computing system (hardware and software) to deliver approximately 100X the performance of current 10 PF systems across a range of applications reflecting government needs.
2. Increase coherence between the technology base used for modeling and simulation and that used for data-analytic computing.
3. Establish, over the next 15 years, a viable path forward for future HPC systems in the post-Moore’s-Law era …
4. Increase the capacity and capability of an enduring national HPC ecosystem, employing a holistic approach … networking, workflow, downward scaling, foundational algorithms and software, and workforce development.
5. Develop an enduring public-private partnership to assure that the benefits … are transferred to the U.S. commercial, government, and academic sectors.

http://science.energy.gov/~/media/ascr/ascac/pdf/meetings/201512/Szulman_ASCAC_Briefing_120915.pdf

Page 6: Recent HPC Research Trends and Strategy in the United States

The Government’s Co-Leader Roles in NSCI
• DOE: capable exascale program
• NSF: scientific discovery; broader HPC ecosystem; workforce development
• DOD: analytic computing to support missions in science and national security
• IARPA + NIST: future computing technologies
• NASA, FBI, NIH, DHS, NOAA: deployment within their mission contexts

Presenter note: The NSCI is formulated as a “whole of government” activity.
Page 7: Recent HPC Research Trends and Strategy in the United States

National AI Initiative
• An additional national initiative, announced October 2016
• Synergistic with some aspects of NSCI, but not linked to it
• Implementation and other details are to come

Page 8: Recent HPC Research Trends and Strategy in the United States

National Science Foundation Initiatives
• Strong leadership in the National Strategic Computing Initiative
  • Kept advocating sustained, productive technology, which may have helped focus other programs
• National Academy report
• Plans for provisioning new resources
• Plans for Computer Science and Computer Engineering research
  • NSF provides >80% of funding for CS research
• Software centers
• Data-focused software and infrastructure
• NSF supports a large amount of computing for the National Institutes of Health

Page 9: Recent HPC Research Trends and Strategy in the United States

National Academy Report for NSF Future Leadership Computing Infrastructure

Recommendations:
1. Sustain and seek to grow its investments in advanced computing, to include hardware and services, software and algorithms, and expertise
2. Provide support for the revolution in data-driven science along with simulation
3. Collect community requirements and construct and publish roadmaps
4. Allow investments … to be considered in an integrated manner
5. Support the development and maintenance of expertise, scientific software, and software tools … to make efficient use of its advanced computing resources
6. Invest modestly to explore next-generation hardware and software technologies
7. Manage advanced computing investments in a more predictable and sustainable way

Page 10: Recent HPC Research Trends and Strategy in the United States


Slide Courtesy of Irene Qualters-NSF

Page 11: Recent HPC Research Trends and Strategy in the United States

NSF Aspirations for Convergence

[Figure: plotted on axes of Data Intensity (vertical) vs. Computational Intensity (horizontal), “Big Data” data analytics and high-performance modeling and simulation converge toward large-scale data-driven modeling and simulation.]

Information Courtesy of Irene Qualters-NSF

Page 12: Recent HPC Research Trends and Strategy in the United States

NSCI Objective 2: Increase coherence between the technology base used for modeling and simulation and that used for data-analytic computing

• Modeling and Simulation: multi-scale, multi-physics, multi-resolution, multidisciplinary, coupled models
• Data Science: data assimilation, visualization, image analysis, data compression, data analytics

NSF Role: Support foundational research and research infrastructure within and across all disciplines (across all NSF directorates)

Slide Courtesy of Irene Qualters-NSF

Page 13: Recent HPC Research Trends and Strategy in the United States

NSF role in NSCI: Enduring Computational Ecosystem for Advancing Science and Engineering
• Fundamental research in HPC platform technologies, architectures, and approaches
• Infrastructure platform pilots, development, and deployment
• Computational and data literacy across all STEM disciplines
• Computational- and data-enabled science and engineering discovery

Information Courtesy of Irene Qualters-NSF

Page 14: Recent HPC Research Trends and Strategy in the United States

What does DOE mean by Capable Exascale?

• Paul Messina, Director of the Exascale Computing Project, offered this working definition of “capable exascale” in a presentation he delivered:
• A capable exascale system is defined as a supercomputer that can solve science problems 50X faster (or more complex) than the 20 PF systems of today (Titan, Sequoia), in a power envelope of 20-30 MW, and that is sufficiently resilient that user intervention due to hardware or system faults is needed on the order of once a week on average.
• And it has a software stack that meets the needs of a broad spectrum of applications and workloads.
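A quick back-of-envelope check of what this working definition implies. The 20 PF baseline, 50X speedup, and 20-30 MW envelope come from the definition above; the script itself is only illustrative arithmetic, not anything from the presentation:

```python
# Back-of-envelope arithmetic for DOE's "capable exascale" working definition.
# Inputs are the figures quoted in the definition; everything else is derived.

baseline_flops = 20e15           # today's reference systems (Titan, Sequoia): ~20 PF
speedup = 50                     # "50X faster (or more complex)"
power_envelope_w = (20e6, 30e6)  # 20-30 MW power envelope

# 20 PF x 50 = 1e18 FLOP/s, i.e. one exaflop per second
target_flops = baseline_flops * speedup

# Implied energy efficiency at each edge of the power envelope (GFLOP/s per watt)
for p in power_envelope_w:
    eff_gflops_per_watt = target_flops / p / 1e9
    print(f"At {p / 1e6:.0f} MW: {eff_gflops_per_watt:.1f} GFLOP/s per watt")
```

By this measure, capable exascale implies roughly 33-50 GFLOP/s per watt delivered at full scale, which is why later slides treat power reduction as one of the four key R&D challenges.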

Page 15: Recent HPC Research Trends and Strategy in the United States

DOE Exascale Computing Project Goals
• Foster application development: develop scientific, engineering, and large-data applications that exploit the emerging, exascale-era computational trends caused by the end of Dennard scaling and Moore’s law.
• Ease of use: create software that makes exascale systems usable by a wide variety of scientists and engineers across a range of applications.
• ≥ Two diverse architectures: enable by 2023 at least two diverse computing platforms with up to 50× more computational capability than today’s 20 PF systems, within a similar size, cost, and power footprint.
• US HPC leadership: help ensure continued American leadership in architecture, software, and applications to support scientific discovery, energy assurance, stockpile stewardship, and nonproliferation programs and policies.

Slide Courtesy of Paul Messina-ANL

Page 16: Recent HPC Research Trends and Strategy in the United States

ECP is a holistic approach that uses co-design and integration
• Application Development: science and mission applications
• Software Technology: a scalable and productive software stack
• Hardware Technology: hardware technology elements
• Exascale Systems: integrated exascale supercomputers

[Diagram: the software stack spans applications and co-design; correctness, visualization, and data analysis; programming models, development environment, and runtimes; tools; math libraries and frameworks; system software (resource management, threading, scheduling, monitoring, and control); memory and burst buffer; data management; I/O and file system; node OS and runtimes; resilience; workflows; and the hardware interface.]

ECP’s work encompasses applications, system software, hardware technologies and architectures, and workforce development

Slide Courtesy of Paul Messina-ANL

Page 17: Recent HPC Research Trends and Strategy in the United States

ECP Mission Need Defines the Application Strategy

Key science and technology challenges to be addressed with exascale:
• Materials discovery and design
• Climate science
• Nuclear energy
• Combustion science
• Large-data applications
• Fusion energy
• National security
• Additive manufacturing
• Many others!

Meet national security needs:
• Stockpile Stewardship Annual Assessment and Significant Finding Investigations
• Robust uncertainty quantification (UQ) techniques in support of lifetime extension programs
• Understanding evolving nuclear threats posed by adversaries and developing policies to mitigate these threats

Support DOE science and energy missions:
• Discover and characterize next-generation materials
• Systematically understand and improve chemical processes
• Analyze the extremely large datasets resulting from the next generation of particle physics experiments
• Extract knowledge from systems-biology studies of the microbiome
• Advance applied energy technologies (e.g., whole-device models of plasma-based fusion systems)

Slide Courtesy of Paul Messina-ANL

Page 18: Recent HPC Research Trends and Strategy in the United States

Recent ST Selections Mapped to Software Stack
• Correctness
• Visualization: VTK-m, ALPINE (ParaView, VisIt)
• Data Analysis: ALPINE
• Applications Co-Design
• Programming models, development environment, and runtimes: MPI (MPICH, Open MPI), OpenMP, OpenACC, PGAS (UPC++, Global Arrays), task-based (PaRSEC, Legion), RAJA, Kokkos, runtime library for power steering
• System software, resource management, threading, scheduling, monitoring, and control: Qthreads, Argobots, global resource management
• Tools: PAPI, HPCToolkit, Darshan (I/O), performance portability (ROSE, autotuning, PROTEAS, OpenMP), compilers (LLVM, Flang)
• Math libraries/frameworks: ScaLAPACK, DPLASMA, MAGMA, PETSc/TAO, Trilinos Fortran, xSDK, PEEKS, SuperLU, STRUMPACK, SUNDIALS, DTK, TASMANIAN, AMP
• Memory and burst buffer: checkpoint/restart (UNIFYCR), API and library for complex memory hierarchy
• Data management, I/O, and file system: ExaHDF5, PnetCDF, ROMIO, ADIOS, checkpoint/restart (VeloC), compression, I/O services
• Node OS, low-level runtimes: Argo OS enhancements
• Resilience: checkpoint/restart (VeloC, UNIFYCR)
• Workflows
• Hardware interface

Slide Courtesy of Paul Messina-ANL
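The programming-models row above puts MPI (MPICH, Open MPI) at the core of the stack. As a purely illustrative stand-in for the reduction pattern that MPI's collective operations (e.g., MPI_Reduce) provide, here is a stdlib-only Python sketch using multiprocessing; this is not MPI or ECP code, just the same split-compute-reduce shape:

```python
# Illustrative only: mimic MPI-style rank-local work plus a global reduction
# using Python's standard multiprocessing pool (an assumption for this sketch;
# real ECP applications would use MPI itself).
from multiprocessing import Pool

def partial_sum(chunk):
    # Each "rank" computes a local result over its share of the data.
    return sum(x * x for x in chunk)

def reduce_sum_of_squares(data, nranks=4):
    # Scatter: deal the data out across the ranks round-robin.
    chunks = [data[i::nranks] for i in range(nranks)]
    with Pool(nranks) as pool:
        partials = pool.map(partial_sum, chunks)
    # Reduce: combine the per-rank partial results into one value.
    return sum(partials)

if __name__ == "__main__":
    print(reduce_sum_of_squares(list(range(1000))))
```

The design point this illustrates is why reductions appear as a first-class collective in the stack: the per-rank work parallelizes trivially, and only the small partial results cross the interconnect.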

Page 19: Recent HPC Research Trends and Strategy in the United States

Develop the technology needed to build and support the exascale systems

• Mission need: the Exascale Computing Project requires hardware-technology R&D to enhance application and system performance for science, engineering, and data-analytics applications on exascale systems.
• Objective: support hardware architecture R&D at both the node and system architecture levels.
• Hardware thrust scope:
  • Prioritize R&D activities that address ECP performance objectives for the initial exascale system RFPs.
  • Enable Application Development, Software Technology, and Exascale Systems to improve the performance and usability of future HPC hardware platforms (holistic co-design).

Slide Courtesy of Paul Messina-ANL

Page 20: Recent HPC Research Trends and Strategy in the United States

Capable exascale computing requires close coupling and coordination of key development and technology R&D areas:
• Application Development
• Software Technology
• Hardware Technology
• Exascale Systems

Across ECP, integration and co-design is key.

Slide Courtesy of Paul Messina-ANL

Page 21: Recent HPC Research Trends and Strategy in the United States

DOE capable exascale systems by 2021-2023

• “Non-recurring engineering” (NRE) activities will be integral to next-generation computing hardware and software.
• Four key challenges will be addressed through targeted R&D investments to bridge the capability gap: energy consumption, reliability, memory and storage, and parallelism.
• Systems must meet ECP’s essential performance parameters:
  • 50 times the current performance
  • 10 times reduction in power consumption
  • System resilience: 6 days without application failure

Slide Courtesy of Paul Messina-ANL
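The resilience target can be made concrete with a little failure arithmetic. In this hypothetical sketch only the 6-day system-level figure comes from the slide; the node count is an illustrative assumption, since exascale designs of this era were discussed in terms of tens of thousands of nodes:

```python
# Rough resilience arithmetic behind a "6 days without app failure" target.
system_mtbf_days = 6   # target mean time between application-killing failures
n_nodes = 50_000       # assumed node count (illustrative, not from the slide)

# If node failures are independent and any one of them kills the application,
# the system-level MTBF is roughly the per-node MTBF divided by the node count.
# So each node must survive, on average:
node_mtbf_days = system_mtbf_days * n_nodes
node_mtbf_years = node_mtbf_days / 365.25
print(f"Required per-node MTBF: ~{node_mtbf_years:.0f} years")
```

Under these assumptions each node needs a mean time between failures of several centuries, which is why resilience (checkpoint/restart, fault-tolerant runtimes) shows up as its own layer of the ECP software stack rather than being left to hardware alone.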

Page 22: Recent HPC Research Trends and Strategy in the United States

Planned US Leadership Systems
• 2019
  • NSF Leadership Class Computing Facility Phase 1: 2-3x the sustained performance of Blue Waters
• ~2021
  • DOE accelerated exascale systems to deliver “approximately 50x more performance than today’s 20-petaflops machines on mission critical applications”
  • “at least one exascale system will be delivered in 2021 to a DOE Office of Science Leadership Computing Facility (Argonne and/or Oak Ridge LCFs)”
• ~2023-2024
  • NSF Leadership Class Computing Facility Phase 2: 10-20x the sustained performance of Phase 1
  • DOE: an exascale system at a National Nuclear Security Administration (NNSA) facility (LLNL or LANL)
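Compounding the NSF LCCF targets gives a sense of the overall trajectory. The 2-3x and 10-20x ranges are from the roadmap above; the multiplication is the only thing added here:

```python
# Compound the NSF LCCF phase targets into a single range relative to
# Blue Waters sustained performance. Ranges are (low, high) multipliers.
phase1_vs_bw = (2, 3)    # Phase 1: 2-3x Blue Waters (sustained)
phase2_vs_p1 = (10, 20)  # Phase 2: 10-20x Phase 1 (sustained)

low = phase1_vs_bw[0] * phase2_vs_p1[0]
high = phase1_vs_bw[1] * phase2_vs_p1[1]
print(f"Phase 2 vs Blue Waters (sustained): {low}x to {high}x")
```

So the Phase 2 facility was targeted at roughly 20-60x Blue Waters on sustained application performance, landing in the same 2023-2024 window as the planned DOE NNSA exascale system.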

Page 23: Recent HPC Research Trends and Strategy in the United States

QUESTIONS
