the high performance computing roadmap · 2020. 6. 4. · espos sle hpc 15 sp1 ltss sle hpc 15 sp1...
The High Performance Computing Roadmap
FUT-1438
Jay Kruemcke
Senior Product Manager – SUSE High Performance Computing
@mr_sles
Agenda
1. Why HPC?
2. Customer challenges
3. What SUSE brings to HPC
4. Where are we going?
Why HPC?
Worldwide HPC revenue expected to reach over $19.95 billion by 2023 [1]
Big data combined with HPC creating new solutions, adding many new
users/buyers to the HPC space (AI/ML/DL and HPDA are hot new areas)
SUSE runs on 21 of the top 50 supercomputers (7 RH, 9 CentOS) [2]
SUSE dominates top 100, CentOS gains share in “smaller” supercomputers [2]
Commercial OS Share in Top 500 (represents 100 supercomputers in the list): SUSE 53%, RH 24%, bullx 17%, Ubuntu 6% [2]
[1] Hyperion Research, November 2019
[2] Top500 Supercomputer Report, November 2019
Cloud Computing For HPC Will Grow Faster
Source: Hyperion Research, November 2019
• Total HPC spending is projected to reach $44B
in 2022
• Over 70% of HPC sites run some jobs in public
clouds
• Over 10% of all HPC jobs are now running in
clouds (primarily hybrid)
• Public clouds are cost-effective for some jobs
but up to 10x more expensive for others,
depending on where data resides
• Private and hybrid cloud use is growing faster
Customer Pain Points And Challenges
Time to Solution
“I need to maximize
application performance,
scale workloads, and
minimize overhead.”
• Parallel software is lacking
with many applications
needing a major re-design
• Segmented into
commercial and scientific,
and there is not enough
collaboration
Maintenance
“My IT staff doesn’t have
time to update and test all
the different software
components.”
• Better management
software needed; update
deployment approach to
leverage HPC and cloud
infrastructure
• Stack components
provided by multiple
vendors, making it more
challenging to maintain
Complexity
“Composing a working HPC
environment is difficult,
time-consuming, requiring
experts.”
• Clusters are hard to use
and manage as they
become more complex in
heterogeneous
environments
• Storage access time and
data management are
becoming new bottlenecks
SUSE Linux Enterprise High Performance Computing
HPC bundle with supported HPC packages – beyond an OS
Supports AArch64 (Arm) and x86-64
Many IHV/ISV/CSP partnerships
Multiple service life options
Competitive cluster node pricing model
SuperMUC Petascale system runs SUSE on Lenovo
ThinkSystem
Geophysicists use earthquake simulation software to
investigate seismic waves beneath Earth’s surface
Calculations involved in this kind of simulation are so
complex that they push even supercomputers to their limits
Selected SUSE HPC Projects
SUSE Linux for HPC
& the HPC module
SUSE Enterprise Storage
SUSE Package Hub
HPC Containers
Arm: the emerging platform
HPC in the Cloud
Accelerator enablement
Why SUSE Linux For HPC?
Enterprise Linux with Enterprise support
• Security incidents require quick response to address system vulnerabilities
More than just an OS - HPC software included and supported
• SLE HPC includes popular HPC software such as Slurm and Open MPI
• Deployment templates for Head Nodes, Compute Nodes, Dev Nodes
Aggressively priced subscriptions
• SUSE Linux for HPC priced for large and small HPC configurations
Proven track record in HPC
• 50% of the Top 100 HPC systems are running SUSE Linux or SLE-based OS
SUSE Linux HPC Module
Simplify access to supported
HPC packages
All packages supported by SUSE
via SUSE Linux Enterprise HPC
Available for x86 and Arm-based
platforms
SLE HPC 12 and SLE HPC 15
MUNGE
ScaLAPACK
genders
Installing The HPC Module
6/3/2020
SUSE HPC Reference Architecture
Cloud-Ready HPC
Cloud is being optimized for HPC
workloads through performance,
scalability and cost efficiency,
enabling you to extend your HPC
environment to the cloud on-demand.
Dynamically burst to the cloud to
complement your on-premises
capabilities, or even fully migrate
entire HPC environments and
workflows.
HPC In The Cloud
HPC “all-in” the cloud
• Includes the head, compute and storage nodes,
with no hardware infrastructure to maintain
• Optimized cost and performance for scale-out
applications
HPC bursting to hybrid/public clouds
• Address changing capacity needs
• Extend HPC jobs to the Cloud for on-demand
scale and flexibility
[Diagram labels: Local Network, Cloud]
Catalyst UK program: HPE, Arm, SUSE, and three leading UK universities establish one of the largest Arm-based supercomputer deployments in the world
Goal: Propel the Arm HPC ecosystem and exascale computing in the UK
• More than 12,000 Arm-based cores running across three universities
• 64 Apollo 70 systems per site
• Two 32-core Cavium ThunderX2 processors per system
• Running SUSE Linux Enterprise for High Performance Computing
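The core count can be checked from the figures on this slide, assuming each of the three universities hosts one fully populated site as listed:

$$
3 \text{ sites} \times 64 \text{ systems} \times 2 \text{ processors} \times 32 \text{ cores} = 12{,}288 \text{ cores}
$$

This is consistent with the "more than 12,000" figure quoted above.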
The Spectrum Of AI Solutions
Artificial Intelligence: Examples are Google Maps and game play.
Machine Learning: Examples are cyber security, autonomous vehicles and F1 racing.
Neural Networks: Examples are facial and voice recognition.
Deep Learning: Examples are disease identification and energy demand optimization.
Convolutional Neural Networks: Examples are image/video recognition and medical image analysis.
Transfer Learning: For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks.
SUSE Participation In OpenACC
OpenACC is a directive-based programming model designed to provide
performance and portability for CPUs, GPUs, and other accelerators
SUSE joined OpenACC to simplify access to accelerator technology for
SUSE HPC customers
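As a rough illustration of what "directive-based" means, the sketch below annotates an ordinary C loop with an OpenACC pragma. The function name `saxpy` and the choice of data clauses are assumptions made for this example and do not come from the presentation. A compiler without OpenACC support typically ignores the pragma and runs the loop serially, which is the portability point made above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n).
 * The pragma asks an OpenACC-capable compiler to parallelize the loop and
 * to copy x to the accelerator and y both to and from it. Compilers built
 * without OpenACC support skip the pragma and run the loop on the CPU. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is unchanged from a serial version; only the directive is added, so the same source can be built with or without accelerator support.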
SUSE Package Hub
• High-quality, up-to-date packages
delivered by openSUSE Factory
• Easy to install via zypper or YaST
• Built and maintained by the
community of users
• Approved and curated by SUSE
• No charge
About 1000 packages
available for x86-64
More than 500 packages
available for Arm
[Diagram labels: Upstream packages, SUSE Package Hub, Enterprise User]
Package        Category
TensorFlow     ML Framework
Caffe2         Framework
Theano         Deep learning library
NumPy*         Math library
PyTorch*       ML library
ArmNN          ML Framework
clustershell   Administrative
robinhood      Administrative
singularity    Runtime
* planned
Key HPC Partnerships
SUSE Enterprise Storage
Ceph-based, software-defined storage
Backup/archival HPC storage
IO500 benchmark-ranked
Easy to manage with openATTIC
Certified with HPE DMF
“Thanks to the stability and ease of
management of the SUSE solution, we
have significantly reduced the time we
spend managing live and archived
data. This keeps our internal team free
to focus on driving new value for the
university and its life-changing
research projects.”
“SUSE Enterprise Storage has already brought
clear improvements to our deep learning projects,
one of which requires two million files in a single
directory. Putting these files into SUSE Enterprise
Storage has increased performance more than ten
times compared with the previous storage solution.”
Steve Cousins
Supercomputer Engineer
University of Maine System
SUSE Enterprise Storage Solution For HPC - Ceph
Tier 2 Storage Use Case
[Diagram labels: HPC Compute Cluster, Low Latency Storage (Lustre, XFS, NFS etc.), SUSE Enterprise Storage]
• Use Cases:
  • Primary Storage (Certain Use Cases)
  • Nearline or Archival Storage
  • Home Directories
• Certified with HPE Data Management Framework (DMF) and iRODS*
HPC Storage Use Case:
Large European Energy Company
Active Tier: Hot Data
Dormant Tier: Cold Data
HPC/AI Compute Cluster
High-Performance Storage
Scale-out NAS
Parallel File Systems
All-Flash File System
HPE Data Management Framework: Tiered data management
Tape
DMF zero watt storage
Object Storage & Cloud
Tier 0 Storage needs
- Clustered file system
- Lustre
- 10 PiB, 240Gb/sec
Tier 1 Storage needs
(SUSE / Ceph)
- Object Storage, resilient
- Widely used, affordable
- Automatic access
- 5 PiB
SLES HPC Lifecycle Roadmap*
[Timeline chart, 2017 to 2025, showing the support phases marked for each release]
SLES 12 HPC SP3: FCS Sept 2017; “Normal” SP overlap, ESPOS, LTSS
SLES 12 HPC SP4: FCS 4Q 2018; “Normal” SP overlap, ESPOS, LTSS
SLES 12 HPC SP5: ESPOS, LTSS
SLE HPC 15: FCS Q2 2018; “Normal” SP overlap, ESPOS, LTSS
SLE HPC 15 SP1: FCS Q2 2019; “Normal” SP overlap, ESPOS, LTSS
SLE HPC 15 SP2: ESPOS, LTSS
*NOTE: All future dates are estimates for illustration purposes and are not intended as committed dates.
Strategic Directions
Enable and exploit new HPC hardware
Shift HPC Module focus to utilities
Blend in AI/ML support
Simplify HPC in the Cloud experience
Improve Day 1 and Day 2 experience