Post on 25-May-2020

Accelerated Technical- and High Performance Computing

Klaus Gottschalk - gottschalk@de.ibm.com
HPC Architect, IBM Cognitive Systems

IBM Cognitive Systems

High Level Strategy Drivers and Directions

• Data Volumes are Exploding – Especially Unstructured Data
  • Data Needs to be Collected, Managed, and ‘Digested’

• Deriving Insight and Information from the Data requires:
  • Data access and availability
  • A variety of processing steps in a ‘Workflow’, and processing optimizations

• Need for Compute continues to Grow
  • Per IDC, technical computing growth @ 11.9% (vs. 4.9%) in 2015, supporting both High Performance Computing and High Performance Data Analytics
  • Moore's Law and Frequency stabilization require more threads, cores, & nodes
  • Accelerated Computing emerges: GPUs, FPGAs, CAPI-attached Flash & I/O

• Energy Efficiency continues to rise in value and requires:
  • Processing Elements that are Optimized to the task
  • Energy- and Data-aware Workflow Management

• The OpenPOWER Foundation provides innovation opportunities to a variety of Partners

[Figure: Price/Performance over time, 2000 to 2020. Labels: Technology and Processors; Firmware / OS, Accelerators, Software, Storage, Network; Workflow, Dependency Graph. Caption: Full system stack innovation required]

OpenPOWER, a catalyst for Open Innovation

The OpenPOWER Foundation creates an open ecosystem, using the POWER Architecture to share expertise, investment, and server-class intellectual property to serve the evolving needs of customers.

Performance of leading POWER architecture
Broadens the capability and performance of the POWER platform

Open Development
OpenPOWER enables greater innovation through both open software and open hardware

Collaboration across multiple thought leaders
Collaborative development model drives collective thought leadership, simultaneously across multiple disciplines

This is What A Revolution Looks Like (© 2017 OpenPOWER Foundation)

[Figure: OpenPOWER member ecosystem by layer: System / Integration; I/O / Storage / Acceleration; Boards / Systems; Chip / SOC; Software; Implementation / HPC / Research]

• 300+ Members
• 31 Countries
• 40+ ISVs

Portfolio of HPC Solutions

HPC Software
• Deployment tools, integrated management
• Compilers: gcc, IBM XLC, LLVM OpenMP4, PGI Fortran/C/C++, Java, OpenACC, OpenMP
• Debuggers, Profilers, Math libraries, MPI & HPC apps

Processors & Systems
• High Performance Processors & Systems
• Accelerator, networking, storage integration via NVLink & CAPI
• Highest memory throughput

High Performance File System & Storage
• Highest Performance HPC Storage: Elastic Storage Server
• High Performance Spectrum Scale (GPFS) Parallel File System
• Data centric design

High Speed Interconnect
• High speed interconnect / network fabric from Mellanox Technologies
• MPI acceleration in the IB fabric, reducing CPU overhead
• Support for GPUDirect, NVMe over fabric

OpenPOWER: Open Architecture for HPC & Analytics

Processor IP Licensing
§ Licensing processor core to enable semiconductor partners like Suzhou Powercore to build POWER chips

Open Interfaces
§ Tight integration using CAPI & NVLink with Accelerators (NVIDIA, Xilinx), Networking (Mellanox), Storage (CAPI Flash)

Systems & Software
§ Enabling System Partners to build POWER-based servers and Open Sourcing Software including Firmware & Hypervisor

Collaborative Innovation between IBM and NVIDIA: POWER8 with NVLink

Built for Developer Goals
• Think less about architecture in code
• Break apart my problem less
• Spend less time optimizing
• Write simpler code

Casting NVLink into Silicon
• IBM: transistors and I/O to NVLink on CPU
• NVIDIA: deep interface into GPU (NVLink)
• 2+ years in the making
• 2.5X the bandwidth from CPU:GPU, built into the chip

Don’t overthink your hardware
Don’t waste time writing for data movement
Easily unleash the parallelism of your GPU

[Figure labels: with NVLink™; Embedded NVLink™]

NVLink And Unified Memory/Page Migration Engine

Tesla P100 architecture simplifies programming, sharing memory between CPU & GPU
• Unified memory: allows programs to access the full memory address space of all CPUs and GPUs
• Page Migration Engine: on a GPU memory fault, pages migrate seamlessly between CPU and GPU memory

NVLink
• POWER8 with NVLink ensures fast data access for pages and data movement
• Fat and Flat: Memory migration on POWER systems moves at the same bandwidth CPU-to-GPU or GPU-to-GPU

Programming consequences
• Far simpler programming and memory model
  § Eliminates the programming details of allocating and copying device memory
• Larger data sizes permissible
  § Applications can now use data sets that are larger than the memory size of the GPU
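The "larger data sizes" point can be illustrated with a small toy model in Python. This is only a sketch under loud assumptions: it treats GPU memory as a fixed number of page slots and evicts the least recently used page when full. The LRU policy, the function name `simulate_oversubscription`, and the page counts are all hypothetical; the slide does not describe how the hardware or driver actually selects pages.

```python
from collections import OrderedDict


def simulate_oversubscription(num_pages, gpu_capacity_pages, accesses):
    """Toy model: count pages migrated to the GPU and pages evicted.

    num_pages          -- size of the data set, in pages
    gpu_capacity_pages -- how many pages fit in GPU memory
    accesses           -- sequence of page ids touched by the GPU

    Assumption (not from the slide): least-recently-used eviction.
    """
    resident = OrderedDict()  # page id -> None, oldest entry first
    migrations_in = 0
    evictions = 0
    for page in accesses:
        if not 0 <= page < num_pages:
            raise ValueError(f"page {page} is outside the data set")
        if page in resident:
            # Already in GPU memory: no fault, just refresh its recency.
            resident.move_to_end(page)
            continue
        # Fault: the page has to be migrated into GPU memory.
        migrations_in += 1
        if len(resident) >= gpu_capacity_pages:
            resident.popitem(last=False)  # evict the oldest page
            evictions += 1
        resident[page] = None
    return migrations_in, evictions


# One sequential sweep over a data set twice the size of GPU memory.
faults, evicted = simulate_oversubscription(8, 4, list(range(8)))
print(f"migrations to GPU: {faults}, evictions: {evicted}")
```

In this model the sweep completes even though only half the data set fits on the device at once; the cost shows up as migrations, which is where link bandwidth between CPU and GPU matters.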

Why it Matters: Use Cases where NVLink will have the most Impact

Mask Bus Transfers from Host-Device

Constant Data Transfers between adjacent GPUs

Burst Data at Startup and Teardown

Stream Data at Same Rate as Computation

Genomics, Cryptography, Video Processing, etc.

CFD/CAE, Machine Learning, Deep Learning, etc.

Molecular Dynamics, Amber, etc.

Accelerated Databases, Analytics, etc.

HPC Pre-Sales Centers and Technical Support
• PADC centers with IBM, NVIDIA and Mellanox focused on accelerated applications and technical collaborations
• IBM Systems Client Centers
  § HPC Briefings
  § HPC Workshops
  § HPC Benchmarks

Centers:
§ UK Science and Technology Facilities Council (STFC) PADC
§ IBM PADC Montpellier joint center with NVIDIA and Mellanox
§ IBM PADC Boeblingen joint center with NVIDIA
§ IBM Poughkeepsie POWER HPC Benchmark Center
§ IBM Austin POWER HPC Executive Briefing Center
§ NEW! NVIDIA/IBM Acceleration Lab

For latest HPC information refer to the IBM Systems Client Centers HPC page

§ [email protected]

POWER9 Chip

New Core Microarchitecture
§ Stronger thread performance
§ Efficient agile pipeline
§ POWER ISA v3.0

Enhanced Cache Hierarchy
§ 120MB NUCA L3 architecture
§ 12 x 20-way associative regions
§ Advanced replacement policies
§ Fed by 7 TB/s on-chip bandwidth

Cloud + Virtualization Innovation
§ Quality of service assists
§ New interrupt architecture
§ Workload optimized frequency
§ Hardware enforced trusted execution

14nm finFET
§ Improved device performance and reduced energy
§ 17 layer metal stack and eDRAM
§ 8.0 billion transistors

Leadership Acceleration Platform
§ Enhanced on-chip acceleration
§ Nvidia NVLink 2.0: High bandwidth, advanced features
§ CAPI 2.0: Coherent accelerator and storage attach (PCIe G4)
§ OpenCAPI: Improved latency and bandwidth, open interface

State of the Art I/O Subsystem
§ PCIe Gen4 – 48 lanes

High Bandwidth Signaling
§ 16 Gb/s interface: Local SMP
§ 25 Gb/s interface: 25G Link for Accelerator and remote SMP

Witherspoon (4-6 GPU) Server

Anticipated 10X performance improvement over 2015 solution
• combined GPU and CPU advances

Witherspoon is the platform that will deliver the commitments made in the CORAL contract
• 2 POWER9, 4 GPU for LLNL, water cooled
• 2 POWER9, 6 GPU for ORNL, water cooled

[Figure: 4 GPU configuration]

CORAL

§ 2 versions of code for each application:
  • Baseline: (lab codes) minimal changes + offloading directives (e.g. OpenACC)
  • Optimized: can create codes from scratch, using any language we choose

§ Tools to implement/modify the lab codes:
  • Languages: MPI, OpenMP, OpenACC, CUDA, Fortran, etc.
  • Architectures: Power Processors, GPUs, InfiniBand, NVLink, etc.

§ OpenACC directives to off-load work to the GPU

CORAL SYSTEM ARCHITECTURE

System: 200 PFlops compute + 5 PB Active Flash + 120 PB Disk
- Scalable system software and data architecture
- LLVM Open Source compiler
- Water cooling
- Integrated Local Active Storage

Racks:
- 256 Compute Racks
- 40 Disk Racks
- 16 Optional Flash Racks: TMS drawers or Flash cards. CAPI attached. Globally accessible with local processing

Compute Rack: 18 Servers/rack, 779 TFlop/rack, 10.8 TB/rack, 55 kWatts max

ESS Rack:

Scalable Active Network: Mellanox IB 4X EDR Switch

Converged 2U server drawer for HPC and Cloud

POWER9 2 Socket Server:
- 2 P9 + 4/6 Volta GPU (@7 TF/s)
- 512 GiB SMP Memory (32 GB DDR4 RDIMMs)
- 64/96 GiB GPU Memory (HBM stacks)

POWER9: 22 Cores, 4 Threads/core, 0.65 DP TF/s, 3.7 GHz

Volta (SXM2): 7.0 DP TF/[email protected]/s
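As a rough cross-check of the peak figures on this slide, the following Python sketch multiplies out the per-device numbers. It assumes (this is not stated on the slide) that the 779 TFlop/rack figure refers to the 6-GPU server configuration and that only the double-precision peaks of CPUs and GPUs are counted. The constant and function names are illustrative.

```python
# Per-device double-precision peaks as quoted on the slide (TFlop/s).
P9_TFLOPS = 0.65
VOLTA_TFLOPS = 7.0

SERVERS_PER_RACK = 18
COMPUTE_RACKS = 256


def server_peak_tflops(cpus: int, gpus: int) -> float:
    """Sum of CPU and GPU peak rates for one server, in TFlop/s."""
    return cpus * P9_TFLOPS + gpus * VOLTA_TFLOPS


# Assumption: the rack figure is for 2 POWER9 + 6 Volta per server.
server_tflops = server_peak_tflops(cpus=2, gpus=6)
rack_tflops = SERVERS_PER_RACK * server_tflops
system_pflops = COMPUTE_RACKS * rack_tflops / 1000.0

print(f"server: {server_tflops:.1f} TFlop/s")
print(f"rack:   {rack_tflops:.1f} TFlop/s")
print(f"system: {system_pflops:.1f} PFlop/s")
```

Under these assumptions the result comes to roughly 779 TFlop/s per rack and just under 200 PFlop/s across 256 compute racks, which is consistent with the slide's rounded numbers.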

DOE Project CORAL Status

Oak Ridge (IBM) on-time*

Livermore (IBM) on-time*

Argonne (Intel/Cray) delayed+

*) https://www.hpcwire.com/2017/10/03/olcfs-200-petaflops-summit-machine-still-slated-2018-start/

+) https://www.nextplatform.com/2017/05/23/surprises-2018-doe-budget-supercomputing/

OpenCAPI Consortium Founded

SILICON VALLEY, CA - 14 Oct 2016: OpenCAPI Consortium formed by AMD, Dell EMC, Google, HP, IBM, Mellanox, Micron, NVIDIA and Xilinx

Servers and related products based on the new standard are expected in the second half of 2017

OpenCAPI Approach

• What is OpenCAPI?
  • OpenCAPI is an Open Interface Architecture that allows any microprocessor to attach to
    • Coherent user-level accelerators and I/O devices
    • Advanced memories accessible via read/write or user-level DMA semantics
  • Agnostic to processor architecture

• Key Attributes of OpenCAPI
  • High-bandwidth, low latency interface optimized to enable streamlined implementation of attached devices
  • Attached devices operate natively within an application’s user space and coherently with processors
  • Supports a wide range of use cases and access semantics
  • 100% Open Consortium
    • All company participants welcome
    • All ISA participants welcome

Looking Ahead: POWER9 Accelerator Interfaces

Extreme Accelerator Bandwidth and Reduced Latency
• PCIe Gen 4 x 48 lanes – 192 GB/s peak bandwidth (duplex)
• IBM BlueLink 25Gb/s x 48 lanes – 300 GB/s peak bandwidth (duplex)

Coherent Memory and Virtual Addressing Capability for all Accelerators
• CAPI 2.0 - 4x bandwidth of POWER8 using PCIe Gen 4
• NVLink 2.0 – Next generation of GPU/CPU bandwidth and integration using BlueLink
• OpenCAPI – High bandwidth, low latency and open interface using BlueLink
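The two peak-bandwidth figures above follow from simple lane arithmetic, sketched here in Python. The BlueLink rate of 25 Gb/s per lane is taken from the slide; the PCIe Gen 4 rate of 16 GT/s per lane is the standard signaling rate and is an assumption here, since the slide does not state it. Encoding overhead is ignored, and the helper name `peak_duplex_gb_per_s` is illustrative.

```python
def peak_duplex_gb_per_s(lane_gbit_per_s: float, lanes: int) -> float:
    """Raw peak bandwidth in GB/s, summed over both directions."""
    per_direction = lane_gbit_per_s * lanes / 8  # bits -> bytes
    return 2 * per_direction


# Assumption: PCIe Gen 4 signals at 16 Gb/s per lane per direction.
pcie_gen4 = peak_duplex_gb_per_s(16, 48)
# From the slide: BlueLink runs at 25 Gb/s per lane.
bluelink = peak_duplex_gb_per_s(25, 48)

print(f"PCIe Gen 4 x48: {pcie_gen4:.0f} GB/s duplex")
print(f"BlueLink x48:   {bluelink:.0f} GB/s duplex")
```

Both results match the slide (192 GB/s and 300 GB/s), which suggests the quoted numbers are raw signaling rates counted in both directions rather than achievable payload throughput.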

CAPI Accelerator Cards

Nallatech team explaining CAPI Flash card: https://www.youtube.com/watch?v=1n_ceKkCRuk

Thank you!

IBM Cognitive Systems

ibm.com/systems/hpc