GPU accelerated Cray XC systems: Where HPC meets Big Data
Peter Messmer (NVIDIA), Chris Lindahl (Cray), Sadaf Alam (CSCS)
Cray User Group Meeting 2016, London, May 8–12, 2016
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
GPU accelerated Cray XC systems: Where HPC meets Big Data Peter Messmer
"Yes," said Deep Thought, "I can do it."
[Seven and a half million years later...]
"The Answer to the Great Question... Of Life, the Universe and Everything... Is... Forty-two," said Deep Thought, with infinite majesty and calm.
— Douglas Adams, Hitchhiker’s Guide to the Galaxy
HIGH PERFORMANCE COMPUTING TODAY*
*mostly
[Figure: accuracy ("sit in it", "has engine", "moves", "flies") vs. latency (10 ms to a month) for an action game, a flight simulator, a CG movie and an HPC application; parameter-space exploration and approximate models are marked]
[Figure: accuracy vs. latency chart for an action game, a flight simulator, a CG movie and an HPC application, with parameter-space exploration and approximate models marked, plus an "Opportunity!" region for explorative science and real-time systems]
Bonsai With In-Situ Viz On Piz Daint
Presented at SC14, streaming from CSCS/Switzerland to New Orleans
Compute & vis on 1024 GPU nodes; live streaming
J. Bedorf, E. Gaburov, P. Messmer, S. Portegies Zwart
Visualization-enabled supercomputers
CSCS Piz Daint, NCSA Blue Waters, ORNL Titan
Galaxy formation: http://blogs.nvidia.com/blog/2014/11/19/gpu-in-situ-milky-way/
Molecular dynamics: http://devblogs.nvidia.com/parallelforall/hpc-visualization-nvidia-tesla-gpus/
Cosmology: http://www.sdav-scidac.org/29-highlights/visualization/66-accelerated-cosmology-data-anal.html
Compute+Vis supports multiple workflows
Legacy workflow: separate compute & vis systems; communication via file system
Co-processing: compute and visualization on the same GPU; communication via host-device transfers or memcpy
Partitioned system: different nodes for different roles; communication via high-speed network
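In a partitioned system, each rank must first learn its role before compute and vis nodes exchange data over the network. A minimal sketch of that role assignment, assuming a hypothetical one-in-eight vis ratio and hypothetical helper names (`role_of`, `split_by_color`); the grouping rule mirrors `MPI_Comm_split`, where processes with the same color land in the same group, ordered by key with ties broken by original rank.

```python
from collections import defaultdict

# Hypothetical policy: one rank in eight renders, the rest simulate.
VIS_STRIDE = 8


def role_of(rank):
    """Map a global rank to its (hypothetical) role in the partitioned system."""
    return "vis" if rank % VIS_STRIDE == 0 else "compute"


def split_by_color(ranks, color_of, key_of=lambda r: 0):
    """Group ranks the way MPI_Comm_split groups processes: equal color ->
    same group; within a group, order by key, then by original rank."""
    groups = defaultdict(list)
    for r in ranks:
        groups[color_of(r)].append(r)
    return {c: sorted(members, key=lambda r: (key_of(r), r))
            for c, members in groups.items()}


if __name__ == "__main__":
    groups = split_by_color(range(32), role_of)
    print(len(groups["compute"]), "compute ranks;", "vis ranks:", groups["vis"])
    # prints: 28 compute ranks; vis ranks: [0, 8, 16, 24]
```

In real MPI code each rank would instead call `MPI_Comm_split(MPI_COMM_WORLD, color, key, &newcomm)` with an integer color for its role, and the two sub-communicators would then exchange data over the interconnect.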
EGL Context Management
Top systems support OpenGL under X
EGL: driver-based context management
Support for full OpenGL*, not only OpenGL ES
Available in e.g. VTK, ParaView, EnSight, VMD, ...
New opportunities for CUDA/OpenGL** interop
*Full OpenGL in r355.11; **CUDA interop in r358.7
Leaving it to the driver
[Diagram: software stack of ParaView/VMD, X server, Tesla driver with EGL, Tesla GPU]
Vis Tools embrace OpenGL on EGL
Prior to EGL: X server required for GPU-accelerated OpenGL
Full OpenGL on EGL announced at SC15
With EGL: OpenGL without X
Major enabler for GPU rendering in HPC, incl. Cray systems*
Quick adoption by vis tool developers
https://devblogs.nvidia.com/parallelforall/egl-eye-opengl-visualization-without-x-server/
* Requires driver version 358.7 or newer
Streamlined GPU-accelerated off-screen rendering
Modern OpenGL for HPC viz
VTK now supports OpenGL 3.2
Enables advanced shaders (AO, VXGI, ...)
Some algorithms well suited for distributed-memory rendering
GPU hardware support
Mandatory to access advanced rendering features
Data courtesy Florida Intl University & TACC
Advanced Rendering in scientific visualization
[Figure panels: two lights, no shadows; two lights, hard shadows, 1 shadow ray per light; ambient occlusion + two lights, 144 AO rays/hit]
• Ray tracing offers ambient occlusion lighting, shadows, and high-quality transparent surfaces
Better insight with visual cues
Courtesy of John Stone, UIUC
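Ambient occlusion estimates, at each hit point, how much of the hemisphere above the surface is open; the "144 AO rays/hit" panel does this by casting that many rays per hit. A minimal Monte Carlo sketch of the idea, assuming spheres as the only occluders, uniform (not cosine-weighted) hemisphere sampling and a fixed maximum ray distance; `ambient_occlusion` and its parameters are hypothetical and this is not the renderer behind the figure.

```python
import math
import random


def ray_hits_sphere(origin, direction, center, radius, max_t):
    """True if origin + t*direction (unit direction) hits the sphere for 0 < t <= max_t."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c                     # from t^2 + 2bt + c = 0
    if disc < 0:
        return False
    sq = math.sqrt(disc)
    return any(1e-6 < t <= max_t for t in (-b - sq, -b + sq))


def random_hemisphere_dir(normal, rng):
    """Uniform random unit vector in the hemisphere around `normal`."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(x * x for x in v))
        if n > 1e-12:
            break
    v = [x / n for x in v]
    if sum(a * b for a, b in zip(v, normal)) < 0:
        v = [-x for x in v]              # flip into the upper hemisphere
    return v


def ambient_occlusion(point, normal, spheres, n_rays=144, max_dist=10.0, seed=0):
    """Fraction of hemisphere rays that escape all occluders (1.0 = fully open)."""
    rng = random.Random(seed)
    open_rays = 0
    for _ in range(n_rays):
        d = random_hemisphere_dir(normal, rng)
        if not any(ray_hits_sphere(point, d, ctr, r, max_dist) for ctr, r in spheres):
            open_rays += 1
    return open_rays / n_rays


if __name__ == "__main__":
    p, n = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    print(ambient_occlusion(p, n, []))                        # nothing nearby: 1.0
    print(ambient_occlusion(p, n, [((0.0, 0.0, 2.0), 1.5)]))  # sphere overhead: darker
```

The returned factor would scale the ambient term of the shading, which is what produces the contact darkening visible in the AO panel.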
OpenGL not limited to Rendering Tasks
CUDA->OpenGL interop typically one-way only
EGL enables lighter-weight access to OpenGL
No X server needed
Potential use of OpenGL for rasterization-like problems?
Determine covered "pixels"
3D ordering/occlusion via Z-buffer
Interop goes both ways, especially with EGL
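What the GPU would compute for such a rasterization-like problem can be written out on the CPU: for every covered pixel, the depth test keeps only the nearest primitive. A minimal sketch assuming axis-aligned rectangles at constant depth as primitives; `zbuffer_resolve` and `visible_pixel_counts` are hypothetical names, and an OpenGL version would rely on `glEnable(GL_DEPTH_TEST)` with the default `GL_LESS` comparison instead of the Python loop.

```python
import math


def zbuffer_resolve(width, height, rects):
    """rects: (id, x0, y0, x1, y1, depth) covering x0 <= x < x1, y0 <= y < y1.
    Returns (ids, depth) grids holding the nearest id per pixel (None = uncovered)."""
    depth = [[math.inf] * width for _ in range(height)]
    ids = [[None] * width for _ in range(height)]
    for rid, x0, y0, x1, y1, z in rects:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                if z < depth[y][x]:      # depth test: smaller z is closer (like GL_LESS)
                    depth[y][x] = z
                    ids[y][x] = rid
    return ids, depth


def visible_pixel_counts(ids):
    """Number of pixels where each primitive survived the depth test."""
    counts = {}
    for row in ids:
        for rid in row:
            if rid is not None:
                counts[rid] = counts.get(rid, 0) + 1
    return counts


if __name__ == "__main__":
    ids, _ = zbuffer_resolve(8, 8, [("far", 0, 0, 6, 6, 5.0), ("near", 3, 3, 8, 8, 1.0)])
    print(visible_pixel_counts(ids))  # {'far': 27, 'near': 25}
```

Because the depth test resolves overlaps independently of submission order, the result does not depend on the order of `rects`, which is what makes the operation attractive to offload to the rasterizer.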
OpenGL Rendering Powerhouse: OpenGL vs. OpenSWR
[Chart: rendering performance, OpenGL vs. OpenSWR; bigger is better]
Accelerated remote rendering with Video Encoding
Lossy and lossless (Maxwell+) H.264 encoder
Separate unit, does not consume "GPU resources"
Leveraged by commercial and free tools
Available on e.g. Titan
Possible use for non-video data
https://developer.nvidia.com/nvidia-video-codec-sdk
Interactivity over large distances
NICE DCV running on Titan in user space
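Why a hardware encoder matters for interactivity over large distances becomes clear from the raw data rate of an uncompressed frame stream. A back-of-the-envelope sketch; the 1920x1080, 24-bit, 30 fps stream and the 10 Mbit/s encoded target are assumed example values, not measurements from Titan or NICE DCV.

```python
def raw_stream_bits_per_second(width, height, bytes_per_pixel, fps):
    """Bit rate of an uncompressed video stream."""
    return width * height * bytes_per_pixel * 8 * fps


if __name__ == "__main__":
    raw = raw_stream_bits_per_second(1920, 1080, 3, 30)
    encoded = 10e6  # assumed H.264 target bit rate: 10 Mbit/s
    print(f"raw: {raw / 1e9:.2f} Gbit/s")               # raw: 1.49 Gbit/s
    print(f"compression needed: {raw / encoded:.0f}x")  # compression needed: 149x
```

Roughly 1.5 Gbit/s of raw pixels is far beyond a typical wide-area link, so encoding on the GPU's separate video unit is what keeps remote sessions interactive without taking compute resources from the simulation.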
INTRODUCING TESLA P100 New GPU Architecture to Enable the World’s Fastest Compute Node
Pascal Architecture: Highest Compute Performance
NVLink: GPU Interconnect for Maximum Scalability
CoWoS HBM2: Unifying Compute & Memory in Single Package
Page Migration Engine: Simple Parallel Programming with Virtually Unlimited Memory
[Diagrams: GPUs and CPUs connected through PCIe switches; Unified Memory spanning CPU and Tesla P100]
Giant leaps in everything
[Figure: K40, M40 and P100 compared on Teraflops (FP32/FP16), bidirectional BW (GB/s), bandwidth (GB/s) and addressable memory (GB, log scale); panels labeled Pascal Architecture, NVLink, CoWoS HBM2 Stacked Mem and Page Migration Engine]
21 Teraflops of FP16 for Deep Learning
5x GPU-GPU Bandwidth
3x Higher for Massive Data Workloads
Virtually Unlimited Memory Space
nvGRAPH Accelerated Graph Analytics
nvGRAPH for high performance graph analytics
Deliver results up to 3x faster than CPU-only
Solve graphs with up to 2.5 Billion edges on 1x M40
Accelerates a wide range of graph analytics apps (developer.nvidia.com/nvgraph):
PageRank: search, recommendation engines, social ad placement
Single Source Shortest Path: robotic path planning, power network planning, logistics & supply chain planning
Single Source Widest Path: IP routing, chip design / EDA, traffic-sensitive routing
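Single-source widest path asks, for every vertex, for the path whose narrowest edge is as wide as possible, e.g. the bottleneck bandwidth between two routers in IP routing. A minimal sketch of one standard solution, a Dijkstra variant that maximizes the bottleneck with a max-heap; it is not nvGRAPH's API or its GPU algorithm, and the adjacency-dict graph format and `widest_paths` name are assumptions.

```python
import heapq
import math


def widest_paths(graph, source):
    """graph: {u: [(v, capacity), ...]} (directed). Returns {v: best bottleneck width};
    the source gets +inf, unreachable vertices get 0."""
    width = {u: 0.0 for u in graph}
    for u in graph:
        for v, _ in graph[u]:
            width.setdefault(v, 0.0)
    width[source] = math.inf
    heap = [(-math.inf, source)]         # max-heap via negated widths
    done = set()
    while heap:
        neg_w, u = heapq.heappop(heap)
        if u in done:
            continue                     # stale heap entry
        done.add(u)
        for v, cap in graph.get(u, []):
            cand = min(-neg_w, cap)      # bottleneck of the path extended through u
            if cand > width[v]:
                width[v] = cand
                heapq.heappush(heap, (-cand, v))
    return width


if __name__ == "__main__":
    g = {"A": [("B", 5), ("C", 10)], "B": [("D", 8)],
         "C": [("D", 3), ("B", 7)], "D": []}
    print(widest_paths(g, "A"))  # B: 7 (via C), C: 10, D: 7 (via C, B)
```

The greedy choice is valid because a path's bottleneck can only shrink as edges are appended, the same monotonicity Dijkstra relies on for shortest paths.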
[Chart: PageRank iterations/s on the Twitter 1.5B-edge dataset, 48-core Xeon E5 vs. nvGRAPH on M40; nvGRAPH: 3x speedup]
CPU system: 4U server with 4x 12-core Xeon E5-2697 CPUs, 30 MB cache, 2.70 GHz, 512 GB RAM
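The benchmark above is reported in PageRank iterations per second. What a single iteration computes can be sketched with the textbook power method on an edge list; the damping factor 0.85, the tolerance and the uniform redistribution of rank from dangling vertices are conventional choices assumed here, and this is not nvGRAPH's API.

```python
def pagerank(n, edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on vertices 0..n-1 with directed edges (src, dst).
    Rank held by dangling vertices (no out-edges) is spread uniformly."""
    out_deg = [0] * n
    for s, _ in edges:
        out_deg[s] += 1
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in range(n) if out_deg[v] == 0)
        # Teleport term plus redistributed dangling mass, identical for every vertex.
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for s, d in edges:
            new[d] += damping * rank[s] / out_deg[s]  # one sparse mat-vec step
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical 4-vertex graph: vertices 1, 2, 3 link to 0; 0 links to 1.
    print([round(x, 3) for x in pagerank(4, [(1, 0), (2, 0), (3, 0), (0, 1)])])
```

Each iteration is essentially one sparse matrix-vector product over all edges, which is why the workload maps well to GPU memory bandwidth at the 1.5B-edge scale.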
GPUs in XC = GPUs elsewhere (almost)
GPU Accelerated Cray XC Systems: Where HPC Meets Big Data CUG 2016 BOF Sadaf Alam, CSCS May 10, 2016
Platform Consolidation: Enabling Workload & Workflow Diversity
§ Simulations using a range of programming paradigms
  § MPI, OpenMP, …
  § CUDA, OpenCL, OpenACC, …
  § Optimized libraries
  § Domain-specific libraries
  § …
§ In-situ and large-scale visualization tools
  § VisIt
  § ParaView
  § Application-specific
§ New domains (e.g. DL)
Continuous Co-design & Integration
§ GPUs in Cray XK6/7 and XC environments have come a long way
  § GPUDirect and CUDA-aware MPI
  § Quick CUDA releases and updates
  § Availability of the complete toolchain & "user-controlled" compute modes and features
§ Potential with containers
  § Variety of workloads and workflows with other dependencies (e.g. OS, Python, etc.)
  § Data science applications & workloads
  § DL solutions and frameworks
§ Emerging technologies from NVIDIA readily available on Cray XC
Further Collaboration with Sites
§ Strengthen partnership with Cray and NVIDIA … and other CUG sites
§ Enable richer experiences for users
§ Take advantage of
  § Excellent interconnect with GPU-aware MPI
  § Flexible programming and execution model
§ Leverage & develop open-source solutions
  § NVIDIA DL SDK (accelerated frameworks)