Use of GPUs in ALICE (and elsewhere)

Thorsten Kollegger

TDOC-PG | CERN | 17.07.2013

Use of GPUs in ALICE and elsewhere | Thorsten Kollegger | ECFA HL-LHC TDOC-PG | 17.07.2013 2

GPUs for General Purpose Computing

In the last 5+ years, increased usage of GPUs (or, more generally, accelerator cards) in High Performance Computing systems

[Figure: Top 500 list, 2013; accelerator vendors shown: NVIDIA, AMD, Intel]

GPUs for General Purpose Computing

Driven by (theoretical) peak performance
• GPU: O(1) TFLOP/s (NVIDIA Tesla K20: 3.2 TFLOP/s)
• CPU: O(0.1) TFLOP/s (Intel Xeon E5-2690: 243 GFLOP/s)

Can this theoretical peak performance be used efficiently for the typical HEP workload?
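As a rough illustration of the gap quoted above, the ratio of the two peak figures can be computed directly. This is a minimal sketch; the variable names are illustrative, and the result is a ratio of theoretical peaks, not a measured speedup.

```python
# Theoretical peak figures as quoted on the slide.
gpu_peak_gflops = 3200.0  # NVIDIA Tesla K20: 3.2 TFLOP/s
cpu_peak_gflops = 243.0   # Intel Xeon E5-2690: 243 GFLOP/s

# Ratio of theoretical peaks; says nothing about achieved performance.
peak_ratio = gpu_peak_gflops / cpu_peak_gflops
print(f"GPU/CPU theoretical peak ratio: {peak_ratio:.1f}")
```

With these inputs the ratio comes out at roughly 13, in line with the order-of-magnitude figures O(1) and O(0.1) TFLOP/s.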

GPGPU Processing Model

Pre-conditions for effective GPU speed-up of applications
• Computationally intensive: time needed for computing is much larger than the time needed for data transfer to the GPU
• Massively parallel: hundreds of independent computing tasks
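The first pre-condition can be made concrete with a simple timing model. The sketch below assumes that transfer and compute do not overlap and that only the compute part is accelerated; the function name and its parameters are hypothetical and not taken from any ALICE code.

```python
def effective_speedup(t_cpu, kernel_speedup, t_transfer):
    """Overall speedup when a CPU task of duration t_cpu is offloaded.

    Assumes the GPU runs the computation kernel_speedup times faster
    and that data transfer (t_transfer) is not overlapped with compute.
    """
    t_gpu = t_cpu / kernel_speedup + t_transfer
    return t_cpu / t_gpu


# Compute-dominated case: transfer is 1% of the CPU time.
print(effective_speedup(t_cpu=1.0, kernel_speedup=10.0, t_transfer=0.01))
# Transfer-dominated case: transfer equals the CPU time, so offloading loses.
print(effective_speedup(t_cpu=1.0, kernel_speedup=10.0, t_transfer=1.0))
```

In this model the overall speedup approaches the kernel speedup only when the transfer time is negligible, and drops below 1 once the transfer alone takes as long as the original CPU computation.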

Few complex CPU cores vs. many simple GPU cores

Programming languages: CUDA, OpenCL, OpenACC, OpenMP, OpenHMPP, TBB, MPI

What to expect?

Typical success stories of GPGPU usage report speedups of more than 100x

However:
• The expected speedup depends strongly on the workload.
• Comparing optimized multi-core CPU versions with optimized GPU versions, speedups of ~5 are measured for most workloads.

GPGPUs in HEP

Lots of R&D activities ongoing in the experiments
• Mostly focused on Trigger or High-Level-Trigger systems, where HW decisions are easier than in heterogeneous GRID systems
• R&D projects I know of (certainly incomplete):
  • ALICE, ATLAS, CMS, LHCb @ LHC, CERN
  • NA62 @ SPS, CERN
  • CBM, PANDA @ FAIR, GSI, Germany
  • STAR @ RHIC, BNL, USA
  • GEANT 4
  • …

ALICE HLT has been using GPUs in production since 2010/2011

ALICE HLT

Input data rate: ~1 kHz, 20 GByte/s
Event size ranging from <1 MByte (p+p) to 80 MByte (central Pb+Pb)

Full online reconstruction including tracking of TPC+ITS
• (intermediate) results replace raw data to limit storage space

Compute nodes (CN/CNGPU)
• Full event reconstruction
• 32+32 nodes with NVIDIA GTX 480/580 (GTX 580 newly installed in 2011)
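The input rate and bandwidth quoted for the HLT imply an average event size, which can be checked with one line of arithmetic. This sketch assumes decimal units (1 GByte = 1000 MByte) and treats the quoted figures as simultaneous sustained values, which the slide does not state.

```python
event_rate_hz = 1_000          # ~1 kHz input event rate
input_bandwidth_mb_s = 20_000  # 20 GByte/s, assuming 1 GByte = 1000 MByte

# Average size per event if both figures hold at the same time.
avg_event_size_mb = input_bandwidth_mb_s / event_rate_hz
print(f"Implied average event size: {avg_event_size_mb:.0f} MByte")
```

The implied value lies within the quoted range of <1 MByte to 80 MByte per event.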

ALICE HLT TPC Tracker

TPC tracking algorithm based on Cellular Automaton approach

Optimized for multi-core CPUs to fulfill latency requirements

• 2009: ported to CUDA for use on NVIDIA GTX 285 consumer cards, changed to use single precision
• 2010: ported to GTX 480
• 2011: added GTX 580, fully commissioned

ALICE HLT TPC Tracker Speedup

4-fold speedup compared to optimized CPU version
Note: frees CPUs on CN for other operations (tagging/trigger)

ALICE HLT GPU Experience

Experience quite promising, will continue/expand in Run 2
• Allowed the system size to be reduced by a factor of 3
• Stable operation even with consumer hardware

Comes with some cost
• Initial porting to CUDA, change to single precision (SP): 1.5 PhD students/1 year
• Every new GPU generation requires re-tuning (even same chip)
• Need to support two versions (CPU for simulation, GPU)
• Full loading of GPU requires quite some effort: currently at 67%

GPUs in the NA62 TDAQ system

[Figure: NA62 TDAQ trigger chain diagrams. Labels, in order of appearance: ROboard, L0TP, L1 PC, GPU, L1TP, L2 PC, GPU, GPU, 1 MHz, 100 kHz; ROboard, L0GPU, L0TP, 10 MHz, 10 MHz, 1 MHz, max 1 ms latency]

The use of the GPU at the software levels (L1/2) is “straightforward”: put the video card in the PC.
• No particular changes to the hardware are needed
• The main advantage is to exploit the power of GPUs to reduce the number of PCs in the L1 farms

The use of GPUs at L0 is more challenging:
• Fixed and small latency (dimension of the L0 buffers)
• Deterministic behavior (synchronous trigger)
• Very fast algorithms (high rate)

Slide from Gianluca Lamanna (CERN)
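The link between latency and the dimension of the L0 buffers can be estimated from the figures on the slide: at a fixed input rate, every event has to be held until the trigger decision arrives. A minimal sketch, assuming the 10 MHz rate and the 1 ms maximum latency apply to the same buffer; the variable names are illustrative.

```python
input_rate_hz = 10_000_000  # 10 MHz event rate into L0
max_latency_us = 1_000      # maximum L0 latency of 1 ms, in microseconds

# Events that arrive during one latency window must all be buffered.
events_in_flight = input_rate_hz * max_latency_us // 1_000_000
print(f"Events to buffer during the latency window: {events_in_flight}")
```

Under these assumptions the buffers must hold on the order of 10,000 events, which is why a fixed and small latency matters at this level.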

Some recent trends

Direct transfer of data from e.g. network to GPU w/o involving CPU (AMD: DirectGMA, NVIDIA: GPU Direct 2)

APUs: integrate GPU with CPUs on a chip
• NVIDIA Tegra: ARM+GPU
• AMD Fusion: x86+GPU

[Figure: peer-to-peer transfers (DirectGMA). Labels: CPU memory, CPU, PCIe bus, SDI input/output card (FPGA, SDI in, SDI out), graphics card (GPU)]

Where we are…

GPGPUs can provide a significant benefit today
• mainly for tightly-controlled systems, e.g. Trigger & HLT
  - reduced infrastructure cost <-> development cost
• main issue is programming complexity & maintenance
  - will there be a common programming language/library? avoid vendor lock-in…
  - do we need the ultimate performance?

Highly-parallel programming model will also be relevant for effective use of future many-core CPUs

GPUs evolving more and more into independent compute units