NVIDIA Application Lab at Jülich | Dirk Pleiter, Jülich Supercomputing Centre (JSC) | Member of the Helmholtz Association

TRANSCRIPT

  • Member of the Helmholtz Association

    NVIDIA Application Lab at Jülich

    Dirk Pleiter | Jülich Supercomputing Centre (JSC)

  • 14.11.2012 Dirk Pleiter | NVIDIA Application Lab at Jülich 2

    Forschungszentrum Jülich at a Glance (status 2010)

    Budget: 450 million euros

    Staff: 4,800 (thereof 1,630 scientists)

    Visiting scientists: 900 per year

    Trainees: 90

    Publications: 1,800

    Protective rights and licences: 14,800

    Research fields: health, energy and environment, and information technology; key technologies for tomorrow

  • Jülich Supercomputing Centre

    Supercomputer operation for:
      Centre – FZJ
      Regional – JARA
      Helmholtz & National – NIC, GCS
      Europe – PRACE, EU projects

    Application support:
      User support; coordination with SimLabs
      Scientific visualization
      Peer review support and coordination

    R&D work:
      Algorithms, performance analysis and tools
      Community data management service
      Computer architectures, Exascale Laboratories: EIC, ECL, NVIDIA

    Education and training

  • Supercomputer Systems: Dual Track Approach

    Tracks: General-Purpose and Highly-Scalable

    Timeline markers: 2004, 2006-8, 2009, 2012, 2014

    File server: GPFS, Lustre

    Systems shown:
      IBM Power 4+: JUMP, 9 TFlop/s
      IBM Power 6: JUMP, 9 TFlop/s
      IBM Blue Gene/L: JUBL, 45 TFlop/s
      IBM Blue Gene/P: JUGENE, 1 PFlop/s
      IBM Blue Gene/Q: JUQUEEN, 5.7 PFlop/s (target)
      HPC-FF: 100 TFlop/s
      JUROPA: 200 TFlop/s
      JUDGE: 240 TFlop/s
      JUROPA++: Cluster, 1-2 PFlop/s, + Booster

  • JUDGE Cluster

    System:
      206 IBM iDataPlex nodes
      2 Tesla M2050 or M2070 per node
      InfiniBand QDR network
      Peak performance: 239 TFlop/s

    Users:
      Institute for Advanced Simulations: molecular dynamics and mechanics, micro-magnetism simulations, medical image reconstruction
      JuBrain partition
      Milky Way partition
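
The quoted peak performance can be cross-checked with a small calculation. The sketch below is only illustrative: the function name is hypothetical, and the per-GPU figure used in the usage note (515 GFlop/s double-precision peak for a Tesla M2050/M2070) is an assumption taken from NVIDIA's published specifications, not from the slides. The CPU contribution is left as a parameter because the slides do not state it.

```c
/* Aggregate peak in TFlop/s for a cluster of identical nodes.
 * gflops_per_gpu and gflops_cpu_per_node are inputs supplied by the
 * caller; neither value appears on the slides (assumption). */
double cluster_peak_tflops(int nodes, int gpus_per_node,
                           double gflops_per_gpu,
                           double gflops_cpu_per_node)
{
    double per_node = gpus_per_node * gflops_per_gpu + gflops_cpu_per_node;
    return nodes * per_node / 1000.0;   /* GFlop/s -> TFlop/s */
}
```

Under that assumption, `cluster_peak_tflops(206, 2, 515.0, 0.0)` gives roughly 212 TFlop/s for the GPUs alone. The difference from the quoted 239 TFlop/s would then presumably come from the host CPUs.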

  • NVIDIA Application Lab at Jülich

    Collaboration between JSC and NVIDIA since July 2012:
      Enable scientific applications for GPU-based architectures
      Provide support for their optimization
      Investigate performance and scaling

    Work focus:
      Application requirements analysis
      Kepler and CUDA feature analysis
      Parallelization on many GPUs
      Collaboration with performance tools developers
      Training

  • Pilot Application: JuBrain

    Application developed at the Institute of Neuroscience and Medicine (INM-1) at Forschungszentrum Jülich: Katrin Amunts, Markus Axer, Marcel Huysegoms

    Research goal: accurate, highly detailed computer model of the human brain

  • Brain Section Images

    Blockface pictures: created while cutting the brain into sections

    Histological images: polarized light images

    Low resolution vs. high resolution:
      Pixel size: 100 μm → 3 μm
      Data size: 30 MBytes → 40 GBytes

    Challenge: 3D reconstruction (data exceeds GPU memory capacity)
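
The jump in data volume follows from the change in pixel size. As a rough sketch (function names are hypothetical, and it assumes an unchanged field of view and unchanged bytes per pixel, neither of which is stated on the slide), the pixel count of a 2D image grows with the square of the linear resolution ratio:

```c
/* Factor by which the pixel count of a 2D image grows when the pixel
 * edge length shrinks from coarse_um to fine_um (same field of view). */
double pixel_count_ratio(double coarse_um, double fine_um)
{
    double linear = coarse_um / fine_um;
    return linear * linear;
}

/* Scaled data size in MBytes, assuming the bytes per pixel stay the same. */
double scaled_size_mbytes(double size_mbytes, double coarse_um, double fine_um)
{
    return size_mbytes * pixel_count_ratio(coarse_um, fine_um);
}
```

For 100 μm → 3 μm the ratio is about 1,100, so 30 MBytes scales to roughly 33 GBytes. That is the same order of magnitude as the 40 GBytes quoted on the slide.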

  • 3D Reconstruction

    Registration algorithms:
      Rigid registration → 3 parameters
      Affine registration → 6 parameters
      Elastic registration → O(100) parameters

    [Diagram: registration loop with fixed image, moving image, metric, interpolator, transformation, optimizer]

    O(30) speedup on GPU
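
How the components of the registration loop interact can be shown with a deliberately simplified sketch. It is not the JuBrain implementation. It uses a translation-only transformation (2 parameters, simpler than the rigid case on the slide), a nearest-neighbour interpolator, a mean-squared-difference metric, and an exhaustive-search optimizer. All names and the 8 x 8 image size are hypothetical.

```c
#include <float.h>

#define W 8   /* image width  (hypothetical) */
#define H 8   /* image height (hypothetical) */

/* Interpolator: nearest-neighbour lookup, zero outside the image. */
static double sample(const double *img, int x, int y)
{
    if (x < 0 || x >= W || y < 0 || y >= H) return 0.0;
    return img[y * W + x];
}

/* Metric: mean squared difference between the fixed image and the
 * moving image after the transformation (a shift by dx, dy). */
double msd_metric(const double *fixed, const double *moving, int dx, int dy)
{
    double sum = 0.0;
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            double d = fixed[y * W + x] - sample(moving, x - dx, y - dy);
            sum += d * d;
        }
    }
    return sum / (W * H);
}

/* Optimizer: exhaustive search over all integer shifts within
 * [-max_shift, max_shift] in both directions. */
void register_translation(const double *fixed, const double *moving,
                          int max_shift, int *best_dx, int *best_dy)
{
    double best = DBL_MAX;
    for (int dy = -max_shift; dy <= max_shift; dy++) {
        for (int dx = -max_shift; dx <= max_shift; dx++) {
            double m = msd_metric(fixed, moving, dx, dy);
            if (m < best) {
                best = m;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    }
}
```

The metric is evaluated independently for every pixel and every candidate transformation, which makes this kind of loop a natural target for GPU parallelization. Real rigid, affine, or elastic registration replaces the shift with a richer transformation and the brute-force search with a proper optimizer.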

  • Fluid dynamics on Fermi and Kepler

    Lattice Boltzmann method, D2Q37 model

    Application developed at U Rome Tor Vergata/INFN, U Ferrara/INFN, TU Eindhoven

    Reproduces the dynamics of a fluid by simulating virtual particles which collide and propagate

    Simulation of large systems requires double-precision computation on many GPUs
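
The propagate half of the collide-and-propagate cycle can be sketched on the host as follows. This is a minimal illustration, not the application's kernel. The velocity set is passed in by the caller rather than hard-coding the D2Q37 stencil, periodic boundaries are assumed, and the site index `y*nx + x` is an assumption. Only the population-major layout `i*nx*ny + idx` is taken from the code shown later in the slides.

```c
/* Propagate (streaming) step: population i at site (x, y) moves to
 * site (x + cx[i], y + cy[i]). Periodic boundaries are assumed.
 * Layout: population-major, p[i*nx*ny + y*nx + x]. */
void propagate(const double *src, double *dst,
               const int *cx, const int *cy,
               int npop, int nx, int ny)
{
    for (int i = 0; i < npop; i++) {
        for (int y = 0; y < ny; y++) {
            for (int x = 0; x < nx; x++) {
                /* wrap the destination coordinates into [0, n) */
                int xd = ((x + cx[i]) % nx + nx) % nx;
                int yd = ((y + cy[i]) % ny + ny) % ny;
                dst[i * nx * ny + yd * nx + xd] = src[i * nx * ny + y * nx + x];
            }
        }
    }
}
```

The step performs no arithmetic on the populations; it only moves data. The collide step, which updates the populations at each site from local values, is omitted here.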

  • Collide kernel on Fermi

    Kernel dominated by arithmetic operations

    [Figure: floating-point performance [GFlop/s] as a function of the number of threads/block]

    Implementation: F. Schifano (U Ferrara/INFN)

    Excellent performance on Fermi

  • Kepler Performance Tuning

    Performance analysis observations:
      Significant increase of L1 cache misses: 17% (Tesla M2090) → 67% (Tesla K20)
      SM performance increased, but L1 cache capacity remained unchanged

    Problem mitigation by simple code change:
      Enforce loop unrolling to eliminate indirect memory accesses

    Original loop:

    for (i = 0; i < NPOP-1; i++) {
        lPop = p_prv[i*NX*NY + idx];   // population i at lattice site idx
        u = u + param_cx[i] * lPop;    // accumulate x component
        v = v + param_cy[i] * lPop;    // accumulate y component
    }

    With enforced unrolling:

    #pragma unroll
    for (i = 0; i < NPOP-1; i++) {
        lPop = p_prv[i*NX*NY + idx];
        u = u + param_cx[i] * lPop;
        v = v + param_cy[i] * lPop;
    }

    J. Kraus (NVIDIA Lab)
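
The effect of unrolling on the coefficient lookups can be illustrated with a small host-side sketch. It is not the application's code. It uses a hypothetical five-velocity set instead of D2Q37, loops over all NPOP populations for simplicity, and writes the unrolled form out by hand to show what the compiler can produce once every index `i` is a compile-time constant.

```c
#define NPOP 5   /* hypothetical small velocity set; the slides use D2Q37 */

static const double param_cx[NPOP] = { 0.0, 1.0, 0.0, -1.0,  0.0 };
static const double param_cy[NPOP] = { 0.0, 0.0, 1.0,  0.0, -1.0 };

/* Rolled form: param_cx[i] and param_cy[i] are indexed at run time,
 * so each iteration needs table lookups in addition to the p_prv load. */
void moments_loop(const double *p_prv, int nsites, int idx,
                  double *u, double *v)
{
    double lu = 0.0, lv = 0.0;
    for (int i = 0; i < NPOP; i++) {
        double lPop = p_prv[i * nsites + idx];
        lu += param_cx[i] * lPop;
        lv += param_cy[i] * lPop;
    }
    *u = lu;
    *v = lv;
}

/* Hand-unrolled form: every table index is a constant, so the
 * coefficients fold into the arithmetic and the lookups disappear. */
void moments_unrolled(const double *p_prv, int nsites, int idx,
                      double *u, double *v)
{
    double p1 = p_prv[1 * nsites + idx];
    double p2 = p_prv[2 * nsites + idx];
    double p3 = p_prv[3 * nsites + idx];
    double p4 = p_prv[4 * nsites + idx];
    *u = p1 - p3;   /* cx = +1 for i = 1, -1 for i = 3, 0 otherwise */
    *v = p2 - p4;   /* cy = +1 for i = 2, -1 for i = 4, 0 otherwise */
}
```

In CUDA, `#pragma unroll` asks the compiler to perform this transformation automatically, so the source stays in loop form.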

  • Collide kernel on Kepler GK110

    Comparison Fermi vs. Kepler

    Grid size considered here: 252 x 16384

    [Figure: floating-point performance as a function of the number of threads/block]

    Performance improvement: 1.7x

  • Propagate kernel

    Kernel dominated by memory access

    Grid size considered here: 252 x 16384

    [Figure: memory bandwidth [GByte/s] as a function of the number of threads/block]

    Performance improvement: 1.4x
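
A bandwidth figure for a memory-bound kernel like this one is typically derived from the data volume and the measured run time. The sketch below shows one way to compute it. The function names are hypothetical, and it assumes that each population value is read once and written once per step, ignoring any halo or boundary traffic. The slides do not state how the plotted bandwidth was obtained.

```c
/* Bytes moved by one propagate step, assuming every population value
 * is read once and written once (halo traffic ignored). */
double propagate_bytes(long nx, long ny, int npop, int bytes_per_value)
{
    return 2.0 * nx * ny * npop * bytes_per_value;
}

/* Effective bandwidth in GByte/s for a measured kernel time. */
double bandwidth_gbytes_per_s(double bytes, double seconds)
{
    return bytes / seconds / 1.0e9;
}
```

For the 252 x 16384 grid with 37 populations in double precision, `propagate_bytes(252, 16384, 37, 8)` is about 2.44 GBytes per step. The kernel time has to come from a measurement; none is given on the slides.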

  • Summary

    NVIDIA Application Lab at Jülich:
      New and fruitful model for collaboration
      We are just at the beginning ...

    Application requirements analysis:
      JuBrain: project aiming for a realistic model of the human brain

    Kepler feature analysis:
      Initial performance results for Lattice Boltzmann application on GK110
      Very high performance level reached on Fermi can be sustained