EnSPy: Python library for computations of ensembles of particles on GPU

Post on 19-Feb-2015

DESCRIPTION

Talk at Frontiers in Computational Astrophysics (Lyon, France, 10-16 October, 2010).

TRANSCRIPT

EnSPy: Python library for computations of ensembles of particles on GPU

Glib Ivashkevych

Institute of Theoretical Physics, NSC KIPT, Kharkov, Ukraine

October 13, 2010

Why GPU?

GPU – Graphics Processing Unit

programmable

manycore

multithreaded

with very high memory bandwidth

GPU programming gives us:

high performance

transparent scalability

... and is useful for problems with high data parallelism:

large datasets

portions of data can be processed independently

Outline

1. NVIDIA GPU Architecture and CUDA

2. Python and CUDA

3. EnSPy functionality

4. EnSPy architecture

5. Example: D5 potential

6. Example: Hill problem

7. Example: Hill problem, N-body version

8. Performance results

9. Future development

NVIDIA GPU Architecture and CUDA

Simplified GT200 architecture

consists of multiprocessors (MP)

each MP has:

8 stream processors

1 unit for double-precision operations

shared memory

global memory

Multiprocessors and threads

MP can launch numerous threads

threads are "lightweight" – little creation and switching overhead

threads run the same code

thread synchronization within an MP

cooperation via shared memory

each thread has a unique identifier – thread ID

Efficiency is achieved by hiding latency with computation, not by cache usage as on a CPU

C for CUDA

a set of extensions to C

runtime library

function and variable type qualifiers

built–in vector types: float4, double2 etc.

built–in variables

Kernels

map the parallel part of the program onto the GPU

execution: N times in parallel by N CUDA threads

CUDA Driver API

low–level control over the execution

no need for the nvcc compiler if kernels are precompiled – only the driver is needed
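
The kernel idea above (the same code executed N times, each instance distinguished by its thread ID) can be mimicked on the CPU in plain Python. This is only a serial illustration of the indexing scheme, not CUDA code; `saxpy_kernel` and `launch` are hypothetical names chosen for this sketch.

```python
def saxpy_kernel(thread_id, a, x, y, out):
    """Body executed by one 'thread': it handles exactly one array element."""
    out[thread_id] = a * x[thread_id] + y[thread_id]


def launch(kernel, n_threads, *args):
    """Serial stand-in for a kernel launch: run the same code for every thread ID.

    On a GPU these calls would be executed in parallel by n_threads threads.
    """
    for thread_id in range(n_threads):
        kernel(thread_id, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
result = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, result)
```

Because no instance reads what another instance writes, the loop order does not matter, which is the property that lets the GPU run the instances concurrently.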

Execution model

Python and CUDA

Why Python?

Python: flexible multipurpose interpreted language

easy to learn

dynamically typed

rich built–in functionality

very well documented

has a large and active community

Python scientific packages:

SciPy – modeling and simulation

Fourier transforms

ODE

Optimization

scipy.weave.inline – C inlining with little or no overhead

· · ·

NumPy – arrays, linear algebra etc.

flexible array creation routines

sorting, random sampling and statistics

· · ·

Python is a convenient way of interfacing C/C++ libraries

Python and CUDA

We could interface with:

Python C API – low-level approach: overkill

SWIG, Boost::Python – high-level approach: overkill

PyCUDA – the simplest and most straightforward way for CUDA only

scipy.weave.inline – a simple and straightforward way for both CUDA and plain C/C++

EnSPy functionality

Motivation

Combine the flexibility of Python with the efficiency of C++ → CUDA for N-body simulations

the interface of EnSPy is written in Python

the core of EnSPy is written in C++

the two are joined together by scipy.weave.inline

the C++ core can be used without Python – just include the header and link with the precompiled shared library

easily extensible through both the high-level Python interface and the low-level C++ core – new algorithms, initial distributions etc.

multi-GPU parallelization

it's easy to experiment with EnSPy!

Types of ensembles:

"Simple" ensemble – without interaction, only an external potential

N-body ensemble – both an external potential and gravitational interaction between particles

Current algorithms:

4th-order Runge–Kutta for the "simple" ensemble

Hermite scheme with shared time steps for the N-body ensemble
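
To make the "simple" ensemble case concrete, here is a minimal pure-Python sketch of one classical 4th-order Runge–Kutta step applied independently to every particle. It is not EnSPy's implementation (that lives in the C++/CUDA core); the function names and the harmonic-oscillator test system are assumptions chosen so that the result can be compared with a known exact solution.

```python
import math


def rk4_step(state, deriv, dt):
    """One classical 4th-order Runge-Kutta step for a single particle.

    state is a list [q0, q1, ..., p0, p1, ...]; deriv(state) returns d(state)/dt.
    """
    def shifted(base, k, h):
        return [b + h * ki for b, ki in zip(base, k)]

    k1 = deriv(state)
    k2 = deriv(shifted(state, k1, 0.5 * dt))
    k3 = deriv(shifted(state, k2, 0.5 * dt))
    k4 = deriv(shifted(state, k3, dt))
    return [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def oscillator(state):
    """Test system (an assumption): 1D harmonic oscillator, dq/dt = p, dp/dt = -q."""
    q, p = state
    return [p, -q]


# A small non-interacting ensemble: every particle is advanced independently.
ensemble = [[0.1 * i, 0.0] for i in range(1, 6)]
dt, n_steps = 0.01, 1000
for _ in range(n_steps):
    ensemble = [rk4_step(s, oscillator, dt) for s in ensemble]

# Exact solution for p(0) = 0 is q(t) = q(0) * cos(t).
t = dt * n_steps
max_error = max(abs(s[0] - 0.1 * i * math.cos(t))
                for i, s in enumerate(ensemble, start=1))
```

Since no particle needs data from any other, the list comprehension over the ensemble is exactly the kind of loop that maps onto one GPU thread per particle.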

Predefined initial distributions:

Uniform, point and spherical for "simple" ensembles

Uniform sphere with 2T/|U| = 1 for the N-body ensemble

the user can supply functions (in Python) for initial ensemble generation

User specified values and expressions:

parameters of initial distribution

potential, forces, parameters of integration scheme

arbitrary number of triggers – $N_i(t)$, the number of particles which do not cross the given hypersurface $F_i(q, p) = 0$ before time $t$

arbitrary number of averages – $\bar{F}_i(q, p, t)$ – quantities which should be averaged over the ensemble
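
One way to read the trigger definition is as a per-particle flag that is cleared the first time the trigger function changes sign. The sketch below illustrates that bookkeeping in plain Python under stated assumptions: the free-particle motion, the trigger F(q, p) = q0 and all function names are hypothetical and are not taken from EnSPy.

```python
def update_trigger(alive, f_prev, f_now):
    """Clear the flag of every particle whose trigger function changed sign.

    alive[i] stays True only while particle i has never crossed F(q, p) = 0.
    """
    return [a and (fp * fn > 0.0) for a, fp, fn in zip(alive, f_prev, f_now)]


def trigger(q, p):
    """Hypothetical trigger function F(q, p) = q0, i.e. the hypersurface q0 = 0."""
    return q[0]


# Free particles moving along q0 with constant momentum (illustration only).
positions = [[1.0], [2.0], [0.5], [3.0]]
momenta = [[-1.0], [-0.1], [1.0], [-2.0]]
alive = [True] * len(positions)
dt, n_steps = 0.1, 20          # integrate up to t = 2
counts = []                    # N(t) recorded after every step

f_prev = [trigger(q, p) for q, p in zip(positions, momenta)]
for _ in range(n_steps):
    positions = [[q[0] + dt * p[0]] for q, p in zip(positions, momenta)]
    f_now = [trigger(q, p) for q, p in zip(positions, momenta)]
    alive = update_trigger(alive, f_prev, f_now)
    f_prev = f_now
    counts.append(sum(alive))

# An ensemble average of a quantity (here simply q0) at the final time.
mean_q0 = sum(q[0] for q in positions) / len(positions)
```

The flag is never set back to True, so a particle that crosses the surface and later returns is still counted as escaped, which matches the "before time t" wording.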

Runtime generation and compilation of C and CUDA code:

User-specified expressions (as Python strings) are wrapped by the EnSPy template subpackage into C functions and a CUDA module

Compiled at runtime

Ease of use and computational efficiency:

flexible Python interface

all actual calculations are performed by the runtime-generated C extension and the precompiled shared library

Drawback:

extra time for generation and compilation of new code
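
The wrapping step can be pictured as plain text substitution. The following sketch turns a user expression into C source text with the standard-library `string.Template`; the template, the function signature and the name `wrap_expression` are all hypothetical, since the slides do not show EnSPy's real templates, and the generated source is only printed here, not compiled.

```python
import re
from string import Template

# Hypothetical template; EnSPy's real templates are not shown in the slides.
C_FUNCTION = Template("""\
double ${name}(const double *q, const double *p, double t)
{
    return ${expr};
}
""")


def wrap_expression(name, expr):
    """Turn a user expression such as 'abs(q0) - 1.' into C source text.

    Coordinates q0, q1, ... and momenta p0, p1, ... become array accesses,
    and Python's abs() is mapped to C's fabs().
    """
    expr = re.sub(r"\b([qp])(\d+)\b", r"\1[\2]", expr)
    expr = re.sub(r"\babs\(", "fabs(", expr)
    return C_FUNCTION.substitute(name=name, expr=expr)


source = wrap_expression("trigger_0", "abs(q0) - 1.")
print(source)
```

Generating source text this way is what makes the compile-time cost mentioned above unavoidable: every new expression yields new code that has to be compiled before the first run.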

EnSPy architecture

Execution flow and architecture

Input parameters

Ensemble population (predefined or user-specified distribution)

Code generation and compilation

Launching $N_{GPUs}$ threads

GPU parallelization scheme for N–body simulations

Order of force calculation

Example: D5 potential

Overview

Problem: escape from a potential well.

Watched values (trigger):

N(t) – number of particles, remaining in the well at time t

Potential:

$U_{D_5} = 2ay^2 - x^2 + xy^2 + \frac{x^4}{4}$

"Critical" energy: $E_{cr} = E_S = 0$
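
As a quick numerical check of the quantities on this slide, the sketch below evaluates the D5 potential and the corresponding force in plain Python. The value a = 1 is an assumption (the slides do not state the parameter), and the function names are hypothetical.

```python
import math


def u_d5(x, y, a):
    """D5 potential: 2*a*y^2 - x^2 + x*y^2 + x^4/4."""
    return 2.0 * a * y**2 - x**2 + x * y**2 + x**4 / 4.0


def force_d5(x, y, a):
    """Force components (-dU/dx, -dU/dy) obtained by differentiating u_d5."""
    fx = 2.0 * x - y**2 - x**3
    fy = -4.0 * a * y - 2.0 * x * y
    return fx, fy


a = 1.0  # assumed value; the slides do not state the parameter a
saddle = (0.0, 0.0)
minimum = (math.sqrt(2.0), 0.0)

saddle_energy = u_d5(*saddle, a)      # energy of the saddle point at the origin
minimum_energy = u_d5(*minimum, a)    # depth of the well at x = sqrt(2), y = 0
force_at_minimum = force_d5(*minimum, a)
```

The force vanishes at the origin and the potential there is zero, which agrees with the critical (saddle) energy of 0 quoted on the slide; the well bottom at x = √2 lies at U = −1.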

Potential and structure of phase space:

[Figure: level lines of the D5 potential in the (x, y) plane, and the phase-space structure in the (x, p_x) plane; all axes range from −2 to 2]

Calculation setup:

"Simple" ensemble

uniform initial distribution of N = 10240 particles in $x > 0 \cap U(x, y) < E$

trigger: x = 0 → q0 = 0.

12 lines of simple Python code (examples/d5.py): specification of integration parameters
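
The contents of examples/d5.py are not reproduced in the slides. As an independent illustration of the initial condition only, the sketch below draws positions uniformly from the region x > 0, U(x, y) < E by rejection sampling. Treating "uniform" as uniform in configuration space, the value a = 1, and the function names are all assumptions; momenta are not generated.

```python
import math
import random


def u_d5(x, y, a):
    """D5 potential: 2*a*y^2 - x^2 + x*y^2 + x^4/4."""
    return 2.0 * a * y**2 - x**2 + x * y**2 + x**4 / 4.0


def sample_region(n, energy, a, seed=0):
    """Draw n points uniformly from {x > 0, U(x, y) < energy} by rejection sampling.

    Assumes a > 0 and energy > -1, so the region is non-empty and lies inside
    the bounding box 0 < x < x_max, |y| < y_max computed below.
    """
    rng = random.Random(seed)
    # x^4/4 - x^2 < energy  =>  x^2 < 2 + 2*sqrt(1 + energy)
    x_max = math.sqrt(2.0 + 2.0 * math.sqrt(1.0 + energy))
    # y^2 * (2a + x) < energy + x^2 - x^4/4 <= energy + 1, and 2a + x > 2a
    y_max = math.sqrt((energy + 1.0) / (2.0 * a))
    points = []
    while len(points) < n:
        x = rng.uniform(0.0, x_max)
        y = rng.uniform(-y_max, y_max)
        if x > 0.0 and u_d5(x, y, a) < energy:
            points.append((x, y))
    return points


points = sample_region(n=10240, energy=0.1, a=1.0)  # a = 1 is an assumed value
```

Rejection sampling keeps the accepted points uniform over the allowed region regardless of its shape, at the cost of discarding the candidates that fall outside it.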

Results:

Regular particles are trapped in the well → the initial "mixed state" splits

[Figure: N(t)/N(0) versus t (0 to 30) for E = 0.1 and E = 0.9]

Example: Hill problem

Overview

Problem: toy model of escape from a star cluster: escape of a star from the potential of a point-like rotating star cluster of mass $M_c$ and a point-like galaxy core of mass $M_g \gg M_c$

Watched values (trigger):

N(t) – number of particles, remaining in cluster at time t

"Potential" in the cluster frame of reference (tidal approximation):

$U_{Hill} = -3\omega^2 x^2 - \frac{G M_c}{r^2}$

"Critical" energy: $E_{cr} = E_S = -4.5\omega^2$

Potential:

[Figure: Hill curves in the (x, y) plane]

Calculation setup:

"Simple" ensemble

uniform initial distribution of N = 10240 particles in $|x| < r_t \cap U(x, y) < E$

$\omega = 1/\sqrt{3}$ → $r_t = 1$

trigger: $|x| - r_t = 0$ → abs(q0) - 1. = 0.

12 lines of simple Python code (examples/hill_plain.py): specification of integration parameters

Results:

Trapping of regular particles (some tricky physics here):

[Figure: N(t) (0 to 1 · 10^4) versus n_t (0 to 1 · 10^5) for E = −1.3, E = −0.8 and E = −0.3]

Example: Hill problem, N–body version

Overview

Problem: simplified model of escape from a star cluster: escape of a star from the potential of a rotating star cluster with total mass $M_c$ and the point potential of a galaxy core with mass $M_g \gg M_c$ (2D)

Watched values: configuration of the cluster

Potential of the galaxy core in the cluster frame of reference (tidal approximation):

$U_{HillNB} = -3\omega^2 x^2$

”Toy” Hill model vs N–body Hill model:

Calculation setup:

N–body ensemble

2D (z = 0) initial distribution of N = 10240 particles inside a circle of radius R with zero initial velocities

14 lines of simple Python code (examples/hill_nbody.py): specification of integration parameters

$M_c = 1$, $R = 200$, $\omega = 1/\sqrt{3}$
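
The contents of examples/hill_nbody.py are not shown in the slides. To illustrate the kind of setup described (particles inside a circle, zero initial velocities, mutual gravity), here is a pure-Python sketch with direct-summation accelerations. Equal particle masses, G = 1, the softening length `eps`, the small particle number, and all function names are assumptions made for this example; the galactic tidal term is deliberately left out.

```python
import math
import random


def uniform_disc(n, radius, seed=0):
    """n points distributed uniformly inside a circle of the given radius (z = 0)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())   # sqrt gives uniform area density
        phi = 2.0 * math.pi * rng.random()
        points.append((r * math.cos(phi), r * math.sin(phi)))
    return points


def accelerations(points, mass, eps=1e-3):
    """Direct-summation gravitational accelerations with G = 1.

    eps is a softening length, an assumption added to avoid division by zero.
    """
    acc = []
    for i, (xi, yi) in enumerate(points):
        ax = ay = 0.0
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            inv_r3 = (dx * dx + dy * dy + eps * eps) ** -1.5
            ax += mass * dx * inv_r3
            ay += mass * dy * inv_r3
        acc.append((ax, ay))
    return acc


n = 256                      # small N so the O(N^2) loop stays fast in pure Python
total_mass, radius = 1.0, 200.0
positions = uniform_disc(n, radius)
velocities = [(0.0, 0.0)] * n
acc = accelerations(positions, total_mass / n)
```

The double loop over pairs is the O(N²) part of an N-body step; it is this loop that a GPU implementation distributes over threads.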

Results: cluster configuration

[Figure: six panels showing the cluster in the (x, y) plane, both axes from −300 to 300, at steps 201, 401, 601, 801, 1001 and 1201]

Performance results

Not as good as it could be – subject to improvement. Estimate: ∼ 1 TFlops on 2x recent Fermi graphics processors

[Figure: performance in GFlop/s (0 to 40) versus N (1 · 10^4 to 2 · 10^5) on a GTX260 in double precision (DP), for the N-body and "simple" ensembles]

Future development

Must have features:

MPI: shifting from a "one host – multiple GPUs" to a "multiple hosts – multiple GPUs" environment

individual timesteps for Hermite

tree–codes

Performance improvements:

utilization of texture memory

better load balancing between GPUs
