sampling secondary particles in high energy physics ......s. y. jun @gpu in hep 9/12/2014 2...

27
Sampling Secondary Particles in High Energy Physics Simulation on the GPU Soon Yung Jun Fermilab for the Geant Vector/Coprocessor R&D Team GPU Computing in High Energy Physics Sept. 10-12, 2014 Pisa, Italy

Upload: others

Post on 08-May-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Sampling Secondary Particles in High Energy PhysicsSimulation on the GPU

Soon Yung JunFermilab

for the Geant Vector/Coprocessor R&D Team

GPU Computing in High Energy Physics

Sept. 10-12, 2014

Pisa, Italy

Page 2: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Introduction

• MC methods are often the only practicalway to sample random variables governedby complicated probability distributions

• MC methods that are widely used in highenergy physics and detector simulation

– inverse transform (analytical solution)– acceptance and rejection– composition and rejection (Geant4)

• Compton scattering: Formula vs. Geant4

– Klein-Nishina dσ[γ(E, 0) → γ′(E′, sin θ)]

– average number of trials Ns = 1.25,sampling efficiency = 0.80 at E = 1 MeV

using a composition and rejection

• What is a problem?

Scattered Photon Energy (MeV)0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

/dE

σC

ompt

on S

catte

ring

d

1

1.5

2

2.5

3 Geant4Klein-Nishina

E = 1 MeV

)S

Number of Trials for Sampling (N0 1 2 3 4 5 6 7 8 9 10

SdN

/dN

1

10

210

310

410

510

610

<Ns> = 1.25

S. Y. Jun @GPU in HEP 9/12/2014 2

Page 3: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Simulation with Coprocessors

• Optimal algorithms for coprocessors (GPUs, many cores)

– maximal instruction throughput and data locality– minimal branch and memory access

• Problems of sampling with “acceptance/composition and rejection” methods

– non-deterministic loop counter (the number of trials in sampling)

//sample x and y for a given yo

do {

x = f(r1);

y = g(x,r2);

} while (y<yo);

– implicit (do-while) loops: divergence, not vectorizable– adopt/change/develop algorithms for parallel or vector architectures

S. Y. Jun @GPU in HEP 9/12/2014 3

Page 4: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Problem Statement and Background

• Explore physics algorithms for HEP detector simulation suitable for SIMD/SIMT

• A part of R&D efforts for Geant Vector/Coprocessor Simulation

– concurrent framework (support heterogeneous computing models)– vectorized geometry (bucketized particle transport)– physics models (tabulated/vectorized/massively parallel)

S. Y. Jun @GPU in HEP 9/12/2014 4

Page 5: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Review: Sampling Techniques

• Inverse transform (cumulative distribution):F (a) =

R a

−∞f(x)dx for ∀f(x) (p.d.f)

u[0, 1] = F (x) → ∃x = F−1(u)

• Acceptance and rejection (Von Neumann):try x, accept if

u[0, 1] ≤f(x)

Ch(x)

efficient if proportionality const. C → 1

• Composition and rejection: decomposewith density functions, fi(x) and rejectionfunctions, 0 < gi(x) < 1

f(x) =n

X

i=1

αifi(x)gi(x)

• Alias method (A.J Walker, 1974)

0

1

0

1

F(x)

F(x)

} f (xk)

xxk+1xk

u

xx = F−1(u)

Continuous

distribution

Discrete

distribution

u

(a)

(b)

C h(x)

C h(x)

f (x)

x

f (x)

(a)

(b)

S. Y. Jun @GPU in HEP 9/12/2014 5

Page 6: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Alias Method for N Discrete Outcomes

• Recast p.d.f f(x) → c, N equal probableevents, each with likelihood 1/N = c

– alias, a[recipient]= donor– non-alias probability, q[i]– pdf, pj = f(xj) for interpolation

• Sample xj with random u[0.1)

– uniform (Alias): N · u = i + αselect j = (u ≤ q[i]) ? i : a[i]

xd = j∆x, xu = (j + 1)∆x

xj = (1 − α)xd + αxu

– linear interpolation (Alias2): testif u′(pj + pj+1) ≤ (1 − u)pj + u · pj+1

then x = xj

else x = αxd + (1 − α)xu

• Effectively vectorized

100 200 300 400 500 6000

0.001

0.002

0.003

0.004

0.005

0.006

PDF

100 200 300 400 500 6000

0.001

0.002

0.003

0.004

0.005

0.006

PDF

ix100 200 300 400 500 600

Pro

babi

lity

Den

sity

0

0.001

0.002

0.003

0.004

0.005

0.006

f(x)c = 1/N

f(x) > c : donorf(x) < c : recipient

x100 200 300 400 500 600

Equ

al O

utco

me

0

0.001

0.002

0.003

0.004

0.005

0.006

f(x)c = 1/N

S. Y. Jun @GPU in HEP 9/12/2014 6

Page 7: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Implementation

• Sampling methods

MC methods Sampling Table

Composition and rejection do-while loop on-the-fly calc.

Inverse transform uniform (InverseCDF) q[double]

linear (InverseCDF2) q[double], p[double]

Alias uniform (Alias) a[int], q[double]

linear (Alias2) a[int], q[double], p[double]

• Physics models studied and size of sampling tables

Process Model Secondaries NS Bin Scale Table Size

γ Compton Klein-Nishina e− 1.012 linear 100×100

e− Bremsstrahlung Seltzer-Berger γ 1.943 log 100×1000

e− Ionization Moller e− 2.162 linear 100×100

e+ Ionization Bhabha e− 4.934 linear 100×100

Table 1: “Composition of rejection methods” used in Geant4 EM physics models

and tested for this talk with primary particle E = exp(−x/200) in x[1, 1001] MeV,

where NS = average number of trials, ǫ = Nevents/N(total trials)

S. Y. Jun @GPU in HEP 9/12/2014 7

Page 8: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Test Setups

• Build 2-dimensional sampling tables for each physics models on CPU

– size of double[m][n] (ex: m=energy bins, n=sampling bins)– convert to 1D array: inverseCDF (p, q)m×n and alias (a, p, q)m×n

– remove/refine valley/tail ambiguity for inverseCDF (see backup, p26)

• Data transfer to GPU

– sampling tables and random states: one time– primary tracks (particles): recurrent per task

• Kernel set-up (performance measurement): sample secondary particles including

– task1: read energy of particles and write back the updated momenta– task2: create secondary particles and store them to the global memory (GST)

• Copy secondaries to CPU for validations

• Other considerations: impact of conditional loops, texture memory,data layout (AoS vs. SoA)

S. Y. Jun @GPU in HEP 9/12/2014 8

Page 9: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Validation: GPU vs. CPU

• Comparisons: should be identical except the random sequence

– store secondaries on GPU and copy them to CPU (D2H)– compare entries of secondaries produced on GPU to those on CPU– ex: Compton - scattered energy (difference is a statistical fluctuation?)

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

/dE

σd

210

310

410

510

610

710

Geant4

GPU

CPU

Geant4

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

/dE

σd

210

310

410

510

610

710

InverseCDF

GPU

CPU

InverseCDF

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

/dE

σd

210

310

410

510

610

710

InverseCDF2

GPU

CPU

InverseCDF2

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

/dE

σd

210

310

410

510

610

710

Alias

GPU

CPU

Alias

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

/dE

σd

10

210

310

410

510

610

710

Alias2

GPU

CPU

Alias2

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

GP

U/C

PU

0.75

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

1.2

1.25Geant4Geant4

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

GP

U/C

PU

0.75

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

1.2

1.25InverseCDFInverseCDF

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

GP

U/C

PU

0.75

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

1.2

1.25InverseCDF2InverseCDF2

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

GP

U/C

PU

0.75

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

1.2

1.25AliasAlias

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

GP

U/C

PU

0.75

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

1.2

1.25Alias2Alias2

S. Y. Jun @GPU in HEP 9/12/2014 9

Page 10: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Validation: GPU vs. CPU

• Compton - scattered angle (rotated to the incident γ direction)

– the angle is (100%) correlated to the scattered energy, E′

E = meme+E(1−cos θ)

– difference is less than 1.0% for the given sample (a ruler of measurement)

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

)θdN

/dsi

n(

610

Geant4

GPU

CPU

Geant4

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

)θdN

/dsi

n(

610

InverseCDF

GPU

CPU

InverseCDF

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

)θdN

/dsi

n(6

10

InverseCDF2

GPU

CPU

InverseCDF2

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

)θdN

/dsi

n(

610

Alias

GPU

CPU

Alias

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

)θdN

/dsi

n(

610

Alias2

GPU

CPU

Alias2

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

GP

U/C

PU

0.95

0.96

0.97

0.98

0.99

1

1.01

1.02

1.03

1.04

1.05Geant4Geant4

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

GP

U/C

PU

0.95

0.96

0.97

0.98

0.99

1

1.01

1.02

1.03

1.04

1.05InverseCDFInverseCDF

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

GP

U/C

PU

0.95

0.96

0.97

0.98

0.99

1

1.01

1.02

1.03

1.04

1.05InverseCDF2InverseCDF2

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

GP

U/C

PU

0.95

0.96

0.97

0.98

0.99

1

1.01

1.02

1.03

1.04

1.05AliasAlias

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

GP

U/C

PU

0.95

0.96

0.97

0.98

0.99

1

1.01

1.02

1.03

1.04

1.05Alias2Alias2

S. Y. Jun @GPU in HEP 9/12/2014 10

Page 11: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Method Validation: Comparisons on CPU

• Energy and angle of scattered particles (4 models × 5 methods)

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

/dE

σd

210

310

410

510

610

710

Compton : Energy

inverseCDF

inverseCDF2

alias

alias2

geant4

Compton : Energy

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

)θdN

/dsi

n(

610

Compton : AngleCompton : Angle

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

Met

hod/

Gea

nt4

0.75

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

1.2

1.25

Compton : Energy (Ratio)

inverseCDF

inverseCDF2

alias

alias2

Compton : Energy (Ratio)

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Met

hod/

Gea

nt4

0.95

0.96

0.97

0.98

0.99

1

1.01

1.02

1.03

1.04

1.05

Compton : Angle (Ratio)

inverseCDF

inverseCDF2

alias

alias2

Compton : Angle (Ratio)

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

/dE

σd

210

310

410

510

610

710

SeltzerBerger : Energy

inverseCDF

inverseCDF2

alias

alias2

geant4

SeltzerBerger : Energy

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

)θdN

/dsi

n(

410

510

610

710

SeltzerBerger : AngleSeltzerBerger : Angle

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

Met

hod/

Gea

nt4

0.75

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

1.2

1.25

SeltzerBerger : Energy (Ratio)

inverseCDF

inverseCDF2

alias

alias2

SeltzerBerger : Energy (Ratio)

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Met

hod/

Gea

nt4

0.95

0.96

0.97

0.98

0.99

1

1.01

1.02

1.03

1.04

1.05

SeltzerBerger : Angle (Ratio)

inverseCDF

inverseCDF2

alias

alias2

SeltzerBerger : Angle (Ratio)

• The InverseCDF (uniform) shows a bias in Compton Scattering

S. Y. Jun @GPU in HEP 9/12/2014 11

Page 12: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Method Validation: Comparisons on CPU

• Energy and angle of scattered particles (4 models × 5 methods)

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

/dE

σd

1

10

210

310

410

510

610

710

Moller : Energy

inverseCDF

inverseCDF2

alias

alias2

geant4

Moller : Energy

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

)θdN

/dsi

n(

1

10

210

310

410

510

610

Moller : AngleMoller : Angle

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

Met

hod/

Gea

nt4

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

1.3

1.4

1.5

Moller : Energy (Ratio)

inverseCDF

inverseCDF2

alias

alias2

Moller : Energy (Ratio)

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Met

hod/

Gea

nt4

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

1.2

Moller : Angle (Ratio)Moller : Angle (Ratio)

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

/dE

σd

1

10

210

310

410

510

610

710

Bhabha : Energy

inverseCDF

inverseCDF2

alias

alias2

geant4

Bhabha : Energy

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

)θdN

/dsi

n(

410

510

610

Bhabha : AngleBhabha : Angle

Sampled Energy (MeV)

100 200 300 400 500 600 700 800 900 1000

Met

hod/

Gea

nt4

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

1.3

1.4

1.5

Bhabha : Energy (Ratio)Bhabha : Energy (Ratio)

)θSampled sin(

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Met

hod/

Gea

nt4

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

1.2

Bhabha : Angle (Ratio)

inverseCDF

inverseCDF2

alias

alias2

Bhabha : Angle (Ratio)

• Alias and Alias2 show good agreements compared with Geant4 for all models

S. Y. Jun @GPU in HEP 9/12/2014 12

Page 13: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Performance Measurements

• HardwareHost (CPU) Device (GPU)

Intel R© Xeon R© E5-2620 Nvidia K20(Kepler)

12 cores @ 2.0 GHz 2496 cores @0.7 GHz

– compilation: cuda 6.0, nvcc -arch=sm 35 –optimize 3 –use fast math

• Initialization

– build tables on CPU and allocate them to GPU : ∼ 3 msec for q[100 × 100]– set up random states: (blocks × threads) curandState

• Measurement

– 1000 events, 79872 tracks/event, sample one secondary particle/track– thread execution: blocks per grid = 26, threads per block = 192– time in [msec]: CPU Time (1 core) vs. GPU Time (all cores)

• Comparisons: relative gain, speedup (CPU/GPU) with RMS errors

– 5 sampling methods and 4 physics models– 2 kernels: task1, task2 (GST)

S. Y. Jun @GPU in HEP 9/12/2014 13

Page 14: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Relative Gain w.r.t Composition and Rejection

Sampling Method

InverseCDF InverseCDF2 Alias Alias2

Tim

e(R

ejec

tion)

/Tim

e(M

etho

d)

0

1

2

3

4Relative Gain: KleinNishina

CPU (+GST)CPUGPU (+GST)GPU

Relative Gain: KleinNishina

Sampling Method

InverseCDF InverseCDF2 Alias Alias2

Tim

e(R

ejec

tion)

/Tim

e(M

etho

d)

0

5

10

Relative Gain: SeltzerBergerCPU (+GST)CPUGPU (+GST)GPU

Relative Gain: SeltzerBerger

Sampling Method

InverseCDF InverseCDF2 Alias Alias2

Tim

e(R

ejec

tion)

/Tim

e(M

etho

d)

0

1

2

3

4Relative Gain: Moller

CPU (+GST)CPUGPU (+GST)GPU

Relative Gain: Moller

Sampling Method

InverseCDF InverseCDF2 Alias Alias2

Tim

e(R

ejec

tion)

/Tim

e(M

etho

d)

0

2

4

6Relative Gain: Bhabha

CPU (+GST)CPUGPU (+GST)GPU

Relative Gain: Bhabha

S. Y. Jun @GPU in HEP 9/12/2014 14

Page 15: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Performance: CPU/GPU

Sampling Method

InverseCDF InverseCDF2 Alias Alias2 Rejection

<C

PU

Tim

e/G

PU

Tim

e>

0

50

100

CPU/GPU: KleinNishinaSamplingSampling + GST

CPU/GPU: KleinNishina

Sampling Method

InverseCDF InverseCDF2 Alias Alias2 Rejection

<C

PU

Tim

e/G

PU

Tim

e>

0

50

100

150

CPU/GPU: SeltzerBerger

SamplingSampling + GST

CPU/GPU: SeltzerBerger

Sampling Method

InverseCDF InverseCDF2 Alias Alias2 Rejection

<C

PU

Tim

e/G

PU

Tim

e>

0

20

40

60

80CPU/GPU: Moller

SamplingSampling + GST

CPU/GPU: Moller

Sampling Method

InverseCDF InverseCDF2 Alias Alias2 Rejection

<C

PU

Tim

e/G

PU

Tim

e>

0

20

40

60

80CPU/GPU: Bhabha

SamplingSampling + GST

CPU/GPU: Bhabha

S. Y. Jun @GPU in HEP 9/12/2014 15

Page 16: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

GPU Time by Kernel Components

• GPU time contribution by each step (reference: Moller InverseCDF)

– Base: invoke objects, read/write track energy (39%)– Random: invoke random states (33%)– InverseCDF: bin search and memory access for CDF (28%)– Alias (+22%), InverseCDF2 (+32%), Alias2 (+38%)

SampleBy Kernel

Base RandomInverseCDF

Alias InverseCDF2

Alias2

Mea

n T

ime/

Eve

nt [m

sec]

0

0.1

0.2

0.3

0.4GPU Time: Moller

S. Y. Jun @GPU in HEP 9/12/2014 16

Page 17: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Impact of Implicit Loops: Divergence

• Degradation due to divergence is not proportional to Nmax ∼ O(〈Ns〉 log Ntotal)

– cost for an additional trial is relatively inexpensive in GPU– impact of implicit loops is relatively weak when 〈NS〉 or Nmax is large

N of Trial LoopsG4 1 2 3 4 5 6 7 8 9

Mea

n T

ime/

Eve

nt [m

sec]

0

20

40

60

80

Moller: CPU

N of Trial LoopsG4 1 2 3 4 5 6 7 8 9

Mea

n T

ime/

Eve

nt [m

sec]

0

0.2

0.4

0.6

0.8

Moller: GPU

Model G4 NS G4 GPU [ms] Fixed NS Fixed GPU [ms] G4/Fixed

Klein-Nishina 1.012 0.214 ± 0.002 1.0 0.209 ± 0.002 1.03

Seltzer-Berger 1.943 3.675 ± 0.061 2.0 1.866 ± 0.011 1.97

Moller 2.162 0.393 ± 0.007 2.0 0.243 ± 0.002 1.62

Bhabha 4.934 0.552 ± 0.014 5.0 0.273 ± 0.001 2.03

S. Y. Jun @GPU in HEP 9/12/2014 17

Page 18: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

SoA vs. AoS: Alias and Alias2

• AoS vs. SoA: struct AoS { struct SoA {

G4int a; G4int *a;

G4double p; G4double *p;

G4double q; G4double *q;

}; };

AoS *aos; SoA soa;

Sampling Model

AoS Alias SoA Alias AoS Alias2 SoA Alias2

Mea

n T

ime/

Eve

nt [m

sec]

0.2

0.25

0.3

0.35

0.4

GPU

Sampling Model

AoS Alias SoA Alias AoS Alias2 SoA Alias2

Mea

n T

ime/

Eve

nt [m

sec]

5

10

15

20

CPU

– GPU: SoA is worse than AoS (uncoalesced memory access by eachthread for random array lookups)

– CPU: SoA is better than AoS (∼ 20%)

S. Y. Jun @GPU in HEP 9/12/2014 18

Page 19: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Texture

• 1D Texture (double) for sampling (inverse CDF, PDF) tables

texture<int2, 1> texInverseCDF;

static __inline__ __device__ G4double FetchToDouble(texture<int2, 1> T, int i)

{

int2 v = tex1Dfetch(T,i);

return __hiloint2double(v.y, v.x);

}

Sampling MethodInverseCDF InverseCDFText InverseCDF2 InverseCDF2Text

Mea

n T

ime/

Eve

nt [m

sec]

0

0.2

0.4

0.6

1D Texture (double)SeltzerBergerKleinNishinaMollerBhbha

1D Texture (double)

– 10-20% improvement (use int2 and cast it to double, no linear filtering)

S. Y. Jun @GPU in HEP 9/12/2014 19

Page 20: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Conclusion

• Evaluated performance of different MC methods to sample secondaryparticles produced by EM physics processes

– performance degradation due to the thread divergence (with compositionand rejection method) is not as significant as expected

– inverse transform (inverse CDF method) or alias method improvecomputing performance in both CPU and GPU and may replace theconventional composition and rejection method

– alias methods show no bias in sampled distributions for EM modelstested and are suitable for both parallel and vector architectures

• Perspective

– adopt and develop HEP algorithms for HPC– GPU prototype as a task-driven (high throughput applications)– vector prototype as a framework-driven (vector pipeline)– portable codes for different architectures (code abstraction)

S. Y. Jun @GPU in HEP 9/12/2014 20

Page 21: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Acknowledgment

• Geant Vector/Coprocessor R&D Team: http://geant.cern.ch

– John Apostolakis, Rene Brun, Federico Carminati,Andrei Gheata, Mihaly Novak, Sandro Wenzel (CERN)

– Philippe Canal, Victor Daniel Elvira, Soon Yung Jun,Guilherme Lima (Fermilab, US)

– Marilena Bandieramonte (Universita e INFN, IT)– Georgios Bitzes (University of Athens, GR)– Johannes Christof De Fine Licht (University of Copenhagen, DK)– Laurent Duhem (Intel)– Raman Sehgal (Bhabha Atomic Research Centre, IN)– Oksana Shadura (National Technical Univ. of Ukraine)

• Members of the US ASCR-HEP Collaboration

– Azamat Mametjanov (ANL, US)– Boyana Norris (Univ. of Oregon, US)

S. Y. Jun @GPU in HEP 9/12/2014 21

Page 22: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Backup

S. Y. Jun @GPU in HEP 9/12/2014 22

Page 23: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Composition and Rejection Method

• Combination of the “composition” and rejection methods for the case that

– f(x) is a complicated function– acceptance and rejection is inefficient

• Widely used in Geant4 to sample secondary particles for a given interaction

– to sample x from the distribution f(x) which can be written as

f(x) =

nX

i=1

αifi(x)gi(x)

– where αi > 0 (fraction), fi(x) (PDFs), 0 ≤ gi(x) ≤ 1 (rejection functions)– select i based on αi, try x′ from fi(x) and calculate gi(x

′)– x = x′ if gi(x

′) < u (random u[0, 1)), otherwise try again

• Very powerful method for sampling x based on the distribution f(x) if

– fi/gi can be sampled/evaluated easily– f(x) may not be normalized and the mean number of trials is∑

i αi/C where∫

f(x)dx = C ≈ 1

S. Y. Jun @GPU in HEP 9/12/2014 23

Page 24: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Klein-Nishina Compton Scattering

• Klein-Nishina Differential Cross Section: γ(E, 0) → γ′(E′, sin θ)

dǫ=

Xonπr2ome

E2[1

ǫ+ ǫ][1 −

ǫ sin2 θ

1 + ǫ2]

ǫ =E′

E=

me

me + E(1 − cos θ)

• Composition and Rejection (Geant4)

f(ǫ) = [1

ǫ+ ǫ] =

2X

i=1

αifi(ǫ)

g(ǫ) = [1 −ǫ sin2 θ

1 + ǫ2]

decompose f(ǫ) with ǫo = 1/(1 + 2E/me) for the minimum γ energy as

α1 × f1(ǫ) = ln(1/ǫo) ×1

ǫ ln(1/ǫo), α2f2(ǫ) =

1

2(1 − ǫ

2o) ×

(1 − ǫ2o)

S. Y. Jun @GPU in HEP 9/12/2014 24

Page 25: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Inverse Probability Distribution (Technical Details)

• Building Inverse PDF, {F−1(u)[k] | k = 0, N}

– find the bin index jk for F−1(u)[k] searching F (x)[k] until ∆u × k < F (x)[j]

– build F−1(u)[k] = ∆x[(jk − 1) + δj] taking into account the spread in[jk − 1, jk] (linear interpolation for the same j over k)

//linear interpolation within the interval with the same index

unsigned int last, cnt, lcnt;

last = cnt = lcnt = 0;

for(int i = 0; i < n ; ++i) {

if(ip[i] > last) {

for(int j = 0 ; j < (cnt-lcnt) ; ++j ) {

q[j+lcnt] = dy*(last + (1.0*j/(cnt-lcnt)));

}

last = ip[i];

lcnt = cnt;

}

++cnt;

}

//for the last bin

q[n-1] = 1.0;

S. Y. Jun @GPU in HEP 9/12/2014 25

Page 26: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Alias Method (Technical Details)

• Building the alias table (a[n]) and the non-alias probability (q[n])

– build the discrete probability density function, p[k] = f(xk) with xk = k∆x

– calculate the average likelihood per equal probable event;

c =1

n − 1

– pick a pair (i, j) =(recipient,donor) such that non-zero p[i] ≤ c and p[j] > c– evaluate alias and non-alias probability for recipient

a[recipient] = donor;

q[recipient] = n*p[recipient];

– update pdfp[donor] = p[donor] - (c-p[recipient]);

p[recipient] = 0.0;

– repeat this for O(n) iterations

S. Y. Jun @GPU in HEP 9/12/2014 26

Page 27: Sampling Secondary Particles in High Energy Physics ......S. Y. Jun @GPU in HEP 9/12/2014 2 Simulation with Coprocessors • Optimal algorithms for coprocessors (GPUs, many cores)

Performance: Time

Sampling Method

InverseCDF InverseCDF2 Alias Alias2 Rejection

Mea

n T

ime/

Eve

nt [m

sec]

-110

1

10

210

Time: KleinNishinaCPU (+GST)CPUGPU (+GST)GPU

Time: KleinNishina

Sampling Method

InverseCDF InverseCDF2 Alias Alias2 Rejection

Mea

n T

ime/

Eve

nt [m

sec]

-110

1

10

210

310

Time: SeltzerBergerCPU (+GST)CPUGPU (+GST)GPU

Time: SeltzerBerger

Sampling Method

InverseCDF InverseCDF2 Alias Alias2 Rejection

Mea

n T

ime/

Eve

nt [m

sec]

-110

1

10

210

Time: MollerCPU (+GST)CPUGPU (+GST)GPU

Time: Moller

Sampling Method

InverseCDF InverseCDF2 Alias Alias2 Rejection

Mea

n T

ime/

Eve

nt [m

sec]

-110

1

10

210

Time: BhabhaCPU (+GST)CPUGPU (+GST)GPU

Time: Bhabha

S. Y. Jun @GPU in HEP 9/12/2014 27