
Page 1

Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA

Jeremy Sweezy, Scientist

Monte Carlo Methods, Codes and Applications Group

3/28/2018

LA-UR-18-XXXX

Breaking Through the Barriers to GPU Accelerated Monte Carlo Particle Transport

GTC 2018

Page 2

What is Monte Carlo Particle Transport?

3/23/18 | 2 | Los Alamos National Laboratory

– Follows the path of individual particles through a system
– Uses pseudo-random numbers to sample processes
– Randomly samples physical and non-physical processes
– Attributed to Stanislaw Ulam and Enrico Fermi
– Named because Ulam had an uncle who would borrow money from relatives because he “just had to go to Monte Carlo”

FERMIAC
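The random walk described above can be illustrated with the simplest possible case: mono-energetic particles entering a 1-D slab head-on and scattering isotropically. This is a hypothetical toy model (the function name, slab geometry, and cross-section values are all assumptions), not code from MCNP or any other production code. Distances to collision are sampled as −ln(ξ)/Σ_T, and each collision is randomly resolved as absorption or scattering.

```python
import math
import random

def slab_transmission(sigma_t, sigma_s, thickness, n_histories, seed=1):
    """Estimate the fraction of particles that leak through the far face of a 1-D slab.

    Hypothetical toy model: mono-energetic particles enter at x = 0 moving along +x,
    scatter isotropically (direction cosine mu uniform on [-1, 1]), and are absorbed
    with probability 1 - sigma_s / sigma_t at each collision.
    """
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(n_histories):
        x, mu = 0.0, 1.0
        while True:
            # Sample the distance to the next collision from an exponential distribution;
            # 1 - rng.random() lies in (0, 1], so the logarithm is always finite.
            distance = -math.log(1.0 - rng.random()) / sigma_t
            x += mu * distance
            if x >= thickness:
                transmitted += 1  # leaked through the far face
                break
            if x < 0.0:
                break  # leaked back out of the near face
            # Randomly choose absorption vs. scattering at the collision site.
            if rng.random() >= sigma_s / sigma_t:
                break  # absorbed
            mu = 2.0 * rng.random() - 1.0  # new isotropic direction cosine
    return transmitted / n_histories
```

For a purely absorbing slab (`sigma_s = 0`) the estimate should approach the uncollided transmission exp(−Σ_T·t); for example, `slab_transmission(1.0, 0.0, 1.0, 100_000)` should land close to exp(−1) ≈ 0.368, with the statistical scatter shrinking as the number of histories grows.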

Page 3

Porting to Specialized Hardware is Prohibitively Expensive

– The world’s production Monte Carlo codes have decades of development
– LANL’s MCNP code has been in development since 1977
– An equally extensive verification and validation (V&V) effort
– Codes have to run on desktop machines and supercomputers
– DOE HPC platforms have been in a state of flux for the last 10 years:

• Cell Broadband Engine
• Intel Xeon Phi (MIC)
• GPUs
• ARM???

Barrier #1: Limited Resources (Money, People, Time)

Page 4

Monte Carlo Random Walk on GPU Hardware has reached a Performance Wall

• At least six different research groups have ported the Monte Carlo random walk to GPU hardware for neutron transport

• All report results against different numbers of CPUs
• All get the same results!
• Almost all are extremely simplified
• Production codes will likely have worse performance
• What are the limitations?
– Conditional branching
– Random data access
– No small, computationally intensive kernel to accelerate

Barrier #2: Performance of random walk on GPUs

[Figure annotations: 4.5x, 3.0x]

Page 5

How do You Define Performance?

• A computer scientist might measure performance as an increase in speed.

$$P = \frac{T_{CPU}}{T_{GPU}}$$

• A Monte Carlo specialist would measure performance as a balance between speed and statistical variance, using a figure of merit (FOM)

To date, almost all GPU implementations of Monte Carlo particle transport have focused on increasing speed.

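$$\mathrm{FOM} = \frac{\sigma_{CPU}^2\, T_{CPU}}{\sigma_{GPU}^2\, T_{GPU}}$$

Example: $\mathrm{FOM} = \dfrac{0.1^2 \times 1\ \text{min}}{0.05^2 \times 2\ \text{min}} = 2$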
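The slide's FOM is the ratio of the conventional Monte Carlo figure of merit, 1/(σ²T), for the GPU run over the CPU run. The short sketch below (function names are illustrative) reproduces the slide's example and contrasts it with the pure speed ratio P: the GPU run takes twice as long, so P = 0.5, yet it halves σ, so its FOM is about twice as high.

```python
def figure_of_merit(sigma, time):
    """Conventional Monte Carlo figure of merit: 1 / (sigma^2 * T)."""
    return 1.0 / (sigma**2 * time)

def relative_fom(sigma_cpu, t_cpu, sigma_gpu, t_gpu):
    """Slide's FOM ratio: (sigma_cpu^2 * T_cpu) / (sigma_gpu^2 * T_gpu) = FOM_gpu / FOM_cpu."""
    return figure_of_merit(sigma_gpu, t_gpu) / figure_of_merit(sigma_cpu, t_cpu)

def speedup(t_cpu, t_gpu):
    """Pure speed ratio P = T_cpu / T_gpu, which ignores the statistical variance."""
    return t_cpu / t_gpu

# Slide example: the CPU run reaches sigma = 0.1 in 1 min; the GPU run reaches 0.05 in 2 min.
print(speedup(1.0, 2.0))                   # 0.5: the GPU run is slower...
print(relative_fom(0.1, 1.0, 0.05, 2.0))   # ...but its figure of merit is about 2x higher
```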

Page 6

Next-Event Estimator

• The next-event estimator calculates the probability that a particle from a source or collision event reaches a point without interacting

• Typically used for image tallies

[Figure labels: A, Cell 1, Cell 2, μ, Image Plane, B]

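$$S(R, E) = \frac{w}{2\pi R^2} \times \sum_{i=1}^{N} \frac{\sigma_i(R, E)}{\sigma_T}\, p_i(\mu, E \to E')\, \exp\left(-\int_0^R \Sigma_T(s, E')\, ds\right)$$

The exponential attenuation term is computed by ray casting.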

One to two orders of magnitude faster on GPU hardware
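The estimator above can be sketched in a one-speed setting (the energy transfer E → E′ is dropped) where the ray from the event to the detector point crosses cells of piecewise-constant Σ_T. The function names and the (Σ_T, length) segment list are hypothetical, and this is not MonteRay's implementation; the geometry-dependent part is the optical-depth sum along the ray, which is the ray-casting work the slides move to the GPU.

```python
import math

def isotropic_pdf(mu):
    """Isotropic angular distribution in mu = cos(theta), normalised to 1 over [-1, 1]."""
    return 0.5

def next_event_score(w, R, mu, reactions, segments):
    """Point-detector (next-event) contribution from one source or collision event.

    w         -- particle weight
    R         -- distance from the event to the detector point
    mu        -- cosine of the angle between the incoming direction and the detector direction
    reactions -- list of (sigma_i / sigma_T, pdf_i) pairs; each pdf_i(mu) is normalised on [-1, 1]
    segments  -- (Sigma_T, length) for each cell the ray crosses; lengths should sum to R
    """
    angular = sum(ratio * pdf(mu) for ratio, pdf in reactions)
    # Optical depth along the ray from the event to the detector: the "ray-cast" term.
    optical_depth = sum(sigma_t * length for sigma_t, length in segments)
    return w / (2.0 * math.pi * R**2) * angular * math.exp(-optical_depth)
```

For a single isotropic reaction (p(μ) = 1/2) the score reduces to w·exp(−τ)/(4πR²), so with nothing between the event and the detector, `next_event_score(1.0, 2.0, 0.3, [(1.0, isotropic_pdf)], [(0.0, 2.0)])` equals 1/(16π).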

Page 7

Traditional Track-Length Estimator

• The standard Monte Carlo fluence estimator
• Uses the sampled distance traveled in each cell as the fluence estimate
• Only contributes to cells through which the particle passes
• Easy to compute
• Nothing to accelerate on GPU

[Figure labels: Cell 1, B, Cell 2, Cell 3]

Computing has changed; we need to change our algorithms too!
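A minimal sketch of the track-length tally, assuming each flight has already been split into (cell, path length) pieces by the geometry; the function name and data layout are hypothetical. Each cell scores only the distance the particle actually travels inside it, so cells the particle never crosses receive nothing.

```python
from collections import defaultdict

def track_length_fluence(flights, volumes, n_histories):
    """Track-length estimate of the cell-averaged fluence per source particle.

    flights     -- iterable of (weight, [(cell_id, path_length), ...]) for each flight
    volumes     -- dict mapping cell_id -> cell volume
    n_histories -- number of source particles simulated
    """
    tally = defaultdict(float)
    for weight, segments in flights:
        for cell, length in segments:
            # Score weight x distance traveled inside this cell.
            tally[cell] += weight * length
    # Normalise by cell volume and number of histories.
    return {cell: score / (volumes[cell] * n_histories) for cell, score in tally.items()}
```

For example, one unit-weight flight that travels 2.0 in cell 1 (volume 4.0) and 1.0 in cell 2 (volume 2.0) gives a fluence of 0.5 in each, while cell 3 gets no score at all.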

Page 8

Volumetric-Ray-Casting Estimator

• For use in place of the traditional track-length estimator on GPU
• Multiple pseudo-rays are generated at each source and collision event
• Computationally intensive estimator with lower variance

[Figure labels: Cell 1, B, Cell 2, Cell 3]

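$$\Phi(i, E') = \frac{w}{N}\, \frac{1 - \exp\left(-\Sigma_{T,i}(E')\, l_i\right)}{\Sigma_{T,i}(E')}\, \exp\left(-\int_0^{|\mathbf{r}' - \mathbf{r}_0|} \Sigma_T(\mathbf{r} + \mathbf{\Omega}' s', E')\, ds'\right)$$

The exponential attenuation term is computed by ray casting.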

A neutron dance for a neutron fan. P.M. Dawn
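The estimator above can be sketched for a single pseudo-ray in a one-speed setting. For each cell i the ray crosses, the contribution is the weight share w/N, times the attenuation from the event to where the ray enters the cell, times (1 − exp(−Σ_{T,i} l_i))/Σ_{T,i}, which is the expected distance a particle entering the cell travels inside it before colliding or leaving. Names and the (cell, Σ_T, chord) segment format are hypothetical, and how pseudo-ray directions are chosen is not shown on the slide, so it is left out here.

```python
import math
from collections import defaultdict

def expected_path_in_cell(sigma_t, chord):
    """Expected path length inside a cell for a particle entering it:
    (1 - exp(-sigma_t * chord)) / sigma_t, or the full chord in a void."""
    if sigma_t == 0.0:
        return chord
    return -math.expm1(-sigma_t * chord) / sigma_t

def vrc_score_ray(w, n_rays, segments, tally):
    """Add one pseudo-ray's contributions to `tally` (cell_id -> weighted path length).

    segments -- ordered list of (cell_id, Sigma_T, chord) from the event outward.
    """
    tau = 0.0  # optical depth from the event to the entry point of the current cell
    for cell, sigma_t, chord in segments:
        tally[cell] += (w / n_rays) * math.exp(-tau) * expected_path_in_cell(sigma_t, chord)
        tau += sigma_t * chord
    return tally

tally = vrc_score_ray(1.0, 1, [("A", 1.0, 1.0), ("B", 1.0, 1.0), ("C", 0.0, 2.0)], defaultdict(float))
```

Unlike the track-length tally, every cell along the ray receives a deterministic share of the score, which is where the lower variance comes from; dividing by cell volume and the number of histories turns these scores into fluence estimates.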

Page 9

MonteRay - Accelerating Monte Carlo Transport with GPU Ray Tracing

• MonteRay – a library for accelerating Monte Carlo tallies with GPUs
• The random walk is maintained on the CPU
• Ray-casting-based tallies are calculated on the GPU
– Next-event estimator
– Volumetric-ray-casting estimator, a new estimator designed for GPUs
– Supports neutron and photon tallies

• Can be incorporated into new and legacy Monte Carlo codes
• Uses continuous-energy cross-section data
• Single-precision ray casting
• Single-precision attenuation cross sections
• Double-precision tallies

Reduces cost of accelerating an existing Monte Carlo code with GPUs
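The CPU/GPU split above can be pictured as a buffer that the CPU random walk fills with rays and that is processed in batches. The sketch below is hypothetical (it is not MonteRay's API): `flush()` stands in for a GPU kernel launch, the geometry is reduced to a homogeneous medium so the optical depth is Σ_T·R, single precision is emulated by rounding through `struct`, and tallies accumulate in Python's double-precision floats.

```python
import math
import struct

def to_float32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

class RayBuffer:
    """Hypothetical CPU-side ray buffer for next-event tallies.

    The random walk calls add() for each ray; when the buffer is full, flush()
    processes the whole batch, standing in for one GPU kernel launch.
    """

    def __init__(self, capacity, n_detectors):
        self.capacity = capacity
        self.rays = []                     # pending (detector, factor, R, sigma_t) tuples
        self.tally = [0.0] * n_detectors   # double-precision tallies

    def add(self, detector, factor, R, sigma_t):
        """Queue a ray; factor = w * sum_i (sigma_i / sigma_T) p_i(mu), assumed computed on the CPU."""
        self.rays.append((detector, factor, R, sigma_t))
        if len(self.rays) >= self.capacity:
            self.flush()

    def flush(self):
        """Process all pending rays in one batch."""
        for detector, factor, R, sigma_t in self.rays:
            # Single-precision "ray cast" through a homogeneous medium: tau = sigma_t * R.
            tau = to_float32(to_float32(sigma_t) * to_float32(R))
            attenuation = to_float32(math.exp(-tau))
            contribution = to_float32(factor * attenuation / (2.0 * math.pi * R * R))
            self.tally[detector] += contribution  # accumulate in double precision
        self.rays.clear()
```

Batching is the design point: the CPU keeps running the random walk while whole buffers of independent rays are handed off, so the GPU always sees large, uniform workloads rather than one ray at a time.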

Page 10

MonteRay - Testing

• Tests use:
– GeForce GTX Titan X GPU with NVIDIA Maxwell architecture
– 2 CPUs (Intel Haswell E5-2660 v3 at 2.60 GHz), with 10 cores each

• MonteRay linked with LANL’s C++ Monte Carlo code MCATK
• MCATK uses MPI parallelism, building shared ray buffers with MPI-3 shared memory
• 3-D Cartesian structured mesh geometry
• 2 tests measured the performance of the Next-event estimator
• 4 tests measured the performance of the Volumetric-ray-casting estimator
• Volumetric-ray-casting estimator performance on the GPU compared to Track-length estimator performance on the CPU
• Performance measured relative to a baseline of 8 CPU cores

Page 11

Testing the Next-Event Estimator on GPU Hardware: Two Radiography Tests

Page 12

MonteRay – Medical X-Ray Imaging Simulation

• 50-keV X-ray beam
• 0.12-mm spot size
• Radiograph computed with the Next-Event Estimator
• Simulation useful for designing a collimator that minimizes the scattered contribution

Page 13

MonteRay – Medical X-Ray Imaging Simulation

• Source and collided contributions calculated separately

• Source contribution relatively easy to calculate

• Collided contribution important for collimator design

• Collided performance 15-18x

[Figure annotations: 14.5x, 15.3x]

Page 14

MonteRay – Industrial Radiography

• Simulated a physical test object used at Los Alamos’ Dual Axis Radiographic Hydrodynamic Test Facility

• Used a 4-MeV monoenergetic X-ray beam
• 100 x 100 image grid (10,000 estimators) to simulate the image detector
• Calculating the scatter component is needed to design the collimators and the experiment, but is too computationally expensive

I'm a peeping-tom techie with x-ray eyes – Patrick Lee MacDonald
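The 100 x 100 image grid means every collision is scored against 10,000 point estimators, each needing its own independent ray. The sketch below shows that fan-out for an isotropic collision; the detector size, distance, and the single homogeneous medium between collision and detector are all hypothetical simplifications of the real radiography geometry.

```python
import math

def score_collision_on_grid(w, collision, n_pixels, width, plane_z, sigma_t):
    """Isotropic next-event contributions from one collision to every pixel of an image plane.

    Hypothetical set-up: an n_pixels x n_pixels detector of side `width`, centred on the
    z axis in the plane z = plane_z, with one homogeneous medium of total cross section
    sigma_t between the collision and the detector. Returns a list of rows of scores.
    """
    cx, cy, cz = collision
    pitch = width / n_pixels
    image = []
    for iy in range(n_pixels):
        y = -0.5 * width + (iy + 0.5) * pitch
        row = []
        for ix in range(n_pixels):
            x = -0.5 * width + (ix + 0.5) * pitch
            R = math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (plane_z - cz) ** 2)
            # Isotropic p(mu) = 1/2, so w / (2 pi R^2) * 1/2 = w / (4 pi R^2).
            row.append(w * math.exp(-sigma_t * R) / (4.0 * math.pi * R * R))
        image.append(row)
    return image

# One collision at the origin scores all 100 x 100 = 10,000 pixel estimators.
image = score_collision_on_grid(1.0, (0.0, 0.0, 0.0), 100, 10.0, 100.0, 0.01)
```

Every pixel's ray is independent of every other, which is why this workload maps naturally onto many GPU threads, whereas a serial CPU loop pays for all 10,000 rays at every collision.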

Page 15

MonteRay – Industrial Radiography

[Figure: GPU Performance vs Number of CPU Cores. Relative performance (axis ticks 10 and 100) vs. number of CPU cores per GPU (0 to 20) for the Source and Collided contributions; annotated values 28.5x and 24.2x.]

Collided calculation performance 15-32x!

Page 16

Volumetric-Ray-Casting Estimator on GPU Hardware vs. Track-Length Estimator on CPU Hardware

Page 17

Cancer Treatment Simulation

• 2-MeV photon beam (peak of the 6-MV medical accelerator photon spectrum)
• 1-cm beam radius

[Figure labels: Tumor, 2-MeV Photon Beam]

What is the dose to healthy tissue?

[Figure: GPU Performance vs 8 CPU Cores]

14x performance improvement in healthy tissue

Page 18

Cancer Treatment Simulation

[Figure: GPU Performance vs Number of CPU Cores in Healthy Tissue; annotated values 14.3x and 10.2x]

Performance is 14x vs 8 CPU cores or 10x vs 12 CPU cores

Page 19

Pressurized Water Reactor Assembly Simulation

• 16x16 fuel assembly
• Performance: 7.5x in the control rods, 5x in the fuel, and 4.5x in the coolant

[Figure: GPU Performance vs 8 CPU Cores; labels: Control Rod, Fuel Pin]

Page 20

Pressurized Water Reactor Assembly Simulation

[Figure: GPU Performance vs Number of CPU Cores; annotated values 7.2x, 5.4x, 6.0x, and 4.4x]

Compared to 8 CPU cores, performance is 7.2x in the control rod and 6.0x in the fuel

Page 21

Criticality Accident Simulation

• Critical uranium sphere in the corner of a concrete room
• Concrete floor, walls, ceiling, and 4 concrete pillars

[Figure: GPU Performance vs 8 CPU Cores; label: Uranium Sphere]

Performance increase of 14-16x in the center of the room

Page 22

Criticality Accident Simulation – Smoother Fluence Estimate

[Figure panels: Track-Length Estimator; Volumetric-Ray-Casting Estimator]

Page 23

Criticality Accident Simulation

[Figure: GPU Performance vs Number of CPU Cores; annotated values 15x and 10.5x]

Things are going great, and they’re only getting better – Patrick Lee MacDonald

Page 24

Reflected Godiva Criticality Experiment Simulation

• U-235 sphere reflected by water
• Performance improvement:
– 2.5x in the core
– 1.0x in the water

[Figure: GPU Performance vs 8 CPU Cores]

Page 25

Reflected Godiva Criticality Experiment Simulation

• Variance of the Volumetric-Ray-Casting estimator approaches that of the Track-Length estimator in strongly scattering material.

[Figure: Variance Ratio vs Num. Collisions. Variance ratio $\sigma_{TL}^2 / \sigma_{VRC}^2$ (axis 1 to 4.5) vs. number of samples per collision $N$ (1 to 20).]

Performance is limited by the estimator variance, not the GPU speed.

[Figure: GPU Performance vs. Num. CPU Cores; annotated values 2.2x and 2.2x]
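The FOM ratio defined on the performance slide factors into a variance term and a speed term, which makes the link between this slide's variance ratio and the overall gain explicit. The rearrangement below only relabels that formula, with TL for the track-length run on the CPU and VRC for the volumetric-ray-casting run on the GPU (labels taken from the variance-ratio axis); here σ² denotes the variance of each run's final tally.

$$\mathrm{FOM} = \frac{\sigma_{TL}^2\, T_{CPU}}{\sigma_{VRC}^2\, T_{GPU}} = \underbrace{\frac{\sigma_{TL}^2}{\sigma_{VRC}^2}}_{\text{variance ratio}} \times \underbrace{\frac{T_{CPU}}{T_{GPU}}}_{P}$$

For runs of equal wall-clock time, $T_{CPU} = T_{GPU}$ and the gain reduces to the variance ratio alone, so when the two estimators' variances converge the FOM gain shrinks toward 1 no matter how fast the ray casting itself runs.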

Page 26

Conclusions

• MonteRay provides a low-cost method of providing GPU-accelerated Monte Carlo particle transport
– Can be incorporated into legacy codes at low cost
– Works with standard variance reduction methods

• Performance improvements of MonteRay are significant:
– Up to 32 times for the Next-event estimator as compared to 8 CPU cores
– Up to 14 times for the Volumetric-ray-casting estimator as compared to the Track-Length estimator on 8 CPU cores

MonteRay provides a method of breaking through the barriers of limited resources and limited performance

Page 27

Questions?

Jeremy Sweezy

Page 28

Extra

Page 29

Uncertainty - Pressurized Water Reactor Assembly Simulation

[Figure panels: Volumetric-Ray-Casting Estimator; Track-Length Estimator]

• Volumetric-Ray-Casting Estimator: 600 sec., 8 CPU cores and 1 GPU, 93 cycles, 40,000 particles/cycle, 8 rays/collision
• Track-Length Estimator: 600 sec., 8 CPU cores, 124 cycles, 40,000 particles/cycle
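The run parameters on this slide allow a quick check of how much work each 600-second run completed. The arithmetic below uses only the cycle counts and particles per cycle given here; the dictionary keys are illustrative labels.

```python
# Run parameters from the slide: both runs last 600 s with 40,000 particles per cycle.
PARTICLES_PER_CYCLE = 40_000
RUN_SECONDS = 600.0
cycles = {"vrc": 93, "tl": 124}  # vrc: 8 CPU cores + 1 GPU; tl: 8 CPU cores only

histories = {run: n * PARTICLES_PER_CYCLE for run, n in cycles.items()}
rates = {run: h / RUN_SECONDS for run, h in histories.items()}

for run in cycles:
    print(f"{run}: {histories[run]:,} histories, {rates[run]:,.0f} histories/s")
print(f"history ratio vrc/tl: {histories['vrc'] / histories['tl']:.2f}")
```

The ray-casting run completes 3,720,000 histories versus 4,960,000 for the track-length run, a ratio of 0.75. Because the run times are equal, the FOM ratio reduces to the ratio of final tally variances, so any gain shown by the ray-casting run has to come from its lower per-history variance, which must more than offset the 25% fewer histories.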