
Application of High Performance Computing to Situation Awareness Simulations

Amit Majumdar

Group Leader, Scientific Computing, San Diego Supercomputer Center
Associate Professor, Dept of Radiation Oncology

University of California San Diego

Application of High Performance Computing to Near-Real Time Simulations

Outline

Academic High Performance Computing

Applications

Event-driven Science

Online Adaptive Cancer Radiotherapy

Dynamic Data Driven Image-guided Neurosurgery

Summary


Academic High Performance Computing


TeraGrid

NSF – National Science Foundation funds TeraGrid

TeraGrid – NSF-funded supercomputer centers in the US connected by high-bandwidth networks

HPC machines in the Teraflop (TF, 10^12 floating point operations/sec) to Petaflop (PF, 10^15 floating point operations/sec) range

11 Resource Providers, One Facility

NSF - TeraGrid

TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, LSU, and the National Center for Atmospheric Research.

(Map of TeraGrid resource provider and partner sites: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC-ISI, Utah, Iowa, Cornell, Buffalo, UNC-RENCI, Wisc, LSU)

Top 5 Top500 HPC Machines

(Lists of the top 5 machines from the November 2009 and November 2008 Top500 rankings)

NSF HPC Perspective – Tflop to Pflop

Track2 awards:

Two plus one – 3 awards

Track2-A/B: $30M for the machine plus ~$8-10M/year operating cost; ~500 TF – 1 PF range (peak)
―Ranger at TACC, U Texas (579 TF, ~62K cores)
―Kraken at NICS, ORNL (1 PF, ~99K cores)

Track2-D: three different machines: data intensive, experimental, grid research

Other awards for visualization and data systems

Track1 award:

One award – ~$200M
Multi-PF system with sustained PF performance on scientific applications

Event-driven Science


On-demand Earthquake-induced Ground Wave Simulation

http://shakemovie.caltech.edu/

Prof Jeroen Tromp (at Caltech when we collaborated, currently at Princeton)

Caltech’s near real time simulation of southern California seismic events using SPECFEM3D software

Simulates SoCal seismic wave propagation based upon spectral element method (SEM) – a parallel MPI code

The movies illustrate the up (red) and down (blue) velocity of Earth’s surface


Events

Every time an earthquake of magnitude > 3.5 occurs in SoCal, 1000s of seismograms (epicenter, depth, intensity) are recorded at 100s of seismic stations

Automatically collect these seismic recordings from the SCSN via internet

Subsequently simulate the seismic waves generated by the earthquake in a 3-D southern CA seismic velocity model using SCSN data

After the full 3-D wave simulation, collect the surface motion data (displacement, velocity, acceleration) and map it on top of the topography

Render the data and generate movies
Earthquake movies are approved by a geophysicist at Caltech
Movies are published within ~45 mins of the earthquake
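The slide above describes an automated pipeline: detect a qualifying event, collect SCSN recordings, run the parallel SPECFEM3D simulation, and render movies for approval. A minimal Python sketch of that orchestration follows; every function and value in it (fetch_scsn_recordings, run_specfem3d, render_movie, the file names) is a hypothetical placeholder, not the actual Caltech/SDSC code.

    # Hypothetical sketch of the event-driven shake-movie pipeline.
    # Every helper below is a placeholder standing in for the real SCSN feed,
    # the SPECFEM3D MPI run on 144 cores, and the movie renderer.
    MAG_THRESHOLD = 3.5  # only SoCal events above this magnitude are simulated

    def fetch_scsn_recordings(event_id):
        # Placeholder: would pull seismograms (epicenter, depth, intensity) over the internet.
        return f"recordings_{event_id}.dat"

    def run_specfem3d(recordings, cores=144):
        # Placeholder: would submit the parallel SEM (MPI) simulation to the on-demand queue.
        return f"surface_motion_from_{recordings}"

    def render_movie(surface_motion):
        # Placeholder: would map disp/vel/accl onto topography and render up/down-velocity frames.
        return f"movie_of_{surface_motion}.mpg"

    def handle_event(event):
        """Run the pipeline for one seismic event; the movie is published after approval."""
        if event["magnitude"] <= MAG_THRESHOLD:
            return None
        recordings = fetch_scsn_recordings(event["id"])
        surface_motion = run_specfem3d(recordings)
        return render_movie(surface_motion)

    # Example: a magnitude 4.1 event triggers the pipeline end to end.
    print(handle_event({"id": "evt001", "magnitude": 4.1}))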


On-demand HPC

Earthquakes can happen anytime
On-demand HPC resources are needed for fast simulation
Code uses 144 cores (Intel Woodcrest dual-socket dual-core, 2.3 GHz nodes) to complete simulations in about 20 mins
HPC resources set up at SDSC – called On-demand HPC
This has a special queue where Caltech shakemovie jobs can come in anytime, automatically
Batch software will kill other jobs to guarantee this job gets resources
Results are sent back to Caltech – all with no human intervention
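A minimal sketch of the preemption idea behind such a special queue: free enough cores for an incoming shakemovie job by killing preemptible batch jobs. The core counts, job list, and admit_ondemand_job function are illustrative assumptions, not the actual SDSC batch configuration.

    # Hypothetical sketch: free enough cores for an incoming on-demand job by
    # preempting lower-priority running jobs, as the slide describes (the real
    # batch software has its own, different mechanics).
    TOTAL_CORES = 512          # assumed size of the on-demand resource
    ONDEMAND_CORES = 144       # cores the shakemovie simulation needs

    def admit_ondemand_job(running_jobs):
        """running_jobs: list of dicts with 'name', 'cores', 'preemptible'."""
        free = TOTAL_CORES - sum(j["cores"] for j in running_jobs)
        killed = []
        # Preempt the smallest preemptible jobs until the on-demand job fits.
        for job in sorted(running_jobs, key=lambda j: j["cores"]):
            if free >= ONDEMAND_CORES:
                break
            if job["preemptible"]:
                killed.append(job["name"])
                free += job["cores"]
        return killed if free >= ONDEMAND_CORES else None

    # Example: two batch jobs occupy most of the machine; one is preempted.
    jobs = [{"name": "batch_A", "cores": 256, "preemptible": True},
            {"name": "batch_B", "cores": 200, "preemptible": True}]
    print(admit_ondemand_job(jobs))   # -> ['batch_B']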


Shake Movies

Implications

Emergency preparedness/response
Tsunami warning

Work is being extended to do global simulation

Event: Sun Apr 11, 2010, 16:42:07; Lat: 32.5285; Long: -115.3433


(Embedded shake-movie clips: 10602453-socalorange-small.mpg, 10602453-laorange-small.mpg)

Online Adaptive Cancer Therapy


http://radonc.ucsd.edu/Research/CART


Conventional Radiotherapy

Treatment simulation – build a virtual patient model

Treatment planning – perform a virtual treatment with a virtual machine on the virtual patient

Treatment delivery – the same treatment is repeated for many fractions

Basic assumption: the human body is a static system

Workflow: Simulation (days) → Planning (days) → Treatment → Repeat


Human Body Is A Dynamic System

(Images: tumor at Week 1 vs. Week 3; Van de Bunt et al. '06)

Tumor volume shrinkage in response to the treatment
Tumor shape deformation due to filling-state changes of neighboring organs
Relative position change between tumor and normal organs

Consequence of Patient Anatomical Variation


An optimal treatment plan may become less optimal, or not optimal at all: dose to tumor ↓, dose to normal tissues ↑

Dose to tumor ↓ → Tumor control ↓

Dose to normal tissues ↑ → Toxicity ↑

Toxicity ↑ → Prescribed tumor dose ↓ → Tumor control ↓

Solution

Develop a new treatment plan that is optimal for the patient's new geometry

Adaptive radiation therapy (ART)


Online ART

Workflow: Simulation (days) → Planning (days) → On-board Imaging → Re-planning (5-8 min) → Treatment → Repeat

On-board volumetric imaging has recently become available

Major technical obstacles for clinical realization of online ART:

Real-time re-planning

Imaging dose

Clinical workflow

Our Solution to the Real-time Re-planning Problem

Development of GPU-based computational tools


SCORE: Supercomputing On-line Re-planning Environment

Project Goal

To develop real-time re-planning tools based on GPUs

Funded by a UC Lab Research Grant

A collaboration with SDSC and Lawrence Livermore National Laboratory


Online Re-planning Process


(Workflow diagram: CBCT reconstruction and the planning CT w/ contours feed deformable image registration, producing the deformed pCT and contours; dose calculation with the beam setup produces the dose deposition coefficients and dose distribution; plan re-optimization starts from the initial plan from the treatment planning system and produces the new plan)
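A minimal Python sketch of how a driver might chain these components (CBCT reconstruction, deformable registration, dose calculation, plan re-optimization). Every function here is a hypothetical stand-in for the corresponding GPU tool described on the following slides, not the actual SCORE code.

    # Hypothetical sketch of the online re-planning chain; every function is a
    # placeholder for the corresponding GPU tool, not the real SCORE implementation.
    import numpy as np

    def reconstruct_cbct(projections):
        return projections.mean(axis=0)            # stand-in for CBCT reconstruction

    def demons_register(planning_ct, cbct):
        return np.zeros(planning_ct.shape + (3,))  # stand-in displacement field

    def fspb_dose_coefficients(deformed_ct, beam_setup):
        return np.random.rand(deformed_ct.size, len(beam_setup))  # stand-in dose matrix

    def reoptimize_plan(dose_matrix, prescription):
        # Stand-in re-optimization: least-squares beamlet weights, clipped to >= 0.
        w, *_ = np.linalg.lstsq(dose_matrix, prescription, rcond=None)
        return np.clip(w, 0, None)

    def online_replan(planning_ct, projections, beam_setup, prescription):
        cbct = reconstruct_cbct(projections)
        deformed_ct = planning_ct + demons_register(planning_ct, cbct)[..., 0]
        D = fspb_dose_coefficients(deformed_ct, beam_setup)
        return reoptimize_plan(D, prescription)    # the new plan (beamlet weights)

    # Example on made-up arrays, just to show the data flow.
    ct = np.zeros((16, 16, 16))
    plan = online_replan(ct, np.zeros((4, 16, 16, 16)),
                         beam_setup=["AP", "PA"],
                         prescription=np.full(ct.size, 2.0))
    print(plan.shape)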

Development of GPU-based Real-time Deformable Image Registration


Gu et al Phys Med Biol 55(1): 207-219, 2010

Deformable Image Registration


Morphing one image into another with correct correspondence

(Embedded video: revbrad_0001.wmv)

Deformable Image Registration with ‘Demons’


(Flowchart of the iterative Demons algorithm: starting from the static image I_s(r) and the moving image I_m(r), each iteration computes the image gradients ∇I_s(r_n) and ∇I_m(r_n), evaluates the passive and/or active demons force to obtain the moving vector dr_n, updates r_{n+1} = r_n + dr_n, compares I_s and I_m at the updated positions, and checks the stopping criteria; the gradient and force computations run on the GPU while the control flow stays on the CPU)

Gu et al Phys Med Biol 55(1): 207-219, 2010
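A minimal NumPy sketch of one passive-force Demons iteration, assuming the standard Thirion-style update (a displacement driven by the static-image gradient and the intensity difference, followed by Gaussian smoothing of the displacement field). This is an illustrative CPU version only; the GPU code of Gu et al parallelizes these voxel-wise operations in CUDA, and the array names and smoothing parameter here are chosen purely for the example.

    # Illustrative (CPU/NumPy) version of one passive-force Demons iteration.
    # The GPU implementation parallelizes these voxel-wise operations; this sketch
    # only shows the math, not the CUDA kernels.
    import numpy as np
    from scipy.ndimage import gaussian_filter, map_coordinates

    def demons_step(static, moving, disp, sigma=1.5, eps=1e-6):
        """Update a 3-component displacement field 'disp' (shape = image shape + (3,))."""
        # Warp the moving image with the current displacement field.
        grid = np.indices(static.shape).astype(float)
        warped = map_coordinates(moving, grid + np.moveaxis(disp, -1, 0), order=1)
        diff = warped - static
        grad = np.stack(np.gradient(static), axis=-1)        # static-image gradient
        denom = np.sum(grad**2, axis=-1) + diff**2 + eps      # demons denominator
        update = -diff[..., None] * grad / denom[..., None]   # passive demons force
        disp = disp + update
        # Gaussian smoothing regularizes the displacement field.
        for c in range(3):
            disp[..., c] = gaussian_filter(disp[..., c], sigma)
        return disp

    # Purely illustrative usage on a small random volume.
    static = np.random.rand(32, 32, 32)
    moving = np.roll(static, 2, axis=0)
    disp = np.zeros(static.shape + (3,))
    for _ in range(10):
        disp = demons_step(static, moving, disp)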

Results for GPU-based Demons Algorithms

Method Case 1 Case 2 Case 3 Case 4 Case 5 Average

PF 1.11/6.80 1.04 /7.18 1.36/7.39 2.51/6.49 1.84/7.24 1.57/7.02

ePF 1.10/6.82 1.00/7.20 1.32/7.42 2.42/6.56 1.82/7.08 1.53/7.02

AF 1.15/8.29 1.05/9.24 1.39/8.79 2.34/7.75 1.81/8.44 1.55/8.50

DF 1.19/7.71 1.16/8.65 1.48/8.02 2.59/8.30 1.91/8.44 1.66/8.22

aDF 1.11/8.36 1.02/8.69 1.35/8.97 2.27/7.77 1.80/8.70 1.51/8.50

IC 1.24/11.07 1.28/11.47 1.42/11.54 3.27/10.46 1.67/10.98 1.78/11.10

Values are 3D spatial error (mm) / GPU time (s); image size 256×256×100
~100x speedup compared to an Intel Xeon 2.27 GHz CPU

Development of GPU-based Real-time Dose Calculation

Gu et al Phys Med Biol 54(20):6287-97, 2009; Jia et al Phys Med Biol 2010 (in print)

Finite-size Pencil Beam (FSPB) Model

$$D^{E}_{\mathrm{FSPB}}(x,z,d)=\frac{A_E(d)}{4}\sum_{i=1}^{3}\left[\operatorname{erf}\!\left(\frac{x+a}{\sqrt{2}\,\sigma'_i}\right)-\operatorname{erf}\!\left(\frac{x-a}{\sqrt{2}\,\sigma'_i}\right)\right]\left[\operatorname{erf}\!\left(\frac{z+b}{\sqrt{2}\,\sigma'_i}\right)-\operatorname{erf}\!\left(\frac{z-b}{\sqrt{2}\,\sigma'_i}\right)\right]$$
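A small NumPy/SciPy sketch of evaluating an erf-based pencil-beam kernel of this general form on a grid. The beamlet half-widths a and b, the depth-dependent amplitude A(d), and the Gaussian widths below are made-up illustrative values, not the fitted parameters of the published FSPB model.

    # Illustrative evaluation of an erf-based finite-size pencil beam kernel.
    # All parameter values (half-widths, amplitude, sigmas) are invented for the
    # demo; the real model uses fitted, depth-dependent parameters.
    import numpy as np
    from scipy.special import erf

    def fspb_dose(x, z, d, a=0.25, b=0.25, sigmas=(0.2, 0.5, 1.0),
                  A=lambda d: np.exp(-0.05 * d)):
        """Dose at lateral positions (x, z) and depth d from one finite-size beamlet."""
        dose = np.zeros_like(x, dtype=float)
        for s in sigmas:  # sum over the Gaussian components of the lateral spread
            lat_x = erf((x + a) / (np.sqrt(2) * s)) - erf((x - a) / (np.sqrt(2) * s))
            lat_z = erf((z + b) / (np.sqrt(2) * s)) - erf((z - b) / (np.sqrt(2) * s))
            dose += lat_x * lat_z
        return A(d) / 4.0 * dose

    # Example: dose profile across a 2 cm lateral span at 5 cm depth.
    x = np.linspace(-1.0, 1.0, 101)
    z = np.zeros_like(x)
    print(fspb_dose(x, z, d=5.0).max())

In a GPU implementation, this per-voxel (or per beamlet-voxel pair) evaluation is the natural unit to parallelize across threads.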

Results for GPU-based FSPB Algorithm

Voxel size (cm³)   Beamlet size (cm²)   # Voxels (10⁶)   # Beamlets   CPU Time (sec)   GPU Time (sec)   Speedup

0.50×0.50×0.50   0.20×0.20   0.22   2500   21.22   0.06   373
0.37×0.37×0.37   0.20×0.20   0.51   2500   42.80   0.10   409
0.30×0.30×0.30   0.20×0.20   1.00   2500   78.27   0.18   419
0.25×0.25×0.25   0.20×0.20   1.73   2500   124.54   0.30   421
0.25×0.25×0.25   0.25×0.25   1.73   1600   120.14   0.29   415
0.25×0.25×0.25   0.33×0.33   1.73   900   112.78   0.27   416
0.25×0.25×0.25   0.50×0.50   1.73   400   100.77   0.24   417

~400x speedup compared to an Intel Xeon 2.27 GHz CPU
< 1 sec for a 9-field prostate IMRT plan

Monte Carlo Dose Calculation on GPU

Directly map the DPM code onto the GPU
Treat a GPU card as a CPU cluster

(Flowchart of the GPU MC workflow: transfer data to the GPU, including random-number seeds, cross sections, and pre-generated e- tracks; on each GPU thread in parallel (a) clear the local counter, (b) simulate one MC history, (c) add the dose to the global counter; repeat until a preset number of histories is reached; then transfer the dose data from the GPU back to the CPU)
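A toy serial Python analogue of this batched-histories pattern: each "thread" slot simulates one history per batch and accumulates dose into a global counter until a preset number of histories is reached. The physics here (an exponentially distributed deposition depth) is invented purely for illustration and has nothing to do with the DPM transport code.

    # Toy serial analogue of the GPU Monte Carlo loop: simulate histories in
    # fixed-size batches (one per "thread") and accumulate dose in a global counter.
    import numpy as np

    rng = np.random.default_rng(seed=12345)   # plays the role of per-thread RNG seeds
    N_THREADS = 256                           # histories simulated per batch
    TARGET_HISTORIES = 10_000                 # preset number of histories
    depth_bins = np.zeros(100)                # global dose counter (1-D depth dose)

    def simulate_one_history():
        """Toy history: a particle deposits energy at an exponentially distributed depth."""
        depth = rng.exponential(scale=20.0)   # made-up mean depth of 20 bins
        return min(int(depth), len(depth_bins) - 1)

    done = 0
    while done < TARGET_HISTORIES:            # "reach a preset # of histories?"
        for _ in range(N_THREADS):            # on the GPU these run in parallel
            dose_bin = simulate_one_history()     # (a)-(b): local counter + one history
            depth_bins[dose_bin] += 1.0           # (c): add dose to the global counter
        done += N_THREADS

    print("relative dose in first 5 bins:", depth_bins[:5] / depth_bins.sum())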

Results for GPU-based MC Dose Calculation

Case #   Source type   # of Histories   Std Dev CPU (%)   Std Dev GPU (%)   T_CPU (min)   T_GPU (min)   T_CPU/T_GPU

1   Electron   10^7   0.66   0.65   8.3   1.8   4.5
2   Photon   10^9   0.41   0.41   94   17   5.5

~5x speedup compared to an Intel Xeon 2.27 GHz CPU
< 3 min for 1% sigma for photon beams

Development of GPU-based Real-time Plan Re-optimization

Men et al Phys Med Biol 54(21):6565-6573, 2009; Men et al Phys Med Biol 2010 (under review); Men et al Med Phys 2010 (to be submitted)
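A minimal sketch of what a fluence map optimization (FMO) style re-optimization step can look like: minimize a quadratic penalty between the delivered dose D·w and the prescription/normal-tissue limits over nonnegative beamlet weights w, using projected gradient descent. The matrix sizes, penalty structure, and step size are illustrative assumptions, not the formulation used in the Men et al papers.

    # Illustrative FMO-style re-optimization by projected gradient descent.
    # D maps beamlet weights to voxel doses (the dose deposition coefficients);
    # the penalty pulls tumor voxels toward the prescription and caps normal tissue.
    # Sizes, weights, and step size are arbitrary demo values.
    import numpy as np

    rng = np.random.default_rng(0)
    n_voxels, n_beamlets = 2000, 400
    D = rng.random((n_voxels, n_beamlets)) * 0.01        # stand-in dose matrix
    is_tumor = np.zeros(n_voxels, dtype=bool)
    is_tumor[:500] = True
    prescription = 70.0                                  # Gy, tumor target (demo value)
    normal_cap = 20.0                                    # Gy, normal-tissue limit (demo value)

    def objective_grad(w):
        dose = D @ w
        g = np.zeros(n_voxels)
        g[is_tumor] = dose[is_tumor] - prescription       # quadratic pull to prescription
        over = (~is_tumor) & (dose > normal_cap)
        g[over] = dose[over] - normal_cap                 # one-sided overdose penalty
        return D.T @ g

    w = np.zeros(n_beamlets)
    step = 1e-3
    for _ in range(500):
        w = np.maximum(w - step * objective_grad(w), 0.0)  # project onto w >= 0

    dose = D @ w
    print("mean tumor dose:", dose[is_tumor].mean(), "max normal dose:", dose[~is_tumor].max())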

Results of Real-time Re-planning

We have developed GPU-based computational tools for real-time treatment re-planning

For a typical 9-field prostate case:
―The deformable registration can be done in 7 seconds
―The dose calculation takes less than 2 seconds
―The plan re-optimization takes less than 1 second (FMO), 2 seconds (DAP), or 30 seconds (VMAT)

A new plan can be developed in about 10-40 seconds

Online ART may substantially improve local tumor control while reducing normal tissue complications

Tools can be used to solve other radiotherapy problems


Dynamic Data Driven Image-guided Neurosurgery

A Majumdar1, A Birnbaum1, D Choi1, A Trivedi2, S. K. Warfield3, K. Baldridge1, and Petr Krysl2

1 San Diego Supercomputer Center University of California San Diego

2 Structural Engineering Dept University of California San Diego

3 Computational Radiology Lab Brigham and Women’s Hospital

Harvard Medical School

Grants: NSF: ITR 0427183,0426558; NIH:P41 RR13218, P01 CA67165, LM0078651, I3 grant (IBM)


Neurosurgery Challenge

Challenges:
Remove as much tumor tissue as possible
Minimize the removal of healthy tissue
Avoid the disruption of critical anatomical structures
Know when to stop the resection process

Compounded by the intra-operative brain shape deformation that happens as a result of the surgical process – the value of the preoperative plan diminishes

Important to be able to quantify and correct for these deformations while surgery is in progress by dynamically updating pre-operative images in a way that allows surgeons to react to these changing conditions

The simulation pipeline must meet the real-time constraints of neurosurgery – provide updated images approximately once per hour, within a few minutes each time, during surgery lasting 6 to 8 hours

Intraoperative MRI Scanner at BWH

Brain Shape Deformation (images: before surgery vs. after surgery)

Example of visualization: Intra-op Brain Tumor with Pre-op fMRI

Overall Process

Before image-guided neurosurgery: preoperative data acquisition → segmentation and visualization → preoperative planning of the surgical trajectory

During image-guided neurosurgery: intraoperative MRI → segmentation → registration (against the preoperative data) → surface matching → solve the biomechanical model for volumetric deformation → visualization → surgical process

Timing During Surgery

(Timeline, minutes 0-40: before surgery – preop segmentation; during surgery – intraop MRI, segmentation, registration, surface displacement, biomechanical simulation, visualization, surgical progress)

Current Prototype DDDAS Inside Hospital

Pre- and intra-op 3D MRI (once/hr) → local computer at BWH: segmentation, registration, surface matching for boundary conditions; crude linear elastic FEM solution; merge pre- and intra-op visualization → intra-op surgical decision and steering

Once every hour or two for a 6- or 8-hour surgery

Two Research Aspects

Grid architecture – grid scheduling, on-demand remote access to multi-teraflop machines, data transfer
Data transfer from BWH to SDSC, solution of the detailed advanced biomechanical model, and transfer of the results back to BWH for visualization need to be performed in a few minutes

Development of a detailed, advanced, non-linear, scalable viscoelastic biomechanical model
To capture detailed intraoperative brain deformation

End-to-end Timing of RTBM

• Timing of transferring ~20 MB files from BWH to SDSC, running simulations on 16 nodes (32 procs), transferring files back to BWH = 9* + (60** + 7***) + 50* = 124 sec.

• This shows that the grid infrastructure can provide biomechanical brain deformation simulation solutions (using the linear elastic model) to surgery rooms at BWH within ~ 2 mins using TG machines

• This satisfies the tight time constraint set by the neurosurgeons

Current and New Biomechanical Model

Current linear elastic material model – RTBM
Advanced model under development – FAMULS
The advanced model is based on a conforming adaptive refinement method – the FAMULS package (AMR)
Inspired by the theory of wavelets, this refinement produces globally compatible meshes by construction

First task was to replicate the linear elastic result produced by the RTBM code using FAMULS

Advanced Biomechanical Model

The current solver is based on the small-strain isotropic elasticity principle

The new biomechanical model will be an inhomogeneous, scalable, non-linear viscoelastic model with AMR

We also want to increase the resolution close to the level of MRI voxels, i.e. millions of finite elements

Since this complex model still has to meet the real-time constraint of neurosurgery, it requires fast access to remote multi-teraflop systems
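For reference, a textbook statement of the small-strain isotropic elastic model that the current RTBM solver is based on; the displacement boundary condition on the matched surface comes from the segmentation/registration/surface-matching steps described earlier (the Lamé parameters λ and μ are the standard material constants, not values from the project):

    % Small-strain isotropic linear elasticity (textbook form).
    % u: displacement, \varepsilon: strain, \sigma: stress, \lambda,\mu: Lame parameters.
    \begin{align}
      \varepsilon(\mathbf{u}) &= \tfrac{1}{2}\left(\nabla \mathbf{u} + \nabla \mathbf{u}^{\mathsf T}\right), \\
      \sigma &= \lambda\,\operatorname{tr}\!\big(\varepsilon(\mathbf{u})\big)\,\mathbf{I} + 2\mu\,\varepsilon(\mathbf{u}), \\
      \nabla \cdot \sigma &= \mathbf{0} \quad \text{in the brain volume}, \qquad
      \mathbf{u} = \mathbf{u}_{\text{surface}} \ \text{on the matched surface}.
    \end{align}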

Summary

HPC resources can enable near real-time simulations for various scientific, engineering, and medical applications

The architecture has to plan:
What the right HPC resources are
How to access the HPC resources
How to deal with data transfer, etc.

Overall this can facilitate:
Natural or man-made event-driven rapid response and preparedness
Adaptive simulations to provide new capability
Dynamic data driven simulations to enhance quality

