Post on 22-Nov-2014

Generation of planar radiographs from 3D anatomical models using the GPU

André dos Santos Cardoso
Supervisor: Jorge M. G. Barbosa

University of Porto
Faculty of Engineering of University of Porto

11th February, 2011

André Cardoso andre.cardoso@fe.up.pt DRR Synthesis Algorithms

Contents

Introduction and Context

CUDA Platform

Input Data

Pre-Processing Steps

Developed Algorithms

Conclusion

DRRs

• Digitally Reconstructed Radiographs – DRRs
• Artificial radiographs taken from vertebrae models

Figure: L3 Vertebra, frontal DRR
Figure: L3 Vertebra, lateral DRR

DRRs – Why?

• Shape recovery of human spine
  ◦ 100s of DRRs per second
• Scoliosis Evaluation
  ◦ Alternative to MRIs and CTs

Project’s Objective

Build Fast DRR Algorithms
• Common bottleneck!
  ◦ Applications in the medical area demand high throughput
• Take advantage of new GPUs and APIs
  ◦ Common workstations could do the job!

Existing Solution – GLSL

• GLSL implementation – multi-pass working solution
• Depth Peeling Based – Cass Everitt, Interactive Order-Independent Transparency
• Let’s try to enhance its performance!

Algorithm Concepts

Figure: X-ray source, object and image plane; a ray crosses the object surface at P1, P2, P3 and P4. Second panel: Problem! Potential artifact generation!

• Each ray traverses the object
  ◦ Energy is attenuated
  ◦ PixelColor = exp((||P2 − P1|| + ||P4 − P3||) × AttenuationFactor)
• Common edges may lead to artifact generation!
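The attenuation rule on this slide can be sketched in a few lines. This is only an illustration in Python, not the presenter's code: the names `distance` and `pixel_color`, the pairing of consecutive hits as (entry, exit), and the negative sample value of the attenuation factor are all assumptions. A negative factor is assumed so that longer paths inside the object give darker pixels.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pixel_color(hits, attenuation_factor):
    """hits: intersection points ordered along the ray, [P1, P2, P3, P4, ...].

    Consecutive pairs (P1, P2), (P3, P4) are treated as entry/exit points,
    so only the segments inside the object add to the path length.
    """
    inside = sum(distance(hits[i], hits[i + 1])
                 for i in range(0, len(hits) - 1, 2))
    return math.exp(inside * attenuation_factor)

# A ray along +z: enters at z=1, leaves at z=3, re-enters at z=5, leaves at z=6.
hits = [(0, 0, 1), (0, 0, 3), (0, 0, 5), (0, 0, 6)]
color = pixel_color(hits, attenuation_factor=-0.5)  # hypothetical factor
```

The pairing only works when the number of hits is even. A ray that grazes an edge shared by two triangles may be counted twice or not at all, which is one plausible way the artifacts mentioned above can appear.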

CUDA Platform

• Compute Unified Device Architecture
  ◦ Parallel Computing Architecture
  ◦ Exposes GPU functions and memory
  ◦ SIMT execution model
  ◦ Allows hierarchical configuration of threads
• Cheap threads, dozens/hundreds of cores
  ◦ Thousands of concurrent threads!
• GeForce GT 240
  ◦ 96 cores
  ◦ 12288 active threads
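The hierarchical thread configuration can be emulated sequentially to show how a grid of blocks maps onto image pixels. The sketch below is Python rather than CUDA, and the 16 × 16 block size and the name `launch_grid` are assumptions chosen for illustration. In a real kernel the same index would come from `blockIdx`, `blockDim` and `threadIdx`.

```python
def launch_grid(width, height, block=(16, 16)):
    """Emulate a 2D launch: yield (x, y) for every in-bounds thread."""
    # Round up so the grid covers the whole image.
    grid = ((width + block[0] - 1) // block[0],
            (height + block[1] - 1) // block[1])
    for by in range(grid[1]):
        for bx in range(grid[0]):
            for ty in range(block[1]):
                for tx in range(block[0]):
                    # Same formula as blockIdx * blockDim + threadIdx.
                    x = bx * block[0] + tx
                    y = by * block[1] + ty
                    if x < width and y < height:  # guard for partial blocks
                        yield x, y

pixels = list(launch_grid(266, 138))
```

Each pixel is visited exactly once, and threads in the partially filled border blocks are masked out by the bounds check.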

CUDA Platform – Threading and Memory

Inputs for Our Algorithms

• Geometry file – the vertebrae models

• Camera Calibration Matrix

Figure: Pinhole Model

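$$
C = \begin{bmatrix} \alpha_u & \lambda & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
P = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\qquad
K = \begin{bmatrix} R & t \\ 0_3^T & 1 \end{bmatrix}
$$

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = C \, P \, K \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
$$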
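The pinhole projection s·(u, v, 1)ᵀ = C·P·K·(X, Y, Z, 1)ᵀ can be checked numerically with a small example. The sketch below uses plain Python lists. The function names and every numeric value (focal length, scale factors, principal point, identity rotation, translation) are made up for illustration and are not taken from the presentation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def project(point, alpha_u, alpha_v, skew, u0, v0, f, R, t):
    """Project a 3D world point to pixel coordinates (u, v)."""
    C = [[alpha_u, skew, u0], [0, alpha_v, v0], [0, 0, 1]]   # intrinsics
    P = [[f, 0, 0, 0], [0, f, 0, 0], [0, 0, 1, 0]]           # perspective
    K = [list(R[0]) + [t[0]],                                 # extrinsics
         list(R[1]) + [t[1]],
         list(R[2]) + [t[2]],
         [0, 0, 0, 1]]
    X = [[point[0]], [point[1]], [point[2]], [1]]
    su, sv, s = (row[0] for row in matmul(matmul(matmul(C, P), K), X))
    return su / s, sv / s   # divide out the homogeneous scale s

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
u, v = project((1, 2, 5), alpha_u=100, alpha_v=100, skew=0, u0=50, v0=40,
               f=1, R=identity, t=(0, 0, 5))
```

With these values the point sits at depth 10 in camera coordinates, so s = 10 and the projection lands on pixel (60, 60).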

Pre-Processing Steps

1. 2D Bounding Box
2. (Projection Source)
3. Ray Direction (for each pixel)
   ◦ R(t) = O + tD
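Two of the pre-processing steps above are simple enough to sketch: the 2D bounding box of the projected vertices, and the ray R(t) = O + tD through one pixel. This is a minimal Python illustration; the helper names and the idea of passing in a 3D point on the image plane for each pixel are assumptions, since the slides do not say how the ray direction is obtained.

```python
import math

def bounding_box_2d(points_2d):
    """Integer pixel box (x_min, y_min, x_max, y_max) enclosing projected vertices."""
    xs = [p[0] for p in points_2d]
    ys = [p[1] for p in points_2d]
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))

def ray_direction(source, pixel_point):
    """Unit vector D from the projection source O to a pixel's 3D position."""
    d = [p - s for p, s in zip(pixel_point, source)]
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

def point_on_ray(origin, direction, t):
    """Evaluate R(t) = O + tD."""
    return tuple(o + t * d for o, d in zip(origin, direction))

box = bounding_box_2d([(1.2, 3.7), (4.5, 0.1), (2.0, 2.0)])
D = ray_direction((0, 0, 0), (0, 3, 4))
hit = point_on_ray((0, 0, 0), D, 5)
```

Because D is normalized here, the parameter t is a distance along the ray, which is convenient when path lengths are accumulated later.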

Image Order Approach

• Ray Casting!

1 Thread for Each Pixel
• Thread ⇐⇒ Ray
• Thread loops over ALL triangles
  ◦ Tests intersections between ray and triangle
  ◦ Accumulates distances to source along ray path
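The work done by one pixel's thread in the image-order approach can be sketched sequentially. The slides do not name the intersection test, so the Möller–Trumbore test used below is an assumption, as are all function names. The sketch collects the hit distances along one ray and sums the entry-to-exit segments.

```python
EPS = 1e-9

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2):
    """Möller–Trumbore test: return the ray parameter t of the hit, or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPS:          # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0 or u > 1:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0 or u + v > 1:
        return None
    t = dot(e2, q) * inv
    return t if t > EPS else None

def path_length_inside(origin, direction, triangles):
    """One 'thread': test ALL triangles, then sum entry-to-exit distances."""
    hits = []
    for tri in triangles:
        t = ray_triangle(origin, direction, *tri)
        if t is not None:
            hits.append(t)
    hits.sort()
    return sum(hits[i + 1] - hits[i] for i in range(0, len(hits) - 1, 2))

# Two parallel triangles at z=1 and z=3 stand in for a front and a back face.
tris = [((0, 0, 1), (1, 0, 1), (0, 1, 1)),
        ((0, 0, 3), (1, 0, 3), (0, 1, 3))]
length = path_length_inside((0.2, 0.2, 0), (0, 0, 1), tris)
```

Every ray pays for a full pass over the triangle list, even when it misses the model entirely.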

Image Order Approach – Problems

1. Many threads looping over many triangles
2. Useless intersection tests – heavy operations!
3. Artifacts – hard to take care of!

L3 Vertebra Model
• 776 vertices, 1552 triangles
• PA perspective: 266 × 138 pixels = 36708 threads
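The cost of the brute-force loop follows directly from the figures on this slide. The short calculation below assumes every thread tests every triangle with no culling, which is how the approach is described here.

```python
width, height = 266, 138   # PA perspective, from the slide
triangles = 1552           # L3 vertebra model, from the slide

threads = width * height               # one thread per pixel
tests_per_drr = threads * triangles    # each thread tests every triangle
```

That is roughly 57 million ray-triangle tests for a single DRR of one vertebra.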

Image Order Approach – Results

• L3 vertebra model
• PA camera – 265 × 137 pixels
• GPU time only!
• Incomplete implementation

SLOW!

Object Order Approach

• Ray Casting!
• Threads spawned for each triangle
  ◦ Reverse the approach of the former algorithm!

1 Thread for Each Triangle
• Thread loops over each pixel covered by the triangle bounding box
  ◦ Tests intersections between ray and triangle
  ◦ Accumulates distances to source along ray path
• Concurrency problems!
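The per-triangle loop only visits the pixels under the triangle's projected bounding box. A minimal Python sketch of that pixel range follows. The function name and the clamping to the image bounds are assumptions made for illustration.

```python
import math

def triangle_pixel_range(tri_2d, width, height):
    """Pixels covered by the bounding box of a projected triangle,
    clamped to a width x height image."""
    xs = [p[0] for p in tri_2d]
    ys = [p[1] for p in tri_2d]
    x0 = max(0, math.floor(min(xs)))
    x1 = min(width - 1, math.ceil(max(xs)))
    y0 = max(0, math.floor(min(ys)))
    y1 = min(height - 1, math.ceil(max(ys)))
    # An off-screen triangle yields an empty range.
    return [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]

covered = triangle_pixel_range([(1.5, 1.5), (3.2, 1.5), (1.5, 2.9)], 10, 10)
```

Each thread would then run the ray-triangle test only for the pixels in `covered`. Several triangles can cover the same pixel, which is where the concurrency problems on pixel data come from.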

Object Order Approach – Problems

1. Concurrency problems on pixel data.
   ◦ Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
2. Still many intersection tests
3. Artifacts still hard to avoid or correct

Figure: Concurrent threads writing to the pixel buffer, each reserving a slot with
int index = atomicInc(sharedCounter);

Object Order Approach – Results

• L3 vertebra model
• PA camera – 265 × 137 pixels
• GPU time only!
• Incomplete implementation

SLOW!

Multi-depth Approach - Principle

Assume a Simplification
• Discard the Euclidean distance between intersections!
• Consider only the distance between fragments, along the depth axis!

Figure: Source; intersection points P1 and P2, their projections P’1 and P’2 onto the depth axis, and depths d1 and d2
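The simplification replaces the Euclidean length of the segment between two intersections with the difference of their depths. The sketch below compares the two for one sample segment. The function names, the choice of +z as the depth axis and the sample points are assumptions.

```python
import math

def euclidean_length(p1, p2):
    """Exact distance between two intersection points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

def depth_length(p1, p2, axis=(0, 0, 1)):
    """Distance between the two points measured along the (unit) depth axis only."""
    return abs(sum((b - a) * c for a, b, c in zip(p1, p2, axis)))

p1, p2 = (1, 0, 10), (2, 0, 14)
exact = euclidean_length(p1, p2)    # sqrt(1 + 16)
approx = depth_length(p1, p2)       # |14 - 10|
```

The depth difference never exceeds the Euclidean length, and the two agree when the ray is parallel to the depth axis.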

Multi-depth Approach - Pipeline

• Rasterization done using the Scanline + Bresenham algorithm
  ◦ Filling convention avoids artifacts :) !
• Interpolation in integer interval
  ◦ Depth = (Z − Zmin) / (Zmax − Zmin) × INT_MAX
• Saving depth in the pixel array raises concurrency problems (again)!
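The integer depth mapping can be sketched as below. This is a Python illustration, with `INT_MAX` taken as the 32-bit signed maximum and truncation used for the conversion to integer. Both are assumptions, since the slide does not state the integer width or the rounding rule.

```python
INT_MAX = 2**31 - 1  # assumed 32-bit signed maximum

def quantize_depth(z, z_min, z_max):
    """Map z in [z_min, z_max] to an integer in [0, INT_MAX]."""
    return int((z - z_min) / (z_max - z_min) * INT_MAX)

near = quantize_depth(0.0, 0.0, 10.0)
mid = quantize_depth(5.0, 0.0, 10.0)
far = quantize_depth(10.0, 0.0, 10.0)
```

Integer depths are what make atomic integer operations usable on the depth values. Note that a fragment exactly at Zmax maps to INT_MAX, which the depth-array slide appears to use as the empty-slot marker; how the original code handles that case is not stated.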

Multi-depth Approach - Depth Array Ordering

atomicMin inserts in the right place

initializeDepthArrays(MAX_INTEGER)
Znew ← interpolateDepth()
for i = 0 to DEPTH_ARRAY_SIZE − 1 do
    Zold ← atomicMin(&(getPixelDepthArray(u, v, i)), Znew)
    if Zold == MAX_INTEGER then
        break
    end if
    Znew ← max(Znew, Zold)
end for

• Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
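The atomicMin insertion loop can be traced with a sequential stand-in. The Python sketch below emulates `atomicMin` with an ordinary function, so it shows the ordering logic only, not the concurrent behaviour. `DEPTH_ARRAY_SIZE = 8` and the sample depths are hypothetical.

```python
MAX_INTEGER = 2**31 - 1
DEPTH_ARRAY_SIZE = 8  # hypothetical per-pixel capacity

def atomic_min(array, i, value):
    """Sequential stand-in for atomicMin: store the minimum, return the old value."""
    old = array[i]
    array[i] = min(old, value)
    return old

def insert_depth(depth_array, z_new):
    """Insert z_new so that depth_array stays sorted in ascending order."""
    for i in range(DEPTH_ARRAY_SIZE):
        z_old = atomic_min(depth_array, i, z_new)
        if z_old == MAX_INTEGER:     # slot was empty: value is placed, stop
            break
        z_new = max(z_new, z_old)    # carry the larger value to the next slot

depths = [MAX_INTEGER] * DEPTH_ARRAY_SIZE
for z in (5, 2, 8, 1):
    insert_depth(depths, z)
```

After the four insertions the first four slots hold 1, 2, 5, 8 and the rest still hold the sentinel. Each insertion touches one slot per value already stored at a smaller or equal index, which is why more depths mean more atomic calls.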

Multi-depth Approach - Results

• Best time:
  ◦ 202 × 132 pixels
  ◦ GPU + CPU time!
  ◦ Performance with and without DRR transfer to host!

BETTER!

Multi-depth Optimization

• Multi-depth allows for an ordered set of depths
  ◦ More depths =⇒ more atomicMin() calls

We can postpone depth ordering...

index ← atomicInc(&counter, INT_MAX)
depthArray[index] ← Znew    // RAW-hazard free!

• depthArray has all the depth values
  ◦ Ordering can be done in a post-processing kernel!
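The deferred-ordering idea can be sketched the same way: each fragment reserves the next free slot with a counter, and sorting happens once afterwards. The Python below is a sequential stand-in, so `atomic_inc` is an ordinary function rather than the CUDA intrinsic. The `Counter` class, the array size and the final thickness sum are assumptions added for illustration.

```python
class Counter:
    """Per-pixel slot counter (hypothetical helper)."""
    def __init__(self):
        self.value = 0

def atomic_inc(counter):
    """Sequential stand-in for atomicInc: return the old value, then increment."""
    old = counter.value
    counter.value += 1
    return old

def append_depth(depth_array, counter, z_new):
    index = atomic_inc(counter)      # each writer gets its own slot
    depth_array[index] = z_new       # so no two writers touch the same cell

def sort_pixel(depth_array, counter):
    """Post-processing step: order only the slots that were filled."""
    return sorted(depth_array[:counter.value])

depths, counter = [0] * 8, Counter()
for z in (5, 2, 8, 1):
    append_depth(depths, counter, z)

ordered = sort_pixel(depths, counter)
# Entry/exit pairs (1, 2) and (5, 8) give the depth travelled inside the object.
thickness = sum(ordered[i + 1] - ordered[i] for i in range(0, len(ordered) - 1, 2))
```

Each fragment now costs a single atomic operation instead of one per occupied slot, at the price of a separate sorting pass.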

Figure: Concurrent threads writing to the pixel buffer, each reserving a slot with
int index = atomicInc(sharedCounter);

Multi-depth Optimization – Results

• A-buffer Scheme Versus GLSL Solution
• 202 × 132 pixels

Better than Current Solution

Conclusion

• CUDA implementations for DRR extraction
  ◦ Both pre-processing and main computation tasks
  ◦ Artifact-free
• Single geometry pass
• Shared memory model
  ◦ May be adapted to other technologies
• Final implementation shows better performance than GLSL

Future Work

There’s a Big Chart to Fill Up...

• Still some artifacts
• Memory operation optimizations
• Comparisons with other implementations and other geometry models
• Build a DRR generation library
  ◦ possibly an open-source project
• Participation in IJUP’11
• Paper preparation for VIPIMAGE 2011. Abstract deadline: 15th March.

Thank You for Listening!
Ask Away!