Generation of planar radiographs from 3D anatomical models using the GPU

André dos Santos Cardoso
Supervisor: Jorge M. G. Barbosa

University of Porto
Faculty of Engineering of the University of Porto

11th February, 2011

André Cardoso, andre.cardoso@fe.up.pt, DRR Synthesis Algorithms

Contents

Introduction and Context

CUDA Platform

Input Data

Pre-Processing Steps

Developed Algorithms

Conclusion

Introduction and Context

• Digitally Reconstructed Radiographs (DRRs)
• Artificial radiographs taken from vertebrae models

Figure: L3 Vertebra, frontal DRR
Figure: L3 Vertebra, lateral DRR

DRRs – Why?

• Shape recovery of the human spine
  ◦ Hundreds of DRRs per second
• Scoliosis evaluation
  ◦ Alternative to MRIs and CTs

Project’s Objective

Build Fast DRR Algorithms
• Common bottleneck!
  ◦ Applications in the medical area demand high throughput
• Take advantage of new GPUs and APIs
  ◦ Common workstations could do the job!

Existing Solution – GLSL

• GLSL implementation – a working multi-pass solution
• Based on depth peeling – Cass Everitt, Interactive Order-Independent Transparency
• Let’s try to enhance its performance!

Algorithm Concepts

Figure: X-ray source, object, and image plane. Problem: potential artifact generation!

• Each ray traverses the object
  ◦ Energy is attenuated
  $\mathrm{PixelColor} = \exp\big((\lVert P_2 - P_1 \rVert + \lVert P_4 - P_3 \rVert) \times \mathrm{AttenuationFactor}\big)$
• Common edges may lead to artifact generation!
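The attenuation rule can be sketched as below, in Python so that it runs without a GPU. The function names are hypothetical, the ray direction is assumed to be unit length (so ray parameters equal distances to the source), and AttenuationFactor is assumed to be negative so that thicker material gives a smaller value; the slides do not state its sign.

```python
import math

def path_length_inside(hits):
    """Sum the entry/exit segment lengths along one ray.

    `hits` holds the distance from the source to each intersection.
    An even count is assumed: every entry has a matching exit.
    """
    ordered = sorted(hits)
    if len(ordered) % 2 != 0:
        raise ValueError("expected an even number of intersections")
    return sum(ordered[i + 1] - ordered[i] for i in range(0, len(ordered), 2))

def pixel_color(hits, attenuation_factor):
    # Assumption: attenuation_factor < 0, so longer paths darken the pixel.
    return math.exp(path_length_inside(hits) * attenuation_factor)

# A ray crossing the object twice: P1..P4 at distances 2, 5, 7 and 8.
length = path_length_inside([2.0, 5.0, 7.0, 8.0])  # (5 - 2) + (8 - 7)
color = pixel_color([2.0, 5.0, 7.0, 8.0], -0.5)
```

A ray that grazes an edge shared by two triangles may report that intersection twice or not at all, which breaks the entry/exit pairing; this is one way the artifacts mentioned above can arise.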


CUDA Platform

• Compute Unified Device Architecture
  ◦ Parallel computing architecture
  ◦ Exposes GPU functions and memory
  ◦ SIMT execution model
  ◦ Allows hierarchical configuration of threads
• Cheap threads, dozens/hundreds of cores
  ◦ Thousands of concurrent threads!
• GeForce GT 240
  ◦ 96 cores
  ◦ 12288 active threads
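As a rough illustration of the hierarchical thread configuration, the sketch below reproduces in Python the index arithmetic a CUDA kernel typically uses to map a (block, thread) pair to a pixel. The 16 × 16 block size and the helper names are assumptions; the 266 × 138 image size is the one quoted later in these slides.

```python
def grid_size(width, height, block=(16, 16)):
    """Blocks needed per axis to cover the image (ceiling division)."""
    return ((width + block[0] - 1) // block[0],
            (height + block[1] - 1) // block[1])

def global_index(block_idx, thread_idx, block=(16, 16)):
    """Same arithmetic as blockIdx * blockDim + threadIdx in a CUDA kernel."""
    return (block_idx[0] * block[0] + thread_idx[0],
            block_idx[1] * block[1] + thread_idx[1])

def inside(pixel, width, height):
    """Guard for threads of the last blocks that fall outside the image."""
    return pixel[0] < width and pixel[1] < height

grid = grid_size(266, 138)
launched = grid[0] * grid[1] * 16 * 16   # threads actually launched
last_pixel = global_index((16, 8), (9, 9))
```

Because the grid is rounded up to whole blocks, slightly more threads are launched than there are pixels, and the surplus threads simply exit after the bounds check.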

CUDA Platform – Threading and Memory

Input Data

Inputs for Our Algorithms

• Geometry file – the vertebrae models

• Camera Calibration Matrix

Figure: Pinhole Model

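$$
\begin{bmatrix} \alpha_u & \lambda & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}
= C \cdot P \cdot K
$$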
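A minimal sketch of how such a calibration matrix might be assembled and applied to a 3D point is given below. It uses plain Python lists rather than a matrix library; the function names and the sample parameter values are hypothetical, and the factor names C, P and K follow the slide.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def camera_matrix(alpha_u, alpha_v, skew, u0, v0, f, rotation, translation):
    c = [[alpha_u, skew, u0], [0.0, alpha_v, v0], [0.0, 0.0, 1.0]]
    p = [[f, 0.0, 0.0, 0.0], [0.0, f, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    # K stacks [R | t] on top of the row [0 0 0 1].
    k = [list(rotation[i]) + [translation[i]] for i in range(3)]
    k.append([0.0, 0.0, 0.0, 1.0])
    return matmul(matmul(c, p), k)

def project(m, point):
    """Apply the 3x4 matrix to a 3D point and divide by the last coordinate."""
    homogeneous = list(point) + [1.0]
    x, y, w = (sum(m[i][j] * homogeneous[j] for j in range(4)) for i in range(3))
    return (x / w, y / w)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
m = camera_matrix(1.0, 1.0, 0.0, 100.0, 50.0, 2.0, identity, [0.0, 0.0, 0.0])
pixel = project(m, [1.0, 2.0, 4.0])
```

With an identity rotation and zero translation the result reduces to u = f·X/Z + u0 and v = f·Y/Z + v0, which is an easy case to check by hand.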

Pre-Processing Steps

1. 2D Bounding Box
2. (Projection Source)
3. Ray Direction (for each pixel)
   ◦ R(t) = O + tD
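The steps above can be sketched as follows. This is only an illustration under assumptions: the vertices are taken as already projected to image coordinates, each pixel is assumed to have a known 3D position on the image plane, and all function names are hypothetical.

```python
import math

def bounding_box_2d(projected):
    """Smallest integer pixel rectangle containing all projected vertices."""
    us = [p[0] for p in projected]
    vs = [p[1] for p in projected]
    return (math.floor(min(us)), math.floor(min(vs)),
            math.ceil(max(us)), math.ceil(max(vs)))

def ray_direction(source, pixel_position):
    """Unit vector D pointing from the projection source O to the pixel."""
    d = [p - s for p, s in zip(pixel_position, source)]
    norm = math.sqrt(sum(c * c for c in d))
    return [c / norm for c in d]

def point_on_ray(origin, direction, t):
    """R(t) = O + tD."""
    return [o + t * d for o, d in zip(origin, direction)]

box = bounding_box_2d([(1.2, 3.7), (4.5, 0.1), (2.0, 2.0)])
direction = ray_direction([0.0, 0.0, 0.0], [0.0, 3.0, 4.0])
reached = point_on_ray([0.0, 0.0, 0.0], direction, 5.0)
```

Restricting later work to the bounding box means no rays are cast for pixels the model cannot possibly cover.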


Developed Algorithms

Image Order Approach

• Ray Casting!

1 Thread for Each Pixel
• Thread ⇐⇒ Ray
• Thread loops over ALL triangles
  ◦ Tests intersections between ray and triangle
  ◦ Accumulates distances to source along ray path
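A sequential Python sketch of what one such thread does is shown below. The slides do not say which ray/triangle test was used; the Möller-Trumbore test here is a stand-in chosen for brevity, and all names are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test; returns the ray parameter t, or None on a miss."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:              # ray parallel to the triangle plane
        return None
    s = sub(origin, v0)
    u = dot(s, p) / det
    q = cross(s, e1)
    v = dot(direction, q) / det
    if u < 0.0 or v < 0.0 or u + v > 1.0:
        return None                 # hit point lies outside the triangle
    t = dot(e2, q) / det
    return t if t > eps else None

def cast_ray(origin, direction, triangles):
    """One 'thread': test every triangle, then sum entry/exit distances."""
    hits = sorted(t for t in (intersect(origin, direction, tri)
                              for tri in triangles) if t is not None)
    return sum(hits[i + 1] - hits[i] for i in range(0, len(hits) - 1, 2))

front = ((-1.0, -1.0, 1.0), (2.0, -1.0, 1.0), (-1.0, 2.0, 1.0))
back = ((-1.0, -1.0, 3.0), (2.0, -1.0, 3.0), (-1.0, 2.0, 3.0))
thickness = cast_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), [front, back])
```

Every ray runs the full test against every triangle, so the cost grows with pixels × triangles even though most tests miss.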

Image Order Approach – Problems

1. Many threads looping over many triangles
2. Useless intersection tests – heavy operations!
3. Artifacts – hard to take care of!

L3 Vertebra Model
• 776 vertices, 1552 triangles
• PA perspective: 266 × 138 pixels = 36708 threads

Image Order Approach – Results

• L3 vertebra model
• PA camera – 265 × 137 pixels
• GPU time only!
• Incomplete implementation

Object Order Approach

• Ray Casting!
• Threads spawned for each triangle
  ◦ Reverse the approach of the former algorithm!

1 Thread for Each Triangle
• Thread loops over each pixel covered by the triangle bounding box
  ◦ Tests intersections between ray and triangle
  ◦ Accumulates distances to source along ray path
• Concurrency problems!

Object Order Approach – Problems

1. Concurrency problems on pixel data.
   ◦ Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
2. Still many intersection tests
3. Artifacts still hard to avoid or correct

Figure: Concurrent threads writing to the pixel buffer; each reserves a slot with int index = atomicInc(sharedCounter); (problem 1, concurrency problems on pixel data).
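The slot-reservation idea can be imitated on the CPU as in the sketch below. The `AtomicCounter` class is a hypothetical stand-in for CUDA's atomicInc built on a lock, and the wrap-around behaviour of the real instruction is left out.

```python
import threading

class AtomicCounter:
    """Stand-in for atomicInc: return the old value, then increment."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_inc(self):
        with self._lock:
            old = self._value
            self._value += 1
            return old

def append_fragments(buffer, counter, values):
    for value in values:
        index = counter.fetch_inc()   # each call receives a unique slot
        buffer[index] = value

def run(num_threads=8, per_thread=100):
    buffer = [None] * (num_threads * per_thread)
    counter = AtomicCounter()
    workers = [threading.Thread(target=append_fragments,
                                args=(buffer, counter,
                                      [(w, i) for i in range(per_thread)]))
               for w in range(num_threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return buffer

filled = run()
```

Since no two threads ever receive the same index, the writes to the buffer never collide, although the order in which values land is not deterministic.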

Object Order Approach – Results

• L3 vertebra model
• PA camera – 265 × 137 pixels
• GPU time only!
• Incomplete implementation

Multi-depth Approach - Principle

Assume a Simplification
• Discard the Euclidean distance between intersections!
• Consider only the distance between fragments, along the depth axis!

Figure: Source, with fragments P’1 and P’2.
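To make the simplification concrete, the sketch below compares the Euclidean distance between two intersection points with the difference of their depth components. Taking z as the depth axis is an assumption, as are the function names.

```python
import math

def euclidean_thickness(p1, p2):
    """Distance between two intersections, as used by the ray casters."""
    return math.dist(p1, p2)

def depth_thickness(p1, p2):
    """Multi-depth simplification: keep only the depth (z) component."""
    return abs(p2[2] - p1[2])

# Two points on an oblique ray through the origin.
oblique = ((0.0, 3.0, 4.0), (0.0, 6.0, 8.0))
# Two points on a ray parallel to the depth axis.
axial = ((1.0, 1.0, 4.0), (1.0, 1.0, 8.0))

oblique_exact = euclidean_thickness(*oblique)
oblique_approx = depth_thickness(*oblique)
axial_exact = euclidean_thickness(*axial)
axial_approx = depth_thickness(*axial)
```

For a ray parallel to the depth axis the two measures agree; for oblique rays the depth difference underestimates the path by the cosine of the angle between the ray and the depth axis.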

Multi-depth Approach - Pipeline

• Rasterization done using the scanline + Bresenham algorithm
  ◦ Filling convention avoids artifacts!
• Interpolation in integer interval
  ◦ $\mathrm{Depth} = \frac{Z - Z_{min}}{Z_{max} - Z_{min}} \times \mathrm{INT\_MAX}$
• Saving depth in the pixel array raises concurrency problems (again)!
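The depth mapping above might look like the following in Python. `INT_MAX` is assumed to be the 32-bit signed maximum, and the function name is hypothetical.

```python
INT_MAX = 2**31 - 1  # assumption: 32-bit signed integer depths

def quantize_depth(z, z_min, z_max):
    """Map z in [z_min, z_max] linearly onto the integers 0..INT_MAX."""
    return int((z - z_min) / (z_max - z_min) * INT_MAX)

near = quantize_depth(10.0, 10.0, 50.0)
far = quantize_depth(50.0, 10.0, 50.0)
middle = quantize_depth(30.0, 10.0, 50.0)
```

Integer depths matter here because the atomic minimum used in the next step operates on integers. Python floats are double precision; a single-precision implementation would need care near the top of the range, since INT_MAX is not exactly representable as a 32-bit float.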

Multi-depth Approach - Depth Array Ordering

atomicMin inserts in the right place

initializeDepthArrays(MAX_INTEGER)
Znew ← interpolateDepth()
for i = 0 to DEPTH_ARRAY_SIZE − 1 do
    Zold ← atomicMin(&(getPixelDepthArray(u, v, i)), Znew)
    if Zold == MAX_INTEGER then
        break
    end if
    Znew ← max(Znew, Zold)
end for

• Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
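The insertion loop can be traced with a sequential Python emulation. `atomic_min` below is a hypothetical stand-in for the CUDA intrinsic (it stores the minimum and returns the previous value) and gives no real atomicity; it only shows why the array ends up sorted.

```python
MAX_INTEGER = 2**31 - 1

def atomic_min(array, i, value):
    """Sequential stand-in for atomicMin: store the minimum, return the old value."""
    old = array[i]
    array[i] = min(old, value)
    return old

def insert_depth(depth_array, z_new):
    for i in range(len(depth_array)):
        z_old = atomic_min(depth_array, i, z_new)
        if z_old == MAX_INTEGER:      # slot was empty: value stored, done
            break
        z_new = max(z_new, z_old)     # carry the larger value to the next slot

depths = [MAX_INTEGER] * 4            # per-pixel depth array, initially empty
for z in (30, 10, 20):
    insert_depth(depths, z)
```

Each slot keeps the smaller of the two candidates and pushes the larger one onward, so the array stays in ascending order after every insertion. If the array is already full, the largest value is dropped.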

Multi-depth Approach - Results
• Best time:
  ◦ 202 × 132 pixels
  ◦ GPU + CPU time!
  ◦ Performance with and without DRR transfer to host!

BETTER!

Multi-depth Optimization

• Multi-depth allows for an ordered set of depths
  ◦ More depths =⇒ more atomicMin() calls

We can postpone depth ordering...

index ← atomicInc(&counter, INT_MAX)
depthArray[index] ← Znew    // RAW-hazard free!

• depthArray has all the depth values
  ◦ Ordering can be done in a post-processing kernel!
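A sequential sketch of the deferred scheme follows: depths are appended in arrival order and sorted only in a later pass. The one-element list used as a counter stands in for atomicInc, and the pairing of sorted depths into entry/exit segments is an assumption carried over from the attenuation rule earlier in the slides; all names are hypothetical.

```python
def append_depth(depth_array, counter, z_new):
    """Rasterization pass: reserve the next free slot and store the depth."""
    index = counter[0]        # stand-in for atomicInc(&counter, INT_MAX)
    counter[0] += 1
    depth_array[index] = z_new

def resolve_pixel(depth_array, count):
    """Post-processing pass: sort, then sum entry/exit depth differences."""
    ordered = sorted(depth_array[:count])
    return sum(ordered[i + 1] - ordered[i] for i in range(0, count - 1, 2))

depth_array = [0] * 8
counter = [0]
for z in (30, 10, 25, 12):    # fragments arrive in arbitrary order
    append_depth(depth_array, counter, z)

thickness = resolve_pixel(depth_array, counter[0])
```

The rasterization pass now costs one atomic operation per fragment instead of one per occupied slot, and the sorting is moved to a pass where each pixel can be handled independently.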

Figure: Concurrent threads writing to the pixel buffer; each reserves a slot with int index = atomicInc(sharedCounter);

Multi-depth Optimization – Results

• A-buffer scheme versus GLSL solution
• 202 × 132 pixels
• Better than the current solution

Conclusion

• CUDA implementations for DRR extraction
  ◦ Both pre-processing and main computation tasks
  ◦ Artifact-free
• Single geometry pass
• Shared memory model
  ◦ May be adapted to other technologies
• Final implementation shows better performance than GLSL

Future Work

There’s a Big Chart to Fill Up...

• Still some artifacts
• Memory operation optimizations
• Comparisons with other implementations and other geometry models
• Build a DRR generation library
  ◦ Possibly an open-source project
• Participation in IJUP’11
• Paper preparation for VIPIMAGE 2011. Abstract deadline: 15th March.

Thank You for Listening!
Ask Away!
