

GPUs Open New Avenues in Medical MRI

Chris A. Cocosco, D. Gallichan, F. Testud, M. Zaitsev, and J. Hennig

Dept. of Radiology, Medical Physics, University Medical Center Freiburg

C.A. Cocosco, GTC-2012

Our research group:

Biomedical Magnetic Resonance Imaging (MRI) @ University Medical Center Freiburg, Germany: > 50 scientists & PhD students


B0 gradients (SEMs) for spatial encoding

SEMs: spatial encoding magnetic fields

[Figure: gradient field with values -G, 0, +G; “k-space” illustration]


B0 gradients (SEMs) for spatial encoding

SEMs: spatial encoding magnetic fields

[Figure: Traditional (Linear) + Quadratic (Non-linear) fields, with values -G, 0, +G]


PatLoc:

• PatLoc = Parallel Acquisition Technique using Localized Gradients [ Hennig J. et al., MAGMA 21(1-2):5-14 (2008) ].

• has the potential to allow:

(1) higher gradient switching rates while not exceeding the Peripheral Nerve Stimulation (PNS) limits;

(2) novel encoding strategies (e.g. better suited to the anatomy).


First ever human PatLoc images:

[ Schultz G. et al., “Reconstruction of MRI Data Encoded with Arbitrarily Shaped, Curvilinear, Non-bijective Magnetic Fields”, MRM 64(5):1390-1403 (2010) ]


Why PatLoc:

TSE 256x256, TR 5000 ms, slice thickness 2 mm, acquisition time ~3 min for 5 slices.


Imaging forward model:

m = E * p

p : image [NP]

m : measured data [NT, NC]

E : encoding matrix [NT*NC, NP]

NP : number of image pixels

NT : number of measured (“k-space”) samples

NC : number of RF receive coils

Typical magnitudes:

NT, NP = 256 x 256

NC = 8


Conjugate Gradient Algorithm:

Numerically estimate an image consistent with the measured data [ Pruessmann et al., MRM 2001;46:638-651 ].

But: no gridding, no FFT!

Repeat 15…25 times:

• q = E’ * (E * p)

1. E * p

2. E’ * (E * p)

• update p
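The iteration above can be sketched as conjugate gradients on the normal equations E’ E p = E’ m, with E and E’ supplied only as functions. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `cg_normal_equations`, `forward` and `adjoint`, the stopping tolerance, and the tiny dense demo matrix are all hypothetical. Note that in this textbook form the product E’ * (E * d) is applied to the search direction d.

```python
def vdot(a, b):
    """Inner product <a, b>, conjugating the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def cg_normal_equations(forward, adjoint, m, n_pixels, n_iter=20, tol=1e-12):
    """Estimate p such that E'E p = E'm, using only E*x and E'*y products."""
    p = [0j] * n_pixels
    r = adjoint(m)                # residual of the normal equations at p = 0
    d = list(r)                   # first search direction
    rs = vdot(r, r).real
    for _ in range(n_iter):
        if rs < tol:              # residual already negligible
            break
        q = adjoint(forward(d))   # q = E' * (E * d): the two expensive steps
        alpha = rs / vdot(d, q).real
        p = [pi + alpha * di for pi, di in zip(p, d)]    # update p
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rs_new = vdot(r, r).real
        d = [ri + (rs_new / rs) * di for ri, di in zip(r, d)]
        rs = rs_new
    return p

# Tiny hypothetical demo: 4 measurements, 2 pixels, dense E.
E_demo = [[1 + 0j, 0.5j], [0.2 - 0.1j, 1 + 0j], [0.3j, -0.4 + 0j], [1 - 1j, 0.25 + 0j]]

def forward_demo(x):
    return [sum(e * xi for e, xi in zip(row, x)) for row in E_demo]

def adjoint_demo(y):
    return [sum(E_demo[r][c].conjugate() * y[r] for r in range(len(E_demo)))
            for c in range(len(E_demo[0]))]

p_true = [1 + 2j, -0.5 + 0j]
p_est = cg_normal_equations(forward_demo, adjoint_demo, forward_demo(p_true), 2)
```

Because the solver only ever calls `forward` and `adjoint`, the same loop works whether E is stored in memory or its entries are computed on demand.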


Compute-on-demand Implementation:

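E is very large, but it is fully determined by much smaller arrays: E = E ( Traj, SEM, B1map, B0map )

Traj [NT, NS]

SEM [NP, NS]

B1map [NP, NC]

B0map [NP]

where NS = number of SEMs (B0 gradients)

Compute each entry of E on demand inside the nested loops:

Foreach( NP )

    Foreach( NT )

        Foreach( NC )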

CUDA implementation:

• blocks + threads

• accumulator in shared memory + block reduce
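The nested loops can be illustrated with a plain CPU sketch that never stores E. The slides do not spell out the entry formula, so the one used here is an assumption: E[(t,c), p] = B1map[p][c] * exp(-i * (sum over s of Traj[t][s] * SEM[p][s] + B0map[p] * times[t])). The per-sample time vector `times`, the function names, and the demo values are hypothetical as well. The scalar `acc` in the adjoint plays the role of the per-pixel accumulator; how the authors map loops to CUDA blocks and threads is not shown here.

```python
import cmath

def encoding_phase(traj_t, sem_p, b0_p, time_t):
    """Assumed phase factor for one (sample, pixel) pair."""
    phase = sum(k * f for k, f in zip(traj_t, sem_p)) + b0_p * time_t
    return cmath.exp(-1j * phase)

def forward_on_demand(p, traj, sem, b1map, b0map, times):
    """m[t][c] = sum over pixels of E[(t,c), pixel] * p[pixel], E never stored."""
    n_t, n_c = len(traj), len(b1map[0])
    m = [[0j] * n_c for _ in range(n_t)]
    for ip, value in enumerate(p):                      # Foreach( NP )
        for it in range(n_t):                           # Foreach( NT )
            w = encoding_phase(traj[it], sem[ip], b0map[ip], times[it]) * value
            for ic in range(n_c):                       # Foreach( NC )
                m[it][ic] += b1map[ip][ic] * w
    return m

def adjoint_on_demand(m, traj, sem, b1map, b0map, times):
    """out[pixel] = sum over (t,c) of conj(E[(t,c), pixel]) * m[t][c]."""
    out = []
    for ip in range(len(sem)):                          # Foreach( NP )
        acc = 0j                                        # per-pixel accumulator
        for it, row in enumerate(m):                    # Foreach( NT )
            w = encoding_phase(traj[it], sem[ip], b0map[ip], times[it]).conjugate()
            for ic, value in enumerate(row):            # Foreach( NC )
                acc += b1map[ip][ic].conjugate() * w * value
        out.append(acc)
    return out

# Hypothetical toy sizes: NP = 3, NT = 4, NS = 2, NC = 2.
traj = [[0.0, 0.0], [1.0, 0.5], [-1.0, 2.0], [0.3, -0.7]]
sem = [[0.1, 0.2], [0.5, -0.3], [-0.4, 0.9]]
b1map = [[1 + 0j, 0.5j], [0.8 - 0.2j, 1 + 0j], [0.3 + 0.3j, -0.6 + 0j]]
b0map = [0.0, 0.1, -0.2]
times = [0.0, 1.0, 2.0, 3.0]
p_demo = [1 + 0j, 2 - 1j, 0.5j]
m_demo = forward_on_demand(p_demo, traj, sem, b1map, b0map, times)
back_demo = adjoint_on_demand(m_demo, traj, sem, b1map, b0map, times)
```

Only the four small input arrays are kept in memory; the cost is that every entry of E is recomputed in each iteration.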


Matlab implementation:

• key to performance: vectorize your code!

• vector / matrix operations are automatically multi-threaded

• Parallel Computing Toolbox

• matlabpool + parfor : loop-level

• run CUDA ptx kernels

• both: spmd


PatLoc hardware setup:

• Siemens MAGNETOM Trio Tim.

• PatLoc gradient insert coil [ Cocosco C.A. et al., ISMRM 2010 #3946 ].

• Additional set of 3 gradient amplifiers; can drive all the available gradients simultaneously, synchronously and independently.


First human PatLoc gradient coil:


Application 1: Higher-dim gradient encoding

• 4DRIO [ Gallichan D. et al., “Simultaneously driven linear and nonlinear spatial encoding fields in MRI”, MRM 65(3), 2011 ]

• NS= 4

• NP= 320^2

• NT= 256^2

• NC= 8

E ~ 450 GB
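A back-of-the-envelope check of the dense matrix size, as a sketch only: it assumes E is stored with one single-precision complex number (8 bytes) per entry, which the slides do not state. The function name is hypothetical.

```python
def dense_encoding_matrix_bytes(n_pixels, n_samples, n_coils, bytes_per_entry=8):
    """Bytes needed to store E [NT*NC, NP] densely (8 bytes = complex single, assumed)."""
    return n_samples * n_coils * n_pixels * bytes_per_entry

# 4DRIO case from the slide: NP = 320^2, NT = 256^2, NC = 8.
size_4drio = dense_encoding_matrix_bytes(320**2, 256**2, 8)
print(f"{size_4drio / 1e9:.1f} GB")  # prints "429.5 GB"
```

Under this assumption the estimate is about 430 GB, in the same range as the roughly 450 GB quoted above. The same function with NP = 128^2, NT = 131^2 and NC = 8 gives about 18 GB.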


Throughput CPU vs GPU:

• quad-socket Intel Xeon Nehalem-EX X7560 with 1024 GB RAM :

16 threads : 615 s to compute E, 29 s / iter

32 threads : 565 s to compute E, 27 s / iter

• dual-socket Intel Xeon Westmere-EP X5690 :

12 threads : 252 s / iter

• 1x Nvidia Tesla C2075 GPU :

8.1 s / iter

7 s / iter with hardcoded NS

• 4x Nvidia Tesla C2075 GPUs :

2.3 s / iter (3.5x vs. 1 GPU)

( Matlab R2012a ; CUDA 4.1 )
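The ratios between these timings can be worked out directly from the per-iteration figures quoted above. The dictionary labels below are shorthand introduced for this sketch, and only the per-iteration times are compared (the one-off cost of computing E on the CPU is left out).

```python
# Seconds per CG iteration, as quoted on the slide for the 4DRIO case.
seconds_per_iter = {
    "4x Xeon X7560, 32 threads": 27.0,
    "2x Xeon X5690, 12 threads": 252.0,
    "1x Tesla C2075": 8.1,
    "4x Tesla C2075": 2.3,
}

def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`, per iteration."""
    return seconds_per_iter[baseline] / seconds_per_iter[candidate]

multi_gpu_scaling = speedup("1x Tesla C2075", "4x Tesla C2075")
vs_quad_socket = speedup("4x Xeon X7560, 32 threads", "4x Tesla C2075")
vs_dual_socket = speedup("2x Xeon X5690, 12 threads", "4x Tesla C2075")

print(f"4 GPUs vs 1 GPU: {multi_gpu_scaling:.1f}x")  # prints "4 GPUs vs 1 GPU: 3.5x"
```

The 3.5x figure therefore corresponds to scaling from one to four GPUs relative to the 8.1 s / iter result.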


Application 2: Ultra-fast imaging

• “single-shot” imaging

• Layton et al: “Region-specific trajectory design for single-shot imaging using linear and nonlinear magnetic encoding fields”, ISMRM 2012.

• NS= 16 “gradients” (harmonics)

• NP= 128^2

• NT= 131^2

• NC= 8

E ~ 18 GB


Application 2: Ultra-fast imaging

Use a “Field Camera” : C. Barmet, K. Pruessmann, Inst. for Biomedical Eng., University and ETH Zuerich [ Wilm et al, MRM 2011 ]


Throughput CPU vs GPU:

• dual-socket Intel Xeon Westmere-EP X5690, 96GB RAM :

12 threads : 37s to compute E, 3.7s/iter

• 1x Nvidia Tesla C2075 GPU

0.56s / iter

• 4x Nvidia Tesla C2075 GPUs

0.26s / iter

( Matlab R2012a ; CUDA 4.1 )


What if the subject is ... moving?

E = E ( Traj, SEM, B1map, B0map )

• Apply a 3D rigid-body transformation to SEM, B1map, B0map for each “segment” of measured data (e.g. 256 segments for a 256^2 image)

• Size explosion for pre-computing E, but approachable with the compute-on-demand GPU solution.

• Work in Progress…
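The per-segment transformation can be illustrated with a simplified sketch. Everything here is an assumption made for illustration: the SEMs are modelled as analytic functions of position (two linear terms plus the quadratic terms x^2 - y^2 and 2xy) rather than measured maps, the motion is restricted to a rotation about z plus a translation instead of a full six-parameter rigid-body pose, and the direction in which the pose is applied is a convention chosen arbitrarily. All function names are hypothetical.

```python
import math

def rigid_transform(point, angle_z, translation):
    """Rotate a 3D point about the z axis, then translate it."""
    x, y, z = point
    c, s = math.cos(angle_z), math.sin(angle_z)
    tx, ty, tz = translation
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)

def sem_values(point):
    """Assumed analytic SEMs at a position: two linear and two quadratic fields."""
    x, y, _ = point
    return [x, y, x * x - y * y, 2 * x * y]

def sem_per_segment(pixel_positions, poses):
    """One SEM table [NP][NS] per data segment, each under that segment's pose."""
    return [
        [sem_values(rigid_transform(pt, angle, shift)) for pt in pixel_positions]
        for angle, shift in poses
    ]

# Two hypothetical segments: no motion, then a 90 degree rotation about z.
positions = [(1.0, 2.0, 0.0), (1.0, 0.0, 0.0)]
poses = [(0.0, (0.0, 0.0, 0.0)), (math.pi / 2, (0.0, 0.0, 0.0))]
sem_tables = sem_per_segment(positions, poses)
```

Each segment gets its own small SEM table, which an on-demand evaluation of E can consume segment by segment without ever forming E.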

No FFT, like in [ Bammer et al. “Augmented generalized SENSE…” MRM 2007;57(1):90-102 ]


Conclusions and outlook

• GPUs open new avenues in medical MRI

• Faster imaging: shorter sessions, more information

• Address limitations imposed by: physics, MRI hardware technology, human subject

• Practical R&D process

• Feasible clinical implementation

• Wish list: more memory bandwidth, more registers & shared memory, or both ;-)


Special Thanks to:

• Research funding: German Federal Ministry of Education and Research, grant #13N9208; European Research Council Advanced Grant 'OVOC', grant agreement 232908.

• Travel funding: Wissenschaftliche Gesellschaft in Freiburg im Breisgau.

• C. Barmet, K. Pruessmann (Institute for Biomedical Engineering, University and ETH Zuerich, Switzerland).

• K. Layton (The University of Melbourne, Australia).

• J. Maclaren, and our colleagues in Medical Physics, Dept. of Radiology, University Medical Center Freiburg.

• Bruker Biospin, Siemens Healthcare.


Chris A. Cocosco


