GPUs Open New Avenues in Medical MRI - NVIDIA · 2013. 8. 23.
TRANSCRIPT
-
GPUs Open New Avenues in Medical MRI
Chris A. Cocosco, D. Gallichan, F. Testud, M. Zaitsev, and J. Hennig
Dept. of Radiology, Medical Physics, University Medical Center Freiburg
-
2 C.A. Cocosco, GTC-2012
Our research group:
Biomedical Magnetic Resonance Imaging (MRI) @ University Medical Center Freiburg, Germany:
> 50 scientists & PhD students
-
B0 gradients (SEMs) for spatial encoding
SEMs: spatial encoding magnetic fields
[Figure: encoding field with values labeled -G, 0, +G; panel labeled “k-space”]
-
B0 gradients (SEMs) for spatial encoding
Traditional (Linear) + Quadratic (Non-linear)
SEMs: spatial encoding magnetic fields
[Figure: encoding fields with values labeled -G, 0, +G]
-
PatLoc:
• PatLoc = Parallel Acquisition Technique using Localized Gradients
[ Hennig J. et al., MAGMA 21(1-2):5-14 (2008) ].
• has the potential to allow:
(1) higher gradient switching rates while not exceeding the
Peripheral Nerve Stimulation (PNS) limits;
(2) novel encoding strategies (e.g. better suited to the anatomy).
-
First ever human PatLoc images:
[ Schultz G. et al., “Reconstruction of MRI
Data Encoded with Arbitrarily Shaped,
Curvilinear, Non-bijective Magnetic Fields”,
MRM 64(5):1390-1403 (2010) ]
-
Why PatLoc:
TSE 256x256, TR 5000 ms, slice thickness 2 mm, acquisition time ~3 min for 5 slices.
-
Imaging forward model:
m = E * p
p : image [NP]
m : measured data [NT, NC]
E : encoding matrix [NT*NC, NP]
NP : number of image pixels
NT : number of measured (“k-space”) samples
NC : number of RF receive coils
Typical magnitudes:
NT, NP = 256 x 256
NC = 8
-
Conjugate Gradient Algorithm:
Numerically estimate an image consistent with the measured data
[ Pruessmann et al., MRM 2001;46:638-651 ].
But: no gridding, no FFT !
Repeat 15…25 times :
• q = E’ * (E * p), computed in two steps:
  1. Ep = E * p
  2. q = E’ * Ep
• update p
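The iteration above can be sketched as conjugate gradient applied to the normal equations E’E p = E’m. This is a minimal illustration in plain Python, not the authors' code: the function names, the stopping tolerance, and the use of nested lists for E are all assumptions made for the example. The step q = E’ * (E * d) is applied to the current search direction d, and E is only ever used through the two products E * x and E’ * y.

```python
def matvec(E, x):
    """E * x for a matrix stored as a list of rows."""
    return [sum(e * xj for e, xj in zip(row, x)) for row in E]


def rmatvec(E, y):
    """E' * y, where E' is the conjugate transpose of E."""
    n_cols = len(E[0])
    return [sum(E[i][j].conjugate() * y[i] for i in range(len(E)))
            for j in range(n_cols)]


def vdot(a, b):
    """Inner product conj(a) . b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def cg_normal_equations(E, m, n_iter=20, tol=1e-24):
    """Estimate p such that E * p is consistent with m (hypothetical helper).

    Solves E'E p = E'm by conjugate gradient, starting from p = 0.
    """
    p = [0j] * len(E[0])
    r = rmatvec(E, m)            # residual E'm - E'E p for p = 0
    d = list(r)                  # first search direction
    rr = vdot(r, r).real
    for _ in range(n_iter):
        if rr < tol:             # assumed stopping rule, not from the slides
            break
        q = rmatvec(E, matvec(E, d))          # q = E' * (E * d)
        alpha = rr / vdot(d, q).real
        p = [pi + alpha * di for pi, di in zip(p, d)]       # update p
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = vdot(r, r).real
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return p
```

Because the loop touches E only through `matvec` and `rmatvec`, those two functions could be swapped for versions that recompute the entries of E on the fly instead of reading a stored matrix.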
-
Compute-on-demand Implementation:
E is very large, but it can be recomputed from much smaller inputs:
E = E ( Traj, SEM, B1map, B0map )
  Traj  [NT, NS]
  SEM   [NP, NS]
  B1map [NP, NC]
  B0map [NP]
where NS = number of SEMs (B0 gradients)
Loop nest:
  Foreach( NP )
    Foreach( NT )
      Foreach( NC )
CUDA implementation:
• blocks + threads
• accumulator in shared memory + block reduce
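As a rough CPU-side illustration of the compute-on-demand idea, the sketch below evaluates m = E * p with the same loop nest and never stores E. The slides do not give the formula for the entries of E, so the signal model used here is an assumption: each entry is taken to be B1map[p][c] * exp(-i * phase), with phase = sum over the NS fields of Traj[t][s] * SEM[p][s], plus a hypothetical off-resonance term B0map[p] * dwell_s * t. The names `phase_factor`, `forward_on_demand`, and `dwell_s` are made up for the example.

```python
import cmath


def phase_factor(t, p, traj, sem, b0map, dwell_s):
    """exp(-i * phase) for sample t and pixel p (assumed signal model).

    The phase does not depend on the receive coil.
    """
    phase = sum(k * f for k, f in zip(traj[t], sem[p]))   # sum over NS fields
    phase += b0map[p] * dwell_s * t                       # assumed B0 term
    return cmath.exp(-1j * phase)


def forward_on_demand(image, traj, sem, b1map, b0map, dwell_s=0.0):
    """m = E * p, recomputing each entry of E when it is needed.

    image: [NP], traj: [NT][NS], sem: [NP][NS], b1map: [NP][NC], b0map: [NP].
    Returns m as [NT][NC].
    """
    n_t, n_c = len(traj), len(b1map[0])
    m = [[0j] * n_c for _ in range(n_t)]
    for p, pixel in enumerate(image):                     # Foreach( NP )
        for t in range(n_t):                              # Foreach( NT )
            w = pixel * phase_factor(t, p, traj, sem, b0map, dwell_s)
            for c in range(n_c):                          # Foreach( NC )
                m[t][c] += b1map[p][c] * w                # accumulate into m
    return m
```

In this model the exponential is shared by all coils, so it is computed once per (pixel, sample) pair and reused in the innermost loop. In a CUDA version the `+=` would presumably become the shared-memory accumulator followed by a block reduction that the slide mentions.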
-
Matlab implementation:
• key to performance: vectorize your code!
• vector / matrix operations are automatically multi-threaded
• Parallel Computing Toolbox:
  • matlabpool + parfor : loop-level
  • run CUDA ptx kernels
  • both: spmd
-
PatLoc hardware setup:
• Siemens MAGNETOM Trio Tim.
• PatLoc gradient insert coil
[ Cocosco C.A. et al., ISMRM 2010 #3946 ].
• Additional set of 3 gradient amplifiers; all the available gradients
can be driven simultaneously and independently, in synchrony.
-
First PatLoc gradient coil for human imaging:
-
Application 1: Higher-dimensional gradient encoding
• 4DRIO [ Gallichan D. et al., “Simultaneously driven linear and nonlinear spatial encoding fields in MRI”, MRM 65(3), 2011 ]
• NS = 4
• NP = 320^2
• NT = 256^2
• NC = 8
E ~ 450 GB
-
Throughput CPU vs GPU:
• quad-socket Intel Xeon Nehalem-EX X7560 with 1024 GB RAM :
    16 threads : 615 s to compute E, 29 s / iter
    32 threads : 565 s to compute E, 27 s / iter
• dual-socket Intel Xeon Westmere-EP X5690 :
    12 threads : 252 s / iter
• 1x Nvidia Tesla C2075 GPU :
    8.1 s / iter
    7 s / iter with hardcoded NS
• 4x Nvidia Tesla C2075 GPUs :
    2.3 s / iter (3.5x)
( Matlab R2012a ; CUDA 4.1 )
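The ratios behind these timings can be checked with a few lines of arithmetic. The sketch below reads the first GPU entry as a single card, which is an assumption, though it is consistent with the quoted 3.5x for four cards. The variable names are made up for the example.

```python
# Seconds per iteration, copied from the slide.
cpu_x7560_32_threads = 27.0    # E precomputed and held in RAM
cpu_x5690_12_threads = 252.0
gpu_single = 8.1               # assumed to be one Tesla C2075
gpu_quad = 2.3                 # 4x Tesla C2075


def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


multi_gpu_speedup = speedup(gpu_single, gpu_quad)       # about 3.5
scaling_efficiency = multi_gpu_speedup / 4              # about 0.88
gpu_vs_x5690 = speedup(cpu_x5690_12_threads, gpu_single)
gpu_vs_x7560 = speedup(cpu_x7560_32_threads, gpu_single)

print(f"4 GPUs vs 1 GPU: {multi_gpu_speedup:.2f}x "
      f"({scaling_efficiency:.0%} of ideal)")
print(f"1 GPU vs X5690 (12 threads): {gpu_vs_x5690:.1f}x")
print(f"1 GPU vs X7560 (32 threads, precomputed E): {gpu_vs_x7560:.1f}x")
```

Under that reading, going from one to four GPUs reaches roughly 88% of ideal scaling.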
-
Application 2: Ultra-fast imaging
• “single-shot” imaging
• Layton et al., “Region-specific trajectory design for single-shot imaging using linear and nonlinear magnetic encoding fields”, ISMRM 2012.
• NS = 16 “gradients” (harmonics)
• NP = 128^2
• NT = 131^2
• NC = 8
E ~ 18 GB
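The quoted size of E follows from its dimensions [NT*NC, NP]. The sketch below assumes 8 bytes per entry (complex single precision). The slides do not state the storage format, so that figure and the helper name are assumptions.

```python
def encoding_matrix_bytes(n_t, n_c, n_p, bytes_per_entry=8):
    """Storage for a dense E of shape [NT*NC, NP].

    bytes_per_entry=8 assumes complex single precision (an assumption).
    """
    return n_t * n_c * n_p * bytes_per_entry


# Application 2 (single-shot imaging): NT = 131^2, NC = 8, NP = 128^2
app2_bytes = encoding_matrix_bytes(131**2, 8, 128**2)

# Application 1 (4DRIO): NT = 256^2, NC = 8, NP = 320^2
app1_bytes = encoding_matrix_bytes(256**2, 8, 320**2)

print(f"Application 2: {app2_bytes / 1e9:.1f} GB")
print(f"Application 1: {app1_bytes / 1e9:.1f} GB")
```

With this assumption the single-shot case comes out at about 18.0 GB, matching the slide. The 4DRIO case comes out at about 429.5 GB (exactly 400 GiB), somewhat below the ~450 GB quoted for it, and this sketch does not account for the difference.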
-
Application 2: Ultra-fast imaging
Use a “Field Camera” : C. Barmet, K. Pruessmann, Inst. for Biomedical
Eng., University and ETH Zuerich [ Wilm et al, MRM 2011 ]
-
Throughput CPU vs GPU:
• dual-socket Intel Xeon Westmere-EP X5690, 96 GB RAM :
    12 threads : 37 s to compute E, 3.7 s / iter
• 1x Nvidia Tesla C2075 GPU :
    0.56 s / iter
• 4x Nvidia Tesla C2075 GPUs :
    0.26 s / iter
( Matlab R2012a ; CUDA 4.1 )
-
What if the subject is ... moving?
E = E ( Traj, SEM, B1map, B0map )
• Apply a 3D rigid-body transformation to SEM, B1map, B0map for each “segment” of measured data (e.g. 256 segments for a 256^2 image).
• Size explosion for pre-computing E, but approachable with the compute-on-demand GPU solution.
• Work in Progress…
No FFT, like in
[ Bammer et al., “Augmented generalized SENSE…”, MRM 2007;57(1):90-102 ]
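One way to picture the per-segment transformation is sketched below. It is purely illustrative and rests on several assumptions: the encoding fields are hypothetical analytic functions (two linear, two quadratic) rather than measured maps, the motion for a segment is given as a rotation matrix plus a translation, and a pixel at subject-frame position r is taken to sit at scanner position R r + t. The names `rotation_z`, `rigid_transform`, `sem_at`, and `sem_for_segment` are made up for the example.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rigid_transform(point, rot, trans):
    """Apply a 3D rigid-body transformation: rot * point + trans."""
    return tuple(sum(rot[i][j] * point[j] for j in range(3)) + trans[i]
                 for i in range(3))


def sem_at(point):
    """Hypothetical analytic encoding fields at a scanner-frame position.

    Two linear fields (x, y) and two quadratic fields (x^2 - y^2, 2xy).
    """
    x, y, _z = point
    return (x, y, x * x - y * y, 2.0 * x * y)


def sem_for_segment(pixel_coords, rot, trans):
    """SEM values [NP][NS] seen by each pixel during one data segment."""
    return [sem_at(rigid_transform(r, rot, trans)) for r in pixel_coords]


# Example: the subject rotated by 90 degrees about z during this segment.
pixels = [(1.0, 0.0, 0.0), (0.0, 0.5, 0.0)]
segment_sem = sem_for_segment(pixels, rotation_z(math.pi / 2), (0.0, 0.0, 0.0))
```

With measured SEM, B1 and B0 maps the evaluation step would need interpolation on the map grid instead of an analytic formula. Each segment then has its own set of maps, which is why E would grow so much if it were precomputed rather than evaluated on demand.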
-
Conclusions and outlook
• GPUs open new avenues in medical MRI
• Faster imaging: shorter sessions, more information
• Address limitations imposed by: physics, MRI hardware
technology, human subject
• Practical R&D process
• Feasible clinical implementation
• Wish list: more memory bandwidth, more registers &
shared memory, or both ;-)
-
Special Thanks to:
• Research funding: German Federal Ministry of Education and
Research, grant #13N9208; European Research Council
Advanced Grant 'OVOC', grant agreement 232908.
• Travel funding: Wissenschaftliche Gesellschaft in Freiburg im
Breisgau.
• C. Barmet, K. Pruessmann (Institute for Biomedical Engineering,
University and ETH Zuerich, Switzerland).
• K. Layton (The University of Melbourne, Australia).
• J. Maclaren, and our colleagues in Medical Physics, Dept. of
Radiology, University Medical Center Freiburg.
• Bruker Biospin, Siemens Healthcare.