gpus open new avenues in medical mri - nvidia · 2013. 8. 23. · •siemens magnetom trio tim....

1 Chris A. Cocosco D. Gallichan, F. Testud, M. Zaitsev, and J. Hennig Dept. of Radiology, Medical Physics, GPUs Open New Avenues in Medical MRI UNIVERSITY MEDICAL CENTER FREIBURG

TRANSCRIPT

  • 1

    GPUs Open New Avenues in Medical MRI

    Chris A. Cocosco, D. Gallichan, F. Testud, M. Zaitsev, and J. Hennig

    Dept. of Radiology, Medical Physics, University Medical Center Freiburg

  • 2 C.A. Cocosco, GTC-2012

    Our research group:

    Biomedical Magnetic Resonance Imaging (MRI) @ University Medical Center Freiburg, Germany:

    > 50 scientists & PhD students

  • 3 C.A. Cocosco, GTC-2012

    B0 gradients (SEMs) for spatial encoding

    SEMs: spatial encoding magnetic fields

    [Figure: gradient field with scale labels -G, 0, +G, and the corresponding “k-space”]

  • 4 C.A. Cocosco, GTC-2012

    B0 gradients (SEMs) for spatial encoding

    Traditional (Linear) + Quadratic (Non-linear)

    [Figure: field maps with scale labels -G, 0, +G]

    SEMs: spatial encoding magnetic fields
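    A minimal sketch of how linear and quadratic encoding fields differ. The quadratic shapes x^2 - y^2 and 2xy are an assumption made for illustration (the slide only says "quadratic"), and the function names are hypothetical. The point shown is that a quadratic field pair takes the same values at a point and at its mirror image through the origin, so on its own it cannot tell the two apart, whereas the linear pair can.

```python
def linear_sems(x, y):
    """Two linear fields: the field value grows in proportion to x or to y."""
    return (x, y)

def quadratic_sems(x, y):
    """Two quadratic fields (assumed shapes: x^2 - y^2 and 2xy)."""
    return (x * x - y * y, 2.0 * x * y)

point = (0.3, -0.2)
mirrored = (-0.3, 0.2)   # the same point reflected through the origin

# The linear fields distinguish the two positions ...
print(linear_sems(*point), linear_sems(*mirrored))
# ... while the quadratic fields give identical values at both.
print(quadratic_sems(*point), quadratic_sems(*mirrored))
```

    This ambiguity is one reason additional information (such as the receive coil sensitivities) is needed when nonlinear fields are used for encoding.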

  • 5 C.A. Cocosco, GTC-2012

    PatLoc:

    • PatLoc = Parallel Acquisition Technique using Localized Gradients [ Hennig J. et al., MAGMA 21(1-2):5-14 (2008) ].

    • has the potential to allow:

    (1) higher gradient switching rates while not exceeding the Peripheral Nerve Stimulation (PNS) limits;

    (2) novel encoding strategies (e.g. better suited to the anatomy).

  • 6 C.A. Cocosco, GTC-2012

    First ever human PatLoc images:

    [ Schultz G. et al., “Reconstruction of MRI Data Encoded with Arbitrarily Shaped, Curvilinear, Non-bijective Magnetic Fields”, MRM 64(5):1390-1403 (2010) ]

  • 7 C.A. Cocosco, GTC-2012

    Why PatLoc:

    TSE 256x256, TR 5000 ms, slice thickness 2 mm, acquisition time ~3 min for 5 slices.

  • 8 C.A. Cocosco, GTC-2012

    Imaging forward model:

    m = E * p

    p : image [NP], where NP is the number of image pixels

    m : measured data [NT, NC], where NT is the number of measured (“k-space”) samples and NC is the number of RF receive coils

    E : encoding matrix [NT*NC, NP]

    Typical magnitudes:

    NT, NP = 256 x 256 (each)

    NC = 8
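    To get a feel for these dimensions, the storage needed for a dense E can be estimated as below. This is only a sketch: it takes NT and NP both as 256 x 256 (an assumption about the typical case), assumes single-precision complex entries of 8 bytes each (the slides do not state the precision), and `encoding_matrix_bytes` is a hypothetical helper name.

```python
def encoding_matrix_bytes(nt, nc, n_pixels, bytes_per_entry=8):
    """Storage for a dense E of shape [NT*NC, NP].

    bytes_per_entry=8 assumes single-precision complex values (an assumption).
    """
    return nt * nc * n_pixels * bytes_per_entry

# Typical magnitudes: NT = NP = 256 x 256, NC = 8.
size = encoding_matrix_bytes(nt=256 * 256, nc=8, n_pixels=256 * 256)
print(f"E needs about {size / 1e9:.0f} GB")
```

    Under the same assumptions, NP = 128^2, NT = 131^2 and NC = 8 give about 18 GB, which agrees with the figure quoted for the ultra-fast imaging application.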

  • 9 C.A. Cocosco, GTC-2012

    Conjugate Gradient Algorithm:

    Numerically estimate an image consistent with the measured data

    [ Pruessmann et al., MRM 2001;46:638-651 ].

    But: no gridding, no FFT!

    Repeat 15…25 times:

    • q = E’ * (E * p), computed in two steps:

    1. E * p

    2. E’ * (E * p)

    • update p
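    The loop above can be sketched as a conjugate gradient solver for the normal equations E'E p = E'm. This is a generic textbook formulation, not the authors' code: E is passed in as two callables so that it never has to exist as an explicit matrix, and the E'E product is applied to the search direction, as is usual in CG. All names are hypothetical.

```python
def dot(a, b):
    """Complex inner product: sum of conj(a_i) * b_i."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def conjugate_gradient(apply_E, apply_EH, m, n_pixels, n_iter=20):
    """Solve E'E p = E'm starting from p = 0.

    apply_E(v) returns E*v and apply_EH(d) returns E'*d.
    """
    p = [0j] * n_pixels
    r = apply_EH(m)                 # residual of the normal equations at p = 0
    d = list(r)                     # first search direction
    rr = dot(r, r).real
    tol = 1e-20 * rr                # relative stopping threshold
    for _ in range(n_iter):
        if rr <= tol:
            break
        q = apply_EH(apply_E(d))    # q = E' * (E * d): the two expensive steps
        alpha = rr / dot(d, q).real
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r).real
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return p

# Toy 3x2 encoding matrix, stored densely only for this demonstration.
E = [[1 + 0j, 2j], [0.5j, 1 + 0j], [1 - 1j, 0.5 + 0j]]
apply_E = lambda v: [sum(e * x for e, x in zip(row, v)) for row in E]
apply_EH = lambda d: [sum(E[i][j].conjugate() * d[i] for i in range(3))
                      for j in range(2)]

p_true = [1 + 2j, -0.5 + 1j]
m = apply_E(p_true)                 # simulated, noise-free measurement
p_est = conjugate_gradient(apply_E, apply_EH, m, n_pixels=2)
print(p_est)
```

    Because the solver only ever calls `apply_E` and `apply_EH`, the cost per iteration is dominated by those two products, which is where the per-iteration timings reported later come from.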

  • 10 C.A. Cocosco, GTC-2012

    Compute-on-demand Implementation:

    E is very large, but every entry can be recomputed from much smaller arrays: E = E ( Traj, SEM, B1map, B0map )

    Traj [NT, NS]

    SEM [NP, NS]

    B1map [NP, NC]

    B0map [NP]

    where NS = number of SEMs (B0 gradients)

    Foreach( NP )

    Foreach( NT )

    Foreach( NC )

    CUDA implementation:

    • blocks + threads

    • accumulator in shared memory + block reduce
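    A plain-Python sketch of the compute-on-demand idea for the forward product m = E * p. The signal model used for a single entry (coil sensitivity times a phase term built from the trajectory, the SEM values and an off-resonance term) is an assumption for illustration, since the slide only states which arrays E depends on. The `dwell` parameter and all function names are hypothetical.

```python
import cmath

def encoding_entry(t, c, p, traj, sem, b1map, b0map, dwell):
    """One entry of E, recomputed whenever it is needed (assumed signal model)."""
    phase = sum(k * f for k, f in zip(traj[t], sem[p]))   # encoding by the NS SEMs
    phase += b0map[p] * t * dwell                          # off-resonance phase
    return b1map[p][c] * cmath.exp(-1j * phase)

def forward(image, traj, sem, b1map, b0map, dwell=1e-5):
    """m = E * p using the three nested loops; E is never stored."""
    nt, nc, n_pixels = len(traj), len(b1map[0]), len(image)
    m = [[0j] * nc for _ in range(nt)]
    for p in range(n_pixels):
        for t in range(nt):
            for c in range(nc):
                # m[t][c] plays the role of the accumulator
                m[t][c] += encoding_entry(t, c, p, traj, sem, b1map, b0map,
                                          dwell) * image[p]
    return m

# Tiny example: NT = 3 samples, NS = 2 SEMs, NP = 2 pixels, NC = 2 coils.
traj = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]      # Traj [NT, NS]
sem = [[0.1, 0.2], [0.3, -0.1]]                  # SEM [NP, NS]
b1map = [[1 + 0j, 0.5j], [0.8 + 0j, 1 + 0j]]     # B1map [NP, NC]
b0map = [0.0, 0.0]                               # B0map [NP]
image = [1 + 0j, 2 + 0j]

m = forward(image, traj, sem, b1map, b0map)
print(m[0])
```

    The inputs hold NT*NS + NP*NS + NP*NC + NP values in total, compared with NT*NC*NP entries for an explicit E, which is what makes recomputing entries attractive when the arithmetic is cheap relative to memory traffic.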

  • 11 C.A. Cocosco, GTC-2012

    Matlab implementation:

    • key to performance: vectorize your code!

    • vector / matrix operations are automatically multi-threaded

    • Parallel Computing Toolbox

    • matlabpool + parfor : loop-level

    • run CUDA ptx kernels

    • both: spmd

  • 12 C.A. Cocosco, GTC-2012

    PatLoc hardware setup:

    • Siemens MAGNETOM Trio Tim.

    • PatLoc gradient insert coil [ Cocosco C.A. et al., ISMRM 2010 #3946 ].

    • Additional set of 3 gradient amplifiers; can synchronously drive all the available gradients simultaneously and independently.

  • 13 C.A. Cocosco, GTC-2012

    First human PatLoc gradient coil:

  • 14 C.A. Cocosco, GTC-2012

    Application 1: Higher-dim gradient encoding

    • 4DRIO [ Gallichan D. et al., “Simultaneously driven linear and nonlinear spatial encoding fields in MRI”, MRM 65(3), 2011 ]

    • NS= 4

    • NP= 320^2

    • NT= 256^2

    • NC= 8

    E ~ 450 GB

  • 15 C.A. Cocosco, GTC-2012

    Throughput CPU vs GPU:

    • quad-socket Intel Xeon Nehalem-EX X7560 with 1024 GB RAM :

    16 threads : 615 s to compute E, 29 s / iter

    32 threads : 565 s to compute E, 27 s / iter

    • dual-socket Intel Xeon Westmere-EP X5690 :

    12 threads : 252 s / iter

    • 1x Nvidia Tesla C2075 GPU :

    8.1 s / iter

    7 s / iter with hardcoded NS

    • 4x Nvidia Tesla C2075 GPUs :

    2.3 s / iter (3.5x vs. one GPU)

    ( Matlab R2012a ; CUDA 4.1 )
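    The ratios between these timings can be checked with a few lines of arithmetic. The sketch below reads the first GPU entry as a single card (an assumption, though it reproduces the 3.5x quoted on the slide); the dictionary labels and the `speedup` helper are hypothetical.

```python
# Seconds per CG iteration, copied from the slide.
seconds_per_iter = {
    "Nehalem-EX, 32 threads": 27.0,
    "Westmere-EP, 12 threads": 252.0,
    "1x Tesla C2075": 8.1,
    "4x Tesla C2075": 2.3,
}

def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return seconds_per_iter[baseline] / seconds_per_iter[candidate]

multi_gpu = speedup("1x Tesla C2075", "4x Tesla C2075")
print(f"4 GPUs vs 1 GPU: {multi_gpu:.1f}x "
      f"(parallel efficiency {multi_gpu / 4:.0%})")
print(f"4 GPUs vs 32-thread Nehalem-EX: "
      f"{speedup('Nehalem-EX, 32 threads', '4x Tesla C2075'):.1f}x")
```

    Note that the Nehalem-EX figures also carry a one-off cost of roughly ten minutes to compute E, which these per-iteration ratios do not include.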

  • 16 C.A. Cocosco, GTC-2012

    Application 2: Ultra-fast imaging

    • “single-shot” imaging

    • Layton et al: “Region-specific trajectory design for single-shot imaging using linear and nonlinear magnetic encoding fields”, ISMRM 2012.

    • NS= 16 “gradients” (harmonics)

    • NP= 128^2

    • NT= 131^2

    • NC= 8

    E ~ 18 GB

  • 17 C.A. Cocosco, GTC-2012

    Application 2: Ultra-fast imaging

    Use a “Field Camera” : C. Barmet, K. Pruessmann, Inst. for Biomedical Eng., University and ETH Zuerich [ Wilm et al, MRM 2011 ]

  • 18 C.A. Cocosco, GTC-2012

    Throughput CPU vs GPU:

    • dual-socket Intel Xeon Westmere-EP X5690, 96 GB RAM :

    12 threads : 37 s to compute E, 3.7 s / iter

    • 1x Nvidia Tesla C2075 GPU :

    0.56 s / iter

    • 4x Nvidia Tesla C2075 GPUs :

    0.26 s / iter

    ( Matlab R2012a ; CUDA 4.1 )
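    Combining these per-iteration times with the 15 to 25 iterations mentioned for the conjugate gradient loop gives a rough end-to-end estimate. This sketch assumes the CPU pays the 37 s to compute E once per reconstruction and that the GPU path has no comparable setup cost; neither assumption is stated on the slide, and data transfer time is ignored.

```python
def total_seconds(setup_s, per_iter_s, n_iter):
    """Total time = one-off setup (e.g. computing E) + n_iter CG iterations."""
    return setup_s + per_iter_s * n_iter

for n_iter in (15, 25):
    cpu = total_seconds(37.0, 3.7, n_iter)
    gpu4 = total_seconds(0.0, 0.26, n_iter)   # assumes no precompute on the GPUs
    print(f"{n_iter} iterations: CPU {cpu:.1f} s, 4 GPUs {gpu4:.1f} s")
```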

  • 19 C.A. Cocosco, GTC-2012

    What if the subject is ... moving?

    E = E ( Traj, SEM, B1map, B0map )

    • Apply a 3D rigid-body transformation to SEM, B1map, B0map for each “segment” of measured data (e.g. 256 segments for a 256^2 image)

    • Size explosion for pre-computing E, but approachable with the compute-on-demand GPU solution.

    • Work in Progress…

    No FFT, like in [ Bammer et al. “Augmented generalized SENSE…” MRM 2007;57(1):90-102 ]
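    A minimal sketch of the per-segment idea: each segment of measured data gets its own pose, and a field map is re-evaluated at the moved pixel positions. For brevity the transformation is limited to a rotation about the z axis plus a translation (a full rigid-body transformation has three rotation angles), the direction of the transform is an arbitrary convention here, and the field shape and all names are hypothetical.

```python
import math

def rigid_transform(point, angle_z, shift):
    """Rotate a 3D point about the z axis by angle_z (radians), then translate."""
    x, y, z = point
    c, s = math.cos(angle_z), math.sin(angle_z)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])

def sem_per_segment(pixels, poses, field):
    """Evaluate one SEM at the moved pixel positions, once per data segment.

    Returns a list indexed [segment][pixel]; `field` maps a 3D point to a value.
    """
    return [[field(rigid_transform(px, angle, shift)) for px in pixels]
            for angle, shift in poses]

pixels = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
poses = [(0.0, (0.0, 0.0, 0.0)),              # segment 0: subject at rest
         (math.pi / 2, (0.0, 0.0, 0.0))]      # segment 1: rotated by 90 degrees
quadratic = lambda pt: pt[0] ** 2 - pt[1] ** 2   # assumed field shape

sem = sem_per_segment(pixels, poses, quadratic)
print(sem)
```

    The same per-segment lookup would apply to B1map and B0map; an on-demand evaluation of E can then pick the maps that belong to the segment each sample was measured in.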

  • 20 C.A. Cocosco, GTC-2012

    Conclusions and outlook

    • GPUs open new avenues in medical MRI

    • Faster imaging: shorter sessions, more information

    • Address limitations imposed by: physics, MRI hardware technology, human subject

    • Practical R&D process

    • Feasible clinical implementation

    • Wish list: more memory bandwidth, more registers & shared memory, or both ;-)

  • 21 C.A. Cocosco, GTC-2012

    Special Thanks to:

    • Research funding: German Federal Ministry of Education and Research, grant #13N9208; European Research Council Advanced Grant 'OVOC', grant agreement 232908.

    • Travel funding: Wissenschaftliche Gesellschaft in Freiburg im Breisgau.

    • C. Barmet, K. Pruessmann (Institute for Biomedical Engineering, University and ETH Zuerich, Switzerland).

    • K. Layton (The University of Melbourne, Australia).

    • J. Maclaren, and our colleagues in Medical Physics, Dept. of Radiology, University Medical Center Freiburg.

    • Bruker Biospin, Siemens Healthcare.

  • 22 C.A. Cocosco, GTC-2012

    Chris A. Cocosco

    [email protected]

    GPUs Open New Avenues in

    Medical MRI