Computing Protein Size Distributions using Centrifugation Techniques and the K20c GPU

Robert Zigon, Sr Staff Research Engineer, Beckman Coulter

2013. 3. 22.



  • Outline

    • Motivation

    • Centrifugation Background

    • The LS-G* algorithm

    • Results

    • Summary and conclusion

  • Motivation

    • For the past 6 years …

    • Change the way scientists and researchers work …

    • By delivering interactive applications.

  • Background

    • Two types of centrifuges: Preparative and Analytical

    • Preparative centrifuges are used to separate a sample into components (blood -> red, white, platelets, plasma).

    • Analytical centrifuges are used to compute size and molecular weight distributions.

  • How does an Analytical Ultracentrifuge work?

  • Start with absorbance measurements of the sample concentration at different points in time.

    Solve for the histogram using the LS-G* algorithm.

    Produce a histogram describing the diameter/weight of the species and their concentration.

  • The LS-G* Algorithm

    Do {
        Solve Linear Least Squares with
            Tikhonov-Phillips regularization
            Non-negatively constrained
    } Until Histogram is “Good Enough”

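  • The LS-G* Algorithm

    Do {

    $$\min_{g} \; \underbrace{\lVert U g - a \rVert_2^2}_{\text{least squares term}} + \underbrace{\lambda \lVert L g \rVert_2^2}_{\text{regularization term}} \quad \text{s.t. } g_i \ge 0$$

    } Until Histogram is “Good Enough”

    ($\lambda$ is the weight on the regularization term.)

    * All computations are double precision floating point.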
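The minimisation on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the solver used in the talk: `regularized_nnls` is a hypothetical name, the regularization matrix L is assumed to be the identity, the weight `lam` is assumed to be positive, and the non-negativity constraint is handled by projected coordinate descent (each entry is set to its best value and clipped at zero) rather than by whatever method the author used.

```python
def regularized_nnls(U, a, lam, sweeps=500):
    """Minimise ||U g - a||^2 + lam * ||g||^2 subject to g_i >= 0.

    Assumptions for this sketch: L is the identity matrix and lam > 0,
    so the matrix M below has a strictly positive diagonal.
    """
    rows, cols = len(U), len(U[0])
    # Normal-equation matrix M = U^T U + lam * I and right-hand side b = U^T a.
    M = [[sum(U[r][i] * U[r][j] for r in range(rows)) + (lam if i == j else 0.0)
          for j in range(cols)]
         for i in range(cols)]
    b = [sum(U[r][i] * a[r] for r in range(rows)) for i in range(cols)]

    g = [0.0] * cols
    for _ in range(sweeps):
        for i in range(cols):
            # Best value of g[i] with all other entries held fixed,
            # then clipped at zero to respect g_i >= 0.
            rest = sum(M[i][j] * g[j] for j in range(cols) if j != i)
            g[i] = max(0.0, (b[i] - rest) / M[i][i])
    return g


# With U = I the unconstrained answer would be [0.5, -0.5];
# the constraint clips the second entry to zero.
print(regularized_nnls([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0], lam=1.0))  # → [0.5, 0.0]
```

The fixed number of sweeps keeps the sketch short; a practical solver would stop once the entries of `g` no longer change.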

  • What does “Good Enough” mean?

    Compute $V_0 = \mathrm{Variance}(G_0)$, where $G_0$ satisfies $\min_{g} \lVert U g - a \rVert_2^2$.

    Do {

    $$\min_{g} \; \lVert U g - a \rVert_2^2 + \lambda \lVert L g \rVert_2^2 \quad \text{s.t. } g_i \ge 0$$

    } until $V_j / V_0 < F^{-1}(1 - \text{confidence interval})$; otherwise adjust the regularization weight $\lambda$.
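The stopping rule can be sketched as a small driver loop. Everything named here is hypothetical: `fit_until_good_enough`, `solve` and `variance` are illustrative names, `f_threshold` stands in for the F-distribution quantile (the value 1.05 below is made up, not computed from a real confidence level), and "adjust" is assumed to mean shrinking the regularization weight by a fixed factor. The toy problem has a single histogram bin so that the regularized fit has a closed form.

```python
def fit_until_good_enough(solve, variance, v0, f_threshold,
                          lam=1.0, shrink=0.5, max_iter=60):
    """Refit with a smaller weight until variance(g) / v0 < f_threshold."""
    for _ in range(max_iter):
        g = solve(lam)                 # regularized, non-negative fit
        ratio = variance(g) / v0       # V_j / V_0
        if ratio < f_threshold:
            return g, lam, ratio
        lam *= shrink                  # assumed form of "otherwise adjust"
    raise RuntimeError("no acceptable fit within max_iter adjustments")


# Toy problem: one bin, U is a column of ones, L is the identity.
a = [1.0, 2.0, 3.0]

def solve(lam):
    # Closed-form minimiser of sum((g - a_k)^2) + lam * g^2, clipped at zero.
    return max(0.0, sum(a) / (len(a) + lam))

def variance(g):
    # Mean squared residual of the fit.
    return sum((g - a_k) ** 2 for a_k in a) / len(a)

v0 = variance(solve(0.0))              # unregularized baseline V_0
g, lam, ratio = fit_until_good_enough(solve, variance, v0, f_threshold=1.05)
print(lam, round(ratio, 4))
```

Passing the solver and the variance function in as arguments keeps the loop independent of how the least squares problem is actually solved.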

  • Our CUDA approach for more speed

    • Constrain the problem

    Hand-written Conjugate Gradient to control memory transfers.

    Number of columns a multiple of 32.

    Dense matrix-vector multiplication by Noriyuki Fujimoto.
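As a rough illustration of the numerical core, the sketch below runs conjugate gradient on the regularized normal equations using only matrix-vector products, which is the operation the slide singles out for the GPU. It is plain Python, not the author's CUDA code, and it makes several assumptions: L is the identity, the non-negativity constraint is left out, and the names `cg_normal_equations` and `round_up_to_multiple` are hypothetical. The second helper shows one way to arrive at a column count that is a multiple of 32, the warp size on NVIDIA GPUs.

```python
def matvec(A, x):
    """Dense product A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def matvec_t(A, y):
    """Dense product A^T y."""
    return [sum(A[r][c] * y[r] for r in range(len(A))) for c in range(len(A[0]))]

def dot(x, y):
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

def cg_normal_equations(U, a, lam, tol=1e-24, max_iter=1000):
    """Solve (U^T U + lam * I) g = U^T a by conjugate gradient (lam > 0)."""
    def apply_m(v):
        # M v = U^T (U v) + lam * v, without ever forming M explicitly.
        return [s + lam * v_i for s, v_i in zip(matvec_t(U, matvec(U, v)), v)]

    g = [0.0] * len(U[0])
    r = matvec_t(U, a)          # residual b - M g for the starting guess g = 0
    p = r[:]
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs <= tol:           # squared residual norm small enough
            break
        mp = apply_m(p)
        alpha = rs / dot(p, mp)
        g = [g_i + alpha * p_i for g_i, p_i in zip(g, p)]
        r = [r_i - alpha * mp_i for r_i, mp_i in zip(r, mp)]
        rs_new = dot(r, r)
        p = [r_i + (rs_new / rs) * p_i for r_i, p_i in zip(r, p)]
        rs = rs_new
    return g

def round_up_to_multiple(n, k=32):
    """Smallest multiple of k that is >= n."""
    return ((n + k - 1) // k) * k


print(cg_normal_equations([[1.0, 0.0], [0.0, 2.0]], [1.0, 2.0], lam=1.0))
print(round_up_to_multiple(500))  # → 512
```

Each iteration needs one product with U and one with its transpose, so on a GPU the vectors can stay in device memory between iterations.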

  • Dell Results – K20c is 2X C2075

    [Chart: Time (Seconds), log scale from 0.10 to 1000.00, versus Histogram Resolution (64, 128, 256, 384, 512) for K20c, C2075 and CPU]

    Histogram    Our App     Our App       SEDFIT      Speedup     Speedup
    Resolution   K20 (sec)   C2075 (sec)   CPU (sec)   K20 / CPU   C2075 / CPU
    64           0.90        1.87          3.95        4.39        2.11
    128          1.35        2.60          7.10        5.26        2.73
    256          1.61        3.25          31.25       19.41       9.62
    384          1.87        3.94          88.91       47.55       22.57
    512          2.11        4.29          215.38      102.08      50.21

    Dell T3400, Core 2 Duo @ 2.33 GHz, Win 7/64, 4 GB RAM
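The speedup columns are the SEDFIT CPU time divided by the GPU time. The short check below recomputes them from the Dell timings above and adds the C2075-to-K20c ratio behind the slide title; `DELL_TIMINGS` and `speedups` are names chosen for this sketch.

```python
# (resolution, K20c sec, C2075 sec, SEDFIT CPU sec) from the Dell T3400 results.
DELL_TIMINGS = [
    (64, 0.90, 1.87, 3.95),
    (128, 1.35, 2.60, 7.10),
    (256, 1.61, 3.25, 31.25),
    (384, 1.87, 3.94, 88.91),
    (512, 2.11, 4.29, 215.38),
]

def speedups(rows):
    """Return (resolution, CPU/K20c, CPU/C2075, C2075/K20c), rounded to 2 places."""
    return [(res,
             round(cpu / k20, 2),
             round(cpu / c2075, 2),
             round(c2075 / k20, 2))
            for res, k20, c2075, cpu in rows]

for row in speedups(DELL_TIMINGS):
    print(row)
```

On these numbers the C2075-to-K20c ratio stays close to 2 at every resolution, while the speedup over the CPU grows with the histogram resolution.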

  • Lenovo Results – K20c is 1.2X C2075

    [Chart: Time (Seconds), log scale from 0.10 to 1000.00, versus Histogram Resolution (64, 128, 256, 384, 512) for K20c, C2075 and CPU]

    Histogram    Our App     Our App       SEDFIT      Speedup     Speedup
    Resolution   K20 (sec)   C2075 (sec)   CPU (sec)   K20 / CPU   C2075 / CPU
    64           0.94        1.09          3.10        3.30        2.84
    128          1.44        1.67          6.50        4.51        3.89
    256          1.79        2.10          28.70       16.03       13.67
    384          1.94        2.40          84.50       43.56       35.21
    512          2.30        2.94          206.20      89.65       70.14

    Lenovo S20, quad W3520 @ 2.67 GHz, Win 7/64, 8 GB RAM

  • Summary and conclusion

    • C2075 is fast; K20c is up to 2 times faster (2X on the Dell, 1.2X on the Lenovo)

    • CUDA and Tesla prefer large problems (250,000 rows by 512 columns)

    • Interactive rates are possible

    • Next steps

      A user interface to further demonstrate the interactivity now possible. 100 millisecond compute time.

      Use CUDA Dynamic Parallelism to implement the entire Conjugate Gradient on the GPU.

  • Questions?

    robert zigon@beckman [email protected]