computing protein size distributions using … · 2013. 3. 22.

Computing Protein Size Distributions using Centrifugation Techniques and the K20c GPU

Robert Zigon
Sr. Staff Research Engineer
Beckman Coulter

Outline
• Motivation
• Centrifugation Background
• The LS-G* algorithm
• Results
• Summary and conclusion

Motivation
• For the past 6 years …

Motivation
• Change the way scientists and researchers work …
• By delivering interactive applications.

Background
• Two types of centrifuges: preparative and analytical
• Preparative centrifuges are used to separate a sample into its components (blood -> red cells, white cells, platelets, plasma).
• Analytical centrifuges are used to compute size and molecular weight distributions.

How does an Analytical Ultracentrifuge work?

• Start with absorbance measurements of the sample concentration at different points in time.
• Solve for the histogram using the LS-G* algorithm.
• Produce a histogram describing the diameter/weight of each species and its concentration.

The LS-G* Algorithm

Do
{
    Solve linear least squares with
    Tikhonov-Phillips regularization,
    non-negatively constrained
} Until Histogram is “Good Enough”

The LS-G* Algorithm

Do
{
    $$\min_{g}\; \underbrace{\lVert Ug - a \rVert^2}_{\text{least squares term}} + \underbrace{\lambda^2 \lVert Lg \rVert^2}_{\text{regularization term}} \quad \text{s.t. } g_i \ge 0$$
} Until Histogram is “Good Enough”

* All computations are double precision floating point.
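A minimal sketch of one inner solve, for illustration only. The names `U` (forward model), `a` (absorbance data), `g` (histogram), `L` (regularization operator) and `lam` (regularization weight) are our reading of the formula, not definitions given in the slides. Projected gradient descent is a swapped-in technique chosen for brevity; it is not the authors' GPU solver. Python floats are IEEE doubles, consistent with the double-precision footnote.

```python
# Sketch: non-negative Tikhonov-regularized least squares,
#   min_g ||U g - a||^2 + lam^2 ||L g||^2   subject to g_i >= 0,
# solved by projected gradient descent (illustrative, not the authors' solver).
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product M x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def gram(M: Matrix) -> Matrix:
    """M^T M for a dense row-major matrix."""
    n = len(M[0])
    return [[sum(row[i] * row[j] for row in M) for j in range(n)] for i in range(n)]


def transpose_times(M: Matrix, y: Vector) -> Vector:
    """M^T y without forming the transpose."""
    out = [0.0] * len(M[0])
    for row, yi in zip(M, y):
        for j, m in enumerate(row):
            out[j] += m * yi
    return out


def nn_tikhonov(U: Matrix, a: Vector, L: Matrix, lam: float, iters: int = 5000) -> Vector:
    n = len(U[0])
    UtU, LtL = gram(U), gram(L)
    # H = U^T U + lam^2 L^T L; the objective is g^T H g - 2 b^T g + const.
    H = [[UtU[i][j] + lam ** 2 * LtL[i][j] for j in range(n)] for i in range(n)]
    b = transpose_times(U, a)
    # ||H||_F bounds the largest eigenvalue of H, so 1/||H||_F is a safe step.
    step = 1.0 / sum(h * h for row in H for h in row) ** 0.5
    g = [0.0] * n
    for _ in range(iters):
        grad = [hg - bi for hg, bi in zip(matvec(H, g), b)]  # half the true gradient
        # Gradient step followed by projection onto g_i >= 0.
        g = [max(0.0, gi - step * di) for gi, di in zip(g, grad)]
    return g
```

For the toy problem `nn_tikhonov([[1, 0], [0, 1], [1, 1]], [1, 2, 3], [[1, 0], [0, 1]], 1.0)` the result approaches `[0.875, 1.375]`, the solution of (UᵀU + λ²LᵀL)g = Uᵀa. When the unconstrained solution has a negative entry, the projection pins that entry at zero instead.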

What does “Good Enough” mean?

Compute variance $V_0 = V(g_0)$, where $g_0$ satisfies $\min_{g} \lVert Ug - a \rVert^2$

Do
{
    $$\min_{g}\; \lVert Ug - a \rVert^2 + \lambda^2 \lVert Lg \rVert^2 \quad \text{s.t. } g_i \ge 0$$
} until $V_j / V_0 < F^{-1}(1 - \text{confidence interval})$, otherwise adjust
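A minimal sketch of the outer loop, under stated assumptions. The caller supplies `solve(lam)` (one inner solve) and `variance(g)` (residual variance of a fit); both names are hypothetical. We assume “adjust” means starting from a strong regularization weight `lam0` and shrinking it geometrically until the ratio test passes; the slides do not say how the adjustment is made. The threshold `f_crit` is passed in precomputed, for example from an F-distribution quantile function such as `scipy.stats.f.ppf`, because the slides do not give the degrees of freedom.

```python
# Sketch of the outer "Good Enough" loop. The caller supplies solve(lam) -> g
# and variance(g) -> residual variance; f_crit is a precomputed F-test threshold.
from typing import Callable, List, Tuple

Vector = List[float]


def choose_regularization(
    solve: Callable[[float], Vector],
    variance: Callable[[Vector], float],
    f_crit: float,
    lam0: float = 1.0,
    shrink: float = 0.5,
    max_iter: int = 60,
) -> Tuple[float, Vector, float]:
    # V0: variance of the unregularized fit g0 (lam = 0).
    v0 = variance(solve(0.0))
    if v0 <= 0.0:
        raise ValueError("unregularized fit has zero variance; ratio undefined")
    lam = lam0
    for _ in range(max_iter):
        g = solve(lam)
        ratio = variance(g) / v0  # Vj / V0
        if ratio < f_crit:
            return lam, g, ratio  # histogram is "Good Enough"
        lam *= shrink  # otherwise adjust (assumed: weaken the regularization)
    raise RuntimeError("no regularization weight met the F-test criterion")
```

With a one-parameter ridge fit of the data `[1, 3]` against the model `[1, 1]` and `f_crit = 1.1`, the first weight `lam = 1.0` gives a ratio of about 1.44, so the loop halves it. At `lam = 0.5` the ratio drops to about 1.05 and the loop stops.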

Our CUDA approach for more speed
• Constrain the problem
• Hand-written Conjugate Gradient to control memory transfers
• Number of columns a multiple of 32
• Dense matrix-vector multiplication by Noriyuki Fujimoto
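A CPU sketch of textbook conjugate gradient for a symmetric positive-definite system, for illustration. The matrix enters only through the `matvec` callback, which is where a dense GPU matrix-vector kernel would be plugged in. The padding helper assumes the multiple-of-32 constraint is meant to match the 32-thread CUDA warp; the slides do not state the reason. How the authors combine CG with the non-negativity constraint is not described, so this sketch solves an unconstrained system only.

```python
# Sketch: conjugate gradient for a symmetric positive-definite system A x = b.
# The matrix enters only through matvec(p), the step a dense GPU kernel would
# accelerate. Non-negativity handling is not shown.
from typing import Callable, List, Optional

Vector = List[float]


def round_up_to_multiple(n: int, m: int = 32) -> int:
    """Pad a column count up to the next multiple of m (32 = CUDA warp size)."""
    return ((n + m - 1) // m) * m


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def conjugate_gradient(
    matvec: Callable[[Vector], Vector],
    b: Vector,
    tol: float = 1e-10,
    max_iter: Optional[int] = None,
) -> Vector:
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the starting guess x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter if max_iter is not None else 10 * n):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)  # the one matrix-vector product per iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction is A-conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Passing the product as a callback keeps the solver loop independent of where the matrix lives. Host-side CG logic can then drive a device-resident matrix while only short vectors cross the bus, which is one plausible reading of “to control memory transfers”.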

Dell Results – K20c is 2X C2075

[Figure: Time (Seconds), log scale from 0.10 to 1000, vs. Histogram Resolution (64, 128, 256, 384, 512) for K20c, C2075, and CPU]

Histogram Resolution | Our App K20 (sec) | Our App C2075 (sec) | SEDFIT CPU (sec) | Speedup K20 / CPU | Speedup C2075 / CPU
64  | 0.90 | 1.87 | 3.95   | 4.39   | 2.11
128 | 1.35 | 2.60 | 7.10   | 5.26   | 2.73
256 | 1.61 | 3.25 | 31.25  | 19.41  | 9.62
384 | 1.87 | 3.94 | 88.91  | 47.55  | 22.57
512 | 2.11 | 4.29 | 215.38 | 102.08 | 50.21

Dell T3400, Core 2 Duo @ 2.33 GHz, 4 GB RAM, Windows 7 64-bit
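A quick arithmetic check of the Dell table, assuming “Speedup” means SEDFIT CPU time divided by GPU time. The variable names are illustrative.

```python
# Recompute the speedup columns of the Dell table: speedup = CPU time / GPU time.
resolutions = [64, 128, 256, 384, 512]
k20 = [0.90, 1.35, 1.61, 1.87, 2.11]       # Our App K20 (sec)
c2075 = [1.87, 2.60, 3.25, 3.94, 4.29]     # Our App C2075 (sec)
cpu = [3.95, 7.10, 31.25, 88.91, 215.38]   # SEDFIT CPU (sec)


def speedup(cpu_s: float, gpu_s: float) -> float:
    return round(cpu_s / gpu_s, 2)


k20_speedup = [speedup(c, g) for c, g in zip(cpu, k20)]
c2075_speedup = [speedup(c, g) for c, g in zip(cpu, c2075)]

for res, s1, s2 in zip(resolutions, k20_speedup, c2075_speedup):
    print(f"{res:>4}  K20/CPU {s1:>7.2f}  C2075/CPU {s2:>6.2f}")
```

The recomputed values match the table to two decimals. The speedup grows with resolution because the CPU time rises about 55-fold from 64 to 512, while the K20 time rises only about 2.3-fold.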

Lenovo Results – K20c is 1.2X C2075

[Figure: Time (Seconds), log scale from 0.10 to 1000, vs. Histogram Resolution (64, 128, 256, 384, 512) for K20c, C2075, and CPU]

Histogram Resolution | Our App K20 (sec) | Our App C2075 (sec) | SEDFIT CPU (sec) | Speedup K20 / CPU | Speedup C2075 / CPU
64  | 0.94 | 1.09 | 3.10   | 3.30  | 2.84
128 | 1.44 | 1.67 | 6.50   | 4.51  | 3.89
256 | 1.79 | 2.10 | 28.70  | 16.03 | 13.67
384 | 1.94 | 2.40 | 84.50  | 43.56 | 35.21
512 | 2.30 | 2.94 | 206.20 | 89.65 | 70.14

Lenovo S20, quad-core Xeon W3520 @ 2.67 GHz, 8 GB RAM, Windows 7 64-bit
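The slide titles summarize the K20c advantage as 2X on the Dell and 1.2X on the Lenovo. The sketch below checks those figures by taking the C2075 time divided by the K20 time at each resolution from the two tables, then averaging. Using the arithmetic mean is our choice.

```python
from statistics import mean

# GPU timings (sec) at resolutions 64, 128, 256, 384, 512, from the two tables.
timings = {
    "Dell T3400": {"k20": [0.90, 1.35, 1.61, 1.87, 2.11],
                   "c2075": [1.87, 2.60, 3.25, 3.94, 4.29]},
    "Lenovo S20": {"k20": [0.94, 1.44, 1.79, 1.94, 2.30],
                   "c2075": [1.09, 1.67, 2.10, 2.40, 2.94]},
}

# How many times faster the K20c is than the C2075 on each machine.
ratios = {
    host: [c / k for c, k in zip(t["c2075"], t["k20"])]
    for host, t in timings.items()
}

for host, r in ratios.items():
    print(host, [round(x, 2) for x in r], "mean", round(mean(r), 2))
```

The means come out to about 2.03 for the Dell and 1.20 for the Lenovo. The K20c's relative advantage therefore depends on the host machine, even though its absolute times are similar on both.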

Summary and conclusion
• C2075 is fast; K20c is up to 2 times faster (about 2X on the Dell T3400, 1.2X on the Lenovo S20)
• CUDA and Tesla prefer large problems (250,000 rows by 512 columns)
• Interactive rates are possible
• Next steps
  - A user interface to further demonstrate the interactivity now possible
  - 100 millisecond compute time
  - Use CUDA Dynamic Parallelism to implement the entire Conjugate Gradient on the GPU
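A back-of-envelope estimate of the “large problem” size. It assumes the 250,000 by 512 matrix is stored densely in double precision, per the footnote, and it ignores the vectors. At 0.25 flop per byte, a dense matrix-vector product is limited by memory bandwidth rather than arithmetic. This is one common reason GPUs pay off mainly on large problems, although the slides do not state a reason themselves.

```python
# Size of the dense problem quoted above, stored in double precision.
rows, cols = 250_000, 512
bytes_per_double = 8

matrix_bytes = rows * cols * bytes_per_double
flops_per_matvec = 2 * rows * cols  # one multiply and one add per entry

print(f"matrix: {matrix_bytes:,} bytes ({matrix_bytes / 2**30:.2f} GiB)")
print(f"one mat-vec: {flops_per_matvec:,} flops")
print(f"arithmetic intensity: {flops_per_matvec / matrix_bytes} flop/byte")
```

This prints about 1.02 billion bytes (0.95 GiB) for the matrix and 256 million flops per matrix-vector product.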

Questions?

robert zigon@beckman …

