Computing Protein Size Distributions using … · 2013. 3. 22.
TRANSCRIPT
-
Computing Protein Size Distributions using Centrifugation Techniques and the K20c GPU
Robert Zigon
Sr Staff Research Engineer
Beckman Coulter
-
Outline
• Motivation
• Centrifugation Background
• The LS-G* algorithm
• Results
• Summary and conclusion
-
Motivation
• For the past 6 years …
• Change the way scientists and researchers work ….
• By delivering interactive applications.
-
Background
• Two types of centrifuges: Preparative and Analytical
• Preparative centrifuges are used to separate a sample into components (blood -> red, white, platelets, plasma).
• Analytical centrifuges are used to compute size and molecular weight distributions.
-
How does an Analytical Ultracentrifuge work?
-
Start with absorbance measurements of the sample concentration at different points in time.
Solve for histogram using LS-G* algorithm.
Produce a histogram describing diameter/weight of species and their concentration.
-
The LS-G* Algorithm
Do
{
  Solve Linear Least Squares with
    Tikhonov-Phillips regularization
    Non-negatively constrained
} Until Histogram is “Good Enough”
-
The LS-G* Algorithm
Do
{
  $\min_{g} \; \|a - Ug\|^2 + \lambda^2 \|Lg\|^2 \quad \text{s.t. } g_i \ge 0$
  Least squares term: $\|a - Ug\|^2$
  Regularization term: $\lambda^2 \|Lg\|^2$
} Until Histogram is “Good Enough”
* All computations are double precision floating point.
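The minimization on this slide can be sketched in plain Python. This is not the implementation from the talk. It swaps in projected gradient descent for the hand-written Conjugate Gradient, assumes that L is a second-difference operator, and writes the regularization weight as `lam`. The helper names `gram`, `second_difference`, and `solve_regularized_nnls` are hypothetical.

```python
def gram(rows, n):
    """Return the n x n matrix R^T R for a matrix given as a list of rows."""
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]

def second_difference(n):
    """Assumed form of L: rows of [1, -2, 1] that penalize curvature in g."""
    rows = []
    for i in range(n - 2):
        row = [0.0] * n
        row[i], row[i + 1], row[i + 2] = 1.0, -2.0, 1.0
        rows.append(row)
    return rows

def solve_regularized_nnls(U, a, lam, iterations=5000):
    """Minimize ||a - U g||^2 + lam^2 ||L g||^2 subject to g_i >= 0."""
    n = len(U[0])
    UtU = gram(U, n)
    LtL = gram(second_difference(n), n)
    # M = U^T U + lam^2 L^T L and b = U^T a define the gradient 2 (M g - b).
    M = [[UtU[i][j] + lam * lam * LtL[i][j] for j in range(n)] for i in range(n)]
    b = [sum(row[j] * ai for row, ai in zip(U, a)) for j in range(n)]
    # trace(M) bounds the largest eigenvalue of M, so this step size is safe.
    step = 1.0 / (2.0 * sum(M[i][i] for i in range(n)))
    g = [0.0] * n
    for _ in range(iterations):
        Mg = [sum(m * v for m, v in zip(row, g)) for row in M]
        # Gradient step followed by projection onto g_i >= 0.
        g = [max(0.0, gi - step * 2.0 * (mgi - bi))
             for gi, mgi, bi in zip(g, Mg, b)]
    return g

# Toy data (not from the talk): U is the identity, so with lam = 0 the
# answer is simply a with its negative entry clipped to zero.
U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
g = solve_regularized_nnls(U, [1.0, -2.0, 3.0], lam=0.0)
```

Projected gradient is used here only because it is short to write down. It converges far more slowly than a Conjugate Gradient based solver, so it illustrates the objective and the constraint rather than the performance.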
-
What does “Good Enough” mean?
Compute $V_0 = \mathrm{Variance}(G_0)$, where $G_0$ satisfies $\min_{g} \|a - Ug\|^2$
Do
{
  $\min_{g} \; \|a - Ug\|^2 + \lambda^2 \|Lg\|^2 \quad \text{s.t. } g_i \ge 0$
} until $V_j / V_0 < F^{-1}(1 - \text{confidence interval})$, otherwise adjust $\lambda$
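The stopping test above can be sketched as a small loop. This is an illustration under stated assumptions, not the talk's code. `variance_at` is a hypothetical callback that solves the fit for a given regularization weight and returns the variance of that fit. `threshold` stands in for the F-statistic bound, which could come from a routine such as `scipy.stats.f.ppf`. Shrinking the weight by a constant factor is an assumed adjustment rule, since the slide only says to adjust.

```python
def tune_regularization(variance_at, threshold, lam=1.0, shrink=0.5, max_rounds=50):
    """Adjust lam until V_j / V_0 drops below threshold, as on the slide.

    variance_at(lam) is a hypothetical callback returning the variance of
    the fit obtained with regularization weight lam; variance_at(0.0) is
    the unregularized reference V_0.
    """
    v0 = variance_at(0.0)
    for _ in range(max_rounds):
        ratio = variance_at(lam) / v0
        if ratio < threshold:
            return lam, ratio       # histogram is "good enough"
        lam *= shrink               # assumed adjustment rule
    raise RuntimeError("no acceptable regularization weight found")

# Toy variance model (not from the talk): the variance grows with lam.
lam, ratio = tune_regularization(lambda w: 1.0 + w * w, threshold=1.1)
```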
-
Our CUDA approach for more speed
• Constrain the problem
    Hand-written Conjugate Gradient to control memory transfers.
    Number of columns a multiple of 32.
    Dense matrix-vector multiplication by Noriyuki Fujimoto.
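For readers unfamiliar with the method, here is a minimal CPU sketch of Conjugate Gradient for a symmetric positive definite system M x = b, written in plain Python. It is not the hand-written CUDA version from the talk, and it does not enforce the non-negativity constraint. The names `conjugate_gradient` and `round_up` are hypothetical. The one matrix-vector product per iteration is the step a dense GPU kernel would take over.

```python
def round_up(n, multiple=32):
    """Smallest multiple of `multiple` that is >= n (for column padding)."""
    return ((n + multiple - 1) // multiple) * multiple

def conjugate_gradient(M, b, tol=1e-24, max_iter=1000):
    """Solve M x = b for symmetric positive definite M, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)                     # residual b - M x
    p = list(r)                     # search direction
    rs = sum(v * v for v in r)      # squared residual norm
    for _ in range(max_iter):
        if rs < tol:
            break
        # The only matrix-vector product in the loop.
        Mp = [sum(m * v for m, v in zip(row, p)) for row in M]
        alpha = rs / sum(pi * mpi for pi, mpi in zip(p, Mp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * mpi for ri, mpi in zip(r, Mp)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Toy system (not from the talk); the exact solution is [1/11, 7/11].
x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

`round_up(500)` gives 512, which shows how a column count could be padded to satisfy the multiple-of-32 constraint. The histogram resolutions in the results (64 to 512) already meet it.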
-
Dell Results – K20c is 2X C2075
[Chart: Time (Seconds), log scale, vs. Histogram Resolution (64 to 512) for K20c, C2075, and CPU]

Histogram    Our App     Our App       SEDFIT      Speedup     Speedup
Resolution   K20 (sec)   C2075 (sec)   CPU (sec)   K20 / CPU   C2075 / CPU
64           0.90        1.87          3.95        4.39        2.11
128          1.35        2.60          7.10        5.26        2.73
256          1.61        3.25          31.25       19.41       9.62
384          1.87        3.94          88.91       47.55       22.57
512          2.11        4.29          215.38      102.08      50.21

Win 7/64, 4 GB RAM
Dell T3400, Core 2 Duo @ 2.33 GHz
-
Lenovo Results – K20c is 1.2X C2075
[Chart: Time (Seconds), log scale, vs. Histogram Resolution (64 to 512) for K20c, C2075, and CPU]

Histogram    Our App     Our App       SEDFIT      Speedup     Speedup
Resolution   K20 (sec)   C2075 (sec)   CPU (sec)   K20 / CPU   C2075 / CPU
64           0.94        1.09          3.10        3.30        2.84
128          1.44        1.67          6.50        4.51        3.89
256          1.79        2.10          28.70       16.03       13.67
384          1.94        2.40          84.50       43.56       35.21
512          2.30        2.94          206.20      89.65       70.14

Win 7/64, 8 GB RAM
Lenovo S20, quad W3520 @ 2.67 GHz
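The speedup columns and the ratios in the two slide titles follow directly from the timings. The short Python sketch below recomputes them from the reported seconds. The names `DELL`, `LENOVO`, and `speedups` are introduced here for illustration only.

```python
# Reported timings in seconds: resolution -> (K20, C2075, SEDFIT CPU).
DELL = {
    64: (0.90, 1.87, 3.95),
    128: (1.35, 2.60, 7.10),
    256: (1.61, 3.25, 31.25),
    384: (1.87, 3.94, 88.91),
    512: (2.11, 4.29, 215.38),
}
LENOVO = {
    64: (0.94, 1.09, 3.10),
    128: (1.44, 1.67, 6.50),
    256: (1.79, 2.10, 28.70),
    384: (1.94, 2.40, 84.50),
    512: (2.30, 2.94, 206.20),
}

def speedups(table):
    """Return resolution -> (CPU/K20, CPU/C2075, C2075/K20), rounded to 2 places."""
    return {
        res: (round(cpu / k20, 2), round(cpu / c2075, 2), round(c2075 / k20, 2))
        for res, (k20, c2075, cpu) in table.items()
    }

for name, table in (("Dell", DELL), ("Lenovo", LENOVO)):
    for res, row in speedups(table).items():
        print(name, res, row)
```

The third value in each tuple is the K20 time advantage over the C2075. It stays close to 2 on the Dell and between roughly 1.2 and 1.3 on the Lenovo.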
-
Summary and conclusion
• C2075 is fast, K20c is up to 2 times faster (2X on the Dell, 1.2X on the Lenovo)
• CUDA and Tesla prefer large problems (250,000 rows by 512 columns)
• Interactive rates are possible
• Next steps
    A user interface to further demonstrate the interactivity now possible.
    100 millisecond compute time
    Use CUDA Dynamic Parallelism to implement entire Conjugate Gradient on GPU
-
Questions?
robert zigon@beckman [email protected]