Ruby for data processing
DESCRIPTION
Brian Chapados talks about using Ruby and Rake to build a simple workflow that coordinates external processes.
Watch a video at http://www.bestechvideos.com/2008/07/01/sd-rb-episode-048-ruby-for-data-processing
TRANSCRIPT
Data Processing with Ruby
Brian Chapados
http://chapados.org
SDRuby
April 3, 2008
> Archaeoglobus PCNA
MIDVIMTGELLKTVTRAIVALVSEARIHFLEKGLHSRAVDPANVAMVIVDIPKDSFEVYNIDEEKTIGVDMDRIFDISKSISTKDLVELIVEDESTLKVKFGSVEYKVALIDPSAIRKEPRIPELELPAKIVMDAGEFKKAIAAADKISDQVIFRSDKEGFRIEAKGDVDSIVFHMTETELIEFNGGEARSMFSVDYLKEFCKVAGSGDLLTIHLGTNYPVRLVFELVGGRAKVEYILAPRIESE
Understanding Proteins
sequence: 1-D linear chain
structure: 3-D after folding
Hard to do structures with several components
X-ray scattering
C. Trame, personal communication.
Sousa et al. 2000. Cell 103: 633-643.
Raw Data
Distance distribution function of particle

R           P(R)        ERROR
0.0000E+00  0.0000E+00  0.0000E+00
0.5000E+00  0.3157E-02  0.0000E+00
0.1000E+01  0.6069E-02  0.0000E+00
0.1500E+01  0.8740E-02  0.0000E+00
0.2000E+01  0.1118E-01  0.0000E+00
0.2500E+01  0.1339E-01  0.0000E+00
0.3000E+01  0.1538E-01  0.0000E+00
0.3500E+01  0.1718E-01  0.0000E+00
0.4000E+01  0.1879E-01  0.0000E+00
0.4500E+01  0.2023E-01  0.0000E+00
0.5000E+01  0.2153E-01  0.0000E+00
0.5500E+01  0.2269E-01  0.0000E+00
0.6000E+01  0.2374E-01  0.0000E+00
0.6500E+01  0.2471E-01  0.0000E+00
0.7000E+01  0.2560E-01  0.0000E+00
0.7500E+01  0.2645E-01  0.0000E+00
0.8000E+01  0.2727E-01  0.0000E+00
0.8500E+01  0.2809E-01  0.0000E+00
0.9000E+01  0.2891E-01  0.0000E+00
0.9500E+01  0.2976E-01  0.0000E+00
0.1000E+02  0.3065E-01  0.0000E+00
0.1050E+02  0.3160E-01  0.0000E+00
0.1100E+02  0.3261E-01  0.0000E+00
0.1150E+02  0.3370E-01  0.0000E+00
0.1200E+02  0.3487E-01  0.0000E+00
0.1250E+02  0.3613E-01  0.0000E+00
0.1300E+02  0.3747E-01  0.0000E+00
0.1350E+02  0.3890E-01  0.0000E+00
0.1400E+02  0.4041E-01  0.0000E+00
0.1450E+02  0.4201E-01  0.0000E+00
0.1500E+02  0.4367E-01  0.0000E+00
0.1550E+02  0.4539E-01  0.0000E+00
0.1600E+02  0.4717E-01  0.0000E+00
0.1650E+02  0.4899E-01  0.0000E+00
0.1700E+02  0.5083E-01  0.0000E+00
0.1750E+02  0.5268E-01  0.0000E+00
0.1800E+02  0.5453E-01  0.0000E+00
0.1850E+02  0.5636E-01  0.0000E+00
0.1900E+02  0.5815E-01  0.0000E+00
0.1950E+02  0.5989E-01  0.0000E+00
0.2000E+02  0.6157E-01  0.0000E+00
0.2050E+02  0.6317E-01  0.0000E+00
0.2100E+02  0.6467E-01  0.0000E+00
0.2150E+02  0.6607E-01  0.0000E+00
0.2200E+02  0.6735E-01  0.0000E+00
0.2250E+02  0.6851E-01  0.0000E+00
0.2300E+02  0.6954E-01  0.0000E+00
0.2350E+02  0.7043E-01  0.0000E+00
0.2400E+02  0.7118E-01  0.0000E+00
0.2450E+02  0.7179E-01  0.0000E+00
0.2500E+02  0.7225E-01  0.0000E+00
0.2550E+02  0.7258E-01  0.0000E+00
0.2600E+02  0.7277E-01  0.0000E+00
0.2650E+02  0.7283E-01  0.0000E+00
0.2700E+02  0.7277E-01  0.0000E+00
0.2750E+02  0.7259E-01  0.0000E+00
0.2800E+02  0.7231E-01  0.0000E+00
0.2850E+02  0.7194E-01  0.0000E+00
0.2900E+02  0.7149E-01  0.0000E+00
0.2950E+02  0.7096E-01  0.0000E+00
0.3000E+02  0.7038E-01  0.0000E+00
0.3050E+02  0.6975E-01  0.0000E+00
0.3100E+02  0.6909E-01  0.0000E+00
0.3150E+02  0.6840E-01  0.0000E+00
0.3200E+02  0.6770E-01  0.0000E+00
0.3250E+02  0.6700E-01  0.0000E+00
0.3300E+02  0.6630E-01  0.0000E+00
0.3350E+02  0.6561E-01  0.0000E+00
0.3400E+02  0.6494E-01  0.0000E+00
0.3450E+02  0.6429E-01  0.0000E+00
0.3500E+02  0.6366E-01  0.0000E+00
0.3550E+02  0.6304E-01  0.0000E+00
0.3600E+02  0.6245E-01  0.0000E+00
0.3650E+02  0.6186E-01  0.0000E+00
0.3700E+02  0.6128E-01  0.0000E+00
0.3750E+02  0.6070E-01  0.0000E+00
0.3800E+02  0.6010E-01  0.0000E+00
0.3850E+02  0.5948E-01  0.0000E+00
0.3900E+02  0.5881E-01  0.0000E+00
0.3950E+02  0.5810E-01  0.0000E+00
0.4000E+02  0.5731E-01  0.0000E+00
0.4050E+02  0.5643E-01  0.0000E+00
0.4100E+02  0.5545E-01  0.0000E+00
0.4150E+02  0.5434E-01  0.0000E+00
0.4200E+02  0.5309E-01  0.0000E+00
0.4250E+02  0.5168E-01  0.0000E+00
0.4300E+02  0.5008E-01  0.0000E+00
0.4350E+02  0.4828E-01  0.0000E+00
0.4400E+02  0.4627E-01  0.0000E+00
0.4450E+02  0.4401E-01  0.0000E+00
0.4500E+02  0.4151E-01  0.0000E+00
0.4550E+02  0.3874E-01  0.0000E+00
0.4600E+02  0.3568E-01  0.0000E+00
0.4650E+02  0.3234E-01  0.0000E+00
0.4700E+02  0.2869E-01  0.0000E+00
0.4750E+02  0.2472E-01  0.0000E+00
0.4800E+02  0.2044E-01  0.0000E+00
0.4850E+02  0.1583E-01  0.0000E+00
0.4900E+02  0.1088E-01  0.0000E+00
0.4950E+02  0.5608E-02  0.0000E+00
0.5000E+02  0.0000E+00  0.0000E+00
Reciprocal space: Rg = 20.97 , I(0) = 0.2953E+02
Real space: Rg = 20.94 +- 0.026 I(0) = 0.2953E+02 +- 0.2278E+00
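The real-space Rg above relates to the P(R) values through the standard relation Rg^2 = ∫ R^2 P(R) dR / (2 ∫ P(R) dR). Below is a minimal Ruby sketch of that calculation, assuming whitespace-separated rows of R, P(R), ERROR in Fortran-style exponent notation. The function names are illustrative, and the trapezoid rule is a simple choice made here, not necessarily what the original software uses.

```ruby
# Parse "R P(R) ERROR" rows; lines that are not three numbers (such as the
# column header) are skipped.
def parse_pr(text)
  text.each_line.filter_map do |line|
    fields = line.split.map { |tok| Float(tok, exception: false) }
    fields if fields.size == 3 && fields.none?(&:nil?)
  end
end

# Trapezoid-rule integral of ys over xs (xs need not be evenly spaced).
def trapezoid(xs, ys)
  (1...xs.size).sum(0.0) do |i|
    0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1])
  end
end

# Rg^2 = integral(R^2 * P(R)) / (2 * integral(P(R)))
def radius_of_gyration(rows)
  rs  = rows.map { |row| row[0] }
  prs = rows.map { |row| row[1] }
  r2p = rs.zip(prs).map { |r, pr| r * r * pr }
  Math.sqrt(trapezoid(rs, r2p) / (2.0 * trapezoid(rs, prs)))
end
```

Calling `radius_of_gyration(parse_pr(File.read(path)))` on a file holding the rows above would give a value to compare against the reported real-space Rg; `path` is whatever file the data was saved to.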
Existing Software
Svergun group @ EMBL
http://www.embl-hamburg.de/ExternalInfo/Research/Sax/software.html
“interactive” interfaces
not easily scriptable
Works well, but...
requires running each program multiple times
no really... you have to see it to believe it
Help from Ruby
We want to use Linux clusters with hundreds of CPUs
Ruby
Rake
wrap external programs
write shell scripts to run external programs
define relationships between inputs/outputs of different programs
launch external programs after dependencies are satisfied
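The Rake points above can be sketched as a small Rakefile-style script. The file names are hypothetical, and `sort` and `head` are stand-ins for the real analysis programs, whose command lines do not appear in the slides.

```ruby
require "rake"
extend Rake::DSL # make file/task/sh available in this script (a Rakefile gets them automatically)

# Hypothetical file names for a two-step pipeline.
RAW     = "scattering.dat"
SORTED  = "scattering.sorted"
SUMMARY = "scattering.summary"

# Step 1: SORTED is built from RAW by an external program.
# `sort` is a stand-in for the first analysis program.
file SORTED => RAW do |t|
  sh "sort", "-o", t.name, t.prerequisites.first
end

# Step 2: SUMMARY needs SORTED, so Rake runs step 1 first whenever SORTED
# is missing or older than RAW. `head` is a stand-in as well.
file SUMMARY => SORTED do |t|
  sh "head -n 1 #{t.prerequisites.first} > #{t.name}"
end

# Asking for the default task walks the dependency chain.
task default: SUMMARY
```

Saved as a Rakefile in a directory containing `scattering.dat`, running `rake` would execute only the steps whose outputs are missing or out of date.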
Do more with Ruby
Define input parameters in a script
Define common tasks in a library
quick and dirty...
more robust...
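One way to picture the split between parameters in a script and common tasks in a library is sketched below. Everything named here is hypothetical: the `Workflow` module, the `rmax` parameter, the dataset names, and `echo`, which stands in for a real program.

```ruby
require "rake"
extend Rake::DSL # make `task` available at the top level of this script

# Library side: a reusable helper that defines the file task for one dataset.
module Workflow
  extend Rake::DSL

  # Defines "<name>.out" as built from "<name>.dat" and returns the output name.
  def self.define_run(name, rmax:, program: "echo")
    input  = "#{name}.dat"
    output = "#{name}.out"
    file output => input do |t|
      # Stand-in command line; a real program would have its own arguments.
      sh "#{program} rmax=#{rmax} #{input} > #{t.name}"
    end
    output
  end
end

# Script side: only the per-dataset input parameters live here.
DATASETS = {
  "sample_a" => { rmax: 50.0 },
  "sample_b" => { rmax: 65.0 },
}

outputs = DATASETS.map { |name, params| Workflow.define_run(name, **params) }

# One umbrella task that builds every dataset.
task all: outputs
```

Adding a dataset then means adding one line to `DATASETS`; the task definitions stay in the library.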
Evolve towards a micro-framework
Ruby API for running commands
More sophisticated information processing
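A rough sketch of what a small Ruby API for running commands could look like, using `Open3.capture3` from the standard library. The names `run_command`, `CommandResult`, `CommandFailed`, and `parse_real_space_rg` are illustrative and not taken from the talk. Feeding text on stdin is shown as one possible way to drive a prompt-based program, and the parser assumes a summary line in the "Real space: Rg = ..." format shown earlier.

```ruby
require "open3"

# What one external command produced.
CommandResult = Struct.new(:stdout, :stderr, :status) do
  def success?
    status.success?
  end
end

class CommandFailed < StandardError; end

# Run a program without going through a shell. `stdin` text is piped to the
# program, which can stand in for answers typed at interactive prompts.
def run_command(*argv, stdin: "")
  out, err, status = Open3.capture3(*argv, stdin_data: stdin)
  result = CommandResult.new(out, err, status)
  unless result.success?
    raise CommandFailed, "#{argv.join(' ')} exited with #{status.exitstatus}: #{err}"
  end
  result
end

# Pull Rg and its error out of a "Real space: Rg = 20.94 +- 0.026 ..." line.
# Returns [rg, error], or nil when no such line is present.
def parse_real_space_rg(text)
  match = text.match(/Real space:\s*Rg\s*=\s*(\S+)\s*\+-\s*(\S+)/)
  match && [Float(match[1]), Float(match[2])]
end
```

Raising on a non-zero exit status lets a Rake task that calls `run_command` stop the workflow at the failing step instead of passing bad output downstream.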
Acknowledgements
Lab (Scripps Research Institute)
John Tainer
Scott Williams
Chris Putnam
Data Collection
Beamline 12.3.1
The Advanced Light Source (ALS, LBNL)
Funding
NIH, DOE, NCI