Modelling proteins and proteomes using Linux clusters
Ram Samudrala
University of Washington
Examples of biological problems
Protein structure prediction/docking simulations - need to run different trajectories that sometimes talk with each other
Molecular dynamics simulations - need more cohesive parallelisation
Polarisable force fields - need true parallelisation
Bioinformatics searches/exploration - trivially parallelisable
Computational issues
Need efficient methods to start/stop jobs
Need load-balancing queuing system
Need fast communications at times
Need stability (months/years uptimes)
Need low maintenance/management overhead
Need low installation overhead
Needs to be cheap!
Hardware and operating system
256 AMD and Intel CPUs (1-2.5 GHz)
0.5-1 GB RAM, 100-200 GB HD, dual processor MBs
100 Mbps Ethernet connectivity for 64-processor sets
White boxes are good but use up space – 1U racks are ideal
Minimal Linux installation – create clone “CD” – copy on all machines
Our solution
No single solution – users implement their own
Completely decentralised
Analyse problem and determine parallelisable parts
Implementation specific to problem
Use local scratch space for computation
Redundant storage of data for faster access
Limit problem space to specific problems
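A minimal sketch of how the decentralised approach above could look in practice, assuming a shared directory visible to all nodes. Each worker claims a job by atomically creating a lock file, computes in local scratch space, and copies the result back. The names `claim_job`, `run_job`, and the `.lock`/`.out` file suffixes are hypothetical and not taken from the talk; the sketch also assumes the shared filesystem honours exclusive file creation.

```python
import os
import shutil
import tempfile


def claim_job(shared_dir, job_name):
    """Try to claim a job by atomically creating a lock file; True on success."""
    lock_path = os.path.join(shared_dir, job_name + ".lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # worker can claim a given job.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def run_job(shared_dir, job_name, compute):
    """Run compute() in local scratch space and copy the result back."""
    if not claim_job(shared_dir, job_name):
        return None  # another worker already owns this job
    with tempfile.TemporaryDirectory() as scratch:
        # All intermediate I/O happens on the local disk, not the network.
        local_out = os.path.join(scratch, job_name + ".out")
        with open(local_out, "w") as handle:
            handle.write(compute(job_name))
        final_out = os.path.join(shared_dir, job_name + ".out")
        shutil.copy(local_out, final_out)
    return final_out
```

Because every worker runs the same loop over the job list and no central scheduler is involved, adding or removing a machine needs no reconfiguration.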
Problem-specific implementation
MCSA/GA: socket-based communication of trajectories; multiple trajectories on different CPUs
Docking: sample different ligands/regions of the protein on different CPUs
MD: pairwise force fields are additive, so the pair sums can be divided among CPUs
PFF: ?
Bioinformatics: trivial parallelisation; communication by disk
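The socket-based communication of trajectories mentioned for MCSA/GA could be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the talk: the newline-terminated JSON message format and the functions `send_state`, `recv_state`, and `keep_better` are hypothetical.

```python
import json
import socket


def send_state(sock, energy, conformation):
    """Send one trajectory state as a newline-terminated JSON message."""
    message = json.dumps({"energy": energy, "conformation": conformation})
    sock.sendall(message.encode("utf-8") + b"\n")


def recv_state(reader):
    """Read one message from a file object made with sock.makefile()."""
    line = reader.readline()
    if not line:
        raise ConnectionError("peer closed the connection")
    return json.loads(line)


def keep_better(own, other):
    """Return whichever state has the lower energy."""
    return other if other["energy"] < own["energy"] else own
```

A trajectory would call `send_state` at intervals and use `keep_better` to decide whether to continue from a peer's state. For a quick local check, `socket.socketpair()` gives two connected endpoints, and `endpoint.makefile("r", encoding="utf-8")` provides the reader.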
Semi-exhaustive segment-based folding
Sequence: EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
Generate: fragments from database; 14-state φ,ψ model
… …
Minimise: Monte Carlo with simulated annealing; conformational space annealing, GA
… …
Filter: all-atom pairwise interactions, bad contacts; compactness, secondary structure
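To illustrate the Monte Carlo with simulated annealing named in the minimise step, here is a toy sketch using the Metropolis acceptance rule with a geometric cooling schedule. The energy function `toy_energy`, the move set, and all parameter values are assumptions chosen for illustration; they stand in for the real scoring functions and conformational moves, which the slides do not specify.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical stand-in for a scoring function: lowest at all zeros."""
    return sum(a * a for a in angles)


def anneal(angles, steps=2000, t_start=10.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with geometric cooling (needs steps >= 2)."""
    rng = random.Random(seed)
    current = list(angles)
    current_e = toy_energy(current)
    best, best_e = list(current), current_e
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Perturb one randomly chosen coordinate.
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-1.0, 1.0)
        trial_e = toy_energy(trial)
        delta = trial_e - current_e
        # Always accept downhill moves; accept uphill with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e
```

Running `anneal` with different `seed` values gives independent trajectories, which is the unit of work that can be placed on different CPUs.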
T170/sfrp3 – 4.8 Å for all 69 aa
Ab initio prediction at CASP
Comparative modelling at CASP
T182 – 1.0 Å (249 aa; 41% id)
Prediction of SARS CoV proteinase inhibitors
Ekachai Jenwitheesuk
Bioverse – S. typhimurium protein-protein interaction network
Jason McDermott
Bioverse – H. sapiens protein-protein interaction network
Jason McDermott
Future directions
Network connection with multiple Ethernet cards based on traffic analysis
Gigabit Ethernet (switches are still expensive)
Better network filesystems