ProteinShop: A Tool for Protein Structure Prediction and
Modeling
Silvia Crivelli
Computational Research Division Lawrence Berkeley National Laboratory
The Protein Structure Prediction Problem
To determine how proteins, the building
blocks of living cells, fold themselves into
three-dimensional shapes that define the
role they play in life.
Importance of Protein Structure Prediction
• The shape of a protein determines its function.
• Knowledge of structure is used in many ways:
– Drug design
– Design of synthetic proteins
– Re-engineering defective proteins
• Genome projects are providing sequences for many proteins whose structure will need to be determined.
Protein Structures
Proteins consist of a long chain of amino acids, the primary structure
[Figure: chain of amino acids (Pro, Gly, Leu, Ser) with labels for an amino acid, the backbone, a side chain (R), and an H-bond]
The constituent amino acids may encourage hydrogen bonding that forms regular structures, called secondary structures (α-helix, β-sheet)
The secondary structures fold together to form a compact three-dimensional shape, called the tertiary structure
The problem can be formulated as a global minimization problem, as it is assumed that the
tertiary structure occurs at the global minimum of the free energy function of the primary sequence
Ab Initio Approach
Our Goal: To provide an approach that relies more on physical principles than on information from known proteins
Ab Initio Method
Tertiary structure is believed to minimize potential energy:
min V_MM(x), where x = atom coordinates
Difficulties:
• Proposed energy function may not match nature
• O(e^(n^2)) local minima
• Very large parameter space, e.g., a modestly sized protein: 100 amino acids ~ 1,600 atoms ~ 4,800 variables
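A minimal sketch of why local minimization alone is not enough. The variable count follows from three coordinates per atom, as on the slide. The double-well function `toy_energy` is a hypothetical one-dimensional stand-in for the potential energy, not a molecular mechanics force field, and all function names are illustrative.

```python
def num_variables(num_atoms: int) -> int:
    """Each atom contributes three coordinates (x, y, z)."""
    return 3 * num_atoms


def toy_energy(x: float) -> float:
    """Hypothetical double-well 'energy' with two local minima."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def toy_gradient(x: float) -> float:
    """Analytic derivative of toy_energy."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_minimize(x0: float, step: float = 0.01, iterations: int = 5000) -> float:
    """Plain gradient descent; it only finds the minimum nearest the start."""
    x = x0
    for _ in range(iterations):
        x -= step * toy_gradient(x)
    return x


left = local_minimize(-2.0)   # converges to the lower (global) minimum
right = local_minimize(2.0)   # gets trapped in the higher local minimum

print("variables for 1,600 atoms:", num_variables(1600))
print("start -2.0 -> x = %.3f, energy = %.3f" % (left, toy_energy(left)))
print("start +2.0 -> x = %.3f, energy = %.3f" % (right, toy_energy(right)))
```

The two runs end in different minima with different energies, which is the one-variable version of the difficulty listed above; with thousands of variables the number of such traps grows very quickly.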
The Search Algorithm
Given the amino acid sequence of a protein, find the global minimum of the free energy function.
Phase 1: Generate Starting Configurations
Phase 2: Global Optimization
Secondary Structure Predictions in Phase 1
Sequence: SKIGIDGFGRIGRLVLRAALSCGAQ
Type:     CBBBBBCCCAAAAAAACCCBBBBBC
Weight:   1135522356789992888566733
Servers predict secondary structure likely to be in a target protein based on a large database of known proteins.
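A minimal sketch of how such a prediction could be split into segments. It assumes the three rows are aligned per residue, that the space in the transcribed Type row is an extraction artifact, and that the labels mean A = helix, B = strand, C = coil with the weight digit acting as a per-residue confidence; none of these conventions is stated on the slide. The function names are illustrative.

```python
from itertools import groupby

sequence = "SKIGIDGFGRIGRLVLRAALSCGAQ"
types = "CBBBBBCCCAAAAAAACCCBBBBBC"
weights = "1135522356789992888566733"


def segments(type_string):
    """Run-length encode labels into (label, start, end), 0-based inclusive."""
    result = []
    position = 0
    for label, run in groupby(type_string):
        length = len(list(run))
        result.append((label, position, position + length - 1))
        position += length
    return result


def strands(type_string, strand_label="B"):
    """Return (start, end) index pairs of the predicted strand segments."""
    return [(s, e) for label, s, e in segments(type_string) if label == strand_label]


# The three rows must describe the same residues.
assert len(sequence) == len(types) == len(weights)

for start, end in strands(types):
    print(start, end, sequence[start:end + 1], weights[start:end + 1])
```

The strand segments found this way are the pieces that have to be paired into sheets in the next step.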
Matching the predicted strands is a combinatorial problem
• Which strands are paired?
• Which orientation (parallel or anti-parallel)?
• Which residues are paired (odd or even)?
There are n! · 2^(n-2) possible n-stranded motifs
96 motifs for n = 4, 960 motifs for n = 5
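The count can be checked directly by evaluating n! · 2^(n-2). The function name `motif_count` is illustrative; the snippet only evaluates the formula and does not enumerate the motifs themselves.

```python
from math import factorial


def motif_count(n: int) -> int:
    """Number of n-stranded motifs according to the formula n! * 2**(n - 2)."""
    if n < 2:
        raise ValueError("the formula is applied here only for n >= 2")
    return factorial(n) * 2 ** (n - 2)


for n in range(2, 9):
    print(n, motif_count(n))
```

The factorial growth is why building every candidate sheet by constrained minimization quickly becomes impractical.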
It takes weeks to create some of these configurations using constrained local minimizations!
Distributions of Beta Sheets in Proteins with Application to Structure Prediction
Ruczinski, Kooperberg, Bonneau, and Baker, Proteins 48, 2002
CASP4 Competition
• Fourth community-wide experiment on the
Critical Assessment of Techniques for
Protein Structure Prediction (2000)
• Our group predicted 8 proteins
• Largest protein had 240 aa
• Most complex fold had 2 β-strands
ProteinShop
• Interactive tool for protein manipulation
• Designed to quickly create initial configurations
• It takes weeks to create a number of configurations using constrained minimizations
• It takes a few hours to create the same configurations with ProteinShop
Phase 1 with ProteinShop
[Flow diagram: Amino Acid Sequence → Phase 1 → Initial Configurations → Phase 2 → Final Configuration. Inside Phase 1: Secondary Structure Prediction → Structure Sequence → Geometry Generation → Pre-configuration → Direct Manipulation → Initial Configurations]
ProteinShop takes minutes
CASP4 Competition (before ProteinShop)
• Our group predicted 8 proteins
• Largest protein had 240 aa
• Most complex fold had 2 β-strands
CASP5 Competition (with ProteinShop)
• Our group predicted 20 proteins
• Largest protein had 417 aa
• Most complex fold had 13 β-strands
Phase 2
[Flow diagram: Amino Acid Sequence → Phase 1 → Initial Configurations → Phase 2: Global Optimization → Final Configuration. Inside Phase 2: Initial Configurations → Subspace Selection → Subspace Optimization → Candidate Selection → Final Configuration]
Takes months to converge using hundreds of processors on Seaborg!
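The three Phase 2 stages named on the slide (subspace selection, subspace optimization, candidate selection) can be illustrated with a toy loop. This is only a sketch under strong assumptions: `toy_energy` is a hypothetical stand-in for the real energy function, subspaces are picked at random, and the subspace optimizer is a simple accept-if-better random search. The slides do not describe how the actual method implements any of these stages.

```python
import random


def toy_energy(x):
    """Hypothetical rugged energy over a coordinate vector (not a force field)."""
    return sum((v * v - 1.0) ** 2 + 0.3 * v for v in x)


def select_subspace(num_vars, size, rng):
    """Subspace selection: pick a small subset of variable indices."""
    return rng.sample(range(num_vars), size)


def optimize_subspace(x, indices, rng, trials=200, scale=0.5):
    """Subspace optimization: perturb only the chosen variables, keep improvements."""
    best, best_energy = list(x), toy_energy(x)
    for _ in range(trials):
        trial = list(best)
        for i in indices:
            trial[i] += rng.uniform(-scale, scale)
        energy = toy_energy(trial)
        if energy < best_energy:
            best, best_energy = trial, energy
    return best


def select_candidates(pool, keep):
    """Candidate selection: keep the lowest-energy configurations."""
    return sorted(pool, key=toy_energy)[:keep]


def phase2(initial, rounds=20, subspace_size=2, keep=4, seed=0):
    rng = random.Random(seed)
    population = [list(x) for x in initial]
    for _ in range(rounds):
        children = []
        for x in population:
            indices = select_subspace(len(x), subspace_size, rng)
            children.append(optimize_subspace(x, indices, rng))
        population = select_candidates(population + children, keep)
    return population


start_rng = random.Random(1)
initial = [[start_rng.uniform(-2.0, 2.0) for _ in range(6)] for _ in range(4)]
final = phase2(initial)
print("best initial energy:", min(toy_energy(x) for x in initial))
print("best final energy:  ", toy_energy(final[0]))
```

Because parents stay in the candidate pool, the best energy never gets worse from one round to the next. Each round works on many independent configurations, which is the kind of work that is typically spread across many processors.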
Phase 2 with ProteinShop
[Flow diagram: Amino Acid Sequence → Phase 1 → Initial Configurations → Phase 2: Global Optimization → Final Configuration. Inside Phase 2: Initial Configurations → Subspace Selection → Subspace Optimization → Candidate Selection → Final Configuration, with ProteinShop adding a Monitoring System and a Steering System (Direct Manipulation)]
Will reduce computation time
Monitoring System
• Monitor progress of overall optimization/each optimization process
• Alert user to important events during optimization
– A sudden drop in internal energy
– A group of processes getting stuck
• Test new heuristics for expanding nodes of the tree
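A minimal sketch of the two alert types listed above, applied to per-process energy histories. The thresholds, the definition of "stuck" (almost no improvement over a fixed window), and all names are assumptions made for illustration; the slides do not specify how ProteinShop detects these events.

```python
def sudden_drop(energies, threshold):
    """True if the latest step lowered the energy by more than `threshold`."""
    return len(energies) >= 2 and energies[-2] - energies[-1] > threshold


def is_stuck(energies, window, tolerance):
    """True if the energy improved by less than `tolerance` over `window` steps."""
    if len(energies) <= window:
        return False
    return energies[-window - 1] - energies[-1] < tolerance


def scan(histories, drop_threshold=5.0, window=3, tolerance=0.01):
    """Return (process_id, event) alerts for all monitored processes."""
    alerts = []
    for pid in sorted(histories):
        energies = histories[pid]
        if sudden_drop(energies, drop_threshold):
            alerts.append((pid, "drop"))
        elif is_stuck(energies, window, tolerance):
            alerts.append((pid, "stuck"))
    return alerts


# Hypothetical energy traces for three optimization processes.
histories = {
    0: [-100.0, -101.0, -110.0],        # large improvement in the last step
    1: [-90.0, -90.0, -90.0, -90.0],    # no improvement over the window
    2: [-80.0, -81.0, -82.0, -83.0],    # steady progress, no alert
}

alerts = scan(histories)
print(alerts)
```

A group-level alert could then be raised when the number of "stuck" entries exceeds some chosen count.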
Steering System
• Change configurations during optimization to account for developments not anticipated during Phase 1
• Manipulate proteins that don’t seem to be realistic or that are stuck in a local minimum
• Allow pruning of the optimization tree
• Assign multiple processes to a configuration that just had a drop in internal energy
• Assign stuck processes to other configurations
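The reassignment idea can be sketched as a small bookkeeping step: processes flagged as stuck are moved onto configurations that just showed an energy drop. The round-robin policy, the data layout, and the names below are hypothetical and are not taken from ProteinShop.

```python
def reassign(assignment, stuck, promising):
    """Return a new process-to-configuration map with stuck processes moved.

    assignment: dict mapping process id -> configuration name
    stuck:      iterable of process ids that stopped improving
    promising:  list of configurations that just had an energy drop
    """
    new_assignment = dict(assignment)  # leave the caller's map untouched
    if not promising:
        return new_assignment
    for k, pid in enumerate(sorted(stuck)):
        new_assignment[pid] = promising[k % len(promising)]
    return new_assignment


# Hypothetical state: four processes, two of them stuck, one promising configuration.
assignment = {0: "cfg_a", 1: "cfg_b", 2: "cfg_c", 3: "cfg_d"}
new_assignment = reassign(assignment, stuck={1, 3}, promising=["cfg_a"])
print(new_assignment)
```

In this example three processes end up working on `cfg_a`, while the configurations abandoned by the stuck processes are effectively pruned from the search.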
Plans for the Future
Use of the monitoring and steering features to develop and test a new method for protein structure prediction
Compete in CASP6 (Critical Assessment of Techniques for Protein Structure Prediction)
Expand and enhance ProteinShop
O. Kreylos, N. Max, B. Hamann, S. Crivelli, and W. Bethel. Interactive Protein Manipulation. Winner of the Best Application Award, IEEE Visualization 2003, Seattle.
ProteinShop
Available to academic and non-profit organizations
proteinshop.lbl.gov