

Page 1: An Optimization Approach to Protein Structure Prediction

An Optimization Approach to Protein Structure Prediction

Richard Byrd, Betty Eskow, Robert Schnabel, Brett Bader, Lianjun Jiang
University of Colorado

Teresa Head-Gordon
Univ. of California, Berkeley

Silvia Crivelli
Lawrence Berkeley Laboratory

Page 2: An Optimization Approach to Protein Structure Prediction

Problem Definition

Predict the 3-dimensional shape, or native state, of a protein given its sequence of constituent amino acids.

Approach

Assuming the native state of a protein corresponds to its minimum free energy state, use a global optimization method to find the minimum energy configuration of the target protein.

Page 3: An Optimization Approach to Protein Structure Prediction

Importance of Protein Folding

• 3-dimensional structure is useful in molecular drug design.

• Laboratory experiments are expensive:
– X-ray crystallography
– NMR

• Genome projects are providing sequences for many proteins whose structure will need to be determined.

Page 4: An Optimization Approach to Protein Structure Prediction

Protein Structures

[Figure: a chain of amino acids labeled Pro, Gly, Leu, Ser]

Proteins consist of a long chain of amino acids called the primary structure.

The constituent amino acids may encourage hydrogen bonding and form regular structures, called secondary structures.

The secondary structures fold together to form a compact 3-dimensional or tertiary structure.

[Figure labels: α-helix, β-sheet]

Page 5: An Optimization Approach to Protein Structure Prediction

Chemistry of Proteins

[Figure: two polypeptide chains drawn as repeating backbone units (N, H, O atoms with R side chains), with labels for the side chain, the H-bond, the backbone, and a single amino acid]

Hydrogen bonds strongly influence a protein’s shape. They largely occur in secondary structures and help hold the protein together.

Page 6: An Optimization Approach to Protein Structure Prediction

Computational Approaches to Protein Structure Prediction

• Comparative Modeling
– Compares and aligns to a known protein sequence of amino acids

• Fold Recognition
– Searches for the best-fitting fold template from a library of known protein folds

• New Fold Methods
– Not based on knowledge of complete protein sequences or folds
– e.g. energy minimization

Page 7: An Optimization Approach to Protein Structure Prediction

Global Optimization Problem

The 3-dimensional structure of the protein found in nature is believed to minimize potential energy:

$\min_x V(x)$, where x = atom coordinates

Challenges:

• $O(e^{n^2})$ local minima

• Very large parameter space, e.g., for a modestly sized protein:
– 100-300 amino acids
– ~1,600 atoms
– ~4,800 variables

• Model of the energy surface may not match nature

Page 8: An Optimization Approach to Protein Structure Prediction

Amber Energy Function

$$
\begin{aligned}
V(x) ={}& \sum_{\text{bonds}} c_l\,(b - b_0)^2 && (b = \text{bond length}) \\
&+ \sum_{\text{bond angles}} c_a\,(\theta - \theta_0)^2 && (\theta = \text{bond angle}) \\
&+ \sum_{\text{dihedral angles}} c_d\,[1 + \cos(n\phi + \gamma)] && (\phi = \text{dihedral angle}) \\
&+ \sum_{\text{charged pairs}} \frac{Q_i Q_j}{D\, r_{ij}} && (r_{ij} = \text{distance}) \\
&+ \sum_{\text{nonbonded pairs}} c_w\,\Phi(r_{ij}) && (\Phi = \text{Lennard-Jones potential})
\end{aligned}
$$

Internal coordinates are determined using bonds, bond angles and dihedral angles.
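The terms of the energy function can be sketched in a few lines of Python. This is only an illustration of the functional form, not the Amber implementation: a single constant per term replaces the per-atom-type parameters of a real force field, the symbol names (θ, φ, γ, D as a dielectric constant) are one reading of the slide, and the 12-6 form chosen for the Lennard-Jones potential is an assumption.

```python
import math

def bond_energy(bonds, c_l=1.0):
    """Sum of c_l * (b - b0)^2 over (b, b0) pairs of actual and ideal lengths."""
    return sum(c_l * (b - b0) ** 2 for b, b0 in bonds)

def angle_energy(angles, c_a=1.0):
    """Sum of c_a * (theta - theta0)^2 over (theta, theta0) pairs, in radians."""
    return sum(c_a * (t - t0) ** 2 for t, t0 in angles)

def dihedral_energy(dihedrals, c_d=1.0):
    """Sum of c_d * [1 + cos(n * phi + gamma)] over (phi, n, gamma) triples."""
    return sum(c_d * (1.0 + math.cos(n * phi + gamma)) for phi, n, gamma in dihedrals)

def coulomb_energy(pairs, D=1.0):
    """Sum of Qi * Qj / (D * rij) over (Qi, Qj, rij) triples."""
    return sum(qi * qj / (D * r) for qi, qj, r in pairs)

def lennard_jones(r, sigma=1.0):
    """Assumed 12-6 form with its minimum value of -1 at r = sigma."""
    s6 = (sigma / r) ** 6
    return s6 * s6 - 2.0 * s6

def nonbonded_energy(distances, c_w=1.0, sigma=1.0):
    """Sum of c_w * Phi(rij) over nonbonded pair distances."""
    return sum(c_w * lennard_jones(r, sigma) for r in distances)

def total_energy(bonds, angles, dihedrals, charged, nonbonded):
    """V(x) as the sum of the five terms listed on the slide."""
    return (bond_energy(bonds) + angle_energy(angles) + dihedral_energy(dihedrals)
            + coulomb_energy(charged) + nonbonded_energy(nonbonded))
```

Each function takes precomputed internal coordinates or distances, so the sketch stays independent of how atom positions are stored.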

Page 9: An Optimization Approach to Protein Structure Prediction

Additional energy terms to model protein behavior in an aqueous environment

• Formulated from simulations of pairs of hydrophobic molecules in water

• $E_{\text{solvation}} = \sum_{i,j \in N_c} \sum_{k=1}^{M} h_k \exp\left[-\left(\frac{r_{ij} - c_k}{w_k}\right)^2\right]$

Here i, j are aliphatic carbons, and M Gaussians with position ($c_k$), depth ($h_k$) and width ($w_k$) describe 2 minima: (1) molecules in contact and (2) molecules separated by a distance of 1 water molecule.

• Advantages of this model:
– Provides stabilizing force for forming hydrophobic cores.
– Well-defined model of the hydrophobic effect of small hydrophobic groups in water.
– Computationally tractable and differentiable.
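A minimal sketch of such a sum of Gaussians over carbon pairs follows. The Gaussian parameters are invented placeholders, not the fitted values used by the authors, and counting each pair once (i < j) is an assumption about the summation.

```python
import math

# Hypothetical (position c_k, depth h_k, width w_k) values, one Gaussian per
# minimum: a contact minimum and a solvent-separated minimum.
GAUSSIANS = [(4.0, -1.0, 0.5), (7.0, -0.3, 0.8)]

def solvation_energy(carbons, gaussians=GAUSSIANS):
    """Sum h_k * exp(-((r_ij - c_k) / w_k)^2) over carbon pairs and Gaussians.

    carbons: list of (x, y, z) coordinates of aliphatic carbons.
    """
    total = 0.0
    for i in range(len(carbons)):
        for j in range(i + 1, len(carbons)):
            r = math.dist(carbons[i], carbons[j])
            for c_k, h_k, w_k in gaussians:
                total += h_k * math.exp(-(((r - c_k) / w_k) ** 2))
    return total
```

Because every term is a smooth function of the pair distance, the gradient needed by a local minimizer is available in closed form, which is the "differentiable" advantage listed on the slide.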

Page 10: An Optimization Approach to Protein Structure Prediction

Global Optimization Approaches

• Deterministic methods
– Branch and bound, interval methods
– Very reliable, deterministic guarantees
– Too expensive for more than 20-50 variables

• Stochastic methods
– Random steps or sampling
– Probabilistic guarantees
– Practical for < 300 variables

• Heuristic search
– e.g. simulated annealing, tabu search, genetic algorithms
– Effective on some very large problems
– No practical guarantees

Page 11: An Optimization Approach to Protein Structure Prediction

A Stochastic-Perturbation Global Optimization Approach

• Generate and maintain a pool of candidates (configurations), as in genetic algorithms.

• Solve the full-dimensional problem as a series of small-dimensional ones.

• Use protein database information to bias toward likely substructures.

Page 12: An Optimization Approach to Protein Structure Prediction

Algorithm Phases

Given the amino acid sequence of a protein, find the 3-dimensional structure likely to be found in nature.

Simplify the problem by utilizing domain-specific knowledge.

• Phase 1: Generate Initial Population

• Phase 2: Global Optimization

Page 13: An Optimization Approach to Protein Structure Prediction

Phase 1: Create Initial Population

• Submit amino acid sequence to server:
EFIAIYDYKAETEEDLTIKKGEKLEIIEKEGDWWKAKAIGSGEIGYIPANYIAAAE

• Use server predictions to determine the location of α-helices, β-strands, and coils:
CCCCHHHHHHEEEEEEEEEEEECCEEEEEEEEEEEHHHHHHHHCCCHHHHHHCCCC

• Use the ProteinShop visualization tool to form configurations with secondary structure:
– Assign ideal values to the dihedral angles in the sequence according to the predictions.
– Manipulate β-strands to form β-sheets.

• Perform energy minimizations.
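The step from a predicted secondary-structure string to starting dihedral angles can be sketched as follows. The angle values are commonly quoted approximate backbone angles for helices and strands, and the extended-chain default for coil residues is a placeholder; neither comes from the slides, and ProteinShop itself is not modeled here.

```python
from itertools import groupby

# Approximate (phi, psi) backbone angles in degrees. Assumed textbook values,
# not the ones used by the authors.
IDEAL_ANGLES = {"H": (-57.0, -47.0), "E": (-120.0, 120.0)}

def segments(prediction):
    """Split a string such as 'CCHHHEE' into (type, start, end) runs, end exclusive."""
    out, pos = [], 0
    for kind, run in groupby(prediction):
        n = len(list(run))
        out.append((kind, pos, pos + n))
        pos += n
    return out

def initial_dihedrals(prediction, coil=(180.0, 180.0)):
    """Assign one (phi, psi) pair per residue according to its predicted type."""
    return [IDEAL_ANGLES.get(c, coil) for c in prediction]
```

Calling `segments` on the server output gives the helix and strand boundaries, and `initial_dihedrals` gives a starting configuration that the energy minimization can then relax.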

Page 14: An Optimization Approach to Protein Structure Prediction

Phase 2: Improve Local Minima

Iterate:

• Select a protein and a subset of dihedral angles

• Small-scale global optimization

• Full-dimensional local optimization

Cluster minima and test stopping criteria

• Uses a combination of breadth-first and depth-first searches from initial pool

• Dihedral angles act as “internal coordinates” and reduce the number of variables, speeding an optimization run
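The Phase 2 iteration can be illustrated with a toy loop. Everything here is a stand-in chosen for brevity: the energy is a made-up multimodal function of the angles, the small-scale global step is plain random sampling over the chosen subset, the local step is a crude coordinate search rather than a quasi-Newton method, and clustering and stopping tests are omitted.

```python
import math
import random

def energy(x):
    """Toy stand-in for V(x): multimodal in every angle (hypothetical)."""
    return sum(1.0 - math.cos(a) + 0.3 * (1.0 - math.cos(3.0 * a)) for a in x)

def local_minimize(x, f, step=0.1, iters=200):
    """Crude coordinate search over all variables; accepts only improvements."""
    x, fx = list(x), f(x)
    for _ in range(iters):
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = list(x)
                y[i] += d
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step *= 0.5  # refine the step once no neighbor is better
    return x, fx

def small_scale_global(x, subset, f, rng, samples=50):
    """Resample only the chosen angles, keeping all other variables fixed."""
    best, fbest = list(x), f(x)
    for _ in range(samples):
        y = list(x)
        for i in subset:
            y[i] = rng.uniform(-math.pi, math.pi)
        fy = f(y)
        if fy < fbest:
            best, fbest = y, fy
    return best

def phase2(pool, f, rounds=20, subset_size=3, seed=0):
    """Repeatedly improve a pool of configurations; returns (energy, config) pairs,
    best first. subset_size must not exceed the number of variables."""
    rng = random.Random(seed)
    pool = sorted((f(x), list(x)) for x in pool)
    for _ in range(rounds):
        _, x = rng.choice(pool[:3])                   # pick among the best candidates
        subset = rng.sample(range(len(x)), subset_size)
        y = small_scale_global(x, subset, f, rng)     # small-dimensional global step
        y, fy = local_minimize(y, f)                  # full-dimensional local step
        pool.append((fy, y))
        pool.sort()
    return pool
```

The pool only grows, so the best energy in the returned pool is never worse than the best starting configuration.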

Page 15: An Optimization Approach to Protein Structure Prediction

Small Scale Global Optimization in Phase 2

• Minimize energy over 5-20 torsion angles.

• Use a stochastic global optimization algorithm based on sampling, sample pruning and local minimization (Rinnooy Kan et al.).

• From the best start points, do local minimizations using a quasi-Newton method.
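A simplified sample, prune, and minimize loop in the spirit of these bullets is sketched below. It is not the Rinnooy Kan et al. algorithm itself: the pruning rule (keep the best fraction of samples, then drop any point within a fixed distance of a better one) and the backtracking gradient descent used in place of a quasi-Newton method are assumptions made to keep the example short, and the objective is a toy function.

```python
import math
import random

def toy_energy(x):
    """Hypothetical multimodal objective with its global minimum at the origin."""
    return sum(1.0 - math.cos(a) + 0.3 * (1.0 - math.cos(3.0 * a)) for a in x)

def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient."""
    grad = []
    for i in range(len(x)):
        up, dn = list(x), list(x)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad

def gradient_descent(f, x, lr=0.1, iters=300):
    """Stand-in local minimizer; a step is accepted only if it lowers f."""
    x, fx = list(x), f(x)
    for _ in range(iters):
        g = numerical_gradient(f, x)
        y = [xi - lr * gi for xi, gi in zip(x, g)]
        fy = f(y)
        if fy < fx:
            x, fx = y, fy
        else:
            lr *= 0.5
    return x

def sample_prune_minimize(f, dim, n_samples=200, keep=0.1, min_dist=0.5,
                          n_starts=5, seed=0):
    """Return (best minimizer found, start points used)."""
    rng = random.Random(seed)
    samples = [[rng.uniform(-math.pi, math.pi) for _ in range(dim)]
               for _ in range(n_samples)]
    samples.sort(key=f)                                # sampling, ranked by energy
    reduced = samples[:max(1, int(keep * n_samples))]  # keep the lowest-energy fraction
    starts = []
    for s in reduced:                                  # prune points near a better one
        if all(math.dist(s, t) >= min_dist for t in starts):
            starts.append(s)
    starts = starts[:n_starts]
    minima = [gradient_descent(f, s) for s in starts]  # local minimizations
    return min(minima, key=f), starts
```

Pruning keeps the number of local minimizations small, which matters when each one is expensive.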

Page 16: An Optimization Approach to Protein Structure Prediction

Full-scale local minimizations

• Using best points from small-scale global, do local minimizations.

• Because of problem size we use limited-memory quasi-Newton.

• Best local minimizers are added to pool.

Page 17: An Optimization Approach to Protein Structure Prediction

Biasing functions

• Used to form secondary structure during the first phase and sometimes in full-dimensional local minimizations.

• Dihedral angle biasing:
$E_{\phi\psi} = \sum_{\text{dihedrals}} k_\phi\,[1 - \cos(\phi - \phi_0)] + k_\psi\,[1 - \cos(\psi - \psi_0)]$

• Hydrogen bond biasing
– For α-helices: $E_{HB} = w_i w_{i+4} / (D\, r_{i,i+4})$ (the w's are weights from the server for residues i and i+4 in the helix)
– To form β-sheets from β-strands: $E_{HB} = w_i w_j / (D\, r_{i,j})$
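The two biasing terms can be written out directly. In this sketch the summation of the helix term over all residues i, the reading of the two dihedral terms as φ and ψ with separate constants, and the unit default constants are assumptions; the `dist` callback is a hypothetical way of supplying the distance between residues i and j.

```python
import math

def dihedral_bias(angles, targets, k_phi=1.0, k_psi=1.0):
    """Sum of k_phi*[1 - cos(phi - phi0)] + k_psi*[1 - cos(psi - psi0)].

    angles, targets: lists of (phi, psi) pairs in radians.
    """
    return sum(
        k_phi * (1.0 - math.cos(phi - phi0)) + k_psi * (1.0 - math.cos(psi - psi0))
        for (phi, psi), (phi0, psi0) in zip(angles, targets)
    )

def helix_hbond_bias(weights, dist, D=1.0):
    """Sum of w_i * w_{i+4} / (D * r_{i,i+4}) over residues i of a helix."""
    return sum(
        weights[i] * weights[i + 4] / (D * dist(i, i + 4))
        for i in range(len(weights) - 4)
    )

def sheet_hbond_bias(pairs, weights, dist, D=1.0):
    """Sum of w_i * w_j / (D * r_{i,j}) over the chosen strand residue pairs (i, j)."""
    return sum(weights[i] * weights[j] / (D * dist(i, j)) for i, j in pairs)
```

The dihedral term vanishes when every angle equals its target and grows smoothly as angles move away, so it can be added to the energy without breaking differentiability.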

Page 18: An Optimization Approach to Protein Structure Prediction

Neural Network Predictions

Sequence: SKIGIDGFGRIGRLVLRAALSCGAQ

Sequence: SKIGIDGFGRIGRLVLRAALSCGAQ
Type: BBBB B AAAAAAA BBBBB
Weight: 13552 6789992 56673

Neural nets trained on a large database of proteins can predict secondary structure likely to be in a target protein.

Page 19: An Optimization Approach to Protein Structure Prediction

Forming β-sheets from the predicted β-strands is a combinatorial problem.

• Which strands are paired?

• Which orientation? (parallel or anti-parallel)

• Which residues are paired? (odd or even)
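The size of the choice space can be made concrete by enumerating the options for a single strand pair. This sketch counts only pair, orientation, and odd/even register choices; a full sheet topology combines several such pairings, so the real search space is larger.

```python
from itertools import combinations, product

ORIENTATIONS = ("parallel", "anti-parallel")
REGISTERS = ("odd", "even")

def pairing_choices(n_strands):
    """Yield every (strand_a, strand_b, orientation, register) option for one pairing."""
    pairs = combinations(range(n_strands), 2)
    for (a, b), orientation, register in product(pairs, ORIENTATIONS, REGISTERS):
        yield a, b, orientation, register
```

For n strands this gives n(n-1)/2 × 2 × 2 options for one pairing alone, which is why the slides treat strand pairing as a combinatorial problem.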

Page 20: An Optimization Approach to Protein Structure Prediction

Distribution of Beta Sheets in Proteins with Applications to Structure Prediction

Ruczinski, Kooperberg, Bonneau, and Baker, Proteins 48, 2002

Page 21: An Optimization Approach to Protein Structure Prediction

Parallel Organization

• Select k subsets of dihedral angles

• Maintain a queue of (configuration,subspace) for k optimization crews to work on

• Each optimization crew performs a small-scale global optimization of its assigned configuration and subspace.

• Gather intermediate results and re-insert them into the work queue. Idle optimization crews do full-dimensional local minimizations or additional small-scale global optimization.

• Massively parallel exploration of optimization space

• Automatic load balancing
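The queue-and-crews structure can be sketched with Python's standard thread pool. This shows only the control flow: the slides do not say how the crews are implemented, the `work` callback is a hypothetical stand-in for a small-scale global optimization, and results are re-inserted in synchronous rounds rather than through a continuously running queue.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_crews(initial_tasks, work, k=4, rounds=2):
    """Run `work(configuration, subspace)` on k workers for several rounds.

    work must return an (improved_configuration, subspace) pair, which is fed
    back in as a task for the next round.
    """
    tasks = list(initial_tasks)
    finished = []
    with ThreadPoolExecutor(max_workers=k) as pool:
        for _ in range(rounds):
            futures = [pool.submit(work, cfg, sub) for cfg, sub in tasks]
            # Collect results in completion order; idle workers pick up the
            # remaining tasks, which gives simple load balancing.
            results = [fut.result() for fut in as_completed(futures)]
            finished.extend(results)
            tasks = results  # re-insert intermediate results into the queue
    return finished
```

A call such as `run_crews([(config, subset)], optimize_subset, k=8)` would then distribute the (configuration, subspace) pairs over eight workers, where `optimize_subset` is whatever small-scale optimizer is plugged in.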

Page 22: An Optimization Approach to Protein Structure Prediction

2UTG_A: 7.5Å R.M.S.D. from Crystal

1POU: 6.3Å R.M.S.D. from NMR structure

Page 23: An Optimization Approach to Protein Structure Prediction

CASP competition

Community-wide experiment on the Critical Assessment of Techniques for Protein Structure Prediction

Protein crystallographers and NMR spectroscopists provide structures prior to their publication for blind prediction by participants.

Biennial competition open to all computational methods, including servers.

Difficulty of targets is assessed by which type of method works to predict the structure: comparative modeling (CM), fold recognition (FR), or new fold (NF).

We participated in CASP4 (Dec. 2000) and CASP5 (Dec. 2002).

Page 24: An Optimization Approach to Protein Structure Prediction

Our submitted CASP4 models ranked by target difficulty and relative accuracy

Page 25: An Optimization Approach to Protein Structure Prediction

Results on Phospholipase C beta C-terminus, turkey (242 amino acids). Ribbon structure comparison between experiment (center), our submitted M1 prediction (right), and a next-generation run of the global optimization algorithm (left). The M1 prediction, our lowest-energy submission, had an RMSD from experiment of 8.46 Å. The new run lowered the energy of our previous best minimizer, resulting in a new structure with an RMSD of 7.7 Å.
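For reference, the root-mean-square deviation between two matched sets of atom coordinates can be computed as below. The sketch assumes the two structures have already been superimposed and that atoms are listed in the same order; the optimal superposition step that structure comparisons normally include is left out.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

With coordinates in ångströms the result is in ångströms, the unit used for the values quoted above.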

Page 26: An Optimization Approach to Protein Structure Prediction

CASP4 Results Summary

• Best structure predicted on one of the hardest targets.

• Our method is more effective than some knowledge-based methods on targets for which less information from known proteins is available.

• Global optimization algorithm is very effective at improving structures from a small initial population.

Page 27: An Optimization Approach to Protein Structure Prediction

Our submitted CASP5 models ranked by target difficulty and relative accuracy

Page 28: An Optimization Approach to Protein Structure Prediction

Our submitted CASP5 models of targets (domains) that were assessed in the CASP5 NEW FOLD category.

Page 29: An Optimization Approach to Protein Structure Prediction

Our submissions for CASP5 Target 162

Page 30: An Optimization Approach to Protein Structure Prediction

CASP5 Results Summary

• Ranked ~15/165 groups in assessments of New Fold (and NF/FR) results.

• Our method uses less knowledge from known protein structures than most other (New Fold) methods participating in CASP5.

• More diverse starting populations (especially for β-sheet proteins) using the visualization tool led to better performance in some cases.

Page 31: An Optimization Approach to Protein Structure Prediction

Future Research Directions

• Simpler energy models for early stages of the algorithm, and alternative models of solvation.

• New techniques for choosing β-strand pairings.

• Improve our techniques for maintaining existing secondary structure in our models.