ProteinShop: A Tool for Protein Structure Prediction and Modeling Silvia Crivelli Computational Research Division Lawrence Berkeley National Laboratory

Post on 30-Dec-2015

Page 1

ProteinShop: A Tool for Protein Structure Prediction and Modeling

Silvia Crivelli

Computational Research Division, Lawrence Berkeley National Laboratory

Page 2

The Protein Structure Prediction Problem

To determine how proteins, the building blocks of living cells, fold themselves into three-dimensional shapes that define the role they play in life.

Page 3

Importance of Protein Structure Prediction

• The shape of a protein determines its function.
• Knowledge of structure is used in many ways:
  – Drug design
  – Design of synthetic proteins
  – Re-engineering defective proteins
• Genome projects are providing sequences for many proteins whose structure will need to be determined.

Page 4

Protein Structures

Proteins consist of a long chain of amino acids, the primary structure.

[Figure: a chain of amino acids (Pro, Gly, Leu, Ser) drawn as a repeating backbone of N, H, O, and R groups. Labels: Side chain, H-bond, Backbone, Amino acid]

Page 5

Protein Structures

Proteins consist of a long chain of amino acids, the primary structure.

The constituent amino acids may encourage hydrogen bonding that forms regular structures, called secondary structures.

The secondary structures fold together to form a compact 3-dimensional shape, called the tertiary structure.

[Figure labels: Pro, Gly, Leu, Ser; α-helix, β-sheet]

Page 6

Ab Initio Approach

The problem can be formulated as a global minimization problem, as it is assumed that the tertiary structure occurs at the global minimum of the free energy function of the primary sequence.

Our Goal: To provide an approach that relies more on physical principles than on information from known proteins

Page 7

Ab Initio Method

Tertiary structure is believed to minimize potential energy:

$\min_x V_{MM}(x)$, where $x$ = atom coordinates

Difficulties:
• Proposed energy function may not match nature
• $O(e^{n^2})$ local minima
• Very large parameter space

e.g., a modestly sized protein: 100 amino acids, ~1,600 atoms, ~4,800 variables
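
The variable count on this slide follows from three Cartesian coordinates per atom (1,600 × 3 = 4,800). The sketch below checks that arithmetic and then uses a made-up one-dimensional function, not a molecular-mechanics potential, to illustrate why many local minima make the search hard: plain gradient descent lands in different minima depending on where it starts. The names `toy_energy`, `local_minimize`, and `multistart`, the step size, and the start points are illustrative assumptions, not part of the talk.

```python
import math

def n_variables(n_atoms: int) -> int:
    """Three Cartesian coordinates (x, y, z) per atom."""
    return 3 * n_atoms

def toy_energy(x: float) -> float:
    """Hypothetical rugged 1-D 'energy'; NOT a real force field."""
    return 0.1 * x * x + math.sin(3.0 * x)

def toy_gradient(x: float) -> float:
    """Analytic derivative of toy_energy."""
    return 0.2 * x + 3.0 * math.cos(3.0 * x)

def local_minimize(x: float, step: float = 0.01, iters: int = 5000) -> float:
    """Plain gradient descent; slides downhill into a nearby local minimum."""
    for _ in range(iters):
        x -= step * toy_gradient(x)
    return x

def multistart(starts):
    """Run one local minimization per start; return distinct minima and the best one."""
    minima = sorted({round(local_minimize(float(s)), 3) for s in starts})
    best = min(minima, key=toy_energy)
    return minima, best

minima, best = multistart(range(-6, 7))
print(n_variables(1600))   # 4800
print(minima, best)        # several distinct local minima, one of them lowest
```

Even in one dimension the starts disagree about where the minimum is; with thousands of coupled variables the number of such basins is far larger, which is why the choice of starting configurations matters.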

Page 8

The Search Algorithm

Given the amino acid sequence of a protein, find the global minimum of the free energy function.

Phase 1: Generate Starting Configurations
Phase 2: Global Optimization

Page 9

Secondary Structure Predictions in Phase 1

Sequence: SKIGIDGFGRIGRLVLRAALSCGAQ
Type:     CBBBBBCCCAAAAAAACCCBBBBBC
Weight:   1135522356789992888566733

Servers predict the secondary structure likely to occur in a target protein, based on a large database of known proteins.
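
As a rough illustration of how such a prediction could be consumed, the sketch below groups the per-residue type string into contiguous segments and averages the weight digits over each segment. Several things are assumptions here and are not stated in the transcript: that A, B, and C stand for helix, strand, and coil; that each weight digit is a per-residue confidence; and that the space inside the type string is an extraction artifact (the three strings have equal length only without it). The function name `segments` is hypothetical.

```python
from itertools import groupby

SEQUENCE = "SKIGIDGFGRIGRLVLRAALSCGAQ"
# Assumption: whitespace in the transcript's type string is an extraction artifact.
TYPES = "CBBBB BCCCAAAAAAACCCBBBBBC".replace(" ", "")
WEIGHTS = "1135522356789992888566733"

def segments(sequence, types, weights):
    """Split the prediction into runs of equal type.

    Returns a list of (type, residues, mean_weight) tuples in chain order.
    """
    if not (len(sequence) == len(types) == len(weights)):
        raise ValueError("sequence, types, and weights must have equal length")
    result = []
    pos = 0
    for kind, run in groupby(types):
        length = len(list(run))
        digits = [int(c) for c in weights[pos:pos + length]]
        result.append((kind, sequence[pos:pos + length], sum(digits) / length))
        pos += length
    return result

for kind, residues, mean_weight in segments(SEQUENCE, TYPES, WEIGHTS):
    print(kind, residues, round(mean_weight, 2))
```

Under the stated assumptions, the two B runs would be the candidate strands that the next slide tries to pair.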

Page 10

Matching the predicted strands is a combinatorial problem

• Which strands are paired?
• Which orientation? (parallel or anti-parallel)
• Which residues are paired? (odd or even)

Page 11

There are $n!\,2^{n-2}$ possible n-stranded motifs

• 96 motifs for n=4
• 960 motifs for n=5

It takes weeks to create some of these configurations using constrained local minimizations!

Ruczinski, Kooperberg, Bonneau, and Baker, "Distribution of Beta Sheets in Proteins with Applications to Structure Prediction," Proteins 48, 2002
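
The two quoted counts can be checked directly from the formula. The sketch below also enumerates motifs under one plausible reading of the count, which is an assumption here rather than something the transcript spells out: a strand order and its reverse describe the same sheet (n!/2 orders), and each of the n-1 neighboring strand pairs is either parallel or anti-parallel (2^(n-1) choices). The function names are hypothetical.

```python
from itertools import permutations, product
from math import factorial

def count_motifs(n: int) -> int:
    """Closed-form count quoted on the slide: n! * 2**(n-2)."""
    return factorial(n) * 2 ** (n - 2)

def enumerate_motifs(n: int):
    """List (strand order, neighbor orientations) pairs.

    Assumption: an order and its reverse are the same sheet, and each of the
    n-1 neighboring pairs is parallel ('P') or anti-parallel ('A').
    """
    motifs = []
    for order in permutations(range(n)):
        if order[0] > order[-1]:
            continue  # mirror image of an order that is kept
        for orientations in product("PA", repeat=n - 1):
            motifs.append((order, orientations))
    return motifs

print(count_motifs(4), len(enumerate_motifs(4)))   # 96 96
print(count_motifs(5), len(enumerate_motifs(5)))   # 960 960
```

The enumeration does not cover the residue-register question (odd or even pairing), which would multiply the number of candidate configurations further.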

Page 12

CASP4 Competition

• Fourth community-wide experiment on the Critical Assessment of Techniques for Protein Structure Prediction (2000)
• Our group predicted 8 proteins
• Largest protein had 240 aa
• Most complex fold had 2 β-strands

Page 13

ProteinShop

• Interactive tool for protein manipulation
• Designed to quickly create initial configurations
• It takes weeks to create a number of configurations using constrained minimizations
• It takes a few hours to create the same configurations with ProteinShop

Page 14

Phase 1 with ProteinShop

[Diagram, overall pipeline: Amino Acid Sequence → Phase 1 → Initial Configurations → Phase 2 → Final Configuration]

[Diagram, Phase 1 detail: Secondary Structure Prediction → Structure Sequence → Geometry Generation → Pre-configuration → Direct Manipulation → Initial Configurations]

ProteinShop takes minutes

Page 20

CASP4 Competition (before ProteinShop)

• Our group predicted 8 proteins
• Largest protein had 240 aa
• Most complex fold had 2 β-strands

CASP5 Competition (with ProteinShop)

• Our group predicted 20 proteins
• Largest protein had 417 aa
• Most complex fold had 13 β-strands

Page 21

Phase 2

[Diagram, overall pipeline: Amino Acid Sequence → Phase 1 → Initial Configurations → Phase 2: Global Optimization → Final Configuration]

[Diagram, Phase 2 detail: Initial Configurations → Subspace Selection → Subspace Optimization → Candidate Selection → Final Configuration]

Takes months to converge using hundreds of processors on Seaborg!
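
The transcript names the three Phase 2 stages but does not describe how each one works, so the following is only a toy sketch of how a select/optimize/keep loop of that shape might be organized. Everything in it is an assumption made for illustration: the sum-of-squares `toy_energy`, the random choice of coordinates as the "subspace", the random-search optimizer, and all function and parameter names.

```python
import random

def toy_energy(x):
    """Hypothetical stand-in for an energy function (sum of squares)."""
    return sum(v * v for v in x)

def optimize_subspace(x, indices, rng, trials=200, scale=0.5):
    """Random search that perturbs only the chosen coordinates and keeps improvements."""
    best, best_e = list(x), toy_energy(x)
    for _ in range(trials):
        cand = list(best)
        for i in indices:
            cand[i] += rng.uniform(-scale, scale)
        e = toy_energy(cand)
        if e < best_e:
            best, best_e = cand, e
    return best

def phase2(initial_configs, rounds=20, subspace_size=2, keep=3, seed=0):
    """Repeat: pick a subspace, optimize within it, keep the lowest-energy candidates."""
    rng = random.Random(seed)
    pool = [list(c) for c in initial_configs]
    for _ in range(rounds):
        new_pool = []
        for x in pool:
            indices = rng.sample(range(len(x)), subspace_size)    # subspace selection
            new_pool.append(optimize_subspace(x, indices, rng))   # subspace optimization
        pool = sorted(pool + new_pool, key=toy_energy)[:keep]     # candidate selection
    return pool[0]

starts = [[3.0, -2.0, 1.0, 4.0], [-1.0, 2.0, -3.0, 0.5], [2.0, 2.0, 2.0, 2.0]]
final = phase2(starts)
print(toy_energy(final) <= min(toy_energy(s) for s in starts))   # True
```

Because candidates are only ever replaced by lower-energy ones, the best energy in the pool never increases from round to round; the real cost on a protein comes from each subspace optimization being a large computation in its own right.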

Page 22

Phase 2 with ProteinShop

[Diagram, overall pipeline: Amino Acid Sequence → Phase 1 → Initial Configurations → Phase 2: Global Optimization → Final Configuration]

[Diagram, Phase 2 detail: Initial Configurations → Subspace Selection → Subspace Optimization → Candidate Selection → Final Configuration. ProteinShop adds: Monitoring System, Direct Manipulation, Steering System]

Will reduce computation time

Page 23

Monitoring System

• Monitor progress of overall optimization/each optimization process

Page 24

Monitoring System

• Monitor progress of overall optimization/each optimization process
• Alert user to important events during optimization
  – A sudden drop in internal energy
  – A group of processes getting stuck
• Test new heuristics for expanding nodes of the tree
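
The two alert conditions named on this slide can be sketched as simple checks over per-process energy traces. This is a minimal illustration under stated assumptions, not ProteinShop's monitoring code: the trace format (a list of energies per process id), the thresholds, and the names `sudden_drops`, `is_stuck`, and `stuck_processes` are all hypothetical.

```python
def sudden_drops(energies, threshold):
    """Indices where the energy fell by more than `threshold` in a single step."""
    return [i for i in range(1, len(energies))
            if energies[i - 1] - energies[i] > threshold]

def is_stuck(energies, window, tolerance):
    """True if the last `window` energies varied by less than `tolerance`."""
    if len(energies) < window:
        return False
    recent = energies[-window:]
    return max(recent) - min(recent) < tolerance

def stuck_processes(traces, window=5, tolerance=1e-3):
    """Ids of processes whose recent energies have stopped changing."""
    return sorted(pid for pid, energies in traces.items()
                  if is_stuck(energies, window, tolerance))

# Made-up traces: process id -> energies reported so far.
traces = {
    0: [-10.0, -10.5, -25.0, -25.2],
    1: [-8.0, -8.0, -8.0, -8.0, -8.0, -8.0],
    2: [-5.0, -6.0, -7.0, -8.0, -9.0, -10.0],
}
print(sudden_drops(traces[0], threshold=5.0))   # [2]
print(stuck_processes(traces))                  # [1]
```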

Page 25

Steering System

• Change configurations during optimization to account for developments not anticipated during Phase 1
• Manipulate proteins that don’t seem to be realistic or that are stuck in a local minimum
• Allow pruning of the optimization tree
  – Assign multiple processes to a configuration that just had a drop in internal energy
  – Assign stuck processes to other configurations
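
The reassignment described in the last bullet can be pictured as an update to a process-to-configuration table. The sketch below is a hedged illustration only: the dictionary representation, the configuration labels, and the function names `reassign` and `active_configurations` are assumptions, and the transcript does not say how the steering system actually stores or moves work.

```python
def reassign(assignment, stuck, promising):
    """Return a new assignment with every stuck process moved to `promising`.

    assignment: dict mapping process id -> configuration id
    stuck:      iterable of process ids flagged as stuck
    promising:  configuration id that just showed a drop in energy
    """
    updated = dict(assignment)
    for pid in stuck:
        updated[pid] = promising
    return updated

def active_configurations(assignment):
    """Configurations that still have at least one process working on them."""
    return set(assignment.values())

# Made-up example: processes 1 and 2 are stuck on "B"; "C" just improved.
before = {0: "A", 1: "B", 2: "B", 3: "C"}
after = reassign(before, stuck=[1, 2], promising="C")
print(after)                                    # {0: 'A', 1: 'C', 2: 'C', 3: 'C'}
print(sorted(active_configurations(after)))     # ['A', 'C']
```

In this toy picture, configuration "B" drops out of the active set once no process is assigned to it, which corresponds to pruning that branch.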

Page 26

Plans for the Future

Use of the monitoring and steering features to develop and test a new method for protein structure prediction

Compete in CASP6 (Critical Assessment of Techniques for Protein Structure Prediction)

Expand and enhance ProteinShop

Page 27

ProteinShop

O. Kreylos, N. Max, B. Hamann, S. Crivelli, and W. Bethel. Interactive Protein Manipulation. Winner of the Best Application Award, IEEE Visualization 2003, Seattle.

Available to academic and non-profit organizations

proteinshop.lbl.gov