Post on 30-Dec-2015


ProteinShop: A Tool for Protein Structure Prediction and Modeling

Silvia Crivelli

Computational Research Division Lawrence Berkeley National Laboratory

The Protein Structure Prediction Problem

To determine how proteins, the building blocks of living cells, fold themselves into three-dimensional shapes that define the role they play in life.

Importance of Protein Structure Prediction

• The shape of a protein determines its function.
• Knowledge of structure is used in many ways:
  – Drug design
  – Design of synthetic proteins
  – Re-engineering defective proteins
• Genome projects are providing sequences for many proteins whose structures will need to be determined.

Protein Structures

Proteins consist of a long chain of amino acids, the primary structure.

[Figure: a short peptide chain (Pro, Gly, Leu, Ser) with the backbone, a side chain, one amino acid, and a hydrogen bond (H-bond) labeled]


The constituent amino acids may form hydrogen bonds that produce regular local structures, called secondary structures.

The secondary structures fold together to form a compact three-dimensional shape, called the tertiary structure.

[Figure: examples of secondary structure, an α-helix and a β-sheet]

The problem can be formulated as a global minimization problem, because the tertiary structure is assumed to occur at the global minimum of the free energy function for the given primary sequence.

Ab Initio Approach

Our Goal: To provide an approach that relies more on physical principles than on information from known proteins

Ab Initio Method

Tertiary structure is believed to minimize potential energy:

$\min_{x} V_{MM}(x)$, where $x$ = atom coordinates
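To make the formulation concrete, the sketch below treats $x$ as one flat vector of Cartesian coordinates and evaluates a toy energy on it. This is a deliberately simplified assumption: a real $V_{MM}$ is a full molecular mechanics force field, while here only a Lennard-Jones pair term is used, and the function name `toy_energy` is hypothetical.

```python
import math
from typing import Sequence


def toy_energy(x: Sequence[float], epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Toy stand-in for V_MM(x): a sum of Lennard-Jones pair energies.

    x is a flat vector [x0, y0, z0, x1, y1, z1, ...], so N atoms give 3N
    variables. A real molecular mechanics energy also has bond, angle,
    torsion and electrostatic terms; they are omitted here.
    """
    if len(x) % 3 != 0:
        raise ValueError("coordinate vector length must be a multiple of 3")
    atoms = [tuple(x[i:i + 3]) for i in range(0, len(x), 3)]
    energy = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            r = math.dist(atoms[i], atoms[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


# Two atoms placed at the pair-energy minimum, r = 2**(1/6) * sigma
r_min = 2 ** (1 / 6)
print(toy_energy([0.0, 0.0, 0.0, r_min, 0.0, 0.0]))  # close to -1.0
```

Minimizing even this toy function over many atoms already has many local minima, which is the difficulty the next slide lists.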

Difficulties:
• The proposed energy function may not match nature
• $O(e^{n^2})$ local minima
• Very large parameter space

e.g., a modestly sized protein of 100 amino acids has ~1,600 atoms and therefore ~4,800 variables (three coordinates per atom).
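The variable count is simple arithmetic: three Cartesian coordinates per atom. A minimal sketch, assuming the average of 16 atoms per residue implied by the slide's own figures (1,600 / 100); `estimate_variables` is a hypothetical helper.

```python
# Average atoms per residue implied by the slide's own numbers (1,600 / 100).
# Real proteins vary with amino acid composition, so this is only an estimate.
ATOMS_PER_RESIDUE = 1600 / 100
COORDS_PER_ATOM = 3  # x, y, z


def estimate_variables(num_residues: int, atoms_per_residue: float = ATOMS_PER_RESIDUE) -> int:
    """Estimate the length of the coordinate vector x for a protein."""
    num_atoms = round(num_residues * atoms_per_residue)
    return COORDS_PER_ATOM * num_atoms


print(estimate_variables(100))  # 4800
```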

The Search Algorithm

Given the amino acid sequence of a protein, find the global minimum of the free energy function.

Phase 1: Generate Starting Configurations → Phase 2: Global Optimization
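The two-phase structure can be pictured as a multistart loop: Phase 1 supplies starting configurations, Phase 2 optimizes from each one and keeps the lowest energy found. This is a generic multistart sketch on a one-dimensional toy function, not the group's actual global optimization method; `energy`, `generate_starts`, `local_minimize`, and `two_phase_search` are hypothetical names.

```python
import math
import random


def energy(x: float) -> float:
    """Toy 1-D 'energy' with many local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


def generate_starts(k: int, seed: int = 0) -> list[float]:
    """Phase 1 stand-in: produce k starting configurations."""
    rng = random.Random(seed)
    return [rng.uniform(-10.0, 10.0) for _ in range(k)]


def local_minimize(x: float, step: float = 0.01, iters: int = 2000) -> float:
    """Plain gradient descent using a central-difference derivative."""
    h = 1e-6
    for _ in range(iters):
        grad = (energy(x + h) - energy(x - h)) / (2 * h)
        x -= step * grad
    return x


def two_phase_search(k: int = 20) -> tuple[float, float]:
    starts = generate_starts(k)                      # Phase 1
    minima = [local_minimize(x0) for x0 in starts]   # Phase 2
    best = min(minima, key=energy)
    return best, energy(best)


best_x, best_e = two_phase_search()
print(f"best x = {best_x:.3f}, energy = {best_e:.3f}")
```

Each local search can only find the minimum of the basin it starts in, which is why the quality and variety of the Phase 1 starting configurations matter.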

Secondary Structure Predictions in Phase 1

Sequence: SKIGIDGFGRIGRLVLRAALSCGAQ
Type:     CBBBBBCCCAAAAAAACCCBBBBBC
Weight:   1135522356789992888566733

Prediction servers estimate the secondary structure likely to be present in a target protein, based on a large database of known proteins.
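The prediction pairs each residue with a type code and a weight. Assuming A marks helix, B marks strand, C marks coil, and the weight is a 0 to 9 confidence (the slide does not spell out the codes), a short sketch can pull out the contiguous helix and strand segments that Phase 1 has to assemble. The helper `segments` is hypothetical.

```python
from itertools import groupby

SEQUENCE = "SKIGIDGFGRIGRLVLRAALSCGAQ"
TYPES = "CBBBBBCCCAAAAAAACCCBBBBBC"    # assumed: A = helix, B = strand, C = coil
WEIGHTS = "1135522356789992888566733"  # assumed: per-residue confidence, 0-9


def segments(types: str, weights: str) -> list[tuple[str, int, int, float]]:
    """Return (type, start, end, mean weight) for each non-coil run (0-based, inclusive)."""
    result = []
    pos = 0
    for code, run in groupby(types):
        length = len(list(run))
        start, end = pos, pos + length - 1
        if code != "C":
            mean_w = sum(int(w) for w in weights[start:end + 1]) / length
            result.append((code, start, end, mean_w))
        pos += length
    return result


for code, start, end, w in segments(TYPES, WEIGHTS):
    print(code, SEQUENCE[start:end + 1], start, end, round(w, 1))
```

For this example the sketch finds two strands and one helix; with more strands, deciding how they pair is the combinatorial problem described next.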

Matching the predicted strands is a combinatorial problem

• Which strands are paired?
• Which orientation? (parallel or anti-parallel)
• Which residues are paired? (odd or even)

There are $n!\,2^{n-2}$ possible n-stranded motifs: 96 motifs for n = 4 and 960 motifs for n = 5.

It takes weeks to create some of these configurations using constrained local minimizations!

Ruczinski, Kooperberg, Bonneau, and Baker, "Distribution of Beta Sheets in Proteins with Applications to Structure Prediction," Proteins 48, 2002.
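The slide's count formula can be evaluated directly to see how quickly the number of candidate strand motifs grows. The sketch simply implements $n!\,2^{n-2}$ as stated on the slide; `num_strand_motifs` is a hypothetical name.

```python
from math import factorial


def num_strand_motifs(n: int) -> int:
    """Number of possible n-stranded beta-sheet motifs, n! * 2**(n - 2), per the slide."""
    if n < 2:
        raise ValueError("need at least two strands")
    return factorial(n) * 2 ** (n - 2)


for n in range(2, 9):
    print(n, num_strand_motifs(n))  # n = 4 gives 96, n = 5 gives 960
```

The factorial term dominates, so building every motif by hand or by constrained minimization quickly becomes impractical as strands are added.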

CASP4 Competition

• Fourth community-wide experiment on the Critical Assessment of Techniques for Protein Structure Prediction (2000)

• Our group predicted 8 proteins

• Largest protein had 240 aa

• Most complex fold had 2 β-strands

ProteinShop

• Interactive tool for protein manipulation

• Designed to quickly create initial configurations

• It takes weeks to create a number of configurations using constrained minimizations

• It takes a few hours to create the same configurations with ProteinShop

Phase 1 with ProteinShop

[Figure: Phase 1 pipeline. Amino acid sequence → secondary structure prediction (structure sequence) → geometry generation → pre-configuration → direct manipulation → initial configurations, which Phase 2 refines into the final configuration]

ProteinShop takes minutes.
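One simple way to picture the geometry-generation step is to give every residue the backbone dihedral angles (φ, ψ) typical of its predicted secondary structure. The sketch below does this with textbook ideal values, using the example prediction from the Phase 1 slide; it is an illustrative assumption, not ProteinShop's code, and the fully extended angles chosen for coil residues are an arbitrary placeholder. `initial_dihedrals` is a hypothetical name.

```python
# Ideal backbone dihedrals in degrees (textbook values). Coil residues get a
# fully extended placeholder, which is an arbitrary choice for this sketch.
IDEAL_DIHEDRALS = {
    "A": (-57.0, -47.0),   # right-handed alpha-helix
    "B": (-139.0, 135.0),  # antiparallel beta-strand
    "C": (180.0, 180.0),   # coil: fully extended placeholder
}


def initial_dihedrals(types: str) -> list[tuple[float, float]]:
    """Assign a (phi, psi) pair to every residue from its secondary-structure code."""
    return [IDEAL_DIHEDRALS[code] for code in types]


angles = initial_dihedrals("CBBBBBCCCAAAAAAACCCBBBBBC")
print(angles[:3])  # [(180.0, 180.0), (-139.0, 135.0), (-139.0, 135.0)]
```

Turning these angles into 3-D coordinates additionally requires ideal bond lengths and bond angles; the resulting pre-configuration is what a user would then adjust by direct manipulation.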

CASP4 Competition (before ProteinShop)

• Our group predicted 8 proteins

• Largest protein had 240 aa

• Most complex fold had 2 β-strands

CASP5 Competition (with ProteinShop)

• Our group predicted 20 proteins

• Largest protein had 417 aa

• Most complex fold had 13 β-strands

Phase 2

[Figure: Phase 1 turns the amino acid sequence into initial configurations; Phase 2 (global optimization) proceeds by subspace selection → subspace optimization → candidate selection to reach the final configuration]

Phase 2 takes months to converge using hundreds of processors on Seaborg, the IBM SP supercomputer at NERSC!
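The Phase 2 loop in the figure can be pictured with a small serial sketch: repeatedly pick a random subset of the variables (subspace selection), improve the energy along only those variables (subspace optimization), and keep the best configurations (candidate selection). The toy energy, the random-perturbation optimizer, and every name below are assumptions made for illustration; the production method is a parallel code running on hundreds of processors.

```python
import random


def energy(x: list[float]) -> float:
    """Toy coupled energy (hypothetical stand-in for V_MM)."""
    local = sum((xi - 1.0) ** 2 for xi in x)
    coupling = sum(0.1 * (a - b) ** 2 for a, b in zip(x, x[1:]))
    return local + coupling


def optimize_subspace(x: list[float], dims: list[int], rng: random.Random,
                      trials: int = 50, scale: float = 0.5) -> list[float]:
    """Random-perturbation descent that only moves the coordinates in `dims`."""
    best, best_e = list(x), energy(x)
    for _ in range(trials):
        trial = list(best)
        for d in dims:
            trial[d] += rng.gauss(0.0, scale)
        trial_e = energy(trial)
        if trial_e < best_e:  # accept only improvements
            best, best_e = trial, trial_e
    return best


def phase2(configs: list[list[float]], rounds: int = 30, subspace_size: int = 2,
           keep: int = 3, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    pool = [list(c) for c in configs]
    n = len(pool[0])
    for _ in range(rounds):
        children = []
        for config in pool:
            dims = rng.sample(range(n), subspace_size)              # subspace selection
            children.append(optimize_subspace(config, dims, rng))  # subspace optimization
        pool = sorted(pool + children, key=energy)[:keep]          # candidate selection
    return pool[0]


start_rng = random.Random(1)
starts = [[start_rng.uniform(-5.0, 5.0) for _ in range(6)] for _ in range(4)]
best = phase2(starts)
print(f"best start energy {min(map(energy, starts)):.3f} -> final {energy(best):.3f}")
```

Because parents stay in the pool during candidate selection, the best energy can never get worse from one round to the next; the cost is that each round evaluates the energy many times, which is what makes the real computation so expensive.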

Phase 2 with ProteinShop

[Figure: the same Phase 1 / Phase 2 pipeline, with a monitoring system and a steering system (direct manipulation) attached to the Phase 2 global optimization]

Monitoring and steering will reduce computation time.


Monitoring System

• Monitor the progress of the overall optimization and of each optimization process

• Alert the user to important events during optimization:
  – A sudden drop in internal energy
  – A group of processes getting stuck

• Test new heuristics for expanding nodes of the optimization tree
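The two alert conditions can be expressed as simple tests on each process's energy history. A minimal sketch, assuming each process reports a list of energies; the `ProcessTrace` format and the thresholds `drop`, `window`, and `tol` are hypothetical choices, not values from ProteinShop.

```python
from dataclasses import dataclass


@dataclass
class ProcessTrace:
    """Energy history reported by one optimization process (hypothetical format)."""
    pid: int
    energies: list[float]


def sudden_drop(trace: ProcessTrace, drop: float) -> bool:
    """True if the latest step lowered the energy by at least `drop`."""
    e = trace.energies
    return len(e) >= 2 and e[-2] - e[-1] >= drop


def stuck(trace: ProcessTrace, window: int, tol: float) -> bool:
    """True if the energy improved by less than `tol` over the last `window` steps."""
    e = trace.energies
    return len(e) > window and e[-window - 1] - e[-1] < tol


def alerts(traces: list[ProcessTrace], drop: float = 50.0,
           window: int = 3, tol: float = 0.5) -> list[tuple[str, int]]:
    events = []
    for t in traces:
        if sudden_drop(t, drop):
            events.append(("energy drop", t.pid))
        if stuck(t, window, tol):
            events.append(("stuck", t.pid))
    return events


traces = [
    ProcessTrace(0, [-900.0, -910.0, -915.0, -990.0]),
    ProcessTrace(1, [-800.0, -800.1, -800.2, -800.2]),
]
print(alerts(traces))  # [('energy drop', 0), ('stuck', 1)]
```

Reporting a "group" of stuck processes then amounts to collecting every pid flagged as stuck in the same monitoring pass.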

Steering System

• Change configurations during optimization to account for developments not anticipated during Phase 1

• Manipulate proteins that don’t seem to be realistic or that are stuck in a local minimum

• Allow pruning of the optimization tree

• Assign multiple processes to a configuration that just had a drop in internal energy

• Assign stuck processes to other configurations
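The steering actions above amount to updating which configuration each process works on. A minimal sketch, assuming a plain process-to-configuration map; the function `steer`, its parameters, and the configuration labels are hypothetical.

```python
def steer(assignment: dict[int, str], stuck_pids: set[int], hot_config: str,
          pruned: frozenset[str] = frozenset()) -> dict[int, str]:
    """Return a new process -> configuration map.

    Processes that are stuck, or whose configuration was pruned from the
    optimization tree, are moved to `hot_config`, the configuration that
    just showed a drop in internal energy.
    """
    if hot_config in pruned:
        raise ValueError("cannot move processes onto a pruned configuration")
    return {
        pid: hot_config if (pid in stuck_pids or cfg in pruned) else cfg
        for pid, cfg in assignment.items()
    }


assignment = {0: "config-A", 1: "config-B", 2: "config-B", 3: "config-C"}
print(steer(assignment, stuck_pids={1}, hot_config="config-A",
            pruned=frozenset({"config-C"})))
# {0: 'config-A', 1: 'config-A', 2: 'config-B', 3: 'config-A'}
```

Returning a new map rather than mutating the old one keeps the previous assignment available, which is convenient if the user wants to undo a steering decision.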

Plans for the Future

Use of the monitoring and steering features to develop and test a new method for protein structure prediction

Compete in CASP6 (Critical Assessment of Techniques for Protein Structure Prediction)

Expand and enhance ProteinShop

O. Kreylos, N. Max, B. Hamann, S. Crivelli, and W. Bethel. Interactive Protein Manipulation. IEEE Visualization 2003, Seattle. Winner of the Best Application Award.

ProteinShop

Available to academic and non-profit organizations

proteinshop.lbl.gov
