Post on 18-Jan-2016

Protein Structure Prediction

Protein Structure

Amino-acid chains can fold to form 3-dimensional structures

Proteins are sequences that have a (more or less) stable 3-dimensional configuration

Why is Structure Important?

The structure a protein takes is crucial for its function
• Forms “pockets” that can recognize an enzyme substrate
• Situates side chains of specific groups so that they co-locate and form areas with desired chemical/electrical properties
• Creates firm structures such as collagen, keratins, fibroins

Determining Structure

X-ray and NMR methods make it possible to determine the structure of proteins and protein complexes

These methods are expensive and difficult: processing one protein can take several months of work

A centralized database (PDB) contains all solved protein structures
• XYZ coordinates of atoms within a specified precision
• ~19,000 solved structures

Growth of the Protein Data Bank

Structure is Sequence Dependent

Experiments show that for many proteins, the 3-dimensional structure is a function of the sequence

Force the protein to lose its structure by introducing agents that change the environment

After the sequences are put back in water, the original conformation/activity is restored

However, for complex proteins, there are cellular processes that “help” in folding

Amino Acids

What Forces Hold the Structure?

Structure is supported by several types of chemical bonds/forces

• Hydrogen bonds
• Charge-charge interactions: positively charged groups prefer to be situated against negatively charged groups
• Disulfide bonds: S-S bonds between cysteine residues; these form during folding
• Hydrophobic effect

Levels of structure

Secondary Structure

α-helix, β-strands

Hydrogen Bonds in α-Helices

β-Strands form Sheets

Parallel, anti-parallel

These sheets are held together by hydrogen bonds across strands

Angular Coordinates

Secondary structures force specific angles between residues

Ramachandran Plot

We can relate angles to types of structures
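As a rough illustration of relating angles to structure types, the sketch below labels a residue from its (phi, psi) pair. The rectangular region boundaries are hand-picked approximations assumed for this example, not values from the slides, and `classify_phi_psi` is a hypothetical helper name.

```python
def classify_phi_psi(phi: float, psi: float) -> str:
    """Label a residue by the rough Ramachandran region of (phi, psi), in degrees."""
    # Assumed, approximate rectangle around the right-handed alpha-helix region.
    if -160.0 <= phi <= -20.0 and -120.0 <= psi <= 50.0:
        return "alpha"
    # Assumed, approximate beta region; psi wraps around at +/-180 degrees.
    if -180.0 <= phi <= -45.0 and (psi >= 90.0 or psi <= -150.0):
        return "beta"
    return "other"


# Three illustrative angle pairs: helix-like, strand-like, and neither.
angles = [(-60.0, -45.0), (-120.0, 130.0), (60.0, 40.0)]
labels = [classify_phi_psi(phi, psi) for phi, psi in angles]
print(labels)
```

Real region boundaries are fuzzier than rectangles, which is one reason angle-based labels alone are not definitive.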

Labeling Secondary Structure

Using both hydrogen bond patterns and angles, we can assign secondary structure labels from the XYZ coordinates of the amino acids

These do not lead to an absolute definition of secondary structure

Prediction of Secondary Structure

Input: amino-acid sequence

Output: Annotation sequence of three classes: alpha, beta, other (sometimes called coil/turn)

Measure of success: Percentage of residues that were correctly labeled
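The success measure above can be sketched in a few lines. The one-letter labels H (alpha), E (beta) and C (other) are a common convention assumed here, and the two label strings are made-up examples.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-class label was predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("label sequences must have the same length")
    if not observed:
        raise ValueError("label sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up labels: H = alpha, E = beta, C = other.
observed = "HHHHEEEECC"
predicted = "HHHCEEECCC"
score = q3_accuracy(predicted, observed)
print(score)  # 8 of 10 residues agree
```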

Protein Folds: sequential, spatial and topological arrangement of secondary structures

The Globin fold

Approaches for structure prediction

• Homology modeling (25-30% identity as a predictor)
• Fold recognition: remote homology
• Ab initio prediction: heavy computations
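To make the identity criterion for homology modeling concrete, here is a minimal sketch that computes percent identity from a pairwise alignment. Counting only columns where neither sequence has a gap is one of several conventions and is an assumption here; the two aligned strings are invented for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Invented alignment: 8 gap-free columns, 6 of them identical.
query = "MAHF-PGFGQ"
template = "MSHFAPG-GE"
identity = percent_identity(query, template)
print(identity)
```

An identity well above the 25-30% range would suggest homology modeling; below it, fold recognition or ab initio methods are the usual fallbacks.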

Newly Determined Structures-Fraction of New Folds

Fraction of new folds (PDB new entries in 1998)

Koppensteiner et al., 2000, JMB 296:1139-1152.

A Finite Number of Protein Folds

Aim: recognize the fold that “matches” a given sequence

Approaches:
• PSI-Blast, Profile HMMs, etc.
• Threading

Threading: Essential components

• structural template
• neighbor definition
• energy function

Pairwise energy matrix E_ab (excerpt):

E_ab    A    C    D    E   …
A      -3   -1    0    0   …
C      -1   -4    1    2   …
D       0    1    5    6   …
E       0    2    6    7   …
…       …    …    …    …

Example (figure): the sequence ACCECADAAC placed on a 10-position structural template gives -3 - 1 - 4 - 4 - 1 - 4 - 3 - 3 = -23

$E = \sum_{i,j \in \text{positions}} E_{a_i b_j}$, where a_i and b_j are the residue types at positions i and j
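The threading score can be sketched as a sum of pairwise matrix entries over contacting template positions. The matrix values are the ones shown in the slide; the contact list is hypothetical, since the figure's neighbor definition is not recoverable. It was chosen only so that the pair types match the eight terms of the example sum.

```python
# Symmetric pairwise energies E_ab for the four residue types on the slide.
E_AB = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}


def pair_energy(a: str, b: str) -> int:
    """Look up E_ab regardless of the order of the two residue types."""
    return E_AB[(a, b)] if (a, b) in E_AB else E_AB[(b, a)]


def threading_energy(sequence: str, contacts: list[tuple[int, int]]) -> int:
    """Sum E_ab over template contacts, given as 1-based position pairs."""
    return sum(pair_energy(sequence[i - 1], sequence[j - 1]) for i, j in contacts)


sequence = "ACCECADAAC"
# Hypothetical neighbor definition (assumption, not taken from the figure).
contacts = [(1, 6), (1, 2), (2, 5), (3, 10), (9, 10), (3, 5), (6, 8), (8, 9)]
energy = threading_energy(sequence, contacts)
print(energy)
```

Swapping in a different template only changes `contacts`; the sequence and the energy matrix stay the same, which is what makes it cheap to score one sequence against many folds.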

Find best fold for a protein sequence: fold recognition (threading)

Query sequence: MAHFPGFGQSLLFGYPVYVFGD...

Potential folds:  1) ...   56) ...   n) ...
Scores:           -10 ...  -123 ...  20.5

GenTHREADER (Jones, 1999, JMB 287:797-815)

Essentials of GenTHREADER

For each template, provide an MSA
• align the query sequence with the MSA
• assess the alignment by sequence alignment score
• assess the alignment by pairwise potentials
• assess the alignment by solvation function
• record lengths of: alignment, query, template

Ab-initio Structure Recognition

Goal: Predict structure from “first principles”

Benefits:
• Works for novel folds
• Shows that we understand the process

Approaches to Ab-initio Prediction

Molecular Dynamics
• Simulates the forces that govern the protein within water
• Since proteins fold naturally, this would lead to the solved structure

Problems:
• Thousands of atoms
• Huge number of time steps to reach the folded protein
• Intractable problem

Minimal Energy

Assumption: the folded form is the minimal-energy conformation of the protein

Decomposition:
• Define an energy function
• Search for the 3-D conformation that minimizes the energy

Energy Function

Account for the forces that apply to the molecule
• Van der Waals forces
• Covalent bonds
• Hydrogen bonds
• Charges
• Hydrophobic effects

Issues:
• Estimating parameters
• How do we compute it: O((# atoms)^2)
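The quadratic cost comes from visiting every pair of atoms. The sketch below sums a single term, a Lennard-Jones 12-6 potential standing in for van der Waals forces, over all pairs. The parameters `epsilon` and `sigma`, the unit-free coordinates, and the omission of all other force terms are simplifying assumptions.

```python
import math
from itertools import combinations


def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones 12-6 energy of two atoms at distance r (toy parameters)."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def total_pairwise_energy(coords: list[tuple[float, float, float]]) -> float:
    """Sum the pair term over all n*(n-1)/2 atom pairs: O(n^2) work."""
    return sum(lennard_jones(math.dist(p, q)) for p, q in combinations(coords, 2))


# Four toy atoms: one at the origin and one on each axis at distance 1.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
n_pairs = len(atoms) * (len(atoms) - 1) // 2
energy = total_pairwise_energy(atoms)
print(n_pairs, energy)
```

With thousands of atoms the pair count reaches millions per energy evaluation, which motivates the simplified energy functions that follow.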

Simplified Energy Functions

Different levels of granularity
• Residue-residue energy function (bead model)
• Partial model: backbone as a bead, side chain as a rigid body that can move with respect to the backbone
• Many other variants

Search Strategy

High dimensional search problem

How do we represent partial solutions?

• Position of each atom (too detailed!)
• Position of each residue (too coarse!)
• Intermediate solutions (e.g., backbone and side chain)

Representation tradeoffs

X,Y,Z coordinates
• Easy to compute distances between residues
• Might represent infeasible solutions

Angles between successive residues
• Easy to ensure a “legal” protein
• Harder to compute distances
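The tradeoff can be seen in a deliberately simplified 2D toy, which is not real backbone geometry. A chain stored as turning angles has correct bond lengths by construction, but a distance between two beads requires converting the whole chain to coordinates first. The function name and the 2D setting are assumptions made for illustration.

```python
import math


def chain_to_xy(turn_angles_deg: list[float], bond_length: float = 1.0) -> list[tuple[float, float]]:
    """Place the beads of a 2D chain from the turning angle at each interior bead."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    # The first bond lies along the x axis.
    x += bond_length
    points.append((x, y))
    for turn in turn_angles_deg:
        heading += math.radians(turn)
        x += bond_length * math.cos(heading)
        y += bond_length * math.sin(heading)
        points.append((x, y))
    return points


# Two 90-degree left turns trace three sides of a unit square.
points = chain_to_xy([90.0, 90.0])
bond_lengths = [math.dist(p, q) for p, q in zip(points, points[1:])]
end_to_end = math.dist(points[0], points[-1])
print(bond_lengths, end_to_end)
```

Every angle list yields a chain with valid bond lengths, whereas an arbitrary list of coordinates generally would not.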

Typical approach:
• Secondary structure prediction
• Attempts at different conformations, keeping secondary structure fixed
• Finer moves, relaxing secondary structure

Use:
• Greedy search
• Simulated annealing
• …
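A minimal simulated annealing sketch over a list of angles is given below. The toy energy (squared deviation from -60 degrees), the geometric cooling schedule and all parameter values are assumptions chosen for illustration; a real search would plug in one of the energy functions discussed earlier.

```python
import math
import random


def simulated_annealing(energy, initial, steps=5000, t_start=2.0, t_end=0.01,
                        step_size=20.0, seed=0):
    """Minimize energy(angles) with Metropolis moves and geometric cooling."""
    rng = random.Random(seed)
    current = list(initial)
    current_e = energy(current)
    best, best_e = list(current), current_e
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = list(current)
        k = rng.randrange(len(candidate))
        candidate[k] += rng.uniform(-step_size, step_size)
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


def toy_energy(angles):
    """Hypothetical energy: squared deviation of each angle from -60 degrees."""
    return sum(((a + 60.0) / 60.0) ** 2 for a in angles)


start = [180.0, 0.0, 90.0, -150.0]
best_angles, best_energy = simulated_annealing(toy_energy, start)
print(best_energy)
```

Greedy search corresponds to the limit where uphill moves are never accepted; the temperature schedule is what lets annealing escape local minima early on.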

Rosetta Method

Idea:
• “Structural” signatures recur within protein structures
• Use these as cues during structure search

Local structure motifs

• Diverging type-2 turn
• Serine hairpin
• Type-I hairpin
• Frayed helix
• Proline helix C-cap
• Alpha-alpha corner
• Glycine helix N-cap

I-sites Library = a catalog of local sequence-structure correlations

Example: Non-polar Alpha-helix

Example: Non-polar beta-strand

Example: Gly alpha-C-cap Type 1

Construction of I-sites library

Construct profiles (PSI-BLAST like) for each solved structure

Collect all possible segments of fixed length (len = 3, 9, 15)

Perform k-means clustering of segments
• Check each cluster for a “coherent” structure (in terms of dihedral angles)
• Prune incoherent structures
• Iteratively refine remaining clusters by removing structurally different segments, redefining cluster membership, etc.

All proteins can be constructed from fragments

Recent experiment:

For representative proteins, backbones were assembled from a library of 1000 different 5-residue fragments.

Rosetta: a folding simulation program

Fragment insertion Monte Carlo (figure): choose a fragment from the fragment library → change backbone torsion angles → convert to 3D → evaluate with the energy function → accept or reject
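The fragment insertion loop can be sketched as follows. This is a toy under stated assumptions, not Rosetta's implementation: the three-residue fragment library is invented, the "convert to 3D" step is skipped, and the hypothetical `toy_energy` scores torsion angles directly by their distance from helical values.

```python
import math
import random


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def toy_energy(torsions):
    """Hypothetical energy: total angular distance from helical (phi, psi)."""
    return sum((angle_diff(phi, -60.0) + angle_diff(psi, -45.0)) / 180.0
               for phi, psi in torsions)


def fragment_insertion_mc(n_residues, fragments, energy, steps=2000,
                          temperature=0.5, seed=0):
    """Repeatedly splice a random fragment into the chain; Metropolis accept/reject."""
    rng = random.Random(seed)
    torsions = [(180.0, 180.0)] * n_residues  # start from an extended chain
    current_e = energy(torsions)
    for _ in range(steps):
        fragment = rng.choice(fragments)
        start = rng.randrange(n_residues - len(fragment) + 1)
        # Replace the backbone angles of one window with the fragment's angles.
        trial = torsions[:start] + list(fragment) + torsions[start + len(fragment):]
        trial_e = energy(trial)
        delta = trial_e - current_e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            torsions, current_e = trial, trial_e
    return torsions, current_e


# Invented fragment library: helix-like, strand-like and mixed 3-residue pieces.
fragments = [
    [(-60.0, -45.0)] * 3,
    [(-120.0, 130.0)] * 3,
    [(-60.0, -45.0), (-90.0, 0.0), (-120.0, 130.0)],
]
initial_energy = toy_energy([(180.0, 180.0)] * 9)
final_torsions, final_energy = fragment_insertion_mc(9, fragments, toy_energy)
print(initial_energy, final_energy)
```

Because every move copies angles from a library of observed local structures, the search only visits locally plausible conformations, which shrinks the space considerably compared with perturbing individual angles.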

Rosetta’s energy function

Sequence-dependent features: residue-residue contact energies are derived from the database (figure: current structure)

Sequence-independent features: the energy score for a contact between secondary structures is summed using database statistics (figure: vector representation, probabilities from the database)

Rosetta prediction results

61% “topologically correct”

60% “locally correct”

73% secondary structure (Q3) correct

http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php

Evaluation of partially correct predictions

Tertiary structure (figure: RMSD along the sequence for window sizes L=8, L=20 and L=30, with a threshold line at 6.0Å; L = window size)

Tertiary structure %correct is the fraction of the sequence that is in a 30-residue window with RMSD < 6.0Å

Local structure (figure: MDA along the sequence, with a threshold line at 90°)

mda = maximum deviation in backbone angles over an 8 residue window.

Local structure %correct is the fraction of the sequence that has mda < 90°.
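The local-structure measure can be sketched as below. Two details are assumptions, since the slides do not spell them out: a residue's deviation is taken as the larger of its phi and psi errors, and a residue counts as correct when it lies inside at least one 8-residue window whose mda is under the threshold, by analogy with the tertiary definition. The angle lists are invented.

```python
def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def local_percent_correct(predicted, true, window=8, threshold=90.0):
    """Percent of residues covered by some window with mda below the threshold."""
    n = len(true)
    if len(predicted) != n or n < window:
        raise ValueError("need equal-length angle lists at least one window long")
    # Per-residue deviation: the larger of the phi and psi errors (assumption).
    deviation = [max(angle_diff(pp, tp), angle_diff(ps, ts))
                 for (pp, ps), (tp, ts) in zip(predicted, true)]
    covered = [False] * n
    for start in range(n - window + 1):
        mda = max(deviation[start:start + window])
        if mda < threshold:
            for k in range(start, start + window):
                covered[k] = True
    return 100.0 * sum(covered) / n


# Invented example: a 12-residue helix with one badly mispredicted psi angle.
true_angles = [(-60.0, -45.0)] * 12
predicted_angles = list(true_angles)
predicted_angles[9] = (-60.0, 130.0)
percent = local_percent_correct(predicted_angles, true_angles)
print(percent)  # residues 0-8 are covered by a good window, residues 9-11 are not
```

A single large angle error invalidates every window that contains it, so the measure penalizes the neighbors of a bad residue as well as the residue itself.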

T0116 262-322 (61 residues)

prediction true structure

Topologically correct (rmsd=5.9Å) but helix is mis-predicted as loop.

T0121 126-199 (66 residues)

prediction true structure

Topologically correct (rmsd=5.9Å) but loop is mis-predicted as helix.

T0122 57-153 (97 residues)

...contains a 53 residue stretch with max deviation = 96°

prediction true structure

T0112 153-213

Low rmsd (5.6Å) and all angles correct (mda = 84°), but topologically wrong! (This is rare.)

prediction true structure
