h-bonds d-h … a assignment: if d…a < 3.7 Å in crystal structure. (normally 2.7-3.1 Å)....

H-bonds

D-H … A Assignment: if D…A < 3.7 Å in crystal

structure. (normally 2.7-3.1 Å). Energy of stablization: -12~-40 kJ/mol) Tends to be linear. Only weakly stabilize proteins. (!?)

A survey over H-bonds in globular proteins (J. Mol. Biol. (1992) 226, 1143)

Local H-bonds? The authors made

this conclusion: “Most H-bonds are local.” Should be more

critically reviewed. Most H-bonds are

between beckbone atoms.

Source: K. Schulten GroupUniversity of Illinois Urbana-Champaign

Protein folding, dynamics and structural evolution

Chapter 9

Questions

How does a peptide sequence find its native, functional conformation? Is there a set of fundamental principles?

We discussed several factors determining protein structure. Can we *predict* the structure, from sequence information yet?

Determinants of Protein Folding We will discuss the following factors, as listed in V&V

chapter 9. Space Packing. Directed mainly by internal residues Protein structures are hierarchically organized. Protein structures are highly adaptable. Secondary structure can be context dependent. Changing the fold of a protein.

Still, keep in mind that most of them are based from observations and deductions. Exceptions are possible. Whether the statistical methods/criteria are acceptable is

another question.

Compactness

QuickTime™ and aTIFF (LZW) decompressor

are needed to see this picture.

Proteins are like liquids and glasses, instead of crystalline solids.

The reverse statement is, does compactness serve as a factor determining protein structure?





Compactness helps, but not enough.



Folding is directed mainly by internal residues

Mutations that change surface residues are accepted more frequently and are less likely to affect protein conformations than are changes of internal residues.

This is consistent with the idea of Hydrophobic force-driven folding.

Determinants of Protein Folding Space Packing. Directed mainly by internal residues Protein structures are hierarchically

organized. Protein structures are highly adaptable. Secondary structure can be context

dependent. Changing the fold of a protein.

Protein structures are quite “resistant” to mutations

A large number of single residue mutations do not yield a very different structure.

A complete study was done in phage T4 lysozyme by B. W. Matthews.

Homologous proteins comes with some sequence identity and they are often structurally similar.


are needed to see this picture.QuickTime™ and a

TIFF (LZW) decompressorare needed to see this picture.

Voe

t Bio

chem

istr

y 3e

© 2

004

John

Wile

y &

Son

s, In

c.

Table 9-1 Propensities and Classifications of Amino Acid Residues for Helical and Sheet Conformations.

Pag

e 30

0

Secondary structure

Secondary structure prediction can be done with more sophisticated algorithms. Artificial intelligence such as neuronal

networks or support vector machines. Basically look at a local sequence and

recongnize its pattern. Usually such methods need a training set.

I.e. knowledge-based methods.

Voe

t Bio

chem

istr

y 3e

© 2

004

John

Wile

y &

Son

s, In

c.

Figure 9-6 NMR structure of protein GB1.

Pag

e 28

0

• Green: residues 23-33.• Cyan: residues 42-53.• Chm-alpha: a new

sequence replaces green part.

• Chm-beta: the same new sequence replaced cyan part.

• Both are structurally similar to native GB1.

• The same sequence can be either an alpha helix or a beta sheet structure, depending on their context.

Homology

Proteins that share some sequence identity may be structurally similar. One evidence that support evolution. Proteins with as little as 20% sequence

identity may have similar structure. How much should be changed for a

protein to assume a different structure?

Voe

t Bio

chem

istr

y 3e

© 2

004

John

Wile

y &

Son

s, In

c.

Figure 9-7 X-Ray structure of Rop protein, a homodimer of motifs that associate to form a 4-helix bundle

Pag

e 28

1

• GB1 and Rop are structurally different.

• 50% of the residues of GB1 is changed, yielding a new polypeptide that assumes Rop-like structure.

• This new peptide has 41% sequence identity with native Rop.

• The idea of Protein design and engineering.

Protein Folding Levinthal’s paradox

If for each residue there are only two degrees of freedom (,).

Assume each can have only 3 stable values. This leads to 32n possible conformations. If a protein can explore 1013 conformation per

second. (10 per picosecond). Still requires an astronomical amount of time to fold a

protein. This is impossible. So protein must fold in a

way that does not randomly explore each possible conformations.

Molten Globule Much of the secondary structure that is present in

a native proteins forms within a few milliseconds. This is called hydrophobic collapse. Something called “Molten Globule”.

Slightly (5-15% in radius) larger than native conformation.

Significant amount of secondary structure formed. Side chains are still not ordered/packed. Structure fluctuation is much larger. Not very

thermodynamically stable.

Are proteins sticky tapes? Are they simply

hetereopolymers that like to form H-bond, hydrophobic interactions with each other?

Proteins are not any random hetero-polymers

• By observation: – every protein has a very stable native structure,– while polymers are usually random in their

conformation.

• Interesting observation for simple models: the “designability”. In the following materials are from:– R. Helling et al., “The designability of protein structures”, J. Mol.

Graphics and Modelling, 19, 157, (2001).

– J. Miller et al.,“Emergence of highly designable protein-backbone conformations in an off-lattice model” Proteins, 47, 506 (2002).

– Steven S. Plotkin and Jose N. Onuchic, “Understanding protein folding with energy landscape theory Part I : Basic concepts” Quart. Rev. of Biophys., 35, 2 (2002), 111.

A 3D lattice HP model

• Enumerate all 227 possible sequences.

• Each sequence has a lowest energy structure.

• Some sequences share the same structure. Count the number of sequence per unique structure NS.

• Plot the distribution of NS.

• Assuming only two kinds of residule H and P.

• Well-studied before.EHH=−2.3, EHP=−1, EPP=0

• (a) Histogram of NS for the 3 × 3 × 3 system. (b) Average energy gap between the ground state and the first excited state versus NS for the 3 × 3 × 3 system.

NS: Number of seq. corresponding to a structure S

3794 different sequences share ONE structure!

On the average, these structure have large energy gaps between the lowest energy structure and the next lowest one.

Some structures are very “popular” for many different sequences

NS: Number of seq. corresponding to a structure S

3794 different sequences share ONE structure!

On the average, these structure have large energy gaps between the lowest energy structure and the next lowest one.

Such a property does not depend on the model used

• Very similar behavior are seen in 2D 6×6 HP model and in 2D or 3D models with 20 different amino acids.

Off the lattice: 23mer 3 state model

Zinc finger of 1PSV

Off-lattice model: results

• a: Backbone configuration of the 11th most designable 23-mer structure

• b: Backbone configuration of the zinc finger 1NC8, truncated to 23 amino acids.

What does it mean?

Sequence Structure

The energy landscape

Proteins are in a special subset of heteropolymers

• Such that the number of possible structures are greatly reduced.

• Evolution!• Therefore protein structure prediction is not as hard as it

appears. (still a hard problem though..)• That also explains why knowledge-based methods

works.• Nevertheless, the tools developed offers valuable clues

for the structure of a new protein.

Computer Simulation

• Goals:1) Structure prediction: From primary sequences to tertiary

structures (so that we can infer its function)

2) Known structure (from X-Ray or NMR or another simulation). Want: dynamics (how it moves at room temperature, with a ligand, or with a mutation).

• (1) above is difficult but do-able. We will discuss about some of the methods.

• (2) is often done with the same methods developed for (1).

Computers

• Deals with numbers and logical operations.

• Needs some “principles” (written in mathematical equations).

• For protein simulations there are different approaches:

1. Physics-based2. Knowledge-based

Physics

• Laws for particles moving and interaction– Classical Mechanics (Newton’s Equation of

motion)

– Quantum Mechanics (Schrödinger’s Equation)

• Many developments in physical chemistry can be used.

amFrr

=

)()(ˆ :tIndependen-Time

),(ˆ),( :Dependent-Time

rr

rr

EH

tHtt

i

=

=∂∂

h

Physics-Based protein simulation

• All quantum mechanics (QM) calculation is not feasible.

• QM can be applied to a small set of atoms.– Modeling of an active site (other atoms: not

treated or treated as dielectric continuum)– Can get total energies (binding vs. non-

binding, pKa etc.), wave function (charge distribution).

– QM/MM simulations (other atoms: treated with Molecular Mechanics)

An example of using QM (Case et al., J. Biol. Inorg. Chem. 2002, 7, 632)

• Rieske iron-sulfur protein in bc-type cytochromes

• Calculations based on density functional theory (DFT) performed.

• pKa and redox potentials can be obtained from total energies of several states.

• Change of pKa (proton-binding) and redox potential (electron-binding) are strongly coupled, as observed in experiments.

Using classical mechanics for protein structure and dynamics

• Ignore electrons, assigning (empirical) force fields for atoms (or clusters of atoms).

• A very simple potential:

Force fields: bond stretching and bending

A. R. Leach, “Molecular Modelling”, 1996

Torsional potential

A. R. Leach, “Molecular Modelling”, 1996

Ab initio QM results

3 point charges

N2 molecules:Known to have anElectric quadrupole moment

5 point charges

Polarization: many-body effect

Physics based: methods• Energy Minimization

– Steepest descent– Conjugated gradient

• Monte Carlo Simulation– Random sampling– Stimulated annealing

• Molecular Dynamics– Compute conformational

changes.

Energy surface of two torsional angles

• Very shallow valleys.• Similar in energy.• Determining the (ψ,φ)

conformations of peptide backbones is even more complicated.

Trapping at a local minimum

• Standard practice: use Monte Carlo (random sampling) with stimulated annealing techniques.

Using classical mechanics for protein structure and dynamics

• With a Force field ( V(rN) ), for lowest energy structure• Find the structure that gives energy minimum. Hopefully

this is done within finite amount of computer resources. And hopefully this energy minimum gives the desired native protein structure.

• For protein dynamics: calculate trajectories (Newton’s eq.) at thermal condition and find the averaged physical quantities.

Questions to ask:

• Is the energy function correct? – Precise enough to discriminate other non-

native structure.– Yet simple enough for computers to carry out

efficiently.

• Is the conformational search good enough to cover the global minimum?

Take-home messages: Physics-based methods

• Protein folding without any prior knowledge about protein structure is a difficult task.

• Protein structure prediction is often quoted as an “N-P complete problem”, i.e. the complexity of the problem grows exponentially as the number of residues increases.

• Structures of small proteins (~101 - 102 a.a.) can be solved in principle.

h-bonds d-h … a assignment: if d…a < 3.7 Å in crystal structure. (normally 2.7-3.1 Å)....

Documents

protein conformations

nmr structure of protein

different structure

crystal structure

local hbonds

context dependent

internal residuesmutations

h aassignment