Proteins: 1D ↔ 3D. Protein Grammar: Strict Regularities in Structure-Sequence Relationship


Posted on 19-Dec-2015


Page 1

Proteins

1D ↔ 3D

Protein Grammar:

Strict Regularities in Structure-Sequence Relationship

Page 2

The main rule of the protein sequence-structure relationship:

The amino acid sequence alone is sufficient to determine a protein's structure.

Christian Anfinsen, 1961

The Nobel Prize in Chemistry 1972

Page 3

From the Anfinsen rule it follows that:

1) Protein folding is a physical problem rather than a biological one (?)

Blue Gene is an IBM Research project dedicated to exploring the frontiers in supercomputing: in computer architecture, in the software required to program and control massively parallel systems, and in the use of computation to advance our understanding of important biological processes such as protein folding.

The Blue Gene/L machine has a peak speed of 596 Teraflops

ab initio approach

Page 4

From the Anfinsen rule it follows that:

2) Similar sequences are expected to encode similar structures.

Thus a structure can be determined by analogy with known protein structures of similar sequences.

The idea that sequence similarity translates into structural similarity underlies most modern high-accuracy structure-prediction algorithms.

homology approach

Page 5

Amino acid sequences ↔ 3D protein structures: what is the relationship?

Similar sequences ↔ similar structures?

We need to define what "similar" means: similar structures? Similar sequences?

Page 6

Fundamental Units of Protein Structure

A. Alpha-helix. B. Beta-sheets.

Hydrogen bonds between backbone oxygen and hydrogen atoms form the helical alpha form and the beta form.

W. Astbury (1930s), L. Pauling (1939-1951)

Similarity of structures?

The first main rule of protein structures

Page 7

Beta Sandwich-like Proteins

Two main beta-sheets packed against each other.

Page 8

(Figure: strands 1, 2, 3, …, 8, 9 drawn as arrows along the chain from the N-terminus.)

Beta sandwich-like proteins

The goal: to find out whether there are any rules in the packing of strands in sandwich-like structures.

Folding pattern

Page 9

Part I. Folding pattern of SSS

Stage 1: Collection

All sandwich proteins were collected from the SCOP and CATH databases.

We analyzed ~8,000 structures from 81 superfamilies and 177 families.

Page 10

Stage 2: Structure description

Secondary structure motif

For over 40 years, researchers have studied how secondary structural elements (strands and helices) assemble into a structure.

Description of structures in strands: 5-4-8-9-2 (one sheet) and 6-7-3-1 (the other sheet).

Page 11

Supersecondary structure elements: the strandon

(Figure: strands 1, 2, 3, …, 8, 9 drawn as arrows along the chain from the N-terminus.)

We introduced a new supersecondary structure unit, the "strandon": a maximal set of consecutive strands (i, i+1, …) that are connected in sequential order by hydrogen bonds in the 3D structure.
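The strandon definition can be made concrete with a short sketch. It assumes a structure is given as two sheets, each an ordered list of strand numbers, and that strands adjacent in that order are hydrogen-bonded; real hydrogen-bond assignment from coordinates is not modeled. The function name `strandons` is hypothetical. Applied to the sheets 5-4-8-9-2 and 6-7-3-1 from the structure description, it yields six strandons: {1}, {2}, {3}, {4, 5}, {6, 7}, {8, 9}.

```python
def strandons(sheets):
    """Partition strands into strandons (hypothetical helper).

    Assumption: strands that are neighbours in a sheet's ordered list are
    hydrogen-bonded. A strandon is a maximal run i, i+1, ... in which each
    strand is hydrogen-bonded to the next one in sequence.
    """
    bonded = set()
    for sheet in sheets:
        for a, b in zip(sheet, sheet[1:]):
            bonded.add(frozenset((a, b)))

    groups = []
    for s in sorted(s for sheet in sheets for s in sheet):
        # Extend the current strandon if s directly follows its last strand
        # in sequence and the two are hydrogen-bonded; else start a new one.
        if groups and groups[-1][-1] == s - 1 and frozenset((s - 1, s)) in bonded:
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups


print(strandons([[5, 4, 8, 9, 2], [6, 7, 3, 1]]))
```

Scanning strands in sequence order keeps each strandon sorted, so the first strand of each group also fixes the I, II, III, … numbering used on the next slides.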

Page 12

Description of structures in strandons: IV VI II (one sheet) and V III I (the other sheet).

Now we propose to investigate how supersecondary structure elements (strandons) assemble into a structure.
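A sketch of how a strand description can be rewritten in strandon notation. It assumes strandons are numbered I, II, III, … in the order of their first strand, and takes the partition {1}, {2}, {3}, {4, 5}, {6, 7}, {8, 9} as input, written out by hand. The helpers `to_roman` and `strandon_notation` are hypothetical names. For the sheets 5-4-8-9-2 and 6-7-3-1, the output reproduces IV VI II and V III I.

```python
def to_roman(n):
    """Roman numeral for 1 <= n <= 39 (enough for strandon counts)."""
    out = ""
    for value, symbol in ((10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")):
        while n >= value:
            out += symbol
            n -= value
    return out


def strandon_notation(sheets, strandons):
    """Rewrite each sheet (ordered strand numbers) as strandon labels.

    Strandons are numbered in the order they are listed; consecutive strands
    belonging to the same strandon collapse into a single label.
    """
    index = {s: i + 1 for i, group in enumerate(strandons) for s in group}
    result = []
    for sheet in sheets:
        labels = []
        for s in sheet:
            label = to_roman(index[s])
            if not labels or labels[-1] != label:
                labels.append(label)
        result.append(labels)
    return result


SHEETS = [[5, 4, 8, 9, 2], [6, 7, 3, 1]]
STRANDONS = [[1], [2], [3], [4, 5], [6, 7], [8, 9]]  # written out by hand
print(strandon_notation(SHEETS, STRANDONS))
```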

Page 13

MOTIF (strands grouped into strandons): (9 1 2 3) (6) over (8 7) (4 5)

SUPERMOTIF (strandons): I III over IV II

Stage 3: SSS classification

All proteins were described in strandon notation.

Page 14

Hierarchy: supermotif → motif → protein structure.

This is the basis of a novel hierarchical classification in the Supersecondary Structures (SSS) database.

Page 15

(Figure: the supermotifs drawn in strandon notation.)

Six supermotifs describe ~90% of all sandwich structures.

Page 16

Stage 4: SSS regularities

Analysis of all supermotifs in the SSS database led us to the discovery of the rule of supermotifs.

Page 17

The Rule of Supermotifs: rules for the arrangement of strandons in the two main beta-sheets.

(Figure: three example supermotifs, labeled K=1, N=4; K=4, N=4; and K=3, N=6.)

95% of all structures obey the rule of supermotifs.

Page 18

The rule of supermotifs dramatically restricts the number of permissible arrangements of strandons.

Analysis of the observed arrangements of strands within strandons led us to formulate the rule of motifs.

Page 19

Rule of Motifs

For two neighboring strandons in a sheet, or at the edges of the same side of the two beta-sheets, the strand numbers in these two strandons increase in opposite directions.
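A minimal check of the Rule of Motifs within a single sheet, under stated simplifications: "neighboring" is read as adjacent along the sheet's strand order, single-strand strandons are treated as having no direction and are skipped, and the case of strandons at the edges of the two sheets is not modeled. The names `strandon_directions` and `obeys_rule_of_motifs` are hypothetical.

```python
def strandon_directions(sheet, strandons):
    """Direction of each strandon along a sheet, read left to right.

    +1: strand numbers increase; -1: they decrease; 0: single-strand strandon.
    """
    index = {s: i for i, group in enumerate(strandons) for s in group}
    segments = []  # [strandon index, first strand seen, last strand seen]
    for s in sheet:
        if segments and segments[-1][0] == index[s]:
            segments[-1][2] = s
        else:
            segments.append([index[s], s, s])
    return [(last > first) - (last < first) for _, first, last in segments]


def obeys_rule_of_motifs(sheet, strandons):
    """True if every pair of adjacent, directed strandons runs in opposite directions."""
    dirs = strandon_directions(sheet, strandons)
    return all(a != b for a, b in zip(dirs, dirs[1:]) if a and b)


STRANDONS = [[1], [2], [3], [4, 5], [6, 7], [8, 9]]
# Strandon IV runs 5->4 and its neighbour VI runs 8->9: opposite directions.
print(obeys_rule_of_motifs([5, 4, 8, 9, 2], STRANDONS))
```

A hypothetical sheet 4-5-8-9, in which both strandons run in the same direction, would fail the check.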

Page 20

The Rule of Motifs (ordering of strands within strandons) held true for all strandons in 82% of the analyzed protein domains.

In 12% of the structures, the ordering of strands obeys the Rule of Motifs in all strandons but one.

Together: (82 + 12)% = 94% of domains.

Page 21

Question: How do strands come together in the structures of sandwich proteins?

Answer: The structures of beta-sandwich proteins are governed by well-defined rules:

the Rule of Motifs, and

the Rule of Supermotifs.

These rules describe the folding patterns.

End of Part I: Folding patterns

Page 22

Definition of protein structure similarity:

Proteins with the same secondary structure motif and the same orientation of strands in the two beta-sheets have similar protein structures.

(Figure: two structures with the same motif, strands 1 2 4 in one sheet and 8 7 5 3 in the other, labeled sheet A and sheet B.)

Page 23

Part II - Sequence patterns

Amino acid sequences ↔ 3D protein structures: what is the relationship? (Diagram: the structures are similar; the sequences are non-similar.)

The main problem: how can structural information be extracted from the sequence, and how can the tertiary structure be reconstructed?

Idea:

1) Collect all proteins with similar structures.

2) Find proteins with non-similar sequences (from different protein families).

3) Extract common sequence regularities, if they exist.

Page 25

SSS database motif: sheet I: 1 2 5 4; sheet II: 7 6 3.

This motif describes proteins from 3 families.

Sequences from different families are strongly dissimilar: sequence alignment reveals only 1-4% identical residues.

Alignment of 2 sequences (#1: 1f42, #2: 1oke):

EMBOSS Needle program: Identity 1.9%

BLAST program: No significant similarity found
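For reference, a percent-identity figure like the one above can be computed from an already aligned pair as below. We assume the denominator is the full alignment length including gap columns (our understanding of how Needle's report counts it); the sketch only scores a given alignment and does not reproduce Needle's global alignment. `percent_identity` is a hypothetical name and the sequences are toy examples, not 1f42 or 1oke.

```python
def percent_identity(aligned_a, aligned_b):
    """Identical non-gap columns divided by all alignment columns, in percent.

    Assumption: gaps are written as '-' and count toward the alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    identical = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)


print(percent_identity("AAAA", "AAGG"))  # 2 of 4 columns identical
```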

Page 26

1) Collect all proteins with identical SSS.

2) Find proteins with non-similar sequences (from different protein families).

3) Extract common sequence regularities, if they exist (?)

Hypothesis

Proteins with similar structures share a unique set of residues, "structure-determining residues", even though they may belong to different protein families and have very low sequence similarity.

Page 27

The problem:

Widely used alignment algorithms (PSI-BLAST, Needle) are not applicable to sequences with very low sequence similarity.

Therefore, to compare the sequences of proteins that share the same SSS, we developed a new structure-based multiple-sequence alignment algorithm.


Page 28

New SSS-based multiple-sequence alignment algorithm

The main feature of this algorithm: the units of alignment are individual strands and loops, rather than whole sequences.

Strands: alignment of strands in a beta-sheet is based on hydrogen-bond contacts. No gaps are allowed within strands.

Loops: each loop is aligned locally and separately, with gaps allowed.
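The loop step (local alignment, gaps allowed) can be illustrated with the textbook Smith-Waterman algorithm. This is a generic stand-in, not the authors' implementation: the scoring (match +2, mismatch -1, linear gap -2) and the name `smith_waterman` are assumptions made for the example, and the loop sequences are toy strings.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of two loop sequences with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); gaps are written as '-'.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("AGSHGT", "AGSGT"))
```

Here the extra H in the first loop is absorbed by a single gap, giving (8, "AGSHGT", "AGS-GT"). Strands, by contrast, would be placed without gaps.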

Page 29

(Fig. 2: strands 1, 2 and 3 in structure A (residues a1…a11) and structure B (residues b1…b10).)

Page 30

Select the best variant: the one with the maximum number of conserved positions (?)
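One possible reading of "select the best variant" for strands, where no gaps are allowed: slide one strand along another, count paired positions whose residues fall into the same hydrophobic/hydrophilic group (the grouping given on the following slide), and keep the placement with the most such positions. This is a hypothetical pairwise simplification, not the authors' multi-sequence procedure; `same_group`, `best_shift` and `max_shift` are illustrative names.

```python
# Residues treated as hydrophobic; every other residue counts as hydrophilic.
HYDROPHOBIC = set("WIAFLCVM")


def same_group(x, y):
    """True if both residues are hydrophobic or both are hydrophilic."""
    return (x in HYDROPHOBIC) == (y in HYDROPHOBIC)


def best_shift(ref, strand, max_shift=3):
    """Try gapless placements of `strand` against `ref`.

    With shift s, strand[k] is paired with ref[k + s]. Returns (shift, score),
    where score counts paired positions whose residues fall in the same group;
    ties keep the smallest shift.
    """
    best = None
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(ref[i], strand[i - shift])
                 for i in range(len(ref)) if 0 <= i - shift < len(strand)]
        score = sum(same_group(x, y) for x, y in pairs)
        if best is None or score > best[1]:
            best = (shift, score)
    return best


print(best_shift("VKVKV", "KVKVK"))
```

Counting raw matches favors longer overlaps; a real scoring scheme would likely normalize for overlap length or use the hydrogen-bond register to restrict the candidate shifts.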

Page 31

Conserved positions (?)

The 20 amino acids are divided into 2 groups:

Q, E, R, T, Y, P, S, D, G, H, K, N: HYDROPHILIC residues

W, I, A, F, L, C, V, M: HYDROPHOBIC residues

A matching position is conserved if (almost) all residues at this position, across all proteins, belong to the same group.
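The conservation rule above can be sketched as a column test. "Almost all" is not quantified on the slide, so the threshold of 90% here is an assumption, as is the name `conserved_group`. Gap characters are ignored.

```python
HYDROPHILIC = set("QERTYPSDGHKN")
HYDROPHOBIC = set("WIAFLCVM")


def conserved_group(column, threshold=0.9):
    """Return 'hydrophilic' or 'hydrophobic' if at least `threshold` of the
    residues in an aligned column belong to that group, otherwise None.

    Assumption: threshold=0.9 stands in for "(almost) all residues".
    """
    residues = [r for r in column.upper() if r.isalpha()]  # skip gaps such as '-'
    if not residues:
        return None
    for name, group in (("hydrophilic", HYDROPHILIC), ("hydrophobic", HYDROPHOBIC)):
        if sum(r in group for r in residues) / len(residues) >= threshold:
            return name
    return None


print(conserved_group("SSTSSSKTKT"))  # first strand-1 column of the 10 sequences on page 33
```

Applied to the columns of the ten representative sequences on page 33, the first two strand-1 columns (S S T S S S K T K T and V V I L A V A W F F) come out hydrophilic and hydrophobic respectively, while the third (Y Y Y H Y F H K K L) is not conserved at this threshold.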

Page 32

Proteins with motif sheet I: 1 2 5 4, sheet II: 7 6 3

3 protein families: 601 sequences

Test set: 71,786 sequences

Page 33

PDB   Chain  Start  End   Strand 1      Loop        Strand 2
1kcr  H      117    218   … S V Y P …   … A A A …   … L G C L V K …
1m7d  B      114    213   … S V Y P …   … G S S …   … L G C L V K …
2fbj  H      119    220   … T I Y P …   … S S D …   … I G C L I H …
1ow0  A      242    342   … S L H R …   … G S E …   … L T C T L T …
1fp5  A      336    438   … S A Y L …   … K S P …   … I T C L V V …
1hxm  A      121    206   … S V F V …   … N G T …   … V A C L V K …
1c16  A      181    276   … K A H V …   … E G D …   … L R C W A L …
1svb  A      303    395   … T W K R …   … S G H …   … V V M E V T …
1oke  A      298    398   … K F K V …   … H G T …   … I V I R V Q …
1f42  A      88     211   … T F L - …   … S G R …   … F T C W W L …

10 representative sequences used for the alignment.

Page 34

1) [STK] [VILAWF] (4,14)X [GAKSN] [GAS] [TASDEPHR] (0,6)X [LIVF] X [CMI] ...

(Segments: strand 1, loop, strand 2.)

Set of structure-determining residues: 30 conserved positions, 19 hydrophilic and 11 hydrophobic.

Question: are the residues at these 30 conserved positions specific and sensitive?
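The pattern notation maps directly onto a regular expression: "(m,n)X" means any m to n residues, a lone "X" means any single residue, and bracketed sets list the allowed residues. The authors searched with EMBOSS preg; the sketch below uses Python's `re` instead, translates only the visible part of pattern 1 (the trailing "..." is left out), and tests it on made-up toy sequences. `pattern_to_regex` is a hypothetical helper.

```python
import re


def pattern_to_regex(pattern):
    """Translate the slide notation into a Python regular expression.

    "(m,n)X" -> ".{m,n}" (any m..n residues), a lone "X" -> "." (any residue);
    bracketed classes such as "[STK]" are kept as they are.
    """
    regex = re.sub(r"\((\d+),(\d+)\)X", r".{\1,\2}", pattern)
    regex = re.sub(r"\bX\b", ".", regex)
    return regex.replace(" ", "")


# Visible part of pattern 1; the trailing "..." on the slide is not reproduced.
PATTERN_1 = "[STK] [VILAWF] (4,14)X [GAKSN] [GAS] [TASDEPHR] (0,6)X [LIVF] X [CMI]"
REGEX_1 = pattern_to_regex(PATTERN_1)
print(REGEX_1)
print(bool(re.search(REGEX_1, "SVAAAAAGGTLAC")))  # toy sequence
```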

Page 35

Testing specificity and sensitivity

Are the residues at conserved positions the SSS-determining residues?

Test set: 71,786 sequences

How many proteins does the set of residues at conserved positions match in the 3 protein families (true positives) and in other proteins (false positives)?

Answer: the EMBOSS preg program revealed

304 true positives (of 601 proteins): not good!

0 false positives: good!

Page 36

Refining the definition of SSS-determining residues

1) [STK] [VILAWF] (4,14)X [GAKSN] [GAS] [TASDEPHR] (0,6)X [LIVF] X [CMI] ...

2) [STKR] [VLIAWFY] (2,14)X [GAKSNEH] [GAS] [TASDEPHRQ] (0,8)X [LIVFY] X [CMIVLF] ...

(Segments: strand 1, loop, strand 2.)

1) Find additional residues at the conserved positions: gradually add residues to the hydrophobic and hydrophilic conserved positions, respectively, and test how each new "extra" residue affects specificity and sensitivity.

2) Vary the distance between a conserved position in a strand and a conserved position in a loop, and between conserved positions within loops.

Result: the EMBOSS preg program with the new set of residues revealed:

573 true positives (of 601 proteins)

0 false positives
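The two search results can be expressed as sensitivity and specificity. The sketch assumes the 71,786 test sequences include the 601 family members, so that the remaining 71,185 are negatives; the function names are generic, not taken from any tool.

```python
def sensitivity(tp, fn):
    """Fraction of family members that the pattern finds."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-members that the pattern correctly rejects."""
    return tn / (tn + fp)


POSITIVES = 601                  # sequences in the 3 families
NEGATIVES = 71_786 - POSITIVES   # assumption: the 71,786 include the 601

for name, tp in (("pattern 1", 304), ("pattern 2", 573)):
    fp = 0
    print(name, round(sensitivity(tp, POSITIVES - tp), 3),
          specificity(NEGATIVES - fp, fp))
```

This prints a sensitivity of 0.506 for pattern 1 and 0.953 for the refined pattern 2, with specificity 1.0 in both cases because no false positives were found.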

Page 37

The set of SSS-determining residues with a single mismatch position

To identify additional true positives, any residue is allowed at any single position.

Result: the EMBOSS preg program with the new set of residues revealed an additional 18 true-positive sequences and no false positives.

Our analysis found that the remaining 6 sequences have 2 mismatching positions.

Important conclusion: substitution of a hydrophilic for a hydrophobic residue, or vice versa, is allowed at just 1-2 conserved positions.
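Allowing "any residue at any single position" amounts to trying every variant of the pattern in which one conserved position is relaxed to a wildcard. The sketch generalizes this to up to `max_mismatches` relaxed positions using Python's `re` and `itertools.combinations`; the element list is the visible part of refined pattern 2 (the trailing "..." is left out), and `min_mismatches` and the test sequences are hypothetical.

```python
import re
from itertools import combinations

# Visible part of refined pattern 2: one regex element per conserved position
# or variable-length gap.
PATTERN_2 = ["[STKR]", "[VLIAWFY]", ".{2,14}", "[GAKSNEH]", "[GAS]",
             "[TASDEPHRQ]", ".{0,8}", "[LIVFY]", ".", "[CMIVLF]"]


def min_mismatches(seq, elements, max_mismatches=1):
    """Fewest conserved positions (bracketed classes) that must be relaxed to
    'any residue' for `seq` to match; None if more than `max_mismatches`."""
    conserved = [i for i, e in enumerate(elements) if e.startswith("[")]
    for k in range(max_mismatches + 1):
        for relaxed in combinations(conserved, k):
            regex = "".join("." if i in relaxed else e for i, e in enumerate(elements))
            if re.search(regex, seq):
                return k
    return None


print(min_mismatches("SVAAAAAGGTLAC", PATTERN_2))  # strict match
print(min_mismatches("SVAAAAAGGTLAW", PATTERN_2))  # last position mismatched
```

Each extra allowed mismatch multiplies the number of regex variants, which is one reason to keep the tolerance at one or two positions.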

Page 38

Sequence-structure relationship

1) The sequence of amino acids defines the 3-D structure (1-D → 3-D).

2) Similar 3-D structures have a unique set of structural determinants (3-D → 1-D).

Conclusion

The protein sequence-structure relationship is reciprocal.

Page 39

Acknowledgements

Professor Israel Gelfand, Dr. Yih-Cheng Chiang

Dr. Cyrus Chothia

SSS database

Page 40

Thank you all.