scop & cath

SCOP & CATH Dr. M.I. Hassan

Upload: rahil2989

Post on 03-Apr-2015


Page 1: SCOP & CATH

SCOP & CATH

Dr. M.I. Hassan

Page 2: SCOP & CATH

1. Protein Data Bank (PDB)

• Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics (RCSB)

• http://www.rcsb.org/pdb/

– 23997 structures (20-Jan-2004)

– 27570 structures (05-Oct-2004)

– 30060 structures (15-Mar-2005)

– 62787 structures (20-Jan-2010)

– Also contains structures of other bio-macromolecules: DNA, carbohydrates and protein-DNA complexes.
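The snapshot counts above can be turned into growth rates with simple arithmetic. The minimal Python sketch below uses only the four dates and counts listed on this slide and reports how many entries were added between consecutive snapshots.

```python
from datetime import date

# PDB entry counts as listed on the slide: (snapshot date, number of structures).
snapshots = [
    (date(2004, 1, 20), 23997),
    (date(2004, 10, 5), 27570),
    (date(2005, 3, 15), 30060),
    (date(2010, 1, 20), 62787),
]

def growth_between(snaps):
    """Return (start, end, added, added_per_day) for each consecutive pair of snapshots."""
    rows = []
    for (d0, n0), (d1, n1) in zip(snaps, snaps[1:]):
        added = n1 - n0
        days = (d1 - d0).days
        rows.append((d0, d1, added, added / days))
    return rows

for d0, d1, added, per_day in growth_between(snapshots):
    print(f"{d0} -> {d1}: +{added} structures ({per_day:.1f} per day)")
```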

Page 3: SCOP & CATH

PDB Content Growth

Page 4: SCOP & CATH

Growth Of Unique Folds Per Year As Defined By SCOP

Page 5: SCOP & CATH

Growth Of Unique Topologies Per Year As Defined By CATH

Page 6: SCOP & CATH

Alternative Source of Structure: NCBI

Page 7: SCOP & CATH

Free Software for Protein Structure Visualization

• RASMOL: available for all platforms http://www.openrasmol.org

• Swiss PDB Viewer: from Swiss-Prot http://www.expasy.ch/spdbv/

• Chemscape Chime Plug-in: for PC and Mac http://www.mdl.com/downloads/downloadable/index.jsp

• YASARA: http://www.yasara.org/

• MOLMOL: MOLecule analysis and MOLecule display

http://129.132.45.141/wuthrich/software/molmol/index.html

Page 8: SCOP & CATH

• SCOP: Structural Classification of Proteins, University of Cambridge, UK

http://scop.mrc-lmb.cam.ac.uk/scop/ (mirror in Singapore: http://scop.bic.nus.edu.sg/)

• CATH: Class / Architecture / Topology / Homologous Superfamily / Sequence family

University College London, UK: http://www.biochem.ucl.ac.uk/bsm/cath/

Hierarchical classification of protein domains: SCOP & CATH

Page 9: SCOP & CATH

Proteins adopt a limited number of topologies: more than 50,000 sequences fold into ~1,000 unique folds.

Homologous sequences have similar structures: usually, when sequence identity is >30%, proteins adopt the same fold. Even in the absence of sequence homology, some folds are adopted by many vastly different sequences.

The “active site” is highly conserved: a subset of functionally critical residues is conserved even when the folds vary.

Basis for protein classification
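The ">30% identity usually means the same fold" rule depends on how identity is counted. The sketch below assumes the two sequences are already aligned and ignores gapped columns; that is only one of several common conventions, and the toy sequences are hypothetical. Falling below the threshold does not rule out a shared fold, as noted above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns containing a gap ('-') in either sequence are skipped, so identity
    is computed over aligned residue pairs only (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

def likely_same_fold(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """Rule of thumb from the slide: identity above ~30% usually implies the same fold."""
    return percent_identity(aligned_a, aligned_b) > threshold

# Toy alignment of two hypothetical sequences: 6 identical out of 7 compared columns.
print(percent_identity("MKV-LAAG", "MKVALSAG"))
```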

Page 10: SCOP & CATH

The hierarchy in SCOP

Root

• Class: 5 classes: all-α, all-β, α/β, α+β, multi-domain

• Fold: have the same major secondary structures and topological connections

• Superfamily: probable common ancestry

• Family: clear evolutionary relationship

• Protein
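One way to read the SCOP hierarchy is as a path from the root down to a protein: two domains are as closely related as the deepest level they share. The sketch below models that idea; the level labels are hypothetical placeholders, not real SCOP identifiers, and the meanings are the ones summarised on this slide.

```python
from typing import NamedTuple, Optional

# SCOP levels below the root, from most general to most specific.
LEVELS = ("class", "fold", "superfamily", "family", "protein")

# What sharing a level implies, as summarised on the slide.
MEANING = {
    "family": "clear evolutionary relationship",
    "superfamily": "probable common ancestry",
    "fold": "same major secondary structures and topological connections",
}

class ScopLineage(NamedTuple):
    """Path of one domain through the SCOP hierarchy (labels are placeholders)."""
    class_: str
    fold: str
    superfamily: str
    family: str
    protein: str

def deepest_shared_level(a: ScopLineage, b: ScopLineage) -> Optional[str]:
    """Return the most specific SCOP level at which a and b share a node, or None."""
    shared = None
    for level, x, y in zip(LEVELS, a, b):
        if x != y:
            break  # lineages diverge here; deeper levels cannot be shared
        shared = level
    return shared

# Two hypothetical domains: same superfamily, different families.
d1 = ScopLineage("class-A", "fold-1", "sf-1", "fam-1", "protein-1")
d2 = ScopLineage("class-A", "fold-1", "sf-1", "fam-2", "protein-2")
level = deepest_shared_level(d1, d2)
print(level, "->", MEANING.get(level, "no evolutionary inference"))
# prints: superfamily -> probable common ancestry
```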

Page 11: SCOP & CATH

How many unique folds do organisms use to express functions?

Sequence space: > 50,000 sequences

Conformational space: ~1,000 folds (?)

Many sequences form one unique fold

Page 12: SCOP & CATH

Figure: Growth of Protein Databases, 1986 to 2000 (left axis: number of sequences; right axis: number of structures and folds; series: Sequences, Structures, Folds)

Page 13: SCOP & CATH

• University of Cambridge, UK: http://scop.mrc-lmb.cam.ac.uk/scop/

– mirrored in Singapore: http://scop.bic.nus.edu.sg/

– contains PDB entries grouped hierarchically (domain-based) by:

• Structural class

• Fold

• Superfamily

• Family

• Individual member

Structural Classification of Proteins SCOP

Page 14: SCOP & CATH

• Family

Structural Classification of Proteins SCOP

• Proteins are clustered together into families on the basis of one of two criteria that imply their having a common evolutionary origin:

• All proteins that have residue identities of 30% and greater;

• Proteins with lower sequence identities but whose functions and structures are very similar

Example: globins with sequence identities of 15%.
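The two family criteria can be written as a small decision rule. In this sketch the "very similar structure" and "very similar function" inputs are assumed to come from expert judgement or some separate comparison; SCOP does not reduce them to numbers, so the booleans are a simplification.

```python
def same_scop_family(identity_pct: float,
                     very_similar_structure: bool,
                     very_similar_function: bool) -> bool:
    """Sketch of the two SCOP family criteria.

    Criterion 1: residue identity of 30% or greater.
    Criterion 2: lower identity, but very similar structure AND function
    (both judged outside this function).
    """
    if identity_pct >= 30.0:
        return True
    return very_similar_structure and very_similar_function

# Globin-like case: only 15% identity, but structure and function are very similar.
print(same_scop_family(15.0, True, True))
```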

Page 15: SCOP & CATH

• Superfamily

Structural Classification of Proteins SCOP

• Families, whose proteins have low sequence identities but whose structures and, in many cases, functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies

• Example: actin, the ATPase domain of the heat-shock protein, and hexokinase.

Page 16: SCOP & CATH

• Fold

Structural Classification of Proteins SCOP

• Superfamilies and families are defined as having a common fold if their proteins have the same major secondary structures in the same arrangement with the same topological connections.

Page 17: SCOP & CATH

Structural Classification of Proteins SCOP

• Class: for the convenience of users, the different folds have been grouped into classes. Most folds are assigned to one of a few structural classes on the basis of the secondary structures of which they are composed.
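SCOP classes are assigned by curators, not by a formula. The sketch below is only a hypothetical composition heuristic to illustrate the idea: the 15% cutoff is an assumption, multi-domain proteins are not handled, and the `mostly_parallel_strands` flag stands in for the α/β (mainly parallel β-sheets with interspersed α-helices) versus α+β (α-helices and mainly antiparallel β-strands, largely segregated) distinction, which composition alone cannot make.

```python
def rough_structural_class(helix_frac: float,
                           strand_frac: float,
                           mostly_parallel_strands: bool,
                           min_frac: float = 0.15) -> str:
    """Crude class guess from secondary-structure fractions (hypothetical cutoff)."""
    has_helix = helix_frac >= min_frac
    has_strand = strand_frac >= min_frac
    if has_helix and not has_strand:
        return "all-alpha"
    if has_strand and not has_helix:
        return "all-beta"
    if has_helix and has_strand:
        # Arrangement, not composition, separates these two classes.
        return "alpha/beta" if mostly_parallel_strands else "alpha+beta"
    return "unassigned"

print(rough_structural_class(0.60, 0.00, False))
```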

Page 18: SCOP & CATH
Page 19: SCOP & CATH

SCOP Class: All-α topologies

ferritin, cytochrome b562

Page 20: SCOP & CATH

SCOP Class: All- topologies

Page 21: SCOP & CATH

SCOP Class: All- topologies

Page 22: SCOP & CATH

SCOP Class: All-β topologies

β-sandwiches, β-barrels

Page 23: SCOP & CATH

SCOP Class: All- topologies

Page 24: SCOP & CATH

SCOP Class: α/β Topologies

horseshoe

Page 25: SCOP & CATH

barrels

SCOP Class: α/β Topologies

Page 26: SCOP & CATH

SCOP Class: α/β Topologies

Page 27: SCOP & CATH

SCOP Class: Alpha+Beta Topologies

Page 28: SCOP & CATH

SCOP Class: Alpha+Beta Topologies

Page 29: SCOP & CATH
Page 30: SCOP & CATH

Ubiquitin

1ubi

Page 31: SCOP & CATH

Ubiquitin

1ubi

Page 32: SCOP & CATH

Ubiquitin

1ubi

Page 33: SCOP & CATH

Ubiquitin

1ubi

Page 34: SCOP & CATH

CATH database

Orengo et al. (1997) CATH: a hierarchical classification of protein domain structures. Structure 5, 1093-1108.

Sequence identity >30%: the same overall fold

Sequence identity >70%: the same overall fold + similar function

CATH: Class / Architecture / Topology / Homologous Superfamily / Sequence family

http://www.biochem.ucl.ac.uk/bsm/cath/

Page 35: SCOP & CATH

The hierarchy in CATH

• Class: 3 classes: mainly-α, mainly-β, α-β

• Architecture: overall shape as determined by the orientations of secondary structures

• Topology: both the overall shape and the connectivity of secondary structures

• Homologous Superfamily: share a common ancestor

• Sequence: classified based on sequence identity

Page 36: SCOP & CATH

CATH database

Class: derived from secondary-structure content; assigned automatically for more than 90% of protein structures.

Architecture: describes the gross orientation of secondary structures, independent of connectivity; currently assigned manually.

Topology: clusters structures according to their topological connections and numbers of secondary structures.

Homologous superfamilies: cluster proteins with highly similar structures and functions. Structures are assigned to topology families and homologous superfamilies by sequence and structure comparisons.

Sequence families: structures within each H-level are further clustered on sequence identity. Domains clustered in the same sequence family have sequence identities >35%.

Non-identical sequence domains, Identical sequence domains, Domains
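Clustering domains within an H-level into sequence families at >35% identity can be sketched with union-find. Single linkage is an assumption made here for simplicity (the exact CATH protocol is not described on this slide), and the domain IDs and identities are hypothetical.

```python
from typing import Dict, FrozenSet, List

def sequence_families(domains: List[str],
                      identity: Dict[FrozenSet[str], float],
                      threshold: float = 35.0) -> List[List[str]]:
    """Single-linkage grouping: any pair with identity > threshold joins one family.

    `identity` maps frozenset({a, b}) to percent sequence identity.
    """
    parent = {d: d for d in domains}

    def find(x: str) -> str:
        # Follow parent links to the family's root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, pct in identity.items():
        if pct > threshold and len(pair) == 2:
            a, b = tuple(pair)
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two families

    families: Dict[str, List[str]] = {}
    for d in domains:
        families.setdefault(find(d), []).append(d)
    return sorted(sorted(f) for f in families.values())

# Hypothetical domain IDs and pairwise identities (percent).
pids = {
    frozenset({"d1", "d2"}): 40.0,
    frozenset({"d2", "d3"}): 36.0,
    frozenset({"d3", "d4"}): 20.0,
}
print(sequence_families(["d1", "d2", "d3", "d4"], pids))
```

With single linkage, d1 and d3 land in the same family through d2 even though no identity is given for that pair; a stricter scheme (e.g. complete linkage) would behave differently.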

Page 37: SCOP & CATH

CATH database

Page 38: SCOP & CATH
Page 39: SCOP & CATH

The class (C), architecture (A) and topology (T) levels in the CATH database

Class

Architecture

Topology

Page 40: SCOP & CATH

The class (C), architecture (A) and topology (T) levels in the CATH database

Homologous Superfamily

Page 41: SCOP & CATH

CATH – architectures

Page 42: SCOP & CATH

CATH – architectures (cont.)

Page 43: SCOP & CATH

The protein structure universe in the PDB (1997), shown as a CATH wheel

The distribution of non-homologous structures (i.e. a single representative from each homologous superfamily at the H-level in CATH) amongst the different classes (C), architectures (A) and fold families (T) in the CATH database.

Page 44: SCOP & CATH

SCOP / CATH -> DALI

SCOP & CATH

• Hierarchical and based on abstractions

• Include some manual aspects and are curated by experts in the field of protein structure

Presentation of results of computer classification, where the methods that underlie the classification remain internal

Structure comparison

Dali

Page 45: SCOP & CATH

DALI

antiparallel barrel

meander

More information about DALI

Touring protein fold space with Dali/FSSP: Liisa Holm and Chris Sander

Comparing protein structures in 3D

Page 46: SCOP & CATH

Compare 3D protein structures by Dali http://www.ebi.ac.uk/dali/

Page 47: SCOP & CATH

• The FSSP database (Fold classification based on Structure-Structure alignment of Proteins) is based on exhaustive all-against-all 3D structure comparison of protein structures currently in the Protein Data Bank (PDB).

• The classification and alignments are automatically maintained and continuously updated using the Dali search engine.

Dali Domain Dictionary

• Structural domains are delineated automatically using the criteria of recurrence and compactness. Each domain is assigned a Domain Classification number DC_l_m_n_p, where:

l - fold space attractor region

m - globular folding topology

n - functional family

p - sequence family
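The DC_l_m_n_p label is positional, so two domains can be compared level by level on a shared prefix of fields. The sketch below assumes each field is an integer, which is not stated on the slide; the example labels are hypothetical.

```python
from typing import NamedTuple

class DomainClassification(NamedTuple):
    region: int             # l: fold space attractor region
    topology: int           # m: globular folding topology
    functional_family: int  # n: functional family
    sequence_family: int    # p: sequence family

def parse_dc(label: str) -> DomainClassification:
    """Parse a label of the form DC_l_m_n_p (integer fields assumed)."""
    prefix, *fields = label.split("_")
    if prefix != "DC" or len(fields) != 4:
        raise ValueError(f"not a DC_l_m_n_p label: {label!r}")
    return DomainClassification(*(int(f) for f in fields))

def same_up_to(a: DomainClassification, b: DomainClassification, depth: int) -> bool:
    """True if the first `depth` fields agree (1 = region ... 4 = sequence family)."""
    return a[:depth] == b[:depth]

x = parse_dc("DC_1_2_3_4")  # hypothetical label
y = parse_dc("DC_1_2_5_9")  # hypothetical label
print(same_up_to(x, y, 2), same_up_to(x, y, 3))
# prints: True False
```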


Page 48: SCOP & CATH

Functional families

• Evolutionary relationships are inferred from strong structural similarities that are accompanied by functional or sequence similarities.

• Functional families are branches of the fold dendrogram where all pairs have a high average neural network prediction for being homologous.

Sequence families

• Representative subset of the Protein Data Bank extracted using a 25% sequence identity threshold.

• All-against-all structure comparison was carried out within the set of representatives.

• Homologues are only shown aligned to their representative.
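A representative subset at 25% identity can be picked greedily: walk the chains in some priority order, keep a chain as a new representative if it is at most 25% identical to every representative so far, and otherwise attach it as a homologue of the first representative it matches. This is a simple illustrative procedure, not necessarily the one used to build FSSP; the chain names and identities are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

def pick_representatives(chains: Sequence[str],
                         identity: Callable[[str, str], float],
                         threshold: float = 25.0) -> Dict[str, List[str]]:
    """Greedy selection: map each representative to the homologues attached to it."""
    reps: Dict[str, List[str]] = {}
    for chain in chains:
        for rep in reps:
            if identity(chain, rep) > threshold:
                reps[rep].append(chain)  # homologue: shown aligned to its representative
                break
        else:
            reps[chain] = []  # no representative within threshold: start a new one
    return reps

# Hypothetical pairwise identities (percent), looked up symmetrically.
PID = {("a", "b"): 90.0, ("a", "c"): 10.0, ("b", "c"): 12.0,
       ("c", "d"): 30.0, ("a", "d"): 5.0, ("b", "d"): 5.0}

def pid(x: str, y: str) -> float:
    return PID.get((x, y), PID.get((y, x), 0.0))

print(pick_representatives(["a", "b", "c", "d"], pid))
```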


Page 49: SCOP & CATH

Fold types

• Fold types are defined as clusters of structural neighbors in fold space with average pairwise Z-scores (by Dali) above 2.
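The fold-type criterion (average pairwise Dali Z-score above 2) can be checked for any candidate cluster once pairwise Z-scores are available. In this sketch the Z-scores are hypothetical numbers supplied through a lookup function; computing them would require running Dali itself.

```python
from itertools import combinations
from typing import Callable, Sequence

def is_fold_type(members: Sequence[str],
                 zscore: Callable[[str, str], float],
                 cutoff: float = 2.0) -> bool:
    """True if the average pairwise Z-score within `members` exceeds `cutoff`."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return False  # a single structure has no pairwise scores to average
    average = sum(zscore(a, b) for a, b in pairs) / len(pairs)
    return average > cutoff

# Hypothetical Z-scores, looked up symmetrically.
Z = {frozenset({"p", "q"}): 5.0, frozenset({"p", "r"}): 3.0, frozenset({"q", "r"}): 1.0}

def z(a: str, b: str) -> float:
    return Z[frozenset({a, b})]

print(is_fold_type(["p", "q", "r"], z))
```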

Structural neighbours of 1urnA (top left). 1mli (bottom right) has the same topology even though there are shifts in the relative orientation of secondary structure elements


Page 50: SCOP & CATH

Summary

Protein structure database (PDB)

Protein structure visualization software

Structural classification, databases and servers