Bioinformatics of Protein Structure
Protein structures are often characterized by secondary structure content
– All α
– All β
– α/β
– α+β
– Tools are available (for instance at www.expasy.ch) that predict secondary structure from sequence data
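Secondary structure content can be summarized from a per-residue prediction. The sketch below assumes a 3-state string (H = helix, E = strand, C = coil) of the kind a prediction tool might return; the 15% cutoff and the label wording are illustrative assumptions, not a published classification rule. Composition alone cannot tell α/β from α+β, so both fall under one "mixed" label here.

```python
from collections import Counter


def ss_content(ss: str) -> dict:
    """Fraction of residues in helix (H), strand (E) and coil (C) in a 3-state string."""
    if not ss:
        raise ValueError("empty secondary-structure string")
    counts = Counter(ss.upper())
    return {state: counts[state] / len(ss) for state in "HEC"}


def coarse_class(ss: str, cutoff: float = 0.15) -> str:
    """Rough composition-based label; `cutoff` is an arbitrary illustrative threshold."""
    frac = ss_content(ss)
    has_helix = frac["H"] >= cutoff
    has_strand = frac["E"] >= cutoff
    if has_helix and has_strand:
        # Composition alone cannot separate alpha/beta from alpha+beta.
        return "mixed alpha and beta"
    if has_helix:
        return "all alpha"
    if has_strand:
        return "all beta"
    return "little regular secondary structure"


print(coarse_class("CCHHHHHHHHHHCCCCHHHHHHHHHCC"))  # all alpha
print(coarse_class("CEEEEECCEEEEECCHHHHHHHC"))      # mixed alpha and beta
```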
Sequence/structure
All-α proteins begin to reveal the sequence/structure relationship
Coiled-coil proteins exhibit a periodicity of hydrophobic residues
Observe hydrophobic moments in membrane proteins
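The hydrophobic moment (Eisenberg) measures how strongly hydrophobicity is concentrated on one face of a periodic structure: μH = |Σ h_n e^(i n δ)|, with δ = 100° per residue for an α-helix. The minimal sketch below uses a toy two-value scale (+1 for residues in an assumed hydrophobic set, −1 otherwise) instead of a published hydrophobicity scale, and divides by sequence length.

```python
import cmath
import math

# Assumption: toy two-value scale, +1 for this residue set and -1 for everything else.
HYDROPHOBIC = frozenset("AILMFVWC")


def hydrophobicity(residue: str) -> float:
    return 1.0 if residue.upper() in HYDROPHOBIC else -1.0


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """|sum_n h_n * exp(i * n * delta)| / len(seq); delta = 100 degrees for an alpha-helix."""
    if not seq:
        raise ValueError("empty sequence")
    delta = math.radians(angle_deg)
    total = sum(hydrophobicity(aa) * cmath.exp(1j * n * delta) for n, aa in enumerate(seq))
    return abs(total) / len(seq)


print(round(hydrophobic_moment("L" * 18), 3))              # ~0: uniform, no preferred face
print(round(hydrophobic_moment("LKKLLKKLLKKLLKKLLK"), 3))  # large: amphipathic pattern
```

A poly-Leu 18-mer gives a moment near zero because 18 × 100° spans exactly five turns, so the contributions cancel; the LKKL repeat places most leucines on one side of the helical wheel.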
Common structures found in β structures
– Barrels
– Propellers
– Greek key
– Jelly roll (contains one Greek key)
– β-Helix
Antiparallel β structures exhibit a periodicity of every other amino acid
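In a β-strand, consecutive side chains point to alternate faces of the sheet, which is why every other residue tends to be hydrophobic when one face packs against a core. A minimal sketch that splits a strand into its two faces, assuming a simple hydrophobic residue set (an assumption, not a published scale):

```python
HYDROPHOBIC = frozenset("AILMFVWC")  # assumption: simple residue set, not a published scale


def face_fractions(strand: str) -> tuple:
    """Hydrophobic fraction on the two faces of a beta strand.

    Side chains of consecutive strand residues point to alternate sides of the sheet,
    so even positions form one face and odd positions the other.
    """
    if len(strand) < 2:
        raise ValueError("need at least two residues")

    def fraction(face: str) -> float:
        return sum(aa in HYDROPHOBIC for aa in face.upper()) / len(face)

    return fraction(strand[0::2]), fraction(strand[1::2])


print(face_fractions("VKVTVEV"))  # (1.0, 0.0): one hydrophobic face, one polar face
```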
Propellers
Variable number of propeller blades
http://info.bio.cmu.edu/courses/03231/ProtStruc/b-props.htm
γ-Crystallin has two domains with identical topology
Protein evolution – motif duplication and fusion
Protein structures containing α and β
Distinction between α/β and α+β
– α/β: mainly parallel beta sheets (beta-alpha-beta units)
– α+β: mainly antiparallel beta sheets (segregated alpha and beta regions)
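The α/β versus α+β distinction rests on strand-pair orientation. A sketch, assuming the protein already contains both α and β elements and that its strand pairings have been annotated beforehand as "P" (parallel) or "A" (antiparallel); the majority-vote rule is a hypothetical simplification of "mainly parallel" versus "mainly antiparallel".

```python
def sheet_character(pairings: list) -> str:
    """Majority vote over strand-pair orientations: 'P' = parallel, 'A' = antiparallel."""
    n_parallel = sum(1 for p in pairings if p == "P")
    n_anti = sum(1 for p in pairings if p == "A")
    if n_parallel > n_anti:
        return "alpha/beta (mainly parallel)"
    if n_anti > n_parallel:
        return "alpha+beta (mainly antiparallel)"
    return "ambiguous"


print(sheet_character(["P", "P", "P", "A"]))  # alpha/beta (mainly parallel)
```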
Generally, a tight hydrophobic core is found in barrels
How many folds are there?
To date we know ~26,000 protein structures
Within this dataset, 945 folds are recognized
Proteins have a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections.
http://scop.mrc-lmb.cam.ac.uk/scop/
How many non-folds are there?
http://www.scripps.edu/news/press/013102.html
30-40% of the human genome encodes proteins that are “unstructured” in their native state
Transition to structural classifications
Several useful databases link sequence analysis and protein structure information
Since structure is more highly conserved than sequence during evolution, structural alignment algorithms and classifications enable more distant evolutionary relatives to be identified.
CATH and SCOP are two databases that “organize” protein structures, each containing 950-1400 protein superfamilies
Structural Alignments
Various algorithms allow structure vs. structure comparisons
– VAST, DALI
– CATH (http://www.biochem.ucl.ac.uk/bsm/cath/) also has SSAP and GRATH (one computationally intensive, one not)
[Sequence similarity to structural families for modeling often extracted using PSI-BLAST (Gene3D)]
Pairwise Structure Alignment: SSAP [1,4]
[1] Taylor WR, Orengo CA, 1989, Protein structure alignment. J Mol Biol 208:1-22
[4] Mueller L, 2003, Protein structure alignment. Paper presentations 27.5:16:30h
Sequence alignment:
AADDEGH
ADCDEGH
-> Score by evolutionary distance and chemical similarities
Structure alignment:
A(x,y,z), B(x,y,z), C(x,y,z)
D(x,y,z), E(x,y,z), F(x,y,z)
-> Score by comparison of structural environments (vectors of C-atoms)
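SSAP itself compares residue "views" (inter-residue vectors) using double dynamic programming, which is too long to show here. As a simpler, swapped-in illustration of scoring two structures from coordinates alone, the sketch below computes a distance-matrix RMSD over atoms that are already paired up; because it compares intra-structure distances, it is unaffected by rotation and translation and needs no superposition. The function name, input format and coordinates are assumptions.

```python
import math
from itertools import combinations


def distance_matrix_rmsd(a: list, b: list) -> float:
    """RMS difference of intra-structure distances over all pairs of equivalenced atoms."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length coordinate lists with at least 2 atoms")
    sq = [(math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
          for i, j in combinations(range(len(a)), 2)]
    return math.sqrt(sum(sq) / len(sq))


# Toy C-alpha-like coordinates (made up) and a rotated (90 degrees about z) + shifted copy.
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.5), (3.0, 1.0, 1.0)]
b = [(-y + 1.0, x + 2.0, z + 3.0) for x, y, z in a]
print(round(distance_matrix_rmsd(a, b), 6))  # 0.0: same shape, different placement
```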
Comparison of sequence and structure alignments:
Multiple structural alignments
– CORA: from CATH (where?)
– MultiProt: http://bioinfo3d.cs.tau.ac.il/MultiProt/
– DMAPS (pre-calculated): http://dmaps.sdsc.edu/
– CE-MC: http://cemc.sdsc.edu/
– Others?
CATH http://cathwww.biochem.ucl.ac.uk/latest/
Classification scheme: Class, Architecture, Topology, and Homology
– Class: secondary structure composition and packing
– Architecture: orientation of secondary structures in 3D, regardless of connectivity
– Topology: both the orientation and the connectivity of secondary structures are accounted for
– Homologous superfamily: grouped by whether an evolutionary relationship exists (clustered at different levels of sequence identity)
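CATH superfamilies are identified by a dotted code with one number per level (Class.Architecture.Topology.Homologous superfamily). A sketch that finds the deepest level two codes share; the example codes are illustrative, the helper names are hypothetical, and the class names follow CATH's four top-level classes.

```python
from typing import Optional

# CATH's four top-level classes.
CATH_CLASSES = {1: "Mainly Alpha", 2: "Mainly Beta", 3: "Alpha Beta", 4: "Few Secondary Structures"}
LEVELS = ("class", "architecture", "topology", "homologous superfamily")


def parse_cath(code: str) -> tuple:
    """Split a dotted CATH code such as '3.40.50.300' into integers, one per level."""
    parts = tuple(int(p) for p in code.split("."))
    if not 1 <= len(parts) <= len(LEVELS) or parts[0] not in CATH_CLASSES:
        raise ValueError(f"not a CATH code: {code!r}")
    return parts


def deepest_shared_level(code_a: str, code_b: str) -> Optional[str]:
    """Return the most specific level at which two codes agree, or None if classes differ."""
    shared = None
    for level, x, y in zip(LEVELS, parse_cath(code_a), parse_cath(code_b)):
        if x != y:
            break
        shared = level
    return shared


print(deepest_shared_level("3.40.50.300", "3.40.50.720"))  # topology
print(CATH_CLASSES[parse_cath("2.60.40.10")[0]])           # Mainly Beta
```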
CATH hierarchy
Structural alignments down to the homologous superfamily level, then sequence alignments for sequence families, and then individual domains.
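Below the homologous superfamily, members are grouped into sequence families by sequence identity. A minimal sketch, assuming pre-aligned sequences ("-" marks a gap) and a simple greedy, representative-based clustering; CATH's own pipeline and thresholds may differ, and the sequences and thresholds here are made up.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def greedy_clusters(seqs: dict, threshold: float) -> list:
    """Put each sequence in the first cluster whose representative (first member) it matches."""
    clusters = []
    for name, seq in seqs.items():
        for cluster in clusters:
            if percent_identity(seqs[cluster[0]], seq) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


seqs = {"d1": "ACDEFGHIKL", "d2": "ACDEFGHIKV", "d3": "MNPQRSTVWY"}  # made-up, pre-aligned
for level in (35.0, 95.0):  # illustrative identity thresholds
    print(level, greedy_clusters(seqs, level))
```

Raising the threshold splits clusters, which mirrors clustering at several levels of sequence identity.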
Protein structure predictions
Identifying similar protein structures using only amino acid sequence
Modeling an amino acid sequence onto a known protein structure
Ab initio protein structure prediction
Test sequence
>rsp2570
MTLDGKTIAILIAPRGTEDVEYVRPKEALTQATVVTVSLEPGEAQTVNGDLDPGATHRVDRTFADVSADAFDGLVIPGGTVGADKIRSSEEAVAFVRGFVSAGKPVAAICHGPWALVEADVLKGREVTSYPSLATDIRNAGGRWVDREVVVDSGLVTSRKPDDLDAFCAKMIEEFAEGVHDGQRRSA
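The test sequence is in FASTA format (a ">" header line followed by sequence lines). A minimal parser sketch, with a residue count and composition as a quick sanity check before submitting it to a server; the function name is hypothetical.

```python
from collections import Counter


def parse_fasta(text: str) -> dict:
    """Map each record ID (first word after '>') to its concatenated sequence."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name] += line.upper()
    return records


TEST_FASTA = """>rsp2570
MTLDGKTIAILIAPRGTEDVEYVRPKEALTQATVVTVSLEPGEAQTVNGDLDPGATHRVDRTFADVSADAFDGLVIPGGTVGADKIRSSEEAVAFVRGFVSAGKPVAAICHGPWALVEADVLKGREVTSYPSLATDIRNAGGRWVDREVVVDSGLVTSRKPDDLDAFCAKMIEEFAEGVHDGQRRSA
"""
seqs = parse_fasta(TEST_FASTA)
seq = seqs["rsp2570"]
print(len(seq), Counter(seq).most_common(3))
```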
SCOP database
Classification scheme: Class, Fold, Superfamily, and Family
– Class: type and organization of secondary structure
– Fold: share a common core structure, i.e. the same secondary structure elements in the same arrangement with the same topological connections
– Superfamily: share very similar structure and function, suggesting a probable common evolutionary origin
– Family: protein domains share a clear common evolutionary origin, as evidenced by sequence identity or similar structure/function
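SCOP domains carry a concise classification string (sccs) such as c.37.1.8, whose dotted prefixes give class, fold, superfamily and family. Because fold numbers are only unique within a class, the sketch compares prefixes rather than bare numbers. Only the four main classes a–d (all α, all β, α/β, α+β) are included; the helper names and example strings are hypothetical.

```python
SCOP_CLASSES = {
    "a": "All alpha proteins",
    "b": "All beta proteins",
    "c": "Alpha and beta proteins (a/b)",
    "d": "Alpha and beta proteins (a+b)",
}
LEVELS = ("class", "fold", "superfamily", "family")


def sccs_levels(sccs: str) -> dict:
    """Map each SCOP level to its dotted prefix, e.g. 'c.37.1.8' -> fold 'c.37'."""
    parts = sccs.split(".")
    if (parts[0] not in SCOP_CLASSES or len(parts) > len(LEVELS)
            or not all(p.isdigit() for p in parts[1:])):
        raise ValueError(f"not a supported sccs string: {sccs!r}")
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(LEVELS[: len(parts)])}


def same_level(sccs_a: str, sccs_b: str, level: str) -> bool:
    """True if both strings reach `level` and share the same prefix there."""
    a, b = sccs_levels(sccs_a), sccs_levels(sccs_b)
    return level in a and level in b and a[level] == b[level]


print(sccs_levels("c.37.1.8"))
print(same_level("c.37.1.8", "c.37.1.20", "superfamily"))  # True
```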
HMMs are useful at SCOP
For instance, SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) HMMs are derived from structures in the PDB databank (www.rcsb.org)
The HMMs identify sequence signatures for specific domains
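A profile HMM scores a query against a model built from an alignment of domain members, using match, insert and delete states. The sketch below keeps only the match states, which reduces it to a position-specific scoring matrix (log2-odds against a uniform 1/20 background, with add-one pseudocounts). It is a simplification for illustration, not the models SCOP distributes, and the toy alignment is made up.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def build_profile(aligned: list, pseudocount: float = 1.0) -> list:
    """Per-column log2-odds scores from an ungapped alignment (match states only)."""
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(width):
        counts = Counter(s[col] for s in aligned)
        profile.append({aa: math.log2((counts[aa] + pseudocount) / total / BACKGROUND)
                        for aa in AMINO_ACIDS})
    return profile


def best_window(profile: list, seq: str) -> tuple:
    """Slide the profile along seq; return (start, score) of the best-scoring window."""
    width = len(profile)
    best_start, best_score = -1, float("-inf")
    for start in range(len(seq) - width + 1):
        score = sum(col.get(seq[start + i], 0.0) for i, col in enumerate(profile))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


profile = build_profile(["GKST", "GKSS", "GKTT"])  # toy 'domain' alignment
print(best_window(profile, "AAAGKSTAAA"))         # best window starts at index 3
```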
Modeling protein structure based on homology
SWISS-MODEL– http://swissmodel.expasy.org/
Using First Approach mode, submit the test sequence and provide your email address
PSI-BLAST identifies the most similar sequence with a known protein structure, and SWISS-MODEL wraps your input sequence around that structure
Note: you can also specify which structure you would like your sequence to wrap around
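The template step can be pictured as choosing among search hits. A sketch with hypothetical hit records: the identity and coverage cutoffs (30% and 50%) are assumptions for illustration, not SWISS-MODEL's actual rules, and the `preferred` argument mimics specifying the template yourself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemplateHit:
    template_id: str  # hypothetical identifier for a PDB chain
    identity: float   # percent identity to the target sequence
    coverage: float   # fraction of the target covered by the alignment (0-1)


def pick_template(hits: List[TemplateHit], min_identity: float = 30.0,
                  min_coverage: float = 0.5, preferred: Optional[str] = None) -> Optional[TemplateHit]:
    """Return the preferred template if it is among the hits, else the best hit passing both cutoffs."""
    if preferred is not None:
        for hit in hits:
            if hit.template_id == preferred:
                return hit
    usable = [h for h in hits if h.identity >= min_identity and h.coverage >= min_coverage]
    return max(usable, key=lambda h: (h.identity, h.coverage), default=None)


hits = [TemplateHit("tmpl_A", 28.0, 0.9), TemplateHit("tmpl_B", 45.0, 0.8),
        TemplateHit("tmpl_C", 60.0, 0.3)]
print(pick_template(hits).template_id)                      # tmpl_B
print(pick_template(hits, preferred="tmpl_A").template_id)  # tmpl_A
```

tmpl_C has the highest identity but covers too little of the target, which is why coverage is checked alongside identity.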