Bioinformatics of Protein Structure

Protein structures often characterized by secondary structure content

– All α, all β, α/β, α+β

– There are tools available (for instance at www.expasy.ch) that will allow one to predict secondary structure from sequence data

Sequence/structure

All-α proteins begin to reveal the sequence/structure relationship

Coiled-coil proteins exhibit periodicity with hydrophobic residues
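The heptad periodicity of coiled coils can be illustrated with a small sketch. It assumes the usual a–g heptad labelling with hydrophobic residues at positions a and d; the hydrophobic residue set and the function names are illustrative choices, not taken from any particular coiled-coil prediction tool.

```python
# Residues treated as hydrophobic here (an assumption for illustration).
HYDROPHOBIC = set("AILMFVWY")


def heptad_scores(seq):
    """For each of the 7 possible registers, return the fraction of
    'a' and 'd' positions that are occupied by hydrophobic residues."""
    scores = []
    for offset in range(7):
        hits = 0
        total = 0
        for i, aa in enumerate(seq):
            pos = (i - offset) % 7  # 0 corresponds to 'a', 3 to 'd'
            if pos in (0, 3):
                total += 1
                hits += aa in HYDROPHOBIC
        scores.append(hits / total if total else 0.0)
    return scores


def best_register(seq):
    """Return (offset, score) of the register with the highest a/d score."""
    scores = heptad_scores(seq)
    offset = max(range(7), key=scores.__getitem__)
    return offset, scores[offset]
```

A sequence with a strong heptad pattern gives one register a score near 1.0, while a sequence without the repeat gives similar, lower scores in all seven registers.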

Observe hydrophobic moments in membrane proteins
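A hydrophobic moment can be sketched as the length of the vector sum of per-residue hydrophobicities, each rotated by a fixed angle per residue. The sketch below uses the Kyte-Doolittle hydropathy scale and divides by the window length; both are choices made here for illustration, as are the function names. An angle of 100 degrees per residue corresponds to an α-helix.

```python
import math

# Kyte-Doolittle hydropathy values (scale chosen for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0):
    """Per-residue hydrophobic moment of seq for a given rotation
    angle per residue (100 degrees for an alpha-helix)."""
    angle = math.radians(angle_deg)
    s = sum(KD[aa] * math.sin(n * angle) for n, aa in enumerate(seq))
    c = sum(KD[aa] * math.cos(n * angle) for n, aa in enumerate(seq))
    return math.hypot(s, c) / len(seq)


def window_moments(seq, window=11, angle_deg=100.0):
    """Hydrophobic moment for every window of the given length."""
    return [
        hydrophobic_moment(seq[i:i + window], angle_deg)
        for i in range(len(seq) - window + 1)
    ]
```

Changing `angle_deg` to a value near 160-180 degrees probes the alternating, every-other-residue pattern expected for β-strands instead of the helical periodicity.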

~1/4 of all predicted proteins in a genome are membrane proteins

A different periodicity in β-structures

Common structures found in β structures

Barrels

Propellers

Greek key

Jelly roll (contains one Greek key)

Helix

Barrels – anti-parallel sheets

Anti-parallel structures exhibit a periodicity of every other amino acid

Propellers

Variable number of propeller blades

http://info.bio.cmu.edu/courses/03231/ProtStruc/b-props.htm

Quaternary structure of neuraminidase

Looking for active sites

γ-crystallin has two domains with identical topology

Protein evolution – motif duplication and fusion

Three-sheet β-helix = Toblerone

Protein structures containing α and β

Distinction between α/β and α+β

α/β – Mainly parallel beta sheets (beta-alpha-beta units)

α+β – Mainly antiparallel beta sheets (segregated alpha and beta regions)

Interspersed α and β

Generally, a tight hydrophobic core is found in α/β barrels

How many folds are there?

To date we know ~26,000 protein structures

Within this dataset, 945 folds are recognized

Proteins have a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections.

http://scop.mrc-lmb.cam.ac.uk/scop/

How many non-folds are there?

http://www.scripps.edu/news/press/013102.html

30-40% of the human genome encodes “unstructured” native proteins

Transition to structural classifications

Several useful databases link sequence analysis and protein structure information

Since structure is more highly conserved than sequence during evolution, structural alignment algorithms and classifications enable more distant evolutionary relatives to be identified.

CATH and SCOP are two databases that “organize” protein structures, each containing 950-1400 protein superfamilies

Structural Alignments

Various algorithms allow structure vs. structure comparisons

VAST, DALI

CATH (http://www.biochem.ucl.ac.uk/bsm/cath/) also has SSAP and GRATH (one computationally intensive, one not)

[Sequence similarity to structural families for modeling often extracted using PSI-BLAST (Gene3D)]

Pairwise Structure Alignment: SSAP [1,4]

[1] Taylor WR, Orengo CA, 1989, Protein structure alignment. J Mol Biol 208:1-22

[4] Mueller L, 2003, Protein structure alignment. Paper presentations 27.5:16:30h

Comparison of sequence and structure alignments:

Sequence alignment: AADDEGH vs. ADCDEGH

-> Score by evolutionary distance and chemical similarities

Structure alignment: A(x,y,z), B(x,y,z), C(x,y,z) vs. D(x,y,z), E(x,y,z), F(x,y,z)

-> Score by comparison of structural environments (vectors of Cβ atoms)
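The contrast between the two kinds of scoring can be sketched in a few lines. The first function is a plain global alignment score with unit match, mismatch, and gap values standing in for a real substitution matrix. The second is a much simplified structural-environment score: it compares distances to neighbouring residues instead of vectors in a local frame, so it is not the published SSAP algorithm. The a/(Δ² + b) form follows the usual description of SSAP residue scoring; the constants, window, and function names are illustrative assumptions.

```python
import math


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch) with simple unit scores."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def env_score(coords_a, coords_b, i, j, window=3, a=500.0, b=10.0):
    """Compare the environment of residue i in structure A with that of
    residue j in structure B, using distances to sequence neighbours.

    coords_a and coords_b are lists of (x, y, z) tuples, one per residue.
    Each neighbour contributes a / (delta**2 + b), which is largest when
    the two distances agree (delta == 0)."""
    total = 0.0
    for m in range(-window, window + 1):
        if m == 0:
            continue
        if 0 <= i + m < len(coords_a) and 0 <= j + m < len(coords_b):
            delta = (math.dist(coords_a[i], coords_a[i + m])
                     - math.dist(coords_b[j], coords_b[j + m]))
            total += a / (delta * delta + b)
    return total
```

Because `env_score` depends only on internal distances, it is unchanged when one structure is translated or rotated, which is the property that lets structure comparison work without a prior superposition.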

Multiple structural alignments

CORA – from CATH (where?)

MultiProt – http://bioinfo3d.cs.tau.ac.il/MultiProt/

DMAPS (pre-calculated) – http://dmaps.sdsc.edu/

CE-MC – http://cemc.sdsc.edu/

Others?

CATH

http://cathwww.biochem.ucl.ac.uk/latest/

Classification scheme: Class, Architecture, Topology and Homology

Class – secondary structure composition and packing

Architecture – orientation of secondary structures in 3D, regardless of connectivity

Topology – both orientation and connectivity of secondary structure is accounted for

Homologous superfamily – grouped based on whether an evolutionary relationship exists (clustered at different levels of sequence ID)
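The four levels map directly onto the dotted numeric codes that CATH assigns to domains. The following is a minimal sketch of reading such a code and finding the deepest level two domains share; the helper names are hypothetical and the example codes are used only as sample identifiers.

```python
from typing import NamedTuple

# Top-level CATH class numbers.
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}

LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


class CathCode(NamedTuple):
    klass: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath(code):
    """Split a dotted C.A.T.H code such as '3.40.50.720' into its levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers: " + code)
    return CathCode(*map(int, parts))


def deepest_shared_level(a, b):
    """Name of the deepest level at which two codes agree, or None."""
    shared = None
    for name, x, y in zip(LEVELS, a, b):
        if x != y:
            break
        shared = name
    return shared
```

For example, `deepest_shared_level(parse_cath("3.40.50.720"), parse_cath("3.40.50.300"))` reports that the two domains share a topology but not a homologous superfamily.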

CATH hierarchy

Structural alignments to homologous superfamily, then sequence alignments for sequence family, and then domains.

Protein structure predictions

Identifying similar protein structures using only amino acid sequence

Modeling an amino acid sequence onto a known protein structure

Ab initio protein structure prediction

Test sequence

>rsp2570

MTLDGKTIAILIAPRGTEDVEYVRPKEALTQATVVTVSLEPGEAQTVNGDLDPGATHRVDRTFADVSADAFDGLVIPGGTVGADKIRSSEEAVAFVRGFVSAGKPVAAICHGPWALVEADVLKGREVTSYPSLATDIRNAGGRWVDREVVVDSGLVTSRKPDDLDAFCAKMIEEFAEGVHDGQRRSA

SCOP database

Classification scheme: Class, Fold, Superfamily, and Family

Class – Type and organization of secondary structure

Fold – Share common core structure, same secondary structure elements in the same arrangement with the same topological connections

Superfamily – share very common structure and function

Family – protein domains share a clear common evolutionary origin as evidenced by sequence identity or similar structure/function
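SCOP writes the same hierarchy as a compact string (class letter, then fold, superfamily, and family numbers, for example c.2.1.1). A small sketch of grouping domains at a chosen level follows; the domain names in the usage note are hypothetical placeholders and the function names are illustrative.

```python
from collections import defaultdict

# The four main SCOP structural classes.
SCOP_CLASSES = {
    "a": "All alpha",
    "b": "All beta",
    "c": "Alpha/beta (a/b)",
    "d": "Alpha+beta (a+b)",
}

LEVELS = ("class", "fold", "superfamily", "family")


def sccs_prefix(sccs, level):
    """Truncate a four-part classification string to the given level."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError("expected class.fold.superfamily.family: " + sccs)
    depth = LEVELS.index(level) + 1
    return ".".join(parts[:depth])


def group_by_level(domains, level):
    """Group a {domain_name: sccs} mapping by the chosen level."""
    groups = defaultdict(list)
    for name, sccs in domains.items():
        groups[sccs_prefix(sccs, level)].append(name)
    return dict(groups)
```

With `domains = {"dom1": "c.2.1.1", "dom2": "c.2.1.2", "dom3": "c.1.8.1"}`, grouping at the superfamily level puts dom1 and dom2 together, while grouping at the class level puts all three in class c.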

HMMs are useful at SCOP

For instance, SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) HMMs are derived from the PDB databank at www.rcsb.org

Identify sequence signatures for specific domains
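The idea of a sequence signature can be sketched with a position-specific log-odds profile. This is a deliberately simplified stand-in for a profile HMM: it has match positions only, with no insert or delete states, and it assumes an ungapped alignment and a uniform background. The motif sequences in the usage note are a toy example, not taken from SCOP.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length, ungapped
    aligned sequences, against a uniform background."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(profile, seq):
    """Slide the profile along seq; return (start, score) of the best
    window, or None if seq is shorter than the profile."""
    width = len(profile)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Residues outside the 20 standard amino acids score 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best
```

For instance, `best_hit(build_profile(["GKST", "GKSS", "GKTT"]), "MAAGKSTLLV")` locates the window starting at index 3. A real profile HMM additionally models insertions and deletions relative to the domain family.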

Modeling protein structure based on homology

SWISS-MODEL – http://swissmodel.expasy.org/

Using first approach mode, submit test sequence, and use your email

PSI-Blast identifies the most similar sequence with a protein structure, and SWISS-MODEL wraps your input sequence around it

Note you can also specify which structure you would like your sequence to wrap around

Ab initio predictions

Protein folding is a complex problem

Ab initio attempts

Based on Ramachandran plot probabilities

Measure interatomic interactions – has worked for small proteins (<85 aa), which appear to favor H-bonds and van der Waals interactions and ignore electrostatic interactions
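Ramachandran plots are built from the backbone torsion angles phi and psi, each of which is a dihedral angle over four consecutive backbone atoms. The sketch below computes a dihedral angle from four points with the standard atan2 formula; the region boundaries in `rough_region` are coarse, illustrative assumptions and not a published Ramachandran definition.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180 to 180) defined by four points.

    phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def rough_region(phi, psi):
    """Very coarse Ramachandran region label (illustrative boundaries)."""
    if -160 <= phi <= -20 and -100 <= psi <= 20:
        return "alpha-like"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta-like"
    return "other"
```

An ab initio method of the kind described above would go further and weight candidate (phi, psi) pairs by how often they are observed, rather than assigning hard region labels.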
