Protein Structure Databases, cont. 11/09/05
D Dobbs ISU - BCB 444/544X 1
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 1
11/9/05
Protein Structure Databases (continued)
Prediction & Modeling
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 2
Bioinformatics Seminars
Nov 10 Thurs 3:40 Com S Seminar in 223 Atanasoff
  Computational Epidemiology
  Armin R. Mikler, Univ. North Texas
  http://www.cs.iastate.edu/~colloq/#t3
Nov 10 Thurs 4:10 EEOB Seminar in 210 Bessey
  Diversity and Evolution of Plant Immunity Genes: Insights from Molecular Population Genetics
  Peter Tiffin, Univ. of Minnesota
  http://www.cbs.umn.edu/tiffin/index.html
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 3
Bioinformatics Seminars - CORRECTION:
Next week - Baker Center/BCB Seminars: (seminar abstracts available at above link)
Nov 14 Mon 1:10 PM Doug Brutlag, Stanford
  Discovering transcription factor binding sites
Nov 15 Tues 1:10 PM Ilya Vakser, Univ Kansas
  Modeling protein-protein interactions
Both seminars will be in Howe Hall Auditorium
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 4
Protein Structure & Function: Analysis & Prediction
Mon: Protein structure: basics; classification, databases, visualization
Wed: Protein structure databases - cont.
Thurs Lab: Protein structure databases; Protein structure analysis & prediction
Fri: Protein structure prediction; Protein-nucleic acid interactions
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 5
Reading Assignment (for Mon-Fri)
Mount Bioinformatics
• Chp 10 Protein classification & structure prediction
  http://www.bioinformaticsonline.org/ch/ch10/index.html
• pp. 409-491
• Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html
Additional reading assignments for BCB 544:
• Gene Prediction: Burge & Karlin 1997 JMB 268:78
  Prediction of complete gene structures in human genomic DNA
• Structure Prediction: Schueler-Furman…Baker, Science 310:638
  Progress in modeling of protein structures and interactions
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 6
Review last lecture:
Protein Structure: Basics
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 7
Protein Structure & Function
• Amino acid characteristics
• Structural classes & motifs
• Protein functions & functional families (not much - more on this later)
• Classification
• Databases
• Visualization
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 8
Amino Acids
Each of the 20 different amino acids has a different "R-group" (side chain) attached to Cα
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 9
Peptide bond is rigid and planar
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 10
Hydrophobic Amino Acids
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 11
Charged Amino Acids
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 12
Polar Amino Acids
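The three groupings on the preceding slides can be written down as a small lookup table. This is only a sketch: the slides show the groups as figures, so the membership below is an assumption based on one common textbook grouping, and borderline residues (Gly, Cys, Tyr, His, Trp) are placed differently by different sources. The names `SIDE_CHAIN_CLASS`, `classify`, and `composition` are illustrative.

```python
# One common grouping of the 20 standard amino acids by side-chain character.
# Assumption: membership is illustrative; textbooks disagree on G, C, Y, H, W.
SIDE_CHAIN_CLASS = {
    "hydrophobic": set("AVLIMFWPG"),
    "charged": set("DEKR"),
    "polar": set("STNQCYH"),
}


def classify(residue):
    """Return the class name for a one-letter amino acid code."""
    for name, members in SIDE_CHAIN_CLASS.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"not a standard amino acid: {residue!r}")


def composition(sequence):
    """Count how many residues of each class occur in a sequence."""
    counts = {name: 0 for name in SIDE_CHAIN_CLASS}
    for residue in sequence:
        counts[classify(residue)] += 1
    return counts
```

For example, `composition("ADKS")` counts one hydrophobic, two charged, and one polar residue under this grouping.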
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 13
Certain side-chain configurations are energetically favored (rotamers)
Ramachandran plot: "Allowable" psi & phi angles
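The phi and psi angles on a Ramachandran plot are backbone dihedral angles: phi is measured over C(i-1), N(i), Cα(i), C(i), and psi over N(i), Cα(i), C(i), N(i+1). A minimal sketch of how a dihedral could be computed from four atom positions follows; the function names are illustrative, and the coordinates in the usage note are made up for checking the geometry, not taken from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project the two outer bonds onto the plane perpendicular to the
    # central bond, then measure the angle between the projections.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Backbone phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Backbone psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

With test points `(1,0,0), (0,0,0), (0,0,1)` and a fourth point at `(1,0,1)` the four atoms are cis (0 degrees); moving the fourth point to `(-1,0,1)` gives the trans arrangement (180 degrees in magnitude).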
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 14
Glycine is the smallest amino acid: R group = H atom
• Glycine residues increase backbone flexibility because their R group is only a hydrogen atom (no bulky side chain)
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 15
Proline is cyclic
• Proline residues reduce flexibility of polypeptide chain
• Proline cis-trans isomerization is often a rate-limiting step in protein folding
• Recent work suggests it may also regulate ligand binding in native proteins - Andreotti
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 16
Cysteines can form disulfide bonds
• Disulfide bonds (covalent) stabilize 3-D structures
• In eukaryotes, disulfide bonds are found mainly in secreted proteins or extracellular domains
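One way to spot likely disulfide bonds in a solved structure is to look for pairs of cysteine SG (sulfur) atoms that sit within covalent bonding distance. The sketch below assumes a cutoff of 2.5 Å (an S-S bond is about 2.05 Å long); the cutoff, the function name, and the input layout are all illustrative choices rather than a standard procedure.

```python
import math
from itertools import combinations

# Assumed cutoff in angstroms: slightly longer than a ~2.05 A S-S bond.
SS_CUTOFF = 2.5


def find_disulfides(sg_atoms, cutoff=SS_CUTOFF):
    """Return residue-number pairs whose SG atoms are within the cutoff.

    sg_atoms maps residue number -> (x, y, z) of that cysteine's SG atom.
    """
    pairs = []
    for (i, a), (j, b) in combinations(sorted(sg_atoms.items()), 2):
        if math.dist(a, b) <= cutoff:
            pairs.append((i, j))
    return pairs
```

Given hypothetical coordinates such as `{6: (0, 0, 0), 30: (10, 10, 10), 127: (2.05, 0, 0)}`, only residues 6 and 127 are close enough to be reported as a pair.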
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 17
Globular proteins have a compact hydrophobic core
Packing of hydrophobic side chains into interior is main driving force for folding
Problem? Polypeptide backbone is highly polar (hydrophilic) due to polar -NH and C=O in each peptide unit; these polar groups must be neutralized
Solution? Form regular secondary structures, e.g., α-helix, β-sheet, stabilized by H-bonds
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 18
Exterior surface of globular proteins is generally hydrophilic
Hydrophobic core formed by packed secondary structural elements provides compact, stable core
"Functional groups" of protein are attached to this framework; exterior has more flexible regions (loops) and polar/charged residues
Hydrophobic "patches" on protein surface are often involved in protein-protein interactions
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 19
Protein Secondary Structures
• α-Helices
• β-Sheets
• Loops
• Coils
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 20
α-helix: stabilized by H-bonds between every ~4th residue in backbone
Atom colors: C = black, O = red, N = blue
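The "every ~4th residue" pattern can be made concrete: in an ideal α-helix the backbone C=O of residue i accepts an H-bond from the backbone N-H of residue i+4. A short sketch that lists those pairs for a helix of given extent (the function name and the idealized, no-irregularities assumption are illustrative):

```python
def helix_hbond_pairs(start, end):
    """List (acceptor, donor) residue pairs for an ideal alpha-helix.

    The helix spans residues start..end inclusive; the C=O of residue i
    pairs with the N-H of residue i + 4.
    """
    return [(i, i + 4) for i in range(start, end - 3)]
```

A helix covering residues 1 to 10 would have six such pairs, from (1, 5) through (6, 10); the first four N-H groups and the last four C=O groups have no partner within the helix.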
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 21
Certain amino acids are "preferred" & others are rare in α-helices
• Ala, Glu, Leu, Met = good helix formers
• Pro, Gly, Tyr, Ser = very poor
• Amino acid composition & distribution varies, depending on location of helix in 3-D structure
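The helix preferences listed above suggest a very crude scoring scheme: slide a window along the sequence and count good formers against poor ones. The sketch below uses only the two residue lists from the slide; the +1/-1 weights, the default window width, and the function names are assumptions made for illustration, not a published prediction method.

```python
GOOD_FORMERS = set("AELM")  # Ala, Glu, Leu, Met (from the slide)
POOR_FORMERS = set("PGYS")  # Pro, Gly, Tyr, Ser (from the slide)


def helix_score(window):
    """+1 per good former, -1 per poor former, 0 otherwise (assumed weights)."""
    return sum((r in GOOD_FORMERS) - (r in POOR_FORMERS) for r in window.upper())


def sliding_scores(sequence, width=6):
    """Score every window of the given width along the sequence."""
    return [helix_score(sequence[i:i + width])
            for i in range(len(sequence) - width + 1)]
```

Stretches with consistently high scores would be candidate helical regions under this toy model; real secondary structure predictors use far richer statistics.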
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 22
β-sheets - also stabilized by H-bonds between backbone atoms
Anti-parallel Parallel
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 23
Loops
• Connect helices and sheets
• Vary in length and 3-D configurations
• Are located on surface of structure
• Are more "tolerant" of mutations
• Are more flexible and can adopt multiple conformations
• Tend to have charged and polar amino acids
• Are frequently components of active sites
• Some fall into distinct structural families (e.g., hairpin loops, reverse turns)
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 24
Coils
• Regions of 2° structure that are not helices, sheets, or recognizable turns
• Intrinsically disordered regions appear to play important functional roles
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 25
Globular proteins are built from recurring structural patterns
Motifs or supersecondary structures = combinations of 2° structural elements
Domains = combinations of motifs
• Independently folding unit (foldon)
• Functional unit
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 26
Simple motifs combine to form domains
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 27
6 main classes of protein structure
1) α Domains
• Bundles of helices connected by loops
2) β Domains
• Mainly antiparallel sheets, usually with 2 sheets forming a sandwich
3) α/β Domains
• Mainly parallel sheets with intervening helices, also mixed sheets
4) α+β Domains
• Mainly segregated helices and sheets
5) Multidomain (α & β)
• Containing domains from more than one class
6) Membrane & cell-surface proteins
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 28
α-domain structures: 4-helix bundles
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 29
β-sheets: up-and-down sheets & barrels
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 30
α/β-domains: leucine-rich motifs can form horseshoes
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 31
New today:
Protein Structure
• Databases
• Classification
• Visualization
Protein Structure Prediction
• Secondary structure
• Tertiary structure
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 32
Protein sequence databases
• UniProt (SwissProt, PIR, EBI)
  http://www.pir.uniprot.org
• NCBI Protein
  http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
More on these later: protein function prediction
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 33
Protein sequence & structure: analysis
• Diamond STING Millennium - many useful structure analysis tools, including Protein Dossier
  http://trantor.bioc.columbia.edu/SMS/
• SwissProt (UniProt) protein knowledgebase
  http://us.expasy.org/sprot
• InterPro sequence analysis tools
  http://www.ebi.ac.uk/interpro
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 34
Protein structure databases
• PDB Protein Data Bank (RCSB) - THE protein structure database
  http://www.rcsb.org/pdb/
• MMDB Molecular Modeling Database (NCBI Entrez) - has "added" value
  http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
• MSD Molecular Structure Database - especially good for interactions, binding sites
  http://www.ebi.ac.uk/msd
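Structures from the PDB have traditionally been distributed as fixed-column text files in which each atom is an `ATOM` record. As a rough illustration of what working with such a record looks like, the sketch below builds and parses one line using the column positions of the legacy PDB format (serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54). The helper names are hypothetical and the coordinates in the usage note are made up; real files carry more fields and record types than this handles.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z,
                occupancy=1.0, b_factor=0.0):
    """Build columns 1-66 of a legacy PDB ATOM record."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}")


def parse_atom(line):
    """Extract a few fields from an ATOM record by column position."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }
```

For instance, `parse_atom(format_atom(1, " CA ", "ALA", "A", 12, 11.104, 6.134, -6.504))` recovers the atom name `CA`, residue `ALA` 12 of chain `A`, and the three coordinates. Slicing by column rather than splitting on whitespace matters because adjacent fields in real files can run together.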
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 35
Protein structure classification
• SCOP = Structural Classification of Proteins
  Levels reflect both evolutionary and structural relationships
  http://scop.mrc-lmb.cam.ac.uk/scop
• CATH = Classification by Class, Architecture, Topology & Homology
  http://cathwww.biochem.ucl.ac.uk/latest/
• DALI/FSSP (recently moved to EBI & reorganized)
  • fully automated structure alignments
  • DALI server http://www.ebi.ac.uk/dali/index.html
  • DALI Database (fold classification) http://ekhidna.biocenter.helsinki.fi/dali/start
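Hierarchical classifications such as CATH lend themselves to dotted numeric codes, one number per level (Class.Architecture.Topology.Homology). The sketch below shows one way such a code could be split into its levels and two domains compared; the function names are hypothetical, and the example codes in the usage note are used only to show the format.

```python
CATH_LEVELS = ("class", "architecture", "topology", "homology")


def parse_cath(code):
    """Split a dotted four-level code such as '3.40.50.720' into named levels."""
    parts = [int(p) for p in code.split(".")]
    if len(parts) != len(CATH_LEVELS):
        raise ValueError(f"expected {len(CATH_LEVELS)} levels, got {code!r}")
    return dict(zip(CATH_LEVELS, parts))


def deepest_shared_level(code_a, code_b):
    """Name of the deepest level at which two codes still agree, or None."""
    shared = None
    a, b = parse_cath(code_a), parse_cath(code_b)
    for level in CATH_LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared
```

Two domains coded `3.40.50.720` and `3.40.50.300` would agree down to the topology level but belong to different homologous superfamilies, while `3.40.50.720` and `1.10.8.10` would share nothing, not even the class.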
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 36
Protein structure visualization
• Molecular Visualization Freeware:
  http://www.umass.edu/microbio/rasmol
• MolviZ.Org
  http://www.umass.edu/microbio/chime
• Protein Explorer
  http://www.umass.edu/microbio/chime/pe/protexpl/frntdoor.htm
• RASMOL (& many descendants: Protein Explorer, PyMol, MolMol, etc.)
  http://www.umass.edu/microbio/rasmol/index2.htm
• CHIME
  http://www.umass.edu/microbio/chime/getchime.htm
• Cn3D
  http://www.biosino.org/mirror/www.ncbi.nlm.nih.gov/Structure/cn3d/
• Deep View = Swiss-PDB Viewer
  http://www.expasy.org/spdbv
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 37
PDB (RCSB)
http://www.rcsb.org/pdb
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 38
RCSB PDB - Beta site
http://pdbbeta.rcsb.org/pdb/Welcome.do
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 39
RCSB PDB - New Tutorial
http://core1.rcsb.org/tutorial
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 40
NCBI Structure
http://www.ncbi.nlm.nih.gov/Structure
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 41
MMDB
http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 42
Cn3D
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 43
MMDB: Molecular Modeling DataBase
Derived from PDB structure records
Value added to PDB records, including:
• Integration with other ENTREZ databases & tools
• Conversion to parseable ASN.1 data description language
• Correction of numbering discrepancies in structure vs sequence
• Validation
• Addition of explicit chemical graph information
Structure neighbors determined by Vector Alignment Search Tool (VAST)
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 44
Searching MMDB
1CET
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 45
MMDB Structure Summary
Cn3D viewer
VAST neighbors
BLAST neighbors
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 46
Cn3D: Displaying 2° Structures
Chloroquine
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 47
Cn3D: Displaying 3° Structures
Chloroquine
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 48
Cn3D: Structural Alignments
Chloroquine
NADH
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 49
Protein Explorer (RasMol/Chime)
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 50
Protein Explorer
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 51
SCOP - Structure Classification
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 52
CATH - Structure Classification
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 53
Structural Genomics
~ 30,000 "traditional" genes in human genome (not counting: ???)
~ 3,000 proteins in a typical cell
> 2 million sequences in UniProt
> 33,000 protein structures in the PDB
Experimental determination of protein structure lags far behind sequence determination!
Goal: Determine structures of "all" protein folds in nature, using a combination of experimental structure determination methods (X-ray crystallography, NMR, mass spectrometry) & structure prediction
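The size of the sequence-structure gap follows directly from the two counts on this slide. A quick calculation, treating the slide's lower bounds as if they were exact figures (an assumption made only for the arithmetic):

```python
# Lower bounds quoted on the slide (2005); treated as exact for illustration.
SEQUENCES_IN_UNIPROT = 2_000_000
STRUCTURES_IN_PDB = 33_000


def structure_coverage(structures, sequences):
    """Fraction of known sequences that have an experimental structure."""
    return structures / sequences


coverage = structure_coverage(STRUCTURES_IN_PDB, SEQUENCES_IN_UNIPROT)
print(f"{coverage:.2%} of sequences have a structure")
print(f"about 1 structure per {round(1 / coverage)} sequences")
```

With these figures, fewer than 2% of known sequences have a solved structure, which is the motivation for both structural genomics and structure prediction.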
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 54
Structural Genomics Projects
TargetDB: database of structural genomics targets
http://targetdb.pdb.org
Protein Structure Prediction?
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 55
Protein Folding
"Major unsolved problem in molecular biology"
In cells:
• spontaneous
• assisted by enzymes
• assisted by chaperones
In vitro: many proteins fold spontaneously & many do not!
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 56
Steps in Protein Folding
1- "Collapse" - driving force is burial of hydrophobic aa's (fast - msecs)
2- Molten globule - helices & sheets form, but "loose" (slow - secs)
3- "Final" native folded state - compaction, some 2° structures rearranged
Native state? - assumed to be lowest free energy - may be an ensemble of structures
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 57
Protein Dynamics
• Protein in native state is NOT static
• Function of many proteins depends on conformational changes, sometimes large, sometimes small
• Globular proteins are inherently "unstable" (NOT evolved for maximum stability)
• Energy difference between native and denatured state is very small (5-15 kcal/mol) (this is roughly equivalent to only a few H-bonds!)
• Folding involves changes in both entropy & enthalpy
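The last two bullets can be tied together with the standard relation ΔG = ΔH - TΔS. The sketch below also converts a stability into a folded fraction, assuming a simple two-state (native vs. denatured) equilibrium; that two-state model, the temperature, and the function names are assumptions added for illustration and are not from the slides.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g(delta_h, delta_s, temp_k=298.15):
    """Free energy change dG = dH - T*dS (kcal/mol; dS in kcal/(mol*K))."""
    return delta_h - temp_k * delta_s


def fraction_folded(dg_unfold, temp_k=298.15):
    """Native fraction in a two-state model, given dG of unfolding (kcal/mol)."""
    k_unfold = math.exp(-dg_unfold / (R_KCAL * temp_k))
    return 1.0 / (1.0 + k_unfold)
```

Under this model a protein with zero net stability would be half folded, while even the low end of the 5-15 kcal/mol range keeps well over 99.9% of molecules native at room temperature. The point of the slide is that this stability is a small difference between large, opposing enthalpy and entropy terms.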
11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 58
Protein Structure Prediction
• Structure is largely determined by sequence
BUT:
• Similar sequences can assume different structures
• Dissimilar sequences can assume similar structures
• Many proteins are multi-functional
• Protein folding:
  • determination of folding pathways
  • prediction of tertiary structure
  are still largely unsolved problems