Introduction to Structural Bioinformatics
Dong Xu
Computer Science Department
271C Life Sciences Center
1201 East Rollins Road
University of Missouri-Columbia
Columbia, MO 65211-2060
E-mail: [email protected]
573-882-7064 (O)
http://digbio.missouri.edu
Structural Bioinformatics
Prediction and modeling
Protein structure
DNA structure
RNA structure
Membrane structures
Large-complex structure
An Overview
o A protein folds into a unique 3D structure under physiological conditions
Lysozyme sequence (129 amino acids):
KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS
TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS
DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
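The sketch below checks the "129 amino acids" figure. It removes the spacing from the sequence and then counts the residues. This is a minimal Python illustration, not part of the original slides. The helper name `clean_sequence` is hypothetical. Rejecting any letter outside the 20 standard one-letter codes is also an assumption made for this example.

```python
from collections import Counter

# Lysozyme sequence as printed above; the spaces only group residues in tens
LYSOZYME = """
KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS
TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS
DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
"""

# The 20 standard one-letter amino-acid codes
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def clean_sequence(text: str) -> str:
    """Hypothetical helper: drop whitespace, upper-case, reject non-standard codes."""
    seq = "".join(text.split()).upper()
    unknown = set(seq) - STANDARD_AA
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return seq

seq = clean_sequence(LYSOZYME)
composition = Counter(seq)   # residue type -> count
print(len(seq))              # 129
for residue, count in composition.most_common(3):
    print(residue, count)
```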
Figure: protein backbone and side chain
Protein Structure Representations
Lysozyme structure shown in three representations: ball & stick, strand, and surface
[ PDB: http://www.pdb.org ]
Growth of Protein Data Bank (PDB)
Protein Structure Database: PDB (1)
o PDB (Protein Data Bank) Web site: http://www.rcsb.org/pdb/
o 33,252 structures as of 25-Oct-2005
o PDB ID: 4-character identifier (e.g., 1cau, 1gox, and 256b)
o Search methods:
* search by PDB ID (e.g., 1lyz);
* SearchLite: protein name, author's name, etc. (e.g., HIV protease);
* SearchFields: EC Number, the name of the binding ligand (e.g., inhibitor), the range of the protein size, and the secondary structure content.
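A program that searches by PDB ID usually validates and normalizes the 4-character identifier first. The Python sketch below does this under one stated assumption. It treats a classic PDB ID as a digit 1-9 followed by three letters or digits, with case ignored. This covers the examples above (1cau, 1gox, 256b, 1lyz). Newer extended-format IDs are deliberately not handled. The function names are hypothetical.

```python
import re

# Assumed classic format: a digit 1-9, then three letters or digits.
CLASSIC_PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")

def is_classic_pdb_id(candidate: str) -> bool:
    """True if the whole string matches the assumed classic 4-character format."""
    return CLASSIC_PDB_ID.fullmatch(candidate) is not None

def normalize_pdb_id(candidate: str) -> str:
    """Trim, validate, and lower-case an ID so that '1GOX' and '1gox' compare equal."""
    cleaned = candidate.strip()
    if not is_classic_pdb_id(cleaned):
        raise ValueError(f"not a classic PDB ID: {candidate!r}")
    return cleaned.lower()

for pdb_id in ["1cau", "1GOX", "256b", "1lyz"]:
    print(pdb_id, "->", normalize_pdb_id(pdb_id))
```

The code lower-cases IDs because an ID is case-insensitive, so "1GOX" and "1gox" name the same entry.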
Protein Structure Database: PDB (2)
PDB format (headers + coordinates):
HEADER OXIDOREDUCTASE (OXYGEN(A)) 14-JUN-89 1GOX 1GOX
COMPND GLYCOLATE OXIDASE (E.C.1.1.3.1) 1GOX
...
ATOM 232 N ALA 29 54.035 64.332 19.352 1.00 23.93 1GOX
ATOM 233 CA ALA 29 52.992 65.356 19.569 1.00 24.74 1GOX
ATOM 234 C ALA 29 53.519 66.762 19.309 1.00 25.43 1GOX
ATOM 235 O ALA 29 54.648 67.179 19.655 1.00 25.66 1GOX
ATOM 236 CB ALA 29 52.433 65.340 20.993 1.00 24.54 1GOX
...
HETATM 3165 O HOH 658 62.480 62.480 0.000 0.50 65.79 1GOX
...
END
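In a real PDB file, each ATOM/HETATM record is a fixed-column line, although the listing above has lost its original spacing. The Python sketch below slices the coordinate records by column position, following the PDB format's ATOM record layout. Because of the lost spacing, it would not parse the 1GOX lines exactly as printed here. Assumptions: the names `Atom`, `parse_atom_line`, `parse_coordinates`, and `format_atom_line` are all hypothetical. The alternate-location, insertion-code, element, and charge fields are ignored. `format_atom_line` exists only to build correctly aligned test lines.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Atom:
    record: str       # "ATOM" or "HETATM"
    serial: int       # atom serial number
    name: str         # atom name, e.g. "CA"
    res_name: str     # residue name, e.g. "ALA" or "HOH"
    chain: str        # chain identifier ("" if blank)
    res_seq: int      # residue sequence number
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float   # temperature factor

def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM/HETATM record by fixed columns (1-based ranges in comments)."""
    return Atom(
        record=line[0:6].strip(),       # cols 1-6
        serial=int(line[6:11]),         # cols 7-11
        name=line[12:16].strip(),       # cols 13-16
        res_name=line[17:20].strip(),   # cols 18-20
        chain=line[21].strip(),         # col 22
        res_seq=int(line[22:26]),       # cols 23-26
        x=float(line[30:38]),           # cols 31-38
        y=float(line[38:46]),           # cols 39-46
        z=float(line[46:54]),           # cols 47-54
        occupancy=float(line[54:60]),   # cols 55-60
        b_factor=float(line[60:66]),    # cols 61-66
    )

def parse_coordinates(pdb_text: str) -> List[Atom]:
    """Keep only coordinate records; HEADER, COMPND, END, etc. are skipped."""
    return [parse_atom_line(line) for line in pdb_text.splitlines()
            if line.startswith(("ATOM  ", "HETATM"))]

def format_atom_line(record, serial, name, res_name, chain, res_seq, x, y, z, occ, b):
    """Hypothetical writer for the same column layout (used to build test input)."""
    return (f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain:1}{res_seq:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}{occ:>6.2f}{b:>6.2f}")

pdb_text = "\n".join([
    "HEADER    OXIDOREDUCTASE",
    format_atom_line("ATOM", 233, " CA", "ALA", "", 29, 52.992, 65.356, 19.569, 1.00, 24.74),
    format_atom_line("HETATM", 3165, " O", "HOH", "", 658, 62.480, 62.480, 0.000, 0.50, 65.79),
    "END",
])
atoms = parse_coordinates(pdb_text)
for atom in atoms:
    print(atom.record, atom.name, atom.res_name, atom.res_seq, (atom.x, atom.y, atom.z))
```

For real structure files, a maintained parser such as Biopython's `Bio.PDB` module is the safer choice. This sketch only shows how the record layout works.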
Molecular Visualization
RasMol: http://www.umass.edu/microbio/rasmol/index2.htm
VMD: http://www.ks.uiuc.edu/Research/vmd
Relevance of Protein Structure in the Post-Genome Era
sequence → structure → function → medicine
Structure-Function Relationship
Some functional information can be inferred without a structure, but a structure is key to understanding the detailed mechanism.
A predicted structure is a powerful tool for function inference.
Example: Trp repressor as a functional switch
Structure-Based Drug Design
HIV protease inhibitor
Structure-based rational drug design is still a major method for drug discovery.
Structures in Proteins
Language: Letters → Words → Sentences
Protein: Residues → Secondary Structure → Tertiary Structure
Primary, Secondary and Tertiary Structures of Proteins
α-helix
o Single protein chain (local on the sequence)
o Shape maintained by intramolecular H bonding between backbone -C=O and H-N- groups
β-sheet (parallel, anti-parallel)
o Formed by several strands, which may come from one or more protein chains
o Shape maintained by H bonding between strands
o Non-local on protein sequence
Dihedral angles
Ramachandran plot (alpha)
Ramachandran plot (beta)
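A Ramachandran plot shows the backbone dihedral angles φ and ψ. Each angle is the torsion defined by four consecutive backbone atoms: φ by C(i-1), N(i), CA(i), C(i), and ψ by N(i), CA(i), C(i), N(i+1). The pure-Python sketch below computes a signed dihedral angle from four points. α-helical residues cluster roughly around (φ, ψ) ≈ (-60°, -45°) and β-strand residues roughly around (-120°, +130°). The `rough_region` boundaries below are an illustrative assumption chosen for this example, not a standard definition. All function names are hypothetical.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    length = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def phi_angle(c_prev, n, ca, c):
    """phi of residue i from C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)

def psi_angle(n, ca, c, n_next):
    """psi of residue i from N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)

def rough_region(phi_deg, psi_deg):
    """Very coarse, illustrative split of the Ramachandran plot (assumed boundaries)."""
    if phi_deg < 0 and -100 <= psi_deg <= 50:
        return "alpha"
    if phi_deg < 0 and (psi_deg >= 90 or psi_deg <= -150):
        return "beta"
    return "other"

print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))  # 90.0
print(rough_region(-57, -47), rough_region(-120, 130))                  # alpha beta
```

For φ, pass real backbone coordinates from a parsed structure to `phi_angle`; use `psi_angle` the same way for ψ.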
Protein Structure Domain (1)
o Structure domain: compact, globular unit
Examples: glycoprotein, actin
Protein Structure Domain (2)
o A structure domain is an evolutionary, functional, and folding unit of a protein
o Domain insertion, e.g., DsbA (disulfide bond forming protein):
* parent: thioredoxin (disulfide oxidoreductase)
* insert: zinc metalloproteinase
o Protein design (growth hormone)
o Threading target
Structure Is Better Conserved during Evolution
Structure can tolerate a wide range of mutations.
Physical forces favor certain structures.
Concept of fold.
The number of folds is limited: currently ~800 are known, out of an estimated total in the 1,000s to ~10,000s.
Example fold: TIM barrel
The number of different protein folds is limited
Figure: PDB submissions per year vs. year, divided into already known folds and new folds
Protein Folding Problem
A protein folds into a unique 3D structure under physiological conditions
Lysozyme sequence:
KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS
TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS
DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
Web Addresses
Resource: http://digbio.missouri.edu/resource/
Further reading (a review on protein modeling):
www.bentham.org/cpps1-1/Dong%20Xu/xu_cpps.htm