Proper structural fold of protein molecule is essential to execute its precise functional
mission
Md Abu Reza, PhD
Date : 24th March, 2012
Venue : Dept of Statistics, RU
Associate Professor, Dept of Genetic Eng & Biotech
University of Rajshahi
Bioinformatics Workshop-1
Higher Education Quality Enhance Project
Molecular Organization of a cell
Proteins control all biological systems in a cell
They either act in constituting structure or perform distinct biological function in any physiological system
Some proteins perform their functions independently, but the vast majority interact with other proteins for proper biological activity
To perform its function effectively, a protein needs a proper structure. Without a proper structure a protein is useless or may even cause a malfunction in the system
Conformation and functional-group chemistry control function
Made up of 20 different types of amino-acid monomers
Proteins define what an organism is, what it looks like, how it behaves, etc. (responsible for most phenotype)
Protein – The Master Molecule
Protein Function
Function of Proteins
Protein Function is Related to Structure
What are Proteins?
Proteins are biochemical compounds consisting of one or more polypeptides typically folded into a globular or fibrous form in a biologically functional way.
A polypeptide is a single linear polymer chain of amino acids bonded together by peptide bonds
The 20 natural amino acids join in different permutations and combinations, and in chains of different lengths
Once linked in the protein chain, an individual amino acid is called a residue, and the linked series of carbon, nitrogen, and oxygen atoms is known as the main chain or protein backbone
Lysine with the carbon atoms in the side-chain labeled
Amino Terminal
Carboxy Terminal
Amino Acids
How peptide bonds are formed?
•Here both amino acids are alanine, whose R group is a methyl group (an R group of a single hydrogen would be glycine).
•The carboxylic acid end of the first amino acid is oriented toward the amino group of the second amino acid.
•The -OH group and -H are removed to form water (condensation reaction).
•The bond forms between the carboxyl carbon of the first amino acid and the nitrogen of the second amino acid.
•The backbone of the molecule has the sequence N-C-C-N-C-C
•Polypeptides maintain this sequence no matter how long the chain.
•The R groups project from the backbone.
•As the amino acids are added in translation the polypeptide folds up into its specific shape.
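The bullet points above can be sketched in a few lines of Python. This is only an illustration: the function names `backbone` and `waters_released` are hypothetical, and the model ignores side chains and terminal groups.

```python
def backbone(n_residues: int) -> str:
    """Return the repeating N-C-C backbone pattern for a chain of n_residues."""
    if n_residues < 1:
        raise ValueError("need at least one residue")
    # Every residue contributes one N-C-C unit, whatever the chain length.
    return "-".join(["N-C-C"] * n_residues)


def waters_released(n_residues: int) -> int:
    """Each peptide bond is a condensation that releases one water molecule."""
    # n residues are joined by n - 1 peptide bonds.
    return max(n_residues - 1, 0)


print(backbone(2))         # the dipeptide backbone from the slide
print(waters_released(2))  # one peptide bond, one water
```

For a dipeptide this reproduces the N-C-C-N-C-C pattern quoted on the slide and a single released water molecule.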
Colour codes used for atoms
Element          Colour
Carbon           light grey
Oxygen           red
Hydrogen         white
Nitrogen         light blue
Sulfur           yellow
Phosphorus       orange
Chlorine         green
Bromine, Zinc    brown
Sodium           blue
Iron             orange
Magnesium        dark green
Calcium          dark grey
Unknown          deep pink
Stereochemistry
The CORN Law
Structure of the 20 naturally occurring Amino Acids
The 20 amino acids can be divided into several groups based on their properties. Important factors are charge, hydrophilicity or hydrophobicity, size, and functional groups. Water-soluble proteins tend to have their hydrophobic residues (Leu, Ile, Val, Phe, and Trp) buried in the middle of the protein, whereas hydrophilic side-chains are exposed to the aqueous solvent.
Livingstone & Barton, CABIOS, 9, 745-756, 1993
Amino Acid Properties
Group I: Nonpolar amino acids
Group I amino acids are alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, and tryptophan. The R groups of these amino acids have either aliphatic or aromatic groups. This makes them hydrophobic (“water fearing”). In aqueous solutions, globular proteins will fold into a three-dimensional shape to bury these hydrophobic side chains in the protein interior.
Group II: Polar, uncharged amino acids
Group II amino acids are glycine, serine, cysteine, threonine, tyrosine, asparagine, and glutamine. The side chains in this group possess a spectrum of functional groups. However, most have at least one atom (nitrogen, oxygen, or sulfur) with electron pairs available for hydrogen bonding to water and other molecules. Polar amino acids are hydrophilic.
Group III: Acidic amino acids
The two amino acids in this group are aspartic acid and glutamic acid. Each has a carboxylic acid on its side chain that gives it acidic (proton-donating) properties. In an aqueous solution at physiological pH, all three functional groups on these amino acids will ionize, thus giving an overall charge of −1. In the ionic forms, the amino acids are called aspartate and glutamate.
Group IV: Basic amino acids
The three amino acids in this group are arginine, histidine, and lysine. Each side chain is basic (i.e., can accept a proton). Lysine and arginine both exist with an overall charge of +1 at physiological pH. The guanidino group in arginine’s side chain is the most basic of all R groups (a fact reflected in its pKa value of 12.5). As mentioned above for aspartate and glutamate, the side chains of arginine and lysine also form ionic bonds. The chemical structures of Group IV amino acids are shown in the accompanying figure.
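The four groups listed on these slides can be written down as a small lookup table. The sketch below uses the standard one-letter codes and follows the grouping exactly as given here (other textbooks place glycine differently); the names `GROUPS`, `classify` and `group_counts` are hypothetical.

```python
from collections import Counter

# Grouping as given on the slides, using one-letter amino-acid codes.
GROUPS = {
    "nonpolar": set("AVLIPFMW"),        # Group I
    "polar uncharged": set("GSCTYNQ"),  # Group II
    "acidic": set("DE"),                # Group III
    "basic": set("RHK"),                # Group IV
}


def classify(residue: str) -> str:
    """Return the slide's group name for a one-letter residue code."""
    for name, members in GROUPS.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"not one of the 20 standard residues: {residue!r}")


def group_counts(sequence: str) -> Counter:
    """Count how many residues of a sequence fall into each group."""
    return Counter(classify(res) for res in sequence)


print(group_counts("KVFG"))
```

A tally like this gives a first, rough impression of how hydrophobic or charged a sequence is, which is the property the slides link to folding.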
Why Proteins Need Structure!
Functions
Diverse functions related to structure
• Structural components of cells
• Motor proteins
• Enzymes
• Antibodies
• Hormones
• Hemoglobin/myoglobin
• Transport proteins in blood
Protein structure - bonding
Interactions (forces) governing protein structure
Covalent interaction
• Peptide bond
• Disulfide bond
Non-covalent interaction
• Hydrogen bond
• Ionic bond (electrostatic interactions)
• Salt bridge
• Van der Waals interactions
• Hydrophobic force
Disulfide bond
Covalent bond between sulfur atoms on two cysteine amino acids
Very strong interaction
From: Elliott, WH. Elliott, DC. (1997) Biochemistry and Molecular Biology. Oxford: Oxford University Press. p32
Levels of Protein Structure
Hierarchical nature of protein structure
Primary structure (amino acid sequence)
↓
Secondary structure (α-helix, β-sheet)
↓
Tertiary structure (three-dimensional structure formed by assembly of secondary structures)
↓
Quaternary structure (structure formed by more than one polypeptide chain)
Primary protein structure
Linear sequence of amino acids forms primary structure
Sequence essential for proper physiological function
Bettelheim & March (1990) Introduction to Organic & Biochemistry (International Edition) Philadelphia: Saunders College Publishing, p299
Primary structure of insulin
Sickle cell anemia
Sickle-Cell Disease
Secondary structure = local folding of residues into regular patterns
Secondary protein structure
Peptide chains fold into secondary structures:
• α-helix
• β-pleated sheet
• Random coil
Peptide Bonds are Planar
For a pair of amino acids linked by a peptide bond, six atoms lie in the same plane: the α-carbon atom and CO group of the first amino acid and the NH group and α-carbon atom of the second amino acid
The C-N distance in a peptide bond is typically 1.32 Å
Two configurations are possible for a planar peptide bond.
In the trans configuration, the two α-carbon atoms are on opposite sides of the peptide bond. In the cis configuration, these groups are on the same side of the peptide bond. Almost all peptide bonds are trans
The peptide bond is planar
Torsion Angle
In contrast with the peptide bond, the bonds between the amino group and the α-carbon atom and between the α-carbon atom and the carbonyl group are pure single bonds. The two adjacent rigid peptide units may rotate about these bonds, taking on various orientations
This freedom of rotation about two bonds of each amino acid allows proteins to fold in many different ways. The rotations about these bonds can be specified by torsion angles
The angle of rotation about the bond between the nitrogen and the α-carbon atoms is called phi (φ)
The angle of rotation about the bond between the α-carbon and the carbonyl carbon atoms is called psi (ψ)
A clockwise rotation about either bond as viewed from the nitrogen atom toward the α-carbon atom or from the carbonyl group toward the α-carbon atom corresponds to a positive value
The φ and ψ angles determine the path of the polypeptide chain
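A torsion angle can be computed from the coordinates of four consecutive atoms. The sketch below is a generic dihedral-angle routine in plain Python, not a method taken from the slides. The function name `dihedral` and the test coordinates are assumptions for illustration. For phi the four atoms would be C(i-1), N, Cα, C, and for psi they would be N, Cα, C, N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range -180..+180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Using `atan2` rather than `acos` keeps the sign of the angle, so the result covers the full range from -180 to +180 degrees.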
Ramachandran plot -- shows φ and ψ angles for secondary structures
φ and ψ each measure the rotation about one backbone bond and usually lie between −180° and +180°
Secondary structure conformation
Residue Conformational Preference
Conformation    Preferred residues
Helix           A, L, M, Q, K, R, E
Strand          V, I, Y, C, W, F, T
Turn            G, N, P, S, D
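The preference table can be used as a simple lookup. The sketch below only tallies the preferences listed on the slide and is not a secondary-structure prediction method. The names `PREFERENCE`, `preferred_conformation` and `tally` are hypothetical, and residues absent from the table (such as H) are reported as `None`.

```python
from collections import Counter
from typing import Optional

# Residue preferences exactly as listed in the table above.
PREFERENCE = {
    "helix": "ALMQKRE",
    "strand": "VIYCWFT",
    "turn": "GNPSD",
}


def preferred_conformation(residue: str) -> Optional[str]:
    """Return the conformation a residue prefers, or None if it is not listed."""
    for conformation, residues in PREFERENCE.items():
        if residue.upper() in residues:
            return conformation
    return None


def tally(sequence: str) -> Counter:
    """Count listed preferences over a whole sequence, skipping unlisted residues."""
    hits = (preferred_conformation(res) for res in sequence)
    return Counter(hit for hit in hits if hit is not None)


print(tally("MALEK"))
```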
φ and ψ angles for secondary structures
Alpha Helix
• In the α-helix, the carbonyl oxygen of residue “i” forms a hydrogen bond with the amide of residue “i+4”.
• Although each hydrogen bond is relatively weak in isolation, the sum of the hydrogen bonds in a helix makes it quite stable.
• The propensity of a peptide for forming an α-helix also depends on its sequence.
α-helix
Shape maintained by hydrogen bonds between C=O and N-H groups in backbone
R groups directed outward from coil
From: Elliott, WH. Elliott, DC. (1997) Biochemistry and Molecular Biology. Oxford: Oxford University Press. p28
α-Helix
Each hydrogen bond closes a loop of 13 atoms.
3.6 amino acids per turn of helix.
Helices observed in proteins can range from four to over forty residues long, but a typical helix contains about ten amino acids (about three turns).
The α-helix is also called the 3.6₁₃ helix, compared to the π-helix (4.4₁₆) and the 3₁₀ helix.
Proline is the α-helix breaker.
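The i to i+4 hydrogen-bond pattern and the figure of 3.6 residues per turn lend themselves to a short calculation. The sketch below assumes an ideal, uninterrupted helix with residues numbered from 1; the function names are hypothetical.

```python
RESIDUES_PER_TURN = 3.6  # value quoted on the slide


def helix_hbond_pairs(n_residues: int):
    """List (i, i+4) pairs: C=O of residue i bonded to N-H of residue i+4."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


def helix_turns(n_residues: int) -> float:
    """Number of turns in an ideal helix of n_residues."""
    return n_residues / RESIDUES_PER_TURN


pairs = helix_hbond_pairs(10)
print(pairs)                      # first pair (1, 5), last pair (6, 10)
print(round(helix_turns(10), 2))  # roughly three turns
```

For the typical ten-residue helix mentioned above, this gives six backbone hydrogen bonds and a little under three turns.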
Propensities for forming α-helical structure
Different amino-acid sequences have different propensities for forming α-helical structure. Methionine, alanine, leucine, uncharged glutamate, and lysine ("MALEK" in the amino-acid 1-letter codes) all have especially high helix-forming propensities, whereas proline and glycine have poor helix-forming propensities. Proline either breaks or kinks a helix, both because it cannot donate an amide hydrogen bond (having no amide hydrogen), and also because its side-chain interferes sterically with the backbone of the preceding turn - inside a helix, this forces a bend of about 30° in the helix axis.
Examples of α-Helical Proteins:
α-helical coiled coil proteins:
Form superhelix
Found in myosin, tropomyosin (muscle), fibrin (blood clots), keratin (hair)
Hair
Fingernails and wool are also α-helical proteins; silk is β
A polypeptide chain, called a β-strand, in a β-sheet is almost fully extended rather than being tightly coiled as in the α-helix
The distance between adjacent amino acids along a β-strand is approximately 3.5 Å, in contrast to a distance of 1.5 Å along an α-helix
A β-sheet is formed by linking two or more β-strands lying next to one another through hydrogen bonds
All residues in a β-sheet have nearly the same φ and ψ angles
Hydrogen bonds can only form between adjacent polypeptide chains.
R groups are directed above and below the backbone
β-sheet (β-pleated sheet)
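The per-residue distances quoted for the extended strand (about 3.5 Å) and the helix (about 1.5 Å) show how much longer the same stretch of residues is when extended. A minimal sketch, assuming ideal geometry and using hypothetical names:

```python
STRAND_RISE = 3.5  # Å per residue along a β-strand, as quoted on the slide
HELIX_RISE = 1.5   # Å per residue along an α-helix, as quoted on the slide


def span(n_residues: int, rise_per_residue: float) -> float:
    """Approximate end-to-end length in Å of n_residues in an ideal conformation."""
    return n_residues * rise_per_residue


n = 10
print(span(n, STRAND_RISE))  # length as a β-strand
print(span(n, HELIX_RISE))   # length as an α-helix
```

Ten residues span about 35 Å as a strand but only about 15 Å as a helix.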
Parallel or Anti-parallel β-Sheet
• The adjacent polypeptide chains in a β-sheet can be either parallel or anti-parallel (having the same or opposite amino-to-carboxyl orientations, respectively).
H bonds between 2 same aa
H bonds between different aa
Examples of β-sheet Proteins:
Fatty acid binding protein -> β-barrel structure
Antibodies
OmpX: E. coli porin
more β sheets
Tertiary Structure: 3D structure of a polypeptide chain
Quaternary Structure: Polypeptide chains assemble into multisubunit structures
Cell-surface receptor CD4
Tetramer of hemoglobin
Deoxyhaemoglobin
QUATERNARY STRUCTURE
β-Turns and Loops
• β-turns allow the protein backbone to make abrupt turns.
• Again, the propensity of a peptide for forming β-turns depends on its sequence.
• In these reverse turns, the CO group of residue i of a polypeptide is hydrogen bonded to the NH group of residue i + 3
• In other cases, more elaborate structures are responsible for chain reversals. These structures are called loops or sometimes Ω loops (omega loops) to suggest their overall shape
Why not here
Random coil
Not really random structure, just non-repeating
‘Random’ coil has fixed structure within a given protein
Commonly called ‘connecting loop region’
Structure determined by bonding of side chains (i.e. not necessarily hydrogen bonds)
From: Elliott, WH. Elliott, DC. (1997) Biochemistry and Molecular Biology. Oxford: Oxford University Press. p27
Tertiary protein structure
Secondary structures fold and pack together to form tertiary structure
Usually globular shape
But can be fibrous
Tertiary structure stabilized by bonds between R groups (i.e. side-chains)
Tertiary structure = global folding of a protein chain
Tertiary structures are quite varied
Quaternary structures
Each Protein has a unique structure
Amino acid sequence
NLKTEWPELVGKSVEEAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAEVPRVG
Folding!
Protein Folding
Folding is a highly cooperative process (all or none)
Folding by stabilization of intermediates
Protein Folding by Chaperones
• Chaperone proteins provide a site where misfolded proteins can fold correctly.
Central Dogma
DNA
↓ Transcription
Pre-mRNA (hnRNA)
↓ Splicing, processing and maturation
mRNA
↓ Translation
Protein
Chaperonins Assist in Protein Folding
They segregate protein folding from “bad influences” in the cell
Classes of proteins
Functional definition:
Enzymes: Accelerate biochemical reactions
Structural: Form biological structures
Transport: Carry biochemically important substances
Defense: Protect the body from foreign invaders
Structural definition:
Globular: Complex folds, irregularly shaped tertiary structures
Fibrous: Extended, simple folds -- generally structural proteins
Cellular localization definition:
Membrane: In direct physical contact with a membrane; generally water insoluble.
Soluble: Water soluble; can be anywhere in the cell.
Components of Tertiary Structure
Fold – used differently in different contexts – most broadly a reproducible and recognizable 3 dimensional arrangement
Domain – a compact and self folding component of the protein that usually represents a discrete structural and functional unit
Motif (aka supersecondary structure) – a recognizable subcomponent of the fold – several motifs usually comprise a domain
As in all fields, these terms are not used strictly, which makes capturing data that conforms to these terms all the more difficult
Protein Structure Computational Goals
• Compare all known structures to each other
• Compute distances between protein structures
• Classify and organize all structures in a biologically meaningful way
• Discover conserved substructure domains
• Discover conserved substructural motifs
• Find common folding patterns and structural/functional motifs
• Discover relationship between structure and function.
• Study interactions between proteins and other proteins, ligands and DNA (Protein Docking)
• Use known structures and folds to infer structure from sequence (Protein Threading)
• Use known structural motifs to infer function from structure
• Many more…
Structural Classification of Proteins (SCOP)
http://scop.berkeley.edu/
• Class
o Similar secondary structure content
o All α, all β, alternating α/β, etc.
• Fold (Architecture)
o Major structural similarity
o SSE’s in similar arrangement
• Superfamily (Topology)
o Probable common ancestry
o HMM family membership
• Family
o Clear evolutionary relationship
o Pairwise sequence similarity > 25%
Classes of Protein Structures
• Mainly α
• Mainly β
• α/β alternating
o Parallel β sheets, β-α-β units
• α+β
o Anti-parallel β sheets, segregated α and β regions
o Helices mostly on one side of sheet
• Others
o Multi-domain, membrane and cell surface, small proteins, peptides and fragments, designed proteins
Folds / Architectures
• Mainly α
o Bundle
o Non-Bundle
• Mainly β
o Single sheet
o Roll
o Barrel
o Clam
o Sandwich
o Prism
o 4/6/7/8 Propeller
o Solenoid
• α/β and α+β
o Closed: Barrel, Roll, ...
o Open: Sandwich, Clam, ...
The TIM Barrel Fold
A Conceptual Problem ...
PDB Protein Database
http://www.rcsb.org/pdb/
• Protein DataBase
o Multiple Structure Viewers
o Sequence & Structure Comparison Tools
o Derived Data: SCOP, CATH, pFAM, Go Terms
o Education on Protein Structure
o Download Structures and Entire Database
Web services for domain identification
Program         Web access
DIAL            http://www.ncbs.res.in/~faculty/mini/ddbase/dial.html
DomainParser    http://compbio.ornl.gov/structure/domainparser
DOMAK           http://www.compbio.dundee.ac.uk/Software/Domak/domak.html
PDP             http://123d.ncifcrf.gov/pdp.html
Protein structure prediction has remained elusive over half a century
“Can we predict a protein structure from its amino acid sequence?”
Table 6-4
Protein Misfolding Diseases
Misfolded proteins and Resulting Disorders
• Causes protein fibrillation
Alzheimer’s Disease
• Prions: molecules resembling ion channels, causing serious illnesses in animals and humans
• Cause BSE (“mad cow disease”) in cattle
A normal prion (left), compared to an aberrant, disease-causing prion (right).
Cellular processing of PrP. (1) PrP can be internalized before degradation by proteasome or lysosomal proteases. (2) In PrPsc, processing results in limited proteolysis. Limited degradation produces PrPsc fragments, which accumulate over time and may have a role in cell death. These fragments lead to propagation of the PrPsc infection in adjacent cells.
A) Normal PrP can refold into PrPsc in the extracellular space. B) Fragments of PrPsc may remain within the cell or may be externalized by transport vesicles or by cellular rupture upon death. C) Intracellular PrPsc could interact with PrP during intracellular processing, resulting in conversion of PrP to PrPsc in the infected cell. D) Intracellular PrP may spontaneously change conformation to PrPsc.
MOLECULAR BIOLOGY OF PRION DISEASE
Possible routes of propagation of ingested prions. After oral uptake, prions may penetrate the intestinal mucosa through M cells and reach Peyer's patches as well as the enteric nervous system. Depending on the host, prions may replicate and accumulate in spleen and lymph nodes. Myeloid dendritic cells are thought to mediate transport within the lymphoreticular system. From the lymphoreticular system and likely from other sites prions proceed along the peripheral nervous system to finally reach the brain, either directly via the vagus nerve or via the spinal cord, under involvement of the sympathetic nervous system.
PRIONS CONT.
Sheep with scrapie
Kuru and Creutzfeldt-Jakob disease in humans
How To Determine Protein Structure?
Protein Structure Prediction
Structure: traditional experimental methods:
• X-ray or NMR to solve structures
• Generate a few structures per day worldwide
• Cannot keep pace with new protein sequences
Strong demand for structure prediction:
• More than 30,000 human genes
• 10,000 genomes will be sequenced in the next 10 years
Unsolved problem after efforts of two decades.
Protein structure and functions are intimately related
Proteins interact with each other
The structure of a protein influences its function by determining the other molecules with which it can interact and the consequences of those interactions.
Experimental methods available to detect protein structure and interactions vary in their level of resolution.
These observations can be classified into four levels: (a) atomic scale, (b) binary interactions, (c) complex interactions, and (d) cellular scale.
Atomic-scale methods:
Show the precise structural relationships between interacting atoms and residues
The highest resolution methods: e.g., X-ray crystallography and NMR
Not yet applied to study protein interactions in a high-throughput manner.
Binary-interaction methods:
Methods to detect interactions between pairs of proteins
Do not reveal the precise chemical nature of the interactions but simply report that such interactions take place
The major high-throughput technology: the yeast two-hybrid system
Complex-interaction methods:
Methods to detect interactions between multiple proteins that form complexes.
Do not reveal the precise chemical nature of the interactions but simply report that such interactions take place.
The major high-throughput technology: systematic affinity purification followed by mass spectrometry
Cellular-scale methods:
Methods to determine where proteins are localized (e.g., immunofluorescence)
It may be possible to determine the function of a protein directly from its localization
Principles of protein-protein interaction analysis
These small-scale analysis methods are also useful in proteomics because the large-scale methods tend to produce a significant number of false positives
They include (a) genetic methods, (b) bioinformatic methods, (c) affinity-based biochemical methods, and (d) physical methods.
Genetic methods
Classical genetics can be used to investigate protein interactions by combining different mutations in the same cell or organism and observing the resulting phenotype
Suppressor mutation: A secondary mutation that can correct the phenotype of a primary mutation.
Suppressor mutation
Synthetic lethal effect
Bioinformatic methods
(A) The domain fusion method (or Rosetta stone method):
The sequence of protein X (a single-domain protein from genome 1) is used as a similarity search query on genome 2. This identifies any single-domain proteins related to protein X and also any multi-domain proteins, which we can define as protein X-Y.
As part of the same protein, domain X and Y are likely to be functionally related.
The sequence of domain Y can then be used to identify single-domain orthologs in genome 1.
Thus, Gene Y, formerly an orphan with no known function, becomes annotated due to its association with Gene X. The two proteins are also likely to interact.
The sequence of protein X-Y may also identify further domain fusions, such as protein Y-Z. This links three proteins into a functional group and possibly identifies an interacting complex.
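The domain fusion logic can be sketched as follows. The sketch assumes the similarity searches have already been run and reduced to domain labels, so every name and data value here (`rosetta_links`, `geneX`, `fusXY` and so on) is hypothetical and only stands in for real search results.

```python
from itertools import combinations


def rosetta_links(single_domain_proteins, fusion_proteins):
    """Predict functional links between single-domain proteins of genome 1.

    single_domain_proteins: dict mapping protein name -> domain label (genome 1)
    fusion_proteins: dict mapping protein name -> list of domain labels (genome 2)
    Returns a set of (protein_a, protein_b, supporting_fusion) tuples.
    """
    by_domain = {}
    for name, domain in single_domain_proteins.items():
        by_domain.setdefault(domain, []).append(name)

    links = set()
    for fusion, domains in fusion_proteins.items():
        # Any two domains fused in one protein suggest a link between
        # the separate proteins that carry those domains.
        for d1, d2 in combinations(sorted(set(domains)), 2):
            for a in by_domain.get(d1, []):
                for b in by_domain.get(d2, []):
                    links.add((a, b, fusion))
    return links


genome1 = {"geneX": "X", "geneY": "Y", "geneZ": "Z"}
genome2 = {"fusXY": ["X", "Y"], "fusYZ": ["Y", "Z"]}
print(sorted(rosetta_links(genome1, genome2)))
```

In this toy input the two fusion proteins chain X, Y and Z together, mirroring the X-Y and Y-Z example in the text.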
Bioinformatics methods
(B) The phylogenetic profile:
It describes the pattern of presence or absence of a particular protein across a set of organisms whose genomes have been sequenced. If two proteins have the same phylogenetic profile (that is, the same pattern of presence or absence) in all surveyed genomes, it is inferred that the two proteins have a functional link.
A protein’s phylogenetic profile is a nearly unique characterization of its pattern of distribution among genomes. Hence any two proteins having identical or similar phylogenetic profiles are likely to be engaged in a common pathway or complex.
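A phylogenetic profile is simply a presence/absence vector, so comparing profiles takes only a few lines. The sketch below assumes presence or absence has already been decided for each genome. The genome contents are invented, and the names `profiles`, `linked_pairs` and `max_mismatches` are hypothetical.

```python
from itertools import combinations


def profiles(genomes):
    """Build a presence/absence profile for every protein.

    genomes: dict mapping genome name -> set of proteins found in that genome.
    Returns dict mapping protein -> tuple of 0/1, one entry per genome (sorted by name).
    """
    names = sorted(genomes)
    proteins = sorted(set().union(*genomes.values()))
    return {p: tuple(int(p in genomes[g]) for g in names) for p in proteins}


def linked_pairs(genomes, max_mismatches=0):
    """Return protein pairs whose profiles differ in at most max_mismatches genomes."""
    prof = profiles(genomes)
    pairs = []
    for a, b in combinations(sorted(prof), 2):
        mismatches = sum(x != y for x, y in zip(prof[a], prof[b]))
        if mismatches <= max_mismatches:
            pairs.append((a, b))
    return pairs


genomes = {
    "g1": {"P1", "P2", "P3"},
    "g2": {"P1", "P2"},
    "g3": {"P3"},
    "g4": {"P1", "P2", "P3"},
}
print(profiles(genomes))
print(linked_pairs(genomes))
```

Raising `max_mismatches` above zero allows "similar" as well as identical profiles, at the cost of more chance matches when few genomes are surveyed.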
Sequence to Structure to Function
>132L:_ LYSOZYME (E.C.3.2.1.17)
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
Cell wall degrading enzyme
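The lysozyme entry is in FASTA style: a header line starting with ">" followed by the sequence. A minimal parser might look like the sketch below. The function name `parse_fasta` is hypothetical, and the sequence is split over two lines only to show that wrapped input is handled.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


fasta = """>132L:_ LYSOZYME (E.C.3.2.1.17)
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCN
DGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
"""
header, sequence = parse_fasta(fasta)[0]
print(header)
print(len(sequence), "residues")
```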
Correlation Between Structure & Function
•Homologous proteins
• Conserved sequence, similar structure and function
• Example: cytochrome c
•Similar function, different sequences
• Conserved and variable regions
• Example: dehydrogenases, kinases
•Similar structure, different function
• Example: thioredoxin
Why must we predict structures?
Limitations of current techniques
• Proteins often too large for molecular modeling techniques
• Difficult to crystallize some proteins (X-ray), slow throughput
• Difficulty getting NMR results, reliance on modeling
Far more sequences elucidated than structures
3D structures are better conserved than sequence during evolution.
Predicting 3D structures from Sequence?
Levinthal’s paradox
• Protein with 100 amino acids => 3^100 possible structures
• 10^-13 seconds to sample each structure
• 1.6*10^27 years to go through all structures
Models improve these odds
• Based on structure stability, x-ray crystallography
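The Levinthal estimate can be reproduced with simple arithmetic. The sketch below assumes three conformations per residue and a year of 365.25 days; the function name `levinthal_years` and its parameter names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed length of a year


def levinthal_years(n_residues=100, states_per_residue=3, seconds_per_sample=1e-13):
    """Years needed to sample every conformation one after another."""
    conformations = states_per_residue ** n_residues  # exact integer, 3^100 by default
    return conformations * seconds_per_sample / SECONDS_PER_YEAR


years = levinthal_years()
print(f"{years:.2e} years")
```

With the slide's numbers this gives roughly 1.6 × 10^27 years, which is why an exhaustive search cannot be how proteins fold.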
Structure prediction methods
Ab initio
• Determining structure without reference to existing protein structures.
Comparative/Homology modeling
• Determines structure based on sequence similarity.
Fold recognition/threading
• Limited number of folds
• Determine structure similarities independent from sequence similarity.
http://www.bmm.icnet.uk/people/rob/CCP11BBS/
Structure Prediction Process
Protein Structure-function paradigm
Origins in the lock and key model for enzymatic activity.
Claims that rigid 3D structure of protein determines the function.
Active areas of protein structure, for example active sites on enzymes, are highly conserved; other regions are more variable.
Conserved motifs are responsible for conserved functionality.
Forms the basis of proteomic studies and many other branches.
Homology is claimed to be responsible for the correlation.
Structure Similarity
Refers to how well (or poorly) 3D folded structures of proteins can be aligned
Expected to reflect functional similarities (interaction with other molecules)
Proteins in the TIM barrel fold family
2000: ~20,000 structures in PDB, ~4,000 different folds (1:5 ratio)
Three possible reasons:
- evolution
- physical constraints (e.g., few ways to maximize hydrophobic interactions)
- limits in techniques used for structure determination
Given a new structure, the probability is high that it is similar to an existing one
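One common way to put a number on how well two structures align is the root-mean-square deviation (RMSD) between matched atoms. The slides do not name a specific measure, so RMSD is an assumption here. The sketch also assumes the two coordinate sets are the same length and already superimposed, since finding the best superposition is a separate step that is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates for illustration only.
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 0.0, 0.0)]
print(rmsd(a, b))
```

A value of zero means the matched atoms coincide, and larger values mean the structures differ more.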
Sequence → Structure → Function
sequence similarity
structure similarity
Why Compare Protein Folded Structures?
• Low sequence similarity may yield very similar structures
• Sometimes high sequence similarity yields different structures
• Structure comparison is expected to provide more pertinent information about functional (dis-)similarity among proteins, especially with non-evolutionary relationships or non-detectable evolutionary relationships
Extensions of Paradigm
Structure-function paradigm
• Assisted protein folding
• Allosteric interactions
• Proteins as biomachines
• Enzyme catalysis
• 3D structure analysis
• de novo proteins
• Protein self organization
• Protein misfolding and diseases
• Biomedicine
• Protein engineering
• Proteomics
• Biotechnology
Paradigms in structure-function theory
Orthologues possess similar function.
Enzyme homologues are enzymes.
Regulatory domain homologues are not enzymes.
Equivalent cellular functions are mediated in different species by homologues.
Coding regions mutate at a slower rate than non coding regions.
Domain homologues are localised in sequence and 3D structure and possess the same order of sec. structures.
Disulphide bridges are invariant among homologues.
Convergent evolution of sequences does not occur.
Domains possess single conformations.
Function Assessment
Statistical analysis is hard to apply to functionality assessment.
Function prediction by homology is thus qualitative, requiring expert knowledge and careful study.
Assignment of experimental knowledge from one homologue to an uncharacterized sequence is the basis of function prediction.
Works best in the case of orthologues; can be misleading in paralogues.
Orthologue identification is the most powerful tool in molecular function prediction. Paralogues can also have overlapping functionality, especially in eukaryotes.
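The slide does not say how orthologues are identified. One commonly used heuristic, shown here only as an illustrative sketch and not as the lecturer's method, is the reciprocal best hit: two proteins from different genomes are treated as candidate orthologues when each is the other's top-scoring match. The protein names and similarity scores below are hypothetical placeholders; in practice they would come from a sequence search tool.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring hit."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between proteins of genome A and genome B.
a_vs_b = {
    "A1": {"B1": 250.0, "B2": 90.0},
    "A2": {"B2": 180.0, "B1": 60.0},
    "A3": {"B2": 120.0},  # A3's best hit is B2, but B2 prefers A2
}
b_vs_a = {
    "B1": {"A1": 245.0, "A2": 55.0},
    "B2": {"A2": 175.0, "A3": 118.0},
}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy data A3 and A2 both match B2, a situation that could arise from paralogues; only the reciprocal pair is kept, which reflects the caution about paralogues stated above. A reciprocal best hit is still only a candidate and needs the expert judgment the slide calls for.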
105
Fold-function Correlation
Common folds are found in unrelated protein families.
Folds accommodating many families are called “superfolds”, e.g. the TIM barrel.
Folds in combination define overall function.
Function is better assessed as a whole of parts.
106
Exceptions to the rule – natively unfolded proteins
A class of proteins with inherently unstable structure that are nevertheless functional, e.g. regulatory proteins.
Unfolded in the physiological state; may fold during the functional cycle.
Lack of fixed structure allows binding to multiple targets.
The target induces folding in the protein, e.g. protein-DNA or protein-RNA interactions.
Unfolded proteins are easier to transfer across membranes.
107
Ground rules for Structure Prediction
Don't always believe what programs tell you: they're often misleading & sometimes wrong!
Don't always believe what databases tell you: they're often misleading & sometimes wrong!
Don't always believe what others tell you: they're often misleading & sometimes wrong!
In short, don't be a naive user: when computers are applied to biology, it is vital to understand the difference between mathematical & biological significance
Computers don't do biology, they do sums quickly!
108
Implication of Protein Structure and Function
109
Structure-Based Drug Design
HIV protease inhibitor
Structure-based rational drug design is still a major method for drug discovery.
110
CD4 on Mini Scaffold
Rational engineering of a mini-protein that reproduces the core of the CD4 site interacting with the HIV-1 envelope glycoprotein
Vita, C. et al. Proc. Natl. Acad. Sci. USA (1999)
111
The Envelope
• Bi-layer lipid outer coat
• Layer of matrix protein p17
• ~72 copies of a complex HIV protein, called spikes, project through the surface of the virus particle (Gelderblom et al., Virology 1987)
• Spike protein
  • Cap (3 gp120 molecules)
  • Stem (3 gp41 molecules)
• Envelope
• Viral Core
Gelderblom et al. 1987
HIV
112
The Viral Core
• Bullet-shaped core or capsid made of viral protein p24
• The capsid surrounds 2 single strands of HIV RNA, each of which has a copy of the 9 viral genes
• ‘gag’, ‘pol’ and ‘env’ code for structural proteins to make new virus particles
• ‘env’ codes for gp160, which is cleaved to form gp120 and gp41 (Janeway et al. 1999)
• ‘rev’, ‘vif’, ‘vpr’, ‘nef’, ‘tat’ and ‘vpu’ are involved in infection and replication
• REVERSE TRANSCRIPTASE, INTEGRASE and PROTEASE
• Envelope
• Viral Core
Gelderblom et al. 1987
HIV
113
Lymphocyte
T-Cell
Helper: recognizes antigen and releases cytokines, which signal B-cells to produce antibody; helps differentiation of B-cells
Suppressor: after the battle, stops antibody formation and slows down the activity of B-cells and other T-cells
Memory: memorizes the antigen and helps in a quick response on the next attack
Cytotoxic T-cell: recognizes and directly kills infected cells
B-Cell
Plasma cells: produce antibody, i.e. make enough receptor molecules in soluble form, which bind to the microorganism
Memory cells: same as memory T-cells; both work together
LGL (Large Granular Lymphocyte) or Natural Killer Cell
Function known fully; kills tumor cells and virus-infected cells
OUR IMMUNE SYSTEM
114
Four basic kinds of T cell – T helper
Secretion of chemical messengers (cytokines), which in turn stimulate more T helper cells
So the T cells must have a particular receptor molecule to receive this message
This receptor molecule is referred to as a CD or Cluster of Differentiation (around 130 CDs have been identified so far)
CD4 is one of these receptors; it is the main target HIV uses to anchor to the T-cell and thereby gain entry to the cell and replicate there
T-Cell
115
gp41 Fusogenic domain mediates Fusion
116
Fight Against AIDS
Reverse transcriptase, integrase and protease are the enzymes targeted in designing anti-HIV drugs
9 of 15 FDA-approved drugs target reverse transcriptase, e.g. Zidovudine, Nevirapine, Delavirdine
These are big molecules and have severe side effects, mainly on the kidney
Most of the time they are not very effective, because the viral genome is able to undergo numerous mutations in its critical areas
Co-receptor (CCR5/CXCR4) blocking is another approach, but has low efficiency
The other way to prevent HIV attack may be to block the viral glycoprotein from coming in contact with the CD4 receptor of the T-cell
Make a fool of the virus: design a CD4 mimic using a mini scaffold
A group from France used scorpion toxin as a scaffold and designed a chimeric protein which mimics CD4 activity
117
Whole CD4 does not bind to gp120; only one domain binds. D1 is the most important domain of CD4 for binding to gp120
D1 has a CDR2-like loop, which is the main part of the D1 domain that interacts with gp120
Kwong et al. solved the structure of the CD4-gp120 complex
The solved structure showed that Phe at position 43 and Arg at position 59 are important for binding of CD4 to gp120
Interaction between CD4 and gp120
CD4
118
Designing CD4 mimic using mini scaffold
120
Designing of CD4M
Solvent-exposed amino acid residues of the CDR2-like loop of CD4 were transferred to a charybdotoxin scaffold
The chimeric miniprotein designed was 33 amino acid residues long
Solvent exposed residues
D1 domain of CD4
Charybdotoxin scaffold
121
Implications of designing a CD4 mimic
The designed mimic can be used as an antiviral agent
In complex with viral coat proteins the CD4 mimic can be used to formulate a vaccine against AIDS
The designed CD4 mimic can be used for developing broad spectrum neutralizing antibodies
122
Fight Against AIDS
123
• Samson, M. et al. Resistance to HIV-1 infection in Caucasian individuals bearing mutant alleles of the CCR5 chemokine receptor gene. Nature 382, 722–725 (1996)
• Liu, R. et al. Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection. Cell 86, 367–377 (1996)
Homozygous Δ32 deletion in the HIV co-receptor CCR5 confers resistance to HIV infection
CCR5-Δ32 (or CCR5-D32 or CCR5 delta 32) is a genetic variant of CCR5
This allele is found in 5-14% of Europeans but is rare in Africans and Asians
It has been hypothesized that this allele was favored by natural selection during the Black Death (1347), which was one of the worst epidemics in history and killed 1/3 of the population of Europe
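Since resistance is tied to the homozygous genotype, it can help to see how rare homozygotes are even when the allele itself is fairly common. The sketch below assumes that the 5-14% figure is an allele frequency and that the population is in Hardy-Weinberg equilibrium; both are assumptions made for illustration, and the function name `hardy_weinberg` is hypothetical.

```python
def hardy_weinberg(q: float) -> dict:
    """Expected genotype frequencies for a two-allele locus, given the variant allele frequency q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    p = 1.0 - q  # frequency of the wild-type allele
    return {
        "wild_type": p * p,
        "heterozygous": 2.0 * p * q,
        "homozygous_delta32": q * q,
    }


# Assumed allele frequencies spanning the 5-14% range quoted above.
for q in (0.05, 0.14):
    freqs = hardy_weinberg(q)
    print(f"q={q:.2f}: homozygous fraction = {freqs['homozygous_delta32']:.4f}")
```

Under these assumptions the expected homozygous fraction is roughly 0.25% to 2% of the population, far smaller than the allele frequency itself.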
124
The authors created CCR5-mutant T-cells and used these cells in vitro and in an in vivo mouse model to show that the mutation confers complete resistance to HIV
They used an engineered Zinc Finger Nuclease that targets human CCR5 efficiently to generate a double-strand break at a predetermined site in the CCR5 coding region, comparable to the CCR5-Δ32 genotype
126
BIOINFORMATICS took the leading role for this development
127
Bioinformatics Bottlenecks in Bangladesh
Lack of Facilities
Lack of coordinated research
Improper course curriculum in Statistics
Improper course curriculum in Biology
128
THANK YOU
129