Proper structural fold of a protein molecule is essential to execute its precise functional mission

Md Abu Reza, PhD

Date : 24th March, 2012

Venue : Dept of Statistics, RU

Associate Professor, Dept of Genetic Eng & Biotech

University of Rajshahi

Bioinformatics Workshop-1

Higher Education Quality Enhancement Project

Molecular Organization of a cell

Proteins control all biological systems in a cell

They either serve as structural components or perform distinct biological functions in a physiological system

Although some proteins perform their functions independently, the vast majority interact with other proteins for proper biological activity

To perform its function effectively, a protein needs a proper structure. Without it, the protein is useless or may even cause the system to malfunction

Conformation and functional-group chemistry control function

Made up of 20 different types of amino-acid monomers

Proteins define what an organism is, what it looks like, how it behaves, etc. (responsible for most phenotype)

Protein – The Master Molecule

Protein Function

Function of Proteins

Protein Function is Related to Structure

What are Proteins?

Proteins are biochemical compounds consisting of one or more polypeptides typically folded into a globular or fibrous form in a biologically functional way.

A polypeptide is a single linear polymer chain of amino acids bonded together by peptide bonds

The 20 natural amino acids join in different orders and combinations to form chains of different lengths

Once linked in the protein chain, an individual amino acid is called a residue, and the linked series of carbon, nitrogen, and oxygen atoms is known as the main chain or protein backbone

Lysine with the carbon atoms in the side-chain labeled

Amino Terminal

Carboxy Terminal

Amino Acids

How peptide bonds are formed?

• Here both amino acids are alanine, whose R group is a methyl group (-CH3); an R group of a single hydrogen would be glycine.

• The carboxyl end of the first amino acid is oriented toward the amino group of the second amino acid.

• The -OH group and -H are removed to form water (condensation reaction).

• The bond forms between the carboxyl carbon of the first amino acid and the nitrogen of the second amino acid.

• The backbone of the molecule has the sequence N-C-C-N-C-C

• Polypeptides maintain this sequence no matter how long the chain.

• The R groups project from the backbone.

• As the amino acids are added in translation the polypeptide folds up into its specific shape.
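The backbone pattern and the condensation step above can be sketched in a few lines of Python. This is only an illustration: the function names `backbone_atoms` and `peptide_bonds` are made up for this example and do not come from any library.

```python
def backbone_atoms(n_residues: int) -> str:
    """Return the repeating N-C-C backbone pattern for a chain of n residues."""
    if n_residues < 1:
        raise ValueError("a polypeptide needs at least one residue")
    # Each residue contributes its amide N, alpha carbon and carbonyl carbon.
    return "-".join(["N-C-C"] * n_residues)


def peptide_bonds(n_residues: int) -> int:
    """Peptide bonds in a linear chain; one water is released per bond formed."""
    if n_residues < 1:
        raise ValueError("a polypeptide needs at least one residue")
    return n_residues - 1


if __name__ == "__main__":
    print(backbone_atoms(2))  # N-C-C-N-C-C
    print(peptide_bonds(2))   # 1
```

A dipeptide therefore has one peptide bond and releases one water molecule; a chain of n residues has n - 1 bonds.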

Colour codes used for atoms

Element          Color name
Carbon           light grey
Oxygen           red
Hydrogen         white
Nitrogen         light blue
Sulfur           yellow
Phosphorus       orange
Chlorine         green
Bromine, Zinc    brown
Sodium           blue
Iron             orange
Magnesium        dark green
Calcium          dark grey
Unknown          deep pink

Stereochemistry

The CORN Law

View in 3D

Structure of the 20 naturally occurring Amino Acids

The 20 amino acids can be divided into several groups based on their properties. Important factors are charge, hydrophilicity or hydrophobicity, size, and functional groups. Water-soluble proteins tend to have their hydrophobic residues (Leu, Ile, Val, Phe, and Trp) buried in the middle of the protein, whereas hydrophilic side-chains are exposed to the aqueous solvent.

Livingstone & Barton, CABIOS, 9, 745-756, 1993

Amino Acid Properties

Group I: Nonpolar amino acids

Group I amino acids are alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, and tryptophan. The R groups of these amino acids have either aliphatic or aromatic groups. This makes them hydrophobic (“water fearing”). In aqueous solutions, globular proteins will fold into a three-dimensional shape to bury these hydrophobic side chains in the protein interior.

Group II: Polar, uncharged amino acids

Group II amino acids are glycine, serine, cysteine, threonine, tyrosine, asparagine, and glutamine. The side chains in this group possess a spectrum of functional groups. However, most have at least one atom (nitrogen, oxygen, or sulfur) with electron pairs available for hydrogen bonding to water and other molecules. Polar amino acids are hydrophilic.

Group III: Acidic amino acids

The two amino acids in this group are aspartic acid and glutamic acid. Each has a carboxylic acid on its side chain that gives it acidic (proton-donating) properties. In an aqueous solution at physiological pH, all three functional groups on these amino acids will ionize, thus giving an overall charge of −1. In the ionic forms, the amino acids are called aspartate and glutamate.

Group IV: Basic amino acids

The three amino acids in this group are arginine, histidine, and lysine. Each side chain is basic (i.e., can accept a proton). Lysine and arginine both exist with an overall charge of +1 at physiological pH. The guanidino group in arginine’s side chain is the most basic of all R groups (a fact reflected in its pKa value of 12.5). As mentioned above for aspartate and glutamate, the side chains of arginine and lysine also form ionic bonds.

[Figure: chemical structures of Group IV amino acids]
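The four groups above can be written down as a small lookup table. The sketch below is illustrative only: it uses the standard one-letter amino-acid codes, follows the grouping given on these slides (other textbooks group glycine or cysteine differently), and the names `GROUPS`, `group_composition` and `approximate_net_charge` are invented for this example. The charge estimate counts only Lys/Arg (+1) and Asp/Glu (−1) side chains, ignoring histidine and the chain termini.

```python
from collections import Counter

# Grouping as listed on the slides, using one-letter codes.
GROUPS = {
    "nonpolar": "AVLIPFMW",         # Group I
    "polar uncharged": "GSCTYNQ",   # Group II
    "acidic": "DE",                 # Group III
    "basic": "RHK",                 # Group IV
}
GROUP_OF = {aa: group for group, letters in GROUPS.items() for aa in letters}


def group_composition(sequence: str) -> Counter:
    """Count how many residues of a sequence fall into each group."""
    # Raises KeyError for letters outside the 20 standard amino acids.
    return Counter(GROUP_OF[aa] for aa in sequence.upper())


def approximate_net_charge(sequence: str) -> int:
    """Side-chain charge near physiological pH: +1 per K/R, -1 per D/E."""
    seq = sequence.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


if __name__ == "__main__":
    print(group_composition("AKDE"))
    print(approximate_net_charge("AKDE"))  # -1
```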

Functions

Diverse functions related to structure

• Structural components of cells
• Motor proteins
• Enzymes
• Antibodies
• Hormones
• Hemoglobin/myoglobin
• Transport proteins in blood

Why Proteins Need Structure!

Protein structure - bonding

Interactions (forces) governing protein structure

Covalent interactions
• Peptide bond
• Disulfide bond

Non-covalent interactions
• Hydrogen bond
• Ionic bond (electrostatic interactions)
• Salt bridge
• Van der Waals interactions
• Hydrophobic force

Disulfide bond

Covalent bond between sulfur atoms on two cysteine amino acids

Very strong interaction

From: Elliott, WH. Elliott, DC. (1997) Biochemistry and Molecular Biology. Oxford: Oxford University Press. p32

Levels of Protein Structure

Primary structure (amino acid sequence)
↓
Secondary structure (α-helix, β-sheet)
↓
Tertiary structure (three-dimensional structure formed by assembly of secondary structures)
↓
Quaternary structure (structure formed by more than one polypeptide chain)

Hierarchical nature of protein structure

Primary protein structure

Linear sequence of amino acids forms primary structure

Sequence essential for proper physiological function

Bettelheim & March (1990) Introduction to Organic & Biochemistry (International Edition) Philadelphia: Saunders College Publishing, p299

Primary structure of insulin

Sickle cell anemia

Sickle-Cell Disease

Secondary structure = local folding of residues into regular patterns

Secondary protein structure

Peptide chains fold into secondary structures:
• α-helix
• β-pleated sheet
• Random coil

Peptide Bonds are Planar

For a pair of amino acids linked by a peptide bond, six atoms lie in the same plane: the α-carbon atom and CO group of the first amino acid and the NH group and α-carbon atom of the second amino acid

The C-N distance in a peptide bond is typically 1.32 Å

Two configurations are possible for a planar peptide bond. In the trans configuration, the two α-carbon atoms are on opposite sides of the peptide bond. In the cis configuration, these groups are on the same side of the peptide bond. Almost all peptide bonds are trans

The peptide bond is planar

Torsion Angle

In contrast with the peptide bond, the bonds between the amino group and the α-carbon atom and between the α-carbon atom and the carbonyl group are pure single bonds. The two adjacent rigid peptide units may rotate about these bonds, taking on various orientations

This freedom of rotation about two bonds of each amino acid allows proteins to fold in many different ways. The rotations about these bonds can be specified by torsion angles

The angle of rotation about the bond between the nitrogen and the α-carbon atoms is called phi (φ)

The angle of rotation about the bond between the α-carbon and the carbonyl carbon atoms is called psi (ψ)

A clockwise rotation about either bond as viewed from the nitrogen atom toward the α-carbon atom or from the carbonyl group toward the α-carbon atom corresponds to a positive value

The φ and ψ angles determine the path of the polypeptide chain
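A torsion angle is defined by four consecutive atoms, so φ and ψ can be computed directly from backbone coordinates. The sketch below is one common way to do this in plain Python; the function names are chosen for this example, the atom order (C of the previous residue, N, Cα, C for φ; N, Cα, C, N of the next residue for ψ) is the usual convention and is stated here as an assumption, and the test points are artificial rather than taken from a real protein.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond (range -180 to +180)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi(c_prev, n, ca, c):
    """phi: rotation about the N-CA bond."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi: rotation about the CA-C bond."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Artificial points: the outer atoms lie on opposite sides (trans-like).
    print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
```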

Ramachandran plot -- shows φ and ψ angles for secondary structures

The φ and ψ angles measure rotation about the N-Cα and Cα-C bonds and lie between −180° and +180°

Secondary structure conformation

Residue Conformational Preference

Conformation    Preferred residues
Helix           A, L, M, Q, K, R, E
Strand          V, I, Y, C, W, F, T
Turn            G, N, P, S, D

φ and ψ angles for secondary structures

Alpha Helix

• In the α-helix, the carbonyl oxygen of residue “i” forms a hydrogen bond with the amide of residue “i+4”.

• Although each hydrogen bond is relatively weak in isolation, the sum of the hydrogen bonds in a helix makes it quite stable.

• The propensity of a peptide for forming an α-helix also depends on its sequence.
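The i to i+4 rule can be enumerated for a helix of any length. The short sketch below assumes an idealised, uninterrupted helix with residues numbered from 1; `helix_hbonds` is a name made up for this example.

```python
def helix_hbonds(n_residues: int):
    """List (i, i+4) pairs: C=O of residue i bonded to N-H of residue i+4."""
    # The last four residues have no partner four positions further along.
    return [(i, i + 4) for i in range(1, n_residues - 3)]


if __name__ == "__main__":
    for donor_acceptor in helix_hbonds(10):
        print(donor_acceptor)
```

For a ten-residue helix this gives six backbone hydrogen bonds, from (1, 5) to (6, 10), which is why even a short helix is held together by several bonds at once.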

α-helix

Shape maintained by hydrogen bonds between C=O and N-H groups in backbone

R groups directed outward from coil

From: Elliott, WH. Elliott, DC. (1997) Biochemistry and Molecular Biology. Oxford: Oxford University Press. p28

α-Helix

A loop of 13 atoms is closed by each hydrogen bond.

3.6 amino acids per turn of helix.

Helices observed in proteins can range from four to over forty residues long, but a typical helix contains about ten amino acids (about three turns).

The α-helix is also called the 3.6₁₃ helix, compared to the π-helix (4.4₁₆) and the 3₁₀ helix.

Proline is the α-helix breaker.

Different amino-acid sequences have different propensities for forming α-helical structure. Methionine, alanine, leucine, uncharged glutamate, and lysine ("MALEK" in the amino-acid 1-letter codes) all have especially high helix-forming propensities, whereas proline and glycine have poor helix-forming propensities. Proline either breaks or kinks a helix, both because it cannot donate an amide hydrogen bond (having no amide hydrogen) and because its side-chain interferes sterically with the backbone of the preceding turn. Inside a helix, this forces a bend of about 30° in the helix axis.

Propensities for forming α-helical structure

Examples of α-Helical Proteins:

α-helical coiled coil proteins:

Form superhelix

Found in myosin, tropomyosin (muscle), fibrin (blood clots), keratin (hair)

Hair

Fingernails and wool are also α-helical proteins; silk is a β-sheet protein

β-sheet (β-pleated sheet)

A polypeptide chain, called a β-strand, in a β-sheet is almost fully extended rather than being tightly coiled as in the α-helix

The distance between adjacent amino acids along a strand is approximately 3.5 Å, in contrast to a distance of 1.5 Å along an α-helix

A β-sheet is formed by linking two or more strands lying next to one another through hydrogen bonds

All residues in a β-sheet have nearly the same φ and ψ angles

Hydrogen bonds can form only between adjacent polypeptide strands

R groups are directed above and below the backbone
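The per-residue distances quoted on the slides (about 1.5 Å along an α-helix, about 3.5 Å along a β-strand) and the 3.6 residues per helical turn make it easy to compare how long the same stretch of residues is in each conformation. The sketch below simply multiplies these approximate figures; the constant and function names are invented for this example.

```python
HELIX_RISE_A = 1.5       # approximate rise per residue along an alpha-helix, in angstroms
STRAND_RISE_A = 3.5      # approximate distance per residue along a beta-strand, in angstroms
RESIDUES_PER_TURN = 3.6  # residues per turn of an alpha-helix


def helix_length(n_residues: int) -> float:
    """Approximate axial length of an alpha-helix, in angstroms."""
    return n_residues * HELIX_RISE_A


def strand_length(n_residues: int) -> float:
    """Approximate length of an extended beta-strand, in angstroms."""
    return n_residues * STRAND_RISE_A


def helix_turns(n_residues: int) -> float:
    """Approximate number of turns in an alpha-helix."""
    return n_residues / RESIDUES_PER_TURN


if __name__ == "__main__":
    n = 10
    print(helix_length(n), strand_length(n), round(helix_turns(n), 2))
```

Ten residues span roughly 15 Å as a helix but roughly 35 Å as a strand, and ten helical residues make a little under three turns, in line with the "about three turns" quoted for a typical helix.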

Parallel or Anti-parallel β-Sheet

• The adjacent polypeptide chains in a β-sheet can be either parallel or anti-parallel (having the same or opposite amino-to-carboxyl orientations, respectively).

H bonds between 2 same aa

H bonds between different aa

Examples of β-sheet Proteins:

Fatty acid binding protein -> β-barrel structure

Antibodies

OmpX: E. coli porin

more β sheets

Tertiary Structure: 3D structure of a polypeptide chain

Quaternary Structure: Polypeptide chains assemble into multisubunit structures

Cell-surface receptor CD4

Tetramer of hemoglobin

Deoxyhaemoglobin

QUATERNARY STRUCTURE

β-Turns and Loops

• β-turns allow the protein backbone to make abrupt turns.

• Again, the propensity of a peptide for forming β-turns depends on its sequence.

• In these reverse turns, the CO group of residue i of a polypeptide is hydrogen bonded to the NH group of residue i + 3

• In other cases, more elaborate structures are responsible for chain reversals. These structures are called loops or sometimes Ω loops (omega loops) to suggest their overall shape

Why not here

Random coil

Not really random structure, just non-repeating

‘Random’ coil has fixed structure within a given protein

Commonly called ‘connecting loop region’

Structure determined by bonding of side chains (i.e. not necessarily hydrogen bonds)

From: Elliott, WH. Elliott, DC. (1997) Biochemistry and Molecular Biology. Oxford: Oxford University Press. p27

Tertiary protein structure

Secondary structures fold and pack together to form tertiary structure

Usually globular shape

But can be fibrous

Tertiary structure stabilized by bonds between R groups (i.e. side-chains)

Tertiary structure = global folding of a protein chain

Tertiary structures are quite varied

Quaternary structures

Each Protein has a unique structure

Amino acid sequence

NLKTEWPELVGKSVEEAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAEVPRVG

Folding!

Protein Folding

Folding is a highly cooperative process (all or none)

Folding by stabilization of intermediates

Protein Folding by Chaperones

• Chaperone proteins provide a site where misfolded proteins can fold correctly.

Central Dogma

DNA
↓ Transcription
Pre-mRNA (hnRNA)
↓ Splicing, processing and maturation
mRNA
↓ Translation
Protein

Chaperonins

Chaperonins Assist in Protein Folding

They segregate protein folding from “bad influences” in the cell

Classes of proteins

Functional definition:
• Enzymes: Accelerate biochemical reactions
• Structural: Form biological structures
• Transport: Carry biochemically important substances
• Defense: Protect the body from foreign invaders

Structural definition:
• Globular: Complex folds, irregularly shaped tertiary structures
• Fibrous: Extended, simple folds -- generally structural proteins

Cellular localization definition:
• Membrane: In direct physical contact with a membrane; generally water insoluble.
• Soluble: Water soluble; can be anywhere in the cell.

Components of Tertiary Structure

• Fold: used differently in different contexts; most broadly, a reproducible and recognizable 3-dimensional arrangement

• Domain: a compact and self-folding component of the protein that usually represents a discrete structural and functional unit

• Motif (aka supersecondary structure): a recognizable subcomponent of the fold; several motifs usually comprise a domain

As in all fields, these terms are not used strictly, which makes capturing data that conforms to them all the more difficult

Protein Structure Computational Goals

• Compare all known structures to each other
• Compute distances between protein structures
• Classify and organize all structures in a biologically meaningful way
• Discover conserved substructure domains
• Discover conserved substructural motifs
• Find common folding patterns and structural/functional motifs
• Discover relationship between structure and function
• Study interactions between proteins and other proteins, ligands and DNA (Protein Docking)
• Use known structures and folds to infer structure from sequence (Protein Threading)
• Use known structural motifs to infer function from structure
• Many more…

http://creativecommons.org/licenses/by/3.0/

Structural Classification of Proteins (SCOP)

http://scop.berkeley.edu/

• Class
  o Similar secondary structure content
  o All α, all β, alternating α/β, etc.

• Fold (Architecture)
  o Major structural similarity
  o SSEs in similar arrangement

• Superfamily (Topology)
  o Probable common ancestry
  o HMM family membership

• Family
  o Clear evolutionary relationship
  o Pairwise sequence similarity > 25%


Classes of Protein Structures

• Mainly α
• Mainly β
• α/β alternating
  o Parallel β sheets, β-α-β units
• α+β
  o Anti-parallel β sheets, segregated α and β regions
  o Helices mostly on one side of sheet


Classes of Protein Structures

• Others
  o Multi-domain, membrane and cell surface, small proteins, peptides and fragments, designed proteins


Folds / Architectures

• Mainly α
  o Bundle
  o Non-Bundle

• Mainly β
  o Single sheet
  o Roll
  o Barrel
  o Clam
  o Sandwich
  o Prism
  o 4/6/7/8 Propeller
  o Solenoid

• α/β and α+β
  o Closed
    • Barrel
    • Roll, ...
  o Open
    • Sandwich
    • Clam, ...


The TIM Barrel Fold


A Conceptual Problem ...


Fold versus Topology

Another example: Globin vs. Colicin


PDB Protein Data Bank
http://www.rcsb.org/pdb/

• Protein Data Bank
  o Multiple Structure Viewers
  o Sequence & Structure Comparison Tools
  o Derived Data
    • SCOP
    • CATH
    • Pfam
    • GO Terms
  o Education on Protein Structure
  o Download Structures and Entire Database


Web services for domain identification

Program        Web access
DIAL           http://www.ncbs.res.in/~faculty/mini/ddbase/dial.html
DomainParser   http://compbio.ornl.gov/structure/domainparser
DOMAK          http://www.compbio.dundee.ac.uk/Software/Domak/domak.html
PDP            http://123d.ncifcrf.gov/pdp.html

Protein structure prediction has remained elusive for over half a century

“Can we predict a protein structure from its amino acid sequence?”

Table 6-4

Protein Misfolding Diseases

Misfolded proteins and Resulting Disorders

• Alzheimer’s disease: misfolding causes protein fibrillation

• Prions: misfolded proteins causing serious illnesses in animals and humans

• Prions cause BSE (“mad cow disease”) in cattle

A normal prion (left), compared to an aberrant, disease-causing prion (right).

Cellular processing of PrP. (1) PrP can be internalized before degradation by the proteasome or lysosomal proteases. (2) In PrPsc, processing results in limited proteolysis. Limited degradation produces PrPsc fragments, which accumulate over time and may have a role in cell death. These fragments lead to propagation of the PrPsc infection in adjacent cells.

A) Normal PrP can refold into PrPsc in the extracellular space. B) Fragments of PrPsc may remain within the cell or may be externalized by transport vesicles or by cellular rupture upon death. C) Intracellular PrPsc could interact with PrP during intracellular processing, resulting in conversion of PrP to PrPsc in the infected cell. D) Intracellular PrP may spontaneously change conformation to PrPsc.

MOLECULAR BIOLOGY OF PRION DISEASE

Possible routes of propagation of ingested prions. After oral uptake, prions may penetrate the intestinal mucosa through M cells and reach Peyer's patches as well as the enteric nervous system. Depending on the host, prions may replicate and accumulate in spleen and lymph nodes. Myeloid dendritic cells are thought to mediate transport within the lymphoreticular system. From the lymphoreticular system and likely from other sites, prions proceed along the peripheral nervous system to finally reach the brain, either directly via the vagus nerve or via the spinal cord, with involvement of the sympathetic nervous system.

PRIONS CONT.

Sheep with scrapie

Kuru and Creutzfeldt-Jakob disease in humans

How To Determine Protein Structure ?

Protein Structure Prediction

Structure: traditional experimental methods
• X-ray or NMR to solve structures
• Generate a few structures per day worldwide
• Cannot keep pace with new protein sequences

Strong demand for structure prediction
• More than 30,000 human genes
• 10,000 genomes will be sequenced in the next 10 years

Unsolved problem after two decades of effort.

Protein structure and functions are intimately related

Proteins interact with each other

The structure of a protein influences its function by determining the other molecules with which it can interact and the consequences of those interactions.

Experimental methods available to detect protein structure and interactions vary in their level of resolution.

These observations can be classified into four levels: (a) atomic scale, (b) binary interactions, (c) complex interactions, and (d) cellular scale.

Atomic-scale methods: showing the precise structural relationships between interacting atoms and residues

The highest resolution methods: e.g., X-ray crystallography and NMR

Not yet applied to study protein interactions in a high-throughput manner.

Binary-interaction methods: methods to detect interactions between pairs of proteins

Do not reveal the precise chemical nature of the interactions but simply report that such interactions take place

The major high-throughput technology: the yeast two-hybrid system

Complex-interaction methods: methods to detect interactions between multiple proteins that form complexes.

Do not reveal the precise chemical nature of the interactions but simply report that such interactions take place.

The major high-throughput technology: systematic affinity purification followed by mass spectrometry

Cellular-scale methods: methods to determine where proteins are localized (e.g., immunofluorescence)

It may be possible to determine the function of a protein directly from its localization

Principles of protein-protein interaction analysis

These small-scale analysis methods are also useful in proteomics because the large-scale methods tend to produce a significant number of false positives

They include (a) genetic methods, (b) bioinformatic methods, (c) affinity-based biochemical methods, and (d) physical methods.

Genetic methods

Classical genetics can be used to investigate protein interactions by combining different mutations in the same cell or organism and observing the resulting phenotype

Suppressor mutation: A secondary mutation that can correct the phenotype of a primary mutation.

Suppressor mutation

Synthetic lethal effect

Bioinformatic methods

(A) The domain fusion method (or Rosetta stone method):

The sequence of protein X (a single-domain protein from genome 1) is used as a similarity search query on genome 2. This identifies any single-domain proteins related to protein X and also any multi-domain proteins, which we can define as protein X-Y.

As part of the same protein, domains X and Y are likely to be functionally related.

The sequence of domain Y can then be used to identify single-domain orthologs in genome 1.

Thus, Gene Y, formerly an orphan with no known function, becomes annotated due to its association with Gene X. The two proteins are also likely to interact.

The sequence of protein X-Y may also identify further domain fusions, such as protein Y-Z. This links three proteins into a functional group and possibly identifies an interacting complex.
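The logic of the domain fusion method can be sketched with toy data. In the example below each protein is reduced to a tuple of domain labels, which stands in for the similarity searches described above; this simplification, the function name `rosetta_stone_links`, and the protein names are all assumptions made for illustration, not part of any published tool.

```python
from itertools import combinations


def rosetta_stone_links(genome1, genome2):
    """Find domain pairs that are separate proteins in genome 1 but fused in genome 2.

    Each genome is a dict mapping a protein name to a tuple of domain labels.
    Returns a set of (domain_a, domain_b, fusion_protein) triples.
    """
    # Domains that occur as single-domain proteins in genome 1.
    singles = {domains[0] for domains in genome1.values() if len(domains) == 1}
    links = set()
    for name, domains in genome2.items():
        if len(domains) < 2:
            continue  # only multi-domain (fused) proteins are informative
        shared = sorted(set(domains) & singles)
        for a, b in combinations(shared, 2):
            links.add((a, b, name))
    return links


if __name__ == "__main__":
    genome1 = {"protX": ("X",), "protY": ("Y",), "protZ": ("Z",)}
    genome2 = {"fusionXY": ("X", "Y"), "fusionYZ": ("Y", "Z")}
    for link in sorted(rosetta_stone_links(genome1, genome2)):
        print(link)
```

With these toy genomes the fusions X-Y and Y-Z link X, Y and Z into one putative functional group, mirroring the three-protein example on the slide.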

Bioinformatic methods

(B) The phylogenetic profile:

It describes the pattern of presence or absence of a particular protein across a set of organisms whose genomes have been sequenced. If two proteins have the same phylogenetic profile (that is, the same pattern of presence or absence) in all surveyed genomes, it is inferred that the two proteins have a functional link.

A protein’s phylogenetic profile is a nearly unique characterization of its pattern of distribution among genomes. Hence any two proteins having identical or similar phylogenetic profiles are likely to be engaged in a common pathway or complex.
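A phylogenetic profile is naturally represented as a vector of 1s and 0s, one entry per surveyed genome. The sketch below compares such vectors by counting mismatches; the protein names, the profiles, and the `max_mismatches` threshold are hypothetical values chosen only to illustrate the idea of "identical or similar" profiles.

```python
from itertools import combinations


def mismatches(profile_a, profile_b):
    """Number of genomes in which exactly one of the two proteins is present."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def linked_pairs(profiles, max_mismatches=0):
    """Pairs of proteins whose presence/absence profiles differ in at most max_mismatches genomes."""
    return [
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if mismatches(profiles[p], profiles[q]) <= max_mismatches
    ]


if __name__ == "__main__":
    # 1 = protein present in that genome, 0 = absent (toy data for four genomes).
    profiles = {
        "P1": (1, 0, 1, 1),
        "P2": (1, 0, 1, 1),
        "P3": (0, 1, 1, 0),
    }
    print(linked_pairs(profiles))  # [('P1', 'P2')]
```

Raising `max_mismatches` relaxes the criterion from identical to similar profiles, at the cost of more false links.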

Sequence to Structure to Function

>132L:_ LYSOZYME (E.C.3.2.1.17)
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL

Cell wall degrading enzyme
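The lysozyme record above is in FASTA format: a header line starting with ">" followed by the sequence. A minimal parser can be sketched as follows; `parse_fasta` is a hand-written helper for this example (not a library function), and counting cysteines is just one simple thing to do with the parsed sequence, since cysteine pairs are what form disulfide bonds.

```python
FASTA = """>132L:_ LYSOZYME (E.C.3.2.1.17)
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
"""


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    header, sequence = parse_fasta(FASTA)[0]
    print(header)
    print("residues:", len(sequence))
    print("cysteines:", sequence.count("C"))
```

An even number of cysteines, as found here, is at least consistent with all of them being paired in disulfide bonds, although the sequence alone cannot say which pairs form.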

Correlation Between Structure & Function

• Homologous proteins
  o Conserved sequence, similar structure and function
  o Example: cytochrome c

• Similar function, different sequences
  o Conserved and variable regions
  o Example: dehydrogenases, kinases

• Similar structure, different function
  o Example: thioredoxin

Why must we predict structures?

Limitations of current techniques
• Proteins often too large for molecular modeling techniques
• Difficult to crystallize some proteins (X-ray), slow throughput
• Difficulty getting NMR results, reliance on modeling

Far more sequences elucidated than structures

3D structures are better conserved than sequence during evolution.

Predicting 3D structures from Sequence?

Levinthal’s paradox
• Protein with 100 amino acids => 3^100 possible structures
• 10^-13 seconds to sample each structure
• 1.6 × 10^27 years to go through every structure

Models improve these odds
• Based on structure stability, x-ray crystallography
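The Levinthal estimate can be checked with a few lines of arithmetic. The sketch below uses the figures from the slide (3 conformations per residue, 100 residues, 10^-13 seconds per conformation); the length of a year in seconds is the only value added here, and the variable names are chosen for this example.

```python
CONFORMATIONS_PER_RESIDUE = 3
N_RESIDUES = 100
SECONDS_PER_SAMPLE = 1e-13
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds


def levinthal_years(n_residues=N_RESIDUES):
    """Years needed to sample every conformation once at the given rate."""
    conformations = CONFORMATIONS_PER_RESIDUE ** n_residues
    return conformations * SECONDS_PER_SAMPLE / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{levinthal_years():.2e} years")  # roughly 1.6e27 years
```

The result is vastly longer than the age of the universe, which is the point of the paradox: proteins cannot fold by exhaustively trying every conformation.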

Structure prediction methods

Ab initio
• Determining structure without reference to existing protein structures.

Comparative/Homology modeling
• Determines structure based on sequence similarity.

Fold recognition/threading
• Limited number of folds
• Determine structure similarities independent from sequence similarity.

http://www.bmm.icnet.uk/people/rob/CCP11BBS/

Structure Prediction Process

Protein Structure-function paradigm

• Origins in the lock and key model for enzymatic activity.
• Claims that rigid 3D structure of protein determines the function.
• Active areas of protein structure, for example active sites on enzymes, are highly conserved; other regions are more variable.
• Conserved motifs are responsible for conserved functionality.
• Forms the basis of proteomic studies and many other branches.
• Homology is claimed to be responsible for the correlation.

Structure Similarity

• Refers to how well (or poorly) 3D folded structures of proteins can be aligned
• Is expected to reflect functional similarities (interaction with other molecules)

Proteins in the TIM barrel fold family

• 2000: ~ 20,000 structures in PDB, ~ 4,000 different folds (1:5 ratio)
• Three possible reasons:
  - evolution,
  - physical constraints (e.g., few ways to maximize hydrophobic interactions),
  - limits in techniques used for structure determination
• Given a new structure, the probability is high that it is similar to an existing one
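One common way to put a number on "how well two structures can be aligned" is the root-mean-square deviation (RMSD) between matched atoms. The sketch below is a simplified illustration, not the method used by any particular tool: it assumes the two structures are already superimposed and that atoms are paired one-to-one in the same order, and it skips the optimal-superposition step that real structure-comparison programs perform first.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates in angstroms: the second set is the first shifted by 1 along z.
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(a, b))  # 1.0
```

An RMSD of zero means the paired atoms coincide; larger values mean the structures are less alike.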

Sequence → Structure → Function

sequence similarity

structure similarity

Why Compare Protein Folded Structures?

• Low sequence similarity may yield very similar structures
• Sometimes high sequence similarity yields different structures
• Structure comparison is expected to provide more pertinent information about functional (dis-)similarity among proteins, especially with non-evolutionary relationships or non-detectable evolutionary relationships

Extensions of Paradigm

Structure-function paradigm

• Assisted protein folding
• Allosteric interactions
• Proteins as biomachines
• Enzyme catalysis
• 3D structure analysis
• De novo proteins
• Protein self organization
• Protein misfolding and diseases
• Biomedicine
• Protein engineering
• Proteomics
• Biotechnology

Paradigms in structure-function theory

• Orthologues possess similar function.
• Enzyme homologues are enzymes.
• Regulatory domain homologues are not enzymes.
• Equivalent cellular functions are mediated in different species by homologues.
• Coding regions mutate at a slower rate than non-coding regions.
• Domain homologues are localised in sequence and 3D structure and possess the same order of secondary structures.
• Disulphide bridges are invariant among homologues.
• Convergent evolution of sequences does not occur.
• Domains possess single conformations.

Function Assessment

Statistical analysis is hard to apply to functionality assessment.

Function prediction by homology is thus qualitative, requiring expert knowledge and careful study.

Assignment of experimental knowledge from one homologue to an uncharacterized sequence is the basis of function prediction.

Works best in the case of orthologues; can be misleading in paralogues.

Orthologue identification is the most powerful tool in molecular function prediction. Paralogues can also have overlapping functionality, esp. in eukaryotes.

105

Fold-function Correlation

Common folds are found in unrelated protein families.

Folds accommodating many families are called “superfolds”, e.g. TIM-barrel.

Folds in combination define overall function.

Function is better assessed as a whole of parts.

106

Exceptions to the rule – natively unfolded proteins

A class of proteins with inherently unstable structure, yet functional. Ex: regulatory proteins.

Unfolded in the physiological state; may fold during the functional cycle.

Lack of fixed structure allows binding to multiple targets.

Target induces folding in the protein. Ex: protein-DNA or protein-RNA interactions.

Unfolded proteins are easier to transfer across membranes.

107

Ground rules for Structure Prediction

Don't always believe what programs tell you: they're often misleading & sometimes wrong!

Don't always believe what databases tell you: they're often misleading & sometimes wrong!

Don't always believe what others tell you: they're often misleading & sometimes wrong!

In short, don't be a naive user: when computers are applied to biology, it is vital to understand the difference between mathematical & biological significance

Computers don't do biology, they do sums quickly!

108

Implication of Protein Structure and Function

109

Structure-Based Drug Design

HIV protease inhibitor

Structure-based rational drug design is still a major method for drug discovery.

110

CD4 on Mini Scaffold

Rational engineering of a mini-protein that reproduces the core of the CD4 site interacting with HIV-1 envelope glycoprotein

Vita, C. et al. Proc. Natl. Acad. Sci. USA (1999)

111

The Envelope

• Bi-layer lipid outer coat

• Layer of matrix protein p17

• ~72 copies of a complex HIV protein, called spikes, project through the surface of the virus particle (Gelderblom et al., Virology 1987)

• Spike protein

• Cap (3 gp120 molecules)

• Stem (3 gp41 molecules)

[Figure: HIV particle with the Envelope and Viral Core labeled (Gelderblom et al. 1987)]

112

The Viral Core

• Bullet-shaped core or capsid made of viral protein p24

• The capsid surrounds 2 single strands of HIV RNA, each of which has a copy of the 9 viral genes

• ‘gag’, ‘pol’ and ‘env’ code for structural proteins to make new virus particles

• ‘env’ codes for gp160, which is cleaved by a host cell protease to form gp120 and gp41 (Janeway et al. 1999)

• ‘rev’, ‘vif’, ‘vpr’, ‘nef’, ‘tat’ and ‘vpu’: infection and replication

• REVERSE TRANSCRIPTASE, INTEGRASE and PROTEASE

113

OUR IMMUNE SYSTEM

Lymphocyte

T-Cell

Helper: recognizes antigen and releases cytokines, which signal B-cells to produce antibody; helps differentiation of B-cells

Suppressor: after the battle, stops antibody formation and slows down the activity of B-cells and other T-cells

Memory: memorizes the antigen and helps in quick response on the next attack

Cytotoxic T-cell: recognizes and directly kills infected cells

B-Cell

Plasma cells: produce antibody, i.e. make enough receptor molecules in soluble form, which bind to the microorganism

Memory cells: same as memory T-cells; both work together

LGL (Large Granular Lymphocyte) or Natural Killer Cell

Function known fully: kills tumor cells and virus-infected cells

114

Four basic kinds of T cell – T helper

Secretion of chemical messengers (cytokines), which in turn stimulate more T helper cells.

So the T cells must have a particular receptor molecule to receive this message

This receptor molecule is referred to as a CD, or Cluster of Differentiation (around 130 CDs have been identified so far)

CD4 is one of these receptors; it is the main target HIV uses to anchor to the T-cell, thereby gaining entry to the cell and replicating there

T-Cell

115

gp41 Fusogenic domain mediates Fusion

116

Fight Against AIDS

Reverse transcriptase, integrase and protease are the enzymes targeted in designing anti-HIV drugs

9 of 15 FDA-approved drugs target reverse transcriptase, e.g. zidovudine, nevirapine, delavirdine

These are big molecules and have severe side effects, mainly on the kidney

Most of the time they are not very effective, because the viral genome is able to undergo numerous mutations in its critical areas

Co-receptor (CCR5/CXCR4) blocking is possible but has low efficiency

The other way to prevent HIV attack may be to block the viral glycoprotein from coming into contact with the CD4 receptor of the T-cell

Fool the virus: design a CD4 mimic using a mini scaffold

A group from France used scorpion toxin as a scaffold and designed a chimeric protein which mimics CD4 activity

117

The whole of CD4 does not bind to gp120; only one domain binds. D1 is the most important domain of CD4 for binding to gp120

D1 has a CDR2-like loop, which is the main part of the D1 domain that interacts with gp120

Kwong et al. solved the structure of the CD4-gp120 complex

The solved structure showed that Phe at position 43 and Arg at position 59 are important for binding of CD4 to gp120

Interaction between CD4 and gp120

CD4

118

Designing CD4 mimic using mini scaffold

120

Designing of CD4M

Solvent-exposed amino acid residues of the CDR2-like loop of CD4 were transferred to a charybdotoxin scaffold

The chimeric miniprotein designed was 33 amino acid residues long

Solvent exposed residues

D1 domain of CD4

Charybdotoxin scaffold

121

Implications of designing a CD4 mimic

The designed mimic can be used as an antiviral agent

In complex with viral coat proteins the CD4 mimic can be used to formulate a vaccine against AIDS

The designed CD4 mimic can be used for developing broad spectrum neutralizing antibodies

122

Fight Against AIDS

123

• Samson, M. et al. Resistance to HIV-1 infection in Caucasian individuals bearing mutant alleles of the CCR5 chemokine receptor gene. Nature 382, 722–725 (1996)

• Liu, R. et al. Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection. Cell 86, 367–377 (1996)

Homozygous Δ32 deletion in the HIV co-receptor CCR5 confers resistance to HIV infection

CCR5-Δ32 (or CCR5-D32 or CCR5 delta 32) is a genetic variant of CCR5

This allele is found in 5-14% of Europeans but is rare in Africans and Asians

It has been hypothesized that this allele was favored by natural selection during the Black Death (1347), one of the worst epidemics in history, in which 1/3 of the population of Europe died
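Because resistance is tied to the homozygous genotype, it can help to see roughly how rare homozygotes would be at the quoted frequencies. The sketch below is only a back-of-the-envelope illustration under two assumptions that the slide does not state: that "5-14%" is the allele frequency q, and that the population is in Hardy-Weinberg equilibrium, so the homozygous fraction is q². The function and key names are made up for this example.

```python
def hardy_weinberg(q):
    """Expected genotype frequencies for a variant allele of frequency q.

    Assumes Hardy-Weinberg equilibrium (an assumption made for this sketch).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = 1.0 - q  # frequency of the wild-type allele
    return {
        "wild_type": p * p,
        "heterozygous": 2.0 * p * q,
        "homozygous_delta32": q * q,
    }


# Lower and upper ends of the range quoted on the slide, read as allele frequencies.
for q in (0.05, 0.14):
    freqs = hardy_weinberg(q)
    print(f"q = {q:.2f}: homozygous fraction = {freqs['homozygous_delta32']:.4f}")
```

Under these assumptions, only about 0.25% to 2% of individuals would carry two copies of the allele, far fewer than the number of carriers.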

124

The authors created CCR5-mutant T-cells and used them in vitro and in an in vivo mouse model to show that the mutation confers complete resistance to HIV

They used an engineered zinc finger nuclease to target human CCR5 efficiently, generating a double-strand break at a predetermined site in the CCR5 coding region, as in the CCR5-Δ32 genotype

126

BIOINFORMATICS took the leading role for this development

127

Bioinformatics Bottlenecks in Bangladesh

Lack of Facilities

Lack of coordinated research

Improper course curriculum in Statistics

Improper course curriculum in Biology

128

THANK YOU

129
