Introduction to
Bioinformatics
Introduction to Bioinformatics.
LECTURE 6: Natural selection at the molecular basis
* Chapter 6: Fighting HIV
6.1 Acquired Immune Deficiency Syndrome (AIDS)
* First noticed in 1979 as a peculiar disease in the US
* Recognized as a transmissible disease only in 1981: AIDS
* Infectious agent: HIV (Human Immunodeficiency Virus)
* Still not curable; more than 20 million victims; expensive medication (e.g. AZT) is needed to keep the virus in check
* How does HIV manage to evade our attempts to destroy it?
HIV is a retrovirus
A retrovirus is an enveloped virus that has an RNA genome and replicates via a DNA intermediate.
Retroviruses rely on the enzyme reverse transcriptase to reverse-transcribe their genome from RNA into DNA, which can then be integrated into the host's genome by an integrase enzyme.
Scanning electron micrograph of HIV-1 budding from a lymphocyte.
THE WORLD
Mark Newman (http://www-personal.umich.edu/~mejn/)
PEOPLE LIVING WITH HIV/AIDS
Mark Newman (http://www-personal.umich.edu/~mejn/)
6.2 Evolution and natural selection
1859: Charles Darwin, On the Origin of Species by Means of Natural Selection.
At the molecular level, natural selection:
* Removes deleterious mutations: purifying or negative selection
* Promotes the spread of advantageous mutations: positive selection
6.3 HIV and the human immune system
* HIV has a 9.5 kb RNA genome and no DNA
* HIV is a retrovirus: RNA → DNA
* HIV recognizes helper T-cells of the human immune system
* Infected T-cells display viral proteins on their surface that can be recognized by the immune system
* Short reproduction span: 1.5 days to reproduce
* RNA genome: high error rate during replication
Fast reproduction + High error rate =
FAST EVOLUTION
Evolutionary arms race between human immune system and HIV
6.4 Quantifying natural selection on DNA sequences
* Mutations arise in the germ-line of one single individual and eventually become fixed in the population
* We observe fixed mutations as differences between individuals
* Most fixed mutations are neutral: genetic drift
* Some 80-90% of the non-neutral mutations are detrimental to the organismal function.
* A very small fraction of mutations is advantageous – but this is the engine for evolution.
* How to measure whether mutations are neutral, deleterious, or advantageous?
* Experimentally very difficult: it requires short-lived, simple organisms and large populations (typically a virus)
* Alternative: count the number of mutations that change the protein and those that don’t
* These are the non-synonymous and synonymous mutations, respectively.
Remember the translation from nucleotides to amino acids
(read from the centre outwards)
* Synonymous mutation: the new codon codes for the same amino acid, for example GTT (Val) → GTA (Val).
* Non-synonymous mutation: the new codon codes for a different amino acid, for example TTT (Phe) → TTA (Leu).
* Mutations in the first position are sometimes synonymous (5%)
* Mutations in the second position are never synonymous
* Mutations in the third position are mostly synonymous
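The per-position claims above can be checked against the standard genetic code. The following is a minimal Python sketch, not part of the lecture: the helper names are hypothetical, and it assumes one particular counting convention (only changes between sense codons are counted; changes to or from stop codons are ignored). Under that convention the first-position fraction comes out close to the 5% quoted above and the second-position fraction is exactly zero.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def is_synonymous(codon_a, codon_b):
    """True if both codons code for the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


def synonymous_fraction_by_position():
    """Fraction of single-base changes that are synonymous, per codon position.

    Assumption: changes to or from stop codons are left out of the count.
    """
    syn = [0, 0, 0]
    total = [0, 0, 0]
    for codon, amino in CODON_TABLE.items():
        if amino == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == "*":
                    continue
                total[pos] += 1
                syn[pos] += is_synonymous(codon, mutant)
    return [s / t for s, t in zip(syn, total)]


print(is_synonymous("GTT", "GTA"))  # Val -> Val
print(synonymous_fraction_by_position())
```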
* Almost all synonymous mutations are neutral.
* A priori, there are many more non-synonymous mutations possible than synonymous.
* In most genes 70% of the mutations are non-synonymous
* KA: number of non-synonymous substitutions per non-synonymous site
* KS: number of synonymous substitutions per synonymous site
Motoo Kimura (1977):
Comparison of the non-synonymous to the synonymous substitutions in a gene tells us about the strength and form of the natural selection, i.e.: the ratio KA / KS.
Reasoning:
* Advantageous mutations are very rare
* Deleterious mutations will ‘not’ spread through a population
* Therefore, most mutations that become fixed are neutral
Strong negative selection → Few non-synonymous substitutions
* f0 = fraction of non-synonymous mutations that are neutral
* v = mutation rate
* Number of non-synonymous mutations after time t: KA = v f0 t
* Number of synonymous mutations after time t: KS = v t
* KA / KS = f0
* Strong negative selection: f0 is small thus KA / KS < 1
* If KA / KS is > 1 this is evidence for advantageous non-synonymous mutations
* Define: α = fraction of non-synonymous mutations that are advantageous
* Then after time t : KA = v(f0 + α)t
* and: KA / KS = f0 + α
* Thus KA / KS is a gauge for the natural selection acting on genes
* negative selection dominates: KA / KS < 1
* positive selection dominates: KA / KS > 1
* But averaged over the gene!
6.5 Estimating KA/KS
How to determine KA/KS?
Simplest way: count the synonymous and non-synonymous sites, and likewise the synonymous and non-synonymous differences, between two aligned sequences.
Correct for multiple substitutions at the same site (e.g. with the Jukes-Cantor model).
This yields a normalized ratio.
The algorithm of Masatoshi Nei and Takashi Gojobori (1986) is based on this idea. It assumes that:
* Transitions and transversions occur at the same rate
* There is no bias in codon usage (i.e. no information on the ensuing protein)
Nei-Gojobori algorithm
* Consider two aligned homologous sequences without gaps s1 and s2
* Sc = #synonymous sites between s1 and s2
* Ac = #non-synonymous sites between s1 and s2
* Sd = #synonymous differences between s1 and s2
* Ad = #non-synonymous differences between s1 and s2
* As the two sequences s1 and s2 are aligned there should be a correspondence between their codons.
NOTE: point mutations act on nucleotides, not on codons, but here we analyse whether a mutation results in a different amino acid.
STEP 1: Count A and S sites
Example:
Consider the alignment : TTTTTA
This is – say – the k-th codon of a sequence.
Now define:
sc(ck) = #synonymous sites in this codon
ac(ck) = 3 - sc(ck) = #non-synonymous sites in this codon
fi: fraction of changes at the i-th position of the codon that result in a synonymous change (i = 1, 2, 3)
Then:
sc(ck) = ∑ fi and ac(ck) = 3 - sc(ck) = 3 - ∑ fi
In our example:
Codon: TTA codes for: Leucine
The 6 synonyms for Leucine (table 2.2 chapter 2, p. 27):
CTA CTG CTC CTT TTA TTG
f1: 1 synonymous change (ATA(-), GTA(-), CTA(+)) out of 3, so f1 = 1/3
f2: 0 synonymous changes (TAA(-), TGA(-), TCA(-)) out of 3, so f2 = 0/3
f3: 1 synonymous change (TTG(+), TTC(-), TTT(-)) out of 3, so f3 = 1/3
So:
sc(ck) = ∑ fi = 2/3
ac(ck) = 3 - sc(ck) = 3 - ∑ fi = 7/3
For a DNA sequence of r codons:
Sc = ∑k=1:r sc(ck)
Ac = 3r - Sc
For multiple sequences: average these quantities
Note: do not include the STOP codon
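Step 1 can be sketched in a few lines of Python. This is an illustrative sketch rather than the book's code: the function names `syn_sites` and `count_sites` are hypothetical, the input is assumed to be an in-frame sequence over T, C, A, G whose length is a multiple of 3, and, as in the TTA example above, a change into a stop codon is counted as non-synonymous.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """s_c for one codon: the sum of f_i over the three codon positions."""
    total = Fraction(0)
    for pos in range(3):
        # Number of the 3 possible base changes at this position that
        # leave the encoded amino acid unchanged.
        syn = sum(
            CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] == CODON_TABLE[codon]
            for b in BASES
            if b != codon[pos]
        )
        total += Fraction(syn, 3)
    return total


def count_sites(seq):
    """Return (S_c, A_c) for a coding sequence, skipping stop codons."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    codons = [c for c in codons if CODON_TABLE[c] != "*"]
    s_c = sum((syn_sites(c) for c in codons), Fraction(0))
    return s_c, 3 * len(codons) - s_c


print(syn_sites("TTA"))  # 2/3
print(count_sites("TTTTTA"))
```

Exact fractions are used so that the result for TTA can be compared directly with the hand calculation.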
STEP 2: Count A and S differences
Now define:
sd(ck) = #synonymous differences in this codon
ad(ck) = #non-synonymous differences in this codon (with a single nucleotide difference, ad(ck) = 1 - sd(ck))
Example:
sequence 1: GTT (Val)
sequence 2: GTA (Val)
There is only 1 difference and it is synonymous, so:
sd = 1 and ad = 0
Multiple nucleotide differences between two codons: if there are n differences between two codons (n = 0, 1, 2, 3), then there are n! pathways from the first to the second codon.
Example:
sequence 1: TTT (Phe)
sequence 2: GTA (Val)
The two possible pathways are:
pathway 1: TTT (Phe) ↔ GTT (Val) ↔ GTA (Val)
pathway 2: TTT (Phe) ↔ TTA (Leu) ↔ GTA (Val)
Pathway 1 has: 1 non-syn and 1 syn substitution
Pathway 2 has: 2 non-syn and 0 syn substitutions
Assume that both pathways occur with the same probability.
Therefore:
sd = 1 syn / 2 pathways = 0.5
ad = 3 non-syns / 2 pathways = 1.5
For a codon with n differences:
* Consider all n! pathways of n point mutations
* Evaluate sd and ad along each pathway as above
* Average over all pathways with equal weights
* The total numbers of syn and non-syn differences are:
Sd = ∑k=1:r sd(ck)
Ad = ∑k=1:r ad(ck)
Note: Sd + Ad is the total number of differences between the two sequences
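The pathway averaging of Step 2 can be sketched as follows. This is a hedged illustration, not the book's code: the function names are hypothetical, the sequences are assumed to be gap-free, in-frame and of equal length, and the slides do not say how intermediate stop codons are handled, so this sketch simply counts a step into or out of a stop codon as non-synonymous (an assumption).

```python
from fractions import Fraction
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_differences(c1, c2):
    """Return (s_d, a_d) for two codons, averaged over all n! pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return Fraction(0), Fraction(0)
    syn = nonsyn = paths = 0
    for order in permutations(positions):
        current = c1
        for pos in order:
            # Apply one point mutation and classify the step.
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        paths += 1
    return Fraction(syn, paths), Fraction(nonsyn, paths)


def count_differences(s1, s2):
    """Return (S_d, A_d) summed over all aligned codon pairs."""
    s_d = a_d = Fraction(0)
    for i in range(0, len(s1) - 2, 3):
        sd, ad = codon_differences(s1[i:i + 3], s2[i:i + 3])
        s_d += sd
        a_d += ad
    return s_d, a_d


print(codon_differences("TTT", "GTA"))
print(count_differences("GTTTTT", "GTAGTA"))
```

For the TTT/GTA pair this reproduces the hand calculation above (sd = 0.5, ad = 1.5), and S_d + A_d equals the number of differing nucleotides.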
STEP 3: Compute KA and KS
* Approximate the proportions of synonymous (ds) and non-synonymous (da) differences by:
$\hat{d}_s = S_d / S_c$ and $\hat{d}_a = A_d / A_c$
* Use the Jukes-Cantor correction to find the number of substitutions:
$K = -\frac{3}{4} \ln\left(1 - \frac{4}{3} d\right)$
Apply it to both ds and da to obtain KS and KA.
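Step 3 is a direct application of the Jukes-Cantor formula to the two proportions. A minimal Python sketch, assuming the four counts Sd, Ad, Sc and Ac are already available; the function names and the numbers in the usage lines are hypothetical and chosen only for illustration:

```python
import math


def jukes_cantor(d):
    """Jukes-Cantor correction K = -(3/4) ln(1 - (4/3) d), valid for 0 <= d < 3/4."""
    if not 0 <= d < 0.75:
        raise ValueError("proportion of differences must lie in [0, 0.75)")
    if d == 0:
        return 0.0
    return -0.75 * math.log(1 - 4 * d / 3)


def ka_ks_from_counts(Sd, Ad, Sc, Ac):
    """Return (KA, KS, KA/KS); the ratio is NaN when KS is zero."""
    ks = jukes_cantor(Sd / Sc)
    ka = jukes_cantor(Ad / Ac)
    ratio = ka / ks if ks > 0 else float("nan")
    return ka, ks, ratio


# Hypothetical counts, for illustration only.
ka, ks, ratio = ka_ks_from_counts(Sd=5, Ad=5, Sc=100, Ac=300)
print(ka, ks, ratio)
```

The correction is undefined once the proportion of differences reaches 3/4, so the sketch raises an error there instead of returning a value.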
SUMMARY of Nei-Gojobori algorithm:
see box on page 105 of the book
Remark: the algorithm is linear in the size of the sequences
6.6 Case study: natural selection and the HIV genome
* HIV is a fast-evolving virus
* HIV is a retrovirus: its genome is RNA, not DNA
* A KA/KS analysis over a whole gene is not very informative, as it averages over positive and negative selection
* A sliding-window plot gives information on selection pressure at a smaller scale
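A sliding-window analysis can be sketched by applying the three Nei-Gojobori steps to successive blocks of codons. The Python below is an illustrative sketch under stated assumptions, not the procedure used for the lecture's plots: all function names, the window size and the step size are hypothetical; sequences are assumed gap-free, in-frame and over T, C, A, G; steps through stop codons are counted as non-synonymous; and windows where the ratio is undefined (no synonymous substitutions, or a proportion of 3/4 or more) yield `None`.

```python
import math
from fractions import Fraction
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Synonymous sites of one codon: sum of f_i over the three positions."""
    total = Fraction(0)
    for p in range(3):
        mutants = [codon[:p] + b + codon[p + 1:] for b in BASES if b != codon[p]]
        total += Fraction(sum(CODE[m] == CODE[codon] for m in mutants), 3)
    return total


def syn_differences(c1, c2):
    """Return (s_d, number of differing positions), averaged over pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    orders = list(permutations(positions))  # [()] when the codons are equal
    syn = 0
    for order in orders:
        cur = c1
        for p in order:
            nxt = cur[:p] + c2[p] + cur[p + 1:]
            syn += CODE[nxt] == CODE[cur]
            cur = nxt
    return Fraction(syn, len(orders)), len(positions)


def jukes_cantor(d):
    """Jukes-Cantor distance; None when the correction is undefined."""
    if d == 0:
        return 0.0
    return -0.75 * math.log(1 - 4 * d / 3) if d < 0.75 else None


def ka_ks(s1, s2):
    """KA/KS for two aligned coding sequences; None if undefined."""
    n = min(len(s1), len(s2)) // 3
    pairs = [(s1[3 * k:3 * k + 3], s2[3 * k:3 * k + 3]) for k in range(n)]
    # Site counts are averaged over the two sequences.
    Sc = sum((syn_sites(a) + syn_sites(b)) / 2 for a, b in pairs)
    Ac = 3 * n - Sc
    Sd = Ad = Fraction(0)
    for a, b in pairs:
        sd, n_diff = syn_differences(a, b)
        Sd += sd
        Ad += n_diff - sd
    if Sc == 0 or Ac == 0:
        return None
    ks = jukes_cantor(float(Sd / Sc))
    ka = jukes_cantor(float(Ad / Ac))
    if ks is None or ka is None or ks == 0:
        return None
    return ka / ks


def sliding_ka_ks(s1, s2, window=30, step=3):
    """List of (first codon index, KA/KS) for windows of `window` codons."""
    n = min(len(s1), len(s2)) // 3
    result = []
    for start in range(0, n - window + 1, step):
        lo, hi = 3 * start, 3 * (start + window)
        result.append((start, ka_ks(s1[lo:hi], s2[lo:hi])))
    return result


# Toy sequences, for illustration only.
print(sliding_ka_ks("GTTGTTGTTTTT", "GTAGTTGTTTTA", window=2, step=1))
```

Small windows often contain no synonymous differences at all, which is why the sketch reports `None` rather than a number there; in practice the window has to be wide enough for both counts to be non-zero.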
* STEP 1: ORF finding
HIV-1 genome
* STEP 2: Nei-Gojobori to find high KA/KS ratios with sliding window plot.
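For the ORF-finding step, a very simple scanner is enough to illustrate the idea. This is a minimal sketch and not the method used in the lecture: the function name and the `min_codons` threshold are hypothetical, the sequence is assumed to be given in the DNA alphabet, and only the three forward reading frames are scanned, taking the first ATG after the previous stop as the start.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for each ATG...stop stretch in the forward frames.

    `start` is the index of the A in ATG and `end` is one past the stop codon,
    so seq[start:end] is the ORF including its stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it has enough codons before the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Toy sequence, for illustration only.
print(find_orfs("CCATGAAATTTTAGCC", min_codons=3))
```

Each reported stretch, with its stop codon removed, can then be passed to the KA/KS computation for a pair of aligned sequences.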
HIV epitopes: the ENV gene
An epitope is the part of a macromolecule that is recognized by the immune system, specifically by antibodies.
ENV: envelope and docking; strong selection pressure from the human immune system
HIV epitopes: the GAG polyprotein
1500 bp : viral core
Strong selection pressure from human immune system
Visualisation of the fast evolution of HIV with a phylogenetic tree
END of LECTURE 6