Variation Within and Between Species
![Page 1: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/1.jpg)
Introduction to Bioinformatics
![Page 2: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/2.jpg)
LECTURE 5: Variation within and between species
* Chapter 5: Are Neanderthals among us?
![Page 3: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/3.jpg)
Neandertal, Germany, 1856
Initial interpretations:
* bear skull
* pathological idiot
* Old Dutchman ...
![Page 4: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/4.jpg)
Introduction to Bioinformatics, LECTURE 5: INTER- AND INTRASPECIES VARIATION
![Page 5: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/5.jpg)
![Page 6: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/6.jpg)
![Page 7: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/7.jpg)
5.1 Variation in DNA sequences
* Even closely related individuals differ in their genetic sequences
* (Point) mutations: copying errors at a specific position
* Sexual reproduction: diploid genome
![Page 8: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/8.jpg)
Diploid chromosomes
![Page 9: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/9.jpg)
Mitosis: diploid cell division (diploid → diploid)
![Page 10: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/10.jpg)
Meiosis: diploid (=double) → haploid (=single)
![Page 11: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/11.jpg)
* Error rate of a very good typist: ~1 error per 1,000 typed letters
* Our diploid cells divide constantly, and every division copies ~7 billion letters
* Typical cellular copying error rate: ~1 error per 1 Gbp (one billion base pairs)
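The three rates above combine into a back-of-the-envelope estimate of how many new copying errors a single cell division introduces. A minimal Python sketch, assuming only the slide's rounded figures (7 × 10⁹ letters, 1 error per 10⁹ bases, 1 typing error per 1,000 letters):

```python
# Rough figures quoted on the slide (order-of-magnitude values only).
DIPLOID_GENOME_LETTERS = 7e9   # ~7 billion bases copied per cell division
CELL_ERROR_RATE = 1e-9         # ~1 copying error per 1 Gbp
TYPIST_ERROR_RATE = 1e-3       # ~1 error per 1,000 typed letters


def expected_errors(letters: float, error_rate: float) -> float:
    """Expected number of errors when copying `letters` symbols at a per-symbol error rate."""
    return letters * error_rate


cell_errors = expected_errors(DIPLOID_GENOME_LETTERS, CELL_ERROR_RATE)
typist_errors = expected_errors(DIPLOID_GENOME_LETTERS, TYPIST_ERROR_RATE)
print(f"cell, per division:          ~{cell_errors:.0f} errors")
print(f"very good typist, same text: ~{typist_errors:.0e} errors")
```

So a cell makes only a handful of errors per genome copy, roughly a million times fewer than the typist would.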
![Page 12: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/12.jpg)
GERM LINE
Reverse time and follow your cells:
• Now you count ~$10^{13}$ cells
• One generation ago you were 2 cells "somewhere" in your parents' bodies
• A small number T of generations ago you were $2^T$ cells (minus ancestors who appear more than once)
• A large number T of generations ago you counted #(fertile ancestors) cells
• Congratulations: you are 3.4 billion years old!
Fast-forward time and follow your cells:
• Only a few cells in your reproductive organs have a chance to live on in the next generations
• The rest (including you) will die …
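The "$2^T$ minus multiple ancestors" correction is needed because the number of pedigree slots grows exponentially. A small Python sketch finds the generation at which $2^T$ first exceeds a given population; the population of $10^9$ used here is a hypothetical illustration, not a figure from the lecture:

```python
def pedigree_slots(generations: int) -> int:
    """Number of ancestor positions T generations back: 2 parents, 4 grandparents, ..."""
    return 2 ** generations


def first_generation_exceeding(population: int) -> int:
    """Smallest T for which 2**T exceeds `population`.

    Beyond this T the pedigree has more slots than there were people, so some
    ancestors must occupy several slots (the 'multiple ancestors' correction).
    """
    t = 0
    while pedigree_slots(t) <= population:
        t += 1
    return t


# Hypothetical population size, for illustration only.
print(first_generation_exceeding(10**9))
```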
![Page 13: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/13.jpg)
GERM LINE MUTATIONS
This potentially immortal lineage of (germ) cells is called the GERM LINE
All inherited mutations that we carry have accumulated along the germ line
![Page 14: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/14.jpg)
* Polymorphism: more than one variant (allele) occurs at a nucleotide position
* Single Nucleotide Polymorphism (SNP, pronounced "snip"): a point mutation, e.g. AAATAAA vs AAACAAA
* Humans: about 1 SNP per 1,500 bases (≈ 0.067%)
* STR = Short Tandem Repeats (microsatellites), e.g. CACACACACACACACACA …
* Transitions and transversions
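For two already aligned sequences, locating SNPs amounts to listing the positions where they differ, and the slide's density of 1 SNP per 1,500 bases gives an expected count for a region of given length. A minimal Python sketch, assuming gap-free alignment; the 3 Mbp region is a hypothetical example:

```python
def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """0-based positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def expected_snp_count(length_bp: int, bases_per_snp: float = 1500) -> float:
    """Expected number of SNPs in a stretch of DNA at ~1 SNP per `bases_per_snp` bases."""
    return length_bp / bases_per_snp


print(snp_positions("AAATAAA", "AAACAAA"))  # the slide's example: one SNP
print(expected_snp_count(3_000_000))        # hypothetical 3 Mbp region
```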
![Page 15: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/15.jpg)
Purines (A, G) and pyrimidines (C, T)
![Page 16: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/16.jpg)
Transitions (purine ↔ purine or pyrimidine ↔ pyrimidine) and transversions (purine ↔ pyrimidine)
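The purine/pyrimidine split directly determines whether a substitution is a transition or a transversion. A short Python sketch of that classification, assuming uppercase or lowercase A/C/G/T input and aligned sequences without gaps:

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES


def classify_substitution(a: str, b: str) -> str:
    """Return 'identical', 'transition' or 'transversion' for a pair of bases."""
    a, b = a.upper(), b.upper()
    if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
        raise ValueError(f"not a nucleotide pair: {a!r}, {b!r}")
    if a == b:
        return "identical"
    # Transition: both bases in the same chemical class (two purines or two pyrimidines)
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_transitions_transversions(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    counts = {"identical": 0, "transition": 0, "transversion": 0}
    for a, b in zip(seq_a, seq_b):
        counts[classify_substitution(a, b)] += 1
    return counts["transition"], counts["transversion"]


print(classify_substitution("G", "A"))
print(classify_substitution("A", "T"))
print(count_transitions_transversions("AAATAAA", "AAACAAA"))
```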
![Page 17: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/17.jpg)
5.2 Mitochondrial DNA
* Mitochondria are inherited only through the maternal line
* Very suitable for evolutionary comparisons, because mtDNA is not reshuffled by recombination
![Page 18: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/18.jpg)
H. sapiens mitochondrion
![Page 19: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/19.jpg)
Electron micrograph (EM) of H. sapiens mtDNA
![Page 20: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/20.jpg)
![Page 21: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/21.jpg)
5.3 Variation between species
* Genetic variation underlies morphological, physiological and behavioral variation
* Genetic variation (i.e. genetic distance) reflects phylogenetic relationship
* We therefore need a way to measure distances between sequences: a metric
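The simplest such metric for aligned sequences is the proportion of differing sites (the normalized Hamming distance, often called the p-distance); it is zero only for identical sequences, symmetric, and satisfies the triangle inequality. A minimal Python sketch, assuming gap-free alignments; the toy sequences are hypothetical:

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty, aligned and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


# Hypothetical toy sequences, for illustration only.
seqs = ["ACGTACGTAC", "ACGTACGTAA", "TCGTACCTAA"]
for x, y in combinations(seqs, 2):
    print(x, y, p_distance(x, y))
```

This observed proportion is what the next sections call d; it underestimates the true distance once multiple substitutions occur.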
![Page 22: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/22.jpg)
Substitution rate
* Mutations originate in single individuals
* Mutations can become fixed in a population
* Mutation rate: rate at which new mutations arise
* Substitution rate: rate at which a species fixes new mutations
* For neutral mutations, the substitution rate equals the mutation rate
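![Page 23: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/23.jpg)
Substitution rate and mutation rate
* For neutral mutations:
* A diploid population of N individuals carries 2N copies of each site, so 2Nμ new mutations arise per generation, and each new neutral mutation becomes fixed with probability 1/(2N). Hence $\rho = 2N\mu \cdot \frac{1}{2N} = \mu$
* Two species that split T time units ago have each accumulated substitutions along their own lineage, so their genetic distance is K = 2ρT, i.e. $\rho = K/(2T)$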
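The key step in $\rho = \mu$ is that a single new neutral copy among 2N reaches fixation with probability 1/(2N). A sketch that checks this by simulation, assuming a Wright-Fisher model (pure binomial drift, a standard neutral model; the slide does not name one) with hypothetical values N = 10 and 20,000 replicates, plus the rate estimate $\rho = K/(2T)$:

```python
import numpy as np


def neutral_fixation_fraction(N: int, runs: int, seed: int = 0) -> float:
    """Fraction of runs in which one new neutral mutant copy reaches fixation
    in a Wright-Fisher population of N diploid individuals (2N gene copies)."""
    rng = np.random.default_rng(seed)
    copies = 2 * N
    counts = np.ones(runs, dtype=np.int64)   # every run starts with a single mutant copy
    active = np.ones(runs, dtype=bool)       # runs not yet lost (0) or fixed (2N)
    while active.any():
        # Next generation: 2N copies drawn binomially at the current mutant frequency
        counts[active] = rng.binomial(copies, counts[active] / copies)
        active = (counts > 0) & (counts < copies)
    return float(np.mean(counts == copies))


def substitution_rate(K: float, T: float) -> float:
    """rho = K / (2T): both lineages accumulate substitutions for T time units."""
    return K / (2 * T)


N = 10
print(neutral_fixation_fraction(N, runs=20_000), "vs 1/(2N) =", 1 / (2 * N))
print(substitution_rate(0.02, 5e6))   # hypothetical K and T
```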
![Page 24: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/24.jpg)
5.4 Estimating genetic distance
* Substitutions are independent (?)
* Substitutions are random
* Multiple substitutions may occur at the same site
* Back-mutations change a nucleotide back to an earlier state
![Page 25: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/25.jpg)
Multiple substitutions and back-mutations conceal the real genetic distance:
GACTGATCCACCTCTGATCCTTTGGAACTGATCGT
TTCTGATCCACCTCTGATCCTTTGGAACTGATCGT
TTCTGATCCACCTCTGATCCATCGGAACTGATCGT
GTCTGATCCACCTCTGATCCATTGGAACTGATCGT
observed: 2 (= d), actual: 4 (= K)
![Page 26: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/26.jpg)
* Saturation: so many substitutions have occurred per site that the two sequences are effectively random with respect to each other
* Two random sequences of equal length match at approximately ¼ of their sites
* At saturation, therefore, the observed proportional genetic distance is ¾
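The ¼ match fraction for random sequences, and hence the saturation distance ¾, can be checked empirically. A Python sketch, assuming independent, uniformly distributed bases; the length and seed are arbitrary choices:

```python
import random


def random_sequence(n: int, rng: random.Random) -> str:
    """Sequence of n independent, uniformly chosen bases."""
    return "".join(rng.choice("ACGT") for _ in range(n))


rng = random.Random(42)
n = 100_000
a = random_sequence(n, rng)
b = random_sequence(n, rng)
match = sum(x == y for x, y in zip(a, b)) / n
print(f"matching sites: {match:.3f}   differing sites (d): {1 - match:.3f}")
```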
![Page 27: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/27.jpg)
* True genetic distance (substitutions per site): K
* Observed proportion of differing sites: d
* Because of multiple substitutions and back-mutations, K ≥ d
![Page 28: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/28.jpg)
SEQUENCE EVOLUTION is a Markov process: the sequence at generation (= time) t depends only on the sequence at generation t-1
![Page 29: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/29.jpg)
The Jukes-Cantor model
Correction for multiple substitutions
Substitution probability per site per generation (time step) is α
Substitution means there are 3 possible replacements (e.g. C → {A,G,T}), each with probability α/3
Non-substitution means there is 1 possibility (e.g. C → C), with probability 1-α
![Page 30: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/30.jpg)
Therefore, the one-step Markov process has the following transition matrix (rows and columns ordered A, C, G, T):
$$
M_{JC} = \begin{pmatrix}
1-\alpha & \alpha/3 & \alpha/3 & \alpha/3 \\
\alpha/3 & 1-\alpha & \alpha/3 & \alpha/3 \\
\alpha/3 & \alpha/3 & 1-\alpha & \alpha/3 \\
\alpha/3 & \alpha/3 & \alpha/3 & 1-\alpha
\end{pmatrix}
$$
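As a Markov chain, $M_{JC}$ acts on a probability distribution over {A, C, G, T} at one site: each step multiplies the row vector by the matrix. A NumPy sketch, assuming a hypothetical α = 0.01 and a site that starts as A, showing that the distribution drifts toward uniform (¼ each), i.e. toward saturation:

```python
import numpy as np


def jukes_cantor_matrix(alpha: float) -> np.ndarray:
    """One-step JC matrix: stay with prob 1-alpha, move to each other base with prob alpha/3."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be a probability")
    M = np.full((4, 4), alpha / 3)
    np.fill_diagonal(M, 1 - alpha)
    return M


def evolve(p0: np.ndarray, M: np.ndarray, steps: int) -> np.ndarray:
    """Propagate a distribution over A, C, G, T through `steps` Markov steps (p <- p @ M)."""
    p = p0.copy()
    for _ in range(steps):
        p = p @ M
    return p


M = jukes_cantor_matrix(0.01)
start = np.array([1.0, 0.0, 0.0, 0.0])   # the site is certainly 'A' at t = 0
print(M.sum(axis=1))                     # each row sums to 1
print(evolve(start, M, 10))
print(evolve(start, M, 2000))            # close to the uniform distribution
```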
![Page 31: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/31.jpg)
After t generations the substitution probabilities are:
$M(t) = M_{JC}^t$
Eigenvalues and orthonormal eigenvectors of $M_{JC}$ (the eigenvalues of M(t) are $\lambda_i^t$):
$\lambda_1 = 1$ (multiplicity 1): $v_1 = \tfrac12 (1\;\;1\;\;1\;\;1)^T$
$\lambda_{2,3,4} = 1 - 4\alpha/3$ (multiplicity 3): $v_2 = \tfrac12 (-1\;\;-1\;\;1\;\;1)^T$, $v_3 = \tfrac12 (1\;\;-1\;\;-1\;\;1)^T$, $v_4 = \tfrac12 (1\;\;-1\;\;1\;\;-1)^T$
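![Page 32: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/32.jpg)
Spectral decomposition of M(t):
$$M_{JC}^t = \sum_i \lambda_i^t \, v_i v_i^T$$
Write M(t) in terms of its diagonal entries r(t) and off-diagonal entries s(t):
$$
M_{JC}^t = \begin{pmatrix}
r(t) & s(t) & s(t) & s(t) \\
s(t) & r(t) & s(t) & s(t) \\
s(t) & s(t) & r(t) & s(t) \\
s(t) & s(t) & s(t) & r(t)
\end{pmatrix}
$$
Therefore, the probability s(t) that a site has changed into one specific other nucleotide after t generations is:
$$s(t) = \tfrac14 - \tfrac14\left(1 - \tfrac{4\alpha}{3}\right)^t$$
and $r(t) = 1 - 3s(t) = \tfrac14 + \tfrac34\left(1 - \tfrac{4\alpha}{3}\right)^t$.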
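The spectral decomposition and the closed forms for s(t) and r(t) can be checked numerically against a direct matrix power. A NumPy sketch, assuming hypothetical values α = 0.01 and t = 50 and the orthonormal (½-normalized) eigenvectors listed above:

```python
import numpy as np


def jc_matrix(alpha: float) -> np.ndarray:
    """One-step Jukes-Cantor transition matrix (order A, C, G, T)."""
    M = np.full((4, 4), alpha / 3)
    np.fill_diagonal(M, 1 - alpha)
    return M


def s_closed_form(alpha: float, t: int) -> float:
    return 0.25 - 0.25 * (1 - 4 * alpha / 3) ** t


def r_closed_form(alpha: float, t: int) -> float:
    return 0.25 + 0.75 * (1 - 4 * alpha / 3) ** t


alpha, t = 0.01, 50
Mt = np.linalg.matrix_power(jc_matrix(alpha), t)   # direct t-step matrix

# Spectral decomposition: sum_i lambda_i^t v_i v_i^T with orthonormal v_i
lam = 1 - 4 * alpha / 3
V = 0.5 * np.array([[1, 1, 1, 1], [-1, -1, 1, 1], [1, -1, -1, 1], [1, -1, 1, -1]], dtype=float)
eigvals = np.array([1.0, lam, lam, lam])
spectral = sum(l ** t * np.outer(v, v) for l, v in zip(eigvals, V))

print(np.allclose(Mt, spectral))
print(Mt[0, 1], s_closed_form(alpha, t))
print(Mt[0, 0], r_closed_form(alpha, t))
```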
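![Page 33: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/33.jpg)
Substitution probability s(t) per site (into one specific other nucleotide) after t generations:
$s(t) = \tfrac14 - \tfrac14\left(1 - \tfrac{4\alpha}{3}\right)^t$
The observed genetic distance d after t generations is the probability that a site has changed into any of the 3 other nucleotides, d ≈ 3s(t):
$d = \tfrac34 - \tfrac34\left(1 - \tfrac{4\alpha}{3}\right)^t$
For small α, $\ln(1 - 4\alpha/3) \approx -4\alpha/3$, so:
$\alpha t \approx -\tfrac34 \ln\left(1 - \tfrac43 d\right)$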
![Page 34: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/34.jpg)
For small α the observed genetic distance satisfies:
$\alpha t \approx -\tfrac34 \ln\left(1 - \tfrac43 d\right)$
The actual genetic distance is (of course):
K = αt
So:
$K \approx -\tfrac34 \ln\left(1 - \tfrac43 d\right)$
This is the Jukes-Cantor formula: it depends only on the observed distance d, not on α and t separately.
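The Jukes-Cantor formula translates directly into a correction function from observed to actual distance. A minimal Python sketch; raising an error at d ≥ ¾ is a design choice made here to signal saturation, not something the lecture prescribes:

```python
import math


def jc_distance(d: float) -> float:
    """Jukes-Cantor corrected distance K = -3/4 ln(1 - 4d/3)."""
    if not 0.0 <= d < 0.75:
        raise ValueError("d must lie in [0, 3/4); at d >= 3/4 the sequences are saturated")
    return -0.75 * math.log(1 - 4 * d / 3)


for d in (0.01, 0.1, 0.3, 0.5, 0.7):
    print(f"d = {d:.2f}  ->  K = {jc_distance(d):.4f}")
```

For small d the correction is negligible (K ≈ d); as d approaches ¾ the estimate grows without bound.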
![Page 35: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/35.jpg)
The Jukes-Cantor formula:
$K \approx -\tfrac34 \ln\left(1 - \tfrac43 d\right)$
For small d, using ln(1+x) ≈ x: K ≈ d. So: actual distance ≈ observed distance
For saturation, d ↑ ¾ and K → ∞. So: if the observed distance equals the distance between random sequences, the actual distance becomes indeterminate
![Page 36: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/36.jpg)
Jukes-Cantor
![Page 37: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/37.jpg)
Variance in K
If K = f(d), then:
$\delta K = \frac{\partial K}{\partial d}\,\delta d \;\Rightarrow\; \delta K^2 = \left(\frac{\partial K}{\partial d}\right)^2 \delta d^2$
So:
$\mathrm{Var}(K) \approx \left(\frac{\partial K}{\partial d}\right)^2 \mathrm{Var}(d)$
Generating a sequence of length n in which each site differs with probability d is a binomial process:
$\mathrm{Prob}(k) = \binom{n}{k} d^k (1-d)^{n-k}$
and the observed proportion d = k/n therefore has variance Var(d) = d(1-d)/n
Because of the Jukes-Cantor formula:
$\frac{\partial K}{\partial d} = \frac{1}{1 - \frac43 d}$
![Page 38: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/38.jpg)
Variance in K
Variance: Var(d) = d(1-d)/n
Jukes-Cantor:
$\frac{\partial K}{\partial d} = \frac{1}{1 - \frac43 d}$
So:
$\mathrm{Var}(K) \approx \frac{d(1-d)}{n\left(1 - \frac43 d\right)^2}$
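The variance formula shows how quickly the uncertainty in K grows as d approaches saturation. A short Python sketch evaluating it for the sequence length n = 1000 used in Example 5.4; the d values are arbitrary illustrations:

```python
import math


def jc_variance(d: float, n: int) -> float:
    """Approximate Var(K) = d(1-d) / (n (1 - 4d/3)^2) for the Jukes-Cantor estimate."""
    if not 0.0 <= d < 0.75 or n <= 0:
        raise ValueError("need 0 <= d < 3/4 and n > 0")
    return d * (1 - d) / (n * (1 - 4 * d / 3) ** 2)


n = 1000
for d in (0.1, 0.3, 0.5, 0.7):
    var = jc_variance(d, n)
    print(f"d = {d:.1f}:  Var(K) = {var:.2e},  standard error = {math.sqrt(var):.3f}")
```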
![Page 39: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/39.jpg)
Var(K)
![Page 40: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/40.jpg)
EXAMPLE 5.4 on page 90
* Create artificial data with sequence length n = 1000: generate K* mutations
* Count the observed distance d
* Reconstruct the estimate K(d) with the Jukes-Cantor relation
* Plot K(d) – K*
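The steps of Example 5.4 can be sketched in Python as below. This is not the book's code: it assumes K* is the true number of substitutions per site (so round(K*·n) substitutions are applied at uniformly random sites, each changing the base to one of the 3 others), and it prints K(d) - K* instead of plotting:

```python
import math
import random

BASES = "ACGT"


def jc_distance(d: float) -> float:
    """Jukes-Cantor estimate K(d); infinite at or beyond saturation (d >= 3/4)."""
    if d >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * d / 3)


def simulate(n: int, k_star: float, rng: random.Random) -> tuple[float, float]:
    """Apply round(k_star * n) random substitutions to a random sequence of length n.

    Returns (observed proportion d, Jukes-Cantor estimate K(d)).
    """
    original = [rng.choice(BASES) for _ in range(n)]
    evolved = original.copy()
    for _ in range(round(k_star * n)):
        site = rng.randrange(n)
        # A substitution always changes the base, to one of the 3 others
        evolved[site] = rng.choice([b for b in BASES if b != evolved[site]])
    d = sum(a != b for a, b in zip(original, evolved)) / n
    return d, jc_distance(d)


rng = random.Random(2024)
for k_star in (0.1, 0.25, 0.5, 1.0):
    d, k_est = simulate(1000, k_star, rng)
    print(f"K* = {k_star:.2f}  d = {d:.3f}  K(d) = {k_est:.3f}  K(d) - K* = {k_est - k_star:+.3f}")
```

Repeating the simulation many times per K* and plotting the spread of K(d) - K* would reproduce the growing scatter predicted by Var(K).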
![Page 41: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/41.jpg)
![Page 42: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/42.jpg)
![Page 43: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/43.jpg)
![Page 44: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/44.jpg)
EXAMPLE 5.4 on page 90 (= FIG 5.3)
![Page 45: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/45.jpg)
The Kimura 2-parameter model
Includes the substitution bias (transitions vs. transversions) in the correction factor
Transition probability (G↔A and T↔C) per site per generation (time step) is α
Transversion probability (G↔T, G↔C, A↔T, and A↔C) per site per generation is β for each of the two possible transversions
![Page 46: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/46.jpg)
The one-step Markov process substitution matrix now becomes (rows and columns ordered A, C, G, T; the diagonal entries make each row sum to 1):
$$
M_{K2P} = \begin{pmatrix}
1-\alpha-2\beta & \beta & \alpha & \beta \\
\beta & 1-\alpha-2\beta & \beta & \alpha \\
\alpha & \beta & 1-\alpha-2\beta & \beta \\
\beta & \alpha & \beta & 1-\alpha-2\beta
\end{pmatrix}
$$
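The K2P matrix can be raised to the t-th power numerically to read off the fractions P(t) (transitions) and Q(t) (transversions) needed for the distance formula. A NumPy sketch, assuming the A, C, G, T ordering above and hypothetical values α = 0.002, β = 0.0005, t = 100:

```python
import numpy as np


def kimura_matrix(alpha: float, beta: float) -> np.ndarray:
    """One-step K2P matrix (order A, C, G, T): transition prob alpha,
    prob beta for each of the two transversions."""
    if alpha < 0 or beta < 0 or alpha + 2 * beta > 1:
        raise ValueError("need alpha, beta >= 0 and alpha + 2*beta <= 1")
    stay = 1 - alpha - 2 * beta
    return np.array([
        [stay, beta, alpha, beta],
        [beta, stay, beta, alpha],
        [alpha, beta, stay, beta],
        [beta, alpha, beta, stay],
    ])


def transition_transversion_fractions(alpha: float, beta: float, t: int) -> tuple[float, float]:
    """P(t), Q(t): probability that a site shows a transition / a transversion after t steps."""
    Mt = np.linalg.matrix_power(kimura_matrix(alpha, beta), t)
    P = Mt[0, 2]             # A -> G (transition); every row gives the same value by symmetry
    Q = Mt[0, 1] + Mt[0, 3]  # A -> C or A -> T (transversions)
    return float(P), float(Q)


alpha, beta, t = 0.002, 0.0005, 100   # hypothetical parameter values
print(kimura_matrix(alpha, beta).sum(axis=1))   # each row sums to 1
print(transition_transversion_fractions(alpha, beta, t))
```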
![Page 47: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/47.jpg)
After t generations the substitution probabilities are:
$M(t) = M_{K2P}^t$
Determine the eigenvalues $\{\lambda_i\}$ and eigenvectors $\{v_i\}$ of $M_{K2P}$
![Page 48: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/48.jpg)
Spectral decomposition of M(t):
$$M_{K2P}^t = \sum_i \lambda_i^t \, v_i v_i^T$$
Determine the fraction of transitions per site after t generations: P(t)
Determine the fraction of transversions per site after t generations: Q(t)
Genetic distance:
$K \approx -\tfrac12 \ln(1 - 2P - Q) - \tfrac14 \ln(1 - 2Q)$
The total fraction of substitutions is d = P + Q; without transition bias (P = d/3, Q = 2d/3) the formula reduces to Jukes-Cantor
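The Kimura distance, and its reduction to Jukes-Cantor when P = d/3 and Q = 2d/3, can be written as two small functions. A Python sketch; treating non-positive logarithm arguments as an error (saturation) is a choice made here:

```python
import math


def k2p_distance(P: float, Q: float) -> float:
    """Kimura 2-parameter distance K = -1/2 ln(1-2P-Q) - 1/4 ln(1-2Q)."""
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("P and Q too large: the distance is undefined (saturation)")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


def jc_distance(d: float) -> float:
    """Jukes-Cantor distance K = -3/4 ln(1 - 4d/3)."""
    return -0.75 * math.log(1 - 4 * d / 3)


print(k2p_distance(0.10, 0.05))            # transition-biased example
d = 0.15
print(k2p_distance(d / 3, 2 * d / 3), jc_distance(d))  # no bias: both give the same K
```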
![Page 49: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/49.jpg)
Other models for nucleotide evolution
* Different rates for different types of transitions/transversions
* A separate rate for each pair of nucleotides: the GTR (General Time Reversible) model
* Amino-acid substitution matrices
* …
![Page 50: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/50.jpg)
Other models for nucleotide evolution
DEFICIT:
all models above assume symmetric (or, for GTR, time-reversible) substitution probabilities, e.g.
prob(A→T) = prob(T→A)
There is now strong evidence that this assumption does not hold
Challenge: incorporate this into a self-consistent model
![Page 51: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/51.jpg)
5.5 CASE STUDY: Neanderthals
* mtDNA of 206 H. sapiens from different regions
* Fragments of mtDNA of 2 H. neanderthalensis, including the original 1856 specimen
* All 208 samples were obtained from GenBank
* A homologous 800 bp sequence of the HVR (hypervariable region) was found in all 208 specimens
![Page 52: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/52.jpg)
* Pairwise genetic differences, corrected with the Jukes-Cantor formula
* d(i,j) is the JC-corrected genetic difference between pair (i,j)
* The distance table is symmetric: $d^T = d$
* MDS (Multidimensional Scaling): translate the distance table d into an n-dimensional map X, here a 2D map
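The case-study pipeline (JC-corrected pairwise distances, then a 2D map) can be sketched in NumPy as follows. The lecture does not say which MDS variant was used; this sketch uses classical (Torgerson) MDS, which double-centers the squared distances and keeps the top eigenvectors. The four short sequences are hypothetical stand-ins for the 800 bp HVR alignment:

```python
import math
import numpy as np


def jc_distance(d: float) -> float:
    """Jukes-Cantor corrected distance; infinite at saturation (d >= 3/4)."""
    return math.inf if d >= 0.75 else -0.75 * math.log(1 - 4 * d / 3)


def jc_distance_matrix(seqs: list[str]) -> np.ndarray:
    """Symmetric table d(i, j) of JC-corrected distances between aligned sequences."""
    m = len(seqs)
    if m == 0 or len(seqs[0]) == 0 or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need non-empty, aligned sequences of equal length")
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            mismatches = sum(a != b for a, b in zip(seqs[i], seqs[j]))
            D[i, j] = D[j, i] = jc_distance(mismatches / len(seqs[i]))
    return D


def classical_mds(D: np.ndarray, dims: int = 2) -> np.ndarray:
    """Classical MDS: coordinates whose Euclidean distances approximate D."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]    # keep the `dims` largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Hypothetical toy alignment, not the lecture's data.
seqs = ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA",
        "ACGAACGTACGTTCGTACGA", "TCGAACCTACGTTCGAACGA"]
D = jc_distance_matrix(seqs)
X = classical_mds(D, dims=2)
print(np.round(X, 3))
```

Saturated pairs (infinite distances) would have to be removed or capped before running MDS.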
![Page 53: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/53.jpg)
distance map d(i,j)
![Page 54: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/54.jpg)
MDS: H. sapiens and H. neanderthalensis are well separated
![Page 55: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/55.jpg)
Phylogenetic tree
![Page 56: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/56.jpg)
END of LECTURE 5
![Page 57: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/57.jpg)
![Page 58: Variation Within and Between Species](https://reader034.vdocuments.us/reader034/viewer/2022052411/5564c927d8b42a7e178b5863/html5/thumbnails/58.jpg)