Molecular Evolution and Population Genetics with MATLAB®
James J. Cai
Posted 18-Dec-2015

Outline
Introduction
Data Manipulation
Phylogenetic Inference
Nonneutrality Detection

Introduction

MBEToolbox and PGEToolbox
MBEToolbox (Molecular Biology & Evolution)
  Since March 2003
  248 functions (version 2.20)
  Published:
    BMC Bioinformatics 2005, 6:64 (22 Mar 2005), version 1.0
    Evolutionary Bioinformatics Online, in press, version 2.0
  m-source code released
PGEToolbox (Population Genetics & Evolution)
  Since October 2005
  227 functions (version 1.37)
  Published in Journal of Heredity 2008 Feb 29
  m-source code released

MBEToolbox
  Broad classes of substitution models (nt, aa and codon)
  Wide range of evolutionary distances
  Synonymous & nonsynonymous substitution rate calculation
  Model/Tree parameter optimization
  Site-specific rate estimation (ML and EB methods)
PGEToolbox
  Sequences and SNP (genotype and haplotype) data manipulation
  Neutrality tests
  Coalescent simulations
  Recombination, LD, and long haplotype tests

MBEToolbox GUI
Figure 1. MBEToolbox GUI. (a) Sequences submenu; (b) Distances submenu; (c) Phylogeny submenu and DNAML dialog; (d) Polymorphism submenu and DPRS table dialog.

MBEToolbox Outputs
Figure 2. MBEToolbox Output Examples. (a) Graph submenu; (b) Alignment shading; (c) Enhanced sliding window analysis; (d) Tajima’s test; (e) JTT matrix; (f) Distance vs. transition and transversion; (g) Genetic drift simulation; (h) Distance matrix; (i) NJ tree; (j) 3D Z-curve; (k) MCMC estimation of JC distance.
[Figure panels, recoverable titles and axes: "Sliding Windows Analysis" and "Enhanced Sliding Windows Analysis" (substitution rate vs. codon site; series syn, nonsyn); 3D Z-curve (axes X, Y, Z); "Change in allele frequency for population (diploid) of size N=100" (allele frequency p vs. generations); "JTT" (20 x 20 amino-acid matrix); tree of human-, chimp-, gorla-, orang-, macaq-ECP and human-, chimp-, gorla-, orang-, macaq-, tamar-EDN; "Transitions & Transversions vs. Distance" (transitions and transversions vs. HKY distance).]

PGEToolbox GUI
snptool

Data Manipulation
human-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
chimp-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
gorla-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
orang-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTAGTGGTG
macaq-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
human-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTCTGGCTG
chimp-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTCTGGCTG
gorla-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTCTGGCAG
orang-EDN ATGGTTCCAAAACTGTTCACTTCTCAAATTTCCCTGCTTCTTCTGTTGGGGCTTCTGGCTG
macaq-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
tamar-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGCGTGCTTCTTCTTTTCGGGCTTTTGAGTG

human-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
chimp-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
gorla-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
orang-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
macaq-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
human-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
chimp-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
gorla-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
orang-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 4 2 1 1 1 4 4 4 2 2 2 4 3 2 4 4 2 4 4 2 4 3 4 4
macaq-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
tamar-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 2 3 4 3 2 4 4 2 4 4 2 4 4 4 4

>> S = [1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 4 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1]

>> S(1,:)            % the first sequence
>> S([3,4],:)        % the third and fourth sequences
>> S(:,[1:3:end])    % every third site, starting from site 1
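
The numeric alignment above follows from the letters by mapping A, C, G, T to 1, 2, 3, 4. A minimal base-MATLAB sketch of that encoding and of the indexing commands, assuming only the first 26 sites of two of the sequences shown earlier (the variable names `seqs`, `firstSeq`, `thirdSites` and `varSites` are illustrative and not part of either toolbox):

```matlab
% First 26 sites of two sequences from the alignment, one row per sequence.
seqs = ['ATGGTTCCAAAACTGTTCACTTCCCA'; ...   human-ECP
        'ATGGTTCCAAAACTGTTCACTTCTCA'];      % orang-EDN

% ismember returns, for every letter, its position in 'ACGT' (A=1, C=2, G=3, T=4).
[tf, S] = ismember(seqs, 'ACGT');
assert(all(tf(:)), 'unexpected character in alignment');

firstSeq   = S(1, :);                             % one sequence (one row)
thirdSites = S(:, 1:3:end);                       % every third site
varSites   = find(any(diff(S, 1, 1) ~= 0, 1));    % columns where the two rows differ
```

Storing the alignment as a numeric matrix means that selecting sequences or sites is ordinary row and column indexing.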

==========================================
Genotype View
==========================================
Idv_1  CT TT GG GG CT TT AG GG TT CC GG TT TT GG AA
Idv_2  CT TT GG GG CT TT GG GG TT CC GG TT TT GG AA
Idv_3  TT TT GG GG TT CT GG GG TT CC GG TT AT GG AA
Idv_4  CT TT GT AG CT CT GG GG TT CC GG TT AT GG AA
Idv_5  TT TT GT AG TT CT GG GG TT CC GG TT AT GG AA
Idv_6  TT TT GG GG TT TT AG GG TT CC GG TT TT GG AA
Idv_7  CT TT GG GG CT TT GG GG TT CC GG TT TT GG AA
Idv_8  TT TT GT AG TT CT GG GG TT CC GG TT AT GG AA
Idv_9  TT TT GG GG TT TT GG GG CT CC GG TT TT AG AA
Idv_10 CT TT GG GG CT TT GG GG CT CC GG TT TT GG AA
Idv_11 CT TT GT AG CT CT GG GG TT CC GG TT AT GG AA
Idv_12 TT TT GG GG TT TT AG GG TT CC GG TT TT GG AA
Idv_13 CT TT GG GG CT TT AG GG TT CC GG TT TT GG AA
Idv_14 CT TT GG GG CT TT GG GG TT CC GG TT TT GG AA
Idv_15 TT TT GG GG TT TT GG AG TT CT AG CT AT GG AG
Idv_16 CT TT GG GG CT TT GG GG TT CC GG TT TT GG AA
Idv_17 TT TT GT AG TT CT GG GG TT CC GG TT AT GG AA
Idv_18 TT CT GT AG TT CT AG GG TT CC GG TT AT GG AA
Idv_19 TT TT GG GG TT TT GG GG TT CC GG TT TT GG AA
Idv_20 CT CT GG GG CT TT GG GG TT CC GG TT TT AG AA

Human Genome Diversity Project (HGDP)
1,043 individuals x 650,000 SNP loci
51 different populations from Africa, Europe, the Middle East, South and Central Asia, East Asia, Oceania and the Americas.
Using uint8 and memorymap functions
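
At this scale the storage class matters: 1,043 x 650,000 entries occupy about 678 MB as uint8 (one byte each) but about 5.4 GB as double. The sketch below illustrates the idea on a tiny hypothetical matrix. It assumes that the slide's "memorymap" refers to memory mapping and uses `memmapfile`, the base-MATLAB memory-mapping class; the genotype coding, the matrix `G` and the temporary file are all made up for illustration and are not taken from PGEToolbox.

```matlab
% Hypothetical genotype codes for this sketch: 1..4 = A, C, G, T.
nInd = 4;  nSnp = 6;                       % individuals x SNP loci (tiny example)
G = uint8([1 1 3 4 2 2;
           1 3 3 4 2 4;
           1 1 3 4 2 2;
           3 3 3 4 4 4]);

% uint8 needs 1 byte per entry; double needs 8.
Gd     = double(G);
info8  = whos('G');
info64 = whos('Gd');
fprintf('uint8: %d bytes, double: %d bytes\n', info8.bytes, info64.bytes);

% Write the matrix to a binary file (fwrite stores it column by column).
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, G, 'uint8');
fclose(fid);

% Map the file rather than loading it; data are read only when indexed.
m = memmapfile(fname, 'Format', {'uint8', [nInd nSnp], 'G'});
snp3 = m.Data.G(:, 3);                     % all individuals at SNP 3
clear m                                    % release the mapping before deleting the file
delete(fname);
```

With a mapped file, a single SNP column or a block of individuals can be pulled out without holding the whole matrix in memory.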

Phylogenetic Inferences

Phylogenetic Tree
Binary tree
Tree topology
Branch lengths (evolutionary time)

Computing the likelihood of a tree model
Assumption
  Given a correct multiple alignment of n sequences of length L
  $X = \{x_{i,j}\}$: $x_{i,j}$ is the jth character in the ith sequence
  $X_j$: the jth column of the alignment
Tree model
  Q: substitution rate matrix
  the tree topology
  a vector of branch lengths
  a vector of equilibrium base frequencies

Computing the likelihood of a tree model
The likelihood of a given tree model
With the assumption of site independence, the problem reduces to computing the likelihood of each column $X_j$
Again, this can be reduced to a summation over all possible labelings of the ancestral nodes of the tree
A labeling assigns to each of the n-1 ancestral nodes of the tree an element from the character alphabet (for DNA: A, C, G, T)
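
In the standard formulation (assumed here; $\theta$ stands for the tree model's parameters collectively, and $\mathcal{L}$ for a labeling of the ancestral nodes, written differently from the alignment length $L$), the two reductions read:

```latex
P(X \mid \theta) = \prod_{j=1}^{L} P(X_j \mid \theta),
\qquad
P(X_j \mid \theta) = \sum_{\mathcal{L}} P(X_j, \mathcal{L} \mid \theta)
```

In practice the product is evaluated as a sum of log-likelihoods, $\ln P(X \mid \theta) = \sum_j \ln P(X_j \mid \theta)$, to avoid numerical underflow.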

Computing the likelihood of a tree model
Probability must be summed over all possible combinations of ancestral nucleotides.
(Here we have 3 internal nodes giving 64 possible combinations)

Computing the likelihood of a tree model
Assume ancestral states were ‘A’s. Start computation at any internal or external node.
$$\Pr = \pi_G \cdot P_{GA}(t_1) \cdot P_{AA}(t_2) \cdot P_{AA}(t_3) \cdots P_{AC}(t_6)$$
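
The sum over the 64 labelings of three internal nodes can be checked numerically. The sketch below is an illustration under stated assumptions, not the toolbox's implementation: it uses a hypothetical rooted four-leaf tree ((1,2),(3,4)) with made-up branch lengths, a Jukes-Cantor rate matrix built by hand, and one alignment column. It computes the column likelihood by brute force and then by the nested ("pruning") form of the same sum.

```matlab
% Jukes-Cantor rate matrix: equal off-diagonal rates, rows sum to zero,
% scaled so that one unit of time is one expected substitution per site.
Q    = ones(4)/3 - (4/3)*eye(4);
freq = [0.25 0.25 0.25 0.25];

% Hypothetical tree ((1,2)u,(3,4)v)r. Branch lengths in the order:
% u-leaf1, u-leaf2, v-leaf3, v-leaf4, r-u, r-v.
t = [0.1 0.2 0.1 0.2 0.05 0.05];
P = cell(1, 6);
for b = 1:6
    P{b} = expm(Q * t(b));            % substitution probabilities along branch b
end

x = [1 1 2 3];                        % observed column at the leaves: A A C G

% Brute force: sum over all 4^3 = 64 labelings (r, u, v) of the internal nodes.
lik = 0;
for r = 1:4
    for u = 1:4
        for v = 1:4
            lik = lik + freq(r) * P{5}(r,u) * P{6}(r,v) ...
                      * P{1}(u,x(1)) * P{2}(u,x(2)) ...
                      * P{3}(v,x(3)) * P{4}(v,x(4));
        end
    end
end

% Pruning: the same sum, evaluated from the leaves towards the root.
Lu = P{1}(:,x(1)) .* P{2}(:,x(2));    % conditional likelihoods at node u
Lv = P{3}(:,x(3)) .* P{4}(:,x(4));    % conditional likelihoods at node v
Lr = (P{5} * Lu) .* (P{6} * Lv);      % conditional likelihoods at the root
likPruned = freq * Lr;

fprintf('brute force: %.6g, pruning: %.6g\n', lik, likPruned);
```

The brute-force sum grows as 4^(n-1) with the number of sequences, whereas the nested form visits each node once.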

Models of DNA Substitution
Probabilistic model parameters (simplest case):
  Continuous-time Markov rate matrix: $Q = \{q_{i,j}\}$, where $q_{i,j}$ is the rate of substitution from nucleotide i to nucleotide j
  $P(b \mid a, t)$: probability that a base b is substituted for a base a over a branch of length t

[Diagram built up over three slides; labels in order of appearance: Instantaneous Rate Matrix, Equilibrium Frequencies, Transition Rate Matrix, Probability Matrix (function of time t)]

Define a Substitution Model
>> model=modeljc
model =
    name: 'jc'
       R: [4x4 double]
    freq: [0.2500 0.2500 0.2500 0.2500]
>> model.R
ans =
         0    0.3333    0.3333    0.3333
    0.3333         0    0.3333    0.3333
    0.3333    0.3333         0    0.3333
    0.3333    0.3333    0.3333         0
>> Q=model.R*diag(model.freq);
>> Q=Q-diag(sum(Q,2));    % set the diagonal so that each row of Q sums to zero
>> P=expm(Q*0.5)
P =
    0.8849    0.0384    0.0384    0.0384
    0.0384    0.8849    0.0384    0.0384
    0.0384    0.0384    0.8849    0.0384
    0.0384    0.0384    0.0384    0.8849
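
For P to be a probability matrix, each of its rows must sum to one, which in turn requires each row of Q to sum to zero. The sketch below builds the Jukes-Cantor matrices by hand in base MATLAB (it does not call `modeljc`; `R` and `freq` are typed in from the values shown above) and compares `expm` with the closed-form Jukes-Cantor probabilities for this parameterisation.

```matlab
R    = (ones(4) - eye(4)) / 3;      % exchangeability matrix, as in model.R
freq = [0.25 0.25 0.25 0.25];       % equilibrium base frequencies

Q = R * diag(freq);                 % off-diagonal rates q_ij = r_ij * pi_j
Q = Q - diag(sum(Q, 2));            % diagonal chosen so that each row sums to zero

t = 0.5;
P = expm(Q * t);                    % substitution probabilities over time t
rowSums = sum(P, 2);

% Closed form for this Q (every off-diagonal rate equals 1/12):
pOff  = (1 - exp(-t/3)) / 4;        % probability of one specific change
pDiag = 1 - 3*pOff;                 % probability of no net change
fprintf('expm: %.4f %.4f, closed form: %.4f %.4f\n', P(1,1), P(1,2), pDiag, pOff);
```

Checking the row sums of P is a cheap sanity test whenever a rate matrix is assembled from separate rate and frequency components.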

Find a Better Tree
>> tree1='((gorla-ECP:0.00664,(chimp-ECP:0.00578,((orang-ECP:0.02515,(((((human-EDN:0.00542,chimp-EDN:0.00312):0.00277,gorla-EDN:0.00365):0.01918,orang-EDN:0.02427):0.01979,macaq-EDN:0.07058):0.02561,tamar-EDN:0.11203):0.05274):0.02151,macaq-ECP:0.04586):0.00935):0.00064):0.00095,human-ECP:0.00095);';
>> tree2='((gorla-ECP:0.00173,(chimp-ECP:0.00578,((orang-ECP:0.02515,(((((human-EDN:0.02671,chimp-EDN:0.00312):0.00177,gorla-EDN:0.00365):0.01918,orang-EDN:0.02427):0.01979,macaq-EDN:0.07058):0.02561,tamar-EDN:0.11203):0.05274):0.02151,macaq-ECP:0.04586):0.00935):0.00064):0.00095,human-ECP:0.00095);';
>> model=modeljc;
>> lnL1=treelike(aln,tree1,model)
>> lnL2=treelike(aln,tree2,model)

Find a Better Model
>> tree='((gorla-ECP:0.00664,(chimp-ECP:0.00578,((orang-ECP:0.02515,(((((human-EDN:0.00542,chimp-EDN:0.00312):0.00277,gorla-EDN:0.00365):0.01918,orang-EDN:0.02427):0.01979,macaq-EDN:0.07058):0.02561,tamar-EDN:0.11203):0.05274):0.02151,macaq-ECP:0.04586):0.00935):0.00064):0.00095,human-ECP:0.00095);';
>> model1=modeljc;
>> model2=modelk2p(2);
>> lnL1=treelike(aln,tree,model1)
>> lnL2=treelike(aln,tree,model2)
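
When two nested models have both been fitted by maximum likelihood, their log-likelihoods can be compared with a likelihood-ratio test. The sketch below is a generic illustration, not a toolbox call: the two log-likelihood values are hypothetical placeholders rather than results from the commands above, and it assumes that the richer model (K2P) has exactly one more free parameter than JC. The chi-square tail probability is obtained from `gammainc`, which is part of base MATLAB.

```matlab
% Hypothetical log-likelihoods (placeholders, not output from the slides).
lnL_jc  = -1000.0;                   % simpler model (JC)
lnL_k2p =  -995.0;                   % richer model (K2P, one extra parameter)

df  = 1;                             % difference in number of free parameters
lrt = 2 * (lnL_k2p - lnL_jc);        % likelihood-ratio statistic

% Upper tail of the chi-square distribution with df degrees of freedom,
% written with the regularized lower incomplete gamma function.
p = 1 - gammainc(lrt/2, df/2);

fprintf('LRT = %.2f, p = %.4g\n', lrt, p);
```

A small p-value favours the richer model. If the extra parameter is fixed rather than estimated, as with a preset kappa, the chi-square approximation does not strictly apply.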

Goldman and Yang’s Codon Model (GY94)
$$
q_{ij} =
\begin{cases}
0, & \text{if } i \text{ and } j \text{ differ by two or more positions} \\
\pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transversion} \\
\kappa \pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transition} \\
\omega \kappa \pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transition} \\
\omega \pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transversion}
\end{cases}
$$

Modified Codon Model (GY94m)
$$
q_{ij} =
\begin{cases}
0, & \text{if } i \text{ and } j \text{ differ by more than one nucleotide} \\
\pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transversion} \\
\kappa_R \pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transition between purines} \\
\kappa_Y \pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transition between pyrimidines} \\
\omega \pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transversion} \\
\omega \kappa_R \pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transition between purines} \\
\omega \kappa_Y \pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transition between pyrimidines}
\end{cases}
$$
Zhang et al. (2006)
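
The modified model distinguishes transitions between purines (A, G) from transitions between pyrimidines (C, T). A minimal sketch of that classification for a single nucleotide change, using the coding A=1, C=2, G=3, T=4 from the numeric alignment; the lookup table `K` and the handle `classify` are illustrative names, not toolbox functions.

```matlab
kinds = {'identical', 'transversion', 'purine transition', 'pyrimidine transition'};

K = 2 * ones(4);                 % default for two different bases: transversion
K(1,3) = 3;  K(3,1) = 3;         % A <-> G: transition between purines
K(2,4) = 4;  K(4,2) = 4;         % C <-> T: transition between pyrimidines
K(logical(eye(4))) = 1;          % same base: no change

classify = @(a, b) kinds{K(a, b)};

fprintf('A->G: %s\n', classify(1, 3));
fprintf('C->T: %s\n', classify(2, 4));
fprintf('A->C: %s\n', classify(1, 2));
```

In a full codon model this label, together with whether the change is synonymous, selects which of the rate cases applies to a pair of codons that differ at one position.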

>> [dS,dN,dN_dS,lnL] = dc_gy94(aln,i,j)
>> [dS,dN,dN_dS,lnL] = dc_gy94m(aln,i,j)

Nonneutrality Detection

Methods for Detecting Selection
Phylogenetic Likelihood Method (with Divergence Data)
  e.g., dN/dS Test
SFS-based Methods (with Polymorphism Data)
  e.g., Fay and Wu H Test
LD-based Tests
  e.g., EHH Test
Methods Using both Polymorphism and Divergence Data
  e.g., McDonald and Kreitman Test

PGEToolbox GUI
Locus under positive selection
MK test GUI

Availability
MBEToolbox http://www.bioinformatics.org/mbetoolbox
PGEToolbox http://www.bioinformatics.org/pgetoolbox
Thank You!

Polymorphism vs. Divergence Method
Neutral Theory of Molecular Evolution: Most genomic regions are thought to be evolving neutrally; that is, they accumulate mutations (by random genetic drift) that do not influence the fitness of the organism.
Neutral Theory of Molecular Evolution, M. Kimura

Polymorphism vs. Divergence Method
Neutral theory predicts that the ratio of replacement to silent substitutions should be the same both within and between species (the “null” model).
When comparing a gene between species, a greater proportion of replacement substitutions among the differences between species (“fixed” differences) than among the polymorphisms within species would indicate positive selection for divergence.

McDonald-Kreitman Test
Subset of Data from McDonald & Kreitman (1991)
[Table; columns: within spp. (polymorphic), between spp. (fixed)]
Results: McDonald & Kreitman (1991)
Polymorphic/fixed ratio: replacement 2/7 << synonymous 42/17
Replacement/synonymous ratio: fixed 7/17 >> polymorphic 2/42, which indicates positive selection
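
The comparison of ratios can be turned into a test on the 2 x 2 table of counts. The sketch below is a base-MATLAB illustration, not the toolbox's MK function: it takes the four counts implied by the ratios above (7 and 17 fixed, 2 and 42 polymorphic, for replacement and synonymous changes respectively), computes the neutrality index, and runs a two-sided Fisher exact test. The original study may have used a different test statistic.

```matlab
Dn = 7;  Ds = 17;     % fixed differences between species: replacement, synonymous
Pn = 2;  Ps = 42;     % polymorphisms within species:      replacement, synonymous

NI = (Pn/Ps) / (Dn/Ds);     % neutrality index; below 1 when replacement changes are over-represented among fixed differences

% Two-sided Fisher exact test on the table [Dn Ds; Pn Ps].
r1 = Dn + Ds;               % row total: fixed differences
c1 = Dn + Pn;               % column total: replacement changes
N  = Dn + Ds + Pn + Ps;

logC = @(n, k) gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1);   % log binomial coefficient
k  = max(0, r1 + c1 - N) : min(r1, c1);                         % feasible values of the Dn cell
pk = exp(logC(c1, k) + logC(N - c1, r1 - k) - logC(N, r1));     % hypergeometric probabilities

pObs    = pk(k == Dn);
pFisher = sum(pk(pk <= pObs * (1 + 1e-7)));   % tables no more probable than the observed one

fprintf('NI = %.3f, Fisher exact p = %.4f\n', NI, pFisher);
```

Working with log binomial coefficients through `gammaln` keeps the calculation stable for much larger counts than this example.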

Limitations

Parallelism in MATLAB
There are four main approaches to adding parallel functionality to MATLAB:
1. Provide communication routines (MPI/PVM) in MATLAB.
2. Provide routines to split up work among multiple MATLAB sessions.
3. Provide a parallel backend to MATLAB.
4. Compile MATLAB scripts into native parallel code.

MATLAB Pointers Library

Site-Frequency Spectrum
Site:             163  975  1972  2188  3529  4424  4961  5286  7019
Sequence 1:        A    G    G     C     T     T     A     A     A
Sequence 2:        A    T    G     C     T     C     G     A     A
Sequence 3:        G    T    G     T     T     C     A     C     G
Sequence 4:        A    G    G     C     T     C     A     A     G
Sequence 5:        A    G    A     C     C     C     G     A     A
Frequency class:   1    2    1     1     1     4     2     1     3
[Figure: ancestral and derived alleles are marked in the alignment; bar chart "The frequency spectrum" shows count vs. frequency class (1 to 4)]
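
The frequency class of a site is the number of sequences carrying the derived allele, and the spectrum counts how many sites fall in each class. A minimal sketch using the five sequences above; the ancestral states in `anc` are an assumption chosen to reproduce the frequency classes given on the slide (for example, T at site 4424 and G at site 7019), since the slide marks them graphically.

```matlab
aln = ['AGGCTTAAA'; ...   sequence 1
       'ATGCTCGAA'; ...   sequence 2
       'GTGTTCACG'; ...   sequence 3
       'AGGCTCAAG'; ...   sequence 4
       'AGACCCGAA'];      % sequence 5
anc = 'AGGCTTAAG';        % assumed ancestral allele at each of the nine sites

n = size(aln, 1);                                   % number of sequences
derived = sum(aln ~= repmat(anc, n, 1), 1);         % derived-allele count per site
sfs = accumarray(derived(:), 1, [n-1 1])';          % number of sites in classes 1..n-1

disp(derived);
disp(sfs);
```

Here `derived` holds one frequency class per site and `sfs` holds the height of each bar in the frequency spectrum.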

Comparing frequency spectra for different classes of mutation
[Figure: "Observed frequency spectra"; count vs. frequency class (1 to 9); series: putatively neutral, potentially selected]

Tests of selection based on estimates of Θ: Tajima’s D
$$D = \frac{\pi - K/a}{\sqrt{V(\pi - K/a)}}$$
Here π is the mean number of pairwise differences, K is the number of segregating sites, and $a = \sum_{i=1}^{n-1} 1/i$ for a sample of n sequences.
Tajima’s D tests whether the estimate of Θ from π is significantly different from the estimate from K:
  If there are a lot of polymorphisms at very low frequency (as expected under purifying selection), then the estimate from K will be high relative to the estimate from π (i.e. D will have a negative sign).
  On the other hand, if allele frequencies are being increased by overdominant (balancing) selection, π will be increased without any effect on K (D will have a positive sign).
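
As a worked illustration, D can be computed for the five-sequence example from the site-frequency-spectrum slide (frequency classes 1 2 1 1 1 4 2 1 3). The sketch below writes out Tajima's (1989) variance constants in base MATLAB; it is not the toolbox's own routine, and the variable names are illustrative.

```matlab
n = 5;                                   % number of sequences
counts = [1 2 1 1 1 4 2 1 3];            % derived-allele count at each segregating site
K = numel(counts);                       % number of segregating sites

% Mean number of pairwise differences (pi), summed over sites.
piHat = sum(2 * counts .* (n - counts)) / (n * (n - 1));

% Constants from Tajima (1989).
idx = 1:(n - 1);
a1 = sum(1 ./ idx);
a2 = sum(1 ./ idx.^2);
b1 = (n + 1) / (3 * (n - 1));
b2 = 2 * (n^2 + n + 3) / (9 * n * (n - 1));
c1 = b1 - 1/a1;
c2 = b2 - (n + 2)/(a1 * n) + a2/a1^2;
e1 = c1 / a1;
e2 = c2 / (a1^2 + a2);

D = (piHat - K/a1) / sqrt(e1*K + e2*K*(K - 1));
fprintf('pi = %.3f, K/a = %.3f, D = %.3f\n', piHat, K/a1, D);
```

For these data the two estimates of Θ are close, so D is only slightly negative; with five sequences the test has little power, and the example is meant only to show the arithmetic.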