DNA Assembly with Gaps: Simulating Sequence Evolution
Reed A. Cartwright
Department of Genetics
University of Georgia
3.12.2005 RA Cartwright [email protected] - http://scit.us/
Synopsis
Explain the importance of simulations.
Introduce Dawg, a new sequence simulation program.
Example usage of Dawg.
Why Simulate Phylogenies?
Biologists use many techniques to reconstruct phylogenies based on biological data.
However, true phylogenies are unknown, except for a few instances.
How then can we test the accuracy of these reconstruction methods?
Use simulations.
Why Simulate Phylogenies?
Techniques are often based on certain models of evolution.
Simulating sequence evolution based on these models produces an ideal situation to test the techniques.
Using other models can test how robust a technique is.
Testing Procedure
[Figure: trees with tips A, B, C, and D, shown with two simulated sequence sets]

A  AATTCTTTGAGTTAA
B  AATTCTTTGAGTTAA
C  AATTCTTAAAGTTAA
D  AATTCTTAAAGTTAA

A  AAAAGATAAAGCAAA--A
B  GAAAGATAAAGCAAA--A
C  GAAAGATAAAGAAAAACA
D  GAAAGATAAAGAAAAACA
1. Start with a “known” tree.
2. Simulate sequence sets based on the tree.
3. Estimate the trees of the simulated data.
4. Compare estimated trees to the original tree.
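Step 4 of the procedure above can be sketched in a few lines of Python. This is only an illustration, not part of Dawg: the nested-tuple tree representation, the function names `clades` and `clade_distance`, and the two example estimates are all assumptions made for this sketch. It counts the clades that appear in one rooted tree but not the other, so a distance of 0 means the estimate recovered the known topology.

```python
def clades(tree):
    """Return the set of clades (frozensets of tip names) in a rooted tree.

    A tree is either a tip name (str) or a tuple of subtrees.
    """
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found


def clade_distance(tree_a, tree_b):
    """Count the clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical data: one known tree and two estimated trees.
true_tree = (("A", "B"), ("C", "D"))
estimates = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]

distances = [clade_distance(true_tree, est) for est in estimates]
# Fraction of estimates that recovered the known topology exactly.
recovered = sum(d == 0 for d in distances) / len(estimates)
```

Averaging such a score over many simulated data sets gives one simple measure of how often a reconstruction method finds the tree used to generate the data.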
Simulating Evolution
Proper simulation of molecular evolution should include both substitutions and indels.
However, existing programs either do not include indels or use an unjustified model of indel formation.
Dawg was created to address this gap.
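To make "both substitutions and indels" concrete, here is a toy Python sketch of evolving one sequence along a single branch. It is not Dawg's algorithm or indel model. The Jukes-Cantor substitution probability, the geometric indel lengths with mean 2, the equal split between insertions and deletions, and the name `evolve` are all assumptions chosen to keep the example short.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, t, indel_rate, rng):
    """Evolve seq along one branch of length t (toy model, not Dawg's).

    t is in expected substitutions per site; indel_rate is the assumed
    number of indel events per substitution.
    """
    p_sub = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))  # Jukes-Cantor change probability
    p_indel = 1.0 - math.exp(-indel_rate * t)        # chance an indel starts at a site
    out = []
    skip = 0  # sites still to be deleted
    for base in seq:
        if rng.random() < p_indel:
            length = 1
            while rng.random() < 0.5:  # geometric length with mean 2
                length += 1
            if rng.random() < 0.5:
                # Insertion: add random bases before the current site.
                out.extend(rng.choice(BASES) for _ in range(length))
            else:
                # Deletion: drop this site and the following ones.
                skip = max(skip, length)
        if skip > 0:
            skip -= 1
            continue
        if rng.random() < p_sub:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


rng = random.Random(1981)
ancestor = "".join(rng.choice(BASES) for _ in range(60))
descendant = evolve(ancestor, 0.1, 0.143120, rng)
```

With `indel_rate` set to zero the descendant always has the ancestor's length, which is the situation a substitution-only simulator is limited to.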
What is Dawg?
Dawg stands for “DNA Assembly with Gaps.”
A portable and robust program for simulating molecular evolution.
Development Website: http://scit.us/dawg/
Comparing Software
Feature Seq-Gen Evolver Rose Dawg
Indels Yes Yes
Indel Parameter Estimator Yes
Recombination Yes Yes
Substitution GTR GTR PAM GTR
Rate Heterogeneity Γ+I Γ Γ+I Γ+I
Input Format Switch File File File
Unix Yes Yes Yes Yes
Mac OS X Yes Yes Yes Yes
Win32 Yes Yes Yes
Parameters
Tree           phylogeny
TreeScale      coefficient to scale branch lengths by
Sequence       root sequences
Length         length of generated root sequences
Rates          rate of evolution of each root nucleotide
Model          model of evolution: GTR|JC|K2P|K3P|HKY|F81|F84|TN
Freqs          nucleotide (ACGT) frequencies
Params         parameters for the model of evolution
Width          block width for indels and recombination
Scale          block position scales
Gamma          coefficients of variance for rate heterogeneity
Alpha          shape parameters
Iota           proportions of invariant sites
GapModel       models of indel formation: NB|PL|US
Lambda         rates of indel formation
GapParams      parameter for the indel model
Reps           number of data sets to output
File           output file
Format         output format: Fasta|Nexus|Phylip|Clustal
GapSingleChar  output gaps as a single character
GapPlus        distinguish insertions from deletions in alignment
LowerCase      output sequences in lowercase
Translate      translate output sequences to amino acids
NexusCode      text or file to include between datasets in Nexus format
Seed           PRNG seed (integers)
Sample Input File
# example.dawg
Tree = ((AY727331:0.001359,AY727330:0.001359):0.084512,
        (AY727327:0.006116,AY727326:0.006116):0.079756);
Model = "GTR"
Params = {1.08031, 2.45581, 0.44452, 1.09145, 4.06519, 1.00000}
Freqs = {0.353470, 0.143681, 0.178206, 0.324643}
Length = 300
Lambda = 0.143120
GapModel = "NB"
GapParams = {1, 0.753247}
Format = "Clustal"
File = "example.aln"
Seed = 1981
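When building input files by script, it can help to read the settings back and sanity-check them. The Python sketch below parses only the constructs that appear in the sample file (quoted strings, brace-delimited number lists, a tree ending in a semicolon, and plain numbers). It is not Dawg's own parser, and the names `parse_settings` and `convert` are made up for this illustration.

```python
import re

SAMPLE = '''# example.dawg
Tree = ((AY727331:0.001359,AY727330:0.001359):0.084512,
        (AY727327:0.006116,AY727326:0.006116):0.079756);
Model = "GTR"
Params = {1.08031, 2.45581, 0.44452, 1.09145, 4.06519, 1.00000}
Freqs = {0.353470, 0.143681, 0.178206, 0.324643}
Length = 300
Lambda = 0.143120
GapModel = "NB"
GapParams = {1, 0.753247}
Format = "Clustal"
File = "example.aln"
Seed = 1981
'''


def convert(raw):
    """Turn one raw value into a str, list of floats, int, or float."""
    if raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]
    if raw.startswith("{") and raw.endswith("}"):
        return [float(item) for item in raw[1:-1].split(",")]
    if raw.endswith(";"):
        return raw.replace(" ", "")  # tree string, whitespace removed
    try:
        return int(raw)
    except ValueError:
        return float(raw)


def parse_settings(text):
    """Parse 'Name = value' settings; a value runs until the next 'Name ='."""
    # Drop '#' comments first.
    body = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
    pattern = r"(\w+)\s*=\s*(.*?)(?=\n\s*\w+\s*=|\Z)"
    settings = {}
    for match in re.finditer(pattern, body, re.S):
        name = match.group(1)
        raw = " ".join(match.group(2).split())  # collapse line breaks
        settings[name] = convert(raw)
    return settings


settings = parse_settings(SAMPLE)
# Simple sanity check: the four nucleotide frequencies should sum to 1.
freq_total = sum(settings["Freqs"])
```

A check such as the frequency sum catches transcription mistakes before a long simulation run.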
CLUSTAL multiple sequence alignment (Created by DAWG Version 1.0.0)

AY727326   TTCGAAAATATGTTAGTACTCAATATGAATTCTTTGAGTTAAAAAAGATAAAGCAAA--A
AY727327   TTCGAAAATATGTTAGTACTCAATATGAATTCTTTGAGTTAAGAAAGATAAAGCAAA--A
AY727330   TTCAAAAATATGCTAGGACTGAATATGAATTCTTAAAGTTAAGAAAGATAAAGAAAAACA
AY727331   TTCAAAAATATGCTAGGACTGAATATGAATTCTTAAAGTTAAGAAAGATAAAGAAAAACA

AY727326   ATACATAATGTGATTTCAATATTCCAATTACCTAACAATACGGCTATCAATTAAACGATT
AY727327   ATACATAATGTGATTTCAATATTCCAATTACCTAACAATACGGCTATCAATTAAACGATT
AY727330   GTACATAATGTAAA----TTATTGCAA---------AAAACGGCTAACAATTAGACGATT
AY727331   GTACATAATGTAAA----TTATTGCAA---------AAAACGGCTAACAATTAGACGATT

AY727326   TTAGGATTACACCGACAAATATTAGGCCGATATGAATTTAACATCATGTTGTATTTAGAT
AY727327   TTAGGATTACACCGACAAATATTAGGCCGATATGAATTTACCATCATGTTGTATTTAGAT
AY727330   TTAGGATTACGCTGACAAATATTAGGATGATATTAATTTA------TCTTGTATTTAGAT
AY727331   TTAGGATTACGCTGACAAATATTAGGATGATATTAATTTA------TCTTGTATTTAGAT

AY727326   GCTGTCTTTTATTAACATTCATCATTAAAT-TTGGAACCTTTTGCATTTAAGAAGTACAT
AY727327   GCTGTCTTTTATTAACATTCATCATTAAAT-TTGGAACCTTTTGTATTTAAGAAGTACAT
AY727330   GCTGTCTTTTATCAACATTCATCACTAGATATTGGAACCTATTGCATCTAAGAAGTACAT
AY727331   GCTGTCTTTTATCAACATTCATCACTAGATATTGGAACCTATTGCATCTAAGAAGTACAT

AY727326   GTTTAATAGTGTTTAAAA-TATATATGAAATTGATCATAAGGA---TCTATAAATGCGGT
AY727327   GTTTAATAGTGTTTATAA-TATATATGAAATTGATCGTAAGGA---TCTATAAATGCAGT
AY727330   GTTTAATAGGGTT-AAAACTATATATGAAGTCGATTATAAGGAATTTCTATAAATGTAGC
AY727331   GTTTAATAGGGTT-AAAACTATATATGAAGTCGATTATAAGGAATTTCTATAAATGTAGC

AY727326   TCTTCAATTTCTTG
AY727327   TCTTCAATTTCTTG
AY727330   TCTTCAATTTCCTA
AY727331   TCTTCAATTTCCTA
Estimating Indel Rate
Dawg would be of little benefit if biologists could not estimate parameters of indel formation from real data.
Dawg’s indel model allows such estimation, which is implemented in a Perl script, lambda.pl.
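The slides do not describe how lambda.pl works internally, so the Python sketch below is not its method. It only illustrates the kind of raw information an alignment carries about indels: the number and lengths of gap runs in each aligned sequence. The two rows are copied from the second block of the sample Clustal alignment; the function name `gap_runs` is an assumption.

```python
import re


def gap_runs(aligned):
    """Return the length of each run of '-' in one aligned sequence."""
    return [len(run) for run in re.findall(r"-+", aligned)]


# Two rows from the second block of the sample alignment.
rows = {
    "AY727326": "ATACATAATGTGATTTCAATATTCCAATTACCTAACAATACGGCTATCAATTAAACGATT",
    "AY727330": "GTACATAATGTAAA----TTATTGCAA---------AAAACGGCTAACAATTAGACGATT",
}

runs = {name: gap_runs(seq) for name, seq in rows.items()}
```

Here AY727330 carries two gap runs, of lengths 4 and 9, and AY727326 carries none. An estimator of indel parameters has to turn counts and lengths like these into a rate and a length distribution.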
Example Usage: Confidence Interval of Indel Rate
I aligned the sequences of chloroplast trnK introns from two Hibiscus and two Prunus species.
Using Paup*, I estimated the phylogeny and substitution parameters.
Using lambda.pl, I estimated the indel formation parameters.
Example Usage
From these estimated parameters of evolution, I constructed an input file for Dawg.
From the input file, Dawg produced a thousand simulated sequence sets.
The rate of indel formation was estimated for each of the simulated sequence sets.
Results
The estimated rate of indel formation was 0.143120.
Parametric bootstrapping over the thousand simulated sets gave a 95% CI of 0.078530 to 0.213560.
Biologically, this is 8 to 21 indels per 100 substitutions.
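A confidence interval of this kind can be read off the sorted replicate estimates. The Python sketch below shows one simple percentile rule and the arithmetic that turns the reported rates into indels per 100 substitutions. The slides do not say which interval rule was used, so `percentile_interval` is an assumption. Only the two interval endpoints come from the slide, and the replicate estimates themselves are not reproduced here.

```python
import math


def percentile_interval(values, level=0.95):
    """Return (low, high) order statistics bracketing the central `level` share."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    low = ordered[math.floor(tail * last)]
    high = ordered[math.ceil((1.0 - tail) * last)]
    return low, high


def per_hundred_substitutions(rate):
    """Convert indels per substitution into whole indels per 100 substitutions."""
    return round(100 * rate)


reported_ci = (0.078530, 0.213560)  # interval endpoints from the slide
as_counts = tuple(per_hundred_substitutions(rate) for rate in reported_ci)
```

With a thousand replicates, this rule picks the 25th and 976th smallest estimates as the interval bounds.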
Synopsis
Explain the importance of simulations.
Introduce Dawg, a new sequence simulation program.
Example usage of Dawg.
Thanks
Marjorie Asmussen Wyatt Anderson John Avise Jim Hamrick Ron Pulliam Paul Schliekelman
Jeff Ross-Ibarra Beth Dakin Douglas Theobald Yong-Kyu Kim