STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

What is STEM-hy? Using STEM-hy for Species Tree Estimation Using STEM-hy for Hybridization Analysis Practice Datasets STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) Laura Salter Kubatko Departments of Statistics and Evolution, Ecology, and Organismal Biology The Ohio State University [email protected] Tutorial info at: www.stat.osu.edu/lkubatko/uga2012.html UGA Bioinformatics Symposium 2012


Post on 17-Aug-2020


Page 1: STEM-hy: Species Tree Estimation using Maximum likelihood ... · What is STEM-hy? Using STEM-hy for Species Tree Estimation Using STEM-hy for Hybridization Analysis ... Small Example

What is STEM-hy?
Using STEM-hy for Species Tree Estimation
Using STEM-hy for Hybridization Analysis
Practice Datasets

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

Laura Salter Kubatko
Departments of Statistics and Evolution, Ecology, and Organismal Biology
The Ohio State University

[email protected]

Tutorial info at: www.stat.osu.edu/~lkubatko/uga2012.html

UGA Bioinformatics Symposium 2012


What is STEM-hy?
  Assumptions and Methods

Using STEM-hy for Species Tree Estimation
  Data Preparation
  Example 1: Small Example with Intraspecific Sampling
  Example 2: Small Example with Missing Data
  Example 3: Bootstrap Analysis

Using STEM-hy for Hybridization Analysis
  Background: STEM's Hybrid Species Models
  Example 4: Hybridization in Heliconius

Practice Datasets


What is STEM-hy?

- STEM-hy is a program to perform maximum likelihood analysis for estimation of the species tree from multilocus data under the coalescent process. It includes the capability of evaluating hybrid taxa.
- Basic functions:
  - Return the ML species tree.
  - Search the space of all species trees and return the k trees with the highest likelihoods found.
  - Compute the likelihood of a user-specified tree with branch lengths.
  - Find optimal branch lengths on a user-specified tree.
  - Carry out a bootstrap analysis to obtain bootstrap support values for nodes in the species tree.
  - Evaluate hypotheses of hybridization in a model selection framework.


Assumptions

- No recombination within loci
- Free recombination between loci
- No gene flow following speciation
- The only source of variability in single-gene histories is the coalescent process
- There is a single θ for the entire tree, for each locus
- Evolutionary rates may vary across loci


Methods: ML Estimate of the Species Tree

- Liu et al. (2009) showed that the ML estimate of the species tree can be computed by sequentially clustering minimum observed divergence times between pairs of species across genes.
- They also showed that, when gene trees are known without error, the ML species tree is a consistent estimator.
- A similar result was obtained by Roch & Mossel (2010), who call their estimator the GLASS tree (an acronym for Global LAteSt Split, based on the algorithm they developed to compute it).
- STEM computes the ML estimate of the species tree this way.
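The "minimum observed divergence time" being clustered can be written compactly. This is only a notational sketch: the symbols $t^{(g)}_{ab}$, $G$, and $\hat{\tau}_{AB}$ are introduced here for illustration and are not taken from the slides or from Liu et al. (2009).

```latex
% Assumed notation (not from the slides):
%   t^{(g)}_{ab} : coalescence time of lineages a and b in the gene tree for locus g
%   G            : number of loci
%   A, B         : two species, viewed as sets of sampled lineages
\hat{\tau}_{AB} \;=\; \min_{g = 1,\dots,G} \;\; \min_{a \in A,\; b \in B} \; t^{(g)}_{ab}
```

Clustering the species on these values, smallest first, yields both the topology and the node times of the estimated species tree.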


Methods: Estimation of ML Times for an Arbitrary Species Tree

- The results of Liu et al. (2009) can be extended to derive the ML estimates of the speciation times for an arbitrary species tree.
- Thus, the likelihood of any species tree can be readily computed by using this result to obtain ML branch lengths.
- This is important in that it allows us to compare alternative phylogenetic hypotheses.


Methods: Searching Species Tree Space for Trees of High Likelihood

- A simulated annealing algorithm is used to search the space of all species trees for trees that have high likelihoods.
- The k best trees found during the search are saved and printed to a file (k is set by the user).
- Exploration of the likelihood surface is particularly important for many of these problems.
- The details of the simulated annealing algorithm are similar to those given in Salter & Pearl (2001).


Features of STEM-hy

- No limits (that I know of) on the number of taxa or the number of loci.
- Can handle intraspecific sampling.
- Allows information concerning mutation rate for each locus to be used in the analysis.
- Can handle different taxon samples across genes.
- Version 1.0 is written in Java (using Clojure).


Data Preparation - Gene Trees

- STEM-hy takes as its input one gene tree for each locus.
- Thus, a first step in an analysis using STEM-hy is to estimate gene trees with branch lengths for each locus.
- Any method can be used to do this, but note a couple of requirements:
  - Branch lengths are assumed to be in units of expected number of substitutions per site per unit time.
  - Branch lengths must be estimated subject to a molecular clock. This is not checked by the program.
  - Gene trees must be fully resolved; however, polytomies can be included by setting branch lengths to 0 for an arbitrary resolution of the polytomy.
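Because the program does not check the molecular clock requirement, it can be worth verifying it before running an analysis. The following is a minimal sketch, not part of STEM-hy: the function names `parse_newick`, `tip_depths`, and `is_ultrametric` and the tolerance are assumptions made here for illustration. It parses a Newick string and tests whether all root-to-tip distances agree.

```python
import re

PUNCT = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    # Drop whitespace around punctuation so stray spaces do not become tokens.
    text = re.sub(r"\s*([(),;:])\s*", r"\1", text.strip())
    tokens = re.findall(r"[(),;]|[^(),;]+", text)
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
        label = ""
        if pos < len(tokens) and tokens[pos] not in PUNCT:
            label = tokens[pos]           # "name:length", ":length", or "name"
            pos += 1
        name, _, length = label.partition(":")
        return name, float(length) if length else 0.0, children

    return node()

def tip_depths(node, depth=0.0):
    """Return {tip name: distance from the root} for every tip below `node`."""
    name, length, children = node
    depth += length
    if not children:
        return {name: depth}
    depths = {}
    for child in children:
        depths.update(tip_depths(child, depth))
    return depths

def is_ultrametric(newick, tol=1e-9):
    """True if all tips are the same distance from the root (within `tol`)."""
    depths = tip_depths(parse_newick(newick)).values()
    return max(depths) - min(depths) <= tol

# First gene tree from the tutorial's Example 1.
tree = ("(((Name1:0.00123,Name2:0.00123):0.00123,(Name3:0.00121,Name4:0.00121):0.00125):0.0010,"
        "((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012):0.00106);")
print(is_ultrametric(tree))
```

The tolerance should be chosen relative to the precision of the branch lengths; gene trees written with few decimal places may need a looser value than the default used here.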


Data Preparation - Population Genetics Parameters

- A value of the parameter θ = 4Nμ must be provided. Note that this is the “per-site θ”, not a “per-locus” value as used by other population genetics programs.
- This will be used to convert gene tree branch lengths to coalescent units (number of 2N generations) by dividing all gene tree branch lengths by θ.
- Estimates of θ could be obtained by standard methods. Typical values of θ will be between 0.001 and 0.1.
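As a small worked illustration of the conversion described above, the sketch below divides a branch length by θ. The function name `to_coalescent_units` is an assumption made here; the example values (branch length 0.00123, θ = 0.001) are the ones used in the tutorial's first example.

```python
def to_coalescent_units(branch_length, theta):
    """Divide a branch length (substitutions per site) by the per-site theta."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return branch_length / theta

# A gene tree branch of 0.00123 with theta = 0.001:
print(round(to_coalescent_units(0.00123, 0.001), 5))  # prints 1.23
```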


Data Preparation - Population Genetics and Mutation Parameters

- Each locus can also be given a rate multiplier.
- These can adjust for
  - Variation in mutation rate across loci.
  - Ploidy (e.g., haploid loci, such as mtDNA, should be given a rate of 0.5).
- At the least, one should estimate rate variation from the data by something like the following:
  - Compute average pairwise sequence divergence of each sequence to the outgroup.
  - Divide all of these values by their overall mean, and assign that number as the rate multiplier for each gene.
  - Adjust specific genes for ploidy, if necessary.
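The normalization step above can be sketched as follows. This is an illustration only: the function name `rate_multipliers`, the locus names, and the divergence values are all hypothetical, and any ploidy adjustment is left out because the slides do not spell out how it should be combined with the rate estimate.

```python
def rate_multipliers(mean_divergence_by_locus):
    """Divide each locus's average divergence to the outgroup by the overall mean."""
    values = mean_divergence_by_locus.values()
    overall_mean = sum(values) / len(mean_divergence_by_locus)
    return {locus: d / overall_mean for locus, d in mean_divergence_by_locus.items()}

# Hypothetical average divergences to the outgroup, one per locus.
divergence = {"locusA": 0.02, "locusB": 0.04, "locusC": 0.06}
multipliers = rate_multipliers(divergence)
for locus, rate in multipliers.items():
    print(locus, round(rate, 3))
```

By construction the multipliers average to 1, so a locus with a multiplier above 1 evolves faster than the average locus in the dataset.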


Example 1: Small Example with Intraspecific Sampling

- Start with a small example where we can work things out by hand.
- Four species, eight lineages, and two loci (N = 2).
- Suppose that the gene trees for the two loci are:

[Figure: two gene trees on lineages 1-8. Node times shown for locus 1: 3.46, 2.46, 1.23, 1.20, 2.40, 1.00, 1.20. Node times shown for locus 2: 3.50, 2.86, 2.54, 1.23, 1.20, 1.00, 1.10.]


STEM Example 1: Small Example with Intraspecific Sampling

- Now we can run STEM and look at output
- First, let’s compute the relevant distances by hand:

Locus 1, {D^1_ab}:

      2     3     4     5     6     7     8
1   1.23  2.46  2.46  3.46  3.46  3.46  3.46
2    -    2.46  2.46  3.46  3.46  3.46  3.46
3    -     -    1.2   3.46  3.46  3.46  3.46
4    -     -     -    3.46  3.46  3.46  3.46
5    -     -     -     -    1.0   2.4   2.4
6    -     -     -     -     -    2.4   2.4
7    -     -     -     -     -     -    1.2

Locus 2, {D^2_ab}:

      2     3     4     5     6     7     8
1   1.23  2.56  2.56  2.86  2.86  3.5   3.5
2    -    2.56  2.56  2.86  2.86  3.5   3.5
3    -     -    1.2   2.86  2.86  3.5   3.5
4    -     -     -    2.86  2.86  3.5   3.5
5    -     -     -     -    1.0   3.5   3.5
6    -     -     -     -     -    3.5   3.5
7    -     -     -     -     -     -    1.1


Each lineage-level matrix is then reduced to a species-level matrix by taking, for every pair of species, the minimum over all pairs of lineages sampled from them:

Locus 1:

      S1    S2    S3    S4
S1    -    1.2   3.46  3.46
S2          -    1.0   2.4
S3                -    1.2
S4                      -

Locus 2:

      S1    S2    S3    S4
S1    -    1.2   2.86  3.5
S2          -    1.0   3.5
S3                -    1.1
S4                      -


Taking the entry-wise minimum of the two species-level matrices across loci gives the combined matrix:

      S1    S2    S3    S4
S1    -    1.2   2.86  3.46
S2          -    1.0   2.4
S3                -    1.1
S4                      -
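The hand calculation can be reproduced with a short script. This is a sketch rather than STEM-hy's own code: the helper names `full_matrix`, `species_min`, and `combine` are assumptions made here. The lineage-level values and the species-to-lineage assignment (Species1: lineages 1-3, Species2: 4-5, Species3: 6-7, Species4: 8) are the ones used in this example.

```python
from itertools import combinations

def full_matrix(upper_rows):
    """Expand upper-triangular rows (lineages numbered from 1) into a symmetric lookup."""
    dist = {}
    for i, row in enumerate(upper_rows, start=1):
        for offset, value in enumerate(row):
            j = i + 1 + offset
            dist[(i, j)] = dist[(j, i)] = value
    return dist

def species_min(dist, species):
    """Minimum lineage-level time for every pair of species."""
    return {
        (a, b): min(dist[(x, y)] for x in species[a] for y in species[b])
        for a, b in combinations(species, 2)
    }

def combine(per_locus):
    """Entry-wise minimum across loci."""
    return {pair: min(m[pair] for m in per_locus) for pair in per_locus[0]}

locus1 = full_matrix([
    [1.23, 2.46, 2.46, 3.46, 3.46, 3.46, 3.46],
    [2.46, 2.46, 3.46, 3.46, 3.46, 3.46],
    [1.2, 3.46, 3.46, 3.46, 3.46],
    [3.46, 3.46, 3.46, 3.46],
    [1.0, 2.4, 2.4],
    [2.4, 2.4],
    [1.2],
])
locus2 = full_matrix([
    [1.23, 2.56, 2.56, 2.86, 2.86, 3.5, 3.5],
    [2.56, 2.56, 2.86, 2.86, 3.5, 3.5],
    [1.2, 2.86, 2.86, 3.5, 3.5],
    [2.86, 2.86, 3.5, 3.5],
    [1.0, 3.5, 3.5],
    [3.5, 3.5],
    [1.1],
])
species = {"S1": [1, 2, 3], "S2": [4, 5], "S3": [6, 7], "S4": [8]}

combined = combine([species_min(locus1, species), species_min(locus2, species)])
for pair, value in combined.items():
    print(pair, value)
```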


[Figure: species tree obtained by clustering the combined matrix, with tips ordered 1, 4, 2, 3 and node times 1.2, 1.1, and 1.0.]
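The clustering step that turns the combined matrix into a tree can be sketched as single-linkage clustering on the minimum times. This is an illustration of the idea under that assumption, not STEM-hy's implementation; the function name `cluster_min_times` is hypothetical, and the sketch does not compute a likelihood.

```python
def cluster_min_times(names, d):
    """Cluster species on minimum divergence times and return a Newick string.

    `d` maps alphabetically sorted name pairs, e.g. ("S1", "S2"), to times.
    """
    # Each cluster is (member names, Newick text so far, node height).
    clusters = [(frozenset([name]), name, 0.0) for name in names]

    def between(c1, c2):
        return min(d[tuple(sorted((a, b)))] for a in c1[0] for b in c2[0])

    while len(clusters) > 1:
        candidates = [
            (between(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        height, i, j = min(candidates)    # closest pair; ties broken by index
        left, right = clusters[i], clusters[j]
        text = "({}:{:.5f},{}:{:.5f})".format(
            left[1], height - left[2], right[1], height - right[2]
        )
        merged = (left[0] | right[0], text, height)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][1] + ";"

# Combined minimum matrix from the hand calculation in this example.
d = {
    ("S1", "S2"): 1.2, ("S1", "S3"): 2.86, ("S1", "S4"): 3.46,
    ("S2", "S3"): 1.0, ("S2", "S4"): 2.4, ("S3", "S4"): 1.1,
}
print(cluster_min_times(["S1", "S2", "S3", "S4"], d))
```

With these inputs the script joins S2 and S3 at 1.0, adds S4 at 1.1, and adds S1 at 1.2, which agrees with the node times in the hand-built tree.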


Example 1: Small Example with Intraspecific Sampling

Step 1: Prepare the gene trees

Option 1: Place all gene trees in a single file called genetrees.tre:
- Newick format required
- One gene tree per line
- Rate multipliers must be given in brackets in front of each gene tree

[1.0](((Name1:0.00123,Name2:0.00123):0.00123,(Name3:0.00121,Name4:0.00121):0.00125):0.0010,((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012):0.00106);
[1.0]((((Name1:0.00123,Name2:0.00123):0.00133,(Name3:0.0012,Name4:0.0012):0.00134):0.0003,(Name5:0.0010,Name6:0.0010):0.00186):0.00064,(MyName7:0.0011,Name8:0.0011):0.0024);


Option 2: Place sets of gene trees in separate files
- File names will be supplied to STEM-hy in the settings file
- Rate multipliers will also be supplied in the settings file
- All genes in a single file are assumed to have the same rate

genetrees1.tre:

(((Name1:0.00123,Name2:0.00123):0.00123,(Name3:0.00121,Name4:0.00121):0.00125):0.0010,((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012):0.00106);

genetrees2.tre:

((((Name1:0.00123,Name2:0.00123):0.00133,(Name3:0.0012,Name4:0.0012):0.00134):0.0003,(Name5:0.0010,Name6:0.0010):0.00186):0.00064,(MyName7:0.0011,Name8:0.0011):0.0024);


Step 2: Prepare the settings file - input option 1

YAML format: headings with indented parameters defined below

properties:
  run: 1    #0=user-tree, 1=MLE, 2=search, 3=hybridization, 4=bootstrap
  theta: 0.001
  num saved trees: 15
  beta: 0.0005
  seed: 3435893

species:
  Species1: Name1, Name2, Name3
  Species2: Name4, Name5
  Species3: Name6, MyName7
  Species4: Name8


Step 2: Prepare the settings file - input option 2

YAML format: headings with indented parameters defined below

properties:
  run: 1    #0=user-tree, 1=MLE, 2=search, 3=hybridization, 4=bootstrap
  theta: 0.001
  num saved trees: 15
  beta: 0.0005
  seed: 3435893

species:
  Species1: Name1, Name2, Name3
  Species2: Name4, Name5
  Species3: Name6, MyName7
  Species4: Name8

files:
  genetrees1.tre: 1.0    # notice the space after each ':'
  genetrees2.tre: 1.0


Note: Some parameters will only be used for certain “run” settings. They are ignored otherwise, and can be omitted from the settings file.


Example 1: Small Example with Intraspecific Sampling - Results

- Analysis 1: Find the ML species tree (run with run: 1)
- Run at the command line with: java -jar stem-hy.jar

***************************************** Welcome to STEM 2.0 *****************************************

The settings file was successfully parsed...

Using theta = 0.0010

The settings file contained 4 species and 8 lineages.

The species-to-lineage mappings are:

Species4: Name8
Species3: MyName7, Name6
Species2: Name4, Name5
Species1: Name1, Name2, Name3


- Results are written to the file mle.tre

****************Results*****************

D AB Matrix:

[ 0.00000 1.20000 2.86000 3.46000]
[ 0.00000 0.00000 1.00000 2.40000]
[ 0.00000 0.00000 0.00000 1.10000]
[ 0.00000 0.00000 0.00000 0.00000]

Likelihood Species Tree (Newick format):

(Species1:1.20000,(Species4:1.10000,(Species2:1.00000,Species3:1.00000):0.10000):0.10000);

Log likelihood for tree: -52.43701947216076

****************** Done ****************


- Analysis 2: Find likelihood of all 15 trees (run with run: 2)
- Output files:

***************************************** Welcome to STEM 2.0 *****************************************

The settings file was successfully parsed......

Beginning search now (this could take a while)...

Search completed.

Here are the results (also written to file ’search.tre’):

[-52.43702] (Species1:1.20000,(Species4:1.10000,(Species2:1.00000,Species3:1.00000):0.10000):0.10000);
[-53.63718] (Species1:1.20000,(Species3:1.00000,(Species2:1.00000,Species4:1.00000):0.00000):0.20000);
[-56.63684] ((Species4:1.10000,Species1:1.10000):0.00000,(Species2:1.00000,Species3:1.00000):0.10000);
[-56.63720] (Species4:1.10000,(Species1:1.10000,(Species2:1.00000,Species3:1.00000):0.10000):0.00000);
[-60.23760] (Species4:1.10000,(Species2:1.00000,(Species1:1.00000,Species3:1.00000):0.00000):0.10000);
[-62.63758] ((Species1:1.00000,Species3:1.00000):0.00000,(Species2:1.00000,Species4:1.00000):0.00000);
[-62.63778] (Species3:1.00000,(Species1:1.00000,(Species2:1.00000,Species4:1.00000):0.00000):0.00000);
[-62.63790] (Species2:1.00000,((Species1:1.00000,Species4:1.00000):0.00000,Species3:1.00000):0.00000);
[-62.63806] (Species2:1.00000,((Species1:1.00000,Species3:1.00000):0.00000,Species4:1.00000):0.00000);

****************** Done ****************
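To compare the saved trees, the search output can be read back into a script. The sketch below assumes that each line of the results has the form `[log-likelihood] tree;`, as in the screen output above; the function name `parse_search_results` is hypothetical, and only the first three trees are included to keep the example short.

```python
import re

# One result per line: "[log-likelihood] (newick);"
LINE = re.compile(r"^\[(-?\d+(?:\.\d+)?)\]\s*(\(.*;)\s*$")

def parse_search_results(text):
    """Return (log-likelihood, Newick) pairs sorted from best to worst."""
    results = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            results.append((float(match.group(1)), match.group(2)))
    return sorted(results, key=lambda r: r[0], reverse=True)

sample = """
[-52.43702] (Species1:1.20000,(Species4:1.10000,(Species2:1.00000,Species3:1.00000):0.10000):0.10000);
[-53.63718] (Species1:1.20000,(Species3:1.00000,(Species2:1.00000,Species4:1.00000):0.00000):0.20000);
[-56.63684] ((Species4:1.10000,Species1:1.10000):0.00000,(Species2:1.00000,Species3:1.00000):0.10000);
"""

results = parse_search_results(sample)
best = results[0][0]
for loglik, newick in results:
    # Difference in log-likelihood relative to the best tree found.
    print(round(loglik - best, 5), newick)
```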


- Analysis 3: Find the likelihood of a particular species tree (run with run: 0)
- Place the tree(s) of interest in the file user.tre in the same directory as STEM-hy

((Species1:0.000222,Species3:0.000222):0.016444,(Species2:0.000222,Species4:0.000222):0.016444);

- Branch lengths must be included. STEM-hy gives the likelihood of the tree with the user-specified branch lengths, as well as the ML branch lengths along the user tree.


***************************************** Welcome to STEM 2.0 *****************************************

The settings file was successfully parsed...

.

.

.

Read 1 species tree[s] from ’user.tre’

****************Results*****************

User tree:
((Species1:0.000222,Species3:0.000222):0.016444,(Species2:0.000222,Species4:0.000222):0.016444)
Log likelihood for tree: -153.62929947216077

**************Optimized Trees************

Optimized user tree:
((Species1:0.99995,Species3:0.99995):0.00005,(Species2:0.99995,Species4:0.99995):0.00005);
Log likelihood: -62.63865947216076

****************** Done ****************


Example 2: Missing Data Example

- genetrees.tre:

[1.0](((Name1:0.00123,Name2:0.00123):0.00123,(Name3:0.00121,Name4:0.00121):0.00125):0.0010,((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012):0.00106);
[1.0](((Name1:0.00123,Name2:0.00123):0.00133,(Name3:0.0012,Name4:0.0012):0.00134):0.00032,(Name5:0.0010,Name6:0.0010):0.00186);
[1.0](((Name1:0.00123,Name2:0.00123):0.00123,(Name3:0.00121,Name4:0.00121):0.00125):0.0010,((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012):0.00106);
[1.0]((((Name1:0.00123,Name2:0.00123):0.00133,(Name3:0.0012,Name4:0.0012):0.00134):0.0003,(Name5:0.0010,Name6:0.0010):0.00186):0.00064,(MyName7:0.0011,Name8:0.0011):0.0024);
[1.0](((Name1:0.00123,Name2:0.00123):0.00123,(Name3:0.00121,Name4:0.00121):0.00125):0.0010,((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012):0.00106);
[1.0]((((Name1:0.00123,Name2:0.00123):0.00133,(Name3:0.0012,Name4:0.0012):0.00134):0.0003,(Name5:0.0010,Name6:0.0010):0.00186):0.00064,(MyName7:0.0011,Name8:0.0011):0.0024);
[1.0](((Name1:0.00123,Name2:0.00123):0.00123,(Name3:0.00121,Name4:0.00121):0.00125):0.0010,((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012):0.00106);
[1.0]((((Name1:0.00123,Name2:0.00123):0.00133,(Name3:0.0012,Name4:0.0012):0.00134):0.0003,(Name5:0.0010,Name6:0.0010):0.00186):0.00064,(MyName7:0.0011,Name8:0.0011):0.0024);


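- Look at gene trees:

[Figure: the three distinct gene trees in genetrees.tre. A tree on all eight lineages (Name1-Name6, MyName7, Name8) is shared by 4 loci; a tree on Name1-Name6 only, with MyName7 and Name8 missing, occurs at 1 locus; a second tree on all eight lineages is shared by 3 loci.]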


Note: The settings file remains unchanged. Below is the output.

****************Results*****************

D AB Matrix:

[ 0.00000 1.20000 2.86000 3.46000]
[ 0.00000 0.00000 1.00000 2.40000]
[ 0.00000 0.00000 0.00000 1.10000]
[ 0.00000 0.00000 0.00000 0.00000]

Maximum Likelihood Species Tree (Newick format):

(Species1:1.20000,(Species4:1.10000,(Species2:1.00000,Species3:1.00000):0.10000):0.10000);

log likelihood for tree: -967.874444171144

****************** Done ****************


Example Data: Heliconius Butterflies

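[Figure: diagram for the Heliconius example showing H. melpomene, H. heurippa, H. cydno, and H. hecale. The image also carried the labels 1, 2, 3 and ABCD, BCD, CD, BD.]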


Example 3: Bootstrap Analysis

- The current version of STEM-hy can be used to estimate bootstrap proportions on the ML tree, as well as to construct a bootstrap consensus tree.
- Sequence data must be provided in PHYLIP format (separate files need to be used for each gene).
- Each gene is bootstrapped a user-specified number of times, B, to produce B bootstrap samples (alignments) for each gene.
- Gene trees are estimated for each bootstrap sample using the program SSA. This program uses a simulated annealing method to estimate gene trees under the assumption of a molecular clock.
- B species trees are reconstructed using STEM-hy and printed both to the screen and to the file bootstrap.results.
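The resampling step in this procedure is the standard nonparametric bootstrap over alignment columns. The sketch below illustrates that step only and is not STEM-hy's or SSA's code; the function name `bootstrap_alignment` and the toy alignment are hypothetical.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement to form one bootstrap replicate."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

# Hypothetical toy alignment (all sequences must have equal length).
alignment = {"taxonA": "ACGTACGT", "taxonB": "ACGTTCGT", "taxonC": "ACCTACGA"}

rng = random.Random(3435893)  # arbitrary seed, fixed so the run is repeatable
B = 100                       # number of bootstrap samples
replicates = [bootstrap_alignment(alignment, rng) for _ in range(B)]
print(len(replicates), len(replicates[0]["taxonA"]))
```

Every replicate keeps the same taxa and the same alignment length; only the choice of columns differs, and the same column indices are used for all taxa so that sites stay intact.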


For this example, we’ll consider four taxa and six genes in Heliconius butterflies.

The settings file is shown below (changes from the earlier examples were highlighted in blue on the slide).

properties:
  run: 4    #0=user-tree, 1=MLE, 2=search, 3=hybridization, 4=bootstrap
  bootstrap samples: 100
  phylip files: co 4tax.phy,dll 4tax.phy,inv 4tax.phy,sd 4tax.phy,tpi 4tax.phy,white 4tax.phy
  theta: 0.01
  num saved trees: 15
  beta: 0.0005
  seed: 3435893

species:
  H. melpomene: M95
  H. hecale: Hh
  H. cordula: M187
  H. heurippa: Strib40


Below is the output. All bootstrap trees are written to a file called bootstrap.results and can be read into another program and summarized.

    ...
    The species-to-lineage mappings are:

    H. heurippa: Strib40
    H. cordula: M187
    H. hecale: Hh
    H. melpomene: M95

    Bootstrapping trees (this might take a while)...

    ****************Results*****************

    The maximum likelihood species tree estimate is:

    (H. hecale:6.82133,(H. melpomene:0.74608,(H. heurippa:0.07658,H. cordula:0.07658):0.66950):6.07525);

    The 100 bootstrapped species trees:

    (H. heurippa:0.29907,(H. hecale:0.17664,(H. melpomene:0.12424,H. cydno:0.12424):0.05240):0.12243);
    (H. hecale:1.52825,(H. melpomene:0.35022,(H. heurippa:0.31089,H. cydno:0.31089):0.03933):1.17803);
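One way to summarize the trees in bootstrap.results is to count how often each clade appears. The sketch below is an illustration rather than a STEM-hy feature: `clades` and `clade_frequencies` are hypothetical helper names, and the small parser assumes rooted Newick strings with branch lengths and unquoted tip names, as in the output above. It is applied to the two bootstrap trees shown on the slide.

```python
from collections import Counter


def clades(newick):
    """Return the set of clades (frozensets of tip names) in a rooted Newick tree."""
    text = newick.strip().rstrip(";")
    found = set()
    pos = 0

    def skip_label():
        nonlocal pos
        # Skip an optional ":branch_length" (and any internal node label).
        while pos < len(text) and text[pos] not in ",()":
            pos += 1

    def parse():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            tips = set()
            while True:
                tips |= parse()
                if text[pos] == ",":
                    pos += 1
                else:
                    break
            pos += 1  # consume ")"
            skip_label()
            found.add(frozenset(tips))
            return tips
        # Otherwise this is a tip: read its name up to the branch length.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos].strip()
        skip_label()
        return {name}

    parse()
    return found


def clade_frequencies(trees):
    """Fraction of trees in which each clade appears."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    return {clade: n / len(trees) for clade, n in counts.items()}


boot_trees = [
    "(H. heurippa:0.29907,(H. hecale:0.17664,(H. melpomene:0.12424,H. cydno:0.12424):0.05240):0.12243);",
    "(H. hecale:1.52825,(H. melpomene:0.35022,(H. heurippa:0.31089,H. cydno:0.31089):0.03933):1.17803);",
]
freqs = clade_frequencies(boot_trees)
for clade, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(sorted(clade), freq)
```

Bootstrap proportions on the ML tree are then lookups of the ML tree's clades in this table, provided the tip names match between the ML tree and the bootstrap trees.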


Some Notes on Program Versions

- There are some important differences between STEM v1.1a and STEM v2.0 / STEM-hy v1.0.
- Multifurcations are handled differently:
  - STEM v1.1a and lower: zero-length branches are set to 0.00001.
  - STEM v2.0 / STEM-hy v1.0: zero-length branches are treated as missing data.
- The other major differences are improvements to the input format and increased functionality in later versions.


STEM’s Hybrid Species Model

[Figure: a hybrid species tree on taxa A, B, and C with internal branch length $\tau$. The two parental species trees carry weights $\gamma$ and $1-\gamma$; each parental tree generates gene trees, and a mutation process along the gene trees generates the sequence data.]

The figure is built up in five steps:

- Species tree subject to hybridization.
- Hybridization parameter $\gamma$ to model the extent of the contribution from each parent.
- Possible parental species trees.
- Probabilities associated with each gene tree topology for each parental tree under the coalescent model.
- Sequence evolution proceeds along gene trees.

Gene tree topology probabilities for the first parental tree:

$P(C(AB)) = 1 - \tfrac{2}{3}\exp(-\tau)$
$P(A(BC)) = \tfrac{1}{3}\exp(-\tau)$
$P(B(AC)) = \tfrac{1}{3}\exp(-\tau)$

Gene tree topology probabilities for the second parental tree:

$P(C(AB)) = \tfrac{1}{3}\exp(-\tau)$
$P(A(BC)) = 1 - \tfrac{2}{3}\exp(-\tau)$
$P(B(AC)) = \tfrac{1}{3}\exp(-\tau)$
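The topology probabilities in the figure can be checked numerically. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, $\gamma$ is taken to weight the first parental tree (the one on which C(AB) is the most probable topology), and both parental trees are assumed to share the same internal branch length $\tau$, as drawn in the figure.

```python
import math


def topology_probs(tau, matching):
    """Gene tree topology probabilities for three taxa under the coalescent.

    `tau` is the internal branch length (coalescent units) and `matching`
    is the topology that agrees with the parental species tree.
    """
    topologies = ("C(AB)", "A(BC)", "B(AC)")
    p_other = math.exp(-tau) / 3.0
    return {t: (1.0 - 2.0 * p_other if t == matching else p_other) for t in topologies}


def hybrid_topology_probs(tau, gamma):
    """Mix the two parental trees with weights gamma and 1 - gamma."""
    p1 = topology_probs(tau, "C(AB)")  # first parental tree
    p2 = topology_probs(tau, "A(BC)")  # second parental tree
    return {t: gamma * p1[t] + (1.0 - gamma) * p2[t] for t in p1}


print(hybrid_topology_probs(tau=1.0, gamma=0.3))
```

The probability of B(AC) does not depend on $\gamma$ in this sketch, because that topology matches neither parental tree.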


Inference of Trees Subject to Hybridization

Assumptions:

- Hybridization results in a mosaic genome, so that the history of each sampled gene comes, with some probability, from one of several parental species trees.
- Genes in the sample are independent given the species tree.
- Hybridization events happen only between sister taxa.
- No factors other than coalescence and hybridization lead to incongruence between gene trees and the species tree.


Likelihood Calculation for the Three-taxon Case

- Let $f(g_i \mid S)$ be the probability density of gene tree $g_i$ given species tree $S$ under the coalescent model (Rannala and Yang, 2003).
- The likelihood function for the three-taxon case is

  $$\prod_{i=1}^{N} \left\{ \gamma f(g_i \mid S_1) + (1-\gamma) f(g_i \mid S_2) \right\}$$

  where $S_1$ and $S_2$ are the two possible parental species trees and $\gamma \in [0, 1]$.

[Figure: the hybrid species tree on A, B, and C (branch length $\tau$, weights $\gamma$ and $1-\gamma$) shown with its two parental trees, labelled $f(g \mid S_1)$ and $f(g \mid S_2)$, each followed by the mutation process.]
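To make the role of $\gamma$ concrete, here is a minimal Python sketch of the three-taxon mixture likelihood. It assumes the densities $f(g_i \mid S_1)$ and $f(g_i \mid S_2)$ have already been computed; the values below are hypothetical placeholders, and the grid search is a simple stand-in, not a description of how STEM-hy optimizes $\gamma$.

```python
import math


def log_likelihood(gamma, f1, f2):
    """Log of prod_i { gamma * f(g_i|S1) + (1 - gamma) * f(g_i|S2) }."""
    return sum(math.log(gamma * a + (1.0 - gamma) * b) for a, b in zip(f1, f2))


def best_gamma(f1, f2, steps=100):
    """Grid search for the gamma in [0, 1] that maximizes the log-likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda g: log_likelihood(g, f1, f2))


# Hypothetical density values for four gene trees under S1 and S2.
f1 = [2.0, 0.5, 1.5, 0.2]
f2 = [0.4, 1.8, 0.3, 1.1]

g_hat = best_gamma(f1, f2)
print(g_hat, log_likelihood(g_hat, f1, f2))
```

Setting `gamma` to 1 or 0 recovers the likelihood of a single parental tree, which is how the non-hybrid models arise as special cases.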


Beyond Three Taxa . . .

- We propose a method that incorporates any number of hybridization events, provided they occur between sister taxa.
- Each putative hybridization event is assigned a parameter, $\gamma_1, \gamma_2, \ldots$
- The likelihood is computed by considering all combinations of possible parental species trees, weighted appropriately by the $\gamma_j$ parameters.


A Big(-ger) Example

Recall our motivating example:

[Figure: four trees on taxa A, B, C, D, E, and F.]

Consider the hybrid species tree:

[Figure: hybrid species tree on taxa A to F, with B and D as the putative hybrid taxa.]


The Likelihood Function

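[Figure: the hybrid species tree on taxa A to F decomposes into four parental species trees, each on A, B, C, D, E, F, with the weights below.]

Parental tree   Weight
S1              $\gamma_1\gamma_2$
S2              $\gamma_1(1-\gamma_2)$
S3              $(1-\gamma_1)\gamma_2$
S4              $(1-\gamma_1)(1-\gamma_2)$

The likelihood function is

$$\prod_{i=1}^{N} \left\{ \gamma_1\gamma_2 f(g_i \mid S_1) + \gamma_1(1-\gamma_2) f(g_i \mid S_2) + (1-\gamma_1)\gamma_2 f(g_i \mid S_3) + (1-\gamma_1)(1-\gamma_2) f(g_i \mid S_4) \right\}$$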
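The weights on the parental trees follow a simple pattern: each hybridization event contributes either $\gamma_j$ or $1-\gamma_j$, so $k$ events give $2^k$ parental trees. The Python sketch below enumerates these weights; the function name and the example values of $\gamma_1$ and $\gamma_2$ are hypothetical.

```python
from itertools import product


def parental_weights(gammas):
    """Weight of every parental tree for hybridization parameters gamma_1..gamma_k.

    Keys are tuples of booleans, one per event: True selects the gamma_j
    factor and False selects (1 - gamma_j).
    """
    weights = {}
    for choice in product((True, False), repeat=len(gammas)):
        w = 1.0
        for g, take in zip(gammas, choice):
            w *= g if take else 1.0 - g
        weights[choice] = w
    return weights


# Two events, as in the six-taxon example; the enumeration order matches S1..S4.
w = parental_weights([0.7, 0.2])
for label, (choice, weight) in zip(("S1", "S2", "S3", "S4"), w.items()):
    print(label, choice, round(weight, 4))
```

The weights always sum to 1, so the bracketed term in the likelihood is a proper mixture of the densities $f(g_i \mid S_1), \ldots, f(g_i \mid S_4)$.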


Comments on Computation

- Parameters in the likelihood function: $\gamma_1$, $\gamma_2$, and the branch lengths.
- For a given hybrid species tree and sample of gene trees with divergence times, maximum likelihood branch lengths can be analytically determined.
- Fitting the likelihood model for a hypothesized hybrid species tree only requires optimization of the $\gamma$ parameters.
- Implemented in a modified version of the program STEM, called STEM-hy (“stemmy”).


Selecting the Best Hybrid Species Tree

- For the example hybrid species tree, pick the best hybrid model from among the possible models using the AIC. A $\gamma$ fixed at 0 or 1 selects a single parental tree at that event; an entry of (0,1) means that $\gamma$ is estimated, which adds one parameter.

Model   γ1      γ2      Number of parameters
1       0       0       5
2       0       1       5
3       1       0       5
4       1       1       5
5       0       (0,1)   6
6       1       (0,1)   6
7       (0,1)   0       6
8       (0,1)   1       6
9       (0,1)   (0,1)   7

(The Tree column of the slide showed a tree diagram on taxa A to F for each model: models 5 and 6 treat D as the hybrid, models 7 and 8 treat B as the hybrid, and model 9 treats both.)


STEM-hy: Assumptions

- In practice, the $\gamma_i$ are not given (neither are the times of speciation or hybridization events). The algorithm finds MLEs for these parameters.
- STEM-hy inherits all of STEM’s other assumptions (e.g., no gene flow after speciation in the absence of hybridization; gene tree variability is not taken into consideration).


- One important point: STEM-hy looks for evidence of hybridization in the presence of incomplete lineage sorting.
- Because likelihoods are computed under the STEM-hy model, the coalescent process is incorporated.
- The AIC is used to compare models:

  $$\mathrm{AIC} = -2 \ln L(M \mid D) + 2k$$

  where $M$ is the model, $D$ is the data, and $k$ is the number of parameters. $\ln L(M \mid D)$ is the log-likelihood from STEM-hy for the hybridization model under consideration.


Example 4: Hybridization in Heliconius

- Input data format is the same as for previous analyses:
  - Gene trees are placed in the file called genetrees.tre (option 1), or the files containing the gene trees are listed in the settings file (option 2).
  - The settings file (in YAML format) is used to give user settings (e.g., θ).
  - The run option is set to 3.


- The user must additionally provide information about hybridization:
  - The only option at present is to use a user-specified tree; the present version of the program assumes that the overall species phylogeny is known.
  - The user-specified tree is one of the possible parental trees; it doesn’t matter which one.
  - The putative hybrid species are identified in the settings.yaml file.


[Figure: tree diagram for H. melpomene, H. heurippa, H. cydno, and H. hecale, annotated with the labels 1, 2, 3 and the letter groups ABCD, BCD, CDBD.]


STEM-hy Example: Heliconius Butterflies

- Example genetrees.tre file:

    [0.37137]((Hheurippa:0.005989,(Hcydno:0.001322,Hmelpomene:0.001322):0.004667):0.022778,Hhecale:0.028767);
    [1.17059]((Hmelpomene:0.049843,(Hcydno:0.000001,Hheurippa:0.000001):0.049843):0.001,Hhecale:0.049943);
    [0.11434](((Hcydno:0.021024,Hheurippa:0.021024):0.020051,Hmelpomene:0.041076):0.002610,Hhecale:0.043685);
    [1.35454](((Hheurippa:0.010740,Hcydno:0.010740):0.003498,Hmelpomene:0.014238):0.037654,Hhecale:0.051892);
    [0.39096](((Hheurippa:0.008764,Hmelpomene:0.008764):0.001686,Hcydno:0.010450):0.003969,Hhecale:0.014419);
    [1.22683](((Hheurippa:0.002431,Hcydno:0.002431):0.062919,Hmelpomene:0.065350):0.0000001,Hhecale:0.065351);
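For checking an input file before a run, the entries can be split into a bracketed number and a Newick string. The sketch below is a hypothetical helper, not part of STEM-hy; it assumes the bracketed value is the locus-specific rate multiplier and that every tree ends with a semicolon, as in the example above.

```python
import re


def read_gene_trees(text):
    """Split STEM-style gene tree text into (rate, newick) pairs."""
    pattern = re.compile(r"\[([0-9.eE+-]+)\]\s*([^;\[]+;)")
    return [(float(rate), tree.strip()) for rate, tree in pattern.findall(text)]


# First two entries of the example file, with or without line breaks between them.
example = (
    "[0.37137]((Hheurippa:0.005989,(Hcydno:0.001322,Hmelpomene:0.001322):0.004667):0.022778,Hhecale:0.028767);"
    "[1.17059]((Hmelpomene:0.049843,(Hcydno:0.000001,Hheurippa:0.000001):0.049843):0.001,Hhecale:0.049943);"
)
trees = read_gene_trees(example)
for rate, newick in trees:
    print(rate, newick[:40] + "...")
```

A quick check of `len(trees)` against the expected number of loci catches entries that are missing a bracketed value or a closing semicolon.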


- Example settings file:

    properties:
      run: 3
      theta: 0.001
      beta: 0.0005
      burnin: 100
      seed: 3435893
      bound_total_iter: 20
      num_saved_trees: 10
      hybrid_species: H. heurippa
      hybrid_tree: user-heliconius.tre

    species:
      H. melpomene: M95
      H. hecale: Hh
      H. cordula: M187
      H. heurippa: Strib40


Example 4: Hybridization in Heliconius

- Example user-heliconius.tre:

    (((H. heurippa:0.000085,H. cydno:0.000085):0.347479,H. melpomene:0.355979):3.332091,H. hecale:3.68807);


    ****************Results*****************
    ....

    Parental trees:

    gamma(H. heurippa) = 1
    ((H. cydno:0.00009,(H. heurippa:0.00009,H. melpomene:0.00009):0.00000):3.68801,H. hecale:3.68810);
    Lik: -357.4325907499209
    AIC: 720.8651814998418
    k: 3

    gamma(H. heurippa) = 0
    (((H. heurippa:0.00009,H. cydno:0.00009):0.35589,H. melpomene:0.35598):3.33212,H. hecale:3.68810);
    Lik: -349.9185707499209
    AIC: 705.8371414998418
    k: 3

    Hybrid trees:

    (((H. heurippa:0.00009,H. cydno:0.00009):0.35589,H. melpomene:0.35598):3.33212,H. hecale:3.68810);
    Lik: -349.5409832924012
    gamma(H. heurippa): 0.6600000000000004
    AIC: 707.0819665848024
    k: 4

    ****************** Done ****************
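The AIC values in this output can be reproduced from the reported Lik and k values with AIC = −2 ln L + 2k, treating Lik as the log-likelihood. The Python sketch below does that arithmetic and ranks the three models; the dictionary labels are descriptive names chosen here, not STEM-hy output fields.

```python
def aic(log_lik, k):
    """Akaike information criterion from a log-likelihood and parameter count."""
    return -2.0 * log_lik + 2.0 * k


# (log-likelihood, k) pairs copied from the output above.
models = {
    "gamma = 1": (-357.4325907499209, 3),
    "gamma = 0": (-349.9185707499209, 3),
    "gamma estimated": (-349.5409832924012, 4),
}
scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: AIC = {score:.4f}, delta = {score - scores[best]:.4f}")
```

With these numbers the parental tree with gamma(H. heurippa) = 0 has the lowest AIC, and the hybrid model is about 1.24 AIC units higher: its log-likelihood is slightly better, but not by enough to offset the extra parameter.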


What hybrid species can be considered?

- Care must be taken in selecting hybrid species:
  - Both members of a sister group cannot be selected as hybrid taxa in a single analysis. However, two analyses can be run (one with each member of the sister group identified as the hybrid), and results will be comparable across runs.
  - The outgroup cannot be selected as a hybrid.
- Both of these restrictions result from the fact that, for now, hybridization is only considered between sister taxa.
- More general hybridization relationships can be considered “by hand” using the user-specified tree feature of STEM-hy.


STEM-hy: Strengths and Weaknesses

- STEM-hy makes some fairly strong assumptions:
  - Error in estimating gene trees and branch lengths is not incorporated! The possibility of carrying out bootstrap analysis helps, however.
  - Information in the sequence data is not used directly; it is only used as summarized by estimated gene divergence times.
  - There is a single value of θ for the entire tree.
- There are trade-offs involved, and STEM-hy does some things well:
  - It is quick (even the tree search does not take long).
  - It can handle missing data easily and intuitively.
  - Simulations demonstrate reasonable performance (unlikely to be misleading; may be uninformative).


Challenge Datasets

- I’ve created four datasets under varying conditions:
  - M1: No hybridization, long intervals between speciation events.
  - M2: No hybridization, short intervals between speciation events.
  - M3: Low levels of hybridization; B is a hybrid of A and C (species tree as in M1 and M2).
  - M4: Extensive hybridization; B is a hybrid of A and C (species tree as in M1 and M2).
- All data sets have 6 species, 2 individuals per species, and 10 loci.
- GOAL: match each data set to one of the conditions listed above. Solutions are at www.stat.osu.edu/~lkubatko/uga2012.html


STEM-hy Information, References, etc.

Recommended citations - species tree estimation:

- Kubatko, L. S., B. C. Carstens, and L. L. Knowles. 2009. STEM: Species Tree Estimation using Maximum likelihood under coalescence. Bioinformatics 25(7): 971-973.
- Liu, L., L. Yu, and D. K. Pearl. 2009. Maximum tree: a consistent estimator of the species tree. Journal of Mathematical Biology 60(1): 95-106.
- Mossel, E. and S. Roch. 2010. Incomplete lineage sorting: Consistent phylogeny estimation from multiple loci. IEEE/ACM Transactions on Computational Biology and Bioinformatics 7(1): 166-171.

Recommended citations - hybridization:

- Kubatko, L. S. 2009. Identifying Hybridization Events in the Presence of Coalescence via Model Selection. Systematic Biology 58(5): 478-488.

Thank you!

STEM-hy is available at http://www.stat.osu.edu/~lkubatko/software/STEM/

Questions concerning the programs can be sent to [email protected].
