A computational phylogenetic approach to interaction analysis
Cynthia Sims Parr
University of Maryland College Park
Ecological Society of America Montreal, Canada August 9, 2005
Predicting Ecological Interactions
Terminology & Outline
Describe computational framework for predicting links
Propose general algorithms and discuss implications
Preliminary results: a simple model using a large database and evolutionary trees does a surprisingly good job.
Food web terms: web, node, link
Evolutionary tree levels: family, genus, species
Computational framework
[Diagram labels: Database; Interaction Web Database; ADW; DB and Graph Vis tools; Algorithms; Field Test Predictions; Predictions; Explore for patterns; Phylogenies; Classifications]
Note: More than one way to do it!
Predicting Links: parameterized functions
Step 1. Select functions that might predict links using characteristics of taxa. For example, size or stoichiometry.
Step 2. Determine parameters using known links among all taxa across whole or partial database.
For taxon A and taxon B with known link status LinkStatusAB:
$$\mathrm{LinkStatus}_{AB} = f(\alpha, \mathrm{size}_A, \mathrm{size}_B) + f(\beta, \mathrm{stoich}_A, \mathrm{stoich}_B)$$
Step 3. Use the parameterized equation to estimate LinkStatus between target taxa C and D:
$$\mathrm{LinkPredicted}_{CD} = f(\alpha, \mathrm{size}_C, \mathrm{size}_D) + f(\beta, \mathrm{stoich}_C, \mathrm{stoich}_D)$$
Implications: parameterized functions
Requires good data for target species
Can incrementally add natural history functions to get a better estimate, try different functions from the literature, or use genetic algorithms
Parameterizing functions: multivariate statistics, machine learning, fuzzy inference
Could use evolutionary info if you localize parameter estimates to clades or taxonomic subsets
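The three steps for parameterized functions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not specify the form of ƒ, so the log size ratio and the absolute stoichiometry difference used here are hypothetical choices, and ordinary least squares stands in for whichever fitting method (multivariate statistics, machine learning, fuzzy inference) is actually used.

```python
import math

# Hypothetical functional forms (assumptions, not taken from the slides):
#   f(alpha, sizeA, sizeB)     = alpha * log(sizeA / sizeB)
#   f(beta, stoichA, stoichB)  = beta * |stoichA - stoichB|

def features(a, b):
    """Step 1: the two predictor terms for an ordered taxon pair (a, b)."""
    return (math.log(a["size"] / b["size"]), abs(a["stoich"] - b["stoich"]))

def fit_parameters(known_pairs):
    """Step 2: least-squares estimate of (alpha, beta) from known links.

    known_pairs is a list of (taxon_a, taxon_b, link_status) tuples.
    The 2x2 normal equations are solved directly.
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for a, b, status in known_pairs:
        x1, x2 = features(a, b)
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        t1 += x1 * status
        t2 += x2 * status
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear; cannot fit parameters")
    alpha = (t1 * s22 - t2 * s12) / det
    beta = (t2 * s11 - t1 * s12) / det
    return alpha, beta

def predict_link(alpha, beta, c, d):
    """Step 3: apply the fitted equation to target taxa C and D."""
    x1, x2 = features(c, d)
    return alpha * x1 + beta * x2
```

Localizing the estimates to a clade, as suggested above, would amount to calling `fit_parameters` on only the known pairs drawn from that clade.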
Predicting Links: neighbor distance weighting
Step 1. Provide a distance threshold or the number of neighbors N to use.
Step 2. Find the nearest neighbors of your target nodes, in evolutionary or trait space, that have known link status.
Step 3. Combine LinkStatus weighted by distances.
E.g. for taxa X and Y, where X has nearest neighbor A and Y has nearest neighbor B, and LinkStatus between A and B is known, summing over the N neighbor pairs:
$$\mathrm{LinkPredicted}_{XY} = \sum^{N} \frac{1}{1 + \mathrm{distance}_{XA} + \mathrm{distance}_{YB}} \, \mathrm{LinkStatus}_{AB}$$
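A minimal Python sketch of the neighbor distance weighting follows. It assumes the neighbor search of Step 2 has already produced a list of (distanceXA, distanceYB, LinkStatusAB) triples. The function name, and the choice to apply the threshold to the combined distance, are assumptions made for illustration.

```python
def neighbor_weighted_link(neighbor_pairs, n=None, max_distance=None):
    """Combine known link statuses, weighted by distance to the targets.

    neighbor_pairs: list of (distance_xa, distance_yb, link_status_ab),
        one entry per neighbor pair (A, B) with known link status.
    n: optional number of closest neighbor pairs to use (Step 1).
    max_distance: optional threshold on distance_xa + distance_yb
        (Step 1; applying it to the summed distance is an assumption).
    """
    # Closest pairs first, measured by combined distance to X and Y.
    candidates = sorted(neighbor_pairs, key=lambda p: p[0] + p[1])
    if max_distance is not None:
        candidates = [p for p in candidates if p[0] + p[1] <= max_distance]
    if n is not None:
        candidates = candidates[:n]
    # Each known status contributes with weight 1 / (1 + dXA + dYB).
    return sum(status / (1.0 + d_xa + d_yb) for d_xa, d_yb, status in candidates)
```

A neighbor pair at distance zero (the link is already known for X and Y themselves) contributes its status with weight 1, and more distant pairs contribute progressively less.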
Implications: neighbor distance weighting
Evolutionary
  Uses phylogeny or classification or a combination of these
  Distance could be branch length or number of steps
  Does not explicitly take advantage of natural history
Trait space
  e.g. Euclidean distance in N-space
  Uses richest possible natural history data
  Could include evolutionary distance as a term
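The two kinds of distance can be illustrated as follows. This sketch assumes the classification is stored as a child-to-parent mapping and that traits are numeric vectors. The small canid taxonomy is example data only, and the function names are hypothetical.

```python
import math

# Example classification as a child -> parent mapping (illustrative data).
PARENT = {
    "Canis lupus": "Canis",
    "Canis latrans": "Canis",
    "Vulpes vulpes": "Vulpes",
    "Canis": "Canidae",
    "Vulpes": "Canidae",
}

def lineage(taxon, parent):
    """Return the path from a taxon up to the root, inclusive."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def step_distance(x, y, parent):
    """Number of steps between x and y through their closest shared ancestor."""
    steps_from_x = {node: i for i, node in enumerate(lineage(x, parent))}
    for j, node in enumerate(lineage(y, parent)):
        if node in steps_from_x:
            return steps_from_x[node] + j
    raise ValueError("taxa do not share a root")

def trait_distance(traits_x, traits_y):
    """Euclidean distance between two equal-length trait vectors."""
    return math.dist(traits_x, traits_y)
```

With branch lengths available, `step_distance` would sum edge lengths instead of counting edges. An evolutionary distance could also be appended to the trait vector as one more term.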
Problems and suggested solutions
Missing data
  Avoid it: avoid comparisons with nodes without complete data
  Substitute the value of the relative otherwise closest in trait space
  "Ancestral" node reconstruction, e.g. Phylogenetic Mixed Model (Houseworth et al. 2001)
Nodes that do not map to taxa, e.g. detritus, suspended organic matter
  Treat as if they are a phylogenetic unit, all in one polytomy
  Can create a "phylogeny" of neighbors. For example, "detritus" may be part of a reasonable hierarchy of organic material.
Nodes that are not resolved to species
  Doesn't matter for these algorithms
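The substitution option for missing data could look like the sketch below. It assumes traits are numeric vectors with `None` marking a missing value, and that "closest" is measured by Euclidean distance over the traits both taxa have. The function name is hypothetical.

```python
import math

def impute_from_closest(target, relatives):
    """Fill None entries in target from the relative closest on shared traits.

    target: list of trait values, with None for missing entries.
    relatives: list of trait vectors of the same length; only relatives
        with complete data are considered as donors.
    """
    def shared_distance(other):
        # Compare only on traits that the target actually has.
        pairs = [(t, o) for t, o in zip(target, other) if t is not None]
        if not pairs:
            return math.inf
        return math.sqrt(sum((t - o) ** 2 for t, o in pairs))

    complete = [r for r in relatives if all(v is not None for v in r)]
    if not complete:
        return list(target)  # nothing to borrow from; leave gaps in place
    donor = min(complete, key=shared_distance)
    return [d if t is None else t for t, d in zip(target, donor)]
```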
Picture of tree from TaxonTree overview
Take advantage of all information as needed
Whole web solutions
Some links affect others
  Use a priori prediction of strongest links to run first, then allow the status of these links to enter link predictions.
Webs should be realistic
  Vary parameters (e.g. scale of parameterization, thresholds) and rerun analyses until a criterion is met for the whole web
  Criteria: "natural" values for connectedness, stability, chain length, trophic level ratios, etc.
  Methodology: parsimony or likelihood analysis
Computational demands will be high
  S² possible links, simultaneous multivariate equations by all variants of runs. May need heuristics.
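The "vary parameters and rerun until a criterion is met" idea can be sketched for a single parameter and a single criterion. The example below assumes that the parameter is a score threshold for declaring a link, and that the criterion is directed connectance L/S² falling inside a target range. Both are stand-ins chosen for illustration, as are the names.

```python
def connectance(n_links, n_taxa):
    """Directed connectance: realized links over the S^2 possible links."""
    return n_links / n_taxa ** 2

def tune_threshold(scores, n_taxa, target_range, thresholds):
    """Try thresholds in order until the whole-web criterion is met.

    scores: dict mapping (consumer, resource) -> predicted link score.
    target_range: (low, high) bounds on acceptable connectance.
    thresholds: candidate score thresholds, tried in the order given.
    Returns (threshold, links) for the first acceptable web, else None.
    """
    low, high = target_range
    for t in thresholds:
        links = {pair for pair, score in scores.items() if score >= t}
        if low <= connectance(len(links), n_taxa) <= high:
            return t, links
    return None
```

With several parameters and several criteria the search space grows quickly, which is where the heuristics mentioned above would come in.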
Summary of approaches
Link prediction
  Parameterized functions
  Weighted distances
    Evolutionary
    Trait space
Total community solution
  Parsimony or likelihood solution
  Include other links as terms and run prioritized, stepwise analysis
Data needed
Wide range of well-identified taxa
Cross section of habitats
Natural history data
Database status
Source                 Webs   Nodes   Links
Animal Diversity Web    n/a    1012    2869
Webs on the Web          17    1537    6328
Interaction Web DB       26    2177    9882
EcoWEB                  213    4064    6363
Total                   256    8790   25442
4214 unique taxa
Evolutionary tree as in Parr et al. 2004. Bioinformatics.
LinkPredictor preliminary results
Data
  43% of nodes mapped to species level
  16% of nodes have no evolutionary information at all
  Using only presence or absence of links
Procedure
  Pulling out one food web at a time and predicting its links based on the rest of the data
  Up to 4 steps up and down the evolutionary tree, no weighting yet for distance
Results
  On average, 49% of actual links are correctly predicted
  38% of predicted links are false positives
Take home: Our DB and evolutionary approach does surprisingly well at predicting food links
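The leave-one-web-out procedure and the two reported measures can be made concrete with a short sketch. It is not the LinkPredictor code: the data layout, the function names, and the toy `seen_elsewhere` predictor (which predicts a link only if the same link appears in another web) are assumptions for illustration.

```python
def evaluate(actual, predicted):
    """Return (share of actual links predicted, share of predictions that are false)."""
    actual, predicted = set(actual), set(predicted)
    hit_share = len(actual & predicted) / len(actual) if actual else 0.0
    false_share = len(predicted - actual) / len(predicted) if predicted else 0.0
    return hit_share, false_share

def leave_one_web_out(webs, predict):
    """Hold out each web in turn and predict its links from the rest.

    webs: dict name -> {"nodes": set of taxa, "links": set of (consumer, resource)}.
    predict: callable(training_webs, nodes) -> set of predicted links.
    """
    results = {}
    for name, web in webs.items():
        training = {k: v for k, v in webs.items() if k != name}
        results[name] = evaluate(web["links"], predict(training, web["nodes"]))
    return results

def seen_elsewhere(training, nodes):
    """Toy predictor: a link is predicted if any training web contains it."""
    return {
        (a, b)
        for web in training.values()
        for (a, b) in web["links"]
        if a in nodes and b in nodes
    }
```

Averaging the first value over all held-out webs gives a figure comparable to the "actual links correctly predicted" result, and the second corresponds to the false-positive share of predictions.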
…With SPIRE at UMBC
More questions
What about predicting links among taxa from big studies outside the current database?
How much improvement comes from adding links to the DB?
How robust are results to differing degrees of phylogenetic resolution or taxon sampling?
How robust are results to missing data? How to handle data quality issues? Error estimates?
Future work with SPIRE
Role in ELVIS – LinkEP (Evidence Provider)
Integrate into a platform that
  takes location as input
  generates list of taxa
  gives evidence for interaction among taxa
  models change due to invasive species
Pull data from semantic web rather than local database
Acknowledgements
NSF IDM/ITR 0219492 (PI Bederson)
Bongshin Lee
NBII
Joel Sachs and Andrey Parafiynyk
Bill Fagan and lab members
Michael Kantor
EcoWeb (Joel Cohen)
NCEAS Interaction Web Database (Diego Vázquez)
WoW (J. Dunne and N. Martinez)
http://www.cs.umd.edu/hcil/biodiversity
http://spire.umbc.edu/linkpredictor/