The iPlant Tree of Life Project and Toolkit
DESCRIPTION
The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research
Given at the National Museum of Natural History in 2011
An overview of iPlant and iPToL
TRANSCRIPT
The iPlant Tree of Life Project and Toolkit: Building a
Cyberinfrastructure for Plant Science Research
Naim Matasci
520 303 8623
The iPlant Collaborative
National Museum of Natural History
Jul 14, 2011
What is iPlant?
Discovery Environment
NEW RELEASE COMING SOON!
http://www.iplantcollaborative.org/discovery-environment-preview-access
Physical Infrastructure
Computation
• 63K cores cluster
• 20K cores cluster
• 1 TB RAM
• 512 GPUs
Storage
• 2 PB
• 20 PB archive
• High speed parallel data transfer
Cloud Storage
• Store, access and share large datasets
• Multiple points of entry: web interface, mounted FS, API
• Free and secure
AVAILABLE NOW!
http://www.iplantcollaborative.org/about/policies/data-set-hosting
Cloud Computing
• Virtual Machines
– Up to 4 cores, 32 GB RAM, 100 GB dedicated disk
– Run any x86-compatible OS (even Windows)
– Persistent or on-demand
– Log in via SSH or secure VNC
• Use Cases
– Internet-enabled servers
– Database management appliances
– Virtual desktops
– …The sky is the limit!
AVAILABLE NOW!
http://www.iplantcollaborative.org/atmosphere-preview
Consumer Applications
iPlant's CI
iPlant Tree of Life Grand Challenge
Large phylogenetic inference: Building a tree of life for up to 500,000 green plants
Tree Visualization: Scalable visualization for small to large trees
Data Assembly and Integration: Acquiring, organizing and processing the data
Taxonomic Intelligence: Sorting out different names for the same species
Tree Reconciliation: Resolving discordant gene and species trees
Trait Evolution: Using trees to understand how traits evolved
BIG TREES
To optimize existing methods to construct phylogenetic trees on the order of 500K taxa.
Big Trees
NINJA/WINDJAMMER (Travis Wheeler)
Neighbor-Joining implementation that can analyze > 200K species
Six-day run time reduced 32-fold to 4.5 hours for 220K species data set
Two/three-day run time reduced 1,800-fold to 2 minutes for distance matrix calculation on 220K set
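For readers unfamiliar with Neighbor-Joining, the following is a minimal sketch of the textbook O(n^3) algorithm in Python. It is not the NINJA/WINDJAMMER implementation, which relies on optimizations not described in this talk; the function name, the input format (a dict of pairwise distances), and the four-taxon example are assumptions made for illustration.

```python
from itertools import combinations


def neighbor_joining(names, pairs):
    """Textbook neighbor joining; returns a Newick string.

    names: list of unique taxon labels.
    pairs: dict mapping (a, b) tuples to distances (hypothetical input format).
    """
    # Build a symmetric distance table from the pair dict.
    d = {n: {} for n in names}
    for (a, b), v in pairs.items():
        d[a][b] = d[b][a] = v

    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}

        def q(pair):
            # Q criterion: the pair with the smallest value is joined next.
            i, j = pair
            return (n - 2) * d[i][j] - r[i] - r[j]

        i, j = min(combinations(nodes, 2), key=q)
        # Branch lengths from i and j to their new parent.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        parent = f"({i}:{li:g},{j}:{lj:g})"
        # Distances from the new parent to every remaining node.
        d[parent] = {}
        for k in nodes:
            if k not in (i, j):
                d[parent][k] = d[k][parent] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [parent]

    a, b = nodes
    return f"({a},{b}:{d[a][b]:g});"


# Distances generated from the tree ((A:2,B:3),(C:1,D:5)) with an
# internal branch of length 4 (made-up example data).
example = {("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 11,
           ("B", "C"): 8, ("B", "D"): 12, ("C", "D"): 6}
print(neighbor_joining(["A", "B", "C", "D"], example))
```

Each pass recomputes every row sum and scans every pair, which is why the naive version becomes impractical at the 200K-species scale mentioned above.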
RAxML-Light (Alexandros Stamatakis)
Large Scale Maximum Likelihood implementation
55K Tree published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414)
AVAILABLE NOW!
TREE VISUALIZATION
To develop an application for viewing, analyzing and exploring large phylogenetic trees.
Tree Visualization
• > 500K Taxa
• Fast
• Web based, platform independent
• Semantic zooming
• Metadata driven display of information
iPlant Tree Viewer Prototype
AVAILABLE NOW!
http://portnoy.iplantcollaborative.org/
1KP
Collaboration (1KP) - To support the data analysis of the Thousand Plant Transcriptomes Project
1KP
[Figure: N(genes) plotted against N(species). Labels: "dozens of species completed genomes"; "dozens of genes PCR in 104 species"; "unexplored territory".]
Broad phylogenetic coverage
algae, non-flowering, flowering (angiosperm)
on role of polyploidy in Darwin’s “abominable mystery”
Phylogenomics of 1000 species across plant taxa
TREE RECONCILIATION
To reconcile the evolutionary history of genes and species.
Gene family data courtesy John Bowers
Tree Reconciliation
TAXONOMIC NAME RESOLUTION
Collaboration (BIEN) - To unify and resolve synonymous, erroneous, or other conflicting taxonomic names.
Taxonomic uncertainty
1. Non-existent names
• Misspellings
• Contamination
• Annotations
• Morphospecies
• Digitization issues (frame shifts, character encoding)
• Lexical variants (digitization conventions)
2. Synonymy
• Nomenclatural synonyms
• Taxonomic synonyms / concepts
3. Misidentifications, incomplete identifications
a) Centaurium curvistamineum (Wittr.) Abrams (1951)
b) Centaurium minimum (Howell) Piper (1915)
c) Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906)
d) Centaurium muhlenbergii (Griseb.) Wight ex Piper forma albiflorum (Suksd.) St. John (1937)
e) Centaurium muhlenbergii (Griseb.) Wight ex Piper var. albiflorum Suksd. (1927)
f) Centaurodes muhlenbergii (Griseb.) Kuntze (1891)
g) Erythraea curvistaminea Wittr. (1886)
h) Erythraea minima Howell (1901)
i) Erythraea muhlenbergii Griseb. (1839)
Image: Gordon Leppig & Andrea J. Pickart
How to figure that out?
…or ask around at My-Plant.org
Makemake at de.wikipedia
Non-existent names: Herbarium specimens
*New World plant specimens, 34 herbaria, simple match against IPNI and TROPICOS, excluding authors
Total specimens: 1.1 million
Unique species names: 53,052
Published names (legitimate & illegitimate): 44,532
Misspelled names: 9,371 (18%)
Specimens with misspelled names: 101,237 (9%)
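The two percentages above can be reproduced from the counts on this slide. The short Python check below assumes that the 18% is taken relative to unique species names and the 9% relative to total specimens; the slide does not state the denominators explicitly, and the function name is made up for this example.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts as given on the slide.
total_specimens = 1_100_000
unique_species_names = 53_052
misspelled_names = 9_371
specimens_with_misspelled_names = 101_237

# Assumed denominators: unique names for names, total specimens for specimens.
print(percent(misspelled_names, unique_species_names))
print(percent(specimens_with_misspelled_names, total_specimens))
```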
Hans Hillewaert
Taxonomic Name Resolution Service
• Computer-assisted standardization of plant names
• Corrects spelling errors and alternative spellings to a standard list of names
• Converts out-of-date names to currently accepted names
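The two steps listed above (spelling correction against a standard list, then conversion of out-of-date names) can be sketched with Python's standard-library `difflib`. This is not the TNRS algorithm or its API; the function name, the similarity cutoff, and above all the choice of which name counts as "accepted" are assumptions made purely for illustration.

```python
import difflib


def resolve_name(name, accepted_names, synonyms, cutoff=0.8):
    """Return a standardized name for `name`, or None if nothing is close.

    accepted_names: iterable of names treated as the standard list.
    synonyms: dict mapping an out-of-date name to its accepted name.
    cutoff: minimum similarity ratio (0 to 1) for a fuzzy match (assumed value).
    """
    candidates = list(accepted_names) + list(synonyms)
    # Step 1: correct the spelling by picking the closest known name.
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    if not matches:
        return None
    matched = matches[0]
    # Step 2: convert a synonym to the name it points to.
    return synonyms.get(matched, matched)


# Illustrative only: the talk does not say which of these names is
# currently accepted, so this mapping is hypothetical.
accepted = ["Centaurium muhlenbergii"]
synonym_map = {
    "Erythraea muhlenbergii": "Centaurium muhlenbergii",
    "Erythraea minima": "Centaurium muhlenbergii",
}

print(resolve_name("Centaurium muhlenbergi", accepted, synonym_map))
print(resolve_name("Erythrea muhlenbergii", accepted, synonym_map))
print(resolve_name("Quercus alba", accepted, synonym_map))
```

A plain similarity ratio ignores author strings and taxonomic rank, which a real name-resolution service would need to handle separately.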
Availability
Source code (3-clause BSD)
http://github.com/iPlantCollaborativeOpenSource/TNRS
Web + API instructions
http://tnrs.iplantcollaborative.org
TRAIT EVOLUTION
To develop an infrastructure for downstream analysis of large trees.
Trait Evolution
• Toolkit to study the evolution of traits of interest on very large phylogenies
– Diversification
– Biogeographic patterns
– Adaptation
– Co-evolution
– …
Current analyses (Proof of concept)
• Phylogenetically Independent Contrasts (Felsenstein 1985)
• Continuous Ancestral Character Estimation (Schluter et al. 1997, Paradis 2004)
• Discrete Ancestral Character Estimation (Pagel 1994, Paradis 2004)
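As an illustration of the first analysis in this list, here is a minimal Python sketch of phylogenetically independent contrasts for one continuous trait on a fully bifurcating tree with branch lengths. It follows the standard pruning procedure from Felsenstein (1985) and is not the iPlant implementation; the nested-tuple tree encoding, the function name, and the trait values are assumptions made for this example.

```python
import math


def independent_contrasts(node, traits, out):
    """Post-order traversal returning (trait value, adjusted branch length).

    node: (children, branch_length), where children is either a tip label
          (str) or a pair of child nodes (hypothetical encoding).
    traits: dict mapping tip labels to trait values.
    out: list that receives one standardized contrast per internal node.
    """
    children, length = node
    if isinstance(children, str):  # tip: use the observed trait value
        return traits[children], length
    (x1, v1), (x2, v2) = (independent_contrasts(c, traits, out) for c in children)
    # Standardized contrast: difference scaled by its expected std. deviation.
    out.append((x1 - x2) / math.sqrt(v1 + v2))
    # Ancestral value: average weighted by inverse branch lengths.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch below this node to account for estimation error.
    return value, length + v1 * v2 / (v1 + v2)


# Made-up three-taxon example: ((A:1,B:1):1,C:2.5)
tree = (((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.5)), 0.0
trait_values = {"A": 1.0, "B": 3.0, "C": 6.0}

contrasts = []
root_value, _ = independent_contrasts(tree, trait_values, contrasts)
print(contrasts, root_value)
```

A tree with n tips yields n - 1 contrasts, one per internal node, and the value returned for the root is the weighted estimate at the base of the tree.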
Community Integrated (2½-day workshop)
• EUtils
• Lopper
• RAxML
• Ninja
• Phyml
• Muscle
• PHYLIP
• VCF to GFF script
• LRmaqqtl
• FASTX quality stats
• FASTX quality boxplot
• FASTX nucleotide distribution
• Cuffcompare
• ERMINEJ
• progressiveMauve
• iPlantBorda (mlpy)
• iPlantCanberra (mlpy)
• vbay
• MECPM
• OUCH
• Picante
• Ontologize
• BOWTIE
• BWA
• TopHat
• SHRiMP
• Cuffdiff
• GNU Core Text utilities
• GeneMania
• SRA import
• PARS
• PL
• DTT
• BBC biclustering
MY-PLANT.ORG
To easily share information and research, collaborate, and stay on top of the latest news in the field.
Collaborative Tool
AVAILABLE NOW!
NEW AND IMPROVED!
http://my-plant.org/
http://www.iplantcollaborative.org