An update on taxonomic concept reasoning
Please
@taxonbytes
Nico Franz1 & Bertram Ludäscher2
1 School of Life Sciences, Arizona State University
2 iSchool, University of Illinois at Urbana-Champaign
TDWG 2016 – Biodiversity Information Standards
December 06, 2016 – Instituto Tecnológico de Costa Rica (#TDWG16)
@ http://www.slideshare.net/taxonbytes/franz-ludaescher-tdwg-2016-an-update-on-taxonomic-concept-reasoning
The big picture:
Why taxonomic concept reasoning?
The pluralistic domain of human taxonomy making
Source: Rylands & Mittermeier. 2014. Primate taxonomy: species and conservation. doi:10.1002/evan.21387
"100 years of primate taxonomies"
• Taxonomies are endorsed by us (humans); more or less democratically.
• They consist of sets of labels, data, and theories about the natural world.
• Over time, these theories change – converge or conflict (often in parallel).
A model to separate the human-made versus natural domains
• While human taxonomy making unfolds (e.g. 1758 onwards), natural taxa – which 'took' millions of years to realize – tend to not change much.
Domain of human taxonomy making ("mimic")
Natural domain ("model")
• At any time, our labels and theories (concepts) aim to stand for taxa; yet the alignment may be approximate.
Reliable?
Concepts: tracking progress and conflict in the human domain
• Taxonomic names and nomenclatural relationships are only so-so in terms of tracking congruent and incongruent taxonomic perspectives.
Remsen: Using names, we're lucky when revisions are infrequent
"In biology, there are many taxa that are so under-studied that they are only known from their original description and none or very few subsequent references […]. The name alone, so long as it is a unique name, is sufficient to locate all related material."
– David Remsen 2016: 213
Source: Remsen. 2016. The use and limits of scientific names in biological informatics. doi:10.3897/zookeys.550.9546
• Logic-based multi-taxonomic alignments require better contextualization of labels and relationships, and better specification of "taxonomic sameness".
[Figure: 1912 vs. 1967. Logically reconcilable? Δ = ?]
Still bigger (re: Synthesis):
Why taxonomic concept reasoning?
Why promote taxonomic pluralism? *
* See also Franz & Sterner @ TDWG16, Friday, 11:30 am (#1134) in Session "Data Gaps, Trust, Knowledge Acquisition"
• Our work extends and complements prior TDWG efforts related to the Taxonomic Concept Transfer Schema (https://github.com/tdwg/tcs).
• This work is necessary because using only Darwin Core tends to suppress taxonomic pluralism:
• DwC syntax is too under-powered for tracking multi-taxonomy alignments.
• DwC semantics ("Taxon") are too ambiguous to enforce a consistent recognition of the two domains (human taxonomy making vs. natural world).
• Technical and political means of suppressing taxonomic pluralism "by design" have implications for data quality and trust in data aggregation.
• "Synthesis" does not necessarily require taxonomic monism ("backbone"). Logic-reconciled pluralism can provide a trust-generating path for systematists' contributions towards large-scale taxonomic data integration.
An update on Euler/X:
Logic, use cases, and novel services
Euler/X – logically consistent RCC–5 alignments
• Input: multiple taxonomies and/or phylogenies; expert-provided articulations.
• Output: logic consistency checking; Maximally Informative Relations (MIR); alignment visualizations.
Products – concept taxonomy in theory and in practice
ZooKeys. doi:10.3897/zookeys.528.6001
Semantic Web. doi:10.3233/SW-160220
Biological Theory (accepted). doi:10.1101/022145
PLoS ONE. doi:10.1371/journal.pone.0118247
Systematics Biodiv. doi:10.1080/14772000.2013.806371
Systematic Biology. doi:10.1093/sysbio/syw023
Biodiversity Data Journal (accepted). #6093
Research Ideas and Outcomes. doi:10.3897/rio.2.e10610
Region Connection Calculus (set constraints)
Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf
RCC–5 symbols: == < > >< !
• Two regions N, M are either:
• congruent (N == M)
• properly inclusive (N < M)
• inversely properly inclusive (N > M)
• overlapping (N >< M)
• exclusive of each other (N ! M)
• RCC–5 articulations answer the query: "can we join regions N and M?"
• Taxonomies have multiple RCC–5 alignable components: nodes (parents, children), node-associated traits, even node-anchoring specimens.
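The five RCC–5 relations listed above can be illustrated with plain sets. This is a minimal sketch, assuming each region is modeled as a non-empty set of member identifiers; the function name `rcc5`, the enum member names, and the specimen ids are hypothetical and are not taken from Euler/X.

```python
from enum import Enum


class RCC5(Enum):
    """The five RCC-5 relations, keyed by the symbols used on the slide."""
    CONGRUENT = "=="
    PROPERLY_INCLUDED = "<"
    PROPERLY_INCLUDES = ">"
    OVERLAPPING = "><"
    EXCLUSIVE = "!"


def rcc5(n, m):
    """Return the single RCC-5 relation that holds between two non-empty sets."""
    n, m = frozenset(n), frozenset(m)
    if not n or not m:
        raise ValueError("RCC-5 regions must be non-empty")
    if n == m:
        return RCC5.CONGRUENT
    if n < m:  # proper subset
        return RCC5.PROPERLY_INCLUDED
    if n > m:  # proper superset
        return RCC5.PROPERLY_INCLUDES
    if n & m:  # shared members, but each side also has members of its own
        return RCC5.OVERLAPPING
    return RCC5.EXCLUSIVE


# Made-up specimen ids, for illustration only.
print(rcc5({"s1", "s2"}, {"s1", "s2", "s3"}).value)  # prints "<"
```

Exactly one of the five relations holds for any pair of non-empty sets, which is what makes RCC–5 usable as a vocabulary for articulations.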
Use cases – primate classifications & avian phylogenies
1. Primate classifications sec. MSW2 (1993) versus MSW3 (2005)
a. Microcebus + Mirza sec. MSW3 (2005) with coverage constraint
b. Quantifying name (identifier) reliability
c. Reasoning achieves scalability (matrix)
2. Avian phylogenies sec. Prum et al. (2015) versus Jarvis et al. (2014)
a. Psittaciformes with & without coverage
b. Alignment of the "Neoavian explosion"
Use case 1:
Two primate classifications –
MSW2 (1993) versus MSW3 (2005)
Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
"Taxonomic concept labels" identify input concept regions
RCC–5 articulations provided for each species-level concept
• Input visualization: MSW3 (2005) versus MSW2 (1993)
Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
• Alignment visualization: "grey means taxonomically congruent"
One name & congruent region
Many names & congruent region
One name & non-congruent regions
Many names & non-congruent regions
New names & exclusive regions
• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by alignment signal propagated from their respective children. This is sensible when complete sampling of children is intended.
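The effect of the coverage constraint can be sketched with sets. This is a toy illustration, not Euler/X code: it assumes a parent region is the union of its child regions when coverage applies, and that relaxing coverage lets a parent contain members outside any sampled child. The helper names `relation` and `parent_region`, the `unsampled` parameter, and the specimen ids are all hypothetical.

```python
def relation(n, m):
    """RCC-5 symbol for two regions modeled as sets."""
    n, m = set(n), set(m)
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def parent_region(children, unsampled=()):
    """Union of the child regions, plus members not covered by any sampled child.

    With the coverage constraint applied, `unsampled` stays empty.
    """
    region = set(unsampled)
    for child in children:
        region |= set(child)
    return region


# Hypothetical child regions in two taxonomies (made-up specimen ids).
t1_children = [{"s1", "s2"}, {"s3"}]
t2_children = [{"s1"}, {"s2", "s4"}]

# Coverage applied: the parent-to-parent articulation follows from the children alone.
with_coverage = relation(parent_region(t1_children), parent_region(t2_children))

# Coverage relaxed: each parent is allowed to hold the member its children did not sample.
relaxed = relation(
    parent_region(t1_children, unsampled={"s4"}),
    parent_region(t2_children, unsampled={"s3"}),
)

print(with_coverage, relaxed)  # prints ">< =="
```

In this toy case the parents overlap under coverage but become congruent once coverage is relaxed, which is the kind of difference the constraint is meant to control.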
Use case 1.b.: Quantifying name (identifier) reliability
• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]
One name & congruent region
Many names & congruent region
One name & non-congruent regions
Many names & non-congruent regions
New names & exclusive regions
• Query services rendered: (1) MSW3 destabilizes MSW2; (2) non-congruence is not only caused by differential low-level sampling; (3) alignment constitutes a taxonomic meaning integration map to navigate across MSW3 & MSW2.
1 in 3 names is unreliable across MSW2/MSW3 classifications
Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
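One way to make "name reliability" concrete is sketched below. It assumes, as a simplification that may differ from the published analysis, that a name is reliable when both classifications apply it to congruent regions. The function names, taxon names, and specimen ids are hypothetical, and the toy data are constructed so that one of three shared names fails; they are not MSW2/MSW3 data.

```python
def name_reliability(t1, t2):
    """Map each name used in both taxonomies to True if its two regions are congruent."""
    shared = sorted(set(t1) & set(t2))
    return {name: set(t1[name]) == set(t2[name]) for name in shared}


def unreliable_share(t1, t2):
    """Fraction of shared names whose regions differ between the two taxonomies."""
    verdicts = name_reliability(t1, t2)
    if not verdicts:
        return 0.0
    return sum(not ok for ok in verdicts.values()) / len(verdicts)


# Made-up classifications: name -> set of made-up specimen ids.
earlier = {
    "Genus alpha": {"s1", "s2"},
    "Genus beta": {"s3"},
    "Genus gamma": {"s4", "s5"},
}
later = {
    "Genus alpha": {"s1", "s2"},
    "Genus beta": {"s3"},
    "Genus gamma": {"s4"},   # same name, narrower region after a split
    "Genus delta": {"s5"},   # new name for the split-off part
}

verdicts = name_reliability(earlier, later)
share = unreliable_share(earlier, later)
```

Here "Genus gamma" is the unreliable identifier: the name is unchanged, but the region it refers to is not.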
Use case 1.c.: Reasoning achieves scalability (MIR matrix)
Source: Dang et al. 2015. ProvenanceMatrix: a visualization tool for multi-taxonomy alignments. CEUR Workshop Proceedings 1456: 13–24. http://ceur-ws.org/Vol-1456/paper2.pdf
• Input: 402 articulations. Output: 153,111 Maximally Informative Relations
Salmon cells ↔ reasoning
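The jump from a few hundred input articulations to a full matrix can be pictured as one relation per cross-taxonomy concept pair. The sketch below is an assumption-laden toy: it computes each cell from explicit member sets, whereas Euler/X infers the relations by logic reasoning from the input articulations. The names `relation` and `mir_matrix`, the concept labels, and the member ids are hypothetical. As an aside, 153,111 equals 317 × 483, which would fit one relation per concept pair, although the slides do not state the concept counts.

```python
from collections import Counter
from itertools import product


def relation(n, m):
    """RCC-5 symbol for two regions modeled as sets."""
    n, m = set(n), set(m)
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def mir_matrix(t1, t2):
    """One RCC-5 relation for every (concept in t1, concept in t2) pair."""
    return {(a, b): relation(t1[a], t2[b]) for a, b in product(t1, t2)}


# Made-up concept labels and member ids.
t1 = {"1993.Genus": {1, 2, 3}, "1993.Genus a": {1, 2}, "1993.Genus b": {3}}
t2 = {
    "2005.Genus": {1, 2, 3},
    "2005.Genus a": {1},
    "2005.Genus c": {2},
    "2005.Genus b": {3},
}

matrix = mir_matrix(t1, t2)
counts = Counter(matrix.values())
print(len(matrix))  # prints 12, i.e. 3 x 4 concept pairs
```

The matrix grows with the product of the two concept counts, while the expert only supplies a much smaller set of articulations.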
Use case 2:
Avian phylogenies sec. Prum et al. (2015)
versus Jarvis et al. (2014)
Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638
2015 2014
Phylogenetic inferences can vary over time.
Use case 2: Aves sec. Prum et al. (2015) versus Jarvis et al. (2014)
• Sampling is highly differential: 198 versus 48 species-level entities
• Only 12 species-level concept pairs are congruent [green cells]
Use case 2.a.: Psittaciformes with & without coverage constraint
• Psittaciformes sec. 2015 – with global coverage constraint
• No low-level congruence ↔ no congruent alignment regions
Input visualization: only disjoint articulations
Alignment visualization: 108 MIR; all disjoint
• Psittaciformes sec. 2015 – with coverage locally relaxed
• "No coverage" constraint for 2014/2015.[Psittacidae, Nestor]
• Allows for 3 congruent & 7 inclusive RCC–5 articulations
Input visualization
• Higher-level congruence despite low-level non-congruence
• 160 MIR: 10 congruent; 65 (inversely) properly inclusive
Alignment visualization
Additional 2015 low-level sampling
Use case 2.b.: Alignment of the "Neoavian explosion"
• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed
Non-congruence within 2015.Paleognathae
Non-congruence within 2014.Pelecanimorphae
Non-congruence within 2015/2014.Neoaves (see next slide)
Use case 2.b.: Precise semiotics for the "avian explosion"
• Neoaves sec. 2015/2014, and 3–4 less inclusive levels
26 overlapping articulations in the sub-Neoavian alignment region cannot be attributed to differential sampling; they indicate 'genuine' phylogenetic conflict.
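One possible way to think about separating sampling effects from genuine conflict is sketched below. This is not the authors' procedure. It assumes clades are modeled as sets of terminals and tests whether an overlap between two clades persists after both are restricted to terminals sampled in both phylogenies. The function name and all terminal ids are hypothetical.

```python
def overlap_survives_shared_sampling(clade_a, clade_b, shared_terminals):
    """True if the two clades still overlap (><) when restricted to shared terminals."""
    a = set(clade_a) & set(shared_terminals)
    b = set(clade_b) & set(shared_terminals)
    # Overlap means shared members plus unique members on each side.
    return bool(a & b) and bool(a - b) and bool(b - a)


# Made-up terminals sampled in both trees.
shared = {"t1", "t2", "t3", "t4"}

# Overlap that remains after restriction: the two groupings themselves disagree.
genuine = overlap_survives_shared_sampling(
    {"t1", "t2", "t3", "x1"}, {"t2", "t3", "t4"}, shared
)

# Overlap that disappears after restriction: only the uniquely sampled
# terminals x1 and y1 made the two clades differ.
sampling_only = overlap_survives_shared_sampling(
    {"t1", "t2", "x1"}, {"t1", "t2", "y1"}, shared
)

print(genuine, sampling_only)  # prints "True False"
```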
In conclusion:
Achievements, challenges, promise
Taxonomic concept reasoning – now & soon?
• The current reasoning toolkit can typically handle:
• 2-6 input taxonomies at once,
• at most ca. 3,200 input concepts.
• Wider adoption is increasingly a matter of making the case, generating will at various levels: publishing systematists, TDWG, aggregators, publishers, etc.
• Theory and reasoning performance are no longer the most pressing limitations.
• Two new applications in planning:
• Integration of taxonomic concept syntax and semantics into Pensoft's "Open Biodiversity Knowledge Management System" (OBKMS).
• Transition of a specimen-based Symbiota flora portal (SERNEC) to utilizing (only) taxonomic concepts and RCC–5 relationships.
Acknowledgements & links to products and references
• TDWG#16 organizers, especially Gail Kampmeier & William Ulate!
• Euler/X & ETC teams (extended): Shawn Bowers, Mingmin Chen, Hong Cui, Parisa Kianmajd, James Macklin, Timothy McPhillips, Robert Morris, Thomas Rodenhausen, and Shizhuo Yu.
• ProvenanceMatrix: Tuan Nhon Dang.
• NSF DEB–1155984, DBI–1342595 (PI Franz).
• NSF IIS–118088, DBI–1147273 (PI Ludäscher).
• Information @ http://taxonbytes.org/tag/concept-taxonomy/
• Euler/X code @ https://github.com/EulerProject/EulerX
Interested in exploring multi-taxonomy & -phylogeny alignments? Please contact me.
[email protected]
@taxonbytes
https://biokic.asu.edu/