An update on taxonomic concept reasoning


@taxonbytes

Nico Franz¹ & Bertram Ludäscher²

¹ School of Life Sciences, Arizona State University
² iSchool, University of Illinois at Urbana-Champaign

TDWG 2016 – Biodiversity Information Standards

December 06, 2016 – Instituto Tecnológico de Costa Rica (#TDWG16)

@ http://www.slideshare.net/taxonbytes/franz-ludaescher-tdwg-2016-an-update-on-taxonomic-concept-reasoning

https://twitter.com/taxonbytes

http://www.slideshare.net/taxonbytes/franz-et-al-escjam-2015-logic-resolution-taxonomic-variable



The big picture:

Why taxonomic concept reasoning?

The pluralistic domain of human taxonomy making

Source: Rylands & Mittermeier. 2014. Primate taxonomy: species and conservation. doi:10.1002/evan.21387

"100 years of primate taxonomies"

• Taxonomies are endorsed by us (humans); more or less democratically.

• They consist of sets of labels, data, and theories about the natural world.

• Over time, these theories change: they converge or conflict (often in parallel).

A model to separate the human-made versus natural domains

• While human taxonomy making unfolds (e.g. 1758 onwards), natural taxa, which 'took' millions of years to realize, tend to not change much.

• At any time, our labels and theories (concepts) aim to stand for taxa; yet the alignment may be approximate.

Domain of human taxonomy making ("mimic")

Natural domain ("model")

Reliable?


Concepts: tracking progress and conflict in the human domain

• Taxonomic names and nomenclatural relationships are only so-so in terms of tracking congruent and incongruent taxonomic perspectives.

Remsen: Using names, we're lucky when revisions are infrequent

"In biology, there are many taxa that are so under-studied that they are only known from their original description and none or very few subsequent references […]. The name alone, so long as it is a unique name, is sufficient to locate all related material."

– David Remsen 2016: 213

Source: Remsen. 2016. The use and limits of scientific names in biological informatics. doi:10.3897/zookeys.550.9546

• Logic-based multi-taxonomic alignments require better contextualization of labels and relationships, and better specification of "taxonomic sameness".

1912 vs. 1967: logically reconcilable? (Δ = ?)

Still bigger (re: Synthesis):

Why taxonomic concept reasoning?

Why promote taxonomic pluralism? *

• Our work extends and complements prior TDWG efforts related to the Taxonomic Concept Transfer Schema (https://github.com/tdwg/tcs).

* See also Franz & Sterner @ TDWG16, Friday, 11:30 am (#1134) in Session "Data Gaps, Trust, Knowledge Acquisition"


• This work is necessary because using only Darwin Core tends to suppress taxonomic pluralism:

• DwC syntax is too under-powered for tracking multi-taxonomy alignments.

• DwC semantics ("Taxon") are too ambiguous to enforce a consistent recognition of the two domains (human taxonomy making vs. natural world).


• Technical and political means of suppressing taxonomic pluralism "by design" have implications for data quality and trust in data aggregation.









• "Synthesis" does not necessarily require taxonomic monism ("backbone"). Logic-reconciled pluralism can provide a trust-generating path for systematists' contributions towards large-scale taxonomic data integration.





An update on Euler/X:

Logic, use cases, and novel services

Euler/X – logically consistent RCC–5 alignments

• Input: multiple taxonomies and/or phylogenies; expert-provided articulations.

• Output: logic consistency checking; Maximally Informative Relations (MIR); alignment visualizations.

Products – concept taxonomy in theory and in practice

ZooKeys. doi:10.3897/zookeys.528.6001

Semantic Web. doi:10.3233/SW-160220

Biological Theory (accepted). doi:10.1101/022145

PLoS ONE. doi:10.1371/journal.pone.0118247

Systematics Biodiv. doi:10.1080/14772000.2013.806371

Systematic Biology. doi:10.1093/sysbio/syw023

Biodiversity Data Journal (accepted). #6093

Research Ideas and Outcomes. doi:10.3897/rio.2.e10610

Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf

Region Connection Calculus (set constraints)

== < > >< !

• Two regions N, M are either:

• congruent (N == M)
• properly inclusive (N < M)
• inversely properly inclusive (N > M)
• overlapping (N >< M)
• exclusive of each other (N ! M)
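The five relations can be read as plain set comparisons. A minimal Python sketch, assuming each region is modeled as a non-empty set of member identifiers; the function name `rcc5` and the sample identifiers are illustrative and are not part of Euler/X:

```python
def rcc5(n, m):
    """Return the RCC-5 symbol relating two non-empty sets N and M."""
    if not n or not m:
        raise ValueError("regions must be non-empty")
    if n == m:
        return "=="   # congruent
    if n < m:
        return "<"    # N properly included in M
    if n > m:
        return ">"    # N properly includes M
    if n & m:
        return "><"   # overlapping, neither includes the other
    return "!"        # exclusive of each other

# Made-up member identifiers, for illustration only.
print(rcc5({"s1", "s2"}, {"s1", "s2"}))  # ==
print(rcc5({"s1"}, {"s1", "s2"}))        # <
print(rcc5({"s1", "s2"}, {"s2", "s3"}))  # ><
print(rcc5({"s1"}, {"s2"}))              # !
```

Exactly one of the five symbols holds for any pair of non-empty sets, which is what makes the relation set usable as a vocabulary for articulations.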


• RCC–5 articulations answer the query: "can we join regions N and M?"

• Taxonomies have multiple RCC–5 alignable components: nodes (parents, children), node-associated traits, even node-anchoring specimens.
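Logic consistency checking, listed among the Euler/X outputs, can be pictured as asking whether any assignment of regions satisfies all articulations at once. The sketch below is a brute-force search over a small hypothetical universe of atoms. The names `HOLDS` and `consistent`, the concept labels, and the universe size are assumptions for illustration; this is not a description of how Euler/X itself reasons.

```python
from itertools import combinations, product

# Each RCC-5 symbol as a predicate over two non-empty frozensets.
HOLDS = {
    "==": lambda n, m: n == m,
    "<": lambda n, m: n < m,
    ">": lambda n, m: n > m,
    "><": lambda n, m: bool(n & m) and not n <= m and not n >= m,
    "!": lambda n, m: not (n & m),
}

def consistent(concepts, articulations, universe_size=4):
    """True if some assignment of non-empty regions (subsets of a small
    universe) satisfies every articulation (a, symbol, b)."""
    atoms = range(universe_size)
    regions = [frozenset(c) for k in range(1, universe_size + 1)
               for c in combinations(atoms, k)]
    for choice in product(regions, repeat=len(concepts)):
        region = dict(zip(concepts, choice))
        if all(HOLDS[sym](region[a], region[b]) for a, sym, b in articulations):
            return True
    return False

ok = [("A", "==", "B"), ("B", "<", "C"), ("A", "<", "C")]
bad = [("A", "==", "B"), ("B", "<", "C"), ("A", "!", "C")]
print(consistent(["A", "B", "C"], ok))   # True
print(consistent(["A", "B", "C"], bad))  # False
```

A `False` result only rules out models within the chosen universe size. In the second example the contradiction holds for any universe, since A == B and B < C force A to share members with C.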




Use cases – primate classifications & avian phylogenies

1. Primate classifications sec. MSW2 (1993) versus MSW3 (2005)

a. Microcebus + Mirza sec. MSW3 (2005) with coverage constraint

b. Quantifying name (identifier) reliability

c. Reasoning achieves scalability (matrix)

2. Avian phylogenies sec. Prum et al. (2015) versus Jarvis et al. (2014)

a. Psittaciformes with & without coverage

b. Alignment of the "Neoavian explosion"

Use case 1:

Two primate classifications –

MSW2 (1993) versus MSW3 (2005)

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

"Taxonomic concept labels" identify input concept regions

RCC–5 articulations provided for each species-level concept

• Input visualization: MSW3 (2005) versus MSW2 (1993)

Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023

• Alignment visualization: "grey means taxonomically congruent"


One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions








• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by alignment signal propagated from their respective children.

Sensible when complete sampling of children is intended.
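The coverage constraint can be sketched by treating each parent region as exactly the union of its children's regions, so that the parent-to-parent relation follows from the children alone. The example below is a toy under that assumption: the child names, the specimen identifiers, and the helper `parent_region` are hypothetical and do not come from the MSW2/MSW3 alignment.

```python
def rcc5(n, m):
    """RCC-5 symbol for two non-empty sets of member identifiers."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"

def parent_region(children):
    """Under coverage, a parent's region is the union of its children."""
    return frozenset().union(*children.values())

# Hypothetical child concepts, each a set of made-up specimen identifiers.
genus_old = {"sp_x": {"s1", "s2"}, "sp_y": {"s3"}}
genus_new = {"sp_x": {"s1"}, "sp_z": {"s2"}, "sp_w": {"s4"}}

parent_articulation = rcc5(parent_region(genus_old), parent_region(genus_new))
print(parent_articulation)  # ><
```

Here neither parent was articulated directly. The overlap (><) is derived because each parent holds one member that the other lacks while the two share the rest.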



Use case 1.b.: Quantifying name (identifier) reliability


• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]











• Query services rendered: (1) MSW3 destabilizes MSW2; (2) non-congruence is not only caused by differential low-level sampling; (3) alignment constitutes a taxonomic meaning integration map to navigate across MSW3 & MSW2.


1 in 3 names is unreliable across MSW2/MSW3 classifications

Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
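One way to picture such a reliability figure is to tally articulations by whether the name stayed the same and whether the regions are congruent. The sketch below uses invented names and relations, and `categorize` and `unreliable_share` are hypothetical helpers. It illustrates the idea only and does not reproduce the metric or the data of the cited study.

```python
from collections import Counter

def categorize(name_a, relation, name_b):
    """Label one articulation by name usage and RCC-5 relation."""
    names = "one name" if name_a == name_b else "many names"
    if relation == "==":
        return f"{names} & congruent region"
    if relation == "!":
        return f"{names} & exclusive regions"
    return f"{names} & non-congruent regions"

def unreliable_share(pairs):
    """Share of same-name pairs whose regions are not congruent."""
    same_name = [rel for a, rel, b in pairs if a == b]
    return sum(rel != "==" for rel in same_name) / len(same_name)

# Made-up articulations: (name in older source, RCC-5 symbol, name in newer source).
pairs = [
    ("Aus bus", "==", "Aus bus"),
    ("Aus cus", ">", "Aus cus"),
    ("Aus dus", "==", "Xus dus"),
    ("Aus eus", "==", "Aus eus"),
    ("Aus fus", "==", "Aus fus"),
    ("Aus cus", "!", "Aus gus"),
]

tally = Counter(categorize(*p) for p in pairs)
print(tally["one name & congruent region"])  # 3
print(unreliable_share(pairs))               # 0.25
```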

Use case 1.c.: Reasoning achieves scalability (MIR matrix)

Source: Dang et al. 2015. ProvenanceMatrix: a visualization tool for multi-taxonomy alignments. CEUR Workshop Proceedings 1456: 13–24. http://ceur-ws.org/Vol-1456/paper2.pdf

• Input: 402 articulations. Output: 153,111 Maximally Informative Relations

Salmon cells ↔ reasoning
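The jump from 402 articulations to 153,111 relations can be checked with simple arithmetic, assuming one MIR is reported per pair of concepts drawn from the two classifications. The concept counts used below (317 and 483) are an assumption inferred only from the factorization 317 × 483 = 153,111; they are not stated on the slide.

```python
from itertools import product

def count_mir(concepts_a, concepts_b):
    """One MIR per pair of concepts drawn from the two taxonomies."""
    return sum(1 for _ in product(concepts_a, concepts_b))

# Assumed concept counts, inferred only from 317 * 483 == 153,111.
n_a, n_b = 317, 483
total_mir = count_mir(range(n_a), range(n_b))

# 402 input articulations is the figure given on the slide.
per_articulation = total_mir / 402
print(total_mir, round(per_articulation))  # 153111 381
```

Under these assumptions, each expert-provided articulation corresponds to roughly 381 inferred relations, which is the sense in which reasoning carries most of the alignment.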




Use case 2:

Avian phylogenies sec. Prum et al. (2015)

versus Jarvis et al. (2014)

Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638

2015 2014

Phylogenetic inferences can vary over time.

Use case 2: Aves sec. Prum et al. (2015) versus Jarvis et al. (2014)

• Sampling is highly differential: 198 versus 48 species-level entities

• Only 12 species-level concept pairs are congruent [green cells]

Use case 2.a.: Psittaciformes with & without coverage constraint

• Psittaciformes sec. 2015, with global coverage constraint

• No low-level congruence ↔ no congruent alignment regions

Input visualization: only disjoint articulations

Alignment visualization: 108 MIR; all disjoint


• Psittaciformes sec. 2015, with coverage locally relaxed

• "No coverage" constraint for 2014/2015 [Psittacidae, Nestor]

• Allows for 3 congruent & 7 inclusive RCC–5 articulations

Input visualization

• Higher-level congruence despite low-level non-congruence

• 160 MIR: 10 congruent; 65 (inversely) properly inclusive

Alignment visualization

Additional 2015 low-level sampling
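The effect of relaxing coverage can be pictured by letting a parent region include members that were not sampled as children. In the toy below, the species labels, the `unsampled` parameter, and the helper `parent_region` are all hypothetical. It is one way to model the idea and makes no claim about how Euler/X encodes a "no coverage" constraint.

```python
def rcc5(n, m):
    """RCC-5 symbol for two non-empty sets of member identifiers."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"

def parent_region(children, unsampled=frozenset()):
    """Union of the sampled children plus any members left unsampled."""
    return frozenset().union(*children.values(), unsampled)

# Two trees that sampled different (made-up) species of the same group.
tree_2014 = {"sp_a": {"a"}}
tree_2015 = {"sp_b": {"b"}, "sp_c": {"c"}}

# Coverage applied: parents are only as large as their sampled children.
strict = rcc5(parent_region(tree_2014), parent_region(tree_2015))

# Coverage relaxed: each parent also holds what the other tree sampled.
p14 = parent_region(tree_2014, unsampled={"b", "c"})
p15 = parent_region(tree_2015, unsampled={"a"})
relaxed = rcc5(p14, p15)
child_in_parent = rcc5(tree_2014["sp_a"], p15)

print(strict, relaxed, child_in_parent)  # ! == <
```

The children remain disjoint in both runs. Only the relaxed parents become congruent, and a child of one tree becomes properly included in the parent of the other.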


Use case 2.b.: Alignment of the "Neoavian explosion"

• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed


Non-congruence within 2015.Paleognathae

Non-congruence within 2014.Pelecanimorphae

Non-congruence within 2015/2014.Neoaves

(see next slide)

Use case 2.b.: Precise semiotics for the "avian explosion"

• Neoaves sec. 2015/2014, and 3–4 less inclusive levels

26 overlapping articulations in the sub-Neoavian alignment region cannot be attributed to differential sampling: 'genuine' phylogenetic conflict.

In conclusion:

Achievements, challenges, promise

Taxonomic concept reasoning – now & soon?

• The current reasoning toolkit can typically handle:

• 2–6 input taxonomies at once,

• at most ca. 3,200 input concepts.




• Wider adoption is increasingly a matter of making the case and generating will at various levels: publishing systematists, TDWG, aggregators, publishers, etc.

• Theory and reasoning performance are no longer the most pressing limitations.

• Two new applications in planning:

• Integration of taxonomic concept syntax and semantics into Pensoft's "Open Biodiversity Knowledge Management System" (OBKMS).

• Transition of a specimen-based Symbiota flora portal (SERNEC) to utilizing (only) taxonomic concepts and RCC–5 relationships.

Acknowledgements & links to products and references

• TDWG#16 organizers, especially Gail Kampmeier & William Ulate!

• Euler/X & ETC teams (extended): Shawn Bowers, Mingmin Chen, Hong Cui, Parisa Kianmajd, James Macklin, Timothy McPhillips, Robert Morris, Thomas Rodenhausen, and Shizhuo Yu.

• ProvenanceMatrix: Tuan Nhon Dang.

• NSF DEB–1155984, DBI–1342595 (PI Franz).

• NSF IIS–118088, DBI–1147273 (PI Ludäscher).

• Information @ http://taxonbytes.org/tag/concept-taxonomy/

• Euler/X code @ https://github.com/EulerProject/EulerX


Interested in exploring multi-taxonomy & -phylogeny alignments? Please contact me.

[email protected]

@taxonbytes

https://biokic.asu.edu/