Markup in the mobilization of biodiversity literature
DESCRIPTION
A contribution to the pro-iBiosphere Final Conference, held on June 12, 2014 in Meise, Belgium. More information: http://wiki.pro-ibiosphere.eu/wiki/Final_Conference
TRANSCRIPT
Daniel Mietchen
Markup in the mobilization of biodiversity literature
Museum für Naturkunde Berlin
Biodiversity literature
● hundreds of millions of pages
  ○ ca. 20k treatments of new taxa per year
  ○ 50-100k re-descriptions annually
  ○ scattered across thousands of journals and books
Legacy literature
● geared towards the human reader
● not machine-readable (scans, PDF)
● accumulated over three centuries
● includes much of what is still being published today
Use & citation
● digital > paper-only
● open access > hidden
● with > without open data
● soon: machine-readable > PDF
● these biases may skew analyses
Markup
● identifying concepts
● linking them using controlled vocabularies
● integrating with other sources of information
Reis et al. (2008), marked up by Shotton et al. (2009). CC BY 2.5
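The first two markup steps (identifying concepts, then linking them to a controlled vocabulary) can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical vocabulary (`VOCABULARY`, with placeholder names `Aus bus` and `Aus cus` and made-up identifiers of the form `vocab:taxon/...`) and a hypothetical `<taxon ref="...">` element; neither is part of any real schema. Real workflows would rely on dedicated name-recognition tools and an established markup standard rather than exact string matching.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical controlled vocabulary: term -> identifier.
# Names and identifiers are illustrative placeholders, not real records.
VOCABULARY = {
    "Aus bus": "vocab:taxon/0001",
    "Aus cus": "vocab:taxon/0002",
}


def mark_up(text: str, vocabulary: dict[str, str]) -> str:
    """Wrap each vocabulary term found in `text` in a hypothetical <taxon>
    element whose `ref` attribute carries the term's identifier; all other
    text is XML-escaped and kept as-is."""
    # Try longer terms first so a longer name wins over a shorter prefix.
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))  # untagged text
        term = match.group(1)
        ref = quoteattr(vocabulary[term])  # link to the vocabulary entry
        parts.append(f"<taxon ref={ref}>{escape(term)}</taxon>")
        last = match.end()
    parts.append(escape(text[last:]))  # trailing untagged text
    return "".join(parts)


print(mark_up("Aus bus was collected with Aus cus.", VOCABULARY))
# prints: <taxon ref="vocab:taxon/0001">Aus bus</taxon> was collected with <taxon ref="vocab:taxon/0002">Aus cus</taxon>.
```

Because every tagged mention carries an identifier rather than just a string, the third step (integrating with other sources of information) reduces to looking those identifiers up elsewhere.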
Scaling up
● automated markup of prospective literature
● crowdsourced markup of legacy literature
● semi-automated markup with expert assistance
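One way to picture semi-automated markup with expert assistance is as a triage step: an automatic tool proposes annotations, confident ones are accepted, and the rest go to an expert. The Python sketch below assumes that the tool reports a confidence score in [0, 1]; the `Candidate` record, the `triage` function and the 0.9 threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """An annotation proposed by a (hypothetical) automatic markup tool."""
    text: str          # the span of text the tool flagged
    label: str         # proposed concept, e.g. a vocabulary identifier
    confidence: float  # the tool's own score, assumed to lie in [0, 1]


def triage(candidates: list[Candidate], threshold: float = 0.9):
    """Accept high-confidence candidates automatically and queue the rest
    for an expert to confirm or correct."""
    accepted: list[Candidate] = []
    for_review: list[Candidate] = []
    for candidate in candidates:
        if candidate.confidence >= threshold:
            accepted.append(candidate)
        else:
            for_review.append(candidate)
    return accepted, for_review


candidates = [
    Candidate("Aus bus", "vocab:taxon/0001", 0.97),
    Candidate("Bus", "vocab:taxon/0003", 0.41),
]
accepted, for_review = triage(candidates)
print([c.text for c in accepted])    # ['Aus bus']
print([c.text for c in for_review])  # ['Bus']
```

The threshold is the main design lever: raising it shifts more work to experts (or, for legacy literature, to crowdsourced review) in exchange for fewer unchecked errors.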
Recommendations
● mark up taxonomic publications henceforth
● focus on revisionary works (biotas)
● adjust granularity to concrete use cases
● follow standards
● automate workflows
Image credit: Playingwithbrushes, CC BY 2.0