ABCD & BioCASe
A Quick Introduction
Motivation & Rationale – ABCD I
“Access to Biological Collection Data”: v2.06 ratified by TDWG, v1.20 still in use
Comprehensive data exchange format for sharing detailed primary collection data on all organisms (living & preserved specimens and observations)
Lately “Extended For Geoscience” (ABCDEFG): fossils, rocks, minerals, meteorites
Covers ~925 concepts (EFG adds ~825)
Motivation & Rationale – ABCD II
Monolithic XML schema with a single root, including ~15 subschemas defining reusable types (-> UBIF)
To ease processing & publishing, ABCD refrains from: keys (no normalisation), external standards, recursive XML structures
Contains xsd:any extension slots for ad-hoc additions through external schemas
Variable atomisation (polymorphism) allows data to be provided in different degrees of detail & standardisation
This simplifies publishing but pushes work to consumers
Priority is the mobilisation of rich data sources
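What variable atomisation means for a consumer can be sketched in a few lines. The example below assumes the ABCD 2.06 namespace URI and the element names ScientificName, FullScientificNameString and NameAtomised/Botanical/GenusOrMonomial/FirstEpithet; these should be checked against the schema before use. The function scientific_name and the two sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for ABCD 2.06; verify against the published schema.
NS = {"abcd": "http://www.tdwg.org/schemas/abcd/2.06"}


def scientific_name(name_el):
    """Return (genus, epithet, full_string) for a ScientificName element.

    Atomised parts are preferred when the publisher supplied them.
    Otherwise the consumer has to split the full name string itself.
    """
    full = name_el.findtext(
        "abcd:FullScientificNameString", default="", namespaces=NS
    ).strip()
    botanical = "abcd:NameAtomised/abcd:Botanical/"
    genus = name_el.findtext(botanical + "abcd:GenusOrMonomial", namespaces=NS)
    epithet = name_el.findtext(botanical + "abcd:FirstEpithet", namespaces=NS)
    if genus is None:
        # Naive fallback: real names (hybrids, subgenera, ...) need more care.
        parts = full.split()
        genus = parts[0] if parts else None
        epithet = parts[1] if len(parts) > 1 else None
    return genus, epithet, full


# The same name, published with and without atomisation.
RICH = """<ScientificName xmlns="http://www.tdwg.org/schemas/abcd/2.06">
  <FullScientificNameString>Abies alba Mill.</FullScientificNameString>
  <NameAtomised><Botanical>
    <GenusOrMonomial>Abies</GenusOrMonomial>
    <FirstEpithet>alba</FirstEpithet>
  </Botanical></NameAtomised>
</ScientificName>"""

PLAIN = """<ScientificName xmlns="http://www.tdwg.org/schemas/abcd/2.06">
  <FullScientificNameString>Abies alba Mill.</FullScientificNameString>
</ScientificName>"""

rich = scientific_name(ET.fromstring(RICH))
plain = scientific_name(ET.fromstring(PLAIN))
```

Both calls yield the same tuple here, but only because the fallback guesses correctly for a simple binomial. That guessing is the work that variable atomisation shifts from publishers to consumers.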
Motivation & Rationale – BioCASe
XML-based protocol for discovery & retrieval in distributed, heterogeneous systems
Derived from DiGIR to share ABCD records; supports hierarchical, nested structures
Shared “ontology” defined in XML schema(s)
Concept = namespace + simple XPath to an element in a potential instance document
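The "concept = namespace + simple XPath" idea can be illustrated with a small resolver. This is a sketch only and is not taken from PyWrapper: resolve_concept is a hypothetical helper, the namespace URI is assumed to be that of ABCD 2.06, and the sample document is a heavily reduced, invented instance.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for ABCD 2.06; verify against the published schema.
ABCD_NS = "http://www.tdwg.org/schemas/abcd/2.06"


def resolve_concept(root, namespace, path):
    """Return the text of every element addressed by (namespace, simple path).

    The path is a plain slash-separated list of element names, starting
    with the document root, with no predicates or axes.
    """
    steps = [step for step in path.split("/") if step]
    qualified = ["{%s}%s" % (namespace, step) for step in steps]
    # The first step names the root element itself.
    if not qualified or root.tag != qualified[0]:
        return []
    if len(qualified) == 1:
        return [(root.text or "").strip()]
    matches = root.findall("/".join(qualified[1:]))
    return [(el.text or "").strip() for el in matches]


# Invented, minimal instance document for illustration.
SAMPLE = """<DataSets xmlns="http://www.tdwg.org/schemas/abcd/2.06">
  <DataSet><Units>
    <Unit><UnitID>B 10 0001</UnitID></Unit>
    <Unit><UnitID>B 10 0002</UnitID></Unit>
  </Units></DataSet>
</DataSets>"""

root = ET.fromstring(SAMPLE)
unit_ids = resolve_concept(root, ABCD_NS, "/DataSets/DataSet/Units/Unit/UnitID")
wrong_ns = resolve_concept(root, "http://example.org/other", "/DataSets/DataSet")
```

Because the namespace is part of the concept, the same path under a different schema namespace addresses a different concept and matches nothing in this document.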
Publishing Software Implementations
PyWrapper implements the BioCASe protocol
Security layer for BioCASe services: roles, signatures, access restriction, encryption, SSL
Publishing Software Deployments
PyWrapper
ABCD: ~65 installations, 85 databases, mainly in Europe
GCP Passport: ~15 installations, 22 DBs, worldwide
BioCASE Collection Metadata Profile: ~25 installations & DBs, Europe
SPICE / Species2000: 4 installations & DBs, Europe
+35 installations p.a. (ABCD, MCPDH)
Consuming Software Implementations
Simple UI: a simple but truly distributed portal
Querytool for a single BioCASE service: uses XSLTs (ABCD + EFG, DarwinCore)
Synthesys portal (in prep.): uses multilingual XSLTs + central cache
Germplasm clearing house mechanism for GCP Passport & ABCD services + cache
GBIF indexer/portal
Consuming Software Deployments
Simple UI: BioCASE, GBIF Germany botany
Querytool for a single BioCASE service: part of PyWrapper, layout customized ~5 times
Synthesys portal (in prep.): Synthesys, GBIF-DE, GBIF-FR
Germplasm clearing house mechanism: portal
Potential Publishers
All biological & geoscience collections
Thousands, probably > 20,000 collections
Billions of records, 1.5-3 billion vouchered
Digitisation currently very low (5%?)
Potential Customers
Researchers: biologists (taxonomy, genetics, ecology), historians
Policy makers, e.g. land planning
Public, education, ...
The more detailed the data (e.g. images) and the more exhaustive the coverage of a collection, the more customers
Success Factors
Comprehensiveness
Ease of use (software installation)
Personal free technical support (for Europe)
Allows for custom extensions
Hurdles to Adoption
Complexity, both for consumers and publishers
Lack of good documentation with examples
Big Picture
ABCD feeds into ontology modularisation (taxa, publications, ...) as UML, XML schema, OWL