Biological Science Collections Tagging and Tracking, presented at SPNHC
TRANSCRIPT
Brian Stucky, University of Colorado, Boulder
John Deck, University of California, Berkeley
Lukasz Ziemba, University of Florida, Gainesville
Nico Cellinese, University of Florida, Gainesville
Rob Guralnick, University of Colorado, Boulder
BiSciCol Team: Reed Beaman, Nico Cellinese, Jonathan Coddington, Neil Davies, John Deck, Rob Guralnick, Bryan P. Heidorn, Chris Meyer, Tom Orrell, Rich Pyle, Kate Rachwal, Brian Stucky, Rob Whitton, Lukasz Ziemba
BiSciCol: Biological Science Collections Tracker
Tracking Biodiversity Objects to Brokering Standards
Univ. Hawai’i, Univ. Arizona, Smithsonian
• National Science Foundation funded 2010 – 2014
• Infrastructure to tag & track specimens & derivatives in cyberspace
• Relies on globally unique identifiers (GUIDs) to track objects
• Implements a Linked Data approach
QUANTITY OF DATA IS FIRST LINK IN A LARGER CHAIN OF ISSUES
Here is the problem:
Lots of Data ….
Generates …
Taxonomic concepts: Catalog of Life, WORMS, ITIS, EOL, GNA
Geography: GBIF, IUCN ranges, Map of Life, WDPA
Genes/genomes: Genbank, TreeBase, ToL Web, AVATOL, BOLD
Phenotypes and traits: MorphBank, TRY, Phenoscape
Standards
Data stores:
EOL
GBIF
NCBI
A Growing Constellation of Biodiversity Data and Knowledge
How do we link all these data together?
Borrowing from Facebook and social media… Can we track relationships for Biological Objects as well?
[Figure: relationship graph with a Taxonomic Type Filter and a Class Filter over Specimens, Tissues, Sequences, and Functions. Caption: Infer Relationships Across providers]
A Biological Relationship Graph …
Moorea Biocode Example: From field collection through analysis, across multiple systems
(Biocode Event)
(Essig Museum Specimen)
(Smithsonian Tissue)
(CAMERA Gut Sample Event)
(Genbank Sequence)
(metagenomic Sequencing)
[Figure labels: Key, Blast*n, Taxon*n, Blast, Taxon]
How to Guide: Tracking Biological Object Relationships
• Group “like” terms into classes. In Darwin Core, e.g., groups of terms: Events, Locations, Occurrences, GeologicalContext, Identification, Taxon.
• Assign identifiers to objects. Use globally unique, resolvable, persistent identifiers for each class or term.
• Link identifiers using relationship terms and specified classes. For example, “This object is related to that object.”
• Put this data on the Web.
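The steps in the guide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BiSciCol's implementation: the event and tissue identifiers and the `http://example.org/terms/relatedTo` predicate URI are hypothetical, and the output is a simple N-Triples-style line format chosen for the example.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical URI for the generic "related to" relationship term.
RELATED_TO = "http://example.org/terms/relatedTo"


def link(subject: str, obj: str, predicate: str = RELATED_TO) -> Triple:
    """State that one identified object is related to another."""
    return Triple(subject, predicate, obj)


def serialize(graph: List[Triple]) -> List[str]:
    """Render each triple as one N-Triples-style line, ready to publish."""
    return [f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in graph]


# Hypothetical identifiers, one per object.
event = "http://mycollection.org/event/Event1"
specimen = "http://mycollection.org/specimen/JDeckSpecimen1"
tissue = "http://mycollection.org/tissue/Tissue1"

# "This object is related to that object."
graph = [link(event, specimen), link(specimen, tissue)]

for line in serialize(graph):
    print(line)
```

Each printed line is one statement that can be served over the Web and merged with statements from other providers.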
Globally unique identifiers:
• Globally unique (mandatory)
• Persistent (not mandatory, but very helpful)
• Resolvable (not mandatory, but very helpful)
Examples:
http://example.org/urn:lsid:example.org:specimen/7217D220-836A-11DF-8395-0800200C9A66
http://mycollection.org/specimen/JDeckSpecimen1
http://mycollection.org/specimen/uuid=7217D220-836A-11DF-8395-0800200C9A66
http://dx.doi.org/10.5072/FK2JW8GKM
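As a minimal sketch of the "globally unique" requirement, a UUID can be embedded in a collection URL, following the `uuid=` pattern of the third example. The function names are hypothetical. The code only addresses uniqueness; persistence and resolvability depend on how the collection hosts and maintains the URL.

```python
import uuid

# Base URL taken from the example identifiers above.
BASE = "http://mycollection.org/specimen/"


def mint_identifier(base: str = BASE) -> str:
    """Build an identifier by appending a random (version 4) UUID to a base URL."""
    return f"{base}uuid={str(uuid.uuid4()).upper()}"


def extract_uuid(identifier: str) -> uuid.UUID:
    """Recover the UUID from an identifier that uses the 'uuid=' pattern."""
    return uuid.UUID(identifier.rsplit("uuid=", 1)[1])


new_id = mint_identifier()
print(new_id)
print(extract_uuid(new_id))
```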
Simple relationship terms:
Graph relationships:
ONE FINAL PIECE OF THE PUZZLE: GIVING BIRTH TO DATA IN THE RIGHT FORMAT FOR LINKING
“Triplifier” - creating the format for linking biological objects
[Diagram: KEMU, Mysql, and Darwin Core Archive sources feed the Triplifier, which creates links from native data formats for the BiSciCol Triplestore; the triplestore answers a Query with a Response.]
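The idea of creating links from native data formats can be sketched as follows. This is not the Triplifier itself: the `triplify` function, the `http://example.org/` base, and the way identifiers are built from column names are assumptions made for illustration. Rows stand in for records read from a database table or a Darwin Core Archive file.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def triplify(
    rows: List[Dict[str, str]],
    base: str,
    links: List[Tuple[str, str]],
) -> List[Triple]:
    """Turn tabular rows into relatedTo triples.

    `links` lists (subject_column, object_column) pairs; a triple is emitted
    for each row in which both columns hold a value. Duplicates are skipped.
    """
    triples: List[Triple] = []
    seen = set()
    for row in rows:
        for subject_col, object_col in links:
            s, o = row.get(subject_col), row.get(object_col)
            if not s or not o:
                continue  # cannot link when either identifier is missing
            triple = (f"{base}{subject_col}/{s}", "relatedTo", f"{base}{object_col}/{o}")
            if triple not in seen:
                seen.add(triple)
                triples.append(triple)
    return triples


# Hypothetical rows, as they might come from a specimen table.
rows = [
    {"occurrenceID": "O1", "eventID": "E1"},
    {"occurrenceID": "O2", "eventID": "E1"},
    {"occurrenceID": "O3", "eventID": ""},
]

for t in triplify(rows, "http://example.org/", [("eventID", "occurrenceID")]):
    print(t)
```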
QUERY AND RESULTS ACROSS LINKED DATA
Client Interface:
Search Scientific Name: Aedes increpitus [Run]
Results:
OccurrenceID1 (Aedes increpitus Dyar, 1916)
OccurrenceID3 (Aedes vittata Theobald, 1903)
Taxon SERVICE (ITIS / GNUB)
http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126314
http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126317
http://gnub.org/8E19F1DC-74BA-47D4-A505-6498414B4CCE
BISCICOL SERVICE LOOKUP:
dwc:IdentificationID1 :relatedTo http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126314
dwc:IdentificationID1 :relatedTo dwc:OccurrenceID1
dwc:IdentificationID2 :relatedTo http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126317
dwc:IdentificationID2 :relatedTo dwc:OccurrenceID3
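The lookup can be traced with a small in-memory sketch that uses the four triples from the slide. The `occurrences_for_taxa` function is hypothetical, and recognizing occurrences by the `dwc:Occurrence` prefix is an assumption made for the example; a real service would query a triplestore instead.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

ITIS = "http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:"

# The four statements shown in the BiSciCol service lookup.
TRIPLES: List[Tuple[str, str, str]] = [
    ("dwc:IdentificationID1", ":relatedTo", ITIS + "126314"),
    ("dwc:IdentificationID1", ":relatedTo", "dwc:OccurrenceID1"),
    ("dwc:IdentificationID2", ":relatedTo", ITIS + "126317"),
    ("dwc:IdentificationID2", ":relatedTo", "dwc:OccurrenceID3"),
]


def occurrences_for_taxa(
    taxon_ids: Iterable[str], triples: List[Tuple[str, str, str]]
) -> List[str]:
    """Follow taxon -> identification -> occurrence through relatedTo links."""
    subjects_of = defaultdict(set)  # object -> subjects pointing at it
    objects_of = defaultdict(set)   # subject -> objects it points at
    for s, p, o in triples:
        if p == ":relatedTo":
            subjects_of[o].add(s)
            objects_of[s].add(o)

    found: List[str] = []
    for taxon in taxon_ids:
        for identification in sorted(subjects_of.get(taxon, ())):
            for obj in sorted(objects_of.get(identification, ())):
                # Assumption: occurrences are recognizable by their prefix.
                if obj.startswith("dwc:Occurrence") and obj not in found:
                    found.append(obj)
    return found


# Identifiers as returned by the taxon service for the search term.
taxa = [
    ITIS + "126314",
    ITIS + "126317",
    "http://gnub.org/8E19F1DC-74BA-47D4-A505-6498414B4CCE",
]
print(occurrences_for_taxa(taxa, TRIPLES))
```

With the slide's triples, the traversal reaches OccurrenceID1 and OccurrenceID3, matching the results shown in the client interface.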
BISCICOL – EXAMPLE SEARCH
Working with Locations:
Tracking location in space of a moving individual (whales)
IndividualID1
EventID1, EventID2, EventID3
GeoreferenceID1, GeoreferenceID2, GeoreferenceID3
Data Impact Factor – Graph Metrics
Occurrences
MBIO99999 (1024 total descendants)
IMBL8888888 (723 total descendants)
Events
Biocode10234 (4234 direct children)
Expedition21234 (1023 direct children)
Collectors
Gustav Paulay (102,000 direct children)
Christopher Meyer (83,000 direct children)
Craig Moritz (523 direct children)
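The two metrics named on this slide, direct children and total descendants, can be computed from a list of parent-to-child links. The sketch below is an illustration only: the identifiers are hypothetical, and it assumes each relationship points from a source object to the object derived from it.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def build_children(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index (parent, child) links by parent."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
    return children


def direct_children(children: Dict[str, Set[str]], node: str) -> int:
    """Count objects linked directly from `node`."""
    return len(children.get(node, ()))


def total_descendants(children: Dict[str, Set[str]], node: str) -> int:
    """Count every object reachable from `node`, using breadth-first search."""
    seen: Set[str] = set()
    queue = deque(children.get(node, ()))
    while queue:
        current = queue.popleft()
        if current in seen or current == node:
            continue  # skip repeats and guard against cycles
        seen.add(current)
        queue.extend(children.get(current, ()))
    return len(seen)


# Hypothetical graph: collector -> event -> occurrences -> tissue -> sequence.
edges = [
    ("Collector:A", "Event:E1"),
    ("Event:E1", "Occurrence:O1"),
    ("Event:E1", "Occurrence:O2"),
    ("Occurrence:O1", "Tissue:T1"),
    ("Tissue:T1", "Sequence:S1"),
]
children = build_children(edges)

print(direct_children(children, "Event:E1"))
print(total_descendants(children, "Event:E1"))
print(total_descendants(children, "Collector:A"))
```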
[ ] GBIF Relations Graph
[X] Moorea Biocode
[X] SI MSNGR System
[+] Add New Graph
Graphs
Cited occurrences over time
• New era of collections digitization
• new & derived data objects created, replicated, annotated
• BiSciCol tackles preservation of nat. hist. collections challenge:
• How to follow these digital objects
• How to link together objects and derivatives back to specimens
• BiSciCol is about community, collaborative practice
• Commitment to standards, ontologies
• Agreement on permanent, resolvable identifiers
• Triplification of data sources to enhance linked data
Why BiSciCol and Why SPNHC and Why Collaborations?