

Biocode Field Information Management System (Biocode FIMS): Connecting Field Data to the Laboratory and the World

John Deck, Information Services and Technology/Berkeley Natural History Museums
Neil Davies, Gump South Pacific Research Station

Capturing critical data elements at the source

The Biocode Field Information Management System (Biocode FIMS) takes spreadsheet data generated in the field, validates it, aligns it with global metadata standards, assigns unique identifiers, and publishes a private or public version. Other applications, such as collections management systems, GenBank, and data harvesters, can reference the published version; a particular focus is integration with laboratory information management systems. Biocode FIMS builds on two keys to good data linking: persistent identifiers and alignment with standardized vocabularies and ontologies.

More Information

Information for interested users and developers is at: http://code.google.com/p/biocode-fims

Field Data Collection

Collecting terrestrial invertebrates as part of the Moorea Biocode Project

[Figure: ontology diagram. Key: subclass of; has specified output; has specified input; instance of; derives from. Classes: BCO:material sampling process; BCO:identification process; BCO:material sample; OBI:sequencing assay; OBI:sequence data; BCO:taxonomic name; rdfs:Class. Instances: Biocode Sampling; Insect specimen; Tissue sampling; Tissue sample; DNA extraction; DNA molecules; Sequencing; Genbank sequence B; Identification using key; Identification using BLAST; TaxonID A; TaxonID B.]

Alignment with standardized vocabularies and ontologies

Biocode FIMS links spreadsheet fields to standardized vocabularies: the Darwin Core (DwC) to describe events and specimens, and the Minimum Information about any (x) Sequence (MIxS) to describe genomic data. We are also working with the OBO Foundry and the Ontology for Biomedical Investigations (OBI) to describe the logical relationships of sample-based biological data in a new project called the Biological Collections Ontology (BCO) (https://code.google.com/p/bco/).
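As a rough illustration of what linking spreadsheet fields to a vocabulary can look like, the sketch below maps a few column headers to Darwin Core term IRIs. The column names, the class name `ColumnMappingSketch`, and the method `alignRow` are hypothetical and are not taken from the Biocode FIMS code base; only the Darwin Core terms themselves are real.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ColumnMappingSketch {
    // Namespace of the Darwin Core terms.
    static final String DWC = "http://rs.tdwg.org/dwc/terms/";

    // Hypothetical spreadsheet column headers mapped to Darwin Core terms.
    static final Map<String, String> COLUMN_TO_TERM = Map.of(
        "Latitude", DWC + "decimalLatitude",
        "Longitude", DWC + "decimalLongitude",
        "Collector", DWC + "recordedBy",
        "Date Collected", DWC + "eventDate");

    // Re-keys one spreadsheet row by term IRI; unmapped columns are dropped.
    static Map<String, String> alignRow(Map<String, String> row) {
        Map<String, String> aligned = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            String term = COLUMN_TO_TERM.get(cell.getKey());
            if (term != null) {
                aligned.put(term, cell.getValue());
            }
        }
        return aligned;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("Latitude", "-17.49");
        row.put("Collector", "J. Smith");
        row.put("Field Notes", "under log"); // no mapping, so it is dropped
        System.out.println(alignRow(row));
    }
}
```

A real deployment would presumably report unmapped columns rather than drop them silently; the sketch drops them only to stay short.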

A diagram showing how information is classified using the Biological Collections Ontology

Biocode FIMS Design

The following chart shows how information is generated and organized logically in the Biocode FIMS database.

[Figure: Biocode FIMS design chart. Stages: Setup; Data submission; Query. Boxes: Spreadsheet Templates; Identifier Keys by Project; Map Spreadsheet to Standards; Validation; Convert to RDF Triples; Upload; Query sets of spreadsheets (graphs); Inferencing; Return Data.]
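The "Convert to RDF Triples" step named in the chart can be pictured with a minimal sketch that serializes one aligned spreadsheet row as N-Triples. This is an illustration under stated assumptions, not the Biocode FIMS implementation: the class `RowToTriplesSketch`, the method `toNTriples`, and the idea of forming the subject by appending a local sample ID to the ARK root quoted in the identifiers section of this poster are all hypothetical, and the sample ID used in `main` is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowToTriplesSketch {
    // ARK root quoted in the poster; the suffix scheme below is an assumption.
    static final String ID_ROOT = "http://n2t.net/ark:/21547/R2";

    // Escapes the characters that N-Triples string literals must not contain raw.
    static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    // Emits one triple per cell: <sample> <term IRI> "value" .
    static List<String> toNTriples(String localId, Map<String, String> alignedRow) {
        String subject = "<" + ID_ROOT + localId + ">";
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, String> cell : alignedRow.entrySet()) {
            triples.add(subject + " <" + cell.getKey() + "> \""
                    + escape(cell.getValue()) + "\" .");
        }
        return triples;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("http://rs.tdwg.org/dwc/terms/recordedBy", "J. Smith");
        row.put("http://rs.tdwg.org/dwc/terms/eventDate", "2010-06-01");
        // "EXAMPLE1" is a made-up local sample ID.
        toNTriples("EXAMPLE1", row).forEach(System.out::println);
    }
}
```

Plain string building keeps the sketch dependency-free; a production system would more likely use an RDF library to build and load the triples.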

Persistent Identifiers for Samples

Assigning persistent identifiers to samples as they are isolated from nature or sub-sampled from other material is a critical component of Biocode FIMS. Because these events usually happen in the field, the identifier solution must work in the field and also ensure that each identifier continues to resolve for years to come. To meet this challenge, we worked with the California Digital Library to develop Biocode Commons Identifiers (http://biscicol.org/bcid/), an identifier solution based on EZID (http://n2t.net/ezid). These identifiers look a lot like digital object identifiers (DOIs) but are built on the Archival Resource Key (ARK) model: http://n2t.net/ark:/21547/R2

Technical Details

• Uses an XML configuration file to define validation rules for spreadsheets, how fields are logically related, and project codes to aid in assigning identifiers.

• Stores spreadsheet data in a Fuseki TDB triplestore.

• REST service framework integration with Biocode Commons Identifiers.

• User interface available as a command-line tool and a Geneious plugin.

• Coded in Java.

• Code is open source and available under the Berkeley Software Distribution (BSD) license at http://code.google.com/p/biocode-fims
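To make the first bullet more concrete, here is a minimal sketch of validation rules read from an XML configuration and applied to one spreadsheet row. The XML layout (`<validation>`, `<rule>`, and the `column`, `type`, and `values` attributes), the rule types `required` and `inList`, and the class `ConfigValidationSketch` are invented for illustration; the actual Biocode FIMS configuration format may look quite different.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ConfigValidationSketch {
    // Hypothetical configuration; not the real Biocode FIMS schema.
    static final String CONFIG = String.join("\n",
        "<validation>",
        "  <rule column=\"specimenNum\" type=\"required\"/>",
        "  <rule column=\"phylum\" type=\"inList\" values=\"Arthropoda,Mollusca,Chordata\"/>",
        "</validation>");

    // Applies every <rule> in the configuration to one row; returns error messages.
    static List<String> validate(String configXml, Map<String, String> row) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Unreadable configuration", e);
        }
        NodeList rules = doc.getElementsByTagName("rule");
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < rules.getLength(); i++) {
            Element rule = (Element) rules.item(i);
            String column = rule.getAttribute("column");
            String type = rule.getAttribute("type");
            String value = row.getOrDefault(column, "").trim();
            if (type.equals("required") && value.isEmpty()) {
                errors.add(column + ": value is required");
            } else if (type.equals("inList") && !value.isEmpty()
                    && !Arrays.asList(rule.getAttribute("values").split(",")).contains(value)) {
                errors.add(column + ": '" + value + "' is not an accepted value");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("specimenNum", "", "phylum", "Plantae");
        validate(CONFIG, row).forEach(System.out::println);
    }
}
```

Keeping the rules in configuration rather than in code means a project can change its accepted values without recompiling, which is presumably part of the appeal of the XML-driven design described above.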

Laboratory Information Management System Integration

Biocode FIMS is partnering with Biomatters, makers of the Geneious software, to support the analysis of field samples in the laboratory using sequencing technologies. Biocode FIMS data and tools are integrated through a customized Geneious plugin.

Support from the National Science Foundation: Collaborative Research: BiSciCol Tracker: Towards a tagging and tracking infrastructure for biodiversity science collections (DBI-0956426); Research Coordination Network for the Genomic Standards Consortium (DBI-0840989); The National Evolutionary Synthesis Center (NESCent), NSF #EF-0905606

Developed in conjunction with faculty and staff affiliated with the Berkeley Natural History Museums, UC Berkeley

Development of the first version of Biocode FIMS supported by the Gordon and Betty Moore Foundation

John Deck is a programmer affiliated with Information Services and Technology and Berkeley Natural History Museums. Contact is [email protected]

Neil Davies is executive director of the UC Berkeley Gump Station, in Moorea, French Polynesia. Contact is [email protected]