GBIF web services for biodiversity data, for USDA GRIN, Washington DC, USA (2005)
DESCRIPTION
Presentation of GBIF and the sharing of biodiversity data with web services. USDA GRIN Beltsville Washington DC, 13th December 2005. GBIF is the Global Biodiversity Information Facility for free and open access to biodiversity data.
TRANSCRIPT
The Nordic Gene Bank, NGB, Alnarp, Sweden
Presentation of GBIF and sharing of biodiversity data with Web Services
December 13, 2005, USDA, Beltsville
Dag Terje Filip Endresen – The Nordic Gene Bank, IPGRI
Sharing of biodiversity data, December 13, 2005, USDA, Beltsville
TOPICS
Biodiversity data
Standards
Data exchange
Web Services, technology
Workflows
Biodiversity collections data
Preserved reference collections, such as those in museums and herbaria.
Living collections, like botanical and zoological gardens, aquaria, seed banks, microbial strain cultures and tissue collections.
Data collections, from surveys of objects in the field, such as observations.
These collections have most of their attributes in common, although the terminology used to describe them may differ substantially.
[http://www.bgbm.org/TDWG/CODATA/ABCD-Evolution.htm]
TDWG - Taxonomic Databases Working Group
TDWG Mission:
To provide an international forum for biological data projects
To develop and promote the use of standards
To facilitate data exchange.
The TDWG web site is hosted by The Natural History Museum in London, UK.
[http://www.tdwg.org/]
Biodiversity informatics standards
MCPD - Multi-Crop Passport Descriptors
MCPD is developed jointly by IPGRI and FAO as an international standard for germplasm passport data exchange.
The MCPD is designed to be compatible with the IPGRI crop specific descriptor lists and the FAO World Information and Early Warning System (WIEWS).
The MCPD was first released in 1997.
[http://www.ipgri.cgiar.org/publications/pdf/124.pdf]
The MCPD descriptor list is compatible with ABCD. MCPD was in fact developed with some input from TDWG (on plant uses categories, version 1998).
IPGRI Crop Specific Descriptors
The IPGRI crop-specific descriptor lists (and those of other networks) expand the MCPD list to meet their specific needs. As long as these additions allow for an easy conversion to the format proposed in the multi-crop passport descriptors, basic passport data can be exchanged worldwide in a consistent manner.
The International Union for the Protection of New Varieties of Plants (UPOV) maintains crop descriptors for the protection of intellectual property rights (since 1961).
The COMECON descriptor lists came even earlier, and were the result of cooperation among the Eastern European genebanks in PGR documentation (1949–1999).
Taxonomic Databases Working Group
Standards development and maintenance
Darwin Core 2 - "Element definitions designed to support the sharing and integration of primary biodiversity data". [http://darwincore.calacademy.org/]
Access to Biological Collection Data (ABCD) 2.0 - "An evolving comprehensive standard for the access to and exchange of data about specimens and observations (a.k.a. primary biodiversity data)". [http://www.bgbm.org/TDWG/CODATA/Schema/]
Structure of descriptive data (SDD) 1.0
Compare SDD with PGR evaluation and characterization data. [http://wiki.cs.umb.edu/twiki/bin/view/SDD/CurrentSchemaVersion]
Darwin Core 2 (DwC2)
The Darwin Core 2 is a simple set of data element definitions designed to support the sharing and integration of primary biodiversity data.
The Darwin Core is intended to be simple; simplicity reduces the barriers for data providers.
The Darwin Core is not a sufficient model or data structure for managing primary data, such as a collection database.
Darwin Core can be compared to the MCPD of the PGR community as a minimum common descriptor list.
[http://darwincore.calacademy.org]
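The comparison between Darwin Core and MCPD as minimum common descriptor lists can be made concrete with a small field crosswalk. The sketch below is illustrative only: the descriptor pairs are an assumed, partial mapping chosen for this example rather than an official one, `mcpd_to_darwin_core` is a hypothetical helper, and the accession record is made up.

```python
# Illustrative, partial crosswalk between MCPD descriptors and Darwin Core 2
# element names. The pairs are assumptions for this sketch, not an official mapping.
MCPD_TO_DWC = {
    "INSTCODE": "InstitutionCode",
    "ACCENUMB": "CatalogNumber",
    "GENUS": "Genus",
    "SPECIES": "Species",
    "ORIGCTY": "Country",
    "COLLSITE": "Locality",
    "LATITUDE": "Latitude",
    "LONGITUDE": "Longitude",
}


def mcpd_to_darwin_core(record):
    """Rename the MCPD keys listed above; report all other keys as unmapped."""
    mapped = {}
    unmapped = []
    for key, value in record.items():
        if key in MCPD_TO_DWC:
            mapped[MCPD_TO_DWC[key]] = value
        else:
            unmapped.append(key)
    return mapped, sorted(unmapped)


# Made-up accession record, for demonstration only.
accession = {
    "INSTCODE": "XXX000",
    "ACCENUMB": "ACC 1",
    "GENUS": "Hordeum",
    "SPECIES": "vulgare",
    "SAMPSTAT": "300",
}
mapped, unmapped = mcpd_to_darwin_core(accession)
print(mapped)
print(unmapped)
```

The sketch only renames fields. In practice the values may need conversion as well (coordinate and country-code formats, for instance), and descriptors without a counterpart in the smaller list have to be carried by a richer schema.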
ABCD - Access to Biological Collection Data
ABCD is a common data specification for data on biological specimens and observations (including plant genetic resources seed banks).
The design goal is to be both comprehensive and general (ABCD 2 has about 1200 elements).
Development of the ABCD started after the 2000 meeting of the TDWG.
ABCD was developed with support from TDWG/CODATA, ENHSIN, BioCASE, and GBIF.
GBIF accepted the ABCD schema in 2002.
The MCPD descriptor list is now fully mapped to, and compatible with, ABCD.
[http://www.bgbm.org/TDWG/CODATA/Schema/]
PGR sub-unit of ABCD
[Figure: the PGR sub-unit within the ABCD schema]
Bioinformatics concepts and Ontology
Ontologies are specifications of the concepts in a given field and the relationships among those concepts.
Extensible Markup Language / Resource Description Framework (XML/RDF) is one way to describe the elements.
Biodiversity informatics data exchange tools
DiGIR - Distributed Generic Information Retrieval
Distributed - a protocol for retrieving structured data from multiple, heterogeneous databases across the Internet.
Generic - a protocol independent of the data retrieved and of the software to retrieve it.
The DiGIR protocol uses the Darwin Core as its data definition.
[http://digir.net] [https://sourceforge.net/projects/digir]
Major contributors to DiGIR are University of Kansas Natural History Museum, the MaNIS project (University of California, Berkeley) and GBIF.
BioCASE - Biological Collection Access for Europe
BioCASE establishes web-based unified access to biological collections in Europe while leaving control of the information with the collection holders.
ABCD is the main data definition used by BioCASE.
The PyWrapper protocol is designed to handle any schema and connect to any SQL capable database.
BioCASE provides GBIF with full access to its registry. Being a BioCASE provider thus means being a GBIF provider.
[http://www.biocase.org/]
BioCASE development is coordinated by the Botanischer Garten und Botanisches Museum Berlin-Dahlem (BGBM).
Protocol integration - TAPIR
There is a need to integrate the current protocols in use by different biodiversity informatics community networks.
At the TDWG meeting in Christchurch, NZ, in October 2004, the unified protocol under development was presented and named TAPIR, the TDWG Access Protocol for Information Retrieval. It was agreed to start testing the protocol by rewriting the data provider software of the existing BioCASE and DiGIR implementations.
The TAPIR protocol will be supported by the next generation of DiGIR and BioCASE.
[http://ww3.bgbm.org/tapir]
BioMOBY
BioMOBY is an international research project on methodologies for biological data representation, distribution, and discovery.
MOBY-S is a web service based interoperability solution.
S-MOBY is a Semantic Web-based interoperability solution.
[http://www.biomoby.org/]
Web service technology
Simplicity and global standards
Important factors behind the success of the web are simplicity and ubiquity.
A service provider with a web site can reach the global community.
The web is built on three simple methods (GET, POST, and PUT) and a simple markup language.
Web services are about extending the Web from a platform for information to a platform for services as well.
Web Service definition – W3C
A Web service is a software system identified by a URI, whose public interfaces and bindings are defined and described using XML.
Its definition can be discovered by other software systems.
These systems may then interact with the Web service in a manner prescribed by its definition, using XML based messages conveyed by Internet protocols.
W3C, Web Services Glossary [http://www.w3.org/TR/ws-gloss]
Some web service keywords
Application-to-application
Platform independent
Programming language independent
Object model independent
Some Web Service standards
XML: All exchanged data is formatted with XML tags. The XML message is carried by a service protocol such as SOAP or RPC, and travels between applications over common Internet protocols such as HTTP, FTP or SMTP.
WSDL: The public interface to the web service is described by Web Services Description Language (WSDL). This is an XML-based service description on how to communicate with the web service.
UDDI: The web service information is published using Universal Description, Discovery and Integration (UDDI). It enables applications to look up web services information in order to determine whether to use them.
[http://en.wikipedia.org/wiki/Web_services]
Example of a service call
All exchanged data is formatted with XML tags.
Example of a service response
Message transport protocols
* The message (XML) is transmitted through a service transport protocol such as SOAP or RPC.
* And wrapped in a common internet transport protocol like HTTP, FTP, SMTP ... for transport through the internet.
Regular SOAP message
The SOAP envelope consists of a header and a body.
The header contains additional information on the SOAP message, such as digital signature information, transaction information, and routing information.
The body carries the information intended for the recipient, such as Remote Procedure Call information, XML messages, or error messages.
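The header and body structure of a SOAP message can be sketched with Python's standard library. This is a minimal illustration, not a real service call: the `http://example.org/seedbank` namespace and the `TransactionId`, `GetAccessions` and `Genus` elements are hypothetical names invented for the example. Only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
EX_NS = "http://example.org/seedbank"  # hypothetical service namespace


def build_envelope(genus):
    """Build a SOAP envelope with a header and a body for a made-up request."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    # Hypothetical header entry: information about the message itself.
    ET.SubElement(header, f"{{{EX_NS}}}TransactionId").text = "demo-001"
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Hypothetical remote procedure call carried in the body.
    call = ET.SubElement(body, f"{{{EX_NS}}}GetAccessions")
    ET.SubElement(call, f"{{{EX_NS}}}Genus").text = genus
    return envelope


def read_genus(xml_text):
    """Parse an envelope and pull the Genus argument out of the body."""
    root = ET.fromstring(xml_text)
    ns = {"soap": SOAP_NS, "ex": EX_NS}
    return root.find("soap:Body/ex:GetAccessions/ex:Genus", ns).text


message = ET.tostring(build_envelope("Hordeum"), encoding="unicode")
print(message)
print(read_genus(message))
```

The receiving side only needs the agreed namespaces and element names to find the payload, which is what keeps the exchange independent of platform and programming language.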
Communication protocol
Although SOAP does not depend on the underlying communication protocol, HTTP is usually used. Because most firewalls let HTTP traffic through, it is possible to communicate with Web services protected by firewalls.
Data warehouse model (Slide by Samy Gaiji, IPGRI)
Decentralized model (Slide by Samy Gaiji, IPGRI)
Network data flow
[Figure: network data flow diagram with the labels Working Database, Online Database, Provider (wrapper software), Portal, DB and User]
The Data Provider is the web service package (wrapper) installed at the data source.
The Data Portal is a gateway to data published from the data provider nodes.
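The provider and portal roles in this data flow can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions: `Provider` and `Portal` are hypothetical classes, the records are made up, and a real provider would answer queries over the network from its online database rather than from a Python list.

```python
class Provider:
    """Stands in for the wrapper software installed at one data source."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # in reality: rows in the online database

    def search(self, genus):
        return [r for r in self._records if r["Genus"] == genus]


class Portal:
    """Gateway that forwards a query to every registered provider."""

    def __init__(self):
        self.providers = []

    def register(self, provider):
        self.providers.append(provider)

    def search(self, genus):
        hits = []
        for provider in self.providers:
            for record in provider.search(genus):
                # Tag each hit with its source so the data holder stays visible.
                hits.append({**record, "Provider": provider.name})
        return hits


# Made-up providers and records, for demonstration only.
portal = Portal()
portal.register(Provider("Genebank A", [{"Genus": "Hordeum", "CatalogNumber": "A-1"}]))
portal.register(Provider("Herbarium B", [{"Genus": "Hordeum", "CatalogNumber": "B-7"},
                                         {"Genus": "Avena", "CatalogNumber": "B-8"}]))
for hit in portal.search("Hordeum"):
    print(hit)
```

The data stays with each provider; the portal holds only the list of registered providers and assembles answers on request.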
Combination of services
Web services can be combined to create new services.
[Figure: a Seed Bank Accession Inventory, a Weather Info Service and a GIS Species Occurrences Service are combined into a new service]
New service: plan collecting missions for under-collected species in a period of good weather.
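The collecting-mission example can be sketched as a function that calls three other services and merges their answers. Everything here is a stand-in: the three service functions are hypothetical stubs returning made-up data, and the threshold for "under-collected" (`max_accessions`) is an assumption chosen for the illustration.

```python
def seedbank_inventory():
    """Stub for a seed bank accession inventory service: accessions per species."""
    return {"Hordeum vulgare": 120, "Avena strigosa": 2, "Secale cereale": 1}


def species_occurrences():
    """Stub for a GIS species occurrence service: (species, region) records."""
    return [
        ("Avena strigosa", "North"),
        ("Secale cereale", "South"),
        ("Hordeum vulgare", "South"),
    ]


def weather_info(region):
    """Stub for a weather service: months with good field conditions."""
    return {"North": ["July"], "South": ["June", "July"]}[region]


def plan_collecting_missions(max_accessions=5):
    """New service: where and when to collect species with few accessions."""
    inventory = seedbank_inventory()
    plan = []
    for species, region in species_occurrences():
        # A species counts as under-collected if the bank holds few accessions.
        if inventory.get(species, 0) <= max_accessions:
            plan.append((species, region, weather_info(region)))
    return plan


for mission in plan_collecting_missions():
    print(mission)
```

The new service owns no data of its own. Its value comes entirely from combining what the three underlying services already publish.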
Biodiversity informatics workflow tools
Workbench
Bioinformatics analyses often involve combining the use of databases and analysis programs which are linked in a specific order to form a workflow process.
Flow of data from one analytical step to another can be captured in a formal workflow language.
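The idea of linking a database query and analysis steps in a fixed order can be sketched as an ordered list of named steps, where each step receives the previous step's output. This is a plain Python illustration, not a formal workflow language: `run_workflow`, the step functions and the records are all hypothetical.

```python
def fetch_records(genus):
    """Step 1: stands in for a database query (made-up records)."""
    records = [
        {"Genus": "Hordeum", "Country": "SE", "Latitude": 55.6},
        {"Genus": "Hordeum", "Country": "NO", "Latitude": None},
        {"Genus": "Avena", "Country": "SE", "Latitude": 59.3},
    ]
    return [r for r in records if r["Genus"] == genus]


def keep_georeferenced(records):
    """Step 2: drop records without a latitude."""
    return [r for r in records if r["Latitude"] is not None]


def count_by_country(records):
    """Step 3: summarize the remaining records per country."""
    counts = {}
    for r in records:
        counts[r["Country"]] = counts.get(r["Country"], 0) + 1
    return counts


def run_workflow(steps, data):
    """Run named steps in order, feeding each step's output to the next."""
    trace = []
    for name, step in steps:
        data = step(data)
        trace.append(name)
    return data, trace


WORKFLOW = [
    ("fetch", fetch_records),
    ("filter", keep_georeferenced),
    ("summarize", count_by_country),
]
result, trace = run_workflow(WORKFLOW, "Hordeum")
print(result, trace)
```

Because the workflow is itself data (a list of steps), it can be stored, shared and rerun on other inputs, which is the property workflow tools build on.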
Taverna workflow
The Taverna Workbench allows users to construct complex analysis workflows from components located on both remote and local machines, run these workflows on their own data and visualize the results.
BioMOBY objects can be connected in a workflow.
[http://taverna.sourceforge.net/]
SEEK - Science Environment for Ecological Knowledge
The Science Environment for Ecological Knowledge (SEEK) is a system designed to facilitate not only data acquisition and archiving, but integrating, transforming, analyzing, and synthesizing ecological and biodiversity data.
[http://seek.ecoinformatics.org/] [http://kepler-project.org/]
Kepler workflow example - GARP
Thank you for listening!