GBIF web services for biodiversity data, for USDA GRIN, Washington DC, USA (2005)

Upload: dag-endresen

Post on 17-Nov-2014

DESCRIPTION

Presentation of GBIF and the sharing of biodiversity data with web services. USDA GRIN Beltsville Washington DC, 13th December 2005. GBIF is the Global Biodiversity Information Facility for free and open access to biodiversity data.

TRANSCRIPT

Page 1: GBIF web services for biodiversity data, for USDA GRIN, Washington DC, USA (2005)

The Nordic Gene Bank, NGB, Alnarp, Sweden

Presentation of GBIF and sharing of biodiversity data with Web Services

December 13, 2005, USDA, Beltsville

Dag Terje Filip Endresen – The Nordic Gene Bank, IPGRI

Sharing of biodiversity data, December 13, 2005, USDA, Beltsville

Page 2: TOPICS

Biodiversity data

Standards

Data exchange

Web Services, technology

Workflows

Page 3: Biodiversity collections data

Preserved reference collections, such as those in museums and herbaria.

Living collections, like botanical and zoological gardens, aquaria, seed banks, microbial strain cultures and tissue collections.

Data collections, from surveys of objects in the field, such as observations.

These collections have most of their attributes in common, although the terminology used to describe them may differ substantially.

[http://www.bgbm.org/TDWG/CODATA/ABCD-Evolution.htm]

Page 4: TDWG - Taxonomic Databases Working Group

TDWG Mission:

To provide an international forum for biological data projects

To develop and promote the use of standards

To facilitate data exchange.

The TDWG web site is hosted by The Natural History Museum in London, UK.

[http://www.tdwg.org/]

Page 5: Biodiversity informatics standards

Page 6: MCPD - Multi Crop Passport Descriptors

MCPD was developed jointly by IPGRI and FAO as an international standard for germplasm passport data exchange.

The MCPD is designed to be compatible with the IPGRI crop specific descriptor lists and the FAO World Information and Early Warning System (WIEWS).

The MCPD was first released in 1997.

[http://www.ipgri.cgiar.org/publications/pdf/124.pdf]

The MCPD descriptor list is compatible with ABCD. MCPD was in fact developed with some input from TDWG (on plant uses categories, version 1998).

Page 7: IPGRI Crop Specific Descriptors

The IPGRI crop descriptor lists (as well as those of other networks) expand the MCPD list to meet their specific needs. As long as these additions allow for an easy conversion to the format proposed in the multi-crop passport descriptors, basic passport data can be exchanged worldwide in a consistent manner.

The International Union for the Protection of New Varieties of Plants (UPOV) maintains crop descriptors for the protection of intellectual property rights (since 1961).

The COMECON descriptor lists came even earlier, and were the result of cooperation among the Eastern European genebanks in PGR documentation (1949–1999).

Page 8: Taxonomic Database Working Group

Standards development and maintenance

Darwin Core 2 - "Element definitions designed to support the sharing and integration of primary biodiversity data". [http://darwincore.calacademy.org/]

Access to Biological Collection Data (ABCD) 2.0 - "An evolving comprehensive standard for the access to and exchange of data about specimens and observations (a.k.a. primary biodiversity data)". [http://www.bgbm.org/TDWG/CODATA/Schema/]

Structure of descriptive data (SDD) 1.0

Compare SDD with PGR evaluation and characterization data. [http://wiki.cs.umb.edu/twiki/bin/view/SDD/CurrentSchemaVersion]

Page 9: Darwin Core 2 (DwC2)

The Darwin Core 2 is a simple set of data element definitions designed to support the sharing and integration of primary biodiversity data.

The Darwin Core is intended to be simple; simplicity reduces the barriers for data providers.

The Darwin Core is not a sufficient model or data structure for managing primary data, such as a collection database.

Darwin Core can be compared to the MCPD of the PGR community as a minimum common descriptor list.

[http://darwincore.calacademy.org]
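The comparison between Darwin Core and MCPD as minimum common descriptor lists can be pictured as a field-by-field renaming. The sketch below is illustrative only: the pairing of MCPD descriptor names with Darwin Core element names is an assumption made for this example, not an official crosswalk, and the passport record is made up. Note that the values may still need conversion (for example, MCPD country codes versus country names).

```python
# Illustrative mapping from MCPD descriptor names to Darwin Core element names.
# The pairing is an assumption for this sketch, not an official crosswalk.
MCPD_TO_DWC = {
    "INSTCODE": "InstitutionCode",
    "ACCENUMB": "CatalogNumber",
    "GENUS": "Genus",
    "SPECIES": "Species",
    "ORIGCTY": "Country",
}


def mcpd_to_dwc(record):
    """Rename the mapped MCPD fields; unmapped fields are dropped."""
    return {MCPD_TO_DWC[k]: v for k, v in record.items() if k in MCPD_TO_DWC}


# Made-up passport record, for illustration only.
accession = {
    "INSTCODE": "XXX000",
    "ACCENUMB": "ACC 0001",
    "GENUS": "Hordeum",
    "SPECIES": "vulgare",
    "ORIGCTY": "SWE",
    "ACQDATE": "19800101",
}

dwc_record = mcpd_to_dwc(accession)
print(dwc_record)
```

Fields without a counterpart in the mapping (here ACQDATE) are simply left out, which mirrors the idea of a minimum common list.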

Page 10: ABCD - Access to Biological Collection Data

ABCD is a common data specification for data on biological specimens and observations (including plant genetic resources seed banks).

The design goal is to be both comprehensive and general (ABCD 2 has about 1200 elements).

Development of the ABCD started after the 2000 meeting of the TDWG.

ABCD was developed with support from TDWG/CODATA, ENHSIN, BioCASE, and GBIF.

GBIF accepted the ABCD schema in 2002.

The MCPD descriptor list is now completely mapped to, and compatible with, ABCD.

[http://www.bgbm.org/TDWG/CODATA/Schema/]

Page 11: PGR sub-unit of ABCD

[Figure not preserved; surviving label: PGR]

Page 12: Bioinformatics concepts and Ontology

Ontologies are specifications of the concepts in a given field and the relationships among those concepts.

Extensible Markup Language / Resource Description Framework (XML/RDF) is one way to describe the elements.
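One way to picture "concepts and the relationships among them" is as a list of subject-predicate-object statements, which is also the basic model behind RDF. The following is a minimal sketch in plain Python, without any RDF library; every concept and relation name in it is hypothetical and chosen only for illustration.

```python
# Concepts and relationships as (subject, predicate, object) triples.
# All names are hypothetical and chosen only for illustration.
triples = [
    ("SeedBankAccession", "is_a", "LivingCollectionUnit"),
    ("HerbariumSheet", "is_a", "PreservedCollectionUnit"),
    ("LivingCollectionUnit", "is_a", "CollectionUnit"),
    ("PreservedCollectionUnit", "is_a", "CollectionUnit"),
]


def ancestors(concept):
    """Follow 'is_a' links upward and return every broader concept."""
    found = []
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in triples:
            if subj == current and pred == "is_a" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


print(ancestors("SeedBankAccession"))
```

Because the relationships are explicit data, software can infer that a seed bank accession is also a collection unit without that fact being stated directly.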

Page 13: Biodiversity informatics data exchange tools

Page 14: DiGIR - Distributed Generic Information Retrieval

Distributed - a protocol for retrieving structured data from multiple, heterogeneous databases across the Internet.

Generic - a protocol independent of the data retrieved and of the software to retrieve it.

The DiGIR protocol uses the Darwin Core as its data definition.

[http://digir.net] [https://sourceforge.net/projects/digir]

Major contributors to DiGIR are University of Kansas Natural History Museum, the MaNIS project (University of California, Berkeley) and GBIF.
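The "distributed" and "generic" ideas can be sketched with a toy model: the same query is sent to several independent sources, each with its own local layout, and the answers come back under shared element names. This is not the DiGIR wire protocol; the class, the field names, and the data are all hypothetical stand-ins held in memory.

```python
# Toy model of a distributed query: one request, several heterogeneous sources.
# This is not the DiGIR wire protocol; all names here are hypothetical.
class Provider:
    def __init__(self, name, rows, field_map):
        self.name = name
        self.rows = rows            # local records, in the source's own layout
        self.field_map = field_map  # local column name -> shared element name

    def search(self, element, value):
        """Return local matches, translated to the shared element names."""
        results = []
        for row in self.rows:
            shared = {self.field_map[k]: v for k, v in row.items()}
            if shared.get(element) == value:
                shared["Provider"] = self.name
                results.append(shared)
        return results


museum = Provider(
    "museum",
    [{"gen": "Hordeum", "cat_no": "M-1"}, {"gen": "Avena", "cat_no": "M-2"}],
    {"gen": "Genus", "cat_no": "CatalogNumber"},
)
genebank = Provider(
    "genebank",
    [{"GENUS": "Hordeum", "ACCENUMB": "G-7"}],
    {"GENUS": "Genus", "ACCENUMB": "CatalogNumber"},
)


def distributed_search(providers, element, value):
    """Send the same query to every provider and merge the answers."""
    merged = []
    for provider in providers:
        merged.extend(provider.search(element, value))
    return merged


hits = distributed_search([museum, genebank], "Genus", "Hordeum")
print(hits)
```

The caller never sees the local column names; only the shared data definition is exposed, which is the role the Darwin Core plays for DiGIR.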

Page 15: BioCASE - Biological Collection Access for Europe

BioCASE establishes web-based unified access to biological collections in Europe while leaving control of the information with the collection holders.

ABCD is the main data definition used by BioCASE.

The PyWrapper protocol is designed to handle any schema and connect to any SQL capable database.

BioCASE provides full access to its registry for GBIF. Being a BioCASE provider thus means being a GBIF provider.

[http://www.biocase.org/]

BioCASE development is coordinated by the Botanischer Garten und Botanisches Museum Berlin-Dahlem (BGBM).

Page 16: Protocol integration - TAPIR

There is a need to integrate the current protocols in use by different biodiversity informatics community networks.

During the TDWG meeting in Christchurch, NZ, in October 2004, the unified protocol under development was presented and named TAPIR, the TDWG Access Protocol for Information Retrieval. It was agreed to start testing the protocol by rewriting the data provider software of the existing BioCASE and DiGIR implementations.

The TAPIR protocol will be supported by the next generation of DiGIR and BioCASE.

[http://ww3.bgbm.org/tapir]

Page 17: BioMOBY

BioMOBY is an international research project on methodologies for biological data representation, distribution, and discovery.

MOBY-S is a web service based interoperability solution.

S-MOBY is a Semantic Web-based interoperability solution.

[http://www.biomoby.org/]

Page 18: Web service technology

Page 19: Simplicity and global standards

Important factors behind the success of the web are simplicity and ubiquity.

A service provider with a web site can reach the global community.

Three simple methods (GET, POST, and PUT) and a simple markup language.

Web services are about expanding the Web as a platform not only for information but also for services.

Page 20: Web Service definition - W3C

A Web service is a software system identified by a URI, whose public interfaces and bindings are defined and described using XML.

Its definition can be discovered by other software systems.

These systems may then interact with the Web service in a manner prescribed by its definition, using XML based messages conveyed by Internet protocols.

W3C, Web Services Glossary [http://www.w3.org/TR/ws-gloss]

Page 21: Some web service keywords

Application-to-application

Platform independent

Programming language independent

Object model independent

Page 22: Some Web Service standards

XML: All exchanged data is formatted with XML tags. The message is transmitted through a transport protocol such as SOAP or RPC. Data can be transported between applications using common protocols such as HTTP, FTP or SMTP.

WSDL: The public interface to the web service is described by Web Services Description Language (WSDL). This is an XML-based service description on how to communicate with the web service.

UDDI: The web service information is published using this protocol. It enables applications to look up web services information in order to determine whether to use them.

[http://en.wikipedia.org/wiki/Web_services]

Page 23: Example of a service call

All exchanged data is formatted with XML tags.
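The example shown on the original slide is not preserved in this transcript. As a stand-in, here is a minimal sketch of how a search request could be expressed as XML with Python's standard library. The element and attribute names are hypothetical; they do not follow DiGIR, BioCASE, or any other real schema.

```python
import xml.etree.ElementTree as ET

# Build a small XML request document. Element and attribute names are
# hypothetical; they do not follow DiGIR, BioCASE, or any other real schema.
request = ET.Element("request")
search = ET.SubElement(request, "search", {"limit": "10"})
condition = ET.SubElement(search, "equals", {"element": "Genus"})
condition.text = "Hordeum"

request_xml = ET.tostring(request, encoding="unicode")
print(request_xml)
```

The point is only that the whole request, including the filter and its parameters, travels as tagged text that any platform can produce and parse.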

Page 24: Example of a service response
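The response shown on the original slide is not preserved in this transcript. The sketch below shows, under assumed names, what consuming an XML response might look like: a made-up document with two records is parsed into plain dictionaries. The element names and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up XML response; the element names are hypothetical.
response_xml = """
<response>
  <record>
    <InstitutionCode>XXX000</InstitutionCode>
    <CatalogNumber>ACC 0001</CatalogNumber>
    <Genus>Hordeum</Genus>
  </record>
  <record>
    <InstitutionCode>XXX000</InstitutionCode>
    <CatalogNumber>ACC 0002</CatalogNumber>
    <Genus>Avena</Genus>
  </record>
</response>
"""

root = ET.fromstring(response_xml)
# Turn each <record> into a plain dictionary keyed by element name.
records = [{child.tag: child.text for child in rec} for rec in root.findall("record")]
print(records)
```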

Page 25: Message transport protocols

* The message (XML) is transmitted through a service transport protocol such as SOAP or RPC.
* And wrapped in a common internet transport protocol like HTTP, FTP, SMTP ... for transport through the internet.

Page 26: Regular SOAP message

The SOAP envelope consists of a header and a body.

The header contains additional information on the SOAP message, such as digital signature information, transaction information, and routing information.

Information intended for the recipient is written in the body, such as Remote Procedure Call information, XML messages, or error messages.
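The envelope, header, and body structure can be sketched with Python's standard library. The namespace below is the SOAP 1.1 envelope namespace; the header entry (`TransactionId`) and the body payload (`getAccession`) are hypothetical and stand in for whatever a real service would define.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace
ET.register_namespace("soap", SOAP_NS)

envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")

# Hypothetical header entry and payload, for illustration only.
ET.SubElement(header, "TransactionId").text = "demo-001"
call = ET.SubElement(body, "getAccession")
ET.SubElement(call, "accessionNumber").text = "ACC 0001"

soap_xml = ET.tostring(envelope, encoding="unicode")
print(soap_xml)
```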

Page 27: Communication protocol

Although SOAP does not depend on the underlying communication protocol, HTTP is usually used. Because HTTP traffic is normally allowed through firewalls, it is possible to communicate with Web services protected by firewalls.
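Carrying a SOAP message over HTTP amounts to an ordinary POST request with the XML as its body. The sketch below only constructs such a request with Python's standard library and does not send anything; the endpoint URL, the `SOAPAction` value, and the payload are hypothetical.

```python
import urllib.request

# Hypothetical endpoint and action; nothing is sent in this sketch.
ENDPOINT = "http://example.org/soap/endpoint"
soap_message = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body><getAccession><accessionNumber>ACC 0001</accessionNumber>"
    "</getAccession></soap:Body></soap:Envelope>"
)

http_request = urllib.request.Request(
    ENDPOINT,
    data=soap_message.encode("utf-8"),
    headers={
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": '"getAccession"',
    },
    method="POST",
)

print(http_request.get_method(), http_request.full_url)
# To actually send it: urllib.request.urlopen(http_request)
```

From the network's point of view this looks like any other web request, which is why it usually passes through infrastructure already configured for web traffic.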

Page 28: Data warehouse model (Slide by Samy Gaiji, IPGRI)

Page 29: Decentralized model (Slide by Samy Gaiji, IPGRI)

Page 30: Network data flow

The Data Provider is the web service package (wrapper) installed at the data source.

The Data Portal is a gateway to data published from the data provider nodes.

[Diagram labels: Working Database, Online Database, Provider (wrapper software), Portal, User]
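The roles of provider and portal can be sketched with an in-memory SQLite database standing in for the online database at the data source. Everything here is an assumption made for illustration: the table layout, the function names, and the data are made up, and a real provider would answer over the network rather than through a direct function call.

```python
import sqlite3

# Online database at the data source (in-memory stand-in with made-up rows).
online_db = sqlite3.connect(":memory:")
online_db.execute("CREATE TABLE accession (accenumb TEXT, genus TEXT)")
online_db.executemany(
    "INSERT INTO accession VALUES (?, ?)",
    [("ACC 0001", "Hordeum"), ("ACC 0002", "Avena")],
)


def provider(genus):
    """Provider wrapper: turns a request into SQL against the local database."""
    rows = online_db.execute(
        "SELECT accenumb, genus FROM accession WHERE genus = ? ORDER BY accenumb",
        (genus,),
    )
    return [{"CatalogNumber": a, "Genus": g} for a, g in rows]


def portal(providers, genus):
    """Portal: forwards the user's query to each registered provider."""
    results = []
    for name, search in providers.items():
        for record in search(genus):
            results.append({"Provider": name, **record})
    return results


answer = portal({"demo-genebank": provider}, "Hordeum")
print(answer)
```

The data stay in the provider's own database; the portal only holds a registry of providers and relays queries to them.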

Page 31: Combination of services

Web services can be combined to create new services.

[Diagram: Seed bank Accession Inventory + Weather Info Service + GIS Species Occurrences Service]

New service to plan collecting missions for under-collected species to a period of good weather.
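The collecting-mission idea can be sketched by composing three stand-in services into a new one. The functions below return made-up data held in memory; in practice each would be a remote web service, and the names, the threshold, and the values are all hypothetical.

```python
# Three stand-in services with made-up data; real services would be remote.
def accession_inventory():
    """Seed bank inventory: number of accessions held per species."""
    return {"Hordeum vulgare": 120, "Avena strigosa": 3, "Secale cereale": 2}


def species_occurrences(species):
    """GIS occurrence service: regions where the species has been recorded."""
    data = {"Avena strigosa": ["north"], "Secale cereale": ["south", "north"]}
    return data.get(species, [])


def weather_info(region):
    """Weather service: months expected to have good weather in a region."""
    return {"north": ["July"], "south": ["June", "July"]}.get(region, [])


def plan_collecting_missions(min_accessions=5):
    """New service: where and when to collect under-collected species."""
    plan = []
    for species, count in sorted(accession_inventory().items()):
        if count >= min_accessions:
            continue  # already well represented in the seed bank
        for region in species_occurrences(species):
            for month in weather_info(region):
                plan.append((species, region, month))
    return plan


missions = plan_collecting_missions()
print(missions)
```

None of the three underlying services knows about collecting missions; the new capability exists only in the composition.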

Page 32: Biodiversity informatics workflow tools

Page 33: Workbench

Bioinformatics analyses often involve combining the use of databases and analysis programs which are linked in a specific order to form a workflow process.

Flow of data from one analytical step to another can be captured in a formal workflow language.
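As a rough illustration of steps linked in a specific order, a workflow can be written as an ordered list of named steps where each step's output feeds the next. This is plain Python rather than a formal workflow language, and the step names and records are hypothetical.

```python
# A workflow as an ordered list of named steps; each step's output feeds the next.
# Step names and data are hypothetical.
def fetch_records(_):
    return [{"Genus": "Hordeum", "lat": 55.6}, {"Genus": "Avena", "lat": None}]


def drop_missing_coordinates(records):
    return [r for r in records if r["lat"] is not None]


def count_by_genus(records):
    counts = {}
    for r in records:
        counts[r["Genus"]] = counts.get(r["Genus"], 0) + 1
    return counts


workflow = [
    ("fetch", fetch_records),
    ("clean", drop_missing_coordinates),
    ("summarize", count_by_genus),
]


def run(workflow, data=None):
    """Run the steps in order and keep a log of which steps were executed."""
    log = []
    for name, step in workflow:
        data = step(data)
        log.append(name)
    return data, log


result, log = run(workflow)
print(result, log)
```

Because the sequence itself is data, it can be stored, shared, and rerun, which is what a formal workflow language provides in a more rigorous form.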

Page 34: Taverna workflow

The Taverna Workbench allows users to construct complex analysis workflows from components located on both remote and local machines, run these workflows on their own data and visualize the results.

BioMOBY objects can be connected in a workflow.

[http://taverna.sourceforge.net/]

Page 35: Science Environment for Ecological Knowledge

The Science Environment for Ecological Knowledge (SEEK) is a system designed to facilitate not only data acquisition and archiving, but integrating, transforming, analyzing, and synthesizing ecological and biodiversity data.

[http://seek.ecoinformatics.org/] [http://kepler-project.org/]

Page 36: Kepler workflow example - GARP

Page 37: Thank you for listening!