EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Web service demo for EURISCO: GBIF tools and Darwin Core extension for germplasm. N.I. Vavilov Research Institute of Plant Industry (VIR), April 26th – 29th 2010, St Petersburg, Russian Federation. Dag Endresen, Jonas Nordling, Nordic Genetic Resources Center (NordGen).

Upload: dag-endresen

Post on 10-May-2015


Category: Technology



DESCRIPTION

Visit to the N.I. Vavilov Research Institute of Plant Industry (VIR) in April 2010. Installation of the GBIF Integrated Publishing Toolkit (IPT) for data publishing, as a test upgrade of the EURISCO data infrastructure of European genebanks.

TRANSCRIPT

Page 1: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Web service demo for EURISCO

GBIF Tools and Darwin Core extension for germplasm

N.I. Vavilov Research Institute of Plant Industry (VIR), April 26th – 29th 2010, St Petersburg, Russian Federation

Dag Endresen, Jonas Nordling, Nordic Genetic Resources Center (NordGen)

Page 2: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Topics for this session

• Web service installations for EURISCO

• Overview of the current project

• Darwin Core and the extension for germplasm

• GBIF informatics tools

• Integrated Publishing Toolkit (IPT)

• Distributed datasets

Page 3: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Possible Upgraded PGR Network Model


The genebank dataset is shared by the holding genebank.

The National Inventory (NI) endorses all national genebanks (and eventually individual accessions) for EURISCO.

ECPGR Crop databases can access passport data from EURISCO and additional crop specific data from the genebank IPT interface.

Standard data sharing tools ensure that the genebank dataset is available to other relevant decentralized thematic, regional or global networks.

Page 4: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Objectives of the EURISCO demo project

Evaluate the GBIF decentralized architecture

Install the IPT for 8 genebanks in Europe that, as far as possible, are also EURISCO/ECPGR partners.

Test the registration of the IPT installations through the GBIF registry, the Global Biodiversity Resources Discovery System (GBRDS).

Test the Harvesting and Indexing Toolkit (HIT) installation for the EURISCO platform (Bioversity HQ, Rome).

Project runs until 20 December 2010.


Page 5: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

2010 : IPT installations for EURISCO

• EURISCO

• NordGen (Nordic)

• Bioversity-Montpellier (France)

• IPK Gatersleben (Germany)

• BLE (Germany)

• WUR CGN (The Netherlands)

• CRI (Czech Republic)

• VIR (Russian Federation)

• SeedNET (Balkan)

• Baltic (Estonia, Latvia, Lithuania)

Page 6: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

2005 : BioCASE demo

http://chm.grinfo.net/

Page 7: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Potential of the GBIF technology

Using GBIF technology (and contributing to its development), the PGR community can easily establish specific PGR networks without duplicating GBIF's work.

The compatibility of data standards between PGR and biodiversity collections made it possible to integrate the worldwide germplasm collections into the biodiversity community (TDWG, GBIF).

http://data.gbif.org/datasets/network/2

Page 8: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Darwin Core

The purpose of DwC terms is to facilitate data sharing:

• a well-defined standard core vocabulary

• a flexible framework to maximize re-usability

The Darwin Core can be extended by adding new terms to share additional information.

Approved as a TDWG standard in 2009.

“The Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information.”

http://rs.tdwg.org/dwc/
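
As a minimal sketch of what extending Darwin Core can look like in practice, the Python snippet below models a record as core terms plus a separate set of extension terms, each qualified by a namespace so that names cannot collide. The core term names (institutionCode, catalogNumber, scientificName) are standard Darwin Core terms. The extension namespace, the extension term names and all sample values are assumptions made for illustration; they are not taken from the germplasm extension draft.

```python
# Namespace of the Darwin Core terms (TDWG).
DWC_NS = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical namespace for an extension vocabulary (illustration only).
EXT_NS = "http://example.org/germplasm/terms/"

# Core record using standard Darwin Core term names; values are made up.
core = {
    "institutionCode": "RUS001",
    "catalogNumber": "ACC-0001",
    "scientificName": "Triticum aestivum L.",
}

# Extension terms; these names are assumptions, not the real DwC-G terms.
extension = {
    "breedingInstituteCode": "RUS001",
    "acquisitionDate": "1925",
}


def qualify(terms, namespace):
    """Return a copy of `terms` keyed by fully qualified term URIs."""
    return {namespace + name: value for name, value in terms.items()}


def extended_record(core_terms, ext_terms):
    """Combine core and extension terms into one namespace-qualified record."""
    record = qualify(core_terms, DWC_NS)
    record.update(qualify(ext_terms, EXT_NS))
    return record


record = extended_record(core, extension)
for uri, value in sorted(record.items()):
    print(uri, "=", value)
```

Keeping the extension terms in their own namespace means a consumer that only understands the core vocabulary can ignore them without misreading the record.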

Page 9: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

DwC extension for germplasm

DwC Germplasm : DRAFT 0.1 : August 26, 2009

http://code.google.com/p/darwincore-germplasm

http://rs.nordgen.org/dwc

• “MCPD in Darwin Core”

• Maintained by gene banks worldwide

• Additional terms to describe germplasm samples

• Includes the new terms for crop trait experiments developed as part of the European EPGRIS3 project

• Includes a few additional terms for new international crop treaty regulations


Page 10: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Mapping of DwC-G terms to the MCPD descriptors

(EURISCO data exchange format)


Page 11: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Mapping of DwC-G terms to the MCPD descriptors (continued)
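
The mapping tables on these two slides did not survive in the transcript. As a rough sketch of how such a mapping can be applied, the Python snippet below translates a row keyed by MCPD descriptor codes into Darwin Core term names. The MCPD codes and the Darwin Core terms are real names from the two standards, but the pairing shown here is an assumption made for illustration and is not the DwC-G mapping from the slides. The sample row is invented.

```python
# Illustrative mapping from MCPD descriptor codes to Darwin Core terms.
# The pairing is an assumption for this sketch, not the DwC-G draft mapping.
MCPD_TO_DWC = {
    "INSTCODE": "institutionCode",
    "ACCENUMB": "catalogNumber",
    "COLLNUMB": "recordNumber",
    "GENUS": "genus",
    "SPECIES": "specificEpithet",
    "SPAUTHOR": "scientificNameAuthorship",
    "COLLSITE": "locality",
    "LATITUDE": "verbatimLatitude",
    "LONGITUDE": "verbatimLongitude",
}


def to_darwin_core(mcpd_row):
    """Split a row into mapped Darwin Core terms and unmapped descriptors.

    Empty values are skipped. Descriptors without a core equivalent are
    returned separately; these are the ones an extension would cover.
    """
    mapped, unmapped = {}, {}
    for code, value in mcpd_row.items():
        if value is None or value == "":
            continue
        term = MCPD_TO_DWC.get(code)
        if term is None:
            unmapped[code] = value
        else:
            mapped[term] = value
    return mapped, unmapped


# Invented sample row; SAMPSTAT has no entry in the mapping above.
row = {
    "INSTCODE": "RUS001",
    "ACCENUMB": "ACC-0001",
    "GENUS": "Triticum",
    "SPECIES": "aestivum",
    "SAMPSTAT": "300",
    "COLLSITE": "",
}

dwc, leftover = to_darwin_core(row)
print(dwc)
print(leftover)
```

The descriptors that end up in `leftover` are the germplasm-specific ones, which is where extension terms are needed.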


Page 12: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

MCPD -> ABCD 2.06 (2004) for BioCASE

• National Inventory Code
• Institute Code
• Accession Number
• Collecting Number
• Collecting Institute Code
• Genus
• Species
• Species Authority
• „Subtaxa“
• „Subtaxa“ Authority
• Common Crop Name
• Accession Name
• Acquisition Date
• Country of Origin
• Location of Collection Site
• Latitude of CS
• Longitude of CS
• Elevation of CS
• Collecting Date of Sample
• Breeding Institute Code
• Biological Status of Accession
• Ancestral Data
• Collecting/Acquisition Source
• Donor Institute Code
• Donor Accession Number
• Other Identification (Number) associated with the accession
• Location of Safety Duplicates
• Type of Germplasm Storage
• Remarks
• Decoded Collecting Institute
• Decoded Breeding Institute
• Decoded Donor Institute
• Decoded Safety Duplication Location
• Accession URL

Helmut Knüpffer, IPK Gatersleben

Walter Berendsohn, BGBM

http://www.ecpgr.cgiar.org/epgris/Tech_papers/EURISCO_Descriptors.pdf

Page 13: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

GBIF Informatics Suite

GBIF tools to empower decentralized thematic or regional networks

Darwin Core extension for germplasm makes these tools usable for crop gene banks.


Page 14: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Integrated Publishing Toolkit (IPT)

A tool for data publishers.

A simple mechanism to share primary biodiversity data following the Darwin Core standard.

Open source, Java based web application.

Provides a local tool for data quality assessment, etc.

Page 15: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

• Embeds its own database

• Multilingual

• Has a user management feature based on roles, which allows for multiple data managers to share a common instance

• Manages multiple data sources

• Several upload options: relational database management systems or data files

• Public web interface allows for data browsing and full text search

• Customised detail pages


Page 16: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

The IPT user interface includes the germplasm extension

Page 17: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

XML interface includes the germplasm extension
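
The XML shown on this slide is not part of the transcript. As a hedged sketch of how a client might separate core terms from extension terms in such a response, the Python snippet below parses a small invented document with the standard library. The `record` root element, the extension namespace, the `biologicalStatus` element and the sample values are all assumptions; only the Darwin Core namespace and term names are real.

```python
import xml.etree.ElementTree as ET

DWC_NS = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical extension namespace, used only in this example.
EXT_NS = "http://example.org/germplasm/terms/"

# Invented sample document; not the actual IPT output format.
SAMPLE = """<record xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
        xmlns:ext="http://example.org/germplasm/terms/">
  <dwc:institutionCode>RUS001</dwc:institutionCode>
  <dwc:catalogNumber>ACC-0001</dwc:catalogNumber>
  <ext:biologicalStatus>300</ext:biologicalStatus>
</record>"""


def split_terms(xml_text):
    """Return (core, extension) dicts keyed by local element name."""
    root = ET.fromstring(xml_text)
    core, extension = {}, {}
    for child in root:
        tag = child.tag
        # ElementTree expands prefixes to the form "{namespace}localname".
        if tag.startswith("{"):
            uri, _, local = tag[1:].partition("}")
        else:
            uri, local = "", tag
        target = core if uri == DWC_NS else extension
        target[local] = (child.text or "").strip()
    return core, extension


core_terms, ext_terms = split_terms(SAMPLE)
print(core_terms)
print(ext_terms)
```

Matching on the namespace URI rather than the prefix keeps the code working if a publisher chooses different prefixes.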

Page 18: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

VIR (RUS001) Passport data

VIR (RUS001) Crop departments

Global Crop Registries

European EURISCO Catalog

European ECPGR Crop Databases


Page 19: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

VIR (RUS001) Passport data

VIR Crop dataset

Global Crop Registries

EURISCO

ECPGR Crop Databases

Same dataset available from multiple information systems...

?!


Page 20: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

VIR (RUS001) Passport data

VIR Crop dataset

Global Crop Registries

EURISCO

ECPGR Crop Databases

Resolvable persistent identifiers can direct the user to the publisher of the primary dataset (official original dataset)


Page 21: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Persistent Identifier

• The Persistent Identifier (PI) is a digital name tag

– Also called Globally Unique Identifier (GUID)

– Life Science Identifier (LSID) is one example

– Digital Object Identifier (DOI) is another example

• The Persistent Identifier concept applies to the naming and identification of data resources stored in multiple, distributed data stores.

• Effective identification of data objects is essential for linking the world’s biodiversity data.
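
To illustrate why a shared identifier helps when the same dataset is visible through several information systems, here is a small Python sketch. It assumes identifiers in LSID form (urn:lsid:authority:namespace:object, with an optional revision), groups harvested copies by identifier, and reads the issuing authority from the identifier itself. The identifiers, the `example.org` authority and the record layout are invented for the example; resolving an identifier to a publisher's service is outside the scope of this sketch.

```python
from collections import defaultdict


def parse_lsid(lsid):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = lsid.split(":")
    prefix = [p.lower() for p in parts[:2]]
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    keys = ("authority", "namespace", "object", "revision")
    return dict(zip(keys, parts[2:]))


# Copies of accession records as seen through different systems (made up).
harvested = [
    {"id": "urn:lsid:example.org:accession:ACC-0001", "seen_in": "EURISCO"},
    {"id": "urn:lsid:example.org:accession:ACC-0001", "seen_in": "ECPGR Crop Database"},
    {"id": "urn:lsid:example.org:accession:ACC-0002", "seen_in": "EURISCO"},
]


def group_by_identifier(records):
    """Map each identifier to the list of systems where a copy was seen."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["id"]].append(rec["seen_in"])
    return dict(groups)


groups = group_by_identifier(harvested)
for pid, systems in groups.items():
    authority = parse_lsid(pid)["authority"]
    print(pid, "issued by", authority, "seen in", systems)
```

Without a shared identifier, the two copies of the first accession would look like two different records. With one, a portal can show a single entry and point back to the authority that issued it.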

Page 22: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Global crop collections

Moving towards… global integration of information

Threatened species

Migratory species

Spatial data

Global crop system

Crop standards

Legislation and regulations etc.

Crop collections in Europe

Genebank datasets


Page 23: EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

• GBIF, Global Biodiversity Information Facility http://www.gbif.org

• TDWG, Biodiversity Information Standards http://www.tdwg.org

• Bioversity International http://www.bioversityinternational.org

Things can happen in a band, or any type of collaboration, that would not otherwise happen. (Jim Coleman, Musician)

Special thanks to: