Laura Russell (larussell@vertnet.org), Programmer, VertNet. Buenos Aires (Argentina), 28 September 2011. Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition. How to publish (data set) meta data

Page 1:

Laura Russell (larussell@vertnet.org)
Programmer, VertNet

Buenos Aires (Argentina), 28 September 2011

Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition

How to publish (data set) meta data

Page 2:

Data publishing process

Page 3:

Outline

- Why metadata?

- The GBIF EML profile

- Metadata standards

- Preparation of metadata

- Where does the metadata go?

- Preparing metadata (examples)

Page 5:

Why metadata?

“Data Intensive Science”

“Fourth Science Paradigm”

“Digital Data Deluge”

Key requirement: high-quality metadata for long-term curation and use of data sets

Sources:

- e-Infrastructure Reflection Group (European Strategy Forum on Research Infrastructures). Report on Data Management, November 2009. http://www.e-irg.eu/images/stories/publ/task_force_reports/dmtfjointreport.pdf

- The Fourth Paradigm: Data-Intensive Scientific Discovery. http://research.microsoft.com/en-us/collaboration/fourthparadigm/contents.aspx

Page 6:

Why metadata?

William K. Michener, Meta-information concepts for ecological data management, Ecological Informatics, Volume 1, Issue 1, January 2006, Pages 3-7, ISSN 1574-9541, DOI: 10.1016/j.ecoinf.2005.08.004. (http://www.sciencedirect.com/science/article/B7W63-4HJRS57-3/2/ea2e08412c6776456f540e66983546c0)

Information about data sets deteriorates over time!

Page 7:

Why metadata?

Metadata supports:

- Discovery

- Interpretation/Evaluation

- Provenance

- Quality

- Fitness-for-use

- Analytical re-use

Page 9:

Metadata Standards

- Ecological Metadata Language (EML) v2.1.1: http://knb.ecoinformatics.org/software/eml/

- Dublin Core: http://dublincore.org/documents/dcmi-terms/

- Directory Interchange Format (DIF): http://gcmd.nasa.gov/User/difguide/difman.html

- ISO 19115/19139 Geographic Metadata
  - ISO 19115: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=26020
  - ISO 19139: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=32557

Page 10:

Metadata Standards

- Natural Collections Descriptions (NCD): http://www.tdwg.org/standards/312/

- Federal Geographic Data Committee (FGDC) Biological Profile*: http://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/biometadata/

- Multimedia Resources Metadata Schema: http://www.tdwg.org/charters/article/view/448/36

*An extension of the FGDC CSDGM (Content Standard for Digital Geospatial Metadata)

Page 11:

ISO 19115/19139

North American Profile of ISO 19139: http://www.fgdc.gov/standards/projects/incits-l1-standards-projects/NAP-Metadata/napMetadataProfileV101.pdf/view

Several resources available for crosswalk, transform, view:

- EML to FGDC Biological Profile: https://code.ecoinformatics.org/code/eml/trunk/lib/eml2tonbii/

- FGDC CSDGM to ISO 19139: http://www.ncddc.noaa.gov/technology/metadataandxml/view
  - FGDC CSDGM to ISO Transform
  - FGDC CSDGM to ISO Crosswalk
  - ISO XML to HTML View
  - FGDC BIO to ISO Transform
  - FGDC BIO to ISO Crosswalk

- EML to ISO 19139: http://code.google.com/p/gbif-metadata/source/browse/trunk/metadata/src/main/resources/eml2iso19139.xsl

- Open source INSPIRE-compliant MD editor (multilingual functionality): http://www.inspire-geoportal.eu/EUOSME/

Page 12:

Metadata and Languages

A Multilingual Metadata Catalog for the ILTER: Issues and Approaches. Vanderbilt, K.L., et al., Ecological Informatics, Volume 5, Issue 3, May 2010, Pages 187-193, doi:10.1016/j.ecoinf.2010.02.002

- Adopt a lingua franca, e.g., English
  - data publishers provide discovery-level metadata in English;
  - full metadata in local language.

- Just use local language with keywords from multilingual thesauri, e.g., GEMET, AGROVOC
  - GEMET, the GEneral Multilingual Environmental Thesaurus; 27 languages. http://www.eionet.europa.eu/gemet/
  - AGROVOC; agriculture, forestry, fisheries, food and related domains; 20 languages. http://www4.fao.org/agrovoc/default.htm

- Long-term solution: multilingual ontologies

- Issues? Additional burden; tools, metadata standards

Page 14:

GBIF EML Profile

- Requirements gathering
  - GBIF Metadata Task Group: http://www2.gbif.org/GBIF-MIFTG-Report.pdf
  - EML; ISO 19115; NCD; INSPIRE Directive: http://community.gbif.org/mod/file/download.php?file_guid=10915; http://community.gbif.org/mod/file/download.php?file_guid=5656

- GBIF EML schema: http://rs.gbif.org/schema/eml-gbif-profile/1.0/eml-gbif-profile.xsd

- GBIF community site, metadata network: http://community.gbif.org/pg/groups/5258/gbif-metadata-network/

- GBIF profile documentation
  - http://links.gbif.org/gbif_metadata_profile_how-to_en_v1
  - http://links.gbif.org/gbif_metadata_profile_guide_en_v1

Page 16:

Preparing metadata

- Metadata editors, e.g., IPT; Spreadsheet template; Morpho; EUOSME

- Scripting
  - Output directly from existing metadata database
  - Transform from another metadata specification to EML

- Editing XML directly
  - Validation essential
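
The scripting route listed above (output directly from an existing metadata database) might look roughly like the following Python sketch. It is an illustration only: the record fields, the `packageId` and `system` values, and the function names are hypothetical, only a handful of EML elements are written, and the result is checked for well-formedness, not validated against the GBIF EML profile schema.

```python
import xml.etree.ElementTree as ET

# Namespace of EML 2.1.1, the version named on the standards slide.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"
ET.register_namespace("eml", EML_NS)


def build_eml(record: dict) -> bytes:
    """Serialise one metadata record (a plain dict) as a small EML document."""
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {
            "packageId": record["id"],   # hypothetical identifier
            "system": "example-system",  # hypothetical value
        },
    )
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = record["title"]

    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = record["creator_surname"]

    abstract = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract, "para").text = record["abstract"]

    contact = ET.SubElement(dataset, "contact")
    ET.SubElement(contact, "organizationName").text = record["organization"]

    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def title_of(xml_bytes: bytes) -> str:
    """Re-parse the output (a well-formedness check) and return the title."""
    root = ET.fromstring(xml_bytes)
    return root.findtext("dataset/title")


# Hypothetical row as it might come out of a metadata database.
record = {
    "id": "example-dataset-1",
    "title": "Example bird observations",
    "creator_surname": "Example",
    "abstract": "Illustrative abstract text.",
    "organization": "Example Museum",
}
eml_bytes = build_eml(record)
print(eml_bytes.decode("utf-8"))
print(title_of(eml_bytes))
```

Re-parsing only shows that the XML is well formed. Because the slide stresses that validation is essential, a real workflow would still validate the file against the profile schema with a schema-aware tool.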

Page 18:

Where does the metadata go?

http://metadata.gbif.org

Page 19:

Sources of Metadata

GBIF Data Cache

- Registered IPT installations

- National/regional/organisation level catalogues

- Thematic catalogues, e.g., OBIS

GBIF Participants

External networks, e.g., Knowledge Network for Biocomplexity (KNB)

GBIF approach:

- no imposed metadata standard or preferred catalogue implementation for participants;

- avoidance of lossy conversions in submitting metadata

Page 20:

GBIF metadata architecture

[Diagram. Labels: GBIF Catalogue; GBIF Registry; EuroGEOSS Catalogue; Catalogue, e.g., GBIF Node; IPT Instance; Catalogue, e.g., KNB; GBIF Data Cache. Link labels: OAI-PMH; Direct payload.]

GBIF metadata catalogue specification: http://links.gbif.org/gbif_metadata_catalogue_specification.pdf

Page 21:

OAI-PMH

Open Archives Initiative Protocol for Metadata Harvesting

Providing a low-barrier mechanism for interoperability across distributed metadata repositories

Data providers expose metadata

Service providers consume metadata through a client application known as a harvester that issues OAI-PMH service requests over HTTP.

http://www.openarchives.org/pmh/

GBIF: role as harvester and provider
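
As a rough illustration of the harvester side described above, the Python sketch below builds an OAI-PMH request URL and pulls record identifiers out of a response. The base URL, the identifiers, and the sample response are made up for the example. `ListIdentifiers` and the `oai_dc` metadata prefix come from the OAI-PMH specification. No network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url: str, verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL: the base URL plus verb and arguments."""
    query = {"verb": verb, **params}
    return f"{base_url}?{urlencode(query)}"


def record_identifiers(response_xml: str) -> list:
    """Return the identifiers found in the record headers of a response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]


# Hand-written sample of what a data provider might return (hypothetical data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2011-09-28T12:00:00Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:dataset-1</identifier>
      <datestamp>2011-09-01</datestamp>
    </header>
    <header>
      <identifier>oai:example.org:dataset-2</identifier>
      <datestamp>2011-09-15</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

url = build_request("http://example.org/oai", "ListIdentifiers", metadataPrefix="oai_dc")
print(url)
print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would send the URL over HTTP and would also follow resumption tokens for long result lists. Both are left out here to keep the sketch short.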

Page 23:

Spreadsheet processor

1. Download the spreadsheet template from http://tools.gbif.org/spreadsheet-processor/templates/metadata/metadata-1_v1.xls

2. Complete the spreadsheet

3. Transform it into a GBIF metadata profile file using the spreadsheet processor: http://tools.gbif.org/spreadsheet-processor/

Note: the processor does not publish the file to GBIF; it provides a publication-ready file.

Page 24:

IPT metadata editor

1. Create a new resource in the IPT

2. Complete the metadata
   - Dataset (Resource)
   - Project
   - People and Organisations
   - Keyword Set (General Keywords)
   - Coverage
   - Taxonomic Coverage
   - Geographic Coverage
   - Temporal Coverage
   - Intellectual Property Rights
   - Methods
   - Additional Metadata and Natural Collections Descriptions Data

3. Publish it
   - Metadata for published and unpublished data sets
   - Output as part of DwC-A zip file (EML.xml)
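
Since the published metadata ends up inside the DwC-A zip file, it can be read back with a short script. The sketch below is a hedged illustration: it assumes the metadata file is named `eml.xml` (matched case-insensitively, as the slide writes `EML.xml`), and it runs against a tiny in-memory archive whose contents are invented and much simpler than a real IPT export.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_archive_metadata(archive) -> ET.Element:
    """Return the parsed metadata document from a DwC-A style zip file."""
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Case-insensitive match: the file may be stored as eml.xml or EML.xml.
            if name.lower() == "eml.xml":
                return ET.fromstring(zf.read(name))
    raise FileNotFoundError("no eml.xml found in archive")


def make_demo_archive() -> io.BytesIO:
    """Build a tiny in-memory archive with invented contents for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr(
            "eml.xml",
            "<eml><dataset><title>Demo data set</title></dataset></eml>",
        )
        zf.writestr("occurrence.txt", "id\tscientificName\n1\tPuma concolor\n")
    buffer.seek(0)
    return buffer


root = read_archive_metadata(make_demo_archive())
print(root.findtext("dataset/title"))
```

For a real archive, pass the path of the downloaded zip file to `read_archive_metadata` in place of the in-memory buffer.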
