EDGE: The Multi-Metadata Standards Platform
National Aeronautics and Space Administration Jet Propulsion Laboratory California Institute of Technology Pasadena, California EDGE: The Multi-Metadata Standards Platform Thomas Huang and Edward Armstrong PO.DAAC/JPL 2014 ESIP Summer Meeting, Copper Mountain, CO

Page 1: EDGE: The Multi-Metadata Standards Platform

National Aeronautics and Space Administration

Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California

EDGE: The Multi-Metadata Standards Platform

Thomas Huang and Edward Armstrong

PO.DAAC/JPL

2014 ESIP Summer Meeting, Copper Mountain, CO

Page 2: EDGE: The Multi-Metadata Standards Platform


NASA PO.DAAC

• The NASA Physical Oceanography Distributed Active Archive Center (PO.DAAC) at the Jet Propulsion Laboratory is an element of the Earth Observing System Data and Information System (EOSDIS). EOSDIS provides science data to a wide community of users for NASA’s Science Mission Directorate.

• Archives and distributes data relevant to the physical state of the ocean

• The mission of the PO.DAAC is to preserve NASA’s ocean and climate data and make these universally accessible and meaningful.

thuang, JPL

http://podaac.jpl.nasa.gov

Page 3: EDGE: The Multi-Metadata Standards Platform


Our Users Need Our Help

• Discover/Identify the relevant data

• Deliver information (metadata) that our user communities can understand
    • What to package?
    • How to package?

• Retrieve the relevant data

• Use and Understand the data content


PO.DAAC User Communities

Page 4: EDGE: The Multi-Metadata Standards Platform


EDGE Architecture

• EDGE: Extensible Data Gateway Environment

• The brain behind PO.DAAC’s web portal and its Consolidated Web Service platform

• An architecture for metadata aggregation and translation


Page 5: EDGE: The Multi-Metadata Standards Platform


EDGE METADATA TRANSLATION ARCHITECTURE

Metadata Standard Templates for Domain-Specific Mappings
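The template idea named on this slide can be illustrated with a small sketch: one internal record, one template per target standard, and a single rendering function shared by all of them. This is not EDGE code. The `TEMPLATES` table, the `translate` function, and the record fields are hypothetical, and the XML fragments only borrow element names from GCMD DIF and FGDC CSDGM without being schema-valid.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical templates keyed by format name. Element names follow the
# GCMD DIF and FGDC CSDGM vocabularies, but the fragments are heavily
# simplified and not schema-valid.
TEMPLATES = {
    "gcmd": Template(
        "<DIF><Entry_ID>$short_name</Entry_ID>"
        "<Entry_Title>$title</Entry_Title></DIF>"
    ),
    "fgdc": Template(
        "<metadata><idinfo><citation><citeinfo>"
        "<title>$title</title>"
        "</citeinfo></citation></idinfo></metadata>"
    ),
}


def translate(record, fmt):
    """Render one internal metadata record with the template for `fmt`."""
    if fmt not in TEMPLATES:
        raise ValueError("no template registered for format %r" % fmt)
    # Escape values so characters such as '&' cannot break the XML output.
    safe = {key: escape(str(value)) for key, value in record.items()}
    return TEMPLATES[fmt].substitute(safe)


# Hypothetical internal record; the field names are illustrative only.
record = {"short_name": "EXAMPLE-DATASET", "title": "Wind & wave example"}

for fmt in sorted(TEMPLATES):
    print(fmt, translate(record, fmt))
```

Under this design, supporting another standard means adding one entry to the template table while the rendering function stays unchanged.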

Page 6: EDGE: The Multi-Metadata Standards Platform


PO.DAAC Web Portal and Datacasting


Page 7: EDGE: The Multi-Metadata Standards Platform


OpenSearch

Terminal:

% curl -X GET "http://podaac.jpl.nasa.gov/ws/search/dataset/?format=rss&keyword=ocean"

• ESIP Discovery Specification

• RSS and Atom

• Dataset and Granule searches
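A client for the search service might look roughly like the sketch below. The `format` and `keyword` parameters and the dataset endpoint come from the curl example above. The `granule` path, the helper names, and the sample feed are assumptions made for illustration, and the sample is parsed offline so that the sketch runs without network access.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree

BASE = "http://podaac.jpl.nasa.gov/ws/search"
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"
ATOM_NS = "http://www.w3.org/2005/Atom"


def build_search_url(kind, **params):
    """Build a search URL; `kind` is 'dataset' or 'granule'.

    The 'granule' path is an assumption modeled on the dataset endpoint.
    """
    if kind not in ("dataset", "granule"):
        raise ValueError("kind must be 'dataset' or 'granule'")
    return "%s/%s/?%s" % (BASE, kind, urlencode(sorted(params.items())))


def summarize_feed(atom_text):
    """Return (totalResults, [entry titles]) from an Atom response."""
    root = ElementTree.fromstring(atom_text)
    total = int(root.find("{%s}totalResults" % OPENSEARCH_NS).text)
    titles = [entry.findtext("{%s}title" % ATOM_NS)
              for entry in root.findall("{%s}entry" % ATOM_NS)]
    return total, titles


# Made-up response, used only to exercise the parser offline.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <opensearch:totalResults>1</opensearch:totalResults>
  <entry><title>Example dataset</title></entry>
</feed>"""

print(build_search_url("dataset", format="atom", keyword="ocean"))
print(summarize_feed(SAMPLE))
```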

Page 8: EDGE: The Multi-Metadata Standards Platform


Metadata Service

Terminal:

% curl -X GET "http://podaac.jpl.nasa.gov/ws/metadata/dataset/?format=iso&shortName=OSDPD-L2P-MSG02"

• Common URL query to request dataset and granule metadata in various standards

• Formats supported
    • iso – GHRSST GDS 2.0
    • gcmd (dataset only) – Global Change Master Directory
    • fgdc – Federal Geographic Data Committee
    • Datacasting
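The common URL pattern lends itself to a small helper that checks the requested format before building the query. This is a sketch under assumptions: the `metadata_url` function and `SUPPORTED` table are hypothetical, the granule path is assumed to mirror the dataset path, and Datacasting is left out because the slide gives no format key for it.

```python
from urllib.parse import urlencode

BASE = "http://podaac.jpl.nasa.gov/ws/metadata"

# Format keys from the slide, mapped to the record levels they support.
# "gcmd" is dataset only; the other two are assumed to cover both levels.
SUPPORTED = {
    "iso": ("dataset", "granule"),
    "gcmd": ("dataset",),
    "fgdc": ("dataset", "granule"),
}


def metadata_url(level, fmt, **query):
    """Build a metadata request URL for a 'dataset' or 'granule' record."""
    levels = SUPPORTED.get(fmt)
    if levels is None:
        raise ValueError("unsupported format %r" % fmt)
    if level not in levels:
        raise ValueError("format %r is not available for %s metadata" % (fmt, level))
    # Keep "format" first so the URL reads like the curl example.
    params = [("format", fmt)] + sorted(query.items())
    return "%s/%s/?%s" % (BASE, level, urlencode(params))


print(metadata_url("dataset", "iso", shortName="OSDPD-L2P-MSG02"))
```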

Page 9: EDGE: The Multi-Metadata Standards Platform


ISO Metadata Model

• ISO 19115-2 metadata model for GHRSST GDS2 data sets – Utilizing MI_Metadata
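For readers unfamiliar with MI_Metadata, the sketch below builds the outermost shell of such a record with Python's standard library. It uses the ISO TC 211 2005 namespace URIs commonly seen in ISO 19115-2 XML. The `minimal_mi_metadata` helper is hypothetical, the result is far from a complete or valid record, and the slides do not show what PO.DAAC's own templates contain.

```python
import xml.etree.ElementTree as ET

# Namespace URIs commonly used for ISO 19115 / 19115-2 XML encodings.
NS = {
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Return the namespace-qualified tag name ElementTree expects."""
    return "{%s}%s" % (NS[prefix], tag)


def minimal_mi_metadata(file_identifier):
    """Build an MI_Metadata root holding only a file identifier."""
    root = ET.Element(q("gmi", "MI_Metadata"))
    ident = ET.SubElement(root, q("gmd", "fileIdentifier"))
    ET.SubElement(ident, q("gco", "CharacterString")).text = file_identifier
    return root


doc = minimal_mi_metadata("OSDPD-L2P-MSG02")
print(ET.tostring(doc, encoding="unicode"))
```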


Page 10: EDGE: The Multi-Metadata Standards Platform


GHRSST METADATA TO ISO

Workflow from the GHRSST Data Processing Specification version 2 Metadata Conventions, depicting the translation of both dataset-level and granule-level (file) metadata to ISO 19115-2.

Page 11: EDGE: The Multi-Metadata Standards Platform


Example ISO Export Script

Example Python script that exports dataset metadata in ISO format and stores each record in its own XML file

#!/usr/bin/env python
# Written for Python 3: urlopen and urlretrieve come from urllib.request.
from xml.etree.ElementTree import parse
from urllib.request import urlopen, urlretrieve

# XML namespaces used in the OpenSearch Atom response
namespace = {"podaac": "http://podaac.jpl.nasa.gov/opensearch/",
             "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
             "atom": "http://www.w3.org/2005/Atom"}

startIndex = 0
totalResults = 1

# Page through the search results until every entry has been visited
while startIndex < totalResults:
    url = 'http://podaac.jpl.nasa.gov/ws/search/dataset/?format=atom&pretty=false&'
    url += 'keyword=amsr-e&startIndex=%d' % startIndex

    xml = parse(urlopen(url))

    # Update the paging counters from the response itself
    totalResults = int(xml.find('{%(opensearch)s}totalResults' % namespace).text)
    startIndex += int(xml.find('{%(opensearch)s}itemsPerPage' % namespace).text)

    items = xml.findall('{%(atom)s}entry' % namespace)

    for elem in items:
        datasetId = elem.find('{%(podaac)s}datasetId' % namespace).text
        if datasetId:
            # Follow the entry's ISO metadata link and save it as <datasetId>.iso.xml
            link = elem.find("{%(atom)s}link[@title='ISO-19115 Metadata']" % namespace).attrib['href']
            filename = "%s.iso.xml" % datasetId
            urlretrieve(link, filename)

Page 12: EDGE: The Multi-Metadata Standards Platform


Some of Our ISO 19115-2 Challenges

• Deciding how best to separate and combine ISO metadata to describe collections vs. granules

• Development challenges – maintaining our internal template when errors are discovered

• Need more work to describe quality information in granules and datasets

• Certain ISO metadata objects require the following
    • Opening the granule file to retrieve necessary information
        • MD_SpatialRepresentation needs dimension size, resolution
        • MD_ContentInformation, specifically MI_CoverageDescription, needs physical measurement variables
    • Input from provider and/or data engineers
        • DQ_DataQuality needs identification of what the quality flags are per dataset
    • Collection of external information
        • MD_DistributionInfo, for example, needs information about remote distributors, e.g. URL, contact person
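The granule-derived values mentioned above for MD_SpatialRepresentation (dimension size and resolution) can be computed from coordinate values once they have been read from the file. The sketch below is illustrative only: `describe_axis` is a hypothetical helper, the coordinate lists are made up, reading them from an actual granule (for example with a netCDF library) is not shown, and the calculation assumes a regularly spaced axis, which swath data generally is not.

```python
def describe_axis(name, values):
    """Return the dimension name, size, and mean spacing of a 1-D axis."""
    if len(values) < 2:
        raise ValueError("need at least two coordinate values")
    size = len(values)
    # Mean spacing between neighbours; exact for a regularly spaced axis.
    resolution = abs(values[-1] - values[0]) / (size - 1)
    return {"name": name, "size": size, "resolution": resolution}


# Made-up one-degree global grid standing in for values read from a granule.
lat = [-89.5 + i for i in range(180)]
lon = [-179.5 + i for i in range(360)]

for axis in (describe_axis("lat", lat), describe_axis("lon", lon)):
    print(axis)
```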


Page 13: EDGE: The Multi-Metadata Standards Platform


Onward

• ISO metadata quality improvements
    • Data description improvement for consistency
    • Resolving missing attributes (ISO)

• Already in progress
    • Use ISO metadata to describe quality information within a granule: which variables contain quality flags and other filtering information, and what those flags mean
    • Have a tool read this information and expose it to the user

• EDGE
    • Support ElasticSearch backend
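The quality-flag item above describes a tool that reads flag descriptions and presents them to the user. One way such a tool could decode a flag value is sketched below, assuming the descriptions are available as CF-style `flag_masks` and `flag_meanings` pairs. How the ISO record would carry them is not shown in these slides, and the function name and sample attributes are hypothetical.

```python
def decode_flags(value, flag_masks, flag_meanings):
    """Return the names of all flags whose mask bits are set in `value`."""
    names = flag_meanings.split()
    if len(names) != len(flag_masks):
        raise ValueError("flag_masks and flag_meanings differ in length")
    return [name for mask, name in zip(flag_masks, names) if value & mask]


# Hypothetical attributes for one quality-flag variable.
attrs = {"flag_masks": [1, 2, 4], "flag_meanings": "land ice rain"}

# 5 is binary 101, so the first and third flags are set.
print(decode_flags(5, attrs["flag_masks"], attrs["flag_meanings"]))
```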


Page 14: EDGE: The Multi-Metadata Standards Platform


Summary

• There will always be
    • Metadata standards or recommendations
    • Different (maybe better) ways to look for data

• Why did PO.DAAC decide to invest in EDGE?
    • No need to redo the plumbing for each new metadata standard
    • Portable platform to integrate with local/external data services
    • Allows us to focus on the domain – metadata standard and metadata resources


Page 15: EDGE: The Multi-Metadata Standards Platform



THANKS